Re: [PATCH v2 1/3] powerpc/fadump: make is_fadump_active() visible for exporting vmcore

2023-09-06 Thread Baoquan He
On 09/06/23 at 12:06am, Hari Bathini wrote:
> Include asm/fadump.h in asm/kexec.h to make it visible while exporting
> vmcore. Also, update is_fadump_active() to return boolean instead of
> integer for better readability. The change will be used in the next
> patch to ensure vmcore is exported when fadump is active.
> 
> Signed-off-by: Hari Bathini 

Thanks, Hari. The whole series looks good to me.

Acked-by: Baoquan He 

Since it's a powerpc-specific change, should it be picked up via the powerpc tree?

Thanks
Baoquan

> ---
> 
> Changes in v2:
> * New patch based on Baoquan's suggestion to use is_fadump_active()
>   instead of introducing new function is_crashdump_kernel().
> 
> 
>  arch/powerpc/include/asm/fadump.h | 4 ++--
>  arch/powerpc/include/asm/kexec.h  | 8 ++--
>  arch/powerpc/kernel/fadump.c  | 4 ++--
>  3 files changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
> index 526a6a647312..27b74a7e2162 100644
> --- a/arch/powerpc/include/asm/fadump.h
> +++ b/arch/powerpc/include/asm/fadump.h
> @@ -15,13 +15,13 @@ extern int crashing_cpu;
>  
>  extern int is_fadump_memory_area(u64 addr, ulong size);
>  extern int setup_fadump(void);
> -extern int is_fadump_active(void);
> +extern bool is_fadump_active(void);
>  extern int should_fadump_crash(void);
>  extern void crash_fadump(struct pt_regs *, const char *);
>  extern void fadump_cleanup(void);
>  
>  #else/* CONFIG_FA_DUMP */
> -static inline int is_fadump_active(void) { return 0; }
> +static inline bool is_fadump_active(void) { return false; }
>  static inline int should_fadump_crash(void) { return 0; }
>  static inline void crash_fadump(struct pt_regs *regs, const char *str) { }
>  static inline void fadump_cleanup(void) { }
> diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
> index a1ddba01e7d1..b760ef459234 100644
> --- a/arch/powerpc/include/asm/kexec.h
> +++ b/arch/powerpc/include/asm/kexec.h
> @@ -51,6 +51,7 @@
>  
>  #ifndef __ASSEMBLY__
>  #include 
> +#include 
>  
>  typedef void (*crash_shutdown_t)(void);
>  
> @@ -99,10 +100,13 @@ void relocate_new_kernel(unsigned long indirection_page, unsigned long reboot_co
>  
>  void kexec_copy_flush(struct kimage *image);
>  
> -#if defined(CONFIG_CRASH_DUMP) && defined(CONFIG_PPC_RTAS)
> +#if defined(CONFIG_CRASH_DUMP)
> +#define is_fadump_active is_fadump_active
> +#if defined(CONFIG_PPC_RTAS)
>  void crash_free_reserved_phys_range(unsigned long begin, unsigned long end);
>  #define crash_free_reserved_phys_range crash_free_reserved_phys_range
> -#endif
> +#endif /* CONFIG_PPC_RTAS */
> +#endif /* CONFIG_CRASH_DUMP */
>  
>  #ifdef CONFIG_KEXEC_FILE
>  extern const struct kexec_file_ops kexec_elf64_ops;
> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index 3ff2da7b120b..5682a65e8326 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -187,9 +187,9 @@ int should_fadump_crash(void)
>   return 1;
>  }
>  
> -int is_fadump_active(void)
> +bool is_fadump_active(void)
>  {
> - return fw_dump.dump_active;
> + return !!fw_dump.dump_active;
>  }
>  
>  /*
> -- 
> 2.41.0
> 
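The `#define is_fadump_active is_fadump_active` line added to asm/kexec.h is the usual kernel idiom that lets generic code test, with `#ifndef`, whether an architecture supplies its own implementation. A minimal userspace sketch of that pattern, assuming a hypothetical `dump_active` variable in place of `fw_dump.dump_active` and a hypothetical generic fallback stub (the generic side is not part of this patch):

```c
#include <stdbool.h>

/* Hypothetical stand-in for fw_dump.dump_active; any non-zero value means active. */
static unsigned long dump_active = 0x2;

/* "Arch" side: provide the function, then advertise it with a self-named macro. */
static bool is_fadump_active(void)
{
	/* Normalize any non-zero value to true, as the patch does with !! */
	return !!dump_active;
}
#define is_fadump_active is_fadump_active

/* "Generic" side (hypothetical): use a stub only if no arch version exists. */
#ifndef is_fadump_active
static inline bool is_fadump_active(void) { return false; }
#endif
```

Because the macro expands to its own name, calls still reach the arch function, while the `#ifndef` block is skipped. Strictly, the implicit conversion to `bool` already maps any non-zero value to `true`, so `!!` mainly documents intent.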



Re: [RFC PATCH v11 13/29] KVM: Add transparent hugepage support for dedicated guest memory

2023-09-06 Thread Paolo Bonzini
On Fri, Jul 21, 2023 at 7:13 PM Sean Christopherson  wrote:
> On Fri, Jul 21, 2023, Paolo Bonzini wrote:
> > On 7/19/23 01:44, Sean Christopherson wrote:
> > > @@ -413,6 +454,9 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
> > > u64 flags = args->flags;
> > > u64 valid_flags = 0;
> > > +   if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
> > > +   valid_flags |= KVM_GUEST_MEMFD_ALLOW_HUGEPAGE;
> > > +
> >
> > I think it should be always allowed.  The outcome would just be "never have
> > a hugepage" if thp is not enabled in the kernel.
>
> I don't have a strong preference.  My thinking was that userspace would probably
> rather have an explicit error, as opposed to silently running with a misconfigured
> setup.

Considering that is how madvise(MADV_HUGEPAGE) behaves, your patch is
good. I disagree but consistency is better.

Paolo
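Sean's check makes an unsupported flag fail loudly instead of silently degrading. A small sketch of that validation logic, with a hypothetical flag value and a `thp_enabled` parameter standing in for `IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)`; it also assumes, as is typical, that unknown bits are rejected with -EINVAL (the rest of kvm_gmem_create() is not shown in this thread):

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit value; not the real KVM_GUEST_MEMFD_ALLOW_HUGEPAGE. */
#define SKETCH_ALLOW_HUGEPAGE (1ULL << 0)

/* Reject any flag the running kernel cannot honor with an explicit error. */
static int gmem_check_flags(uint64_t flags, bool thp_enabled)
{
	uint64_t valid_flags = 0;

	if (thp_enabled)
		valid_flags |= SKETCH_ALLOW_HUGEPAGE;

	if (flags & ~valid_flags)
		return -EINVAL;
	return 0;
}
```

With `thp_enabled == false`, requesting the hugepage flag returns -EINVAL, the explicit-error behavior Sean prefers and Paolo accepts for consistency with madvise(MADV_HUGEPAGE).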



Re: [PATCH v7 3/8] KVM: Make __kvm_follow_pfn not imply FOLL_GET

2023-09-06 Thread Sean Christopherson
On Wed, Sep 06, 2023, David Stevens wrote:
> On Wed, Sep 6, 2023 at 9:45 AM Sean Christopherson  wrote:
> >
> > On Tue, Sep 05, 2023, David Stevens wrote:
> > > For property 2, FOLL_GET is also important. If guarded_by_mmu_notifier
> > > is set, then we're all good here. If guarded_by_mmu_notifier is not
> > > set, then the check in __kvm_follow_pfn guarantees that FOLL_GET is
> > > set. For struct page memory, we're safe because KVM will hold a
> > > reference as long as it's still using the page. For non struct page
> > > memory, we're not safe - this is where the breaking change of
> > > allow_unsafe_mappings would go. Note that for non-refcounted struct
> > > page, we can't use the allow_unsafe_mappings escape hatch. Since
> > > FOLL_GET was requested, if we returned such a page, then the caller
> > > would eventually corrupt the page refcount via kvm_release_pfn.
> >
> > Yes we can.  The caller simply needs to be made aware of is_refcounted_page.  I
> > didn't include that in the snippet below because I didn't want to write the entire
> > patch.  The whole point of adding is_refcounted_page is so that callers can
> > identify exactly what type of page was at the end of the trail that was followed.
> 
> Are you asking me to completely migrate every caller of any gfn_to_pfn
> variant to __kvm_follow_pfn, so that they can respect
> is_refcounted_page? That's the only way to make it safe for
> allow_unsafe_mappings to apply to non-refcounted pages. That is
> decidedly not simple. Or is kvm_vcpu_map the specific call site you
> care about? At best, I can try to migrate x86, and then just add some
> sort of compatibility shim for other architectures that rejects
> non-refcounted pages.

Ah, I see your conundrum.  No, I don't think it's reasonable to require you to
convert all users in every architecture.  I'll still ask, just in case you're
feeling generous, but it's not a requirement :-)

The easiest way forward I can think of is to add yet another flag to kvm_follow_pfn,
e.g. allow_non_refcounted_struct_page, to communicate whether or not the caller
has been enlightened to play nice with non-refcounted struct page memory.  We'll
need that flag no matter what, otherwise we'd have to convert all users in a single
patch (LOL).  Then your series can simply stop at a reasonable point, e.g. convert
all x86 usage (including kvm_vcpu_map()), and leave converting everything else to
future work.

E.g. I think this would be the outro of hva_to_pfn_remapped():

	if (!page)
		goto out;

	if (get_page_unless_zero(page))
		WARN_ON_ONCE(kvm_follow_refcounted_pfn(foll, page) != pfn);
 out:
	pte_unmap_unlock(ptep, ptl);

	/*
	 * TODO: Drop allow_non_refcounted_struct_page once all callers have
	 * been taught to play nice with non-refcounted tail pages.
	 */
	if (page && !foll->is_refcounted_page &&
	    !foll->allow_non_refcounted_struct_page)
		r = -EFAULT;
	else if (!foll->is_refcounted_page && !foll->guarded_by_mmu_notifier &&
		 !allow_unsafe_mappings)
		r = -EFAULT;
	else
		*p_pfn = pfn;

	return r;
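The two -EFAULT branches above encode separate rules: an unenlightened caller must never receive a non-refcounted struct page, and memory that is neither refcounted nor protected by an mmu_notifier is only handed out when allow_unsafe_mappings is set. A standalone model of just that decision, using a hypothetical struct that mirrors the kvm_follow_pfn fields named in the thread (it is not KVM's real struct):

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mirror of the kvm_follow_pfn fields discussed above. */
struct follow_state {
	bool have_page;                         /* a struct page was found */
	bool is_refcounted_page;
	bool allow_non_refcounted_struct_page;  /* caller is "enlightened" */
	bool guarded_by_mmu_notifier;
};

/* Returns 0 if the pfn may be handed to the caller, -EFAULT otherwise. */
static int follow_pfn_verdict(const struct follow_state *f,
			      bool allow_unsafe_mappings)
{
	/* Rule 1: unenlightened callers never see non-refcounted struct pages. */
	if (f->have_page && !f->is_refcounted_page &&
	    !f->allow_non_refcounted_struct_page)
		return -EFAULT;
	/* Rule 2: unrefcounted, unguarded memory needs the explicit opt-in. */
	if (!f->is_refcounted_page && !f->guarded_by_mmu_notifier &&
	    !allow_unsafe_mappings)
		return -EFAULT;
	return 0;
}
```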

> > > Property 3 would be nice, but we've already concluded that guarding
> > > all translations with mmu notifiers is infeasible. So maintaining
> > > property 2 is the best we can hope for.
> >
> > No, #3 is just a variant of #2.  Unless you're talking about not making guarantees
> > about guest accesses being ordered with respect to VMA/memslot updates, but I
> > don't think that's the case.
> 
> I'm talking about the fact that kvm_vcpu_map is busted with respect to
> VMA updates. It won't corrupt host memory because the mapping keeps a
> reference to the page, but it will continue to use stale translations.

True.  But barring some crazy paravirt use case, userspace modifying a mapping
that is in active use is inherently broken; the guest will have no idea that memory
just got yanked away.

Hmm, though I suppose userspace could theoretically mprotect() a mapping to be
read-only, which would "work" for mmu_notifier paths but not kvm_vcpu_map().  But
KVM doesn't provide enough information on -EFAULT for userspace to do anything in
response to a write to read-only memory, so in practice that's likely inherently
broken too.

> From [1], it sounds like you've granted that fixing that is not feasible, so
> I just wanted to make sure that this isn't the "unsafe" referred to by
> allow_unsafe_mappings.

Right, this is not the "unsafe" I'm referring to.

> [1] https://lore.kernel.org/all/zbeeqtmtnpaeq...@google.com/


Re: [PATCH 3/8] arch/x86: Remove sentinel elem from ctl_table arrays

2023-09-06 Thread Ingo Molnar


* Dave Hansen  wrote:

> On 9/6/23 03:03, Joel Granados via B4 Relay wrote:
> > This commit comes at the tail end of a greater effort to remove the
> > empty elements at the end of the ctl_table arrays (sentinels) which
> > will reduce the overall build time size of the kernel and run time
> > memory bloat by ~64 bytes per sentinel (further information Link :
> > https://lore.kernel.org/all/zo5yx5jfoggi%2f...@bombadil.infradead.org/)
> > 
> > Remove sentinel element from sld_sysctl and itmt_kern_table.
> 
> There's a *LOT* of content to read for a reviewer to figure out what's
> going on here between all the links.  I would have appreciated one more
> sentence here, maybe:
> 
>   This is now safe because the sysctl registration code
>   (register_sysctl()) implicitly uses ARRAY_SIZE() in addition
>   to checking for a sentinel.
> 
> That needs to be more prominent _somewhere_.  Maybe here, or maybe in
> the cover letter, but _somewhere_.
> 
> That said, feel free to add this to the two x86 patches:
> 
> Acked-by: Dave Hansen  # for x86

Absolutely needs to be in the title as well, something like:

   arch/x86: Remove now superfluous sentinel elem from ctl_table arrays

With that propagated into the whole series:

   Reviewed-by: Ingo Molnar 

Thanks,

Ingo
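Dave's suggested sentence is the key to why the sentinels can go: once the registration path knows the array length, iteration can stop at the bound as well as at an empty entry. A simplified sketch of that dual termination, using a hypothetical `struct table_entry` in place of `struct ctl_table` and an illustrative procname:

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical, heavily simplified stand-in for struct ctl_table. */
struct table_entry {
	const char *procname;
	int mode;
};

/* Stop at whichever comes first: the array bound or an empty sentinel. */
static size_t count_entries(const struct table_entry *t, size_t size)
{
	size_t n = 0;

	while (n < size && t[n].procname)
		n++;
	return n;
}

static const struct table_entry with_sentinel[] = {
	{ "example_knob", 0644 },
	{ 0 }	/* legacy empty sentinel */
};

static const struct table_entry without_sentinel[] = {
	{ "example_knob", 0644 },
};
```

Both tables report one entry when passed their `ARRAY_SIZE()`; the version without a sentinel simply drops the empty element, which is where the per-sentinel savings in the commit message come from.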


Re: [PATCH v2 2/5] fbdev: Replace fb_pgprotect() with fb_pgprot_device()

2023-09-06 Thread Arnd Bergmann
On Wed, Sep 6, 2023, at 10:35, Thomas Zimmermann wrote:
> Rename the fbdev mmap helper fb_pgprotect() to fb_pgprot_device().
> The helper sets VMA page-access flags for framebuffers in device I/O
> memory. The new name follows pgprot_device(), which does the same for
> arbitrary devices.
>
> Also clean up the helper's parameters and return value. Instead of
> the VMA instance, pass the individual parameters separately: existing
> page-access flags, the VMA's start and end addresses and the offset
> in the underlying device memory or file. Return the new page-access
> flags. These changes align fb_pgprot_device() more closely with pgprot_device().
>
> Signed-off-by: Thomas Zimmermann 

This makes sense as a cleanup, but I'm not sure the new naming is helpful.

The 'pgprot_device' permissions are based on Arm's memory attributes,
which have slightly different behavior for "device", "uncached" and
"writecombine" mappings. I think simply calling this one pgprot_fb()
or fb_pgprot() would be less confusing, since depending on the architecture
it appears to give either uncached or writecombine mappings but not
"device" on the architectures where this is different.

  Arnd


Re: [PATCH RFC] powerpc/rtas: Make it possible to disable sys_rtas

2023-09-06 Thread Nathan Lynch
Michal Suchanek  writes:

> Additional patch suggestion to go with the rtas devices:
>
> ---
>
> With most important rtas functions available through different
> interfaces the sys_rtas interface can be disabled completely.
>
> Do not remove it for now to make it possible to run older versions of
> userspace tools that don't support other interfaces.

Thanks. I hope making sys_rtas on/off-configurable will make sense
eventually, and I expect this series to get us closer to that. But to me
it seems too early and too coarse. A kernel built with RTAS_SYSCALL=n is
not something I'd want to support or run in production soon. It would
break too many known use cases, and likely some unknown ones as well.

It could be more useful in the near term to construct a configurable
list of RTAS functions that sys_rtas is allowed to expose.

Something like:

if PPC_RTAS

config RTAS_SYSCALL_ALLOWS_SET_INDICATOR
	bool "sys_rtas allows calling set-indicator"
	default y

config RTAS_SYSCALL_ALLOWS_GET_SENSOR_STATE
	bool "sys_rtas allows calling get-sensor-state"
	default y

config RTAS_SYSCALL_ALLOWS_GET_VPD
	bool "sys_rtas allows calling ibm,get-vpd"
	default y

... etc etc

endif

Distro kernels could configure their allowed set of calls according to
the capabilities of the user space components they ship, with the
expectation that they will be able to shrink that set as user space
adopts the preferred ABIs over time.

That's just a sketch of an idea though, and I'm not sure it needs to be
part of this series.
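On the kernel side, such options could feed a simple allowlist consulted before sys_rtas dispatches a call. A minimal sketch, assuming 0/1 macros standing in for `IS_ENABLED()` on the sketched Kconfig symbols (with get-vpd disabled to show a denial). The real syscall identifies functions by RTAS token rather than by name, so the string lookup here is a simplification:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption: these model IS_ENABLED(CONFIG_RTAS_SYSCALL_ALLOWS_*). */
#define ALLOW_SET_INDICATOR     1
#define ALLOW_GET_SENSOR_STATE  1
#define ALLOW_GET_VPD           0

struct rtas_allow {
	const char *name;
	bool allowed;
};

static const struct rtas_allow rtas_syscall_allowlist[] = {
	{ "set-indicator",    ALLOW_SET_INDICATOR },
	{ "get-sensor-state", ALLOW_GET_SENSOR_STATE },
	{ "ibm,get-vpd",      ALLOW_GET_VPD },
};

/* Deny by default: only listed and enabled functions may be called. */
static bool sys_rtas_may_call(const char *name)
{
	size_t n = sizeof(rtas_syscall_allowlist) / sizeof(rtas_syscall_allowlist[0]);

	for (size_t i = 0; i < n; i++)
		if (strcmp(rtas_syscall_allowlist[i].name, name) == 0)
			return rtas_syscall_allowlist[i].allowed;
	return false;
}
```

A distro that ships a user space component using a newer ABI for a given function could then turn the corresponding option off without breaking the remaining sys_rtas users.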


[RFC PATCH] powerpc: Make crashing cpu to be discovered first in kdump kernel.

2023-09-06 Thread Mahesh Salgaonkar
The kernel boot parameter 'nr_cpus=' allows one to specify the number of
possible cpus in the system. In the normal scenario the first cpu (cpu0)
that shows up is the boot cpu and hence it gets covered under the nr_cpus
limit.

But this assumption is broken in the kdump scenario, where the kdump kernel
can boot up on a non-zero boot cpu after a crash. The paca structure allocation
depends on the value of nr_cpus and is indexed using logical cpu ids. The cpu
discovery code brings up the cpus as they appear sequentially in the device
tree and assigns logical cpu ids starting from 0. This becomes an issue if
boot cpu id > nr_cpus. When this occurs it results in

In the past there were proposals to fix this by changing the cpu discovery
code to identify a non-zero boot cpu and map it to logical cpu 0. However,
the changes were very invasive, making the discovery code more complicated
and risky.

Considering that the non-zero boot cpu scenario is mostly specific to the
kdump kernel, limiting the changes to the panic/crash kexec path is probably
the best approach.

Hence the proposed change is to move, in the crash kexec path, the crashing
cpu's device node to the first position under the '/cpus' node, so that the
crashing cpu is discovered as part of the first core in the kdump kernel.

In order to accommodate the boot cpu for the case where boot_cpuid > nr_cpu_ids,
align nr_cpu_ids up to the number of SMT threads in early_init_dt_scan_cpus().
This allows the kdump kernel to work with nr_cpus=X, where X is aligned up to
a multiple of SMT threads per core.

Signed-off-by: Mahesh Salgaonkar 
---
 arch/powerpc/include/asm/kexec.h  |1 
 arch/powerpc/kernel/prom.c|   13 
 arch/powerpc/kexec/core_64.c  |  128 +
 arch/powerpc/kexec/file_load_64.c |2 -
 4 files changed, 143 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index a1ddba01e7d13..f5a6f4a1b8eb0 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -144,6 +144,7 @@ unsigned int kexec_extra_fdt_size_ppc64(struct kimage *image);
 int setup_new_fdt_ppc64(const struct kimage *image, void *fdt,
unsigned long initrd_load_addr,
unsigned long initrd_len, const char *cmdline);
+int add_node_props(void *fdt, int node_offset, const struct device_node *dn);
 #endif /* CONFIG_PPC64 */
 
 #endif /* CONFIG_KEXEC_FILE */
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 0b5878c3125b1..c2d4f55042d72 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -322,6 +322,9 @@ static void __init check_cpu_feature_properties(unsigned long node)
}
 }
 
+/* align addr on a size boundary - adjust address up */
+#define _ALIGN_UP(addr, size)   (((addr)+((size)-1))&(~((typeof(addr))(size)-1)))
+
 static int __init early_init_dt_scan_cpus(unsigned long node,
  const char *uname, int depth,
  void *data)
@@ -348,6 +351,16 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
 
nthreads = len / sizeof(int);
 
+   /*
+* Align nr_cpu_ids to the correct SMT value. This helps us allocate
+* pacas correctly to accommodate the boot_cpu != 0 scenario, e.g. in the kdump
+* kernel the boot cpu can be any cpu from 0 through nthreads - 1.
+*/
+   if (nr_cpu_ids % nthreads) {
+   nr_cpu_ids = _ALIGN_UP(nr_cpu_ids, nthreads);
+   pr_info("Aligned nr_cpus to SMT=%d, nr_cpu_ids = %d\n", nthreads, nr_cpu_ids);
+   }
+
/*
 * Now see if any of these threads match our boot cpu.
 * NOTE: This must match the parsing done in smp_setup_cpu_maps.
diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index a79e28c91e2be..168bef43e22c2 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -298,6 +299,119 @@ extern void kexec_sequence(void *newstack, unsigned long start,
   void (*clear_all)(void),
   bool copy_with_mmu_off) __noreturn;
 
+/*
+ * Move the crashing cpu's FDT node to be the first node under the '/cpus' node.
+ *
+ * - Get the FDT segment from the crash image segments.
+ * - Locate the crashing cpu's FDT subnode 'A' under the '/cpus' node.
+ * - Now locate the crashing cpu device node 'B' from of_root device tree.
+ * - Delete the crashing cpu FDT node 'A' from kexec FDT segment.
+ * - Insert the crashing cpu device node 'B' into kexec FDT segment as first
+ *   subnode under '/cpus' node.
+ */
+static void move_crashing_cpu(struct kimage *image)
+{
+   void *fdt, *ptr;
+   const char *pathp = NULL;
+   unsigned long mem;
+   const __be32 *intserv;
+   struct device_node *dn;
+   bool first_node = true;
+   
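The `_ALIGN_UP()` helper in the prom.c hunk rounds with a bit mask, which is only correct when `size` is a power of two; SMT4 and SMT8 cores satisfy that. A small sketch comparing the patch's mask formula with a division-based round-up (the helper names are illustrative; the kernel's own `ALIGN()` and `roundup()` macros cover the same two cases):

```c
/* Same formula as the patch's _ALIGN_UP(); only valid when size is a power of two. */
#define ALIGN_UP_POW2(x, size) (((x) + ((size) - 1)) & ~((__typeof__(x))(size) - 1))

/* Division-based round-up; correct for any non-zero size. */
static unsigned int roundup_to(unsigned int x, unsigned int size)
{
	return ((x + size - 1) / size) * size;
}
```

For example, booting the kdump kernel with nr_cpus=1 on an SMT8 core gives `ALIGN_UP_POW2(1u, 8u) == 8`, enough pacas for whichever thread of the first core the crashing cpu lands on. With a non-power-of-two size such as 3, the mask version returns 5 for an input of 5, which is not a multiple of 3, while `roundup_to(5, 3)` returns 6.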

[PATCH v2 3/5] arch/powerpc: Remove trailing whitespaces

2023-09-06 Thread Thomas Zimmermann
Fix coding style. No functional changes.

Signed-off-by: Thomas Zimmermann 
---
 arch/powerpc/include/asm/machdep.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 4f6e7d7ee388..933465ed4c43 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -10,7 +10,7 @@
 #include 
 
 struct pt_regs;
-struct pci_bus;
+struct pci_bus;
 struct device_node;
 struct iommu_table;
 struct rtc_time;
@@ -78,8 +78,8 @@ struct machdep_calls {
unsigned char   (*nvram_read_val)(int addr);
void(*nvram_write_val)(int addr, unsigned char val);
ssize_t (*nvram_write)(char *buf, size_t count, loff_t *index);
-   ssize_t (*nvram_read)(char *buf, size_t count, loff_t *index);  
-   ssize_t (*nvram_size)(void);
+   ssize_t (*nvram_read)(char *buf, size_t count, loff_t *index);
+   ssize_t (*nvram_size)(void);
void(*nvram_sync)(void);
 
/* Exception handlers */
@@ -102,9 +102,9 @@ struct machdep_calls {
 */
long(*feature_call)(unsigned int feature, ...);
 
-   /* Get legacy PCI/IDE interrupt mapping */ 
+   /* Get legacy PCI/IDE interrupt mapping */
int (*pci_get_legacy_ide_irq)(struct pci_dev *dev, int channel);
-   
+
/* Get access protection for /dev/mem */
pgprot_t(*phys_mem_access_prot)(struct file *file,
unsigned long pfn,
-- 
2.42.0



[PATCH v2 2/5] fbdev: Replace fb_pgprotect() with fb_pgprot_device()

2023-09-06 Thread Thomas Zimmermann
Rename the fbdev mmap helper fb_pgprotect() to fb_pgprot_device().
The helper sets VMA page-access flags for framebuffers in device I/O
memory. The new name follows pgprot_device(), which does the same for
arbitrary devices.

Also clean up the helper's parameters and return value. Instead of
the VMA instance, pass the individual parameters separately: existing
page-access flags, the VMA's start and end addresses and the offset
in the underlying device memory or file. Return the new page-access
flags. These changes align fb_pgprot_device() more closely with pgprot_device().

Signed-off-by: Thomas Zimmermann 
---
 arch/ia64/include/asm/fb.h   | 15 +++
 arch/m68k/include/asm/fb.h   | 19 ++-
 arch/mips/include/asm/fb.h   | 11 +--
 arch/powerpc/include/asm/fb.h| 13 +
 arch/sparc/include/asm/fb.h  | 15 +--
 arch/x86/include/asm/fb.h| 10 ++
 arch/x86/video/fbdev.c   | 15 ---
 drivers/video/fbdev/core/fb_chrdev.c |  3 ++-
 include/asm-generic/fb.h | 12 ++--
 9 files changed, 58 insertions(+), 55 deletions(-)

diff --git a/arch/ia64/include/asm/fb.h b/arch/ia64/include/asm/fb.h
index 1717b26fd423..2fbad4a9fc15 100644
--- a/arch/ia64/include/asm/fb.h
+++ b/arch/ia64/include/asm/fb.h
@@ -8,17 +8,16 @@
 
 #include 
 
-struct file;
-
-static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
-   unsigned long off)
+static inline pgprot_t fb_pgprot_device(pgprot_t prot,
+   unsigned long vm_start, unsigned long vm_end,
+   unsigned long offset)
 {
-   if (efi_range_is_wc(vma->vm_start, vma->vm_end - vma->vm_start))
-   vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
+   if (efi_range_is_wc(vm_start, vm_end - vm_start))
+   return pgprot_writecombine(prot);
else
-   vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+   return pgprot_noncached(prot);
 }
-#define fb_pgprotect fb_pgprotect
+#define fb_pgprot_device fb_pgprot_device
 
 static inline void fb_memcpy_fromio(void *to, const volatile void __iomem *from, size_t n)
 {
diff --git a/arch/m68k/include/asm/fb.h b/arch/m68k/include/asm/fb.h
index 24273fc7ad91..4acdf5b62871 100644
--- a/arch/m68k/include/asm/fb.h
+++ b/arch/m68k/include/asm/fb.h
@@ -5,26 +5,27 @@
 #include 
 #include 
 
-struct file;
-
-static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
-   unsigned long off)
+static inline pgprot_t fb_pgprot_device(pgprot_t prot,
+   unsigned long vm_start, unsigned long vm_end,
+   unsigned long offset)
 {
 #ifdef CONFIG_MMU
 #ifdef CONFIG_SUN3
-   pgprot_val(vma->vm_page_prot) |= SUN3_PAGE_NOCACHE;
+   pgprot_val(prot) |= SUN3_PAGE_NOCACHE;
 #else
if (CPU_IS_020_OR_030)
-   pgprot_val(vma->vm_page_prot) |= _PAGE_NOCACHE030;
+   pgprot_val(prot) |= _PAGE_NOCACHE030;
if (CPU_IS_040_OR_060) {
-   pgprot_val(vma->vm_page_prot) &= _CACHEMASK040;
+   pgprot_val(prot) &= _CACHEMASK040;
/* Use no-cache mode, serialized */
-   pgprot_val(vma->vm_page_prot) |= _PAGE_NOCACHE_S;
+   pgprot_val(prot) |= _PAGE_NOCACHE_S;
}
 #endif /* CONFIG_SUN3 */
 #endif /* CONFIG_MMU */
+
+   return prot;
 }
-#define fb_pgprotect fb_pgprotect
+#define fb_pgprot_device fb_pgprot_device
 
 #include 
 
diff --git a/arch/mips/include/asm/fb.h b/arch/mips/include/asm/fb.h
index 18b7226403ba..98e63d14a71f 100644
--- a/arch/mips/include/asm/fb.h
+++ b/arch/mips/include/asm/fb.h
@@ -3,14 +3,13 @@
 
 #include 
 
-struct file;
-
-static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
-   unsigned long off)
+static inline pgprot_t fb_pgprot_device(pgprot_t prot,
+   unsigned long vm_start, unsigned long vm_end,
+   unsigned long offset)
 {
-   vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+   return pgprot_noncached(prot);
 }
-#define fb_pgprotect fb_pgprotect
+#define fb_pgprot_device fb_pgprot_device
 
 /*
  * MIPS doesn't define __raw_ I/O macros, so the helpers
diff --git a/arch/powerpc/include/asm/fb.h b/arch/powerpc/include/asm/fb.h
index 61e3b8806db6..3c7486323178 100644
--- a/arch/powerpc/include/asm/fb.h
+++ b/arch/powerpc/include/asm/fb.h
@@ -2,23 +2,20 @@
 #ifndef _ASM_FB_H_
 #define _ASM_FB_H_
 
-#include 
-
 #include 
 
-static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
-   unsigned long off)
+static inline pgprot_t fb_pgprot_device(pgprot_t prot,
+   
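The interface change in this patch turns an in-place update of `vma->vm_page_prot` into a function that takes the old flags by value and returns the new ones. A simplified sketch of the two calling conventions, using a hypothetical `pgprot_t` wrapper, VMA struct, and cache bit (not the kernel's definitions), with a MIPS-style "always uncached" policy:

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types and cache attribute bit. */
typedef struct { uint64_t val; } pgprot_t;
#define SKETCH_NOCACHE (1ULL << 4)

struct sketch_vma {
	unsigned long vm_start, vm_end;
	pgprot_t vm_page_prot;
};

/* Old style: mutate the caller's VMA in place. */
static void old_fb_pgprotect(struct sketch_vma *vma, unsigned long off)
{
	(void)off;
	vma->vm_page_prot.val |= SKETCH_NOCACHE;
}

/* New style: take the individual values and return the new page-access flags. */
static pgprot_t new_fb_pgprot(pgprot_t prot, unsigned long vm_start,
			      unsigned long vm_end, unsigned long offset)
{
	(void)vm_start;
	(void)vm_end;
	(void)offset;
	prot.val |= SKETCH_NOCACHE;
	return prot;
}
```

A caller that still owns a VMA would write `vma->vm_page_prot = new_fb_pgprot(vma->vm_page_prot, vma->vm_start, vma->vm_end, off);`, while code without a VMA can call the helper directly.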

[PATCH v2 1/5] fbdev: Avoid file argument in fb_pgprotect()

2023-09-06 Thread Thomas Zimmermann
Only PowerPC's fb_pgprotect() needs the file argument, although
the implementation does not use it. Pass NULL to the internal
helper in preparation for further updates. A later patch will remove
the file parameter from fb_pgprotect().

While at it, replace the shift operation with PHYS_PFN().

Suggested-by: Christophe Leroy 
Signed-off-by: Thomas Zimmermann 
---
 arch/powerpc/include/asm/fb.h | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/fb.h b/arch/powerpc/include/asm/fb.h
index 5f1a2e5f7654..61e3b8806db6 100644
--- a/arch/powerpc/include/asm/fb.h
+++ b/arch/powerpc/include/asm/fb.h
@@ -9,7 +9,12 @@
 static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
unsigned long off)
 {
-   vma->vm_page_prot = phys_mem_access_prot(file, off >> PAGE_SHIFT,
+   /*
+* PowerPC's implementation of phys_mem_access_prot() does
+* not use the file argument. Set it to NULL in preparation
+* of later updates to the interface.
+*/
+   vma->vm_page_prot = phys_mem_access_prot(NULL, PHYS_PFN(off),
 vma->vm_end - vma->vm_start,
 vma->vm_page_prot);
 }
-- 
2.42.0
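`PHYS_PFN()` expresses the same conversion as the open-coded `off >> PAGE_SHIFT` it replaces: a byte offset becomes a page frame number. A standalone sketch, assuming 4 KiB pages (`PAGE_SHIFT` of 12) purely for illustration, since powerpc kernels are often built with 64 KiB pages:

```c
#define PAGE_SHIFT 12	/* assumption: 4 KiB pages for this sketch */

/* Modeled on the kernel's PHYS_PFN() helper: byte address to page frame number. */
#define PHYS_PFN(x) ((unsigned long)((x) >> PAGE_SHIFT))
```

Any offset inside a page maps to that page's frame number, so 0x5000 and 0x5fff both yield 5 here.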



[PATCH v2 0/5] ppc, fbdev: Clean up fbdev mmap helper

2023-09-06 Thread Thomas Zimmermann
Clean up and rename fb_pgprotect() to work without struct file. Then
refactor the implementation for PowerPC. This change has been discussed
at [1] in the context of refactoring fbdev's mmap code.

The first two patches update fbdev and replace fbdev's fb_pgprotect()
with fb_pgprot_device() on all architectures. The new helper's
streamlined interface enables more refactoring within fbdev's mmap
implementation.

Patches 3 to 5 adapt PowerPC's internal interfaces to provide
phys_mem_access_prot() that works without struct file. Neither the
architecture code nor the fbdev helpers need the parameter.

v2:
* reorder patches to simplify merging (Michael)

[1] https://lore.kernel.org/linuxppc-dev/5501ba80-bdb0-6344-16b0-0466a950f...@suse.com/

Thomas Zimmermann (5):
  fbdev: Avoid file argument in fb_pgprotect()
  fbdev: Replace fb_pgprotect() with fb_pgprot_device()
  arch/powerpc: Remove trailing whitespaces
  arch/powerpc: Remove file parameter from phys_mem_access_prot code
  arch/powerpc: Call internal __phys_mem_access_prot() in fbdev code

 arch/ia64/include/asm/fb.h| 15 +++
 arch/m68k/include/asm/fb.h| 19 ++-
 arch/mips/include/asm/fb.h| 11 +--
 arch/powerpc/include/asm/book3s/pgtable.h | 10 --
 arch/powerpc/include/asm/fb.h | 13 +
 arch/powerpc/include/asm/machdep.h| 13 ++---
 arch/powerpc/include/asm/nohash/pgtable.h | 10 --
 arch/powerpc/include/asm/pci.h|  4 +---
 arch/powerpc/kernel/pci-common.c  |  3 +--
 arch/powerpc/mm/mem.c |  8 
 arch/sparc/include/asm/fb.h   | 15 +--
 arch/x86/include/asm/fb.h | 10 ++
 arch/x86/video/fbdev.c| 15 ---
 drivers/video/fbdev/core/fb_chrdev.c  |  3 ++-
 include/asm-generic/fb.h  | 12 ++--
 15 files changed, 86 insertions(+), 75 deletions(-)

-- 
2.42.0



[PATCH v2 5/5] arch/powerpc: Call internal __phys_mem_access_prot() in fbdev code

2023-09-06 Thread Thomas Zimmermann
Call __phys_mem_access_prot() from the fbdev mmap helper
fb_pgprot_device(). This avoids having to pass NULL as the
file argument.

Signed-off-by: Thomas Zimmermann 
---
 arch/powerpc/include/asm/fb.h | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/fb.h b/arch/powerpc/include/asm/fb.h
index 3c7486323178..8e6a7fc4ae86 100644
--- a/arch/powerpc/include/asm/fb.h
+++ b/arch/powerpc/include/asm/fb.h
@@ -8,12 +8,7 @@ static inline pgprot_t fb_pgprot_device(pgprot_t prot,
unsigned long vm_start, unsigned long vm_end,
unsigned long offset)
 {
-   /*
-* PowerPC's implementation of phys_mem_access_prot() does
-* not use the file argument. Set it to NULL in preparation
-* of later updates to the interface.
-*/
-   return phys_mem_access_prot(NULL, PHYS_PFN(offset), vm_end - vm_start, prot);
+   return __phys_mem_access_prot(PHYS_PFN(offset), vm_end - vm_start, prot);
 }
 #define fb_pgprot_device fb_pgprot_device
 
-- 
2.42.0



[PATCH v2 4/5] arch/powerpc: Remove file parameter from phys_mem_access_prot code

2023-09-06 Thread Thomas Zimmermann
Remove 'file' parameter from struct machdep_calls.phys_mem_access_prot
and its implementation in pci_phys_mem_access_prot(). The file is not
used on PowerPC. By removing it, a later patch can simplify fbdev's
mmap code, which uses phys_mem_access_prot() on PowerPC.

Signed-off-by: Thomas Zimmermann 
---
 arch/powerpc/include/asm/book3s/pgtable.h | 10 --
 arch/powerpc/include/asm/machdep.h|  3 +--
 arch/powerpc/include/asm/nohash/pgtable.h | 10 --
 arch/powerpc/include/asm/pci.h|  4 +---
 arch/powerpc/kernel/pci-common.c  |  3 +--
 arch/powerpc/mm/mem.c |  8 
 6 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/pgtable.h b/arch/powerpc/include/asm/book3s/pgtable.h
index d18b748ea3ae..84e36a572641 100644
--- a/arch/powerpc/include/asm/book3s/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/pgtable.h
@@ -20,9 +20,15 @@ extern void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
 extern int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
 pte_t *ptep, pte_t entry, int dirty);
 
+extern pgprot_t __phys_mem_access_prot(unsigned long pfn, unsigned long size,
+  pgprot_t vma_prot);
+
 struct file;
-extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
-unsigned long size, pgprot_t vma_prot);
+static inline pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
+   unsigned long size, pgprot_t vma_prot)
+{
+   return __phys_mem_access_prot(pfn, size, vma_prot);
+}
 #define __HAVE_PHYS_MEM_ACCESS_PROT
 
 void __update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep);
diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 933465ed4c43..d31a5ec1550d 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -106,8 +106,7 @@ struct machdep_calls {
int (*pci_get_legacy_ide_irq)(struct pci_dev *dev, int channel);
 
/* Get access protection for /dev/mem */
-   pgprot_t(*phys_mem_access_prot)(struct file *file,
-   unsigned long pfn,
+   pgprot_t(*phys_mem_access_prot)(unsigned long pfn,
unsigned long size,
pgprot_t vma_prot);
 
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
index a6cb6f92..90366b0b3ad9 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -246,9 +246,15 @@ extern int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long addre
 
 #define pgprot_writecombine pgprot_noncached_wc
 
+extern pgprot_t __phys_mem_access_prot(unsigned long pfn, unsigned long size,
+  pgprot_t vma_prot);
+
 struct file;
-extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
-unsigned long size, pgprot_t vma_prot);
+static inline pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
+   unsigned long size, pgprot_t vma_prot)
+{
+   return __phys_mem_access_prot(pfn, size, vma_prot);
+}
 #define __HAVE_PHYS_MEM_ACCESS_PROT
 
 #ifdef CONFIG_HUGETLB_PAGE
diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
index 289f1ec85bc5..34ed4d51c546 100644
--- a/arch/powerpc/include/asm/pci.h
+++ b/arch/powerpc/include/asm/pci.h
@@ -104,9 +104,7 @@ extern void of_scan_pci_bridge(struct pci_dev *dev);
 extern void of_scan_bus(struct device_node *node, struct pci_bus *bus);
 extern void of_rescan_bus(struct device_node *node, struct pci_bus *bus);
 
-struct file;
-extern pgprot_t pci_phys_mem_access_prot(struct file *file,
-unsigned long pfn,
+extern pgprot_t pci_phys_mem_access_prot(unsigned long pfn,
 unsigned long size,
 pgprot_t prot);
 
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index e88d7c9feeec..73f12a17e572 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -521,8 +521,7 @@ int pci_iobar_pfn(struct pci_dev *pdev, int bar, struct vm_area_struct *vma)
  * PCI device, it tries to find the PCI device first and calls the
  * above routine
  */
-pgprot_t pci_phys_mem_access_prot(struct file *file,
- unsigned long pfn,
+pgprot_t pci_phys_mem_access_prot(unsigned long pfn,
  unsigned long size,
  pgprot_t prot)
 {
diff --git 

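The powerpc hunks above keep the generic `phys_mem_access_prot(struct file *, ...)` prototype as a `static inline` wrapper that ignores `file` and forwards to a narrower `__phys_mem_access_prot()`. Below is a minimal userspace sketch of that forwarding pattern. The one-field `pgprot_t` is a hypothetical stand-in, and the pass-through return value is a placeholder, not the real powerpc protection policy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: the kernel's pgprot_t is architecture-specific. */
typedef struct {
	unsigned long pgprot;
} pgprot_t;

/* Only used through a pointer, so an incomplete type is sufficient. */
struct file;

/* Narrow helper: takes only the arguments it actually uses. */
static pgprot_t __phys_mem_access_prot(unsigned long pfn, unsigned long size,
				       pgprot_t vma_prot)
{
	(void)pfn;
	assert(size > 0);	/* sketch precondition: a non-empty mapping */
	return vma_prot;	/* placeholder policy: pass the protection through */
}

/* Generic prototype kept for existing callers; the file argument is dropped. */
static inline pgprot_t phys_mem_access_prot(struct file *file,
					    unsigned long pfn,
					    unsigned long size,
					    pgprot_t vma_prot)
{
	(void)file;
	return __phys_mem_access_prot(pfn, size, vma_prot);
}
```

Because the wrapper is `static inline`, the unused argument should cost nothing once the compiler inlines the call, and callers of the generic prototype keep compiling unchanged.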
Re: [PATCH 3/8] arch/x86: Remove sentinel elem from ctl_table arrays

2023-09-06 Thread Dave Hansen
On 9/6/23 03:03, Joel Granados via B4 Relay wrote:
> This commit comes at the tail end of a greater effort to remove the
> empty elements at the end of the ctl_table arrays (sentinels), which
> will reduce the overall build-time size of the kernel and the runtime
> memory bloat by ~64 bytes per sentinel (further information:
> https://lore.kernel.org/all/zo5yx5jfoggi%2f...@bombadil.infradead.org/)
> 
> Remove sentinel element from sld_sysctls and itmt_kern_table.

There's a *LOT* of content to read for a reviewer to figure out what's
going on here between all the links.  I would have appreciated one more
sentence here, maybe:

This is now safe because the sysctl registration code
(register_sysctl()) implicitly uses ARRAY_SIZE() in addition
to checking for a sentinel.

That needs to be more prominent _somewhere_.  Maybe here, or maybe in
the cover letter, but _somewhere_.
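The point about register_sysctl() can be illustrated with a small userspace sketch. It assumes, as the suggested sentence says, that `register_sysctl()` computes `ARRAY_SIZE()` of the array it is given and passes that count along. The trimmed `struct ctl_table` and the helper `register_sysctl_sz()` here are modeled on that description for illustration; they are not copied from the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed, hypothetical stand-in for the kernel's struct ctl_table. */
struct ctl_table {
	const char *procname;
	int mode;
};

#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Sketch of a sized registration: reports how many entries it was told about. */
static size_t register_sysctl_sz(const char *path, const struct ctl_table *table,
				 size_t table_size)
{
	assert(path != NULL && table != NULL);
	return table_size;
}

/*
 * Because this is a macro, ARRAY_SIZE() is evaluated on the array at the
 * call site, where its length is still known to the compiler.
 */
#define register_sysctl(path, table) \
	register_sysctl_sz((path), (table), ARRAY_SIZE(table))

static struct ctl_table with_sentinel[] = {
	{ .procname = "interval", .mode = 0644 },
	{ }	/* old-style empty terminator */
};

static struct ctl_table without_sentinel[] = {
	{ .procname = "interval", .mode = 0644 },
};
```

Note that a sentinel still counts toward `ARRAY_SIZE()`, so during the transition the walker must keep honoring an empty `procname` as well. Passing a plain pointer instead of an array would defeat the size calculation, so dynamically allocated tables need an explicit count.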

That said, feel free to add this to the two x86 patches:

Acked-by: Dave Hansen  # for x86


Re: [PATCH v2 1/3] perf vendor events: Update JSON/events for power10 platform

2023-09-06 Thread Arnaldo Carvalho de Melo
On Tue, Sep 05, 2023 at 05:10:37PM +0530, Kajol Jain wrote:
> Update JSON/Events list with data-source events for power10 platform.

Thanks, applied the series.

- Arnaldo

 
> Signed-off-by: Kajol Jain 
> ---
>  .../arch/powerpc/power10/datasource.json  | 1282 +
>  .../arch/powerpc/power10/others.json  |   10 -
>  .../arch/powerpc/power10/translation.json |5 -
>  3 files changed, 1282 insertions(+), 15 deletions(-)
>  create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/datasource.json
> 
> ---
> Changelog:
> v1->v2
> - Split the first patch, as it bounced from the
>   linux-perf-us...@vger.kernel.org mailing list with a
>   'Message too long (>10 chars)' error.
> ---
> diff --git a/tools/perf/pmu-events/arch/powerpc/power10/datasource.json b/tools/perf/pmu-events/arch/powerpc/power10/datasource.json
> new file mode 100644
> index ..12cfb9785433
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/powerpc/power10/datasource.json
> @@ -0,0 +1,1282 @@
> +[
> +  {
> +"EventCode": "0x200FE",
> +"EventName": "PM_DATA_FROM_L2MISS",
> +"BriefDescription": "The processor's L1 data cache was reloaded from a source beyond the local core's L2 due to a demand miss."
> +  },
> +  {
> +"EventCode": "0x300FE",
> +"EventName": "PM_DATA_FROM_L3MISS",
> +"BriefDescription": "The processor's L1 data cache was reloaded from beyond the local core's L3 due to a demand miss."
> +  },
> +  {
> +"EventCode": "0x400FE",
> +"EventName": "PM_DATA_FROM_MEMORY",
> +"BriefDescription": "The processor's data cache was reloaded from local, remote, or distant memory due to a demand miss."
> +  },
> +  {
> +"EventCode": "0x0003C040",
> +"EventName": "PM_INST_FROM_L2",
> +"BriefDescription": "The processor's instruction cache was reloaded from the local core's L2 due to a demand miss."
> +  },
> +  {
> +"EventCode": "0x00034000C040",
> +"EventName": "PM_DATA_FROM_L2",
> +"BriefDescription": "The processor's L1 data cache was reloaded from the local core's L2 due to a demand miss."
> +  },
> +  {
> +"EventCode": "0x00030010C040",
> +"EventName": "PM_INST_FROM_L2_ALL",
> +"BriefDescription": "The processor's instruction cache was reloaded from the local core's L2 due to a demand miss or prefetch reload."
> +  },
> +  {
> +"EventCode": "0x00034020C040",
> +"EventName": "PM_DATA_FROM_L2_ALL",
> +"BriefDescription": "The processor's L1 data cache was reloaded from the local core's L2 due to a demand miss or prefetch reload."
> +  },
> +  {
> +"EventCode": "0x003FC040",
> +"EventName": "PM_INST_FROM_L1MISS",
> +"BriefDescription": "The processor's instruction cache was reloaded from a source beyond the local core's L1 due to a demand miss."
> +  },
> +  {
> +"EventCode": "0x003F4000C040",
> +"EventName": "PM_DATA_FROM_L1MISS",
> +"BriefDescription": "The processor's L1 data cache was reloaded from a source beyond the local core's L1 due to a demand miss."
> +  },
> +  {
> +"EventCode": "0x003F0010C040",
> +"EventName": "PM_INST_FROM_L1MISS_ALL",
> +"BriefDescription": "The processor's instruction cache was reloaded from a source beyond the local core's L1 due to a demand miss or prefetch reload."
> +  },
> +  {
> +"EventCode": "0x003F4020C040",
> +"EventName": "PM_DATA_FROM_L1MISS_ALL",
> +"BriefDescription": "The processor's L1 data cache was reloaded from a source beyond the local core's L1 due to a demand miss or prefetch reload."
> +  },
> +  {
> +"EventCode": "0x4000C040",
> +"EventName": "PM_DATA_FROM_L2_NO_CONFLICT",
> +"BriefDescription": "The processor's L1 data cache was reloaded without dispatch conflicts with data NOT in the MEPF state from the local core's L2 due to a demand miss."
> +  },
> +  {
> +"EventCode": "0x4020C040",
> +"EventName": "PM_DATA_FROM_L2_NO_CONFLICT_ALL",
> +"BriefDescription": "The processor's L1 data cache was reloaded without dispatch conflicts with data NOT in the MEPF state from the local core's L2 due to a demand miss or prefetch reload."
> +  },
> +  {
> +"EventCode": "0x00404000C040",
> +"EventName": "PM_DATA_FROM_L2_MEPF",
> +"BriefDescription": "The processor's L1 data cache was reloaded with data in the MEPF state without dispatch conflicts from the local core's L2 due to a demand miss."
> +  },
> +  {
> +"EventCode": "0x00404020C040",
> +"EventName": "PM_DATA_FROM_L2_MEPF_ALL",
> +"BriefDescription": "The processor's L1 data cache was reloaded with data in the MEPF state without dispatch conflicts from the local core's L2 due to a demand miss or prefetch reload."
> +  },
> +  {
> +"EventCode": "0x00804000C040",
> +"EventName": "PM_DATA_FROM_L2_LDHITST_CONFLICT",
> +"BriefDescription": "The processor's L1 data cache was reloaded 

Re: [PATCH v2 2/3] perf vendor events: Update JSON/events for power10 platform

2023-09-06 Thread Arnaldo Carvalho de Melo
On Tue, Sep 05, 2023 at 05:10:38PM +0530, Kajol Jain wrote:
> Update JSON/Events list with additional data-source events
> for power10 platform.

I changed the cset title to:

"perf vendor events power10: Add extra data-source events"

The original title was exactly the same as the first patch's, so in
'git log --oneline' output it would look like a straight dup.

Please try to provide descriptive subjects.

- Arnaldo
 
> Signed-off-by: Kajol Jain 
> ---
>  .../arch/powerpc/power10/datasource.json  | 505 ++
>  1 file changed, 505 insertions(+)
> 
> diff --git a/tools/perf/pmu-events/arch/powerpc/power10/datasource.json b/tools/perf/pmu-events/arch/powerpc/power10/datasource.json
> index 12cfb9785433..6b0356f2d301 100644
> --- a/tools/perf/pmu-events/arch/powerpc/power10/datasource.json
> +++ b/tools/perf/pmu-events/arch/powerpc/power10/datasource.json
> @@ -1278,5 +1278,510 @@
>  "EventCode": "0x0A424000C142",
>  "EventName": "PM_MRK_DATA_FROM_NON_REGENT_L2L3_MOD",
>  "BriefDescription": "The processor's L1 data cache was reloaded with a line in the M (exclusive) state from another core's L2 or L3 on the same chip in a different regent due to a demand miss for a marked instruction."
> +  },
> +  {
> +"EventCode": "0x0A424020C142",
> +"EventName": "PM_MRK_DATA_FROM_NON_REGENT_L2L3_MOD_ALL",
> +"BriefDescription": "The processor's L1 data cache was reloaded with a line in the M (exclusive) state from another core's L2 or L3 on the same chip in a different regent due to a demand miss or prefetch reload for a marked instruction."
> +  },
> +  {
> +"EventCode": "0x0A03C142",
> +"EventName": "PM_MRK_INST_FROM_NON_REGENT_L2L3",
> +"BriefDescription": "The processor's instruction cache was reloaded from another core's L2 or L3 on the same chip in a different regent due to a demand miss for a marked instruction."
> +  },
> +  {
> +"EventCode": "0x0A034000C142",
> +"EventName": "PM_MRK_DATA_FROM_NON_REGENT_L2L3",
> +"BriefDescription": "The processor's L1 data cache was reloaded from another core's L2 or L3 on the same chip in a different regent due to a demand miss for a marked instruction."
> +  },
> +  {
> +"EventCode": "0x0A030010C142",
> +"EventName": "PM_MRK_INST_FROM_NON_REGENT_L2L3_ALL",
> +"BriefDescription": "The processor's instruction cache was reloaded from another core's L2 or L3 on the same chip in a different regent due to a demand miss or prefetch reload for a marked instruction."
> +  },
> +  {
> +"EventCode": "0x0A034020C142",
> +"EventName": "PM_MRK_DATA_FROM_NON_REGENT_L2L3_ALL",
> +"BriefDescription": "The processor's L1 data cache was reloaded from another core's L2 or L3 on the same chip in a different regent due to a demand miss or prefetch reload for a marked instruction."
> +  },
> +  {
> +"EventCode": "0x0941C142",
> +"EventName": "PM_MRK_INST_FROM_LMEM",
> +"BriefDescription": "The processor's instruction cache was reloaded from the local chip's memory due to a demand miss for a marked instruction."
> +  },
> +  {
> +"EventCode": "0x09404000C142",
> +"EventName": "PM_MRK_DATA_FROM_LMEM",
> +"BriefDescription": "The processor's L1 data cache was reloaded from the local chip's memory due to a demand miss for a marked instruction."
> +  },
> +  {
> +"EventCode": "0x09410010C142",
> +"EventName": "PM_MRK_INST_FROM_LMEM_ALL",
> +"BriefDescription": "The processor's instruction cache was reloaded from the local chip's memory due to a demand miss or prefetch reload for a marked instruction."
> +  },
> +  {
> +"EventCode": "0x09404020C142",
> +"EventName": "PM_MRK_DATA_FROM_LMEM_ALL",
> +"BriefDescription": "The processor's L1 data cache was reloaded from the local chip's memory due to a demand miss or prefetch reload for a marked instruction."
> +  },
> +  {
> +"EventCode": "0x09804000C142",
> +"EventName": "PM_MRK_DATA_FROM_L_OC_CACHE",
> +"BriefDescription": "The processor's L1 data cache was reloaded from the local chip's OpenCAPI cache due to a demand miss for a marked instruction."
> +  },
> +  {
> +"EventCode": "0x09804020C142",
> +"EventName": "PM_MRK_DATA_FROM_L_OC_CACHE_ALL",
> +"BriefDescription": "The processor's L1 data cache was reloaded from the local chip's OpenCAPI cache due to a demand miss or prefetch reload for a marked instruction."
> +  },
> +  {
> +"EventCode": "0x09C04000C142",
> +"EventName": "PM_MRK_DATA_FROM_L_OC_MEM",
> +"BriefDescription": "The processor's L1 data cache was reloaded from the local chip's OpenCAPI memory due to a demand miss for a marked instruction."
> +  },
> +  {
> +"EventCode": "0x09C04020C142",
> +"EventName": "PM_MRK_DATA_FROM_L_OC_MEM_ALL",
> +"BriefDescription": "The processor's L1 data cache was reloaded from the local chip's

Re: [PATCH v2 1/3] perf vendor events: Update JSON/events for power10 platform

2023-09-06 Thread Arnaldo Carvalho de Melo
On Tue, Sep 05, 2023 at 05:10:37PM +0530, Kajol Jain wrote:
> Update JSON/Events list with data-source events for power10 platform.

Next time could you please provide some pointer to the document these
metrics came from, if it is available online?

- Arnaldo
 
> Signed-off-by: Kajol Jain 
> ---
>  .../arch/powerpc/power10/datasource.json  | 1282 +
>  .../arch/powerpc/power10/others.json  |   10 -
>  .../arch/powerpc/power10/translation.json |5 -
>  3 files changed, 1282 insertions(+), 15 deletions(-)
>  create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/datasource.json
> 
> ---
> Changelog:
> v1->v2
> - Split the first patch, as it bounced from the
>   linux-perf-us...@vger.kernel.org mailing list with a
>   'Message too long (>10 chars)' error.
> ---
> diff --git a/tools/perf/pmu-events/arch/powerpc/power10/datasource.json b/tools/perf/pmu-events/arch/powerpc/power10/datasource.json
> new file mode 100644
> index ..12cfb9785433
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/powerpc/power10/datasource.json
> @@ -0,0 +1,1282 @@
> +[
> +  {
> +"EventCode": "0x200FE",
> +"EventName": "PM_DATA_FROM_L2MISS",
> +"BriefDescription": "The processor's L1 data cache was reloaded from a source beyond the local core's L2 due to a demand miss."
> +  },
> +  {
> +"EventCode": "0x300FE",
> +"EventName": "PM_DATA_FROM_L3MISS",
> +"BriefDescription": "The processor's L1 data cache was reloaded from beyond the local core's L3 due to a demand miss."
> +  },
> +  {
> +"EventCode": "0x400FE",
> +"EventName": "PM_DATA_FROM_MEMORY",
> +"BriefDescription": "The processor's data cache was reloaded from local, remote, or distant memory due to a demand miss."
> +  },
> +  {
> +"EventCode": "0x0003C040",
> +"EventName": "PM_INST_FROM_L2",
> +"BriefDescription": "The processor's instruction cache was reloaded from the local core's L2 due to a demand miss."
> +  },
> +  {
> +"EventCode": "0x00034000C040",
> +"EventName": "PM_DATA_FROM_L2",
> +"BriefDescription": "The processor's L1 data cache was reloaded from the local core's L2 due to a demand miss."
> +  },
> +  {
> +"EventCode": "0x00030010C040",
> +"EventName": "PM_INST_FROM_L2_ALL",
> +"BriefDescription": "The processor's instruction cache was reloaded from the local core's L2 due to a demand miss or prefetch reload."
> +  },
> +  {
> +"EventCode": "0x00034020C040",
> +"EventName": "PM_DATA_FROM_L2_ALL",
> +"BriefDescription": "The processor's L1 data cache was reloaded from the local core's L2 due to a demand miss or prefetch reload."
> +  },
> +  {
> +"EventCode": "0x003FC040",
> +"EventName": "PM_INST_FROM_L1MISS",
> +"BriefDescription": "The processor's instruction cache was reloaded from a source beyond the local core's L1 due to a demand miss."
> +  },
> +  {
> +"EventCode": "0x003F4000C040",
> +"EventName": "PM_DATA_FROM_L1MISS",
> +"BriefDescription": "The processor's L1 data cache was reloaded from a source beyond the local core's L1 due to a demand miss."
> +  },
> +  {
> +"EventCode": "0x003F0010C040",
> +"EventName": "PM_INST_FROM_L1MISS_ALL",
> +"BriefDescription": "The processor's instruction cache was reloaded from a source beyond the local core's L1 due to a demand miss or prefetch reload."
> +  },
> +  {
> +"EventCode": "0x003F4020C040",
> +"EventName": "PM_DATA_FROM_L1MISS_ALL",
> +"BriefDescription": "The processor's L1 data cache was reloaded from a source beyond the local core's L1 due to a demand miss or prefetch reload."
> +  },
> +  {
> +"EventCode": "0x4000C040",
> +"EventName": "PM_DATA_FROM_L2_NO_CONFLICT",
> +"BriefDescription": "The processor's L1 data cache was reloaded without dispatch conflicts with data NOT in the MEPF state from the local core's L2 due to a demand miss."
> +  },
> +  {
> +"EventCode": "0x4020C040",
> +"EventName": "PM_DATA_FROM_L2_NO_CONFLICT_ALL",
> +"BriefDescription": "The processor's L1 data cache was reloaded without dispatch conflicts with data NOT in the MEPF state from the local core's L2 due to a demand miss or prefetch reload."
> +  },
> +  {
> +"EventCode": "0x00404000C040",
> +"EventName": "PM_DATA_FROM_L2_MEPF",
> +"BriefDescription": "The processor's L1 data cache was reloaded with data in the MEPF state without dispatch conflicts from the local core's L2 due to a demand miss."
> +  },
> +  {
> +"EventCode": "0x00404020C040",
> +"EventName": "PM_DATA_FROM_L2_MEPF_ALL",
> +"BriefDescription": "The processor's L1 data cache was reloaded with data in the MEPF state without dispatch conflicts from the local core's L2 due to a demand miss or prefetch reload."
> +  },
> +  {
> +"EventCode": "0x00804000C040",
> +"EventName": 

[PATCH 1/8] S390: Remove sentinel elem from ctl_table arrays

2023-09-06 Thread Joel Granados via B4 Relay
From: Joel Granados 

This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels), which
will reduce the overall build-time size of the kernel and the runtime
memory bloat by ~64 bytes per sentinel (further information:
https://lore.kernel.org/all/zo5yx5jfoggi%2f...@bombadil.infradead.org/)

Remove the sentinel element from appldata_table, s390dbf_table,
topology_ctl_table, cmm_table and page_table_sysctl. Reduce the
memory allocation in appldata_register_ops by one element, effectively
removing the sentinel from ops->ctl_table.

Signed-off-by: Joel Granados 
---
 arch/s390/appldata/appldata_base.c | 6 ++
 arch/s390/kernel/debug.c   | 3 +--
 arch/s390/kernel/topology.c| 3 +--
 arch/s390/mm/cmm.c | 3 +--
 arch/s390/mm/pgalloc.c | 3 +--
 5 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/arch/s390/appldata/appldata_base.c b/arch/s390/appldata/appldata_base.c
index 3b0994625652..872a644b1fd1 100644
--- a/arch/s390/appldata/appldata_base.c
+++ b/arch/s390/appldata/appldata_base.c
@@ -62,8 +62,7 @@ static struct ctl_table appldata_table[] = {
.procname   = "interval",
.mode   = S_IRUGO | S_IWUSR,
.proc_handler   = appldata_interval_handler,
-   },
-   { },
+   }
 };
 
 /*
@@ -351,8 +350,7 @@ int appldata_register_ops(struct appldata_ops *ops)
if (ops->size > APPLDATA_MAX_REC_SIZE)
return -EINVAL;
 
-   /* The last entry must be an empty one */
-   ops->ctl_table = kcalloc(2, sizeof(struct ctl_table), GFP_KERNEL);
+   ops->ctl_table = kcalloc(1, sizeof(struct ctl_table), GFP_KERNEL);
if (!ops->ctl_table)
return -ENOMEM;
 
diff --git a/arch/s390/kernel/debug.c b/arch/s390/kernel/debug.c
index a85e0c3e7027..150e2bfff0b3 100644
--- a/arch/s390/kernel/debug.c
+++ b/arch/s390/kernel/debug.c
@@ -977,8 +977,7 @@ static struct ctl_table s390dbf_table[] = {
.maxlen = sizeof(int),
.mode   = S_IRUGO | S_IWUSR,
.proc_handler   = s390dbf_procactive,
-   },
-   { }
+   }
 };
 
 static struct ctl_table_header *s390dbf_sysctl_header;
diff --git a/arch/s390/kernel/topology.c b/arch/s390/kernel/topology.c
index 68adf1de..9dcfac416669 100644
--- a/arch/s390/kernel/topology.c
+++ b/arch/s390/kernel/topology.c
@@ -635,8 +635,7 @@ static struct ctl_table topology_ctl_table[] = {
.procname   = "topology",
.mode   = 0644,
.proc_handler   = topology_ctl_handler,
-   },
-   { },
+   }
 };
 
 static int __init topology_init(void)
diff --git a/arch/s390/mm/cmm.c b/arch/s390/mm/cmm.c
index f47515313226..8937aa7090b3 100644
--- a/arch/s390/mm/cmm.c
+++ b/arch/s390/mm/cmm.c
@@ -331,8 +331,7 @@ static struct ctl_table cmm_table[] = {
.procname   = "cmm_timeout",
.mode   = 0644,
.proc_handler   = cmm_timeout_handler,
-   },
-   { }
+   }
 };
 
 #ifdef CONFIG_CMM_IUCV
diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index 07fc660a24aa..e8cecd31715f 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -29,8 +29,7 @@ static struct ctl_table page_table_sysctl[] = {
.proc_handler   = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
-   },
-   { }
+   }
 };
 
 static int __init page_table_register_sysctl(void)

-- 
2.30.2



[PATCH 2/8] arm: Remove sentinel elem from ctl_table arrays

2023-09-06 Thread Joel Granados via B4 Relay
From: Joel Granados 

This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels), which
will reduce the overall build-time size of the kernel and the runtime
memory bloat by ~64 bytes per sentinel (further information:
https://lore.kernel.org/all/zo5yx5jfoggi%2f...@bombadil.infradead.org/)

Remove the sentinel as well as the explicit size from ctl_isa_vars. The
size is redundant, as the initializer list determines it.
Change insn_emulation->sysctl from a two-element array of struct
ctl_table to a single struct. This has no consequence for the sysctl
registration, as the table is passed as a pointer.
Remove the sentinel from sve_default_vl_table, sme_default_vl_table,
tagged_addr_sysctl_table and armv8_pmu_sysctl_table.

Signed-off-by: Joel Granados 
---
 arch/arm/kernel/isa.c| 4 ++--
 arch/arm64/kernel/armv8_deprecated.c | 8 +++-
 arch/arm64/kernel/fpsimd.c   | 6 ++
 arch/arm64/kernel/process.c  | 3 +--
 drivers/perf/arm_pmuv3.c | 3 +--
 5 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/arch/arm/kernel/isa.c b/arch/arm/kernel/isa.c
index 20218876bef2..0b9c28077092 100644
--- a/arch/arm/kernel/isa.c
+++ b/arch/arm/kernel/isa.c
@@ -16,7 +16,7 @@
 
 static unsigned int isa_membase, isa_portbase, isa_portshift;
 
-static struct ctl_table ctl_isa_vars[4] = {
+static struct ctl_table ctl_isa_vars[] = {
{
.procname   = "membase",
.data   = _membase, 
@@ -35,7 +35,7 @@ static struct ctl_table ctl_isa_vars[4] = {
.maxlen = sizeof(isa_portshift),
.mode   = 0444,
.proc_handler   = proc_dointvec,
-   }, {}
+   }
 };
 
 static struct ctl_table_header *isa_sysctl_header;
diff --git a/arch/arm64/kernel/armv8_deprecated.c b/arch/arm64/kernel/armv8_deprecated.c
index e459cfd33711..dd6ce86d4332 100644
--- a/arch/arm64/kernel/armv8_deprecated.c
+++ b/arch/arm64/kernel/armv8_deprecated.c
@@ -52,10 +52,8 @@ struct insn_emulation {
int min;
int max;
 
-   /*
-* sysctl for this emulation + a sentinal entry.
-*/
-   struct ctl_table sysctl[2];
+   /* sysctl for this emulation */
+   struct ctl_table sysctl;
 };
 
 #define ARM_OPCODE_CONDTEST_FAIL   0
@@ -558,7 +556,7 @@ static void __init register_insn_emulation(struct insn_emulation *insn)
update_insn_emulation_mode(insn, INSN_UNDEF);
 
if (insn->status != INSN_UNAVAILABLE) {
-   sysctl = >sysctl[0];
+   sysctl = >sysctl;
 
sysctl->mode = 0644;
sysctl->maxlen = sizeof(int);
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 91e44ac7150f..db3ad1ba8272 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -588,8 +588,7 @@ static struct ctl_table sve_default_vl_table[] = {
.mode   = 0644,
.proc_handler   = vec_proc_do_default_vl,
.extra1 = _info[ARM64_VEC_SVE],
-   },
-   { }
+   }
 };
 
 static int __init sve_sysctl_init(void)
@@ -612,8 +611,7 @@ static struct ctl_table sme_default_vl_table[] = {
.mode   = 0644,
.proc_handler   = vec_proc_do_default_vl,
.extra1 = _info[ARM64_VEC_SME],
-   },
-   { }
+   }
 };
 
 static int __init sme_sysctl_init(void)
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 0fcc4eb1a7ab..48861cdc3aae 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -723,8 +723,7 @@ static struct ctl_table tagged_addr_sysctl_table[] = {
.proc_handler   = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
-   },
-   { }
+   }
 };
 
 static int __init tagged_addr_init(void)
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index e5a2ac4155f6..c4aa6a8d1b05 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -1172,8 +1172,7 @@ static struct ctl_table armv8_pmu_sysctl_table[] = {
.proc_handler   = armv8pmu_proc_user_access_handler,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
-   },
-   { }
+   }
 };
 
 static void armv8_pmu_register_sysctl_table(void)

-- 
2.30.2
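The arm patch above drops the explicit `[4]` from `ctl_isa_vars` and turns `insn_emulation->sysctl` into a single struct. The sketch below shows the C rules behind both changes. It uses a trimmed, hypothetical `struct ctl_table`, and its entry names are borrowed loosely for illustration. An array declared with an explicit length keeps zero-filled trailing slots even after `{}` is deleted. An unsized array is exactly as long as its initializer list. A lone struct can be handed to size-driven code as a pointer plus a count of one.

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed, hypothetical stand-in for struct ctl_table. */
struct ctl_table {
	const char *procname;
	int mode;
};

#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Explicit length: the fourth slot is zero-initialized, i.e. a hidden sentinel. */
static struct ctl_table sized_vars[4] = {
	{ .procname = "membase" },
	{ .procname = "portbase" },
	{ .procname = "portshift" },
};

/* No explicit length: the array has exactly three elements. */
static struct ctl_table unsized_vars[] = {
	{ .procname = "membase" },
	{ .procname = "portbase" },
	{ .procname = "portshift" },
};

/* Hypothetical mirror of struct insn_emulation after the change. */
struct emulation_demo {
	struct ctl_table sysctl;	/* one entry, no sentinel slot */
};

/* Counts named entries among the first n, as a size-driven walker would see them. */
static size_t count_named(const struct ctl_table *table, size_t n)
{
	size_t named = 0;

	assert(table != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (table[i].procname)
			named++;
	return named;
}
```

By the same rule, any table that keeps an explicit length larger than its initializer list still ends in a zero-filled entry after its `{}` line is deleted.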



[PATCH 8/8] c-sky: Remove sentinel element from ctl_table array

2023-09-06 Thread Joel Granados via B4 Relay
From: Joel Granados 

This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels), which
will reduce the overall build-time size of the kernel and the runtime
memory bloat by ~64 bytes per sentinel (further information:
https://lore.kernel.org/all/zo5yx5jfoggi%2f...@bombadil.infradead.org/)

Remove sentinel from alignment_tbl ctl_table array.

Signed-off-by: Joel Granados 
---
 arch/csky/abiv1/alignment.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/csky/abiv1/alignment.c b/arch/csky/abiv1/alignment.c
index b60259daed1b..0d75ce7b0328 100644
--- a/arch/csky/abiv1/alignment.c
+++ b/arch/csky/abiv1/alignment.c
@@ -328,8 +328,7 @@ static struct ctl_table alignment_tbl[5] = {
.maxlen = sizeof(align_usr_count),
.mode = 0666,
.proc_handler = _dointvec
-   },
-   {}
+   }
 };
 
 static int __init csky_alignment_init(void)

-- 
2.30.2



[PATCH 6/8] powerpc: Remove sentinel element from ctl_table arrays

2023-09-06 Thread Joel Granados via B4 Relay
From: Joel Granados 

This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels), which
will reduce the overall build-time size of the kernel and the runtime
memory bloat by ~64 bytes per sentinel (further information:
https://lore.kernel.org/all/zo5yx5jfoggi%2f...@bombadil.infradead.org/)

Remove sentinel from powersave_nap_ctl_table and
nmi_wd_lpm_factor_ctl_table.

Signed-off-by: Joel Granados 
---
 arch/powerpc/kernel/idle.c| 3 +--
 arch/powerpc/platforms/pseries/mobility.c | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index b1c0418b25c8..a8591f5fa70e 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -104,8 +104,7 @@ static struct ctl_table powersave_nap_ctl_table[] = {
.maxlen = sizeof(int),
.mode   = 0644,
.proc_handler   = proc_dointvec,
-   },
-   {}
+   }
 };
 
 static int __init
diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 0161226d8fec..d82b0c802fbb 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -60,8 +60,7 @@ static struct ctl_table nmi_wd_lpm_factor_ctl_table[] = {
.maxlen = sizeof(int),
.mode   = 0644,
.proc_handler   = proc_douintvec_minmax,
-   },
-   {}
+   }
 };
 
 static int __init register_nmi_wd_lpm_factor_sysctl(void)

-- 
2.30.2



[PATCH 3/8] arch/x86: Remove sentinel elem from ctl_table arrays

2023-09-06 Thread Joel Granados via B4 Relay
From: Joel Granados 

This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels), which
will reduce the overall build-time size of the kernel and the runtime
memory bloat by ~64 bytes per sentinel (further information:
https://lore.kernel.org/all/zo5yx5jfoggi%2f...@bombadil.infradead.org/)

Remove sentinel element from sld_sysctls and itmt_kern_table.

Signed-off-by: Joel Granados 
---
 arch/x86/kernel/cpu/intel.c | 3 +--
 arch/x86/kernel/itmt.c  | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index be4045628fd3..e63391b82624 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1015,8 +1015,7 @@ static struct ctl_table sld_sysctls[] = {
.proc_handler   = proc_douintvec_minmax,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
-   },
-   {}
+   }
 };
 
 static int __init sld_mitigate_sysctl_init(void)
diff --git a/arch/x86/kernel/itmt.c b/arch/x86/kernel/itmt.c
index ee4fe8cdb857..5f2ccff38297 100644
--- a/arch/x86/kernel/itmt.c
+++ b/arch/x86/kernel/itmt.c
@@ -73,8 +73,7 @@ static struct ctl_table itmt_kern_table[] = {
.proc_handler   = sched_itmt_update_handler,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
-   },
-   {}
+   }
 };
 
 static struct ctl_table_header *itmt_sysctl_header;

-- 
2.30.2



[PATCH 5/8] riscv: Remove sentinel element from ctl_table array

2023-09-06 Thread Joel Granados via B4 Relay
From: Joel Granados 

This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels), which
will reduce the overall build-time size of the kernel and the runtime
memory bloat by ~64 bytes per sentinel (further information:
https://lore.kernel.org/all/zo5yx5jfoggi%2f...@bombadil.infradead.org/)

Remove sentinel element from riscv_v_default_vstate_table.

Signed-off-by: Joel Granados 
---
 arch/riscv/kernel/vector.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/riscv/kernel/vector.c b/arch/riscv/kernel/vector.c
index 8d92fb6c522c..a1ae68b2ac0f 100644
--- a/arch/riscv/kernel/vector.c
+++ b/arch/riscv/kernel/vector.c
@@ -254,8 +254,7 @@ static struct ctl_table riscv_v_default_vstate_table[] = {
.maxlen = sizeof(riscv_v_implicit_uacc),
.mode   = 0644,
.proc_handler   = proc_dobool,
-   },
-   { }
+   }
 };
 
 static int __init riscv_v_sysctl_init(void)

-- 
2.30.2



[PATCH 7/8] ia64: Remove sentinel element from ctl_table array

2023-09-06 Thread Joel Granados via B4 Relay
From: Joel Granados 

This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels), which
will reduce the overall build-time size of the kernel and the runtime
memory bloat by ~64 bytes per sentinel (further information:
https://lore.kernel.org/all/zo5yx5jfoggi%2f...@bombadil.infradead.org/)

Remove sentinel from kdump_ctl_table.

Signed-off-by: Joel Granados 
---
 arch/ia64/kernel/crash.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/ia64/kernel/crash.c b/arch/ia64/kernel/crash.c
index 88b3ce3e66cd..fbf8893a570c 100644
--- a/arch/ia64/kernel/crash.c
+++ b/arch/ia64/kernel/crash.c
@@ -231,8 +231,7 @@ static struct ctl_table kdump_ctl_table[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
-   },
-   { }
+   }
 };
 #endif
 

-- 
2.30.2



[PATCH 0/8] sysctl: Remove sentinel elements from arch

2023-09-06 Thread Joel Granados via B4 Relay
From: Joel Granados 

What?
These commits remove the sentinel element (last empty element) from the
sysctl arrays of all the files under the "arch/" directory that use a
sysctl array for registration. The merging of the preparation patches
(in https://lore.kernel.org/all/zo5yx5jfoggi%2f...@bombadil.infradead.org/)
to mainline allows us to just remove sentinel elements without changing
behavior (more info on how this was done here [1]).

These commits are part of a bigger set (full patchset here:
https://github.com/Joelgranados/linux/tree/tag/sysctl_remove_empty_elem_V4)
that removes the ctl_table sentinel. The idea is to make the review
process easier by chunking the 52 commits into manageable pieces. By
sending out one chunk at a time, they can be reviewed separately without
noise from parallel sets. After the "arch/" commits in this set are
reviewed, I will continue with drivers/*, fs/*, kernel/*, net/* and
miscellaneous. The final set will remove the unneeded check for
->procname == NULL.

Why?
By removing the sysctl sentinel elements, we avoid kernel bloat as
ctl_table arrays get moved out of kernel/sysctl.c into their own
respective subsystems. This move was started long ago to avoid merge
conflicts; the sentinel removal came after Matthew Wilcox suggested
it to avoid bloating the kernel by one element as arrays moved out. This
patchset will reduce the overall build-time size of the kernel and the
runtime memory bloat by ~64 bytes per declared ctl_table array. I
have consolidated some links that shed light on the history of this
effort [2].

Testing:
* Ran sysctl selftests (./tools/testing/selftests/sysctl/sysctl.sh)
* Ran this through 0-day with no errors or warnings

Size saving after removing all sentinels:
  A consequence of eventually removing all the sentinels (64 bytes per
  sentinel) is the bytes we save. These are *not* numbers that we will
  get after this patch set; these are the numbers that we will get after
  removing all the sentinels. I include them here because they are
  relevant and give an idea of just how much memory we are talking
  about.
* bloat-o-meter:
- The "yesall" configuration saves 9158 bytes (bloat-o-meter output here:
  https://lore.kernel.org/all/20230621091000.424843-1-j.grana...@samsung.com/)
- The "tiny" config + CONFIG_SYSCTL saves 1215 bytes (bloat-o-meter output here:
  https://lore.kernel.org/all/20230809105006.1198165-1-j.grana...@samsung.com/)
* memory usage:
We save some bytes in main memory as well. In my testing kernel
I measured a difference of 7296 bytes. I include the way to
measure in [3].

Size saving after this patchset:
  Here I give the values that I measured for the architecture that I'm
  running (x86_64). This can give an approximation of how many bytes are
  saved for each arch. I won't publish for all the archs because I don't
  have access to all of them.
* bloat-o-meter
- The "yesall" config saves 192 bytes (bloat-o-meter output [4])
- The "tiny" config saves 64 bytes (bloat-o-meter output [5])
* memory usage:
In this case no bytes were saved. To measure it, comment out the
printk in `new_dir` and uncomment the if conditional in `new_links`
[3].

Comments/feedback greatly appreciated

Best
Joel

[1]
We are able to remove a sentinel table without behavioral change by
introducing a table_size argument in the same place where procname is
checked for NULL. The walk keeps stopping at ->procname == NULL while
the sentinel is still present; once the sentinel is removed, it stops
at table_size instead. See
https://lore.kernel.org/all/20230809105006.1198165-1-j.grana...@samsung.com/
for more information.

[2]
Links Related to the ctl_table sentinel removal:
* Good summary from Luis sent with the "pull request" for the
  preparation patches.
  https://lore.kernel.org/all/zo5yx5jfoggi%2f...@bombadil.infradead.org/
* Another very good summary from Luis.
  https://lore.kernel.org/all/zmfizkfkvxuft...@bombadil.infradead.org/
* This is a patch set that replaces register_sysctl_table with register_sysctl
  https://lore.kernel.org/all/20230302204612.782387-1-mcg...@kernel.org/
* Patch set to deprecate register_sysctl_paths()
  https://lore.kernel.org/all/20230302202826.776286-1-mcg...@kernel.org/
* Here there is an explicit expectation for the removal of the sentinel element.
  https://lore.kernel.org/all/20230321130908.6972-1-frank...@vivo.com
* The "ARRAY_SIZE" approach was mentioned (proposed?) in this thread
  https://lore.kernel.org/all/20220220060626.15885-1-tangm...@uniontech.com

[3]
To measure the in memory savings apply this on top of this patchset.

"
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index c88854df0b62..e0073a627bac 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -976,6 +976,8 @@ static struct ctl_dir *new_dir(struct ctl_table_set *set,
  

[PATCH 4/8] x86 vdso: rm sentinel element from ctl_table array

2023-09-06 Thread Joel Granados via B4 Relay
From: Joel Granados 

This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels) which
will reduce the overall build time size of the kernel and run time
memory bloat by ~64 bytes per sentinel (further information Link:
https://lore.kernel.org/all/zo5yx5jfoggi%2f...@bombadil.infradead.org/)

Remove sentinel element from abi_table2.

Signed-off-by: Joel Granados 
---
 arch/x86/entry/vdso/vdso32-setup.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/entry/vdso/vdso32-setup.c b/arch/x86/entry/vdso/vdso32-setup.c
index f3b3cacbcbb0..37b761802181 100644
--- a/arch/x86/entry/vdso/vdso32-setup.c
+++ b/arch/x86/entry/vdso/vdso32-setup.c
@@ -66,8 +66,7 @@ static struct ctl_table abi_table2[] = {
.proc_handler   = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
-   },
-   {}
+   }
 };
 
 static __init int ia32_binfmt_init(void)

-- 
2.30.2



[PATCH RFC] powerpc/rtas: Make it possible to disable sys_rtas

2023-09-06 Thread Michal Suchanek
Additional patch suggestion to go with the rtas devices:

---

With most important rtas functions available through different
interfaces the sys_rtas interface can be disabled completely.

Do not remove it for now to make it possible to run older versions of
userspace tools that don't support other interfaces.

Signed-off-by: Michal Suchanek 
---
 arch/powerpc/kernel/rtas.c | 2 ++
 arch/powerpc/platforms/Kconfig | 9 +
 2 files changed, 11 insertions(+)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index eddc031c4b95..5854a8bb5731 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -1684,6 +1684,7 @@ noinstr struct pseries_errorlog *get_pseries_errorlog(struct rtas_error_log *log
return NULL;
 }
 
+#ifdef CONFIG_RTAS_SYSCALL
 /*
  * The sys_rtas syscall, as originally designed, allows root to pass
  * arbitrary physical addresses to RTAS calls. A number of RTAS calls
@@ -1893,6 +1894,7 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 
return 0;
 }
+#endif
 
 static void __init rtas_function_table_init(void)
 {
diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
index 1fd253f92a77..9563e38188d5 100644
--- a/arch/powerpc/platforms/Kconfig
+++ b/arch/powerpc/platforms/Kconfig
@@ -150,6 +150,15 @@ config RTAS_FLASH
tristate "Firmware flash interface"
depends on PPC64 && RTAS_PROC
 
+config RTAS_SYSCALL
+   bool "Legacy syscall interface to RTAS"
+   depends on PPC_RTAS
+   default y
+   help
+ Enables support for the legacy sys_rtas interface. Calls that need to
+ pass data buffers use /dev/mem directly, which is not compatible with
+ lockdown. For now some tools still need this interface to work.
+
 config MMIO_NVRAM
bool
 
-- 
2.41.0



Re: [PATCH RFC 0/2] powerpc/pseries: new character devices for RTAS functions

2023-09-06 Thread Michal Suchánek
Hello,

On Tue, Aug 22, 2023 at 04:33:38PM -0500, Nathan Lynch via B4 Relay wrote:
> This is a proposal for adding chardev-based access to a select subset
> of RTAS functions on the pseries platform.
> 
> The problem: important platform features are enabled on Linux VMs
> through the powerpc-specific rtas() syscall in combination with
> writeable mappings of /dev/mem. In typical usage, this is encapsulated
> behind APIs provided by the librtas library. This paradigm is
> incompatible with lockdown, which prohibits /dev/mem access.
> 
> The solution I'm working on is to add a small pseries-specific
> "driver" for each functional area, exposing the relevant features to
> user space in ways that are compatible with lockdown. In most of these
> areas, I believe it's possible to change librtas to prefer the new
> chardev interfaces without disrupting existing users.

thanks for the driver.

> 
> I've broken down the affected functions into the following areas and
> priorities:
> 
> High priority:
> * VPD retrieval.
> * System parameters: retrieval and update.
> 
> Medium priority:
> * Platform dump retrieval.
> * Light path diagnostics (get/set-dynamic-indicator,
>   get-dynamic-sensor-state, get-indices).
> 
> Low priority (may never happen):
> * Error injection: would have to be carefully restricted.
> * Physical attestation: no known users.
> * LPAR perftools: no known users.
> 
> Out of scope:
> * DLPAR (configure-connector et al): involves device tree updates
>   which must be handled entirely in-kernel for lockdown. This is the
>   object of a separate effort.
> 
> See https://github.com/ibm-power-utilities/librtas/issues/29 for more
> details.
> 
> In this RFC, I've included a single driver for VPD retrieval. Clients
> use ioctl() to obtain a file descriptor-based handle for the VPD they
> want. I think this could be a good model for the other areas too, but
> I'd like to get opinions on it.

The call has parameters so it cannot be reasonably done with sysfs or
similar.

The parameter is a string, which is unwieldy with ioctl, and netlink has
helpers for getting strings into and out of messages without garbage
pointers and crashes. However, netlink does not have permissions, and
setting permissions for the different platform features available
through rtas is desirable.

Then this is as good as it gets with the kernel facilities Linux
provides.

Thanks

Michal


Re: [PATCH RFC 1/2] powerpc/pseries: papr-vpd char driver for VPD retrieval

2023-09-06 Thread Michal Suchánek
On Tue, Aug 22, 2023 at 04:33:39PM -0500, Nathan Lynch via B4 Relay wrote:
> From: Nathan Lynch 
> 
> PowerVM LPARs may retrieve Vital Product Data (VPD) for system
> components using the ibm,get-vpd RTAS function.
> 
> We can expose this to user space with a /dev/papr-vpd character
> device, where the programming model is:
> 
>   struct papr_location_code plc = { .str = "", }; /* obtain all VPD */
>   int devfd = open("/dev/papr-vpd", O_WRONLY);
>   int vpdfd = ioctl(devfd, PAPR_VPD_CREATE_HANDLE, &plc);
>   size_t size = lseek(vpdfd, 0, SEEK_END);
>   char *buf = malloc(size);
>   pread(vpdfd, buf, size, 0);
> 
> When a file descriptor is obtained from ioctl(PAPR_VPD_CREATE_HANDLE),
> the file contains the result of a complete ibm,get-vpd sequence. The
> file contents are immutable from the POV of user space. To get a new
> view of VPD, clients must create a new handle.
> 
> This design choice insulates user space from most of the complexities
> that ibm,get-vpd brings:
> 
> * ibm,get-vpd must be called more than once to obtain complete
>   results.
> * Only one ibm,get-vpd call sequence should be in progress at a time;
>   concurrent sequences will disrupt each other. Callers must have a
>   protocol for serializing their use of the function.
> * A call sequence in progress may receive a "VPD changed, try again"
>   status, requiring the client to start over. (The driver does not yet
>   handle this, but it should be easy to add.)
> 
> The memory required for the VPD buffers seems acceptable, around 20KB
> for all VPD on one of my systems. And the value of the
> /rtas/ibm,vpd-size DT property (the estimated maximum size of VPD) is
> consistently 300KB across various systems I've checked.
> 
> I've implemented support for this new ABI in the rtas_get_vpd()
> function in librtas, which the vpdupdate command currently uses to
> populate its VPD database. I've verified that an unmodified vpdupdate
> binary generates an identical database when using a librtas.so that
> prefers the new ABI.
> 
> Likely remaining work:
> 
> * Handle RTAS call status -4 (VPD changed) during ibm,get-vpd call
>   sequence.
> * Prevent ibm,get-vpd calls via rtas(2) from disrupting ibm,get-vpd
>   call sequences in this driver.
> * (Maybe) implement a poll method for delivering notifications of
>   potential changes to VPD, e.g. after a partition migration.
> 
> Questions, points for discussion:
> 
> * Am I allocating the ioctl numbers correctly?
> * The only way to discover the size of a VPD buffer is
>   lseek(SEEK_END). fstat() doesn't work for anonymous fds like
>   this. Is this OK, or should the buffer length be discoverable some
>   other way?
> 
> Signed-off-by: Nathan Lynch 
> ---
>  Documentation/userspace-api/ioctl/ioctl-number.rst |   2 +
>  arch/powerpc/include/uapi/asm/papr-vpd.h   |  29 ++
>  arch/powerpc/platforms/pseries/Makefile|   1 +
>  arch/powerpc/platforms/pseries/papr-vpd.c  | 353 +
>  4 files changed, 385 insertions(+)
> 
> diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
> index 4ea5b837399a..a950545bf7cd 100644
> --- a/Documentation/userspace-api/ioctl/ioctl-number.rst
> +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
> @@ -349,6 +349,8 @@ Code  Seq#Include File
>Comments
>   
> 
>  0xB1  00-1F  PPPoX
>   
> 
> +0xB2  00 arch/powerpc/include/uapi/asm/papr-vpd.h
> powerpc/pseries VPD API
> + 
> 
>  0xB3  00 linux/mmc/ioctl.h
>  0xB4  00-0F  linux/gpio.h
> 
>  0xB5  00-0F  uapi/linux/rpmsg.h  
> 
> diff --git a/arch/powerpc/include/uapi/asm/papr-vpd.h b/arch/powerpc/include/uapi/asm/papr-vpd.h
> new file mode 100644
> index ..aa33217ad5de
> --- /dev/null
> +++ b/arch/powerpc/include/uapi/asm/papr-vpd.h
> @@ -0,0 +1,29 @@
> +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
> +#ifndef _UAPI_PAPR_VPD_H_
> +#define _UAPI_PAPR_VPD_H_
> +
> +#include 
> +#include 
> +
> +struct papr_location_code {
> + /*
> +  * PAPR+ 12.3.2.4 Converged Location Code Rules - Length
> +  * Restrictions. 79 characters plus nul.
> +  */
> + char str[80];
> +};
> +
> +#define PAPR_VPD_IOCTL_BASE 0xb2
> +
> +#define PAPR_VPD_IO(nr) _IO(PAPR_VPD_IOCTL_BASE, nr)
> +#define PAPR_VPD_IOR(nr, type)  _IOR(PAPR_VPD_IOCTL_BASE, nr, type)
> +#define PAPR_VPD_IOW(nr, type)  _IOW(PAPR_VPD_IOCTL_BASE, nr, type)
> 

Re: [PATCH gmem FIXUP] mm, compaction: make testing mapping_unmovable() safe

2023-09-06 Thread Vlastimil Babka
On 9/6/23 01:56, Sean Christopherson wrote:
> On Fri, Sep 01, 2023, Vlastimil Babka wrote:
>> As Kirill pointed out, mapping can be removed under us due to
>> truncation. Test it under folio lock as already done for the async
>> compaction / dirty folio case. To prevent locking every folio with
>> mapping to do the test, do it only for unevictable folios, as we can
>> expect the unmovable mapping folios are also unevictable - it is the
>> case for guest memfd folios.
> 
> Rather than expect/assume that unmovable mappings are always unevictable, how about
> requiring that?  E.g. either through a VM_WARN_ON in mapping_set_unmovable(), or by
> simply having that helper forcefully set AS_UNEVICTABLE as well.

Yeah, I guess we could make the helper do that, with a comment, as gmem is
the only user right now. And if in the future somebody has a case where it
makes sense to have unmovable without unevictable, we can discuss what to do
about it then.