Re: [PATCH 5/5] mm/treewide: Drop pXd_large()

2024-02-28 Thread Peter Xu
On Thu, Feb 29, 2024 at 01:17:36PM +0800, kernel test robot wrote:
> >> arch/x86/include/asm/pgtable.h:1099:19: error: redefinition of 'pud_leaf'
> 1099 | static inline int pud_leaf(pud_t pud)
>  |   ^
>include/asm-generic/pgtable-nopmd.h:34:19: note: previous definition is 
> here
>   34 | static inline int pud_leaf(pud_t pud)   { return 0; }
>  |   ^

This is CONFIG_PGTABLE_LEVELS=2.  IIUC patch 5 didn't do anything wrong,
but renaming pud_large() exposed this conflict.  In the past it was a silent
conflict between the old pud_leaf() macro and the pud_leaf() function
definition: the macro could have silently overridden the function.

IIUC such pud_leaf() is not needed as we have a global fallback.  I'll add
a pre-requisite patch to remove such pXd_leaf() definitions.
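
For illustration only, the "global fallback" pattern referred to above can
be sketched as a standalone model. This is not the kernel's actual header
code: the pud_t stand-in, DEMO_PAGE_PSE, DEMO_ARCH_HAS_PUD_LEAF and
demo_is_leaf() are all made up for the example.

```c
/* Stand-in type for the demo; the kernel's pud_t is arch specific. */
typedef struct { unsigned long val; } pud_t;

/* Hypothetical "huge mapping" bit, used only by this demo. */
#define DEMO_PAGE_PSE 0x80UL

/*
 * "Arch header": an architecture with real PUD leaves provides the helper
 * and advertises it with a same-named macro.
 */
#ifdef DEMO_ARCH_HAS_PUD_LEAF
static inline int pud_leaf(pud_t pud)
{
	return (pud.val & DEMO_PAGE_PSE) != 0;
}
#define pud_leaf pud_leaf
#endif

/*
 * "Generic header": the fallback is used only when no arch helper was
 * advertised, so each configuration sees exactly one definition.
 */
#ifndef pud_leaf
#define pud_leaf(pud) ((void)(pud), 0)
#endif

/* Callers use pud_leaf() without caring where it came from. */
static inline int demo_is_leaf(pud_t pud)
{
	return pud_leaf(pud);
}
```

Built as-is, demo_is_leaf() always returns 0 because only the fallback is
visible; building with -DDEMO_ARCH_HAS_PUD_LEAF selects the arch helper
instead. A second unguarded definition in between is what produces the
"redefinition" error reported by the robot.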

-- 
Peter Xu



Re: [PATCH v17 2/6] crash: add a new kexec flag for hotplug support

2024-02-28 Thread Baoquan He
On 02/26/24 at 02:11pm, Sourabh Jain wrote:
..snip...
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 70fa8111a9d6..630c4fd7ea39 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -496,7 +496,7 @@ static DEFINE_MUTEX(__crash_hotplug_lock);
>   * It reflects the kernel's ability/permission to update the crash
>   * elfcorehdr directly.
 ~ this should be updated too.

>   */
> -int crash_check_update_elfcorehdr(void)
> +int crash_check_hotplug_support(void)
>  {
>   int rc = 0;
>  
> @@ -508,10 +508,7 @@ int crash_check_update_elfcorehdr(void)
>   return 0;
>   }
>   if (kexec_crash_image) {
> - if (kexec_crash_image->file_mode)
> - rc = 1;
> - else
> - rc = kexec_crash_image->update_elfcorehdr;
> + rc = kexec_crash_image->hotplug_support;
>   }
>   /* Release lock now that update complete */
>   kexec_unlock();
> @@ -552,8 +549,8 @@ static void crash_handle_hotplug_event(unsigned int 
> hp_action, unsigned int cpu,
>  
>   image = kexec_crash_image;
>  
> - /* Check that updating elfcorehdr is permitted */
> - if (!(image->file_mode || image->update_elfcorehdr))
> + /* Check that kexec segments update is permitted */
> + if (!image->hotplug_support)
>   goto out;
>  
>   if (hp_action == KEXEC_CRASH_HP_ADD_CPU ||
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index bab542fc1463..a6b3f96bb50c 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -135,8 +135,8 @@ static int do_kexec_load(unsigned long entry, unsigned 
> long nr_segments,
>   image->preserve_context = 1;
>  
>  #ifdef CONFIG_CRASH_HOTPLUG
> - if (flags & KEXEC_UPDATE_ELFCOREHDR)
> - image->update_elfcorehdr = 1;
> + if ((flags & KEXEC_ON_CRASH) && arch_crash_hotplug_support(image, 
> flags))
> + image->hotplug_support = 1;
>  #endif
>  
>   ret = machine_kexec_prepare(image);
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index 2d1db05fbf04..3d64290d24c9 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -376,6 +376,11 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, 
> initrd_fd,
>   if (ret)
>   goto out;
>  
> +#ifdef CONFIG_CRASH_HOTPLUG
> + if ((flags & KEXEC_FILE_ON_CRASH) && arch_crash_hotplug_support(image, 
> flags))
> + image->hotplug_support = 1;
> +#endif
> +
>   ret = machine_kexec_prepare(image);
>   if (ret)
>   goto out;

Other than that tiny nit, the overall patch looks good to me.

Acked-by: Baoquan He 
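
As a rough user-space model of the flow in the quoted hunks: the decision
is made once at load time and cached in the image, and the hotplug path
then tests a single field. Everything prefixed demo_/DEMO_ is hypothetical,
including the flag values. The file_mode shortcut follows the x86 hook in
the patch; the flag test after it is an assumption, since that part of the
hook is not shown in full here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for the demo; not the real uapi values. */
#define DEMO_KEXEC_ON_CRASH              0x1UL
#define DEMO_KEXEC_CRASH_HOTPLUG_SUPPORT 0x2UL

/* Reduced stand-in for struct kimage. */
struct demo_kimage {
	bool file_mode;        /* image was loaded via kexec_file_load */
	bool hotplug_support;  /* segments may be updated on hotplug */
};

/*
 * Stand-in for the arch hook. Assumption: file-mode images are always
 * updatable, otherwise user space must have passed the generic flag.
 */
static int demo_arch_crash_hotplug_support(const struct demo_kimage *image,
					   unsigned long flags)
{
	if (image->file_mode)
		return 1;
	return (flags & DEMO_KEXEC_CRASH_HOTPLUG_SUPPORT) != 0;
}

/* Load time: decide once and cache the answer in the image. */
static void demo_load(struct demo_kimage *image, unsigned long flags)
{
	image->hotplug_support = false;
	if ((flags & DEMO_KEXEC_ON_CRASH) &&
	    demo_arch_crash_hotplug_support(image, flags))
		image->hotplug_support = true;
}

/* Hotplug time: one field test replaces the old two-field check. */
static bool demo_may_update_segments(const struct demo_kimage *image)
{
	return image != NULL && image->hotplug_support;
}
```

The point of the design is that callers such as the sysfs and hotplug
paths no longer need to know about file_mode or per-segment flags.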



Re: Boot failure with ppc64 port on iMacs G5

2024-02-28 Thread Michael Ellerman
John Paul Adrian Glaubitz  writes:
> On Tue, 2024-02-20 at 04:16 +0100, tuxayo wrote:
>> I tried snapshots/2024-01-31/debian-12.0.0-ppc64-NETINST-1.iso
>> 
>> And was able to start booting from usb with:
>> boot usb0/disk@1:,\boot\grub\powerpc.elf
>> (typed in Open Firmware shell)
>> (usb0 is the top port)
>> 
>> Grub worked, and then I tried default install (the 1st option) and it 
>> started loading during like 2 minutes.
>> And then it got stuck with some superposition of the messages
>> smp_core99_probe
>> and
>> the stuff before
>> DO-QUIESCE finisedBooting Linux via __start() @ 0x0209 ...
>
> There seems to be a regression in the kernel which affects PowerPC 970 
> machines,
> i.e. PowerMac G5 CPUs. The issue needs to be bisected and reported upstream.

I have a quad G5 that is booting mainline happily.

I used to have an iMac G5 but it died.

> If you have the time, I would really appreciate if you could test the various
> snapshots and let me know which kernel is the first to not work. I expect that
> the breakage occurred somewhere around kernel 6.3 or so.

Can someone send the .config for the kernel in question? I could try
that on my machine here.

cheers


Re: linux-next: manual merge of the powerpc tree with the mm-stable tree

2024-02-28 Thread Michael Ellerman
Stephen Rothwell  writes:
> Hi all,
>
> Today's linux-next merge of the powerpc tree got a conflict in:
>
>   arch/powerpc/mm/pgtable_32.c
>
> between commit:
>
>   a5e8131a0329 ("arm64, powerpc, riscv, s390, x86: ptdump: refactor 
> CONFIG_DEBUG_WX")
>
> from the mm-stable tree and commit:
>
>   8f17bd2f4196 ("powerpc: Handle error in mark_rodata_ro() and 
> mark_initmem_nx()")
>
> from the powerpc tree.

Thanks. That's a fairly ugly conflict.

Maybe I'll drop that patch until the generic change has gone in.

cheers


Re: [PATCH] powerpc/mm: Code cleanup for __hash_page_thp

2024-02-28 Thread Michael Ellerman
Aneesh Kumar K.V  writes:
> Michael Ellerman  writes:
>> Kunwu Chan  writes:
>>> On 2024/2/26 18:49, Michael Ellerman wrote:
>>>> Kunwu Chan  writes:
>>>>> This part has been commented out since commit 6d492ecc6489
>>>>> ("powerpc/THP: Add code to handle HPTE faults for hugepages"),
>>>>> about 11 years ago.
>>>>>
>>>>> If there are no plans to enable this code in the future,
>>>>> we can remove it as dead code.
>>>> 
>>>> I agree the code can go. But I'd like it to be replaced with a comment
>>>> explaining what the dead code was trying to say.
>>
>>> Thanks, I'll send a new patch with the following comment:
>>>  /*
>>>  * No CPU has hugepages but lacks no execute, so we
>>>  * don't need to worry about cpu no CPU_FTR_COHERENT_ICACHE feature case
>>>  */
>>
>> Maybe wait until we can get some input from Aneesh. I'm not sure the
>> code/comment are really up to date.
>
> How about?
>
> modified   arch/powerpc/mm/book3s64/hash_hugepage.c
> @@ -58,17 +58,13 @@ int __hash_page_thp(unsigned long ea, unsigned long 
> access, unsigned long vsid,
>   return 0;
>  
>   rflags = htab_convert_pte_flags(new_pmd, flags);
> + /*
> +  * THPs are only supported on platforms that can do mixed page size
> +  * segments (MPSS) and all such platforms have coherent icache. Hence we
> +  * don't need to do lazy icache flush (hash_page_do_lazy_icache()) on
> +  * noexecute fault.
> +  */

Yeah thanks that looks good.

It could say "see eg. __hash_page_4K()", but that's probably unnecessary
as it mentions hash_page_do_lazy_icache(), and anyone interested is just
going to grep for that anyway.

cheers


Re: [PATCH v17 2/6] crash: add a new kexec flag for hotplug support

2024-02-28 Thread Baoquan He
On 02/29/24 at 10:35am, Sourabh Jain wrote:
> Hello Baoquan,
> 
> Do you have any comments or suggestions for this patch series, especially
> for this patch?

I have applied this series and am reviewing it; I will ack or comment if I
have any concerns. Thanks.

> On 26/02/24 14:11, Sourabh Jain wrote:
> > Commit a72bbec70da2 ("crash: hotplug support for kexec_load()")
> > introduced a new kexec flag, `KEXEC_UPDATE_ELFCOREHDR`. Kexec tool uses
> > this flag to indicate to the kernel that it is safe to modify the
> > elfcorehdr of the kdump image loaded using the kexec_load system call.
> > 
> > However, it is possible that architectures may need to update kexec
> > segments other than elfcorehdr. For example, FDT (Flattened Device Tree)
> > on PowerPC. Introducing a new kexec flag for every new kexec segment
> > may not be a good solution. Hence, a generic kexec flag bit,
> > `KEXEC_CRASH_HOTPLUG_SUPPORT`, is introduced to share the CPU/Memory
> > hotplug support intent between the kexec tool and the kernel for the
> > kexec_load system call.
> > 
> > Now, if the kexec tool sends KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag to
> > the kernel, it indicates to the kernel that all the required kexec
> > segments are skipped from SHA calculation and it is safe to update kdump
> > image loaded using the kexec_load syscall.
> > 
> > While loading the kdump image using the kexec_load syscall, the
> > @update_elfcorehdr member of struct kimage is set if the kexec tool
> > sends the KEXEC_UPDATE_ELFCOREHDR kexec flag. This member is later used
> > to determine whether it is safe to update elfcorehdr on hotplug events.
> > However, with the introduction of the KEXEC_CRASH_HOTPLUG_SUPPORT kexec
> > flag, the kexec tool could mark all the required kexec segments on an
> > architecture as safe to update. So rename the @update_elfcorehdr to
> > @hotplug_support. If @hotplug_support is set, the kernel can safely
> > update all the required kexec segments of the kdump image during
> > CPU/Memory hotplug events.
> > 
> > Introduce an architecture-specific function to process kexec flags for
> > determining hotplug support. Set the @hotplug_support member of struct
> > kimage for both kexec_load and kexec_file_load system calls. This
> > simplifies kernel checks to identify hotplug support for the currently
> > loaded kdump image by just examining the value of @hotplug_support.
> > 
> > Signed-off-by: Sourabh Jain 
> > Cc: Akhil Raj 
> > Cc: Andrew Morton 
> > Cc: Aneesh Kumar K.V 
> > Cc: Baoquan He 
> > Cc: Borislav Petkov (AMD) 
> > Cc: Boris Ostrovsky 
> > Cc: Christophe Leroy 
> > Cc: Dave Hansen 
> > Cc: Dave Young 
> > Cc: David Hildenbrand 
> > Cc: Eric DeVolder 
> > Cc: Greg Kroah-Hartman 
> > Cc: Hari Bathini 
> > Cc: Laurent Dufour 
> > Cc: Mahesh Salgaonkar 
> > Cc: Michael Ellerman 
> > Cc: Mimi Zohar 
> > Cc: Naveen N Rao 
> > Cc: Oscar Salvador 
> > Cc: Thomas Gleixner 
> > Cc: Valentin Schneider 
> > Cc: Vivek Goyal 
> > Cc: ke...@lists.infradead.org
> > Cc: x...@kernel.org
> > ---
> >   arch/x86/include/asm/kexec.h | 11 ++-
> >   arch/x86/kernel/crash.c  | 28 +---
> >   drivers/base/cpu.c   |  2 +-
> >   drivers/base/memory.c|  2 +-
> >   include/linux/crash_core.h   | 13 ++---
> >   include/linux/kexec.h| 11 +++
> >   include/uapi/linux/kexec.h   |  1 +
> >   kernel/crash_core.c  | 11 ---
> >   kernel/kexec.c   |  4 ++--
> >   kernel/kexec_file.c  |  5 +
> >   10 files changed, 46 insertions(+), 42 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> > index cb1320ebbc23..ae5482a2f0ca 100644
> > --- a/arch/x86/include/asm/kexec.h
> > +++ b/arch/x86/include/asm/kexec.h
> > @@ -210,15 +210,8 @@ extern void kdump_nmi_shootdown_cpus(void);
> >   void arch_crash_handle_hotplug_event(struct kimage *image, void *arg);
> >   #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
> > -#ifdef CONFIG_HOTPLUG_CPU
> > -int arch_crash_hotplug_cpu_support(void);
> > -#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support
> > -#endif
> > -
> > -#ifdef CONFIG_MEMORY_HOTPLUG
> > -int arch_crash_hotplug_memory_support(void);
> > -#define crash_hotplug_memory_support arch_crash_hotplug_memory_support
> > -#endif
> > +int arch_crash_hotplug_support(struct kimage *image, unsigned long 
> > kexec_flags);
> > +#define arch_crash_hotplug_support arch_crash_hotplug_support
> >   unsigned int arch_crash_get_elfcorehdr_size(void);
> >   #define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
> > diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> > index 2a682fe86352..f06501445cd9 100644
> > --- a/arch/x86/kernel/crash.c
> > +++ b/arch/x86/kernel/crash.c
> > @@ -402,20 +402,26 @@ int crash_load_segments(struct kimage *image)
> >   #undef pr_fmt
> >   #define pr_fmt(fmt) "crash hp: " fmt
> > -/* These functions provide the value for the sysfs 

Re: [PATCH 5/5] mm/treewide: Drop pXd_large()

2024-02-28 Thread kernel test robot
Hi,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]

url:
https://github.com/intel-lab-lkp/linux/commits/peterx-redhat-com/mm-ppc-Define-pXd_large-with-pXd_leaf/20240228-170049
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git 
mm-everything
patch link:
https://lore.kernel.org/r/20240228085350.520953-6-peterx%40redhat.com
patch subject: [PATCH 5/5] mm/treewide: Drop pXd_large()
config: i386-buildonly-randconfig-001-20240228 
(https://download.01.org/0day-ci/archive/20240229/202402291233.cvhchp2c-...@intel.com/config)
compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 
6009708b4367171ccdbf4b5905cb6a803753fe18)
reproduce (this is a W=1 build): 
(https://download.01.org/0day-ci/archive/20240229/202402291233.cvhchp2c-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202402291233.cvhchp2c-...@intel.com/

All errors (new ones prefixed by >>):

   In file included from arch/x86/kernel/asm-offsets.c:14:
   In file included from include/linux/suspend.h:5:
   In file included from include/linux/swap.h:9:
   In file included from include/linux/memcontrol.h:13:
   In file included from include/linux/cgroup.h:17:
   In file included from include/linux/fs.h:33:
   In file included from include/linux/percpu-rwsem.h:7:
   In file included from include/linux/rcuwait.h:6:
   In file included from include/linux/sched/signal.h:9:
   In file included from include/linux/sched/task.h:13:
   In file included from include/linux/uaccess.h:11:
   In file included from arch/x86/include/asm/uaccess.h:17:
   In file included from arch/x86/include/asm/tlbflush.h:16:
>> arch/x86/include/asm/pgtable.h:1099:19: error: redefinition of 'pud_leaf'
1099 | static inline int pud_leaf(pud_t pud)
 |   ^
   include/asm-generic/pgtable-nopmd.h:34:19: note: previous definition is here
  34 | static inline int pud_leaf(pud_t pud)   { return 0; }
 |   ^
   1 error generated.
   make[3]: *** [scripts/Makefile.build:116: arch/x86/kernel/asm-offsets.s] 
Error 1 shuffle=298844285
   make[3]: Target 'prepare' not remade because of errors.
   make[2]: *** [Makefile:1191: prepare0] Error 2 shuffle=298844285
   make[2]: Target 'prepare' not remade because of errors.
   make[1]: *** [Makefile:240: __sub-make] Error 2 shuffle=298844285
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [Makefile:240: __sub-make] Error 2 shuffle=298844285
   make: Target 'prepare' not remade because of errors.


vim +/pud_leaf +1099 arch/x86/include/asm/pgtable.h

  1093  
  1094  static inline int pud_bad(pud_t pud)
  1095  {
  1096  return (pud_flags(pud) & ~(_KERNPG_TABLE | _PAGE_USER)) != 0;
  1097  }
  1098  #else
> 1099  static inline int pud_leaf(pud_t pud)
  1100  {
  1101  return 0;
  1102  }
  1103  #endif  /* CONFIG_PGTABLE_LEVELS > 2 */
  1104  #define pud_leaf pud_leaf
  1105  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


Re: [PATCH v8 1/3] powerpc: make fadump resilient with memory add/remove events

2024-02-28 Thread Sourabh Jain

Hello Michael and Aneesh,

Please let me know if you have any comments or suggestions for this 
patch series.


Thanks,
Sourabh


On 17/02/24 12:50, Sourabh Jain wrote:

Due to changes in memory resources caused by either memory hotplug or
online/offline events, the elfcorehdr, which describes the CPUs and
memory of the crashed kernel to the kernel that collects the dump (known
as second/fadump kernel), becomes outdated. Consequently, attempting
dump collection with an outdated elfcorehdr can lead to failed or
inaccurate dump collection.

Memory hotplug and online/offline events are referred to as memory add/remove
events in the rest of the commit message.

The current solution to address the aforementioned issue is as follows:
Monitor memory add/remove events in userspace using udev rules, and
re-register fadump whenever there are changes in memory resources. This
leads to the creation of a new elfcorehdr with updated system memory
information.

There are several notable issues associated with re-registering fadump
for every memory add/remove event.

1. Bulk memory add/remove events with udev-based fadump re-registration
can lead to race conditions and, more importantly, it creates a wide
window during which fadump is inactive until all memory add/remove
events are settled.
2. Re-registering fadump for every memory add/remove event is
inefficient.
3. The memory for elfcorehdr is allocated based on the memblock regions
available during early boot and remains fixed thereafter. However, if
elfcorehdr is later recreated with additional memblock regions, its
size will increase, potentially leading to memory corruption.

Address the aforementioned challenges by shifting the creation of
elfcorehdr from the first kernel (also referred to as the crashed kernel),
where it was created and frequently recreated for every memory
add/remove event, to the fadump kernel. As a result, the elfcorehdr only
needs to be created once, thus eliminating the necessity to re-register
fadump during memory add/remove events.

At present, the first kernel is responsible for preparing the fadump
header and storing it in the fadump reserved area. The fadump header
includes the start address of the elfcorehdr, crashing CPU details, and
other relevant information. In the event of a crash in the first kernel,
the second/fadump kernel boots and accesses the fadump header prepared by the
first kernel. It then performs the following steps in a
platform-specific function [rtas|opal]_fadump_process:

1. Sanity check for fadump header
2. Update CPU notes in elfcorehdr

Along with the above, update the setup_fadump()/fadump.c to create
elfcorehdr and set its address to the global variable elfcorehdr_addr
for the vmcore module to process it in the second/fadump kernel.

The section below outlines the information required to create the elfcorehdr
and the changes made to make it available to the fadump kernel if it's
not already.

To create elfcorehdr, the following crashed kernel information is
required: CPU notes, vmcoreinfo, and memory ranges.

At present, the CPU notes are already prepared in the fadump kernel, so
no changes are needed in that regard. The fadump kernel has access to
all crashed kernel memory regions, including boot memory regions that
are relocated by firmware to fadump reserved areas, so no changes for
that either. However, it is necessary to add new members to the fadump
header, i.e., the 'fadump_crash_info_header' structure, in order to pass
the crashed kernel's vmcoreinfo address and its size to fadump kernel.

In addition to the vmcoreinfo address and size, there are a few other
attributes also added to the fadump_crash_info_header structure.

1. version:
It stores the fadump header version, which is currently set to 1.
This provides flexibility to update the fadump crash info header in
the future without changing the magic number. For each change in the
fadump header, the version will be increased. This will help the
updated kernel determine how to handle kernel dumps from older
kernels. The magic number remains relevant for checking fadump header
corruption.

2. pt_regs_sz/cpu_mask_sz:
Store size of pt_regs and cpu_mask structure of first kernel. These
attributes are used to prevent dump processing if the sizes of
pt_regs or cpu_mask structure differ between the first and fadump
kernels.

Note: if either the first/crashed kernel or the second/fadump kernel does not
have the changes introduced here, the kernel fails to collect the dump and
prints a relevant error message on the console.
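
As a minimal sketch of how the version and size attributes described above
could gate dump processing in the fadump kernel: the structure layout, the
magic value, the status codes and demo_check_header() are all hypothetical
and only model the idea. Rejecting any unknown version is the simplest
possible policy, not necessarily what the patch does.

```c
#include <stdint.h>

#define DEMO_FADUMP_MAGIC   0x44454d4fULL  /* hypothetical magic ("DEMO") */
#define DEMO_FADUMP_VERSION 1U

/* Simplified stand-in for fadump_crash_info_header. */
struct demo_crash_info_header {
	uint64_t magic_number;      /* detects header corruption */
	uint32_t version;           /* bumped on every layout change */
	uint32_t pt_regs_sz;        /* sizeof(struct pt_regs) in first kernel */
	uint32_t cpu_mask_sz;       /* sizeof(struct cpumask) in first kernel */
	uint64_t vmcoreinfo_raddr;  /* crashed kernel's vmcoreinfo address */
	uint64_t vmcoreinfo_size;   /* and its size */
};

enum demo_hdr_status {
	DEMO_HDR_OK = 0,
	DEMO_HDR_CORRUPT,
	DEMO_HDR_BAD_VERSION,
	DEMO_HDR_SIZE_MISMATCH,
};

/* Validate the header written by the first kernel before using it. */
static enum demo_hdr_status
demo_check_header(const struct demo_crash_info_header *hdr,
		  uint32_t local_pt_regs_sz, uint32_t local_cpu_mask_sz)
{
	if (!hdr || hdr->magic_number != DEMO_FADUMP_MAGIC)
		return DEMO_HDR_CORRUPT;
	if (hdr->version != DEMO_FADUMP_VERSION)
		return DEMO_HDR_BAD_VERSION;
	/* Refuse to parse saved registers if structure sizes differ. */
	if (hdr->pt_regs_sz != local_pt_regs_sz ||
	    hdr->cpu_mask_sz != local_cpu_mask_sz)
		return DEMO_HDR_SIZE_MISMATCH;
	return DEMO_HDR_OK;
}
```

Keeping the magic number separate from the version means a layout change
only needs a version bump, while the magic check still catches corruption.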

Signed-off-by: Sourabh Jain 
Cc: Aditya Gupta 
Cc: Aneesh Kumar K.V 
Cc: Hari Bathini 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Naveen N Rao 
---
  arch/powerpc/include/asm/fadump-internal.h   |  31 +-
  arch/powerpc/kernel/fadump.c | 339 +++
  arch/powerpc/platforms/powernv/opal-fadump.c |  22 +-
  arch/powerpc/platforms/pseries/rtas-fadump.c |  30 +-
  4 

Re: [PATCH v17 2/6] crash: add a new kexec flag for hotplug support

2024-02-28 Thread Sourabh Jain

Hello Baoquan,

Do you have any comments or suggestions for this patch series, 
especially for this patch?


Thanks,
Sourabh

On 26/02/24 14:11, Sourabh Jain wrote:

Commit a72bbec70da2 ("crash: hotplug support for kexec_load()")
introduced a new kexec flag, `KEXEC_UPDATE_ELFCOREHDR`. Kexec tool uses
this flag to indicate to the kernel that it is safe to modify the
elfcorehdr of the kdump image loaded using the kexec_load system call.

However, it is possible that architectures may need to update kexec
segments other than elfcorehdr. For example, FDT (Flattened Device Tree)
on PowerPC. Introducing a new kexec flag for every new kexec segment
may not be a good solution. Hence, a generic kexec flag bit,
`KEXEC_CRASH_HOTPLUG_SUPPORT`, is introduced to share the CPU/Memory
hotplug support intent between the kexec tool and the kernel for the
kexec_load system call.

Now, if the kexec tool sends KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag to
the kernel, it indicates to the kernel that all the required kexec
segments are skipped from SHA calculation and it is safe to update kdump
image loaded using the kexec_load syscall.

While loading the kdump image using the kexec_load syscall, the
@update_elfcorehdr member of struct kimage is set if the kexec tool
sends the KEXEC_UPDATE_ELFCOREHDR kexec flag. This member is later used
to determine whether it is safe to update elfcorehdr on hotplug events.
However, with the introduction of the KEXEC_CRASH_HOTPLUG_SUPPORT kexec
flag, the kexec tool could mark all the required kexec segments on an
architecture as safe to update. So rename the @update_elfcorehdr to
@hotplug_support. If @hotplug_support is set, the kernel can safely
update all the required kexec segments of the kdump image during
CPU/Memory hotplug events.

Introduce an architecture-specific function to process kexec flags for
determining hotplug support. Set the @hotplug_support member of struct
kimage for both kexec_load and kexec_file_load system calls. This
simplifies kernel checks to identify hotplug support for the currently
loaded kdump image by just examining the value of @hotplug_support.

Signed-off-by: Sourabh Jain 
Cc: Akhil Raj 
Cc: Andrew Morton 
Cc: Aneesh Kumar K.V 
Cc: Baoquan He 
Cc: Borislav Petkov (AMD) 
Cc: Boris Ostrovsky 
Cc: Christophe Leroy 
Cc: Dave Hansen 
Cc: Dave Young 
Cc: David Hildenbrand 
Cc: Eric DeVolder 
Cc: Greg Kroah-Hartman 
Cc: Hari Bathini 
Cc: Laurent Dufour 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Mimi Zohar 
Cc: Naveen N Rao 
Cc: Oscar Salvador 
Cc: Thomas Gleixner 
Cc: Valentin Schneider 
Cc: Vivek Goyal 
Cc: ke...@lists.infradead.org
Cc: x...@kernel.org
---
  arch/x86/include/asm/kexec.h | 11 ++-
  arch/x86/kernel/crash.c  | 28 +---
  drivers/base/cpu.c   |  2 +-
  drivers/base/memory.c|  2 +-
  include/linux/crash_core.h   | 13 ++---
  include/linux/kexec.h| 11 +++
  include/uapi/linux/kexec.h   |  1 +
  kernel/crash_core.c  | 11 ---
  kernel/kexec.c   |  4 ++--
  kernel/kexec_file.c  |  5 +
  10 files changed, 46 insertions(+), 42 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index cb1320ebbc23..ae5482a2f0ca 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -210,15 +210,8 @@ extern void kdump_nmi_shootdown_cpus(void);
  void arch_crash_handle_hotplug_event(struct kimage *image, void *arg);
  #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
  
-#ifdef CONFIG_HOTPLUG_CPU
-int arch_crash_hotplug_cpu_support(void);
-#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support
-#endif
-
-#ifdef CONFIG_MEMORY_HOTPLUG
-int arch_crash_hotplug_memory_support(void);
-#define crash_hotplug_memory_support arch_crash_hotplug_memory_support
-#endif
+int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags);
+#define arch_crash_hotplug_support arch_crash_hotplug_support
  
  unsigned int arch_crash_get_elfcorehdr_size(void);
  #define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 2a682fe86352..f06501445cd9 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -402,20 +402,26 @@ int crash_load_segments(struct kimage *image)
  #undef pr_fmt
  #define pr_fmt(fmt) "crash hp: " fmt
  
-/* These functions provide the value for the sysfs crash_hotplug nodes */
-#ifdef CONFIG_HOTPLUG_CPU
-int arch_crash_hotplug_cpu_support(void)
+int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags)
  {
-   return crash_check_update_elfcorehdr();
-}
-#endif
  
-#ifdef CONFIG_MEMORY_HOTPLUG
-int arch_crash_hotplug_memory_support(void)
-{
-   return crash_check_update_elfcorehdr();
-}
+#ifdef CONFIG_KEXEC_FILE
+   if (image->file_mode)
+   return 1;
  #endif
+   /*
+* Initially, crash hotplug support for 

Re: [kvm-unit-tests PATCH 32/32] powerpc: gitlab CI update

2024-02-28 Thread Nicholas Piggin
On Wed Feb 28, 2024 at 10:16 PM AEST, Andrew Jones wrote:
> On Mon, Feb 26, 2024 at 08:12:18PM +1000, Nicholas Piggin wrote:
> > This adds testing for the powernv machine, and adds a gitlab-ci test
> > group instead of specifying all tests in .gitlab-ci.yml.
> > 
> > Signed-off-by: Nicholas Piggin 
> > ---
> >  .gitlab-ci.yml| 16 ++--
> >  powerpc/unittests.cfg | 15 ---
> >  2 files changed, 14 insertions(+), 17 deletions(-)
> > 
> > diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
> > index 61f196d5d..51a593021 100644
> > --- a/.gitlab-ci.yml
> > +++ b/.gitlab-ci.yml
> > @@ -69,11 +69,9 @@ build-ppc64be:
> >   - cd build
> >   - ../configure --arch=ppc64 --endian=big 
> > --cross-prefix=powerpc64-linux-gnu-
> >   - make -j2
> > - - ACCEL=tcg ./run_tests.sh
> > - selftest-setup selftest-migration selftest-migration-skip spapr_hcall
> > - rtas-get-time-of-day rtas-get-time-of-day-base rtas-set-time-of-day
> > - emulator
> > - | tee results.txt
> > + - ACCEL=tcg MAX_SMP=8 ./run_tests.sh -g gitlab-ci | tee results.txt
> > + - if grep -q FAIL results.txt ; then exit 1 ; fi
> > + - ACCEL=tcg MAX_SMP=8 MACHINE=powernv ./run_tests.sh -g gitlab-ci | tee 
> > results.txt
> >   - if grep -q FAIL results.txt ; then exit 1 ; fi
> >  
> >  build-ppc64le:
> > @@ -82,11 +80,9 @@ build-ppc64le:
> >   - dnf install -y qemu-system-ppc gcc-powerpc64-linux-gnu nmap-ncat
> >   - ./configure --arch=ppc64 --endian=little 
> > --cross-prefix=powerpc64-linux-gnu-
> >   - make -j2
> > - - ACCEL=tcg ./run_tests.sh
> > - selftest-setup selftest-migration selftest-migration-skip spapr_hcall
> > - rtas-get-time-of-day rtas-get-time-of-day-base rtas-set-time-of-day
> > - emulator
> > - | tee results.txt
> > + - ACCEL=tcg MAX_SMP=8 ./run_tests.sh -g gitlab-ci | tee results.txt
> > + - if grep -q FAIL results.txt ; then exit 1 ; fi
> > + - ACCEL=tcg MAX_SMP=8 MACHINE=powernv ./run_tests.sh -g gitlab-ci | tee 
> > results.txt
> >   - if grep -q FAIL results.txt ; then exit 1 ; fi
> >  
>
> We're slowly migrating all tests like these to
>
>  grep -q PASS results.txt && ! grep -q FAIL results.txt
>
> Here's a good opportunity to change ppc's.

Sure, I'll do that.
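
For reference, the difference between the two checks shows up when
results.txt is empty, for example if the run produced no output at all.
The small C model below expresses both predicates over the file contents;
demo_old_check() and demo_new_check() are hypothetical names standing in
for the shell one-liners, not part of kvm-unit-tests.

```c
#include <stdbool.h>
#include <string.h>

/* Models: if grep -q FAIL results.txt ; then exit 1 ; fi */
static bool demo_old_check(const char *results)
{
	return strstr(results, "FAIL") == NULL;
}

/* Models: grep -q PASS results.txt && ! grep -q FAIL results.txt */
static bool demo_new_check(const char *results)
{
	return strstr(results, "PASS") != NULL &&
	       strstr(results, "FAIL") == NULL;
}
```

With empty input the old predicate reports success while the new one
reports failure, which is the case the stricter form guards against.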

Thanks,
Nick


Re: [kvm-unit-tests PATCH 09/32] scripts: allow machine option to be specified in unittests.cfg

2024-02-28 Thread Nicholas Piggin
On Wed Feb 28, 2024 at 9:47 PM AEST, Andrew Jones wrote:
> On Mon, Feb 26, 2024 at 08:11:55PM +1000, Nicholas Piggin wrote:
> > This allows different machines with different requirements to be
> > supported by run_tests.sh, similarly to how different accelerators
> > are handled.
> > 
> > Acked-by: Thomas Huth 
> > Signed-off-by: Nicholas Piggin 
> > ---
> >  scripts/common.bash  |  8 ++--
> >  scripts/runtime.bash | 16 
> >  2 files changed, 18 insertions(+), 6 deletions(-)
>
> Please also update the unittests.cfg documentation.

Yeah good catch, I will do.

> Currently that
> documentation lives in the header of each unittests.cfg file, but
> we could maybe change each file to have a single line which points
> at a single document.

I'll take a look and do something if it's simple enough,
otherwise I'll just update the unittests.cfg.

Thanks,
Nick


Re: [PATCH] powerpc/mm: Code cleanup for __hash_page_thp

2024-02-28 Thread Aneesh Kumar K . V
Michael Ellerman  writes:

> Kunwu Chan  writes:
>> Thanks for the reply.
>>
>> On 2024/2/26 18:49, Michael Ellerman wrote:
>>> Kunwu Chan  writes:
>>>> This part has been commented out since commit 6d492ecc6489
>>>> ("powerpc/THP: Add code to handle HPTE faults for hugepages"),
>>>> about 11 years ago.
>>>>
>>>> If there are no plans to enable this code in the future,
>>>> we can remove it as dead code.
>>> 
>>> I agree the code can go. But I'd like it to be replaced with a comment
>>> explaining what the dead code was trying to say.
>
>> Thanks, I'll send a new patch with the following comment:
>>  /*
>>  * No CPU has hugepages but lacks no execute, so we
>>  * don't need to worry about cpu no CPU_FTR_COHERENT_ICACHE feature case
>>  */
>
> Maybe wait until we can get some input from Aneesh. I'm not sure the
> code/comment are really up to date.
>
> cheers

How about?

modified   arch/powerpc/mm/book3s64/hash_hugepage.c
@@ -58,17 +58,13 @@ int __hash_page_thp(unsigned long ea, unsigned long access, 
unsigned long vsid,
return 0;
 
rflags = htab_convert_pte_flags(new_pmd, flags);
+   /*
+* THPs are only supported on platforms that can do mixed page size
+* segments (MPSS) and all such platforms have coherent icache. Hence we
+* don't need to do lazy icache flush (hash_page_do_lazy_icache()) on
+* noexecute fault.
+*/
 
-#if 0
-   if (!cpu_has_feature(CPU_FTR_COHERENT_ICACHE)) {
-
-   /*
-* No CPU has hugepages but lacks no execute, so we
-* don't need to worry about that case
-*/
-   rflags = hash_page_do_lazy_icache(rflags, __pte(old_pte), trap);
-   }
-#endif
/*
 * Find the slot index details for this ea, using base page size.
 */



Re: [kvm-unit-tests PATCH 04/32] powerpc: interrupt stack backtracing

2024-02-28 Thread Nicholas Piggin
On Wed Feb 28, 2024 at 9:46 PM AEST, Andrew Jones wrote:
> On Mon, Feb 26, 2024 at 08:11:50PM +1000, Nicholas Piggin wrote:
> > Add support for backtracing across interrupt stacks, and
> > add interrupt frame backtrace for unhandled interrupts.
> > 
> > Signed-off-by: Nicholas Piggin 
> > ---
> >  lib/powerpc/processor.c |  4 ++-
> >  lib/ppc64/asm/stack.h   |  3 +++
> >  lib/ppc64/stack.c   | 55 +
> >  powerpc/Makefile.ppc64  |  1 +
> >  powerpc/cstart64.S  |  7 --
> >  5 files changed, 67 insertions(+), 3 deletions(-)
> >  create mode 100644 lib/ppc64/stack.c
> > 
> > diff --git a/lib/powerpc/processor.c b/lib/powerpc/processor.c
> > index ad0d95666..114584024 100644
> > --- a/lib/powerpc/processor.c
> > +++ b/lib/powerpc/processor.c
> > @@ -51,7 +51,9 @@ void do_handle_exception(struct pt_regs *regs)
> > return;
> > }
> >  
> > -   printf("unhandled cpu exception %#lx at NIA:0x%016lx MSR:0x%016lx\n", 
> > regs->trap, regs->nip, regs->msr);
> > +   printf("Unhandled cpu exception %#lx at NIA:0x%016lx MSR:0x%016lx\n",
> > +   regs->trap, regs->nip, regs->msr);
> > +   dump_frame_stack((void *)regs->nip, (void *)regs->gpr[1]);
> > abort();
> >  }
> >  
> > diff --git a/lib/ppc64/asm/stack.h b/lib/ppc64/asm/stack.h
> > index 9734bbb8f..94fd1021c 100644
> > --- a/lib/ppc64/asm/stack.h
> > +++ b/lib/ppc64/asm/stack.h
> > @@ -5,4 +5,7 @@
> >  #error Do not directly include . Just use .
> >  #endif
> >  
> > +#define HAVE_ARCH_BACKTRACE
> > +#define HAVE_ARCH_BACKTRACE_FRAME
> > +
> >  #endif
> > diff --git a/lib/ppc64/stack.c b/lib/ppc64/stack.c
> > new file mode 100644
> > index 0..fcb7fa860
> > --- /dev/null
> > +++ b/lib/ppc64/stack.c
> > @@ -0,0 +1,55 @@
> > +#include 
> > +#include 
> > +#include 
> > +
> > +extern char exception_stack_marker[];
> > +
> > +int backtrace_frame(const void *frame, const void **return_addrs, int 
> > max_depth)
> > +{
> > +   static int walking;
> > +   int depth = 0;
> > +   const unsigned long *bp = (unsigned long *)frame;
> > +   void *return_addr;
> > +
> > +   asm volatile("" ::: "lr"); /* Force it to save LR */
> > +
> > +   if (walking) {
> > +   printf("RECURSIVE STACK WALK!!!\n");
> > +   return 0;
> > +   }
> > +   walking = 1;
> > +
> > +   bp = (unsigned long *)bp[0];
> > +   return_addr = (void *)bp[2];
> > +
> > +   for (depth = 0; bp && depth < max_depth; depth++) {
> > +   return_addrs[depth] = return_addr;
> > +   if (return_addrs[depth] == 0)
> > +   break;
> > +   if (return_addrs[depth] == exception_stack_marker) {
> > +   struct pt_regs *regs;
> > +
> > +   regs = (void *)bp + STACK_FRAME_OVERHEAD;
> > +   bp = (unsigned long *)bp[0];
> > +   /* Represent interrupt frame with vector number */
> > +   return_addr = (void *)regs->trap;
> > +   if (depth + 1 < max_depth) {
> > +   depth++;
> > +   return_addrs[depth] = return_addr;
> > +   return_addr = (void *)regs->nip;
> > +   }
> > +   } else {
> > +   bp = (unsigned long *)bp[0];
> > +   return_addr = (void *)bp[2];
> > +   }
> > +   }
> > +
> > +   walking = 0;
> > +   return depth;
> > +}
> > +
> > +int backtrace(const void **return_addrs, int max_depth)
> > +{
> > +   return backtrace_frame(__builtin_frame_address(0), return_addrs,
> > +  max_depth);
> > +}
>
> I'm about to post a series which has a couple treewide tracing changes
> in them. Depending on which series goes first the other will need to
> accommodate.

Yeah that's fine.

Thanks,
Nick


Re: [kvm-unit-tests PATCH 04/13] treewide: lib/stack: Make base_address arch specific

2024-02-28 Thread Nicholas Piggin
On Thu Feb 29, 2024 at 1:04 AM AEST, Andrew Jones wrote:
> Calculating the offset of an address is image specific, which is
> architecture specific. Until now, all architectures and architecture
> configurations which select CONFIG_RELOC were able to subtract
> _etext, but the EFI configuration of riscv cannot (it must subtract
> ImageBase). Make this function architecture specific, since the
> architecture's image layout already is.

arch_base_address()?

How about a default implementation unless HAVE_ARCH_BASE_ADDRESS?
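
To illustrate, here is a rough sketch of a default that an architecture
could override. HAVE_ARCH_BASE_ADDRESS and arch_base_address() are only the
names proposed above, and fake_text is a stand-in for the linker-provided
_text/_etext range so the snippet builds on its own; none of this is taken
from the tree.

```c
#include <assert.h>
#include <stdbool.h>

#ifdef HAVE_ARCH_BASE_ADDRESS
/* Hypothetical: the architecture supplies its own version elsewhere. */
bool arch_base_address(const void *rebased_addr, unsigned long *addr);
#else
/* Stand-in for the image text section (_text.._etext in the real build). */
static char fake_text[64];

/* Default: offset of the address from the start of the text section. */
bool arch_base_address(const void *rebased_addr, unsigned long *addr)
{
	unsigned long ra = (unsigned long)rebased_addr;
	unsigned long start = (unsigned long)fake_text;
	unsigned long end = (unsigned long)(fake_text + sizeof(fake_text));

	assert(addr);

	/* Addresses outside the text section cannot be rebased. */
	if (ra < start || ra >= end)
		return false;

	*addr = ra - start;
	return true;
}
#endif
```

An architecture such as riscv with EFI would then define
HAVE_ARCH_BASE_ADDRESS and supply its own version that subtracts ImageBase.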

Thanks,
Nick

>
> Signed-off-by: Andrew Jones 
> ---
>  lib/arm64/stack.c | 17 +
>  lib/riscv/stack.c | 18 ++
>  lib/stack.c   | 19 ++-
>  lib/stack.h   |  2 ++
>  lib/x86/stack.c   | 17 +
>  5 files changed, 56 insertions(+), 17 deletions(-)
>
> diff --git a/lib/arm64/stack.c b/lib/arm64/stack.c
> index f5eb57fd8892..3369031a74f7 100644
> --- a/lib/arm64/stack.c
> +++ b/lib/arm64/stack.c
> @@ -6,6 +6,23 @@
>  #include 
>  #include 
>  
> +#ifdef CONFIG_RELOC
> +extern char _text, _etext;
> +
> +bool base_address(const void *rebased_addr, unsigned long *addr)
> +{
> + unsigned long ra = (unsigned long)rebased_addr;
> + unsigned long start = (unsigned long)&_text;
> + unsigned long end = (unsigned long)&_etext;
> +
> + if (ra < start || ra >= end)
> + return false;
> +
> + *addr = ra - start;
> + return true;
> +}
> +#endif
> +
>  extern char vector_stub_start, vector_stub_end;
>  
>  int arch_backtrace_frame(const void *frame, const void **return_addrs,
> diff --git a/lib/riscv/stack.c b/lib/riscv/stack.c
> index d865594b9671..a143c22a570a 100644
> --- a/lib/riscv/stack.c
> +++ b/lib/riscv/stack.c
> @@ -2,6 +2,24 @@
>  #include 
>  #include 
>  
> +#ifdef CONFIG_RELOC
> +extern char ImageBase, _text, _etext;
> +
> +bool base_address(const void *rebased_addr, unsigned long *addr)
> +{
> + unsigned long ra = (unsigned long)rebased_addr;
> + unsigned long base = (unsigned long)&ImageBase;
> + unsigned long start = (unsigned long)&_text;
> + unsigned long end = (unsigned long)&_etext;
> +
> + if (ra < start || ra >= end)
> + return false;
> +
> + *addr = ra - base;
> + return true;
> +}
> +#endif
> +
>  int arch_backtrace_frame(const void *frame, const void **return_addrs,
>int max_depth, bool current_frame)
>  {
> diff --git a/lib/stack.c b/lib/stack.c
> index dd6bfa8dac6e..e5099e207388 100644
> --- a/lib/stack.c
> +++ b/lib/stack.c
> @@ -11,23 +11,8 @@
>  
>  #define MAX_DEPTH 20
>  
> -#ifdef CONFIG_RELOC
> -extern char _text, _etext;
> -
> -static bool base_address(const void *rebased_addr, unsigned long *addr)
> -{
> - unsigned long ra = (unsigned long)rebased_addr;
> - unsigned long start = (unsigned long)&_text;
> - unsigned long end = (unsigned long)&_etext;
> -
> - if (ra < start || ra >= end)
> - return false;
> -
> - *addr = ra - start;
> - return true;
> -}
> -#else
> -static bool base_address(const void *rebased_addr, unsigned long *addr)
> +#ifndef CONFIG_RELOC
> +bool base_address(const void *rebased_addr, unsigned long *addr)
>  {
>   *addr = (unsigned long)rebased_addr;
>   return true;
> diff --git a/lib/stack.h b/lib/stack.h
> index 6edc84344b51..f8def4ad4d49 100644
> --- a/lib/stack.h
> +++ b/lib/stack.h
> @@ -10,6 +10,8 @@
>  #include 
>  #include 
>  
> +bool base_address(const void *rebased_addr, unsigned long *addr);
> +
>  #ifdef HAVE_ARCH_BACKTRACE_FRAME
>  extern int arch_backtrace_frame(const void *frame, const void **return_addrs,
>   int max_depth, bool current_frame);
> diff --git a/lib/x86/stack.c b/lib/x86/stack.c
> index 58ab6c4b293a..7ba73becbd69 100644
> --- a/lib/x86/stack.c
> +++ b/lib/x86/stack.c
> @@ -1,6 +1,23 @@
>  #include 
>  #include 
>  
> +#ifdef CONFIG_RELOC
> +extern char _text, _etext;
> +
> +bool base_address(const void *rebased_addr, unsigned long *addr)
> +{
> + unsigned long ra = (unsigned long)rebased_addr;
> + unsigned long start = (unsigned long)&_text;
> + unsigned long end = (unsigned long)&_etext;
> +
> + if (ra < start || ra >= end)
> + return false;
> +
> + *addr = ra - start;
> + return true;
> +}
> +#endif
> +
>  int arch_backtrace_frame(const void *frame, const void **return_addrs,
>int max_depth, bool current_frame)
>  {



Re: [RFC] sched/eevdf: sched feature to dismiss lag on wakeup

2024-02-28 Thread K Prateek Nayak
(+ Xuewen Yan, Ke Wang)

Hello Tobias,

On 2/28/2024 9:40 PM, Tobias Huschle wrote:
> The previously used CFS scheduler gave tasks that were woken up an
> enhanced chance to see runtime immediately by deducting a certain value
> from its vruntime on runqueue placement during wakeup.
> 
> This property was used by some, at least vhost, to ensure that certain
> kworkers are scheduled immediately after being woken up. The EEVDF
> scheduler does not support this so far. Instead, if such a woken up
> entity carries a negative lag from its previous execution, it will have
> to wait for the current time slice to finish, which negatively affects
> the performance of the process expecting the immediate execution.
> 
> To address this issue, implement EEVDF strategy #2 for rejoining
> entities, which dismisses the lag from previous execution and allows
> the woken up task to run immediately (if no other entities are deemed
> to be preferred for scheduling by EEVDF).
> 
> The vruntime is decremented by an additional value of 1 to make sure
> that the woken up task actually gets to run. This is of course not
> following strategy #2 in an exact manner but guarantees the expected
> behavior for the scenario described above. Without the additional
> decrement, the performance goes south even more. So there are some
> side effects I could not get my head around yet.
> 
> Questions:
> 1. The kworker getting its negative lag occurs in the following scenario
>- kworker and a cgroup are supposed to execute on the same CPU
>- one task within the cgroup is executing and wakes up the kworker
>- kworker with 0 lag, gets picked immediately and finishes its
>  execution within ~5000ns
>- on dequeue, kworker gets assigned a negative lag
>Is this expected behavior? With this short execution time, I would
>expect the kworker to be fine.
>For a more detailed discussion on this symptom, please see:
>https://lore.kernel.org/netdev/ZWbapeL34Z8AMR5f@DESKTOP-2CCOB1S./T/

Does the lag clamping patch from Xuewen Yan [1] work for the vhost case
mentioned in the thread? Instead of placing the task just behind the
0-lag point, clamping the lag seems to be a more principled approach since
EEVDF already does it in update_entity_lag().

If the lag is still too large, maybe the above coupled with Peter's
delayed dequeue patch [2] can help. (Note: the tree is prone to force
updates.)

[1] https://lore.kernel.org/lkml/20240130080643.1828-1-xuewen@unisoc.com/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/eevdf=e62ef63a888c97188a977daddb72b61548da8417
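
As a rough illustration of the clamping idea (not the code from [1]): clamp
the carried lag to a bound before placing the entity, instead of zeroing it.
clamp_lag(), place_vruntime() and the limit parameter are hypothetical names
for this sketch, and load weighting is left out entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clamp a signed lag to the range [-limit, +limit]. */
int64_t clamp_lag(int64_t lag, int64_t limit)
{
	assert(limit >= 0);

	if (lag > limit)
		return limit;
	if (lag < -limit)
		return -limit;
	return lag;
}

/*
 * Hypothetical placement step: the entity is placed at the 0-lag point
 * minus its (clamped) lag, so a negative lag moves it later in virtual
 * time, but never by more than limit.
 */
uint64_t place_vruntime(uint64_t avg_vruntime, int64_t vlag, int64_t limit)
{
	return avg_vruntime - (uint64_t)clamp_lag(vlag, limit);
}
```

With a bound like this, a kworker that picked up a large negative lag is
still placed behind the 0-lag point, but only by a limited distance.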

> 2. The proposed code change of course only addresses the symptom. Am I
>    assuming correctly that this is in general the expected behavior and
>that the task waking up the kworker should rather do an explicit
>reschedule of itself to grant the kworker time to execute?
>In the vhost case, this is currently attempted through a cond_resched
>which is not doing anything because the need_resched flag is not set.
> 
> Feedback and opinions would be highly appreciated.
> 
> Signed-off-by: Tobias Huschle 
> ---
>  kernel/sched/fair.c | 5 +
>  kernel/sched/features.h | 1 +
>  2 files changed, 6 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 533547e3c90a..c20ae6d62961 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5239,6 +5239,11 @@ place_entity(struct cfs_rq *cfs_rq, struct 
> sched_entity *se, int flags)
>   lag = div_s64(lag, load);
>   }
>  
> + if (sched_feat(NOLAG_WAKEUP) && (flags & ENQUEUE_WAKEUP)) {
> + se->vlag = 0;
> + lag = 1;
> + }
> +
>   se->vruntime = vruntime - lag;
>  
>   /*
> diff --git a/kernel/sched/features.h b/kernel/sched/features.h
> index 143f55df890b..d3118e7568b4 100644
> --- a/kernel/sched/features.h
> +++ b/kernel/sched/features.h
> @@ -7,6 +7,7 @@
>  SCHED_FEAT(PLACE_LAG, true)
>  SCHED_FEAT(PLACE_DEADLINE_INITIAL, true)
>  SCHED_FEAT(RUN_TO_PARITY, true)
> +SCHED_FEAT(NOLAG_WAKEUP, true)
>  
>  /*
>   * Prefer to schedule the task we woke last (assuming it failed

--
Thanks and Regards,
Prateek


Re: [kvm-unit-tests PATCH 03/13] treewide: lib/stack: Fix backtrace

2024-02-28 Thread Nicholas Piggin
On Thu Feb 29, 2024 at 1:04 AM AEST, Andrew Jones wrote:
> We should never pass the result of __builtin_frame_address(0) to
> another function since the compiler is within its rights to pop the
> frame to which it points before making the function call, as may be
> done for tail calls. Nobody has complained about backtrace(), so
> likely all compilations have been inlining backtrace_frame(), not
> dropping the frame on the tail call, or nobody is looking at traces.
> However, for riscv, when built for EFI, it does drop the frame on the
> tail call, and it was noticed. Preemptively fix backtrace() for all
> architectures.
>
> Fixes: 52266791750d ("lib: backtrace printing")
> Signed-off-by: Andrew Jones 
> ---
>  lib/arm/stack.c   | 13 +
>  lib/arm64/stack.c | 12 +---
>  lib/riscv/stack.c | 12 +---
>  lib/s390x/stack.c | 12 +---
>  lib/stack.h   | 24 +---
>  lib/x86/stack.c   | 12 +---
>  6 files changed, 42 insertions(+), 43 deletions(-)
>
> diff --git a/lib/arm/stack.c b/lib/arm/stack.c
> index 7d081be7c6d0..66d18b47ea53 100644
> --- a/lib/arm/stack.c
> +++ b/lib/arm/stack.c
> @@ -8,13 +8,16 @@
>  #include 
>  #include 
>  
> -int backtrace_frame(const void *frame, const void **return_addrs,
> - int max_depth)
> +int arch_backtrace_frame(const void *frame, const void **return_addrs,
> +  int max_depth, bool current_frame)
>  {
>   static int walking;
>   int depth;
>   const unsigned long *fp = (unsigned long *)frame;
>  
> + if (current_frame)
> + fp = __builtin_frame_address(0);
> +
>   if (walking) {
>   printf("RECURSIVE STACK WALK!!!\n");
>   return 0;
> @@ -33,9 +36,3 @@ int backtrace_frame(const void *frame, const void 
> **return_addrs,
>   walking = 0;
>   return depth;
>  }
> -
> -int backtrace(const void **return_addrs, int max_depth)
> -{
> - return backtrace_frame(__builtin_frame_address(0),
> -return_addrs, max_depth);
> -}
> diff --git a/lib/arm64/stack.c b/lib/arm64/stack.c
> index 82611f4b1815..f5eb57fd8892 100644
> --- a/lib/arm64/stack.c
> +++ b/lib/arm64/stack.c
> @@ -8,7 +8,8 @@
>  
>  extern char vector_stub_start, vector_stub_end;
>  
> -int backtrace_frame(const void *frame, const void **return_addrs, int 
> max_depth)
> +int arch_backtrace_frame(const void *frame, const void **return_addrs,
> +  int max_depth, bool current_frame)
>  {
>   const void *fp = frame;
>   static bool walking;
> @@ -17,6 +18,9 @@ int backtrace_frame(const void *frame, const void 
> **return_addrs, int max_depth)
>   bool is_exception = false;
>   unsigned long addr;
>  
> + if (current_frame)
> + fp = __builtin_frame_address(0);
> +
>   if (walking) {
>   printf("RECURSIVE STACK WALK!!!\n");
>   return 0;
> @@ -54,9 +58,3 @@ int backtrace_frame(const void *frame, const void 
> **return_addrs, int max_depth)
>   walking = false;
>   return depth;
>  }
> -
> -int backtrace(const void **return_addrs, int max_depth)
> -{
> - return backtrace_frame(__builtin_frame_address(0),
> -return_addrs, max_depth);
> -}
> diff --git a/lib/riscv/stack.c b/lib/riscv/stack.c
> index 712a5478d547..d865594b9671 100644
> --- a/lib/riscv/stack.c
> +++ b/lib/riscv/stack.c
> @@ -2,12 +2,16 @@
>  #include 
>  #include 
>  
> -int backtrace_frame(const void *frame, const void **return_addrs, int 
> max_depth)
> +int arch_backtrace_frame(const void *frame, const void **return_addrs,
> +  int max_depth, bool current_frame)
>  {
>   static bool walking;
>   const unsigned long *fp = (unsigned long *)frame;
>   int depth;
>  
> + if (current_frame)
> + fp = __builtin_frame_address(0);
> +
>   if (walking) {
>   printf("RECURSIVE STACK WALK!!!\n");
>   return 0;
> @@ -24,9 +28,3 @@ int backtrace_frame(const void *frame, const void 
> **return_addrs, int max_depth)
>   walking = false;
>   return depth;
>  }
> -
> -int backtrace(const void **return_addrs, int max_depth)
> -{
> - return backtrace_frame(__builtin_frame_address(0),
> -return_addrs, max_depth);
> -}
> diff --git a/lib/s390x/stack.c b/lib/s390x/stack.c
> index 9f234a12adf6..d194f654e94d 100644
> --- a/lib/s390x/stack.c
> +++ b/lib/s390x/stack.c
> @@ -14,11 +14,15 @@
>  #include 
>  #include 
>  
> -int backtrace_frame(const void *frame, const void **return_addrs, int 
> max_depth)
> +int arch_backtrace_frame(const void *frame, const void **return_addrs,
> +  int max_depth, bool current_frame)
>  {
>   int depth = 0;
>   struct stack_frame *stack = (struct stack_frame *)frame;
>  
> + if (current_frame)
> + stack = __builtin_frame_address(0);
> +
>   for (depth = 0; stack && depth < max_depth; 

Re: [netdev] Build failure on powerpc

2024-02-28 Thread Michael Ellerman
Tasmiya Nalatwad  writes:
> Greetings,
>
> [netdev] Build failure on powerpc
> The latest netdev (6.8.0-rc5-auto-g1ce7d306ea63) fails to build on powerpc;
> traces below.

Please include the defconfig you're building, and the toolchain
versions, in reports like this.

I wasn't able to reproduce this failure here.

cheers

> --- Traces---
>
> /include/linux/dpll.h: In function ‘netdev_dpll_pin’:
> /include/linux/rcupdate.h:439:9: error: dereferencing pointer to 
> incomplete type ‘struct dpll_pin’
>    typeof(*p) *local = (typeof(*p) *__force)READ_ONCE(p); \
>   ^
> /include/linux/rcupdate.h:587:2: note: in expansion of macro 
> ‘__rcu_dereference_check’
>    __rcu_dereference_check((p), __UNIQUE_ID(rcu), \
>    ^~~
> /include/linux/rtnetlink.h:70:2: note: in expansion of macro 
> ‘rcu_dereference_check’
>    rcu_dereference_check(p, lockdep_rtnl_is_held())
>    ^
> /include/linux/dpll.h:175:9: note: in expansion of macro 
> ‘rcu_dereference_rtnl’
>    return rcu_dereference_rtnl(dev->dpll_pin);
>   ^~~~
> make[4]: *** [scripts/Makefile.build:243: drivers/dpll/dpll_core.o] Error 1
> make[4]: *** Waiting for unfinished jobs
>    AR  net/mpls/built-in.a
>    AR  net/l3mdev/built-in.a
> In file included from ./include/linux/rbtree.h:24,
>   from ./include/linux/mm_types.h:11,
>   from ./include/linux/mmzone.h:22,
>   from ./include/linux/gfp.h:7,
>   from ./include/linux/umh.h:4,
>   from ./include/linux/kmod.h:9,
>   from ./include/linux/module.h:17,
>   from drivers/dpll/dpll_netlink.c:9:
> /include/linux/dpll.h: In function ‘netdev_dpll_pin’:
> /include/linux/rcupdate.h:439:9: error: dereferencing pointer to 
> incomplete type ‘struct dpll_pin’
>    typeof(*p) *local = (typeof(*p) *__force)READ_ONCE(p); \
>   ^
> /include/linux/rcupdate.h:587:2: note: in expansion of macro 
> ‘__rcu_dereference_check’
>    __rcu_dereference_check((p), __UNIQUE_ID(rcu), \
>    ^~~
> /include/linux/rtnetlink.h:70:2: note: in expansion of macro 
> ‘rcu_dereference_check’
>    rcu_dereference_check(p, lockdep_rtnl_is_held())
>    ^
> /include/linux/dpll.h:175:9: note: in expansion of macro 
> ‘rcu_dereference_rtnl’
>    return rcu_dereference_rtnl(dev->dpll_pin);
>   ^~~~
> make[4]: *** [scripts/Makefile.build:243: drivers/dpll/dpll_netlink.o] 
> Error 1
> make[3]: *** [scripts/Makefile.build:481: drivers/dpll] Error 2
> make[3]: *** Waiting for unfinished jobs
> In file included from ./arch/powerpc/include/generated/asm/rwonce.h:1,
>   from ./include/linux/compiler.h:251,
>   from ./include/linux/instrumented.h:10,
>   from ./include/linux/uaccess.h:6,
>   from net/core/dev.c:71:
> net/core/dev.c: In function ‘netdev_dpll_pin_assign’:
> /include/linux/rcupdate.h:462:36: error: dereferencing pointer to 
> incomplete type ‘struct dpll_pin’
>   #define RCU_INITIALIZER(v) (typeof(*(v)) __force __rcu *)(v)
>      ^~~~
> /include/asm-generic/rwonce.h:55:33: note: in definition of macro 
> ‘__WRITE_ONCE’
>    *(volatile typeof(x) *)&(x) = (val);    \
>   ^~~
> /arch/powerpc/include/asm/barrier.h:76:2: note: in expansion of macro 
> ‘WRITE_ONCE’
>    WRITE_ONCE(*p, v);  \
>    ^~
> /include/asm-generic/barrier.h:172:55: note: in expansion of macro 
> ‘__smp_store_release’
>   #define smp_store_release(p, v) do { kcsan_release(); 
> __smp_store_release(p, v); } while (0)
> ^~~
> /include/linux/rcupdate.h:503:3: note: in expansion of macro 
> ‘smp_store_release’
>     smp_store_release(, RCU_INITIALIZER((typeof(p))_r_a_p__v)); \
>     ^
> /include/linux/rcupdate.h:503:25: note: in expansion of macro 
> ‘RCU_INITIALIZER’
>     smp_store_release(, RCU_INITIALIZER((typeof(p))_r_a_p__v)); \
>   ^~~
> net/core/dev.c:9081:2: note: in expansion of macro ‘rcu_assign_pointer’
>    rcu_assign_pointer(dev->dpll_pin, dpll_pin);
>    ^~
> make[4]: *** [scripts/Makefile.build:243: net/core/dev.o] Error 1
> make[4]: *** Waiting for unfinished jobs
>    AR  drivers/net/ethernet/built-in.a
>    AR  drivers/net/built-in.a
>    AR  net/dcb/built-in.a
>    AR  net/netlabel/built-in.a
>    AR  net/strparser/built-in.a
>    AR  net/handshake/built-in.a
>    GEN lib/test_fortify.log
>    AR  net/8021q/built-in.a
>    AR  net/nsh/built-in.a
>    AR  net/unix/built-in.a
>    CC  lib/string.o
>    AR  net/packet/built-in.a
>    AR  net/switchdev/built-in.a
>    AR  lib/lib.a
>    AR  net/mptcp/built-in.a
>    AR  net/devlink/built-in.a
> In file included from ./include/linux/rbtree.h:24,
>   from 

[powerpc:merge] BUILD SUCCESS 91ff6ca1e126332196bb331387d97b0a02aef93f

2024-02-28 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
merge
branch HEAD: 91ff6ca1e126332196bb331387d97b0a02aef93f  Automatic merge of 
'next' into merge (2024-02-26 17:03)

elapsed time: 773m

configs tested: 163
configs skipped: 3

The following configs have been built successfully.
More configs may be tested in the coming days.

tested configs:
alpha       allnoconfig                         gcc
alpha       allyesconfig                        gcc
alpha       defconfig                           gcc
arc         allmodconfig                        gcc
arc         allnoconfig                         gcc
arc         allyesconfig                        gcc
arc         axs103_smp_defconfig                gcc
arc         defconfig                           gcc
arc         haps_hs_smp_defconfig               gcc
arc         nsimosci_defconfig                  gcc
arc         randconfig-001-20240229             gcc
arc         randconfig-002-20240229             gcc
arm         allmodconfig                        gcc
arm         allnoconfig                         clang
arm         allyesconfig                        gcc
arm         defconfig                           clang
arm         dove_defconfig                      gcc
arm         pxa_defconfig                       gcc
arm         randconfig-001-20240229             gcc
arm         randconfig-002-20240229             gcc
arm         randconfig-004-20240229             gcc
arm64       allmodconfig                        clang
arm64       allnoconfig                         gcc
arm64       defconfig                           gcc
arm64       randconfig-002-20240229             gcc
csky        allmodconfig                        gcc
csky        allnoconfig                         gcc
csky        allyesconfig                        gcc
csky        defconfig                           gcc
csky        randconfig-001-20240229             gcc
csky        randconfig-002-20240229             gcc
hexagon     allmodconfig                        clang
hexagon     allnoconfig                         clang
hexagon     allyesconfig                        clang
hexagon     defconfig                           clang
i386        allmodconfig                        gcc
i386        allnoconfig                         gcc
i386        allyesconfig                        gcc
i386        buildonly-randconfig-001-20240229   clang
i386        buildonly-randconfig-002-20240229   gcc
i386        buildonly-randconfig-003-20240229   gcc
i386        buildonly-randconfig-004-20240229   clang
i386        buildonly-randconfig-005-20240229   gcc
i386        buildonly-randconfig-006-20240229   gcc
i386        defconfig                           clang
i386        randconfig-001-20240229             clang
i386        randconfig-002-20240229             gcc
i386        randconfig-003-20240229             clang
i386        randconfig-004-20240229             gcc
i386        randconfig-005-20240229             gcc
i386        randconfig-006-20240229             gcc
i386        randconfig-011-20240229             gcc
i386        randconfig-012-20240229             gcc
i386        randconfig-013-20240229             clang
i386        randconfig-014-20240229             clang
i386        randconfig-015-20240229             clang
i386        randconfig-016-20240229             clang
loongarch   allmodconfig                        gcc
loongarch   allnoconfig                         gcc
loongarch   allyesconfig                        gcc
loongarch   defconfig                           gcc
loongarch   randconfig-001-20240229             gcc
loongarch   randconfig-002-20240229             gcc
m68k        allmodconfig                        gcc
m68k        allnoconfig                         gcc
m68k        allyesconfig                        gcc
m68k        defconfig                           gcc
m68k        virt_defconfig                      gcc
microblaze  allmodconfig                        gcc
microblaze  allnoconfig                         gcc
microblaze  allyesconfig                        gcc
microblaze  defconfig                           gcc
microblaze  mmu_defconfig                       gcc
mips        allmodconfig                        gcc
mips        allnoconfig                         gcc
mips        allyesconfig                        gcc
mips        cavium_octeon_defconfig             gcc
mips        malta_qemu_32r6_defconfig           gcc
mips        rbtx49xx_defconfig                  gcc
mips        rs90_defconfig                      gcc
nios2       alldefconfig                        gcc
nios2       allmodconfig                        gcc
nios2       allnoconfig                         gcc
nios2       allyesconfig                        gcc
nios2       defconfig                           gcc
nios2 

[powerpc:next-test] BUILD SUCCESS cb615bbe55268cc25a181e903862423670f97408

2024-02-28 Thread kernel test robot
tree/branch: https://github.com/linuxppc/linux next-test
branch HEAD: cb615bbe55268cc25a181e903862423670f97408  powerpc/32: fix ADB_CUDA 
kconfig warning

elapsed time: 725m

configs tested: 182
configs skipped: 3

The following configs have been built successfully.
More configs may be tested in the coming days.

tested configs:
alpha       allnoconfig                         gcc
alpha       allyesconfig                        gcc
alpha       defconfig                           gcc
arc         allmodconfig                        gcc
arc         allnoconfig                         gcc
arc         allyesconfig                        gcc
arc         axs103_smp_defconfig                gcc
arc         defconfig                           gcc
arc         nsimosci_defconfig                  gcc
arc         randconfig-001-20240229             gcc
arc         randconfig-002-20240229             gcc
arm         allmodconfig                        gcc
arm         allnoconfig                         clang
arm         allyesconfig                        gcc
arm         defconfig                           clang
arm         dove_defconfig                      gcc
arm         randconfig-001-20240229             gcc
arm         randconfig-002-20240229             gcc
arm         randconfig-003-20240229             clang
arm         randconfig-004-20240229             gcc
arm64       allmodconfig                        clang
arm64       allnoconfig                         gcc
arm64       defconfig                           gcc
arm64       randconfig-001-20240229             clang
arm64       randconfig-002-20240229             gcc
arm64       randconfig-003-20240229             clang
arm64       randconfig-004-20240229             clang
csky        allmodconfig                        gcc
csky        allnoconfig                         gcc
csky        allyesconfig                        gcc
csky        defconfig                           gcc
csky        randconfig-001-20240229             gcc
csky        randconfig-002-20240229             gcc
hexagon     allmodconfig                        clang
hexagon     allnoconfig                         clang
hexagon     allyesconfig                        clang
hexagon     defconfig                           clang
hexagon     randconfig-001-20240229             clang
hexagon     randconfig-002-20240229             clang
i386        allmodconfig                        gcc
i386        allnoconfig                         gcc
i386        allyesconfig                        gcc
i386        buildonly-randconfig-001-20240228   clang
i386        buildonly-randconfig-001-20240229   clang
i386        buildonly-randconfig-002-20240228   clang
i386        buildonly-randconfig-003-20240228   clang
i386        buildonly-randconfig-004-20240228   clang
i386        buildonly-randconfig-004-20240229   clang
i386        buildonly-randconfig-005-20240228   gcc
i386        buildonly-randconfig-006-20240228   clang
i386        defconfig                           clang
i386        randconfig-001-20240228             clang
i386        randconfig-001-20240229             clang
i386        randconfig-002-20240228             clang
i386        randconfig-003-20240228             gcc
i386        randconfig-003-20240229             clang
i386        randconfig-004-20240228             clang
i386        randconfig-005-20240228             clang
i386        randconfig-006-20240228             gcc
i386        randconfig-011-20240228             clang
i386        randconfig-012-20240228             clang
i386        randconfig-013-20240228             gcc
i386        randconfig-013-20240229             clang
i386        randconfig-014-20240228             gcc
i386        randconfig-014-20240229             clang
i386        randconfig-015-20240228             gcc
i386        randconfig-015-20240229             clang
i386        randconfig-016-20240228             clang
i386        randconfig-016-20240229             clang
loongarch   allmodconfig                        gcc
loongarch   allnoconfig                         gcc
loongarch   defconfig                           gcc
loongarch   randconfig-001-20240229             gcc
loongarch   randconfig-002-20240229             gcc
m68k        allmodconfig                        gcc
m68k        allnoconfig                         gcc
m68k        allyesconfig                        gcc
m68k        defconfig                           gcc
m68k        virt_defconfig                      gcc
microblaze  allmodconfig                        gcc
microblaze  allnoconfig                         gcc
microblaze  allyesconfig                        gcc
microblaze  defconfig                           gcc
mips        allnoconfig                         gcc
mips        allyesconfig                        gcc
mips        cavium_octeon_defconfig             gcc
mips

Re: KASAN debug kernel fails to boot at early stage when CONFIG_SMP=y is set (kernel 6.5-rc5, PowerMac G4 3,6)

2024-02-28 Thread Erhard Furtner
On Thu, 14 Sep 2023 04:54:17 +
Christophe Leroy  wrote:

> Le 12/09/2023 à 19:39, Christophe Leroy a écrit :
> > 
> > 
> > Le 12/09/2023 à 17:59, Erhard Furtner a écrit :  
> >>
> >> printk: bootconsole [udbg0] enabled
> >> Total memory = 2048MB; using 4096kB for hash table
> >> mapin_ram:125
> >> mmu_mapin_ram:169 0 3000 140 200
> >> __mmu_mapin_ram:146 0 140
> >> __mmu_mapin_ram:155 140
> >> __mmu_mapin_ram:146 140 3000
> >> __mmu_mapin_ram:155 2000
> >> __mapin_ram_chunk:107 2000 3000
> >> __mapin_ram_chunk:117
> >> mapin_ram:134
> >> kasan_mmu_init:129
> >> kasan_mmu_init:132 0
> >> kasan_mmu_init:137
> >> ioremap() called early from btext_map+0x64/0xdc. Use early_ioremap() 
> >> instead
> >> Linux version 6.6.0-rc1-PMacG4-dirty (root@T1000) (gcc (Gentoo 
> >> 12.3.1_p20230526 p2) 12.3.1 20230526, GNU ld (Gentoo 2.40 p7) 2.40.0) #5 
> >> SMP Tue Sep 12 16:50:47 CEST 2023
> >> kasan_init_region: c000 3000 f800 fe00
> >> kasan_init_region: loop f800 fe00
> >>
> >>
> >> So I get no "kasan_init_region: setbat" line and don't reach "KASAN init 
> >> done".  
> > 
> > Ah ok, maybe your CPU only has 4 BATs and they are all used; the
> > following change would tell us.
> > 
> > diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
> > index 850783cfa9c7..bd26767edce7 100644
> > --- a/arch/powerpc/mm/book3s32/mmu.c
> > +++ b/arch/powerpc/mm/book3s32/mmu.c
> > @@ -86,6 +86,7 @@ int __init find_free_bat(void)
> > if (!(bat[1].batu & 3))
> > return b;
> > }
> > +   pr_err("NO FREE BAT (%d)\n", n);
> > return -1;
> >}
> > 
> > 
> > Or you have 8 BATs in which case it's an alignment problem, you need to
> > increase CONFIG_DATA_SHIFT to 23, for that you need CONFIG_ADVANCED and
> > CONFIG_DATA_SHIFT_BOOL
> > 
> > But regardless of that there is a problem we need to find out, because
> > it should work without BATs.
> > 
> > As the BATs allocation fails, it falls back to :
> > 
> > phys = memblock_phys_alloc_range(k_end - k_start, PAGE_SIZE, 0,
> >  MEMBLOCK_ALLOC_ANYWHERE);
> > if (!phys)
> > return -ENOMEM;
> > }
> > 
> > ret = kasan_init_shadow_page_tables(k_start, k_end);
> > if (ret)
> > return ret;
> > 
> > for (k_cur = k_start; k_cur < k_end; k_cur += PAGE_SIZE) {
> > pmd_t *pmd = pmd_off_k(k_cur);
> > pte_t pte = pfn_pte(PHYS_PFN(phys + k_cur - k_start), 
> > PAGE_KERNEL);
> > 
> > __set_pte_at(_mm, k_cur, pte_offset_kernel(pmd, k_cur), 
> > pte, 0);
> > }
> > flush_tlb_kernel_range(k_start, k_end);
> > memset(kasan_mem_to_shadow(start), 0, k_end - k_start);
> > 
> > 
> > While the __weak function that you confirmed working is:
> > 
> > ret = kasan_init_shadow_page_tables(k_start, k_end);
> > if (ret)
> > return ret;
> > 
> > block = memblock_alloc(k_end - k_start, PAGE_SIZE);
> > if (!block)
> > return -ENOMEM;
> > 
> > for (k_cur = k_start & PAGE_MASK; k_cur < k_end; k_cur += PAGE_SIZE) {
> > pmd_t *pmd = pmd_off_k(k_cur);
> > void *va = block + k_cur - k_start;
> > pte_t pte = pfn_pte(PHYS_PFN(__pa(va)), PAGE_KERNEL);
> > 
> > __set_pte_at(_mm, k_cur, pte_offset_kernel(pmd, k_cur), 
> > pte, 0);
> > }
> > flush_tlb_kernel_range(k_start, k_end);
> > 
> > 
> > I'm having a hard time understanding what could be wrong in the first place.
> > 
> > Could you try the following change:
> > 
> > diff --git a/arch/powerpc/mm/kasan/book3s_32.c
> > b/arch/powerpc/mm/kasan/book3s_32.c
> > index 9954b7a3b7ae..e04f21908c6a 100644
> > --- a/arch/powerpc/mm/kasan/book3s_32.c
> > +++ b/arch/powerpc/mm/kasan/book3s_32.c
> > @@ -38,7 +38,7 @@ int __init kasan_init_region(void *start, size_t size)
> > 
> > if (k_nobat < k_end) {
> > phys = memblock_phys_alloc_range(k_end - k_nobat, PAGE_SIZE, 0,
> > -MEMBLOCK_ALLOC_ANYWHERE);
> > +MEMBLOCK_ALLOC_ACCESSIBLE);
> > if (!phys)
> > return -ENOMEM;
> > }
> > 
> > And also that one:
> > 
> > 
> > diff --git a/arch/powerpc/mm/kasan/init_32.c
> > b/arch/powerpc/mm/kasan/init_32.c
> > index a70828a6d935..bc1c075489f4 100644
> > --- a/arch/powerpc/mm/kasan/init_32.c
> > +++ b/arch/powerpc/mm/kasan/init_32.c
> > @@ -84,6 +84,9 @@ kasan_update_early_region(unsigned long k_start,
> > unsigned long k_end, pte_t pte)
> >{
> > unsigned long k_cur;
> > 
> > +   if (k_start == k_end)
> > +   return;
> > +
> > for (k_cur = k_start; k_cur != k_end; k_cur += PAGE_SIZE) {
> > pmd_t *pmd = pmd_off_k(k_cur);
> > pte_t *ptep = pte_offset_kernel(pmd, k_cur);
> > 
> > 
> >   
> 
> I tested the two vmlinux you sent me offlist, they both start 

linux-next: manual merge of the powerpc tree with the mm-stable tree

2024-02-28 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the powerpc tree got a conflict in:

  arch/powerpc/mm/pgtable_32.c

between commit:

  a5e8131a0329 ("arm64, powerpc, riscv, s390, x86: ptdump: refactor 
CONFIG_DEBUG_WX")

from the mm-stable tree and commit:

  8f17bd2f4196 ("powerpc: Handle error in mark_rodata_ro() and 
mark_initmem_nx()")

from the powerpc tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/powerpc/mm/pgtable_32.c
index 12498017da8e,4be97b4a44f9..
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@@ -164,21 -174,17 +174,14 @@@ static int __mark_rodata_ro(void
numpages = PFN_UP((unsigned long)__end_rodata) -
   PFN_DOWN((unsigned long)_stext);
  
-   set_memory_ro((unsigned long)_stext, numpages);
+   return set_memory_ro((unsigned long)_stext, numpages);
+ }
+ 
+ void mark_rodata_ro(void)
+ {
+   int err = __mark_rodata_ro();
+ 
+   if (err)
+   panic("%s() failed, err = %d\n", __func__, err);
 -
 -  // mark_initmem_nx() should have already run by now
 -  ptdump_check_wx();
  }
  #endif
- 
- #if defined(CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC) && 
defined(CONFIG_DEBUG_PAGEALLOC)
- void __kernel_map_pages(struct page *page, int numpages, int enable)
- {
-   unsigned long addr = (unsigned long)page_address(page);
- 
-   if (PageHighMem(page))
-   return;
- 
-   if (enable)
-   set_memory_p(addr, numpages);
-   else
-   set_memory_np(addr, numpages);
- }
- #endif /* CONFIG_DEBUG_PAGEALLOC */




Re: [PATCH v2 5/9] mm: Initialize struct vm_unmapped_area_info

2024-02-28 Thread Christophe Leroy


Le 28/02/2024 à 18:01, Edgecombe, Rick P a écrit :
> On Wed, 2024-02-28 at 13:22 +, Christophe Leroy wrote:
>>> Any preference? Or maybe am I missing your point and talking
>>> nonsense?
>>>
>>
>> So my preference would go to the addition of:
>>
>>  info.new_field = 0;
>>
>> But that's very minor and if you think it is easier to manage and
>> maintain by performing {} initialisation at declaration, lets go for
>> that.
> 
> Appreciate the clarification and help getting this right. I'm thinking
> Kees' and now Kirill's point about this patch resulting in unnecessary
> manual zero initialization of the structs is probably something that
> needs to be addressed.
> 
> If I created a bunch of patches to change each call site, I think the
> best is probably to do the designated field zero initialization
> way.
> 
> But I can do something for powerpc special if you want. I'll first try
> with powerpc matching the others, and if it seems objectionable, please
> let me know.
> 

My comments were generic, it was not powerpc oriented. Please keep 
powerpc as similar as possible with others.

Christophe


Re: [PATCHv11 4/4] watchdog/softlockup: report the most frequent interrupts

2024-02-28 Thread Doug Anderson
Hi,

On Tue, Feb 27, 2024 at 11:22 PM Bitao Hu  wrote:
>
> When the watchdog determines that the current soft lockup is due
> to an interrupt storm based on CPU utilization, reporting the
> most frequent interrupts could be good enough for further
> troubleshooting.
>
> Below is an example of interrupt storm. The call tree does not
> provide useful information, but we can analyze which interrupt
> caused the soft lockup by comparing the counts of interrupts.
>
> [  638.870231] watchdog: BUG: soft lockup - CPU#9 stuck for 26s! [swapper/9:0]
> [  638.870825] CPU#9 Utilization every 4s during lockup:
> [  638.871194]  #1:   0% system,  0% softirq,   100% hardirq, 0% idle
> [  638.871652]  #2:   0% system,  0% softirq,   100% hardirq, 0% idle
> [  638.872107]  #3:   0% system,  0% softirq,   100% hardirq, 0% idle
> [  638.872563]  #4:   0% system,  0% softirq,   100% hardirq, 0% idle
> [  638.873018]  #5:   0% system,  0% softirq,   100% hardirq, 0% idle
> [  638.873494] CPU#9 Detect HardIRQ Time exceeds 50%. Most frequent HardIRQs:
> [  638.873994]  #1: 330945  irq#7
> [  638.874236]  #2: 31  irq#82
> [  638.874493]  #3: 10  irq#10
> [  638.874744]  #4: 2   irq#89
> [  638.874992]  #5: 1   irq#102
> ...
> [  638.875313] Call trace:
> [  638.875315]  __do_softirq+0xa8/0x364
>
> Signed-off-by: Bitao Hu 
> Reviewed-by: Liu Song 
> ---
>  kernel/watchdog.c | 115 --
>  1 file changed, 111 insertions(+), 4 deletions(-)

Reviewed-by: Douglas Anderson 
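
For readers following along, here is a minimal userspace sketch of how a
"most frequent interrupts" table like the one in the log could be built
from per-interrupt count deltas. It is not the code from the patch: the
names NUM_TOP_IRQS, struct irq_count, tabulate_irq() and
busiest_irq_in_example() are made up for illustration, and the table size
of five is only assumed from the five entries shown in the log.

```c
#include <stddef.h>

#define NUM_TOP_IRQS 5	/* assumed table size, matching the log excerpt */

/* Hypothetical record type: one interrupt number and its count delta. */
struct irq_count {
	unsigned int irq;
	unsigned int count;
};

/*
 * Insert (irq, count) into a table kept in descending order of count.
 * Returns 1 if the entry made it into the table, 0 otherwise.
 */
static int tabulate_irq(struct irq_count top[NUM_TOP_IRQS],
			unsigned int irq, unsigned int count)
{
	size_t i, j;

	for (i = 0; i < NUM_TOP_IRQS; i++) {
		if (count > top[i].count) {
			/* Shift smaller entries down; the last one drops off. */
			for (j = NUM_TOP_IRQS - 1; j > i; j--)
				top[j] = top[j - 1];
			top[i].irq = irq;
			top[i].count = count;
			return 1;
		}
	}
	return 0;
}

/* Feed the counts from the example log and return the busiest irq. */
static unsigned int busiest_irq_in_example(void)
{
	struct irq_count top[NUM_TOP_IRQS] = { { 0, 0 } };

	tabulate_irq(top, 82, 31);
	tabulate_irq(top, 7, 330945);
	tabulate_irq(top, 10, 10);
	tabulate_irq(top, 89, 2);
	tabulate_irq(top, 102, 1);
	return top[0].irq;
}
```

With the counts from the log, irq#7 ends up in the first slot regardless
of the order in which the interrupts are tabulated.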


Re: [PATCHv11 3/4] genirq: Avoid summation loops for /proc/interrupts

2024-02-28 Thread Doug Anderson
Hi,

On Tue, Feb 27, 2024 at 11:22 PM Bitao Hu  wrote:
>
> show_interrupts() unconditionally accumulates the per CPU interrupt
> statistics to determine whether an interrupt was ever raised.
>
> This can be avoided for all interrupts which are not strictly per CPU
> and not of type NMI because those interrupts provide already an
> accumulated counter. The required logic is already implemented in
> kstat_irqs().
>
> Split the inner access logic out of kstat_irqs() and use it for
> kstat_irqs() and show_interrupts() to avoid the accumulation loop
> when possible.
>
> Originally-by: Thomas Gleixner 
> Signed-off-by: Bitao Hu 
> Reviewed-by: Liu Song 
> ---
>  kernel/irq/internals.h |  2 ++
>  kernel/irq/irqdesc.c   | 16 +++-
>  kernel/irq/proc.c  |  6 ++
>  3 files changed, 15 insertions(+), 9 deletions(-)

Reviewed-by: Douglas Anderson 
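
As a rough illustration of the idea in the commit message, the sketch
below models the two paths in plain userspace C: return the accumulated
counter when one is maintained, and only fall back to the per CPU
summation for strictly per CPU or NMI interrupts. struct irq_desc_sketch,
NR_CPUS_SKETCH, irq_total() and the two example helpers are hypothetical
stand-ins, not the kernel's data structures or the code in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 4	/* hypothetical, kept small for illustration */

/* Hypothetical stand-in for the relevant parts of an interrupt descriptor. */
struct irq_desc_sketch {
	bool per_cpu;			/* strictly per CPU interrupt */
	bool nmi;			/* delivered as NMI */
	unsigned int tot_count;		/* accumulated counter */
	unsigned int percpu_count[NR_CPUS_SKETCH];
};

static unsigned int irq_total(const struct irq_desc_sketch *desc)
{
	unsigned int sum = 0;
	size_t cpu;

	/* Fast path: an accumulated counter exists for these interrupts. */
	if (!desc->per_cpu && !desc->nmi)
		return desc->tot_count;

	/* Slow path: no accumulated counter, so sum the per CPU values. */
	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += desc->percpu_count[cpu];
	return sum;
}

/* A regular interrupt: the accumulated counter is used as-is. */
static unsigned int example_regular_irq(void)
{
	struct irq_desc_sketch d = { .tot_count = 42 };

	return irq_total(&d);
}

/* A per CPU interrupt: the loop over the per CPU counters is needed. */
static unsigned int example_per_cpu_irq(void)
{
	struct irq_desc_sketch d = {
		.per_cpu = true,
		.percpu_count = { 1, 2, 3, 4 },
	};

	return irq_total(&d);
}
```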


Re: [PATCHv11 2/4] genirq: Provide a snapshot mechanism for interrupt statistics

2024-02-28 Thread Doug Anderson
Hi,

On Tue, Feb 27, 2024 at 11:22 PM Bitao Hu  wrote:
>
> The soft lockup detector lacks a mechanism to identify interrupt storms
> as root cause of a lockup. To enable this the detector needs a
> mechanism to snapshot the interrupt count statistics on a CPU when the
> detector observes a potential lockup scenario and compare that against
> the interrupt count when it warns about the lockup later on. The number
> of interrupts in that period give a hint whether the lockup might be
> caused by an interrupt storm.
>
> Instead of having extra storage in the lockup detector and accessing
> the internals of the interrupt descriptor directly, convert the per CPU
> irq_desc::kstat_irq member to a data structure which contains the
> counter plus a snapshot member and provide interfaces to take a
> snapshot of all interrupts on the current CPU and to retrieve the delta
> of a specific interrupt later on.
>
> Originally-by: Thomas Gleixner 
> Signed-off-by: Bitao Hu 
> Reviewed-by: Liu Song 
> ---
>  arch/mips/dec/setup.c|  2 +-
>  arch/parisc/kernel/smp.c |  2 +-
>  arch/powerpc/kvm/book3s_hv_rm_xics.c |  2 +-
>  include/linux/irqdesc.h  | 14 ++--
>  include/linux/kernel_stat.h  |  3 +++
>  kernel/irq/internals.h   |  2 +-
>  kernel/irq/irqdesc.c | 34 ++--
>  kernel/irq/proc.c|  5 ++--
>  scripts/gdb/linux/interrupts.py  |  6 ++---
>  9 files changed, 51 insertions(+), 19 deletions(-)

I won't insist on it, but I continue to worry about memory
implications with large numbers of CPUs. With a 4-byte int, 8192 max
CPUs, and 100 IRQs the extra "ref" value takes up over 3MB of memory
(8192 * 4 bytes * 100).

Technically, you could add a new symbol like "config
NEED_IRQ_SNAPSHOTS". This wouldn't be a symbol selectable by the end
user but would automatically be selected by "config
SOFTLOCKUP_DETECTOR_INTR_STORM". If the config wasn't defined then the
struct wouldn't contain "ref" and the snapshot routines would just be
static inline stubs.
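
To make the suggestion concrete, a minimal userspace sketch of that
pattern could look like the following. Everything in it is an assumption
for illustration: CONFIG_NEED_IRQ_SNAPSHOTS is the hypothetical symbol
named above, and struct irqstat_sketch, irqstat_snapshot(),
irqstat_delta(), ref_overhead_bytes() and example_delta() are made-up
names, not the interfaces added by the patch.

```c
/* Hypothetical symbol; remove this define to get the stubbed-out variant. */
#define CONFIG_NEED_IRQ_SNAPSHOTS 1

#ifdef CONFIG_NEED_IRQ_SNAPSHOTS
struct irqstat_sketch {
	unsigned int cnt;	/* running interrupt count */
	unsigned int ref;	/* count at the time of the last snapshot */
};

static inline void irqstat_snapshot(struct irqstat_sketch *s)
{
	s->ref = s->cnt;
}

static inline unsigned int irqstat_delta(const struct irqstat_sketch *s)
{
	return s->cnt - s->ref;
}
#else
struct irqstat_sketch {
	unsigned int cnt;	/* no "ref" member, so no extra memory */
};

static inline void irqstat_snapshot(struct irqstat_sketch *s)
{
	(void)s;
}

static inline unsigned int irqstat_delta(const struct irqstat_sketch *s)
{
	(void)s;
	return 0;
}
#endif

/* Extra bytes taken by "ref": one value per CPU per interrupt. */
static unsigned long ref_overhead_bytes(unsigned long cpus, unsigned long irqs)
{
#ifdef CONFIG_NEED_IRQ_SNAPSHOTS
	return cpus * irqs * sizeof(unsigned int);
#else
	(void)cpus;
	(void)irqs;
	return 0;
#endif
}

/* Take a snapshot, count five more interrupts, then read the delta. */
static unsigned int example_delta(void)
{
	struct irqstat_sketch s = { .cnt = 10 };

	irqstat_snapshot(&s);
	s.cnt += 5;
	return irqstat_delta(&s);
}
```

With a 4-byte unsigned int, ref_overhead_bytes(8192, 100) gives the
8192 * 4 * 100 = 3276800 bytes mentioned above.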

Maybe Thomas has an opinion about whether this is something to worry
about. Worst case it wouldn't be hard to do in a follow-up patch.

Everything else looks good to me. Given that I'm not insisting on
adding the extra CONFIG, I'm OK w/:

Reviewed-by: Douglas Anderson 


Re: Boot failure with ppc64 port on iMacs G5

2024-02-28 Thread tuxayo

On 24-02-20 10:16, John Paul Adrian Glaubitz wrote:

There seems to be a regression in the kernel which affects PowerPC 970 machines,
i.e. PowerMac G5 CPUs. The issue needs to be bisected and reported upstream.

If you have the time, I would really appreciate if you could test the various
snapshots and let me know which kernel is the first to not work. I expect that
the breakage occurred somewhere around kernel 6.3 or so.


No problem, I'm going there every week and can do as much testing as 
necessary even if it takes months. We got a donation of a standard PC so 
we have now one publicly usable PC at the library.

So I can take my time to try to make something out of the iMacs.

Results:
- 2023-06-18: ok
  - running strings on iso's vmlinuz yields 6.3.7-1 (2023-06-12)
- 2023-11-16: fail
  - 6.5.10-1 (2023-11-03)

As expected, the regression happened during the big gap between 
snapshots >_<


Anything more I can test?
(cross compiling old kernels is unfortunately too much over my head)

It's weird that it seems the regression is about booting the iso. 
Otherwise, net installs from old iso like what Claudia did wouldn't work 
after rebooting after install.


Cheers,

--
tuxayo



[Bug 207129] PowerMac G4 DP (5.6.2 debug kernel + inline KASAN) freezes shortly after booting with "do_IRQ: stack overflow: 1760"

2024-02-28 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=207129

Erhard F. (erhar...@mailbox.org) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|---                         |CODE_FIX

--- Comment #11 from Erhard F. (erhar...@mailbox.org) ---
You were correct! I forgot about that...

I shrunk the size by using -Os and disabling some debugging stuff and changing
some statically built-in stuff to 'M' without sacrificing debugging
capabilities too much until it fit < 32 MiB:

KASAN_OUTLINE vs.
 # size vmlinux-6.8.0-rc6-PMacG4 
   text    data     bss     dec     hex filename
12367737 6652440 426336 19446513 128baf1 vmlinux-6.8.0-rc6-PMacG4

KASAN_INLINE
 # size vmlinux-6.8.0-rc6-PMacG4 
   text    data     bss     dec     hex filename
24660169 6652440  426336 31738945 1e44c41 vmlinux-6.8.0-rc6-PMacG4


Apart from that I can confirm inline KASAN runs fine now and I really no longer
get this stack overflow when using it.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH 3/4] arch: define CONFIG_PAGE_SIZE_*KB on all architectures

2024-02-28 Thread Stafford Horne
On Mon, Feb 26, 2024 at 05:14:13PM +0100, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> Most architectures only support a single hardcoded page size. In order
> to ensure that each one of these sets the corresponding Kconfig symbols,
> change over the PAGE_SHIFT definition to the common one and allow
> only the hardware page size to be selected.
> 
> Signed-off-by: Arnd Bergmann 
> ---
>  arch/alpha/Kconfig | 1 +
>  arch/alpha/include/asm/page.h  | 2 +-
>  arch/arm/Kconfig   | 1 +
>  arch/arm/include/asm/page.h| 2 +-
>  arch/csky/Kconfig  | 1 +
>  arch/csky/include/asm/page.h   | 2 +-
>  arch/m68k/Kconfig  | 3 +++
>  arch/m68k/Kconfig.cpu  | 2 ++
>  arch/m68k/include/asm/page.h   | 6 +-
>  arch/microblaze/Kconfig| 1 +
>  arch/microblaze/include/asm/page.h | 2 +-
>  arch/nios2/Kconfig | 1 +
>  arch/nios2/include/asm/page.h  | 2 +-
>  arch/openrisc/Kconfig  | 1 +
>  arch/openrisc/include/asm/page.h   | 2 +-
>  arch/riscv/Kconfig | 1 +
>  arch/riscv/include/asm/page.h  | 2 +-
>  arch/s390/Kconfig  | 1 +
>  arch/s390/include/asm/page.h   | 2 +-
>  arch/sparc/Kconfig | 2 ++
>  arch/sparc/include/asm/page_32.h   | 2 +-
>  arch/sparc/include/asm/page_64.h   | 3 +--
>  arch/um/Kconfig| 1 +
>  arch/um/include/asm/page.h | 2 +-
>  arch/x86/Kconfig   | 1 +
>  arch/x86/include/asm/page_types.h  | 2 +-
>  arch/xtensa/Kconfig| 1 +
>  arch/xtensa/include/asm/page.h | 2 +-
>  28 files changed, 32 insertions(+), 19 deletions(-)

> diff --git a/arch/openrisc/Kconfig b/arch/openrisc/Kconfig
> index fd9bb76a610b..3586cda55bde 100644
> --- a/arch/openrisc/Kconfig
> +++ b/arch/openrisc/Kconfig
> @@ -25,6 +25,7 @@ config OPENRISC
>   select GENERIC_CPU_DEVICES
>   select HAVE_PCI
>   select HAVE_UID16
> + select HAVE_PAGE_SIZE_8KB
>   select GENERIC_ATOMIC64
>   select GENERIC_CLOCKEVENTS_BROADCAST
>   select GENERIC_SMP_IDLE_THREAD
> diff --git a/arch/openrisc/include/asm/page.h 
> b/arch/openrisc/include/asm/page.h
> index 44fc1fd56717..7925ce09ab5a 100644
> --- a/arch/openrisc/include/asm/page.h
> +++ b/arch/openrisc/include/asm/page.h
> @@ -18,7 +18,7 @@
>  
>  /* PAGE_SHIFT determines the page size */
>  
> -#define PAGE_SHIFT  13
> +#define PAGE_SHIFT  CONFIG_PAGE_SHIFT
>  #ifdef __ASSEMBLY__
>  #define PAGE_SIZE   (1 << PAGE_SHIFT)
>  #else

For the openrisc bits,

Acked-by: Stafford Horne 
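
As a small sanity check of the openrisc hunk, the sketch below shows how
the page size falls out of the shift. The value 13 for CONFIG_PAGE_SHIFT
is an assumption here: it is what selecting an 8KB page size would be
expected to produce, since it matches the hardcoded value being removed.
page_align_down() is a made-up helper for illustration only.

```c
/* In case a libc header already defines these names. */
#undef PAGE_SHIFT
#undef PAGE_SIZE
#undef PAGE_MASK

/* Assumed value: what Kconfig would be expected to emit for 8KB pages. */
#define CONFIG_PAGE_SHIFT	13

#define PAGE_SHIFT	CONFIG_PAGE_SHIFT
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define PAGE_MASK	(~(PAGE_SIZE - 1))

/* Hypothetical helper: round an address down to the start of its page. */
static unsigned long page_align_down(unsigned long addr)
{
	return addr & PAGE_MASK;
}
```

With a shift of 13, PAGE_SIZE evaluates to 8192 bytes, the same as before
the change.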


Re: [kvm-unit-tests PATCH 03/13] treewide: lib/stack: Fix backtrace

2024-02-28 Thread Claudio Imbrenda
On Wed, 28 Feb 2024 16:04:19 +0100
Andrew Jones  wrote:

> We should never pass the result of __builtin_frame_address(0) to
> another function since the compiler is within its rights to pop the
> frame to which it points before making the function call, as may be
> done for tail calls. Nobody has complained about backtrace(), so
> likely all compilations have been inlining backtrace_frame(), not
> dropping the frame on the tail call, or nobody is looking at traces.
> However, for riscv, when built for EFI, it does drop the frame on the
> tail call, and it was noticed. Preemptively fix backtrace() for all
> architectures.
> 
> Fixes: 52266791750d ("lib: backtrace printing")
> Signed-off-by: Andrew Jones 

Acked-by: Claudio Imbrenda 

> ---
>  lib/arm/stack.c   | 13 +
>  lib/arm64/stack.c | 12 +---
>  lib/riscv/stack.c | 12 +---
>  lib/s390x/stack.c | 12 +---
>  lib/stack.h   | 24 +---
>  lib/x86/stack.c   | 12 +---
>  6 files changed, 42 insertions(+), 43 deletions(-)
> 
> diff --git a/lib/arm/stack.c b/lib/arm/stack.c
> index 7d081be7c6d0..66d18b47ea53 100644
> --- a/lib/arm/stack.c
> +++ b/lib/arm/stack.c
> @@ -8,13 +8,16 @@
>  #include 
>  #include 
>  
> -int backtrace_frame(const void *frame, const void **return_addrs,
> - int max_depth)
> +int arch_backtrace_frame(const void *frame, const void **return_addrs,
> +  int max_depth, bool current_frame)
>  {
>   static int walking;
>   int depth;
>   const unsigned long *fp = (unsigned long *)frame;
>  
> + if (current_frame)
> + fp = __builtin_frame_address(0);
> +
>   if (walking) {
>   printf("RECURSIVE STACK WALK!!!\n");
>   return 0;
> @@ -33,9 +36,3 @@ int backtrace_frame(const void *frame, const void 
> **return_addrs,
>   walking = 0;
>   return depth;
>  }
> -
> -int backtrace(const void **return_addrs, int max_depth)
> -{
> - return backtrace_frame(__builtin_frame_address(0),
> -return_addrs, max_depth);
> -}
> diff --git a/lib/arm64/stack.c b/lib/arm64/stack.c
> index 82611f4b1815..f5eb57fd8892 100644
> --- a/lib/arm64/stack.c
> +++ b/lib/arm64/stack.c
> @@ -8,7 +8,8 @@
>  
>  extern char vector_stub_start, vector_stub_end;
>  
> -int backtrace_frame(const void *frame, const void **return_addrs, int 
> max_depth)
> +int arch_backtrace_frame(const void *frame, const void **return_addrs,
> +  int max_depth, bool current_frame)
>  {
>   const void *fp = frame;
>   static bool walking;
> @@ -17,6 +18,9 @@ int backtrace_frame(const void *frame, const void 
> **return_addrs, int max_depth)
>   bool is_exception = false;
>   unsigned long addr;
>  
> + if (current_frame)
> + fp = __builtin_frame_address(0);
> +
>   if (walking) {
>   printf("RECURSIVE STACK WALK!!!\n");
>   return 0;
> @@ -54,9 +58,3 @@ int backtrace_frame(const void *frame, const void 
> **return_addrs, int max_depth)
>   walking = false;
>   return depth;
>  }
> -
> -int backtrace(const void **return_addrs, int max_depth)
> -{
> - return backtrace_frame(__builtin_frame_address(0),
> -return_addrs, max_depth);
> -}
> diff --git a/lib/riscv/stack.c b/lib/riscv/stack.c
> index 712a5478d547..d865594b9671 100644
> --- a/lib/riscv/stack.c
> +++ b/lib/riscv/stack.c
> @@ -2,12 +2,16 @@
>  #include 
>  #include 
>  
> -int backtrace_frame(const void *frame, const void **return_addrs, int 
> max_depth)
> +int arch_backtrace_frame(const void *frame, const void **return_addrs,
> +  int max_depth, bool current_frame)
>  {
>   static bool walking;
>   const unsigned long *fp = (unsigned long *)frame;
>   int depth;
>  
> + if (current_frame)
> + fp = __builtin_frame_address(0);
> +
>   if (walking) {
>   printf("RECURSIVE STACK WALK!!!\n");
>   return 0;
> @@ -24,9 +28,3 @@ int backtrace_frame(const void *frame, const void 
> **return_addrs, int max_depth)
>   walking = false;
>   return depth;
>  }
> -
> -int backtrace(const void **return_addrs, int max_depth)
> -{
> - return backtrace_frame(__builtin_frame_address(0),
> -return_addrs, max_depth);
> -}
> diff --git a/lib/s390x/stack.c b/lib/s390x/stack.c
> index 9f234a12adf6..d194f654e94d 100644
> --- a/lib/s390x/stack.c
> +++ b/lib/s390x/stack.c
> @@ -14,11 +14,15 @@
>  #include 
>  #include 
>  
> -int backtrace_frame(const void *frame, const void **return_addrs, int 
> max_depth)
> +int arch_backtrace_frame(const void *frame, const void **return_addrs,
> +  int max_depth, bool current_frame)
>  {
>   int depth = 0;
>   struct stack_frame *stack = (struct stack_frame *)frame;
>  
> + if (current_frame)
> + stack = __builtin_frame_address(0);
> +
>   for (depth = 0; 

Re: [PATCH v2 5/9] mm: Initialize struct vm_unmapped_area_info

2024-02-28 Thread Kees Cook
On Wed, Feb 28, 2024 at 01:22:09PM +, Christophe Leroy wrote:
> [...]
> My worry with initialisation at declaration is it often hides missing 
> assignments. Let's take the following simple example:
> 
> char *colour(int num)
> {
>   char *name;
> 
>   if (num == 0) {
>   name = "black";
>   } else if (num == 1) {
>   name = "white";
>   } else if (num == 2) {
>   } else {
>   name = "no colour";
>   }
> 
>   return name;
> }
> 
> Here, GCC warns about a missing initialisation of variable 'name'.

Sometimes. :( We build with -Wno-maybe-uninitialized because GCC gets
this wrong too often. Also, like with large structs like this, all
uninit warnings get suppressed if anything takes it by reference. So, if
before your "return name" statement above, you had something like:

do_something();

it won't warn with any option enabled.

> But if I declare it as
> 
>   char *name = "no colour";
> 
> Then GCC won't warn anymore that we are missing a value for when num is 2.
> 
> During my life I have so many times spent huge amount of time 
> investigating issues and bugs due to missing assignments that were going 
> undetected due to default initialisation at declaration.

I totally understand. If the "uninitialized" warnings were actually
reliable, I would agree. I look at it this way:

- initializations can be missed either in static initializers or via
  run time initializers. (So the risk of mistake here is matched --
  though I'd argue it's easier to *find* static initializers when adding
  new struct members.)
- uninitialized warnings are inconsistent (this becomes an unknown risk)
- when a run time initializer is missed, the contents are whatever was
  on the stack (high risk)
- when a static initializer is missed, the content is 0 (low risk)

I think unambiguous state (always 0) is significantly more important for
the safety of the system as a whole. Yes, individual cases may be bad
("what uid should this be? root?!") but from a general memory safety
perspective the value doesn't become potentially influenced by order of
operations, leftover stack memory, etc.

I'd agree, lifting everything into a static initializer does seem
cleanest of all the choices.
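
A minimal sketch of the static initializer behaviour discussed above,
using a made-up struct rather than the real struct vm_unmapped_area_info:
with a designated initializer, every member that is not named is zero,
including a member that gets added to the struct later. The names
struct area_info_sketch, new_field and make_info() are hypothetical.

```c
/* Hypothetical stand-in, not the real struct vm_unmapped_area_info. */
struct area_info_sketch {
	unsigned long flags;
	unsigned long length;
	unsigned long low_limit;
	unsigned long high_limit;
	unsigned long new_field;	/* imagine this member was added later */
};

static struct area_info_sketch make_info(unsigned long length)
{
	/* Members not named here are initialized to zero by the compiler. */
	struct area_info_sketch info = {
		.length = length,
	};

	return info;
}
```

A call site written this way does not need to be touched when new_field
is introduced, and it never sees leftover stack contents in that member.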

-Kees

-- 
Kees Cook


Re: [PATCH v2 5/9] mm: Initialize struct vm_unmapped_area_info

2024-02-28 Thread Edgecombe, Rick P
On Wed, 2024-02-28 at 13:22 +, Christophe Leroy wrote:
> > Any preference? Or maybe am I missing your point and talking
> > nonsense?
> > 
> 
> So my preference would go to the addition of:
> 
> info.new_field = 0;
> 
> But that's very minor and if you think it is easier to manage and 
> maintain by performing {} initialisation at declaration, lets go for
> that.

Appreciate the clarification and help getting this right. I'm thinking
Kees' and now Kirill's point about this patch resulting in unnecessary
manual zero initialization of the structs is probably something that
needs to be addressed.

If I created a bunch of patches to change each call site, I think the
best is probably to do the designated field zero initialization
way.

But I can do something for powerpc special if you want. I'll first try
with powerpc matching the others, and if it seems objectionable, please
let me know.

Thanks,

Rick


[RFC] sched/eevdf: avoid task starvation in cgroups

2024-02-28 Thread Tobias Huschle
When running update_curr, it is checked whether the current task has
missed its deadline (update_deadline). If the deadline has been crossed,
the task is set to be rescheduled if there are other tasks available on
its cfs_rq.
This can cause task starvation in some cgroup configurations.

Assume the following scenario:

   [ ]  rq of CPU
|
   [ ]  cfs_rq1
  /   \
cfs_rq2 [ ]   othertask
 |
curr

In this case, update_curr is called for all levels of the hierarchy,
starting at the leaf.
Assume that curr is a kernel task which does not give up the CPU
voluntarily, i.e. loops indefinitely unless forced to give up the CPU.
Assume further that curr has actually reached its deadline. The expected
behavior would be that the scheduler determines that curr would now
start to overconsume and therefore sets the need_resched flag to
nudge the current task to allow a reschedule in favor of othertask.

To reach the expected behavior, it is not sufficient to check whether
other tasks are queued in the cfs_rq, which curr is assigned to.

In the configuration shown above, each run queue sees:
rq_cpu:  2 tasks
cfs_rq1: 2 tasks
cfs_rq2: 1 task

This means that cfs_rq2 will never reschedule on its own as it sees
no other tasks that would be worth rescheduling for. Hence, the
call to reschedule relies on the upper levels in the hierarchy.
cfs_rq1 sees 1 additional task and could therefore take the desired
decision. But, cfs_rq1 takes the additional task into account when
computing its own deadline, which means, its deadline lies further in
the future. As a result, its deadline is not reached.
Therefore cfs_rq1 does not even check for other potential tasks to be
run.

It could now be assumed that cfs_rq1 will reach its deadline at some
point. But, after curr has consumed its timeslice, all sched_entities
in the cgroup tree get reweighted. This causes the deadlines of all
entities in the hierarchy to be shifted further into the future.
This means especially, that the deadline of cfs_rq1 also gets pushed
into the future, causing it to never reach its deadline.
The decision whether a reweight needs to be done depends on the weight
of the sched_entity and a recalculated value of the shares the
sched_entity shall expect (see calc_group_shares in fair.c). These
values are, in the described scenario, always different, hence, reweight
is done.

The combination of these two circumstances can lead to curr
running indefinitely unless interrupted by an external entity.

Address this issue by rather checking for the nr_running value of the
rq of the CPU itself, i.e. if there is any other task, somewhere on the
CPU runqueue, give it a chance to execute.
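
The difference between the two checks can be modelled in a few lines of
userspace C. This is only a sketch of the scenario from the diagram
above, with heavily reduced, made-up structures (struct rq_sketch,
struct cfs_rq_sketch) and helper names; it does not model deadlines or
reweighting.

```c
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-ins for the scheduler structures. */
struct rq_sketch {
	unsigned int nr_running;	/* all runnable tasks on the CPU */
};

struct cfs_rq_sketch {
	unsigned int nr_running;	/* entities queued on this level only */
	struct rq_sketch *rq;
};

/* Check before the change: only the local cfs_rq is considered. */
static bool resched_local(const struct cfs_rq_sketch *cfs_rq)
{
	return cfs_rq->nr_running > 1;
}

/* Check as proposed: any other task on the CPU runqueue counts. */
static bool resched_cpu_wide(const struct cfs_rq_sketch *cfs_rq)
{
	return cfs_rq->rq->nr_running > 1;
}

/* The hierarchy from the diagram: curr alone on cfs_rq2, othertask above. */
static bool cfs_rq2_would_resched(bool cpu_wide)
{
	struct rq_sketch rq = { .nr_running = 2 };
	struct cfs_rq_sketch cfs_rq2 = { .nr_running = 1, .rq = &rq };

	return cpu_wide ? resched_cpu_wide(&cfs_rq2) : resched_local(&cfs_rq2);
}
```

With the local check, cfs_rq2 never asks for a reschedule because it only
sees curr; with the CPU-wide check it does, because the rq also counts
othertask.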

The scenario that made me discover this is being discussed here:
https://lore.kernel.org/netdev/ZWbapeL34Z8AMR5f@DESKTOP-2CCOB1S./T/

Questions:
1. Is this behavior by design? I couldn't find an explanation why we
   only check for the local cfs_rq. Do we only want to allow take-overs
   from the same cgroup hierarchy level in that case?
2. This problem is referred to as the "hog"-problem by Ankur Arora here:
   
https://lore.kernel.org/lkml/2024021304.1802415-24-ankur.a.ar...@oracle.com/
   Might there be a connection, it's the same piece of code in the end. I'd also
   prefer the boolean passing to avoid calling resched_curr too often.
3. Which are the criteria for which I should expect a reweight of the 
   cgroup hierarchy? It seems unintuitive to me that the shares value
   keeps changing every time.

Feedback and opinions would be highly appreciated.

Signed-off-by: Tobias Huschle 
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 61c4ef20a2f8..e9733ef7964a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1015,7 +1015,7 @@ static void update_deadline(struct cfs_rq *cfs_rq, struct sched_entity *se)
/*
 * The task has consumed its request, reschedule.
 */
-   if (cfs_rq->nr_running > 1) {
+   if (rq_of(cfs_rq)->nr_running > 1) {
resched_curr(rq_of(cfs_rq));
clear_buddies(cfs_rq, se);
}
-- 
2.34.1



[RFC] sched/eevdf: sched feature to dismiss lag on wakeup

2024-02-28 Thread Tobias Huschle
The previously used CFS scheduler gave tasks that were woken up an
enhanced chance to see runtime immediately by deducting a certain value
from its vruntime on runqueue placement during wakeup.

This property was used by some, at least vhost, to ensure that certain
kworkers are scheduled immediately after being woken up. The EEVDF
scheduler does not support this so far. Instead, if such a woken up
entity carries a negative lag from its previous execution, it will have
to wait for the current time slice to finish, which affects the
performance of the process expecting the immediate execution negatively.

To address this issue, implement EEVDF strategy #2 for rejoining
entities, which dismisses the lag from previous execution and allows
the woken up task to run immediately (if no other entities are deemed
to be preferred for scheduling by EEVDF).

The vruntime is decremented by an additional value of 1 to make sure
that the woken up task gets to actually run. This is of course not
following strategy #2 in an exact manner but guarantees the expected
behavior for the scenario described above. Without the additional
decrement, the performance goes south even more. So there are some
side effects I could not get my head around yet.
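
The effect on placement can be sketched in isolation as follows. This is
a simplified userspace model, not the code in place_entity(): the load
based scaling of the lag is left out, and place_vruntime() and its
parameters are names made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified placement: vruntime = avg_vruntime - lag.
 * A negative lag places the entity behind the average; with the
 * feature enabled, a wakeup dismisses the old lag and uses 1 instead.
 */
static int64_t place_vruntime(int64_t avg_vruntime, int64_t lag,
			      bool wakeup, bool nolag_wakeup)
{
	if (nolag_wakeup && wakeup)
		lag = 1;	/* drop previous lag, nudge one unit ahead */

	return avg_vruntime - lag;
}
```

For an average vruntime of 1000 and a previous lag of -500, the entity is
placed at 1500 without the feature and at 999 with it, so it sits just
ahead of the average on wakeup.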

Questions:
1. The kworker getting its negative lag occurs in the following scenario
   - kworker and a cgroup are supposed to execute on the same CPU
   - one task within the cgroup is executing and wakes up the kworker
   - kworker with 0 lag, gets picked immediately and finishes its
 execution within ~5000ns
   - on dequeue, kworker gets assigned a negative lag
   Is this expected behavior? With this short execution time, I would
   expect the kworker to be fine.
   For a more detailed discussion on this symptom, please see:
   https://lore.kernel.org/netdev/ZWbapeL34Z8AMR5f@DESKTOP-2CCOB1S./T/
2. The proposed code change of course only addresses the symptom. Am I
   assuming correctly that this is in general the expected behavior and
   that the task waking up the kworker should rather do an explicit
   reschedule of itself to grant the kworker time to execute?
   In the vhost case, this is currently attempted through a cond_resched
   which is not doing anything because the need_resched flag is not set.

Feedback and opinions would be highly appreciated.

Signed-off-by: Tobias Huschle 
---
 kernel/sched/fair.c | 5 +
 kernel/sched/features.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 533547e3c90a..c20ae6d62961 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5239,6 +5239,11 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
lag = div_s64(lag, load);
}
 
+   if (sched_feat(NOLAG_WAKEUP) && (flags & ENQUEUE_WAKEUP)) {
+   se->vlag = 0;
+   lag = 1;
+   }
+
se->vruntime = vruntime - lag;
 
/*
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 143f55df890b..d3118e7568b4 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -7,6 +7,7 @@
 SCHED_FEAT(PLACE_LAG, true)
 SCHED_FEAT(PLACE_DEADLINE_INITIAL, true)
 SCHED_FEAT(RUN_TO_PARITY, true)
+SCHED_FEAT(NOLAG_WAKEUP, true)
 
 /*
  * Prefer to schedule the task we woke last (assuming it failed
-- 
2.34.1



Re: [revert 0d60d8df6f49] [net/net-next] [6.8-rc5] Build Failure

2024-02-28 Thread Christophe Leroy


Le 28/02/2024 à 15:43, Eric Dumazet a écrit :
> On Wed, Feb 28, 2024 at 3:07 PM Vadim Fedorenko
>  wrote:
>>
>> On 28/02/2024 11:09, Tasmiya Nalatwad wrote:
>>> Greetings,
>>>
>>> [revert 0d60d8df6f49] [net/net-next] [6.8-rc5] Build Failure
>>>
>>> Reverting below commit fixes the issue
>>>
>>> commit 0d60d8df6f493bb46bf5db40d39dd60a1bafdd4e
>>>   dpll: rely on rcu for netdev_dpll_pin()
>>>
>>> --- Traces ---
>>>
>>> ./include/linux/dpll.h: In function ‘netdev_dpll_pin’:
>>> ./include/linux/rcupdate.h:439:9: error: dereferencing pointer to
>>> incomplete type ‘struct dpll_pin’
>>> typeof(*p) *local = (typeof(*p) *__force)READ_ONCE(p); \
>>>^
>>> ./include/linux/rcupdate.h:587:2: note: in expansion of macro
>>> ‘__rcu_dereference_check’
>>> __rcu_dereference_check((p), __UNIQUE_ID(rcu), \
>>> ^~~
>>> ./include/linux/rtnetlink.h:70:2: note: in expansion of macro
>>> ‘rcu_dereference_check’
>>> rcu_dereference_check(p, lockdep_rtnl_is_held())
>>> ^
>>> ./include/linux/dpll.h:175:9: note: in expansion of macro
>>> ‘rcu_dereference_rtnl’
>>> return rcu_dereference_rtnl(dev->dpll_pin);
>>>^~~~
>>> make[4]: *** [scripts/Makefile.build:243: drivers/dpll/dpll_core.o] Error 1
>>> make[4]: *** Waiting for unfinished jobs
>>> AR  net/mpls/built-in.a
>>> AR  net/l3mdev/built-in.a
>>> In file included from ./include/linux/rbtree.h:24,
>>>from ./include/linux/mm_types.h:11,
>>>from ./include/linux/mmzone.h:22,
>>>from ./include/linux/gfp.h:7,
>>>from ./include/linux/umh.h:4,
>>>from ./include/linux/kmod.h:9,
>>>from ./include/linux/module.h:17,
>>>from drivers/dpll/dpll_netlink.c:9:
>>> ./include/linux/dpll.h: In function ‘netdev_dpll_pin’:
>>> ./include/linux/rcupdate.h:439:9: error: dereferencing pointer to
>>> incomplete type ‘struct dpll_pin’
>>> typeof(*p) *local = (typeof(*p) *__force)READ_ONCE(p); \
>>>^
>>> ./include/linux/rcupdate.h:587:2: note: in expansion of macro
>>> ‘__rcu_dereference_check’
>>> __rcu_dereference_check((p), __UNIQUE_ID(rcu), \
>>> ^~~
>>> ./include/linux/rtnetlink.h:70:2: note: in expansion of macro
>>> ‘rcu_dereference_check’
>>> rcu_dereference_check(p, lockdep_rtnl_is_held())
>>> ^
>>> ./include/linux/dpll.h:175:9: note: in expansion of macro
>>> ‘rcu_dereference_rtnl’
>>> return rcu_dereference_rtnl(dev->dpll_pin);
>>>^~~~
>>> make[4]: *** [scripts/Makefile.build:243: drivers/dpll/dpll_netlink.o]
>>> Error 1
>>> make[3]: *** [scripts/Makefile.build:481: drivers/dpll] Error 2
>>> make[3]: *** Waiting for unfinished jobs
>>> In file included from ./arch/powerpc/include/generated/asm/rwonce.h:1,
>>>from ./include/linux/compiler.h:251,
>>>from ./include/linux/instrumented.h:10,
>>>from ./include/linux/uaccess.h:6,
>>>from net/core/dev.c:71:
>>> net/core/dev.c: In function ‘netdev_dpll_pin_assign’:
>>> ./include/linux/rcupdate.h:462:36: error: dereferencing pointer to
>>> incomplete type ‘struct dpll_pin’
>>>#define RCU_INITIALIZER(v) (typeof(*(v)) __force __rcu *)(v)
>>>   ^~~~
>>> ./include/asm-generic/rwonce.h:55:33: note: in definition of macro
>>> ‘__WRITE_ONCE’
>>> *(volatile typeof(x) *)&(x) = (val);\
>>>^~~
>>> ./arch/powerpc/include/asm/barrier.h:76:2: note: in expansion of macro
>>> ‘WRITE_ONCE’
>>> WRITE_ONCE(*p, v);  \
>>> ^~
>>> ./include/asm-generic/barrier.h:172:55: note: in expansion of macro
>>> ‘__smp_store_release’
>>>#define smp_store_release(p, v) do { kcsan_release();
>>> __smp_store_release(p, v); } while (0)
>>> ^~~
>>> ./include/linux/rcupdate.h:503:3: note: in expansion of macro
>>> ‘smp_store_release’
>>>  smp_store_release(&p, RCU_INITIALIZER((typeof(p))_r_a_p__v)); \
>>>  ^
>>> ./include/linux/rcupdate.h:503:25: note: in expansion of macro
>>> ‘RCU_INITIALIZER’
>>>  smp_store_release(&p, RCU_INITIALIZER((typeof(p))_r_a_p__v)); \
>>>^~~
>>> net/core/dev.c:9081:2: note: in expansion of macro ‘rcu_assign_pointer’
>>> rcu_assign_pointer(dev->dpll_pin, dpll_pin);
>>> ^~
>>> make[4]: *** [scripts/Makefile.build:243: net/core/dev.o] Error 1
>>> make[4]: *** Waiting for unfinished jobs
>>> AR  drivers/net/ethernet/built-in.a
>>> AR  drivers/net/built-in.a
>>> AR  net/dcb/built-in.a
>>> AR  net/netlabel/built-in.a
>>> AR  net/strparser/built-in.a
>>> AR  net/handshake/built-in.a
>>> GEN lib/test_fortify.log
>>> AR  net/8021q/built-in.a
>>> AR  

[kvm-unit-tests PATCH 04/13] treewide: lib/stack: Make base_address arch specific

2024-02-28 Thread Andrew Jones
Calculating the offset of an address is image specific, which is
architecture specific. Until now, all architectures and architecture
configurations which select CONFIG_RELOC were able to subtract
_etext, but the EFI configuration of riscv cannot (it must subtract
ImageBase). Make this function architecture specific, since the
architecture's image layout already is.

Signed-off-by: Andrew Jones 
---
 lib/arm64/stack.c | 17 +
 lib/riscv/stack.c | 18 ++
 lib/stack.c   | 19 ++-
 lib/stack.h   |  2 ++
 lib/x86/stack.c   | 17 +
 5 files changed, 56 insertions(+), 17 deletions(-)

diff --git a/lib/arm64/stack.c b/lib/arm64/stack.c
index f5eb57fd8892..3369031a74f7 100644
--- a/lib/arm64/stack.c
+++ b/lib/arm64/stack.c
@@ -6,6 +6,23 @@
 #include 
 #include 
 
+#ifdef CONFIG_RELOC
+extern char _text, _etext;
+
+bool base_address(const void *rebased_addr, unsigned long *addr)
+{
+   unsigned long ra = (unsigned long)rebased_addr;
+   unsigned long start = (unsigned long)&_text;
+   unsigned long end = (unsigned long)&_etext;
+
+   if (ra < start || ra >= end)
+   return false;
+
+   *addr = ra - start;
+   return true;
+}
+#endif
+
 extern char vector_stub_start, vector_stub_end;
 
 int arch_backtrace_frame(const void *frame, const void **return_addrs,
diff --git a/lib/riscv/stack.c b/lib/riscv/stack.c
index d865594b9671..a143c22a570a 100644
--- a/lib/riscv/stack.c
+++ b/lib/riscv/stack.c
@@ -2,6 +2,24 @@
 #include 
 #include 
 
+#ifdef CONFIG_RELOC
+extern char ImageBase, _text, _etext;
+
+bool base_address(const void *rebased_addr, unsigned long *addr)
+{
+   unsigned long ra = (unsigned long)rebased_addr;
+   unsigned long base = (unsigned long)&ImageBase;
+   unsigned long start = (unsigned long)&_text;
+   unsigned long end = (unsigned long)&_etext;
+
+   if (ra < start || ra >= end)
+   return false;
+
+   *addr = ra - base;
+   return true;
+}
+#endif
+
 int arch_backtrace_frame(const void *frame, const void **return_addrs,
 int max_depth, bool current_frame)
 {
diff --git a/lib/stack.c b/lib/stack.c
index dd6bfa8dac6e..e5099e207388 100644
--- a/lib/stack.c
+++ b/lib/stack.c
@@ -11,23 +11,8 @@
 
 #define MAX_DEPTH 20
 
-#ifdef CONFIG_RELOC
-extern char _text, _etext;
-
-static bool base_address(const void *rebased_addr, unsigned long *addr)
-{
-   unsigned long ra = (unsigned long)rebased_addr;
-   unsigned long start = (unsigned long)&_text;
-   unsigned long end = (unsigned long)&_etext;
-
-   if (ra < start || ra >= end)
-   return false;
-
-   *addr = ra - start;
-   return true;
-}
-#else
-static bool base_address(const void *rebased_addr, unsigned long *addr)
+#ifndef CONFIG_RELOC
+bool base_address(const void *rebased_addr, unsigned long *addr)
 {
*addr = (unsigned long)rebased_addr;
return true;
diff --git a/lib/stack.h b/lib/stack.h
index 6edc84344b51..f8def4ad4d49 100644
--- a/lib/stack.h
+++ b/lib/stack.h
@@ -10,6 +10,8 @@
 #include 
 #include 
 
+bool base_address(const void *rebased_addr, unsigned long *addr);
+
 #ifdef HAVE_ARCH_BACKTRACE_FRAME
 extern int arch_backtrace_frame(const void *frame, const void **return_addrs,
int max_depth, bool current_frame);
diff --git a/lib/x86/stack.c b/lib/x86/stack.c
index 58ab6c4b293a..7ba73becbd69 100644
--- a/lib/x86/stack.c
+++ b/lib/x86/stack.c
@@ -1,6 +1,23 @@
 #include 
 #include 
 
+#ifdef CONFIG_RELOC
+extern char _text, _etext;
+
+bool base_address(const void *rebased_addr, unsigned long *addr)
+{
+   unsigned long ra = (unsigned long)rebased_addr;
+   unsigned long start = (unsigned long)&_text;
+   unsigned long end = (unsigned long)&_etext;
+
+   if (ra < start || ra >= end)
+   return false;
+
+   *addr = ra - start;
+   return true;
+}
+#endif
+
 int arch_backtrace_frame(const void *frame, const void **return_addrs,
 int max_depth, bool current_frame)
 {
-- 
2.43.0



[kvm-unit-tests PATCH 03/13] treewide: lib/stack: Fix backtrace

2024-02-28 Thread Andrew Jones
We should never pass the result of __builtin_frame_address(0) to
another function since the compiler is within its rights to pop the
frame to which it points before making the function call, as may be
done for tail calls. Nobody has complained about backtrace(), so
likely all compilations have been inlining backtrace_frame(), not
dropping the frame on the tail call, or nobody is looking at traces.
However, for riscv, when built for EFI, it does drop the frame on the
tail call, and it was noticed. Preemptively fix backtrace() for all
architectures.

Fixes: 52266791750d ("lib: backtrace printing")
Signed-off-by: Andrew Jones 
---
 lib/arm/stack.c   | 13 +
 lib/arm64/stack.c | 12 +---
 lib/riscv/stack.c | 12 +---
 lib/s390x/stack.c | 12 +---
 lib/stack.h   | 24 +---
 lib/x86/stack.c   | 12 +---
 6 files changed, 42 insertions(+), 43 deletions(-)

diff --git a/lib/arm/stack.c b/lib/arm/stack.c
index 7d081be7c6d0..66d18b47ea53 100644
--- a/lib/arm/stack.c
+++ b/lib/arm/stack.c
@@ -8,13 +8,16 @@
 #include 
 #include 
 
-int backtrace_frame(const void *frame, const void **return_addrs,
-   int max_depth)
+int arch_backtrace_frame(const void *frame, const void **return_addrs,
+int max_depth, bool current_frame)
 {
static int walking;
int depth;
const unsigned long *fp = (unsigned long *)frame;
 
+   if (current_frame)
+   fp = __builtin_frame_address(0);
+
if (walking) {
printf("RECURSIVE STACK WALK!!!\n");
return 0;
@@ -33,9 +36,3 @@ int backtrace_frame(const void *frame, const void **return_addrs,
walking = 0;
return depth;
 }
-
-int backtrace(const void **return_addrs, int max_depth)
-{
-   return backtrace_frame(__builtin_frame_address(0),
-  return_addrs, max_depth);
-}
diff --git a/lib/arm64/stack.c b/lib/arm64/stack.c
index 82611f4b1815..f5eb57fd8892 100644
--- a/lib/arm64/stack.c
+++ b/lib/arm64/stack.c
@@ -8,7 +8,8 @@
 
 extern char vector_stub_start, vector_stub_end;
 
-int backtrace_frame(const void *frame, const void **return_addrs, int max_depth)
+int arch_backtrace_frame(const void *frame, const void **return_addrs,
+int max_depth, bool current_frame)
 {
const void *fp = frame;
static bool walking;
@@ -17,6 +18,9 @@ int backtrace_frame(const void *frame, const void **return_addrs, int max_depth)
bool is_exception = false;
unsigned long addr;
 
+   if (current_frame)
+   fp = __builtin_frame_address(0);
+
if (walking) {
printf("RECURSIVE STACK WALK!!!\n");
return 0;
@@ -54,9 +58,3 @@ int backtrace_frame(const void *frame, const void **return_addrs, int max_depth)
walking = false;
return depth;
 }
-
-int backtrace(const void **return_addrs, int max_depth)
-{
-   return backtrace_frame(__builtin_frame_address(0),
-  return_addrs, max_depth);
-}
diff --git a/lib/riscv/stack.c b/lib/riscv/stack.c
index 712a5478d547..d865594b9671 100644
--- a/lib/riscv/stack.c
+++ b/lib/riscv/stack.c
@@ -2,12 +2,16 @@
 #include 
 #include 
 
-int backtrace_frame(const void *frame, const void **return_addrs, int max_depth)
+int arch_backtrace_frame(const void *frame, const void **return_addrs,
+int max_depth, bool current_frame)
 {
static bool walking;
const unsigned long *fp = (unsigned long *)frame;
int depth;
 
+   if (current_frame)
+   fp = __builtin_frame_address(0);
+
if (walking) {
printf("RECURSIVE STACK WALK!!!\n");
return 0;
@@ -24,9 +28,3 @@ int backtrace_frame(const void *frame, const void **return_addrs, int max_depth)
walking = false;
return depth;
 }
-
-int backtrace(const void **return_addrs, int max_depth)
-{
-   return backtrace_frame(__builtin_frame_address(0),
-  return_addrs, max_depth);
-}
diff --git a/lib/s390x/stack.c b/lib/s390x/stack.c
index 9f234a12adf6..d194f654e94d 100644
--- a/lib/s390x/stack.c
+++ b/lib/s390x/stack.c
@@ -14,11 +14,15 @@
 #include 
 #include 
 
-int backtrace_frame(const void *frame, const void **return_addrs, int max_depth)
+int arch_backtrace_frame(const void *frame, const void **return_addrs,
+int max_depth, bool current_frame)
 {
int depth = 0;
struct stack_frame *stack = (struct stack_frame *)frame;
 
+   if (current_frame)
+   stack = __builtin_frame_address(0);
+
for (depth = 0; stack && depth < max_depth; depth++) {
return_addrs[depth] = (void *)stack->grs[8];
stack = stack->back_chain;
@@ -28,9 +32,3 @@ int backtrace_frame(const void *frame, const void **return_addrs, int max_depth)
 
return depth;
 }
-
-int 

Re: [revert 0d60d8df6f49] [net/net-next] [6.8-rc5] Build Failure

2024-02-28 Thread Eric Dumazet
On Wed, Feb 28, 2024 at 3:07 PM Vadim Fedorenko
 wrote:
>
> On 28/02/2024 11:09, Tasmiya Nalatwad wrote:
> > Greetings,
> >
> > [revert 0d60d8df6f49] [net/net-next] [6.8-rc5] Build Failure
> >
> > Reverting below commit fixes the issue
> >
> > commit 0d60d8df6f493bb46bf5db40d39dd60a1bafdd4e
> >  dpll: rely on rcu for netdev_dpll_pin()
> >
> > --- Traces ---
> >
> > ./include/linux/dpll.h: In function ‘netdev_dpll_pin’:
> > ./include/linux/rcupdate.h:439:9: error: dereferencing pointer to
> > incomplete type ‘struct dpll_pin’
> >typeof(*p) *local = (typeof(*p) *__force)READ_ONCE(p); \
> >   ^
> > ./include/linux/rcupdate.h:587:2: note: in expansion of macro
> > ‘__rcu_dereference_check’
> >__rcu_dereference_check((p), __UNIQUE_ID(rcu), \
> >^~~
> > ./include/linux/rtnetlink.h:70:2: note: in expansion of macro
> > ‘rcu_dereference_check’
> >rcu_dereference_check(p, lockdep_rtnl_is_held())
> >^
> > ./include/linux/dpll.h:175:9: note: in expansion of macro
> > ‘rcu_dereference_rtnl’
> >return rcu_dereference_rtnl(dev->dpll_pin);
> >   ^~~~
> > make[4]: *** [scripts/Makefile.build:243: drivers/dpll/dpll_core.o] Error 1
> > make[4]: *** Waiting for unfinished jobs
> >AR  net/mpls/built-in.a
> >AR  net/l3mdev/built-in.a
> > In file included from ./include/linux/rbtree.h:24,
> >   from ./include/linux/mm_types.h:11,
> >   from ./include/linux/mmzone.h:22,
> >   from ./include/linux/gfp.h:7,
> >   from ./include/linux/umh.h:4,
> >   from ./include/linux/kmod.h:9,
> >   from ./include/linux/module.h:17,
> >   from drivers/dpll/dpll_netlink.c:9:
> > ./include/linux/dpll.h: In function ‘netdev_dpll_pin’:
> > ./include/linux/rcupdate.h:439:9: error: dereferencing pointer to
> > incomplete type ‘struct dpll_pin’
> >typeof(*p) *local = (typeof(*p) *__force)READ_ONCE(p); \
> >   ^
> > ./include/linux/rcupdate.h:587:2: note: in expansion of macro
> > ‘__rcu_dereference_check’
> >__rcu_dereference_check((p), __UNIQUE_ID(rcu), \
> >^~~
> > ./include/linux/rtnetlink.h:70:2: note: in expansion of macro
> > ‘rcu_dereference_check’
> >rcu_dereference_check(p, lockdep_rtnl_is_held())
> >^
> > ./include/linux/dpll.h:175:9: note: in expansion of macro
> > ‘rcu_dereference_rtnl’
> >return rcu_dereference_rtnl(dev->dpll_pin);
> >   ^~~~
> > make[4]: *** [scripts/Makefile.build:243: drivers/dpll/dpll_netlink.o]
> > Error 1
> > make[3]: *** [scripts/Makefile.build:481: drivers/dpll] Error 2
> > make[3]: *** Waiting for unfinished jobs
> > In file included from ./arch/powerpc/include/generated/asm/rwonce.h:1,
> >   from ./include/linux/compiler.h:251,
> >   from ./include/linux/instrumented.h:10,
> >   from ./include/linux/uaccess.h:6,
> >   from net/core/dev.c:71:
> > net/core/dev.c: In function ‘netdev_dpll_pin_assign’:
> > ./include/linux/rcupdate.h:462:36: error: dereferencing pointer to
> > incomplete type ‘struct dpll_pin’
> >   #define RCU_INITIALIZER(v) (typeof(*(v)) __force __rcu *)(v)
> >  ^~~~
> > ./include/asm-generic/rwonce.h:55:33: note: in definition of macro
> > ‘__WRITE_ONCE’
> >*(volatile typeof(x) *)&(x) = (val);\
> >   ^~~
> > ./arch/powerpc/include/asm/barrier.h:76:2: note: in expansion of macro
> > ‘WRITE_ONCE’
> >WRITE_ONCE(*p, v);  \
> >^~
> > ./include/asm-generic/barrier.h:172:55: note: in expansion of macro
> > ‘__smp_store_release’
> >   #define smp_store_release(p, v) do { kcsan_release();
> > __smp_store_release(p, v); } while (0)
> > ^~~
> > ./include/linux/rcupdate.h:503:3: note: in expansion of macro
> > ‘smp_store_release’
> > smp_store_release(, RCU_INITIALIZER((typeof(p))_r_a_p__v)); \
> > ^
> > ./include/linux/rcupdate.h:503:25: note: in expansion of macro
> > ‘RCU_INITIALIZER’
> > smp_store_release(, RCU_INITIALIZER((typeof(p))_r_a_p__v)); \
> >   ^~~
> > net/core/dev.c:9081:2: note: in expansion of macro ‘rcu_assign_pointer’
> >rcu_assign_pointer(dev->dpll_pin, dpll_pin);
> >^~
> > make[4]: *** [scripts/Makefile.build:243: net/core/dev.o] Error 1
> > make[4]: *** Waiting for unfinished jobs
> >AR  drivers/net/ethernet/built-in.a
> >AR  drivers/net/built-in.a
> >AR  net/dcb/built-in.a
> >AR  net/netlabel/built-in.a
> >AR  net/strparser/built-in.a
> >AR  net/handshake/built-in.a
> >GEN lib/test_fortify.log
> >AR  net/8021q/built-in.a
> >AR  net/nsh/built-in.a
> >AR  net/unix/built-in.a
> >CC  lib/string.o
> >AR  

Re: [revert 0d60d8df6f49] [net/net-next] [6.8-rc5] Build Failure

2024-02-28 Thread Vadim Fedorenko

On 28/02/2024 11:09, Tasmiya Nalatwad wrote:

Greetings,

[revert 0d60d8df6f49] [net/net-next] [6.8-rc5] Build Failure

Reverting below commit fixes the issue

commit 0d60d8df6f493bb46bf5db40d39dd60a1bafdd4e
     dpll: rely on rcu for netdev_dpll_pin()

--- Traces ---

./include/linux/dpll.h: In function ‘netdev_dpll_pin’:
./include/linux/rcupdate.h:439:9: error: dereferencing pointer to 
incomplete type ‘struct dpll_pin’

   typeof(*p) *local = (typeof(*p) *__force)READ_ONCE(p); \
  ^
./include/linux/rcupdate.h:587:2: note: in expansion of macro 
‘__rcu_dereference_check’

   __rcu_dereference_check((p), __UNIQUE_ID(rcu), \
   ^~~
./include/linux/rtnetlink.h:70:2: note: in expansion of macro 
‘rcu_dereference_check’

   rcu_dereference_check(p, lockdep_rtnl_is_held())
   ^
./include/linux/dpll.h:175:9: note: in expansion of macro 
‘rcu_dereference_rtnl’

   return rcu_dereference_rtnl(dev->dpll_pin);
  ^~~~
make[4]: *** [scripts/Makefile.build:243: drivers/dpll/dpll_core.o] Error 1
make[4]: *** Waiting for unfinished jobs
   AR  net/mpls/built-in.a
   AR  net/l3mdev/built-in.a
In file included from ./include/linux/rbtree.h:24,
  from ./include/linux/mm_types.h:11,
  from ./include/linux/mmzone.h:22,
  from ./include/linux/gfp.h:7,
  from ./include/linux/umh.h:4,
  from ./include/linux/kmod.h:9,
  from ./include/linux/module.h:17,
  from drivers/dpll/dpll_netlink.c:9:
./include/linux/dpll.h: In function ‘netdev_dpll_pin’:
./include/linux/rcupdate.h:439:9: error: dereferencing pointer to 
incomplete type ‘struct dpll_pin’

   typeof(*p) *local = (typeof(*p) *__force)READ_ONCE(p); \
  ^
./include/linux/rcupdate.h:587:2: note: in expansion of macro 
‘__rcu_dereference_check’

   __rcu_dereference_check((p), __UNIQUE_ID(rcu), \
   ^~~
./include/linux/rtnetlink.h:70:2: note: in expansion of macro 
‘rcu_dereference_check’

   rcu_dereference_check(p, lockdep_rtnl_is_held())
   ^
./include/linux/dpll.h:175:9: note: in expansion of macro 
‘rcu_dereference_rtnl’

   return rcu_dereference_rtnl(dev->dpll_pin);
  ^~~~
make[4]: *** [scripts/Makefile.build:243: drivers/dpll/dpll_netlink.o] 
Error 1

make[3]: *** [scripts/Makefile.build:481: drivers/dpll] Error 2
make[3]: *** Waiting for unfinished jobs
In file included from ./arch/powerpc/include/generated/asm/rwonce.h:1,
  from ./include/linux/compiler.h:251,
  from ./include/linux/instrumented.h:10,
  from ./include/linux/uaccess.h:6,
  from net/core/dev.c:71:
net/core/dev.c: In function ‘netdev_dpll_pin_assign’:
./include/linux/rcupdate.h:462:36: error: dereferencing pointer to 
incomplete type ‘struct dpll_pin’

  #define RCU_INITIALIZER(v) (typeof(*(v)) __force __rcu *)(v)
     ^~~~
./include/asm-generic/rwonce.h:55:33: note: in definition of macro 
‘__WRITE_ONCE’

   *(volatile typeof(x) *)&(x) = (val);    \
  ^~~
./arch/powerpc/include/asm/barrier.h:76:2: note: in expansion of macro 
‘WRITE_ONCE’

   WRITE_ONCE(*p, v);  \
   ^~
./include/asm-generic/barrier.h:172:55: note: in expansion of macro 
‘__smp_store_release’
  #define smp_store_release(p, v) do { kcsan_release(); 
__smp_store_release(p, v); } while (0)

^~~
./include/linux/rcupdate.h:503:3: note: in expansion of macro 
‘smp_store_release’

    smp_store_release(&p, RCU_INITIALIZER((typeof(p))_r_a_p__v)); \
    ^
./include/linux/rcupdate.h:503:25: note: in expansion of macro 
‘RCU_INITIALIZER’

    smp_store_release(&p, RCU_INITIALIZER((typeof(p))_r_a_p__v)); \
  ^~~
net/core/dev.c:9081:2: note: in expansion of macro ‘rcu_assign_pointer’
   rcu_assign_pointer(dev->dpll_pin, dpll_pin);
   ^~
make[4]: *** [scripts/Makefile.build:243: net/core/dev.o] Error 1
make[4]: *** Waiting for unfinished jobs
   AR  drivers/net/ethernet/built-in.a
   AR  drivers/net/built-in.a
   AR  net/dcb/built-in.a
   AR  net/netlabel/built-in.a
   AR  net/strparser/built-in.a
   AR  net/handshake/built-in.a
   GEN lib/test_fortify.log
   AR  net/8021q/built-in.a
   AR  net/nsh/built-in.a
   AR  net/unix/built-in.a
   CC  lib/string.o
   AR  net/packet/built-in.a
   AR  net/switchdev/built-in.a
   AR  lib/lib.a
   AR  net/mptcp/built-in.a
   AR  net/devlink/built-in.a
In file included from ./include/linux/rbtree.h:24,
  from ./include/linux/mm_types.h:11,
  from ./include/linux/mmzone.h:22,
  from ./include/linux/gfp.h:7,
  from ./include/linux/umh.h:4,
  from ./include/linux/kmod.h:9,
    

Re: [PATCH linux-next v2 0/3] powerpc/kexec: split CONFIG_CRASH_DUMP out from CONFIG_KEXEC_CORE

2024-02-28 Thread Baoquan He
On 02/26/24 at 04:00pm, Hari Bathini wrote:
> This patch series is a follow-up to [1] based on discussions at [2]
> about additional work needed to get it working on powerpc.
> 
> The first patch in the series makes struct crash_mem available with or
> without CONFIG_CRASH_DUMP enabled. The next patch moves kdump specific
> code for kexec_file_load syscall under CONFIG_CRASH_DUMP and the last
> patch splits other kdump specific code under CONFIG_CRASH_DUMP and
> removes dependency with CONFIG_CRASH_DUMP for CONFIG_KEXEC_CORE.
> 
> [1] https://lore.kernel.org/all/20240124051254.67105-1-...@redhat.com/
> [2] https://lore.kernel.org/all/9101bb07-70f1-476c-bec9-ec67e9899...@linux.ibm.com/
> 
> Changes in v2:
> * Fixed a compile error for POWERNV build reported by Sourabh.
> 
> Hari Bathini (3):
>   kexec/kdump: make struct crash_mem available without CONFIG_CRASH_DUMP
>   powerpc/kexec: split CONFIG_KEXEC_FILE and CONFIG_CRASH_DUMP
>   powerpc/kdump: Split KEXEC_CORE and CRASH_DUMP dependency

I have acked patch 1. Patches 2 and 3 look good to me; I'll leave those
two to the powerpc experts for a careful review. Thanks for this
great work.


> 
>  arch/powerpc/Kconfig |   9 +-
>  arch/powerpc/include/asm/kexec.h |  98 +-
>  arch/powerpc/kernel/prom.c   |   2 +-
>  arch/powerpc/kernel/setup-common.c   |   2 +-
>  arch/powerpc/kernel/smp.c|   4 +-
>  arch/powerpc/kexec/Makefile  |   3 +-
>  arch/powerpc/kexec/core.c|   4 +
>  arch/powerpc/kexec/elf_64.c  |   4 +-
>  arch/powerpc/kexec/file_load_64.c| 269 ++-
>  arch/powerpc/platforms/powernv/smp.c |   2 +-
>  include/linux/crash_core.h   |  12 +-
>  11 files changed, 209 insertions(+), 200 deletions(-)
> 
> -- 
> 2.43.2
> 



Re: [PATCH v2 5/9] mm: Initialize struct vm_unmapped_area_info

2024-02-28 Thread Christophe Leroy


Le 27/02/2024 à 21:25, Edgecombe, Rick P a écrit :
> On Tue, 2024-02-27 at 18:16 +, Christophe Leroy wrote:
>>>> Why doing a full init of the struct when all fields are re-written a few
>>>> lines after ?
>>>
>>> It's a nice change for robustness and makes future changes easier.
>>> It's
>>> not actually wasteful since the compiler will throw away all
>>> redundant
>>> stores.
>>
>> Well, I tend to dislike default init at declaration because it often
>> hides missed real init. When a field is not initialized GCC should
>> emit
>> a Warning, at least when built with W=2 which sets
>> -Wmissing-field-initializers ?
> 
> Sorry, I'm not following where you are going with this. There aren't
> any struct vm_unmapped_area_info users that use initializers today, so
> that warning won't apply in this case. Meanwhile, designated style
> struct initialization (which would zero new members) is very common, as
> well as not get anything checked by that warning. Anything with this
> many members is probably going to use the designated style.
> 
> If we are optimizing to avoid bugs, the way this struct is used today
> is not great. It is essentially being used as an argument passer.
> Normally when a function signature changes, but a caller is missed, of
> course the compiler will notice loudly. But not here. So I think
> probably zero initializing it is safer than being setup to pass
> garbage.

No worries. If everybody thinks that init at declaration is worth it in 
that case, it is OK for me, and I'm not going to ask for something special 
on powerpc. My comment was more general, although I used powerpc as an 
example.

My worry with initialisation at declaration is that it often hides missing 
assignments. Take the following simple example:

char *colour(int num)
{
	char *name;

	if (num == 0) {
		name = "black";
	} else if (num == 1) {
		name = "white";
	} else if (num == 2) {
	} else {
		name = "no colour";
	}

	return name;
}


Here, GCC warns about a missing initialisation of variable 'name'.

But if I declare it as

char *name = "no colour";

Then GCC won't warn anymore that we are missing a value for when num is 2.
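
To make the trade-off concrete, here is a minimal sketch. It is not taken from any patch in this thread: the names colour_default_init(), colour_explicit() and check_colours(), and the "red" value for num == 2, are made up for illustration. The first variant shows how a default at declaration silently covers the forgotten branch. The second returns explicitly from every case, so no path can leave the result unset. Whether GCC actually diagnoses the uninitialised variant tends to depend on the warning flags and optimisation level in use.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical variant A: default value at declaration.
 * The num == 2 branch forgets its assignment, but the compiler has
 * nothing to complain about because 'name' always holds a value.
 */
const char *colour_default_init(int num)
{
	const char *name = "no colour";

	if (num == 0) {
		name = "black";
	} else if (num == 1) {
		name = "white";
	} else if (num == 2) {
		/* assignment forgotten; the default silently covers it */
	}

	return name;
}

/*
 * Hypothetical variant B: every case returns explicitly, so there is
 * no variable that could be left unset ("red" is an assumed value).
 */
const char *colour_explicit(int num)
{
	switch (num) {
	case 0:
		return "black";
	case 1:
		return "white";
	case 2:
		return "red";
	default:
		return "no colour";
	}
}

/* Small self-check that can be called from a test harness or main(). */
void check_colours(void)
{
	/* Variant A hides the bug: num == 2 falls back to the default. */
	assert(strcmp(colour_default_init(2), "no colour") == 0);
	assert(strcmp(colour_explicit(2), "red") == 0);
}
```

With variant A nothing at build time hints that num == 2 was forgotten; the only symptom is the fallback string at run time.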

Over the years I have spent a huge amount of time investigating issues 
and bugs caused by missing assignments that went undetected because of 
default initialisation at declaration.

> 
> I'm trying to figure out what to do here. If I changed it so that just
> powerpc set the new field manually, then the convention across the
> kernel would be for everything to be default zero, and future other new
> parameters could have a greater chance of turning into garbage on
> powerpc. Since it could be easy to miss that powerpc was special. Would
> you prefer it?
> 
> Or maybe I could try a new vm_unmapped_area() that takes the extra
> argument separately? The old callers could call the old function and
> not need any arch updates. It all seems strange though, because
> automatic zero initializing struct members is so common in the kernel.
> But it also wouldn't add the cleanup Kees was pointing out. Hmm.
> 
> Any preference? Or maybe am I missing your point and talking nonsense?
> 

So my preference would go to the addition of:

info.new_field = 0;

But that's very minor, and if you think it is easier to manage and 
maintain with {} initialisation at declaration, let's go for that.
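
For reference, the two options look roughly like this. This is a sketch only: struct area_info and its members are a hypothetical stand-in, not the kernel's struct vm_unmapped_area_info, and the function names are invented for the example.

```c
#include <assert.h>

/* Hypothetical argument-passing struct; 'new_field' plays the role of
 * a member added by a later patch. */
struct area_info {
	unsigned long flags;
	unsigned long length;
	unsigned long new_field;
};

/*
 * Option 1: empty-brace initialisation at declaration (GNU C / C23).
 * Any member not assigned afterwards, including a newly added one,
 * stays zero.
 */
struct area_info make_info_zeroed(unsigned long length)
{
	struct area_info info = {};

	info.length = length;
	return info;
}

/*
 * Option 2: no initialisation at declaration; every member is assigned
 * explicitly, so each caller has to be touched when a member is added.
 */
struct area_info make_info_explicit(unsigned long length)
{
	struct area_info info;

	info.flags = 0;
	info.length = length;
	info.new_field = 0;
	return info;
}

/* Both options end up passing the same values. */
void check_info(void)
{
	assert(make_info_zeroed(16).new_field == 0);
	assert(make_info_explicit(16).new_field == 0);
}
```

Option 1 keeps working unchanged when a member is added; option 2 needs an extra line in every caller, which is the maintenance cost being weighed here.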

Christophe


Re: [kvm-unit-tests PATCH 32/32] powerpc: gitlab CI update

2024-02-28 Thread Andrew Jones
On Mon, Feb 26, 2024 at 08:12:18PM +1000, Nicholas Piggin wrote:
> This adds testing for the powernv machine, and adds a gitlab-ci test
> group instead of specifying all tests in .gitlab-ci.yml.
> 
> Signed-off-by: Nicholas Piggin 
> ---
>  .gitlab-ci.yml| 16 ++--
>  powerpc/unittests.cfg | 15 ---
>  2 files changed, 14 insertions(+), 17 deletions(-)
> 
> diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
> index 61f196d5d..51a593021 100644
> --- a/.gitlab-ci.yml
> +++ b/.gitlab-ci.yml
> @@ -69,11 +69,9 @@ build-ppc64be:
>   - cd build
>   - ../configure --arch=ppc64 --endian=big --cross-prefix=powerpc64-linux-gnu-
>   - make -j2
> - - ACCEL=tcg ./run_tests.sh
> - selftest-setup selftest-migration selftest-migration-skip spapr_hcall
> - rtas-get-time-of-day rtas-get-time-of-day-base rtas-set-time-of-day
> - emulator
> - | tee results.txt
> + - ACCEL=tcg MAX_SMP=8 ./run_tests.sh -g gitlab-ci | tee results.txt
> + - if grep -q FAIL results.txt ; then exit 1 ; fi
> + - ACCEL=tcg MAX_SMP=8 MACHINE=powernv ./run_tests.sh -g gitlab-ci | tee results.txt
>   - if grep -q FAIL results.txt ; then exit 1 ; fi
>  
>  build-ppc64le:
> @@ -82,11 +80,9 @@ build-ppc64le:
>   - dnf install -y qemu-system-ppc gcc-powerpc64-linux-gnu nmap-ncat
>   - ./configure --arch=ppc64 --endian=little --cross-prefix=powerpc64-linux-gnu-
>   - make -j2
> - - ACCEL=tcg ./run_tests.sh
> - selftest-setup selftest-migration selftest-migration-skip spapr_hcall
> - rtas-get-time-of-day rtas-get-time-of-day-base rtas-set-time-of-day
> - emulator
> - | tee results.txt
> + - ACCEL=tcg MAX_SMP=8 ./run_tests.sh -g gitlab-ci | tee results.txt
> + - if grep -q FAIL results.txt ; then exit 1 ; fi
> + - ACCEL=tcg MAX_SMP=8 MACHINE=powernv ./run_tests.sh -g gitlab-ci | tee results.txt
>   - if grep -q FAIL results.txt ; then exit 1 ; fi
>  

We're slowly migrating all tests like these to

 grep -q PASS results.txt && ! grep -q FAIL results.txt

Here's a good opportunity to change ppc's.

Thanks,
drew


Re: [kvm-unit-tests PATCH 30/32] configure: Make arch_libdir a first-class entity

2024-02-28 Thread Andrew Jones
On Mon, Feb 26, 2024 at 08:12:16PM +1000, Nicholas Piggin wrote:
> arch_libdir was brought in to improve the heuristic determination of
> the lib/ directory based on arch and testdir names, but it did not
> entirely clean that mess up.
> 
> Remove the arch_libdir->arch->testdir heuristic and just require
> everybody sets arch_libdir correctly. Fail if the lib/arch or
> lib/arch/asm directories can not be found.
> 
> Cc: Alexandru Elisei 
> Cc: Andrew Jones 
> Cc: Claudio Imbrenda 
> Cc: David Hildenbrand 
> Cc: Eric Auger 
> Cc: Janosch Frank 
> Cc: Laurent Vivier 
> Cc: Nico Böhr 
> Cc: Paolo Bonzini 
> Cc: Thomas Huth 
> Cc: k...@vger.kernel.org
> Cc: linux-s...@vger.kernel.org
> Cc: kvm...@lists.linux.dev
> Cc: kvm-ri...@lists.infradead.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Nicholas Piggin 
> ---
>  Makefile  |  2 +-
>  configure | 18 +-
>  2 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/Makefile b/Makefile
> index 4f35fffc6..4e0f54543 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -10,7 +10,7 @@ include config.mak
>  VPATH = $(SRCDIR)
>  
>  libdirs-get = $(shell [ -d "lib/$(1)" ] && echo "lib/$(1) lib/$(1)/asm")
> -ARCH_LIBDIRS := $(call libdirs-get,$(ARCH_LIBDIR)) $(call libdirs-get,$(TEST_DIR))
> +ARCH_LIBDIRS := $(call libdirs-get,$(ARCH_LIBDIR))
>  OBJDIRS := $(ARCH_LIBDIRS)
>  
>  DESTDIR := $(PREFIX)/share/kvm-unit-tests/
> diff --git a/configure b/configure
> index ae522c556..8c0e3506f 100755
> --- a/configure
> +++ b/configure
> @@ -199,7 +199,6 @@ fi
>  arch_name=$arch
>  [ "$arch" = "aarch64" ] && arch="arm64"
>  [ "$arch_name" = "arm64" ] && arch_name="aarch64"
> -arch_libdir=$arch
>  
>  if [ "$arch" = "riscv" ]; then
>  echo "riscv32 or riscv64 must be specified"
> @@ -264,8 +263,10 @@ fi
>  
>  if [ "$arch" = "i386" ] || [ "$arch" = "x86_64" ]; then
>  testdir=x86
> +arch_libdir=x86
>  elif [ "$arch" = "arm" ] || [ "$arch" = "arm64" ]; then
>  testdir=arm
> +arch_libdir=$arch
>  if [ "$target" = "qemu" ]; then
>  arm_uart_early_addr=0x0900
>  elif [ "$target" = "kvmtool" ]; then
> @@ -314,6 +315,7 @@ elif [ "$arch" = "arm" ] || [ "$arch" = "arm64" ]; then
>  fi
>  elif [ "$arch" = "ppc64" ]; then
>  testdir=powerpc
> +arch_libdir=ppc64
>  firmware="$testdir/boot_rom.bin"
>  if [ "$endian" != "little" ] && [ "$endian" != "big" ]; then
>  echo "You must provide endianness (big or little)!"
> @@ -324,6 +326,7 @@ elif [ "$arch" = "riscv32" ] || [ "$arch" = "riscv64" ]; then
>  arch_libdir=riscv
>  elif [ "$arch" = "s390x" ]; then
>  testdir=s390x
> +arch_libdir=s390x

Probably could have left the arch_libdir=$arch above and only added the
ppc64 line, but either way.

>  else
>  echo "arch $arch is not supported!"
>  arch=
> @@ -333,6 +336,10 @@ if [ ! -d "$srcdir/$testdir" ]; then
>  echo "$srcdir/$testdir does not exist!"
>  exit 1
>  fi
> +if [ ! -d "$srcdir/lib/$arch_libdir" ]; then
> +echo "$srcdir/lib/$arch_libdir does not exist!"
> +exit 1
> +fi
>  
>  if [ "$efi" = "y" ] && [ -f "$srcdir/$testdir/efi/run" ]; then
>  ln -fs "$srcdir/$testdir/efi/run" $testdir-run
> @@ -395,10 +402,11 @@ fi
>  # link lib/asm for the architecture
>  rm -f lib/asm
>  asm="asm-generic"
> -if [ -d "$srcdir/lib/$arch/asm" ]; then
> - asm="$srcdir/lib/$arch/asm"
> -elif [ -d "$srcdir/lib/$testdir/asm" ]; then
> - asm="$srcdir/lib/$testdir/asm"
> +if [ -d "$srcdir/lib/$arch_libdir/asm" ]; then
> +asm="$srcdir/lib/$arch_libdir/asm"
> +else
> +echo "$srcdir/lib/$arch_libdir/asm does not exist"
> +exit 1
>  fi
>  mkdir -p lib
>  ln -sf "$asm" lib/asm
> -- 
> 2.42.0
> 
>

Reviewed-by: Andrew Jones 

Thanks,
drew


Re: [kvm-unit-tests PATCH 29/32] configure: Fail on unknown arch

2024-02-28 Thread Andrew Jones
On Mon, Feb 26, 2024 at 08:12:15PM +1000, Nicholas Piggin wrote:
> configure will accept an unknown arch, and if it is the name of a
> directory in the source tree the command will silently succeed. Make
> it only accept supported arch names.
> 
> Also print the full path of a missing test directory to disambiguate
> the error in out of tree builds.
> 
> Cc: Alexandru Elisei 
> Cc: Andrew Jones 
> Cc: Claudio Imbrenda 
> Cc: David Hildenbrand 
> Cc: Eric Auger 
> Cc: Janosch Frank 
> Cc: Laurent Vivier 
> Cc: Nico Böhr 
> Cc: Paolo Bonzini 
> Cc: Thomas Huth 
> Cc: k...@vger.kernel.org
> Cc: linux-s...@vger.kernel.org
> Cc: kvm...@lists.linux.dev
> Cc: kvm-ri...@lists.infradead.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Nicholas Piggin 
> ---
>  configure | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/configure b/configure
> index 6907ccbbb..ae522c556 100755
> --- a/configure
> +++ b/configure
> @@ -45,7 +45,8 @@ usage() {
>   Usage: $0 [options]
>  
>   Options include:
> - --arch=ARCH            architecture to compile for ($arch)
> + --arch=ARCH            architecture to compile for ($arch). ARCH can be one of:
> +                        arm, arm64, i386, ppc64, riscv32, riscv64, s390x, x86_64
>   --processor=PROCESSOR  processor to compile for ($arch)
>   --target=TARGET        target platform that the tests will be running on (qemu or
>                          kvmtool, default is qemu) (arm/arm64 only)
> @@ -321,11 +322,15 @@ elif [ "$arch" = "ppc64" ]; then
>  elif [ "$arch" = "riscv32" ] || [ "$arch" = "riscv64" ]; then
>  testdir=riscv
>  arch_libdir=riscv
> +elif [ "$arch" = "s390x" ]; then
> +testdir=s390x
>  else
> -testdir=$arch
> +echo "arch $arch is not supported!"
> +arch=
> +usage
>  fi
>  if [ ! -d "$srcdir/$testdir" ]; then
> -echo "$testdir does not exist!"
> +echo "$srcdir/$testdir does not exist!"
>  exit 1
>  fi
>  
> -- 
> 2.42.0
>

Reviewed-by: Andrew Jones 


Re: [kvm-unit-tests PATCH 25/32] common/sieve: Support machines without MMU

2024-02-28 Thread Andrew Jones
On Mon, Feb 26, 2024 at 08:12:11PM +1000, Nicholas Piggin wrote:
> Not all powerpc CPUs provide MMU support. Define vm_available() that is
> true by default but archs can override it. Use this to run VM tests.
> 
> Cc: Paolo Bonzini 
> Cc: Thomas Huth 
> Cc: Andrew Jones 
> Cc: k...@vger.kernel.org
> Signed-off-by: Nicholas Piggin 
> ---
>  common/sieve.c  | 14 --
>  lib/ppc64/asm/mmu.h |  1 -
>  lib/ppc64/mmu.c |  2 +-
>  lib/vmalloc.c   |  7 +++
>  lib/vmalloc.h   |  2 ++
>  5 files changed, 18 insertions(+), 8 deletions(-)
>

Reviewed-by: Andrew Jones 


Re: [kvm-unit-tests PATCH 24/32] common/sieve: Use vmalloc.h for setup_mmu definition

2024-02-28 Thread Andrew Jones
On Mon, Feb 26, 2024 at 08:12:10PM +1000, Nicholas Piggin wrote:
> There is no good reason to put setup_vm in libcflat.h when it's
> defined in vmalloc.h.
> 
> Cc: Paolo Bonzini 
> Cc: Thomas Huth 
> Cc: Andrew Jones 
> Cc: Janosch Frank 
> Cc: Claudio Imbrenda 
> Cc: Nico Böhr 
> Cc: David Hildenbrand 
> Cc: k...@vger.kernel.org
> Cc: linux-s...@vger.kernel.org
> Signed-off-by: Nicholas Piggin 
> ---
>  common/sieve.c | 1 +
>  lib/libcflat.h | 2 --
>  lib/s390x/io.c | 1 +
>  lib/s390x/uv.h | 1 +
>  lib/x86/vm.h   | 1 +
>  s390x/mvpg.c   | 1 +
>  s390x/selftest.c   | 1 +
>  x86/pmu.c  | 1 +
>  x86/pmu_lbr.c  | 1 +
>  x86/vmexit.c   | 1 +
>  x86/vmware_backdoors.c | 1 +
>  11 files changed, 10 insertions(+), 2 deletions(-)
>

Acked-by: Andrew Jones 


Re: [kvm-unit-tests PATCH 23/32] powerpc: Add MMU support

2024-02-28 Thread Andrew Jones
On Mon, Feb 26, 2024 at 08:12:09PM +1000, Nicholas Piggin wrote:
> Add support for radix MMU, 4kB and 64kB pages.
> 
> This also adds MMU interrupt test cases, and runs the interrupts
> test entirely with MMU enabled if it is available (aside from
> machine check tests).
> 
> Signed-off-by: Nicholas Piggin 
> ---
>  configure |  39 +++--
>  lib/powerpc/asm/hcall.h   |   6 +
>  lib/powerpc/asm/processor.h   |   1 +
>  lib/powerpc/asm/reg.h |   3 +
>  lib/powerpc/asm/smp.h |   2 +
>  lib/powerpc/processor.c   |   9 ++
>  lib/powerpc/setup.c   |   4 +
>  lib/ppc64/asm/mmu.h   |  11 ++
>  lib/ppc64/asm/page.h  |  67 -
>  lib/ppc64/asm/pgtable-hwdef.h |  67 +
>  lib/ppc64/asm/pgtable.h   | 126 
>  lib/ppc64/mmu.c   | 273 ++
>  lib/ppc64/opal-calls.S|   4 +-
>  powerpc/Makefile.common   |   2 +
>  powerpc/Makefile.ppc64|   1 +
>  powerpc/interrupts.c  |  96 ++--
>  16 files changed, 684 insertions(+), 27 deletions(-)
>  create mode 100644 lib/ppc64/asm/mmu.h
>  create mode 100644 lib/ppc64/asm/pgtable-hwdef.h
>  create mode 100644 lib/ppc64/asm/pgtable.h
>  create mode 100644 lib/ppc64/mmu.c
> 
> diff --git a/configure b/configure
> index 05e6702ea..6907ccbbb 100755
> --- a/configure
> +++ b/configure
> @@ -222,29 +222,35 @@ fi
>  if [ -z "$page_size" ]; then
>  if [ "$efi" = 'y' ] && [ "$arch" = "arm64" ]; then
>  page_size="4096"
> -elif [ "$arch" = "arm64" ]; then
> +elif [ "$arch" = "arm64" ] || [ "$arch" = "ppc64" ]; then
>  page_size="65536"
>  elif [ "$arch" = "arm" ]; then
>  page_size="4096"
>  fi
>  else
> -if [ "$arch" != "arm64" ]; then
> -echo "--page-size is not supported for $arch"
> -usage
> -fi
> -
>  if [ "${page_size: -1}" = "K" ] || [ "${page_size: -1}" = "k" ]; then
>  page_size=$(( ${page_size%?} * 1024 ))
>  fi
> -if [ "$page_size" != "4096" ] && [ "$page_size" != "16384" ] &&
> -   [ "$page_size" != "65536" ]; then
> -echo "arm64 doesn't support page size of $page_size"
> +
> +if [ "$arch" = "arm64" ]; then
> +if [ "$page_size" != "4096" ] && [ "$page_size" != "16384" ] &&
> +   [ "$page_size" != "65536" ]; then
> +echo "arm64 doesn't support page size of $page_size"
> +usage
> +fi
> +if [ "$efi" = 'y' ] && [ "$page_size" != "4096" ]; then
> +echo "efi must use 4K pages"
> +exit 1
> +fi
> +elif [ "$arch" = "ppc64" ]; then
> +if [ "$page_size" != "4096" ] && [ "$page_size" != "65536" ]; then
> +echo "ppc64 doesn't support page size of $page_size"
> +usage
> +fi
> +else
> +echo "--page-size is not supported for $arch"
>  usage
>  fi
> -if [ "$efi" = 'y' ] && [ "$page_size" != "4096" ]; then
> -echo "efi must use 4K pages"
> -exit 1
> -fi
>  fi
>  
>  [ -z "$processor" ] && processor="$arch"
> @@ -444,6 +450,13 @@ cat <<EOF >> lib/config.h
>  
>  #define CONFIG_UART_EARLY_BASE ${arm_uart_early_addr}
>  #define CONFIG_ERRATA_FORCE ${errata_force}
> +
> +EOF
> +fi
> +
> +if [ "$arch" = "arm" ] || [ "$arch" = "arm64" ] || [ "$arch" = "ppc64" ]; then
> +cat <<EOF >> lib/config.h
> +
>  #define CONFIG_PAGE_SIZE _AC(${page_size}, UL)
>  
>  EOF

Ack for the configure changes.

Thanks,
drew
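
For reference, the page-size handling in the quoted configure hunk can be sketched as two small shell functions. This is only an illustration under assumptions: normalize_page_size and page_size_supported are hypothetical names that do not exist in configure, and the EFI 4K restriction for arm64 is left out.

```shell
#!/bin/sh
# Hypothetical helpers; these names do not exist in configure.

# Convert a page size such as "64K" or "64k" to bytes.
# Plain numbers are passed through unchanged.
normalize_page_size() {
    case "$1" in
        *[Kk]) echo $(( ${1%?} * 1024 )) ;;
        *)     echo "$1" ;;
    esac
}

# Return 0 if the given arch accepts the given page size in bytes.
# The sets mirror the quoted hunk: arm64 takes 4K/16K/64K, ppc64 takes 4K/64K.
page_size_supported() {
    case "$1:$2" in
        arm64:4096|arm64:16384|arm64:65536) return 0 ;;
        ppc64:4096|ppc64:65536)             return 0 ;;
        *)                                  return 1 ;;
    esac
}

size=$(normalize_page_size 64K)
if page_size_supported ppc64 "$size"; then
    echo "ppc64 accepts a page size of $size"
fi
```

Keeping the per-arch sets in one case statement makes it cheap to add another arch later, which is roughly what the restructured if/elif chain in the patch achieves.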


Re: [PATCH v2 5/9] mm: Initialize struct vm_unmapped_area_info

2024-02-28 Thread Kirill A. Shutemov
On Mon, Feb 26, 2024 at 11:09:47AM -0800, Rick Edgecombe wrote:
> diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c
> index 5db88b627439..dd6801bb9240 100644
> --- a/arch/alpha/kernel/osf_sys.c
> +++ b/arch/alpha/kernel/osf_sys.c
> @@ -1218,7 +1218,7 @@ static unsigned long
>  arch_get_unmapped_area_1(unsigned long addr, unsigned long len,
>unsigned long limit)
>  {
> - struct vm_unmapped_area_info info;
> + struct vm_unmapped_area_info info = {};
>  
>   info.flags = 0;
>   info.length = len;

Can we go a step further and actually move the initialization inside the
initializer? Something like below.

I understand that it is substantially more work, but I think it is useful.

diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c
index 5db88b627439..c40ddede3b13 100644
--- a/arch/alpha/kernel/osf_sys.c
+++ b/arch/alpha/kernel/osf_sys.c
@@ -1218,14 +1218,12 @@ static unsigned long
 arch_get_unmapped_area_1(unsigned long addr, unsigned long len,
 unsigned long limit)
 {
-   struct vm_unmapped_area_info info;
+   struct vm_unmapped_area_info info = {
+   .length = len,
+   .low_limit = addr,
+   .high_limit = limit,
+   };

-   info.flags = 0;
-   info.length = len;
-   info.low_limit = addr;
-   info.high_limit = limit;
-   info.align_mask = 0;
-   info.align_offset = 0;
    return vm_unmapped_area(&info);
 }

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


Re: [kvm-unit-tests PATCH 10/32] scripts: Accommodate powerpc powernv machine differences

2024-02-28 Thread Andrew Jones
On Mon, Feb 26, 2024 at 08:11:56PM +1000, Nicholas Piggin wrote:
> The QEMU powerpc powernv machine has minor differences that must be
> accommodated for in output parsing:
> 
> - Summary parsing must search more lines of output for the summary
>   line, to accommodate OPAL message on shutdown.
> - Premature failure testing must tolerate case differences in kernel
>   load error message.
> 
> Acked-by: Thomas Huth 
> Signed-off-by: Nicholas Piggin 
> ---
>  scripts/runtime.bash | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/scripts/runtime.bash b/scripts/runtime.bash
> index 8f9672d0d..bb32c0d10 100644
> --- a/scripts/runtime.bash
> +++ b/scripts/runtime.bash
> @@ -9,7 +9,7 @@ FAIL() { echo -ne "\e[31mFAIL\e[0m"; }
>  extract_summary()
>  {
>  local cr=$'\r'
> -tail -3 | grep '^SUMMARY: ' | sed 's/^SUMMARY: /(/;s/'"$cr"'\{0,1\}$/)/'
> +tail -5 | grep '^SUMMARY: ' | sed 's/^SUMMARY: /(/;s/'"$cr"'\{0,1\}$/)/'
>  }
>  
>  # We assume that QEMU is going to work if it tried to load the kernel
> @@ -18,7 +18,7 @@ premature_failure()
>  local log="$(eval "$(get_cmdline _NO_FILE_4Uhere_)" 2>&1)"
>  
>  echo "$log" | grep "_NO_FILE_4Uhere_" |
> -grep -q -e "could not \(load\|open\) kernel" -e "error loading" &&
> +grep -q -e "[Cc]ould not \(load\|open\) kernel" -e "error loading" &&
>  return 1
>  
>  RUNTIME_log_stderr <<< "$log"
> -- 
> 2.42.0
>

Acked-by: Andrew Jones 
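
The effect of widening the window from tail -3 to tail -5 can be seen with a made-up log. This is a minimal sketch under assumptions: extract_summary_n, kernel_load_failed and sample_log are hypothetical names, the log lines are invented, the carriage-return handling of the real extract_summary is omitted, and the grep uses ERE (-E) rather than the \| form in the patch.

```shell
#!/bin/sh
# Hypothetical variant of extract_summary with a configurable window.
# Reads a log on stdin and prints the summary in parentheses, or nothing
# when no SUMMARY line falls within the last N lines.
extract_summary_n() {
    tail -n "$1" | grep '^SUMMARY: ' | sed 's/^SUMMARY: /(/;s/$/)/'
}

# Case-tolerant check for a kernel load failure, written with ERE (-E)
# instead of the \| alternation used in the patch.
kernel_load_failed() {
    grep -E -q -e '[Cc]ould not (load|open) kernel' -e 'error loading'
}

# Invented log: the summary is followed by three trailing lines, standing in
# for firmware messages printed on shutdown.
sample_log() {
    printf '%s\n' \
        'PASS: example test' \
        'SUMMARY: 1 tests' \
        'trailing message 1' \
        'trailing message 2' \
        'trailing message 3'
}

echo "window of 3: $(sample_log | extract_summary_n 3)"
echo "window of 5: $(sample_log | extract_summary_n 5)"
```

With three trailing lines after the summary, a window of 3 never sees the SUMMARY line, while a window of 5 does.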


Re: [kvm-unit-tests PATCH 17/32] arch-run: Fix handling multiple exit status messages

2024-02-28 Thread Andrew Jones
On Mon, Feb 26, 2024 at 08:12:03PM +1000, Nicholas Piggin wrote:
> In SMP tests, it's possible for multiple CPUs to print an exit
> message if they abort concurrently, confusing the harness:
> 
>   EXIT: STATUS=127
> 
>   EXIT: STATUS=127
>   scripts/arch-run.bash: line 85: [: too many arguments
>   scripts/arch-run.bash: line 93: return: too many arguments
> 
> lib/arch code should probably serialise this to prevent it, but
> at the moment not all do. So make the parser handle this by
> just looking at the first EXIT.
> 
> Cc: Paolo Bonzini 
> Cc: Thomas Huth 
> Cc: Andrew Jones 
> Cc: k...@vger.kernel.org
> Signed-off-by: Nicholas Piggin 
> ---
>  scripts/arch-run.bash | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/scripts/arch-run.bash b/scripts/arch-run.bash
> index 5c7e72036..4af670f1c 100644
> --- a/scripts/arch-run.bash
> +++ b/scripts/arch-run.bash
> @@ -79,7 +79,7 @@ run_qemu_status ()
>   exec {stdout}>&-
>  
>   if [ $ret -eq 1 ]; then
> - testret=$(grep '^EXIT: ' <<<"$lines" | sed 's/.*STATUS=\([0-9][0-9]*\).*/\1/')
> + testret=$(grep '^EXIT: ' <<<"$lines" | head -n1 | sed 's/.*STATUS=\([0-9][0-9]*\).*/\1/')
>   if [ "$testret" ]; then
>   if [ $testret -eq 1 ]; then
>   ret=0
> -- 
> 2.42.0
>

Acked-by: Andrew Jones 
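
A minimal sketch of the parsing change, assuming the output contains two EXIT lines as in the commit message. first_exit_status and duplicated_output are hypothetical names used only for this illustration, and the function reads stdin rather than the $lines variable used by run_qemu_status.

```shell
#!/bin/sh
# Hypothetical helper: read test output on stdin and print the STATUS value
# from the first EXIT line only, ignoring any later duplicates.
first_exit_status() {
    grep '^EXIT: ' | head -n 1 | sed 's/.*STATUS=\([0-9][0-9]*\).*/\1/'
}

# Invented output from two CPUs aborting at the same time.
duplicated_output() {
    printf '%s\n' 'EXIT: STATUS=127' '' 'EXIT: STATUS=127'
}

testret=$(duplicated_output | first_exit_status)
# A single value keeps the numeric test well-formed.
if [ "$testret" -eq 127 ]; then
    echo "parsed one status: $testret"
fi
```

Without the head -n 1 stage the substitution would yield two values, and an unquoted numeric test on it would fail with the "too many arguments" errors quoted above.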


Re: [kvm-unit-tests PATCH 09/32] scripts: allow machine option to be specified in unittests.cfg

2024-02-28 Thread Andrew Jones
On Mon, Feb 26, 2024 at 08:11:55PM +1000, Nicholas Piggin wrote:
> This allows different machines with different requirements to be
> supported by run_tests.sh, similarly to how different accelerators
> are handled.
> 
> Acked-by: Thomas Huth 
> Signed-off-by: Nicholas Piggin 
> ---
>  scripts/common.bash  |  8 ++--
>  scripts/runtime.bash | 16 
>  2 files changed, 18 insertions(+), 6 deletions(-)

Please also update the unittests.cfg documentation. Currently that
documentation lives in the header of each unittests.cfg file, but
we could maybe change each file to have a single line which points
at a single document.

Thanks,
drew


Re: [kvm-unit-tests PATCH 04/32] powerpc: interrupt stack backtracing

2024-02-28 Thread Andrew Jones
On Mon, Feb 26, 2024 at 08:11:50PM +1000, Nicholas Piggin wrote:
> Add support for backtracing across interrupt stacks, and
> add interrupt frame backtrace for unhandled interrupts.
> 
> Signed-off-by: Nicholas Piggin 
> ---
>  lib/powerpc/processor.c |  4 ++-
>  lib/ppc64/asm/stack.h   |  3 +++
>  lib/ppc64/stack.c   | 55 +
>  powerpc/Makefile.ppc64  |  1 +
>  powerpc/cstart64.S  |  7 --
>  5 files changed, 67 insertions(+), 3 deletions(-)
>  create mode 100644 lib/ppc64/stack.c
> 
> diff --git a/lib/powerpc/processor.c b/lib/powerpc/processor.c
> index ad0d95666..114584024 100644
> --- a/lib/powerpc/processor.c
> +++ b/lib/powerpc/processor.c
> @@ -51,7 +51,9 @@ void do_handle_exception(struct pt_regs *regs)
>   return;
>   }
>  
> - printf("unhandled cpu exception %#lx at NIA:0x%016lx MSR:0x%016lx\n", regs->trap, regs->nip, regs->msr);
> + printf("Unhandled cpu exception %#lx at NIA:0x%016lx MSR:0x%016lx\n",
> + regs->trap, regs->nip, regs->msr);
> + dump_frame_stack((void *)regs->nip, (void *)regs->gpr[1]);
>   abort();
>  }
>  
> diff --git a/lib/ppc64/asm/stack.h b/lib/ppc64/asm/stack.h
> index 9734bbb8f..94fd1021c 100644
> --- a/lib/ppc64/asm/stack.h
> +++ b/lib/ppc64/asm/stack.h
> @@ -5,4 +5,7 @@
>  #error Do not directly include <asm/stack.h>. Just use <stack.h>.
>  #endif
>  
> +#define HAVE_ARCH_BACKTRACE
> +#define HAVE_ARCH_BACKTRACE_FRAME
> +
>  #endif
> diff --git a/lib/ppc64/stack.c b/lib/ppc64/stack.c
> new file mode 100644
> index 0..fcb7fa860
> --- /dev/null
> +++ b/lib/ppc64/stack.c
> @@ -0,0 +1,55 @@
> +#include 
> +#include 
> +#include 
> +
> +extern char exception_stack_marker[];
> +
> +int backtrace_frame(const void *frame, const void **return_addrs, int max_depth)
> +{
> + static int walking;
> + int depth = 0;
> + const unsigned long *bp = (unsigned long *)frame;
> + void *return_addr;
> +
> + asm volatile("" ::: "lr"); /* Force it to save LR */
> +
> + if (walking) {
> + printf("RECURSIVE STACK WALK!!!\n");
> + return 0;
> + }
> + walking = 1;
> +
> + bp = (unsigned long *)bp[0];
> + return_addr = (void *)bp[2];
> +
> + for (depth = 0; bp && depth < max_depth; depth++) {
> + return_addrs[depth] = return_addr;
> + if (return_addrs[depth] == 0)
> + break;
> + if (return_addrs[depth] == exception_stack_marker) {
> + struct pt_regs *regs;
> +
> + regs = (void *)bp + STACK_FRAME_OVERHEAD;
> + bp = (unsigned long *)bp[0];
> + /* Represent interrupt frame with vector number */
> + return_addr = (void *)regs->trap;
> + if (depth + 1 < max_depth) {
> + depth++;
> + return_addrs[depth] = return_addr;
> + return_addr = (void *)regs->nip;
> + }
> + } else {
> + bp = (unsigned long *)bp[0];
> + return_addr = (void *)bp[2];
> + }
> + }
> +
> + walking = 0;
> + return depth;
> +}
> +
> +int backtrace(const void **return_addrs, int max_depth)
> +{
> + return backtrace_frame(__builtin_frame_address(0), return_addrs,
> +max_depth);
> +}

I'm about to post a series which has a couple of treewide tracing changes
in it. Depending on which series goes first, the other will need to
accommodate.

Thanks,
drew


[netdev] Build failure on powerpc

2024-02-28 Thread Tasmiya Nalatwad

Greetings,

[netdev] Build failure on powerpc
The latest netdev tree (6.8.0-rc5-auto-g1ce7d306ea63) fails to build on
powerpc; the traces are below.


--- Traces---

./include/linux/dpll.h: In function ‘netdev_dpll_pin’:
./include/linux/rcupdate.h:439:9: error: dereferencing pointer to 
incomplete type ‘struct dpll_pin’

  typeof(*p) *local = (typeof(*p) *__force)READ_ONCE(p); \
 ^
./include/linux/rcupdate.h:587:2: note: in expansion of macro 
‘__rcu_dereference_check’

  __rcu_dereference_check((p), __UNIQUE_ID(rcu), \
  ^~~
./include/linux/rtnetlink.h:70:2: note: in expansion of macro 
‘rcu_dereference_check’

  rcu_dereference_check(p, lockdep_rtnl_is_held())
  ^
./include/linux/dpll.h:175:9: note: in expansion of macro 
‘rcu_dereference_rtnl’

  return rcu_dereference_rtnl(dev->dpll_pin);
 ^~~~
make[4]: *** [scripts/Makefile.build:243: drivers/dpll/dpll_core.o] Error 1
make[4]: *** Waiting for unfinished jobs
  AR  net/mpls/built-in.a
  AR  net/l3mdev/built-in.a
In file included from ./include/linux/rbtree.h:24,
 from ./include/linux/mm_types.h:11,
 from ./include/linux/mmzone.h:22,
 from ./include/linux/gfp.h:7,
 from ./include/linux/umh.h:4,
 from ./include/linux/kmod.h:9,
 from ./include/linux/module.h:17,
 from drivers/dpll/dpll_netlink.c:9:
./include/linux/dpll.h: In function ‘netdev_dpll_pin’:
./include/linux/rcupdate.h:439:9: error: dereferencing pointer to 
incomplete type ‘struct dpll_pin’

  typeof(*p) *local = (typeof(*p) *__force)READ_ONCE(p); \
 ^
./include/linux/rcupdate.h:587:2: note: in expansion of macro 
‘__rcu_dereference_check’

  __rcu_dereference_check((p), __UNIQUE_ID(rcu), \
  ^~~
./include/linux/rtnetlink.h:70:2: note: in expansion of macro 
‘rcu_dereference_check’

  rcu_dereference_check(p, lockdep_rtnl_is_held())
  ^
./include/linux/dpll.h:175:9: note: in expansion of macro 
‘rcu_dereference_rtnl’

  return rcu_dereference_rtnl(dev->dpll_pin);
 ^~~~
make[4]: *** [scripts/Makefile.build:243: drivers/dpll/dpll_netlink.o] 
Error 1

make[3]: *** [scripts/Makefile.build:481: drivers/dpll] Error 2
make[3]: *** Waiting for unfinished jobs
In file included from ./arch/powerpc/include/generated/asm/rwonce.h:1,
 from ./include/linux/compiler.h:251,
 from ./include/linux/instrumented.h:10,
 from ./include/linux/uaccess.h:6,
 from net/core/dev.c:71:
net/core/dev.c: In function ‘netdev_dpll_pin_assign’:
./include/linux/rcupdate.h:462:36: error: dereferencing pointer to 
incomplete type ‘struct dpll_pin’

 #define RCU_INITIALIZER(v) (typeof(*(v)) __force __rcu *)(v)
    ^~~~
./include/asm-generic/rwonce.h:55:33: note: in definition of macro 
‘__WRITE_ONCE’

  *(volatile typeof(x) *)&(x) = (val);    \
 ^~~
./arch/powerpc/include/asm/barrier.h:76:2: note: in expansion of macro 
‘WRITE_ONCE’

  WRITE_ONCE(*p, v);  \
  ^~
./include/asm-generic/barrier.h:172:55: note: in expansion of macro 
‘__smp_store_release’
 #define smp_store_release(p, v) do { kcsan_release(); 
__smp_store_release(p, v); } while (0)

^~~
./include/linux/rcupdate.h:503:3: note: in expansion of macro 
‘smp_store_release’

   smp_store_release(&p, RCU_INITIALIZER((typeof(p))_r_a_p__v)); \
   ^
./include/linux/rcupdate.h:503:25: note: in expansion of macro 
‘RCU_INITIALIZER’

   smp_store_release(&p, RCU_INITIALIZER((typeof(p))_r_a_p__v)); \
 ^~~
net/core/dev.c:9081:2: note: in expansion of macro ‘rcu_assign_pointer’
  rcu_assign_pointer(dev->dpll_pin, dpll_pin);
  ^~
make[4]: *** [scripts/Makefile.build:243: net/core/dev.o] Error 1
make[4]: *** Waiting for unfinished jobs
  AR  drivers/net/ethernet/built-in.a
  AR  drivers/net/built-in.a
  AR  net/dcb/built-in.a
  AR  net/netlabel/built-in.a
  AR  net/strparser/built-in.a
  AR  net/handshake/built-in.a
  GEN lib/test_fortify.log
  AR  net/8021q/built-in.a
  AR  net/nsh/built-in.a
  AR  net/unix/built-in.a
  CC  lib/string.o
  AR  net/packet/built-in.a
  AR  net/switchdev/built-in.a
  AR  lib/lib.a
  AR  net/mptcp/built-in.a
  AR  net/devlink/built-in.a
In file included from ./include/linux/rbtree.h:24,
 from ./include/linux/mm_types.h:11,
 from ./include/linux/mmzone.h:22,
 from ./include/linux/gfp.h:7,
 from ./include/linux/umh.h:4,
 from ./include/linux/kmod.h:9,
 from ./include/linux/module.h:17,
 from net/core/rtnetlink.c:17:
./include/linux/dpll.h: In function ‘netdev_dpll_pin’:
./include/linux/rcupdate.h:439:9: error: 

Re: [revert 0d60d8df6f49] [netdev/net] [6.8-rc5] Build Failure

2024-02-28 Thread Tasmiya Nalatwad

Greetings,

[revert 0d60d8df6f49] [net/net-next] [6.8-rc5] Build Failure

Reverting the commit below fixes the issue.

commit 0d60d8df6f493bb46bf5db40d39dd60a1bafdd4e
    dpll: rely on rcu for netdev_dpll_pin()

--- Traces ---

./include/linux/dpll.h: In function ‘netdev_dpll_pin’:
./include/linux/rcupdate.h:439:9: error: dereferencing pointer to 
incomplete type ‘struct dpll_pin’

  typeof(*p) *local = (typeof(*p) *__force)READ_ONCE(p); \
 ^
./include/linux/rcupdate.h:587:2: note: in expansion of macro 
‘__rcu_dereference_check’

  __rcu_dereference_check((p), __UNIQUE_ID(rcu), \
  ^~~
./include/linux/rtnetlink.h:70:2: note: in expansion of macro 
‘rcu_dereference_check’

  rcu_dereference_check(p, lockdep_rtnl_is_held())
  ^
./include/linux/dpll.h:175:9: note: in expansion of macro 
‘rcu_dereference_rtnl’

  return rcu_dereference_rtnl(dev->dpll_pin);
 ^~~~
make[4]: *** [scripts/Makefile.build:243: drivers/dpll/dpll_core.o] 
Error 1

make[4]: *** Waiting for unfinished jobs
  AR  net/mpls/built-in.a
  AR  net/l3mdev/built-in.a
In file included from ./include/linux/rbtree.h:24,
 from ./include/linux/mm_types.h:11,
 from ./include/linux/mmzone.h:22,
 from ./include/linux/gfp.h:7,
 from ./include/linux/umh.h:4,
 from ./include/linux/kmod.h:9,
 from ./include/linux/module.h:17,
 from drivers/dpll/dpll_netlink.c:9:
./include/linux/dpll.h: In function ‘netdev_dpll_pin’:
./include/linux/rcupdate.h:439:9: error: dereferencing pointer to 
incomplete type ‘struct dpll_pin’

  typeof(*p) *local = (typeof(*p) *__force)READ_ONCE(p); \
 ^
./include/linux/rcupdate.h:587:2: note: in expansion of macro 
‘__rcu_dereference_check’

  __rcu_dereference_check((p), __UNIQUE_ID(rcu), \
  ^~~
./include/linux/rtnetlink.h:70:2: note: in expansion of macro 
‘rcu_dereference_check’

  rcu_dereference_check(p, lockdep_rtnl_is_held())
  ^
./include/linux/dpll.h:175:9: note: in expansion of macro 
‘rcu_dereference_rtnl’

  return rcu_dereference_rtnl(dev->dpll_pin);
 ^~~~
make[4]: *** [scripts/Makefile.build:243: drivers/dpll/dpll_netlink.o] 
Error 1

make[3]: *** [scripts/Makefile.build:481: drivers/dpll] Error 2
make[3]: *** Waiting for unfinished jobs
In file included from ./arch/powerpc/include/generated/asm/rwonce.h:1,
 from ./include/linux/compiler.h:251,
 from ./include/linux/instrumented.h:10,
 from ./include/linux/uaccess.h:6,
 from net/core/dev.c:71:
net/core/dev.c: In function ‘netdev_dpll_pin_assign’:
./include/linux/rcupdate.h:462:36: error: dereferencing pointer to 
incomplete type ‘struct dpll_pin’

 #define RCU_INITIALIZER(v) (typeof(*(v)) __force __rcu *)(v)
    ^~~~
./include/asm-generic/rwonce.h:55:33: note: in definition of macro 
‘__WRITE_ONCE’

  *(volatile typeof(x) *)&(x) = (val);    \
 ^~~
./arch/powerpc/include/asm/barrier.h:76:2: note: in expansion of macro 
‘WRITE_ONCE’

  WRITE_ONCE(*p, v);  \
  ^~
./include/asm-generic/barrier.h:172:55: note: in expansion of macro 
‘__smp_store_release’
 #define smp_store_release(p, v) do { kcsan_release(); 
__smp_store_release(p, v); } while (0)

^~~
./include/linux/rcupdate.h:503:3: note: in expansion of macro 
‘smp_store_release’

   smp_store_release(&p, RCU_INITIALIZER((typeof(p))_r_a_p__v)); \
   ^
./include/linux/rcupdate.h:503:25: note: in expansion of macro 
‘RCU_INITIALIZER’

   smp_store_release(&p, RCU_INITIALIZER((typeof(p))_r_a_p__v)); \
 ^~~
net/core/dev.c:9081:2: note: in expansion of macro ‘rcu_assign_pointer’
  rcu_assign_pointer(dev->dpll_pin, dpll_pin);
  ^~
make[4]: *** [scripts/Makefile.build:243: net/core/dev.o] Error 1
make[4]: *** Waiting for unfinished jobs
  AR  drivers/net/ethernet/built-in.a
  AR  drivers/net/built-in.a
  AR  net/dcb/built-in.a
  AR  net/netlabel/built-in.a
  AR  net/strparser/built-in.a
  AR  net/handshake/built-in.a
  GEN lib/test_fortify.log
  AR  net/8021q/built-in.a
  AR  net/nsh/built-in.a
  AR  net/unix/built-in.a
  CC  lib/string.o
  AR  net/packet/built-in.a
  AR  net/switchdev/built-in.a
  AR  lib/lib.a
  AR  net/mptcp/built-in.a
  AR  net/devlink/built-in.a
In file included from ./include/linux/rbtree.h:24,
 from ./include/linux/mm_types.h:11,
 from ./include/linux/mmzone.h:22,
 from ./include/linux/gfp.h:7,
 from ./include/linux/umh.h:4,
 from ./include/linux/kmod.h:9,
 from ./include/linux/module.h:17,
 from net/core/rtnetlink.c:17:
./include/linux/dpll.h: 

Re: [PATCH 0/5] mm/treewide: Replace pXd_large() with pXd_leaf()

2024-02-28 Thread Peter Xu
On Wed, Feb 28, 2024 at 09:50:52AM +0000, Christophe Leroy wrote:
> On 28/02/2024 at 09:53, pet...@redhat.com wrote:
> > From: Peter Xu 
> > 
> > [based on latest akpm/mm-unstable, commit 1274e7646240]
> > 
> > These two APIs are almost always the same.  It's confusing to have both of
> > them.  Merge them into one.  Here I used pXd_leaf() only because pXd_leaf()
> > is a global API which is always defined, while pXd_large() is not.
> > 
> > We have yet one more API that is similar which is pXd_huge(), but that's
> > even trickier, so let's do it step by step.
> > 
> > Some caution is needed on both x86 and ppc: x86 is currently the only
> > user of p4d_large(), while ppc used to define pXd_large() only with THP,
> > which is not the case for pXd_leaf().  For the rest of the archs, afaict
> > they're 100% identical.
> 
> Maybe this would also be a good opportunity to replace pmd_is_leaf() with
> pmd_leaf(), and the same for pud_is_leaf().

Sure.  While I wait for some more comments, I can add one on top when I
repost.  Thanks,

-- 
Peter Xu



Re: [PATCH 0/5] mm/treewide: Replace pXd_large() with pXd_leaf()

2024-02-28 Thread Christophe Leroy


On 28/02/2024 at 09:53, pet...@redhat.com wrote:
> From: Peter Xu 
> 
> [based on latest akpm/mm-unstable, commit 1274e7646240]
> 
> These two APIs are almost always the same.  It's confusing to have both of
> them.  Merge them into one.  Here I used pXd_leaf() only because pXd_leaf()
> is a global API which is always defined, while pXd_large() is not.
> 
> We have yet one more API that is similar which is pXd_huge(), but that's
> even trickier, so let's do it step by step.
> 
> Some caution is needed on both x86 and ppc: x86 is currently the only
> user of p4d_large(), while ppc used to define pXd_large() only with THP,
> which is not the case for pXd_leaf().  For the rest of the archs, afaict
> they're 100% identical.

Maybe this would also be a good opportunity to replace pmd_is_leaf() with
pmd_leaf(), and the same for pud_is_leaf().

Christophe

> 
> Only lightly tested on x86.
> 
> Please have a look, thanks.
> 
> Peter Xu (5):
>mm/ppc: Define pXd_large() with pXd_leaf()
>mm/x86: Replace p4d_large() with p4d_leaf()
>mm/treewide: Replace pmd_large() with pmd_leaf()
>mm/treewide: Replace pud_large() with pud_leaf()
>mm/treewide: Drop pXd_large()
> 
>   arch/arm/include/asm/pgtable-2level.h|  1 -
>   arch/arm/include/asm/pgtable-3level.h|  1 -
>   arch/arm/mm/dump.c   |  4 ++--
>   arch/powerpc/include/asm/book3s/64/pgtable.h | 14 --
>   arch/powerpc/include/asm/pgtable.h   |  4 
>   arch/powerpc/mm/book3s64/pgtable.c   |  4 ++--
>   arch/powerpc/mm/book3s64/radix_pgtable.c |  2 +-
>   arch/powerpc/mm/pgtable_64.c |  2 +-
>   arch/s390/boot/vmem.c|  4 ++--
>   arch/s390/include/asm/pgtable.h  | 20 ++--
>   arch/s390/mm/gmap.c  | 14 +++---
>   arch/s390/mm/hugetlbpage.c   |  6 +++---
>   arch/s390/mm/pageattr.c  |  4 ++--
>   arch/s390/mm/pgtable.c   |  8 
>   arch/s390/mm/vmem.c  | 12 ++--
>   arch/sparc/include/asm/pgtable_64.h  |  8 
>   arch/sparc/mm/init_64.c  |  6 +++---
>   arch/x86/boot/compressed/ident_map_64.c  |  2 +-
>   arch/x86/include/asm/pgtable.h   | 15 +++
>   arch/x86/kvm/mmu/mmu.c   |  4 ++--
>   arch/x86/mm/fault.c  | 16 
>   arch/x86/mm/ident_map.c  |  2 +-
>   arch/x86/mm/init_32.c|  2 +-
>   arch/x86/mm/init_64.c| 14 +++---
>   arch/x86/mm/kasan_init_64.c  |  4 ++--
>   arch/x86/mm/mem_encrypt_identity.c   |  6 +++---
>   arch/x86/mm/pat/set_memory.c | 14 +++---
>   arch/x86/mm/pgtable.c|  4 ++--
>   arch/x86/mm/pti.c|  8 
>   arch/x86/power/hibernate.c   |  6 +++---
>   arch/x86/xen/mmu_pv.c| 10 +-
>   drivers/misc/sgi-gru/grufault.c  |  2 +-
>   32 files changed, 101 insertions(+), 122 deletions(-)
> 


[PATCH 3/5] mm/treewide: Replace pmd_large() with pmd_leaf()

2024-02-28 Thread peterx
From: Peter Xu 

pmd_large() is always defined as pmd_leaf().  Merge their usages.  pmd_leaf()
was chosen because it is a global API, while pmd_large() is not.

Signed-off-by: Peter Xu 
---
 arch/arm/mm/dump.c   |  4 ++--
 arch/powerpc/mm/book3s64/pgtable.c   |  2 +-
 arch/powerpc/mm/book3s64/radix_pgtable.c |  2 +-
 arch/powerpc/mm/pgtable_64.c |  2 +-
 arch/s390/boot/vmem.c|  2 +-
 arch/s390/include/asm/pgtable.h  |  8 
 arch/s390/mm/gmap.c  | 12 ++--
 arch/s390/mm/hugetlbpage.c   |  2 +-
 arch/s390/mm/pageattr.c  |  2 +-
 arch/s390/mm/pgtable.c   |  6 +++---
 arch/s390/mm/vmem.c  |  6 +++---
 arch/sparc/mm/init_64.c  |  4 ++--
 arch/x86/boot/compressed/ident_map_64.c  |  2 +-
 arch/x86/kvm/mmu/mmu.c   |  2 +-
 arch/x86/mm/fault.c  |  8 
 arch/x86/mm/init_32.c|  2 +-
 arch/x86/mm/init_64.c|  8 
 arch/x86/mm/kasan_init_64.c  |  2 +-
 arch/x86/mm/mem_encrypt_identity.c   |  4 ++--
 arch/x86/mm/pat/set_memory.c |  4 ++--
 arch/x86/mm/pgtable.c|  2 +-
 arch/x86/mm/pti.c|  4 ++--
 arch/x86/power/hibernate.c   |  2 +-
 arch/x86/xen/mmu_pv.c|  4 ++--
 drivers/misc/sgi-gru/grufault.c  |  2 +-
 25 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/arch/arm/mm/dump.c b/arch/arm/mm/dump.c
index a9381095ab36..cd032522d902 100644
--- a/arch/arm/mm/dump.c
+++ b/arch/arm/mm/dump.c
@@ -349,12 +349,12 @@ static void walk_pmd(struct pg_state *st, pud_t *pud, unsigned long start)
for (i = 0; i < PTRS_PER_PMD; i++, pmd++) {
addr = start + i * PMD_SIZE;
domain = get_domain_name(pmd);
-   if (pmd_none(*pmd) || pmd_large(*pmd) || !pmd_present(*pmd))
+   if (pmd_none(*pmd) || pmd_leaf(*pmd) || !pmd_present(*pmd))
note_page(st, addr, 4, pmd_val(*pmd), domain);
else
walk_pte(st, pmd, addr, domain);
 
-   if (SECTION_SIZE < PMD_SIZE && pmd_large(pmd[1])) {
+   if (SECTION_SIZE < PMD_SIZE && pmd_leaf(pmd[1])) {
addr += SECTION_SIZE;
pmd++;
domain = get_domain_name(pmd);
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 3438ab72c346..45f526547b27 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -113,7 +113,7 @@ void set_pmd_at(struct mm_struct *mm, unsigned long addr,
 
WARN_ON(pte_hw_valid(pmd_pte(*pmdp)) && !pte_protnone(pmd_pte(*pmdp)));
assert_spin_locked(pmd_lockptr(mm, pmdp));
-   WARN_ON(!(pmd_large(pmd)));
+   WARN_ON(!(pmd_leaf(pmd)));
 #endif
trace_hugepage_set_pmd(addr, pmd_val(pmd));
return set_pte_at(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd));
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index c6a4ac766b2b..4ef39c133777 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -924,7 +924,7 @@ bool vmemmap_can_optimize(struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
 int __meminit vmemmap_check_pmd(pmd_t *pmdp, int node,
unsigned long addr, unsigned long next)
 {
-   int large = pmd_large(*pmdp);
+   int large = pmd_leaf(*pmdp);
 
if (large)
vmemmap_verify(pmdp_ptep(pmdp), node, addr, next);
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 1b366526f4f2..6c2cdea340df 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -132,7 +132,7 @@ struct page *pmd_page(pmd_t pmd)
 * enabled so these checks can't be used.
 */
if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
-   VM_WARN_ON(!(pmd_large(pmd) || pmd_huge(pmd)));
+   VM_WARN_ON(!(pmd_leaf(pmd) || pmd_huge(pmd)));
return pte_page(pmd_pte(pmd));
}
return virt_to_page(pmd_page_vaddr(pmd));
diff --git a/arch/s390/boot/vmem.c b/arch/s390/boot/vmem.c
index e3a4500a5a75..348ab02b1028 100644
--- a/arch/s390/boot/vmem.c
+++ b/arch/s390/boot/vmem.c
@@ -333,7 +333,7 @@ static void pgtable_pmd_populate(pud_t *pud, unsigned long addr, unsigned long e
}
pte = boot_pte_alloc();
pmd_populate(&init_mm, pmd, pte);
-   } else if (pmd_large(*pmd)) {
+   } else if (pmd_leaf(*pmd)) {
continue;
}
pgtable_pte_populate(pmd, addr, next, mode);
diff --git 

[PATCH 5/5] mm/treewide: Drop pXd_large()

2024-02-28 Thread peterx
From: Peter Xu 

They're not used anymore; drop all of them.

Signed-off-by: Peter Xu 
---
 arch/arm/include/asm/pgtable-2level.h|  1 -
 arch/arm/include/asm/pgtable-3level.h|  1 -
 arch/powerpc/include/asm/book3s/64/pgtable.h |  2 --
 arch/powerpc/include/asm/pgtable.h   |  4 
 arch/s390/include/asm/pgtable.h  |  8 
 arch/sparc/include/asm/pgtable_64.h  |  8 
 arch/x86/include/asm/pgtable.h   | 15 +++
 7 files changed, 15 insertions(+), 24 deletions(-)

diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
index ce543cd9380c..b0a262566eb9 100644
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h
@@ -213,7 +213,6 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long 
addr)
 
 #define pmd_pfn(pmd)   (__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))
 
-#define pmd_large(pmd) (pmd_val(pmd) & 2)
 #define pmd_leaf(pmd)  (pmd_val(pmd) & 2)
 #define pmd_bad(pmd)   (pmd_val(pmd) & 2)
 #define pmd_present(pmd)   (pmd_val(pmd))
diff --git a/arch/arm/include/asm/pgtable-3level.h 
b/arch/arm/include/asm/pgtable-3level.h
index 71c3add6417f..4b1d9eb3908a 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -118,7 +118,6 @@
 PMD_TYPE_TABLE)
 #define pmd_sect(pmd)  ((pmd_val(pmd) & PMD_TYPE_MASK) == \
 PMD_TYPE_SECT)
-#define pmd_large(pmd) pmd_sect(pmd)
 #define pmd_leaf(pmd)  pmd_sect(pmd)
 
 #define pud_clear(pudp)\
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index d1318e8582ac..176d63ec5c3a 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1441,7 +1441,6 @@ static inline bool is_pte_rw_upgrade(unsigned long 
old_val, unsigned long new_va
  */
 #define pmd_is_leaf pmd_is_leaf
 #define pmd_leaf pmd_is_leaf
-#define pmd_large pmd_leaf
 static inline bool pmd_is_leaf(pmd_t pmd)
 {
return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
@@ -1449,7 +1448,6 @@ static inline bool pmd_is_leaf(pmd_t pmd)
 
 #define pud_is_leaf pud_is_leaf
 #define pud_leaf pud_is_leaf
-#define pud_large pud_leaf
 static inline bool pud_is_leaf(pud_t pud)
 {
return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 5928b3c1458d..8a19066e5e12 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -101,10 +101,6 @@ void poking_init(void);
 extern unsigned long ioremap_bot;
 extern const pgprot_t protection_map[16];
 
-#ifndef pmd_large
-#define pmd_large(pmd) 0
-#endif
-
 /* can we use this in kvm */
 unsigned long vmalloc_to_phys(void *vmalloc_addr);
 
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index a5f16a244a64..9e08af5b9247 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -705,16 +705,16 @@ static inline int pud_none(pud_t pud)
return pud_val(pud) == _REGION3_ENTRY_EMPTY;
 }
 
-#define pud_leaf   pud_large
-static inline int pud_large(pud_t pud)
+#define pud_leaf pud_leaf
+static inline int pud_leaf(pud_t pud)
 {
if ((pud_val(pud) & _REGION_ENTRY_TYPE_MASK) != _REGION_ENTRY_TYPE_R3)
return 0;
return !!(pud_val(pud) & _REGION3_ENTRY_LARGE);
 }
 
-#define pmd_leaf   pmd_large
-static inline int pmd_large(pmd_t pmd)
+#define pmd_leaf pmd_leaf
+static inline int pmd_leaf(pmd_t pmd)
 {
return (pmd_val(pmd) & _SEGMENT_ENTRY_LARGE) != 0;
 }
diff --git a/arch/sparc/include/asm/pgtable_64.h 
b/arch/sparc/include/asm/pgtable_64.h
index 652af9d63fa2..6ff0a28d5fd1 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -680,8 +680,8 @@ static inline unsigned long pte_special(pte_t pte)
return pte_val(pte) & _PAGE_SPECIAL;
 }
 
-#define pmd_leaf   pmd_large
-static inline unsigned long pmd_large(pmd_t pmd)
+#define pmd_leaf pmd_leaf
+static inline unsigned long pmd_leaf(pmd_t pmd)
 {
pte_t pte = __pte(pmd_val(pmd));
 
@@ -867,8 +867,8 @@ static inline pmd_t *pud_pgtable(pud_t pud)
 /* only used by the stubbed out hugetlb gup code, should never be called */
 #define p4d_page(p4d)  NULL
 
-#define pud_leaf   pud_large
-static inline unsigned long pud_large(pud_t pud)
+#define pud_leaf pud_leaf
+static inline unsigned long pud_leaf(pud_t pud)
 {
pte_t pte = __pte(pud_val(pud));
 
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 69ed0ea0641b..87be73474e8d 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -251,8 +251,8 @@ static inline unsigned 

[PATCH 4/5] mm/treewide: Replace pud_large() with pud_leaf()

2024-02-28 Thread peterx
From: Peter Xu 

pud_large() is always defined as pud_leaf().  Merge their usages.  pud_leaf()
is chosen because it is a global API, while pud_large() is not.

Signed-off-by: Peter Xu 
---
 arch/powerpc/mm/book3s64/pgtable.c | 2 +-
 arch/s390/boot/vmem.c  | 2 +-
 arch/s390/include/asm/pgtable.h| 4 ++--
 arch/s390/mm/gmap.c| 2 +-
 arch/s390/mm/hugetlbpage.c | 4 ++--
 arch/s390/mm/pageattr.c| 2 +-
 arch/s390/mm/pgtable.c | 2 +-
 arch/s390/mm/vmem.c| 6 +++---
 arch/sparc/mm/init_64.c| 2 +-
 arch/x86/kvm/mmu/mmu.c | 2 +-
 arch/x86/mm/fault.c| 4 ++--
 arch/x86/mm/ident_map.c| 2 +-
 arch/x86/mm/init_64.c  | 4 ++--
 arch/x86/mm/kasan_init_64.c| 2 +-
 arch/x86/mm/mem_encrypt_identity.c | 2 +-
 arch/x86/mm/pat/set_memory.c   | 6 +++---
 arch/x86/mm/pgtable.c  | 2 +-
 arch/x86/mm/pti.c  | 2 +-
 arch/x86/power/hibernate.c | 2 +-
 arch/x86/xen/mmu_pv.c  | 4 ++--
 20 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
b/arch/powerpc/mm/book3s64/pgtable.c
index 45f526547b27..83823db3488b 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -130,7 +130,7 @@ void set_pud_at(struct mm_struct *mm, unsigned long addr,
 
WARN_ON(pte_hw_valid(pud_pte(*pudp)));
assert_spin_locked(pud_lockptr(mm, pudp));
-   WARN_ON(!(pud_large(pud)));
+   WARN_ON(!(pud_leaf(pud)));
 #endif
trace_hugepage_set_pud(addr, pud_val(pud));
return set_pte_at(mm, addr, pudp_ptep(pudp), pud_pte(pud));
diff --git a/arch/s390/boot/vmem.c b/arch/s390/boot/vmem.c
index 348ab02b1028..09b10bb6e4d0 100644
--- a/arch/s390/boot/vmem.c
+++ b/arch/s390/boot/vmem.c
@@ -366,7 +366,7 @@ static void pgtable_pud_populate(p4d_t *p4d, unsigned long 
addr, unsigned long e
}
pmd = boot_crst_alloc(_SEGMENT_ENTRY_EMPTY);
pud_populate(_mm, pud, pmd);
-   } else if (pud_large(*pud)) {
+   } else if (pud_leaf(*pud)) {
continue;
}
pgtable_pmd_populate(pud, addr, next, mode);
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 431d03d5116b..a5f16a244a64 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -730,7 +730,7 @@ static inline int pud_bad(pud_t pud)
 {
unsigned long type = pud_val(pud) & _REGION_ENTRY_TYPE_MASK;
 
-   if (type > _REGION_ENTRY_TYPE_R3 || pud_large(pud))
+   if (type > _REGION_ENTRY_TYPE_R3 || pud_leaf(pud))
return 1;
if (type < _REGION_ENTRY_TYPE_R3)
return 0;
@@ -1400,7 +1400,7 @@ static inline unsigned long pud_deref(pud_t pud)
unsigned long origin_mask;
 
origin_mask = _REGION_ENTRY_ORIGIN;
-   if (pud_large(pud))
+   if (pud_leaf(pud))
origin_mask = _REGION3_ENTRY_ORIGIN_LARGE;
return (unsigned long)__va(pud_val(pud) & origin_mask);
 }
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index a4f34c1db3cf..dcb38e351fa6 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -596,7 +596,7 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, 
unsigned long vmaddr)
pud = pud_offset(p4d, vmaddr);
VM_BUG_ON(pud_none(*pud));
/* large puds cannot yet be handled */
-   if (pud_large(*pud))
+   if (pud_leaf(*pud))
return -EFAULT;
pmd = pmd_offset(pud, vmaddr);
VM_BUG_ON(pmd_none(*pmd));
diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
index 1ccb5b40fe92..c2e8242bd15d 100644
--- a/arch/s390/mm/hugetlbpage.c
+++ b/arch/s390/mm/hugetlbpage.c
@@ -224,7 +224,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
if (p4d_present(*p4dp)) {
pudp = pud_offset(p4dp, addr);
if (pud_present(*pudp)) {
-   if (pud_large(*pudp))
+   if (pud_leaf(*pudp))
return (pte_t *) pudp;
pmdp = pmd_offset(pudp, addr);
}
@@ -240,7 +240,7 @@ int pmd_huge(pmd_t pmd)
 
 int pud_huge(pud_t pud)
 {
-   return pud_large(pud);
+   return pud_leaf(pud);
 }
 
 bool __init arch_hugetlb_valid_size(unsigned long size)
diff --git a/arch/s390/mm/pageattr.c b/arch/s390/mm/pageattr.c
index 9f55d5a3210c..01bc8fad64d6 100644
--- a/arch/s390/mm/pageattr.c
+++ b/arch/s390/mm/pageattr.c
@@ -274,7 +274,7 @@ static int walk_pud_level(p4d_t *p4d, unsigned long addr, 
unsigned long end,
if (pud_none(*pudp))
return -EINVAL;
next = pud_addr_end(addr, end);
-   if (pud_large(*pudp)) {
+

[PATCH 2/5] mm/x86: Replace p4d_large() with p4d_leaf()

2024-02-28 Thread peterx
From: Peter Xu 

p4d_large() is always defined as p4d_leaf().  Merge their usages.  p4d_leaf()
is chosen because it is a global API, while p4d_large() is not.

Only x86 has p4d_leaf() defined as of now, so after this patch all
p4d_large() usages are gone.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: x...@kernel.org
Signed-off-by: Peter Xu 
---
 arch/x86/mm/fault.c  | 4 ++--
 arch/x86/mm/init_64.c| 2 +-
 arch/x86/mm/pat/set_memory.c | 4 ++--
 arch/x86/mm/pti.c| 2 +-
 arch/x86/power/hibernate.c   | 2 +-
 arch/x86/xen/mmu_pv.c| 2 +-
 6 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 679b09cfe241..8b69ce3f4115 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -368,7 +368,7 @@ static void dump_pagetable(unsigned long address)
goto bad;
 
pr_cont("P4D %lx ", p4d_val(*p4d));
-   if (!p4d_present(*p4d) || p4d_large(*p4d))
+   if (!p4d_present(*p4d) || p4d_leaf(*p4d))
goto out;
 
pud = pud_offset(p4d, address);
@@ -1039,7 +1039,7 @@ spurious_kernel_fault(unsigned long error_code, unsigned 
long address)
if (!p4d_present(*p4d))
return 0;
 
-   if (p4d_large(*p4d))
+   if (p4d_leaf(*p4d))
return spurious_kernel_fault_check(error_code, (pte_t *) p4d);
 
pud = pud_offset(p4d, address);
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index ebdbcae48011..d691e7992a9a 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1197,7 +1197,7 @@ remove_p4d_table(p4d_t *p4d_start, unsigned long addr, 
unsigned long end,
if (!p4d_present(*p4d))
continue;
 
-   BUILD_BUG_ON(p4d_large(*p4d));
+   BUILD_BUG_ON(p4d_leaf(*p4d));
 
pud_base = pud_offset(p4d, 0);
remove_pud_table(pud_base, addr, next, altmap, direct);
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index e9b448d1b1b7..5359a9c88099 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -676,7 +676,7 @@ pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long 
address,
return NULL;
 
*level = PG_LEVEL_512G;
-   if (p4d_large(*p4d) || !p4d_present(*p4d))
+   if (p4d_leaf(*p4d) || !p4d_present(*p4d))
return (pte_t *)p4d;
 
pud = pud_offset(p4d, address);
@@ -739,7 +739,7 @@ pmd_t *lookup_pmd_address(unsigned long address)
return NULL;
 
p4d = p4d_offset(pgd, address);
-   if (p4d_none(*p4d) || p4d_large(*p4d) || !p4d_present(*p4d))
+   if (p4d_none(*p4d) || p4d_leaf(*p4d) || !p4d_present(*p4d))
return NULL;
 
pud = pud_offset(p4d, address);
diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index 669ba1c345b3..dc0a81f5f60e 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -206,7 +206,7 @@ static pmd_t *pti_user_pagetable_walk_pmd(unsigned long 
address)
if (!p4d)
return NULL;
 
-   BUILD_BUG_ON(p4d_large(*p4d) != 0);
+   BUILD_BUG_ON(p4d_leaf(*p4d) != 0);
if (p4d_none(*p4d)) {
unsigned long new_pud_page = __get_free_page(gfp);
if (WARN_ON_ONCE(!new_pud_page))
diff --git a/arch/x86/power/hibernate.c b/arch/x86/power/hibernate.c
index 6f955eb1e163..28153789f873 100644
--- a/arch/x86/power/hibernate.c
+++ b/arch/x86/power/hibernate.c
@@ -165,7 +165,7 @@ int relocate_restore_code(void)
pgd = (pgd_t *)__va(read_cr3_pa()) +
pgd_index(relocated_restore_code);
p4d = p4d_offset(pgd, relocated_restore_code);
-   if (p4d_large(*p4d)) {
+   if (p4d_leaf(*p4d)) {
set_p4d(p4d, __p4d(p4d_val(*p4d) & ~_PAGE_NX));
goto out;
}
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index e21974f2cf2d..12a43a4abebf 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -1104,7 +1104,7 @@ static void __init xen_cleanmfnmap_p4d(p4d_t *p4d, bool 
unpin)
pud_t *pud_tbl;
int i;
 
-   if (p4d_large(*p4d)) {
+   if (p4d_leaf(*p4d)) {
pa = p4d_val(*p4d) & PHYSICAL_PAGE_MASK;
xen_free_ro_pages(pa, P4D_SIZE);
return;
-- 
2.43.0



[PATCH 1/5] mm/ppc: Define pXd_large() with pXd_leaf()

2024-02-28 Thread peterx
From: Peter Xu 

The two definitions are the same.  The only difference is that pXd_large()
is defined only with THP selected, and only on 64-bit book3s.

Instead of implementing it twice, make pXd_large() a macro alias of
pXd_leaf() and define it unconditionally, just like pXd_leaf().  This
prepares for merging the two APIs.

Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: "Aneesh Kumar K.V" 
Cc: "Naveen N. Rao" 
Signed-off-by: Peter Xu 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 16 ++--
 arch/powerpc/include/asm/pgtable.h   |  2 +-
 2 files changed, 3 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 927d585652bc..d1318e8582ac 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1157,20 +1157,6 @@ pud_hugepage_update(struct mm_struct *mm, unsigned long 
addr, pud_t *pudp,
return pud_val(*pudp);
 }
 
-/*
- * returns true for pmd migration entries, THP, devmap, hugetlb
- * But compile time dependent on THP config
- */
-static inline int pmd_large(pmd_t pmd)
-{
-   return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
-}
-
-static inline int pud_large(pud_t pud)
-{
-   return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
-}
-
 /*
  * For radix we should always find H_PAGE_HASHPTE zero. Hence
  * the below will work for radix too
@@ -1455,6 +1441,7 @@ static inline bool is_pte_rw_upgrade(unsigned long 
old_val, unsigned long new_va
  */
 #define pmd_is_leaf pmd_is_leaf
 #define pmd_leaf pmd_is_leaf
+#define pmd_large pmd_leaf
 static inline bool pmd_is_leaf(pmd_t pmd)
 {
return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
@@ -1462,6 +1449,7 @@ static inline bool pmd_is_leaf(pmd_t pmd)
 
 #define pud_is_leaf pud_is_leaf
 #define pud_leaf pud_is_leaf
+#define pud_large pud_leaf
 static inline bool pud_is_leaf(pud_t pud)
 {
return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 7a1ba8889aea..5928b3c1458d 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -101,7 +101,7 @@ void poking_init(void);
 extern unsigned long ioremap_bot;
 extern const pgprot_t protection_map[16];
 
-#ifndef CONFIG_TRANSPARENT_HUGEPAGE
+#ifndef pmd_large
 #define pmd_large(pmd) 0
 #endif
 
-- 
2.43.0



[PATCH 0/5] mm/treewide: Replace pXd_large() with pXd_leaf()

2024-02-28 Thread peterx
From: Peter Xu 

[based on latest akpm/mm-unstable, commit 1274e7646240]

These two APIs are almost always the same.  It's confusing to have both of
them.  Merge them into one.  Here I used pXd_leaf() only because pXd_leaf()
is a global API which is always defined, while pXd_large() is not.
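
To illustrate what "always defined" means here, below is a minimal
userspace sketch of the override-plus-fallback pattern.  The type, the bit
value and the mock_* helpers are made up for illustration and are not the
real kernel definitions; only the shape of the #define/#ifndef pairing is
meant to match.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's pud_t; not a real definition. */
typedef struct { unsigned long val; } pud_t;

/* Assumed leaf bit, for illustration only. */
#define MOCK_PUD_LEAF_BIT 0x80UL

/*
 * "Arch header": an arch with leaf PUDs provides the function plus a
 * self-named macro, so that generic code can detect it with #ifndef.
 */
#define pud_leaf pud_leaf
static inline int pud_leaf(pud_t pud)
{
	return !!(pud.val & MOCK_PUD_LEAF_BIT);
}

/* "Generic header": global fallback, used only when the arch has none. */
#ifndef pud_leaf
#define pud_leaf(x) 0
#endif

/* Callers can therefore use pud_leaf() unconditionally. */
static inline int mock_walk_stops_at_pud(pud_t pud)
{
	return pud_leaf(pud);
}

/* Small self-check of the sketch itself. */
static inline void mock_self_check(void)
{
	assert(mock_walk_stops_at_pud((pud_t){ .val = MOCK_PUD_LEAF_BIT }));
	assert(!mock_walk_stops_at_pud((pud_t){ .val = 0 }));
}
```

Removing the two "arch header" pieces in this sketch would make every
pud_leaf() call evaluate to 0 through the fallback, which is why callers
never need an #ifdef around it.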

There is one more similar API, pXd_huge(), but that one is
even trickier, so let's do it step by step.

Some caution is needed on x86 and ppc: x86 is currently the only
user of p4d_large(), while ppc used to define pXd_large() only with THP,
which is not the case for pXd_leaf().  For the rest of the archs, afaict
they're 100% identical.
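
For the ppc case, a rough userspace sketch of the intermediate state that
patch 1 aims for is below: pmd_large() becomes a plain alias of pmd_leaf(),
and the "return 0" fallback is keyed on the macro itself rather than on the
THP config.  The type, MOCK_PAGE_PTE and the mock_* helpers are assumptions
for illustration only; the real powerpc code uses pmd_raw() and
cpu_to_be64(_PAGE_PTE).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real powerpc types and bits differ. */
typedef struct { uint64_t val; } pmd_t;
#define MOCK_PAGE_PTE (1ULL << 62)

/* "book3s/64 header": one implementation, no THP dependency. */
#define pmd_leaf pmd_is_leaf
#define pmd_large pmd_leaf
static inline bool pmd_is_leaf(pmd_t pmd)
{
	return !!(pmd.val & MOCK_PAGE_PTE);
}

/* "common ppc header": fallback only if nothing defined pmd_large. */
#ifndef pmd_large
#define pmd_large(pmd) 0
#endif

/* Both names must now agree for any entry. */
static inline bool mock_names_agree(pmd_t pmd)
{
	return pmd_large(pmd) == pmd_leaf(pmd);
}

/* Small self-check of the sketch itself. */
static inline void mock_self_check(void)
{
	assert(mock_names_agree((pmd_t){ .val = MOCK_PAGE_PTE }));
	assert(mock_names_agree((pmd_t){ .val = 0 }));
}
```

With the alias in place the two names cannot diverge, so the later patches
can rename callers mechanically.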

Only lightly tested on x86.

Please have a look, thanks.

Peter Xu (5):
  mm/ppc: Define pXd_large() with pXd_leaf()
  mm/x86: Replace p4d_large() with p4d_leaf()
  mm/treewide: Replace pmd_large() with pmd_leaf()
  mm/treewide: Replace pud_large() with pud_leaf()
  mm/treewide: Drop pXd_large()

 arch/arm/include/asm/pgtable-2level.h|  1 -
 arch/arm/include/asm/pgtable-3level.h|  1 -
 arch/arm/mm/dump.c   |  4 ++--
 arch/powerpc/include/asm/book3s/64/pgtable.h | 14 --
 arch/powerpc/include/asm/pgtable.h   |  4 
 arch/powerpc/mm/book3s64/pgtable.c   |  4 ++--
 arch/powerpc/mm/book3s64/radix_pgtable.c |  2 +-
 arch/powerpc/mm/pgtable_64.c |  2 +-
 arch/s390/boot/vmem.c|  4 ++--
 arch/s390/include/asm/pgtable.h  | 20 ++--
 arch/s390/mm/gmap.c  | 14 +++---
 arch/s390/mm/hugetlbpage.c   |  6 +++---
 arch/s390/mm/pageattr.c  |  4 ++--
 arch/s390/mm/pgtable.c   |  8 
 arch/s390/mm/vmem.c  | 12 ++--
 arch/sparc/include/asm/pgtable_64.h  |  8 
 arch/sparc/mm/init_64.c  |  6 +++---
 arch/x86/boot/compressed/ident_map_64.c  |  2 +-
 arch/x86/include/asm/pgtable.h   | 15 +++
 arch/x86/kvm/mmu/mmu.c   |  4 ++--
 arch/x86/mm/fault.c  | 16 
 arch/x86/mm/ident_map.c  |  2 +-
 arch/x86/mm/init_32.c|  2 +-
 arch/x86/mm/init_64.c| 14 +++---
 arch/x86/mm/kasan_init_64.c  |  4 ++--
 arch/x86/mm/mem_encrypt_identity.c   |  6 +++---
 arch/x86/mm/pat/set_memory.c | 14 +++---
 arch/x86/mm/pgtable.c|  4 ++--
 arch/x86/mm/pti.c|  8 
 arch/x86/power/hibernate.c   |  6 +++---
 arch/x86/xen/mmu_pv.c| 10 +-
 drivers/misc/sgi-gru/grufault.c  |  2 +-
 32 files changed, 101 insertions(+), 122 deletions(-)

-- 
2.43.0