Re: [PATCH] kexec_file: Drop weak attribute from arch_kexec_apply_relocations[_add]

2022-05-18 Thread Michael Ellerman
"Naveen N. Rao"  writes:
> Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
> symbols") [1], binutils (v2.36+) started dropping section symbols that
> it thought were unused.  This isn't an issue in general, but with
> kexec_file.c, gcc is placing arch_kexec_apply_relocations[_add] into a
> separate .text.unlikely section and the section symbol ".text.unlikely"
> is being dropped. Due to this, recordmcount is unable to find a non-weak
> symbol in .text.unlikely to generate a relocation record against.
>
> Address this by dropping the weak attribute from these functions:
> - arch_kexec_apply_relocations() is not overridden by any architecture
>   today, so just drop the weak attribute.
> - arch_kexec_apply_relocations_add() is only overridden by x86 and s390.
>   Retain the function prototype for those and move the weak
>   implementation into the header as a static inline for other
>   architectures.
>
> [1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1
>
> Signed-off-by: Naveen N. Rao 
> ---
>  include/linux/kexec.h | 28 
>  kernel/kexec_file.c   | 19 +--
>  2 files changed, 25 insertions(+), 22 deletions(-)

I think it could be done more cleanly with the "#define foo foo" style; see
the patch below. It does have its downsides, but for a simple hook like this
I think it's not too bad.
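
As a rough illustration of the "#define foo foo" idiom (a standalone sketch,
not kernel code; demo_apply_relocations_add and DEMO_ARCH_HAS_RELOCATIONS_ADD
are made-up names for this example): an architecture that provides its own
implementation defines a macro with the same name as the function, and the
generic code compiles its fallback only when that macro is absent.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Stand-in for an arch header such as asm/kexec.h. An architecture that
 * supplies its own implementation declares it and defines a macro with
 * the same name. DEMO_ARCH_HAS_RELOCATIONS_ADD is a hypothetical switch
 * used only to simulate building for such an architecture.
 */
#ifdef DEMO_ARCH_HAS_RELOCATIONS_ADD
int demo_apply_relocations_add(void);
#define demo_apply_relocations_add demo_apply_relocations_add

int demo_apply_relocations_add(void)
{
	return 0;	/* the arch-specific work would go here */
}
#endif

/*
 * Stand-in for the generic file. The fallback is compiled only when no
 * architecture has claimed the name, so no __weak attribute is needed.
 */
#ifndef demo_apply_relocations_add
int demo_apply_relocations_add(void)
{
	fprintf(stderr, "RELA relocation unsupported.\n");
	return -ENOEXEC;
}
#endif
```

Built without -DDEMO_ARCH_HAS_RELOCATIONS_ADD, only the fallback exists and
it returns -ENOEXEC. Either way there is exactly one strong definition, so
nothing depends on weak symbol resolution at link time.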

cheers


(only build tested)

diff --git a/arch/s390/include/asm/kexec.h b/arch/s390/include/asm/kexec.h
index 7f3c9ac34bd8..e818b58ccc43 100644
--- a/arch/s390/include/asm/kexec.h
+++ b/arch/s390/include/asm/kexec.h
@@ -74,6 +74,8 @@ void *kexec_file_add_components(struct kimage *image,
 int arch_kexec_do_relocs(int r_type, void *loc, unsigned long val,
 unsigned long addr);
 
+#define arch_kexec_apply_relocations_add arch_kexec_apply_relocations_add
+
 #define ARCH_HAS_KIMAGE_ARCH
 
 struct kimage_arch {
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 11b7c06e2828..58e3939a350a 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -186,6 +186,8 @@ extern int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages,
 extern void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages);
 #define arch_kexec_pre_free_pages arch_kexec_pre_free_pages
 
+#define arch_kexec_apply_relocations_add arch_kexec_apply_relocations_add
+
 #endif
 
 typedef void crash_vmclear_fn(void);
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 8347fc158d2b..6f07acb59f29 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -108,6 +108,7 @@ int __weak arch_kexec_kernel_verify_sig(struct kimage *image, void *buf,
 }
 #endif
 
+#ifndef arch_kexec_apply_relocations_add
 /*
  * arch_kexec_apply_relocations_add - apply relocations of type RELA
  * @pi:Purgatory to be relocated.
@@ -117,14 +118,16 @@ int __weak arch_kexec_kernel_verify_sig(struct kimage *image, void *buf,
  *
  * Return: 0 on success, negative errno on error.
  */
-int __weak
+int
 arch_kexec_apply_relocations_add(struct purgatory_info *pi, Elf_Shdr *section,
 				 const Elf_Shdr *relsec, const Elf_Shdr *symtab)
 {
 	pr_err("RELA relocation unsupported.\n");
 	return -ENOEXEC;
 }
+#endif
 
+#ifndef arch_kexec_apply_relocations
 /*
  * arch_kexec_apply_relocations - apply relocations of type REL
  * @pi:Purgatory to be relocated.
@@ -134,13 +137,14 @@ arch_kexec_apply_relocations_add(struct purgatory_info *pi, Elf_Shdr *section,
  *
  * Return: 0 on success, negative errno on error.
  */
-int __weak
+int
 arch_kexec_apply_relocations(struct purgatory_info *pi, Elf_Shdr *section,
 			     const Elf_Shdr *relsec, const Elf_Shdr *symtab)
 {
 	pr_err("REL relocation unsupported.\n");
 	return -ENOEXEC;
 }
+#endif
 
 /*
  * Free up memory used by kernel, initrd, and command line. This is temporary



Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-05-18 Thread Guilherme G. Piccoli
On 18/05/2022 04:33, Petr Mladek wrote:
> [...]
> Anyway, I would distinguish it the following way.
> 
>   + If the notifier is preserving kernel log then it should be ideally
> treated as kmsg_dump().
> 
>   + If the notifier is saving other debugging data then it better
> fits into the "hypervisor" notifier list.
> 
>

Definitely, I agree - it's logical, since we want more info in the logs,
and it happens that some notifiers running in the informational list do
that, like ftrace_on_oops for example.


> Regarding the reliability. From my POV, any panic notifier enabled
> in a generic kernel should be reliable with more than 99,9%.
> Otherwise, they should not be in the notifier list at all.
> 
> An exception would be a platform-specific notifier that is
> called only on some specific platform and developers maintaining
> this platform agree on this.
> 
> The value "99,9%" is arbitrary. I am not sure if it is realistic
> even in the other code, for example, console_flush_on_panic()
> or emergency_restart(). I just want to point out that the border
> should be rather high. Otherwise we would be back in the situation
> where people would want to disable particular notifiers.
> 

Totally agree, these percentages are just an example; 50% is ridiculously
low reliability in my example heheh

But some notifiers dive deep into abstraction layers (like regmap or GPIO
stuff) and it's hard to determine the probability of a lock issue (for
example, taking a spinlock that is already held inside regmap code and
live-locking forever). It is better to run these, if possible, later than
kdump or even the info list.

Thanks again for the good analysis Petr!
Cheers,


Guilherme




Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-05-18 Thread Guilherme G. Piccoli
On 18/05/2022 04:58, Petr Mladek wrote:
> [...]
>> It does similar things to kmsg_dump() so it should be called in
>> the same location (after info notifier list and before kdump).
>>
>> A solution might be to put these notifiers at the very
>> end of the "info" list or make an extra "dump" notifier list.
> 
> I just want to point out that the above idea has problems.
> Notifiers storing kernel log need to be treated as kmsg_dump().
> In particular, we would  need to know if there are any.
> We do not need to call "info" notifier list before kdump
> when there is no kernel log dumper registered.
> 

Notifiers respect the priority concept, which is just a number that
orders the list addition (and the list is called in order).

I've used the last position for panic_print() [in patch 25] - one idea
here is to "reserve" the last position (represented by INT_MIN) for
notifiers that act like kmsg_dump(). I couldn't find any IIRC, but that
doesn't prevent us from reserving this position and adding a comment about it.
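
To make the ordering concrete, here is a minimal userspace sketch of
priority-ordered registration (an assumption-laden model, not the kernel's
notifier code; struct demo_notifier, demo_register() and demo_call_chain()
are hypothetical names). Higher priorities run first, so an entry registered
with INT_MIN always ends up last.

```c
#include <limits.h>
#include <stddef.h>

/* Simplified stand-in for a notifier block: only the ordering fields. */
struct demo_notifier {
	struct demo_notifier *next;
	int priority;
};

/*
 * Insert n so the list stays sorted by descending priority. Entries with
 * equal priority keep their registration order.
 */
static void demo_register(struct demo_notifier **head, struct demo_notifier *n)
{
	while (*head && (*head)->priority >= n->priority)
		head = &(*head)->next;
	n->next = *head;
	*head = n;
}

/* Walk the list in call order and record the priority of each entry. */
static size_t demo_call_chain(struct demo_notifier *head, int *order, size_t max)
{
	size_t i = 0;

	for (; head && i < max; head = head->next)
		order[i++] = head->priority;
	return i;
}
```

Registering entries with priorities INT_MIN, 0 and 10 in any order yields
the call order 10, 0, INT_MIN.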

Does that make sense to you?
Cheers!


Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-05-18 Thread Guilherme G. Piccoli
On 18/05/2022 04:38, Petr Mladek wrote:
> [...]
> I have answered this in more detail in the other reply, see
> https://lore.kernel.org/r/YoShZVYNAdvvjb7z@alley
> 
> I agree that both notifiers in
> 
> drivers/soc/bcm/brcmstb/pm/pm-arm.c
> drivers/firmware/google/gsmi.c
> 
> better fit into the hypervisor list after all.
> 
> Best Regards,
> Petr

Perfect, thanks - will keep both in that list for V2.


Re: [PATCH] kexec_file: Drop weak attribute from arch_kexec_apply_relocations[_add]

2022-05-18 Thread Baoquan He
Hi Eric,

On 05/18/22 at 04:59pm, Eric W. Biederman wrote:
> "Naveen N. Rao"  writes:
> 
> > Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
> > symbols") [1], binutils (v2.36+) started dropping section symbols that
> > it thought were unused.  This isn't an issue in general, but with
> > kexec_file.c, gcc is placing arch_kexec_apply_relocations[_add] into a
> > separate .text.unlikely section and the section symbol ".text.unlikely"
> > is being dropped. Due to this, recordmcount is unable to find a non-weak
> > symbol in .text.unlikely to generate a relocation record against.
> >
> > Address this by dropping the weak attribute from these functions:
> > - arch_kexec_apply_relocations() is not overridden by any architecture
> >   today, so just drop the weak attribute.
> > - arch_kexec_apply_relocations_add() is only overridden by x86 and s390.
> >   Retain the function prototype for those and move the weak
> >   implementation into the header as a static inline for other
> >   architectures.
> >
> > [1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1
> 
> Any chance you can also get machine_kexec_post_load,
> crash_free_reserved_phys_range, arch_kexec_protect_crashkres,
> arch_kexec_unprotect_crashkres, arch_kexec_kernel_image_probe,
> arch_kexec_kernel_image_probe, arch_kimage_file_post_load_cleanup,
> arch_kexec_kernel_verify_sig, and arch_kexec_locate_mem_hole as well.
> 
> That is everything in kexec that uses a __weak symbol.  If we can't
> count on them working we might as well just get rid of the rest
> preemptively.

Is there a new rule that __weak is no longer recommended in the kernel?
If so, please point me to it so that I can learn about it.

In my mind, __weak is a very simple and clear mechanism for adding
ARCH-related functionality.

Thanks
Baoquan



Re: [PATCH v1 3/4] powerpc/code-patching: Use jump_label for testing freed initmem

2022-05-18 Thread Guenter Roeck
On Tue, Mar 22, 2022 at 04:40:20PM +0100, Christophe Leroy wrote:
> Once init is done, initmem is freed forever so no need to
> test system_state at every call to patch_instruction().
> 
> Use jump_label.
> 
> This reduces by 2% the time needed to activate ftrace on an 8xx.
> 

It also causes the qemu mpc8544ds emulation to crash.

BUG: Unable to handle kernel data access on write at 0xc122eb34
Faulting instruction address: 0xc001b580
Oops: Kernel access of bad area, sig: 11 [#1]
BE PAGE_SIZE=4K MPC8544 DS
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Not tainted 5.18.0-rc7-next-20220518 #1
NIP:  c001b580 LR: c001b560 CTR: 0003
REGS: c5107dd0 TRAP: 0300   Not tainted  (5.18.0-rc7-next-20220518)
MSR:  9000   CR: 24000882  XER: 
DEAR: c122eb34 ESR: 0080
GPR00: c001b560 c5107ec0 c5120020 1000  0078 0c00 cfff
GPR08: c001e9ec 0001 0007  44000882  c0005178 
GPR16:        
GPR24:        c123
NIP [c001b580] free_initmem+0x48/0xa8
LR [c001b560] free_initmem+0x28/0xa8
Call Trace:
[c5107ec0] [c001b560] free_initmem+0x28/0xa8 (unreliable)
[c5107ee0] [c00051b0] kernel_init+0x38/0x150
[c5107f00] [c001626c] ret_from_kernel_thread+0x5c/0x64
Instruction dump:
3fe0c123 912a00dc 90010024 48000665 3d20c218 8929fa65 2c09 41820058
813feb34 2c09 4082003c 3921 <913feb34> 80010024 3cc0c114 83e1001c

Reverting this patch fixes the problem.

Guenter

---
# bad: [736ee37e2e8eed7fe48d0a37ee5a709514d478b3] Add linux-next specific files for 20220518
# good: [42226c989789d8da4af1de0c31070c96726d990c] Linux 5.18-rc7
git bisect start 'HEAD' 'v5.18-rc7'
# bad: [555b5fa93f08980ccb6bc8e196226046fe047901] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
git bisect bad 555b5fa93f08980ccb6bc8e196226046fe047901
# bad: [8f5ef5e622d3f217d6542779723566099f370c31] Merge branch 'for-next' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git
git bisect bad 8f5ef5e622d3f217d6542779723566099f370c31
# good: [2b7d17d4b7c1ff40f58b0d32be40fc0bb6c582fb] soc: document merges
git bisect good 2b7d17d4b7c1ff40f58b0d32be40fc0bb6c582fb
# good: [4964f9250fbf76cb0b9c1124d5b9ab65de9bfd0e] Merge branch 'clk-next' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git
git bisect good 4964f9250fbf76cb0b9c1124d5b9ab65de9bfd0e
# bad: [18fae10a22071ccd0a2c44df2749ff482132774e] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git
git bisect bad 18fae10a22071ccd0a2c44df2749ff482132774e
# bad: [b4a551e4ab7f03eec509d3710d50e52e87a6] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git
git bisect bad b4a551e4ab7f03eec509d3710d50e52e87a6
# bad: [b6b1c3ce06ca438eb24e0f45bf0e63ecad0369f5] powerpc/rtas: Keep MSR[RI] set when calling RTAS
git bisect bad b6b1c3ce06ca438eb24e0f45bf0e63ecad0369f5
# good: [87ccc6684d3b57e3073f77cf28396b3037154193] powerpc/book3e: Fix sparse report in mm/nohash/fsl_book3e.c
git bisect good 87ccc6684d3b57e3073f77cf28396b3037154193
# good: [f31c618373f2051a32e30002d8eacad7bbbd3885] powerpc: Sort and de-dup primary opcodes in ppc-opcode.h
git bisect good f31c618373f2051a32e30002d8eacad7bbbd3885
# good: [9290c379d19774d8de6e2b895d756004dbad9ce5] powerpc/8xx: Simplify flush_tlb_kernel_range()
git bisect good 9290c379d19774d8de6e2b895d756004dbad9ce5
# bad: [d8d2af70b98109418bb16ff6638d7c1c4336f7fe] cxl/ocxl: Prepare cleanup of powerpc's asm/prom.h
git bisect bad d8d2af70b98109418bb16ff6638d7c1c4336f7fe
# bad: [b033767848c4115e486b1a51946de3bee2ac0fa6] powerpc/code-patching: Use jump_label for testing freed initmem
git bisect bad b033767848c4115e486b1a51946de3bee2ac0fa6
# good: [cb3ac45214c03852430979a43180371a44b74596] powerpc/code-patching: Don't call is_vmalloc_or_module_addr() without CONFIG_MODULES
git bisect good cb3ac45214c03852430979a43180371a44b74596
# first bad commit: [b033767848c4115e486b1a51946de3bee2ac0fa6] powerpc/code-patching: Use jump_label for testing freed initmem


Re: [PATCH] powerpc/vdso: Fix incorrect CFI in gettimeofday.S

2022-05-18 Thread Alan Modra
On Tue, May 17, 2022 at 10:32:09PM +1000, Michael Ellerman wrote:
> "Naveen N. Rao"  writes:
> > Michael Ellerman wrote:
> >>
> >> diff --git a/arch/powerpc/kernel/vdso/gettimeofday.S 
> >> b/arch/powerpc/kernel/vdso/gettimeofday.S
> >> index eb9c81e1c218..0aee255e9cbb 100644
> >> --- a/arch/powerpc/kernel/vdso/gettimeofday.S
> >> +++ b/arch/powerpc/kernel/vdso/gettimeofday.S
> >> @@ -22,12 +22,15 @@
> >>  .macro cvdso_call funct call_time=0
> >>.cfi_startproc
> >>PPC_STLUr1, -PPC_MIN_STKFRM(r1)
> >> +  .cfi_adjust_cfa_offset PPC_MIN_STKFRM
> >>mflrr0
> >> -  .cfi_register lr, r0
> >>PPC_STLUr1, -PPC_MIN_STKFRM(r1)
> >> +  .cfi_adjust_cfa_offset PPC_MIN_STKFRM
> >>PPC_STL r0, PPC_MIN_STKFRM + PPC_LR_STKOFF(r1)
> >
> > 
> >
> >> @@ -46,6 +50,7 @@
> >>mtlrr0
> >>.cfi_restore lr
> >>addir1, r1, 2 * PPC_MIN_STKFRM
> >> +  .cfi_def_cfa_offset 0
> >
> > Should this be .cfi_adjust_cfa_offset, given that we used that at the
> > start of the function?
>  
> AIUI "adjust x" is offset += x, whereas "def x" is offset = x.

Yes.

> So we could use adjust here, but we'd need to adjust by -(2 * PPC_MIN_STKFRM).

Yes.

> It seemed clearer to just set the offset back to 0, which is what it is
> at the start of the function.

Yes.  In detail, both .cfi_def_cfa_offset and .cfi_adjust_cfa_offset
are interpreted by the assembler into DW_CFA_def_cfa_offset byte
codes, so you should get the same .eh_frame contents if using Naveen's
suggestion.  It boils down to style really, and the most common style
is to use ".cfi_def_cfa_offset 0" here.
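
The equivalence can be checked with a toy model of the CFA offset
bookkeeping (a sketch only: the demo_ helpers are hypothetical, and
DEMO_MIN_STKFRM is an arbitrary stand-in because the real PPC_MIN_STKFRM
depends on the ABI). It models "def x" as offset = x and "adjust x" as
offset += x, as described above.

```c
/* Hypothetical frame size; the real PPC_MIN_STKFRM depends on the ABI. */
#define DEMO_MIN_STKFRM 32

/* .cfi_def_cfa_offset x: set the tracked CFA offset to x. */
static long demo_cfi_def_cfa_offset(long current, long x)
{
	(void)current;	/* the previous value is discarded */
	return x;
}

/* .cfi_adjust_cfa_offset x: add x to the tracked CFA offset. */
static long demo_cfi_adjust_cfa_offset(long current, long x)
{
	return current + x;
}

/* CFA offset after the two stack-frame pushes in the macro prologue. */
static long demo_offset_after_prologue(void)
{
	long off = 0;	/* offset at function entry */

	off = demo_cfi_adjust_cfa_offset(off, DEMO_MIN_STKFRM);
	off = demo_cfi_adjust_cfa_offset(off, DEMO_MIN_STKFRM);
	return off;
}
```

Starting from the prologue value of 2 * DEMO_MIN_STKFRM, both
demo_cfi_def_cfa_offset(off, 0) and
demo_cfi_adjust_cfa_offset(off, -(2 * DEMO_MIN_STKFRM)) give 0, which is
why the choice between the two directives is a matter of style.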

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] kexec_file: Drop weak attribute from arch_kexec_apply_relocations[_add]

2022-05-18 Thread Eric W. Biederman
"Naveen N. Rao"  writes:

> Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
> symbols") [1], binutils (v2.36+) started dropping section symbols that
> it thought were unused.  This isn't an issue in general, but with
> kexec_file.c, gcc is placing arch_kexec_apply_relocations[_add] into a
> separate .text.unlikely section and the section symbol ".text.unlikely"
> is being dropped. Due to this, recordmcount is unable to find a non-weak
> symbol in .text.unlikely to generate a relocation record against.
>
> Address this by dropping the weak attribute from these functions:
> - arch_kexec_apply_relocations() is not overridden by any architecture
>   today, so just drop the weak attribute.
> - arch_kexec_apply_relocations_add() is only overridden by x86 and s390.
>   Retain the function prototype for those and move the weak
>   implementation into the header as a static inline for other
>   architectures.
>
> [1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1

Any chance you can also get machine_kexec_post_load,
crash_free_reserved_phys_range, arch_kexec_protect_crashkres,
arch_kexec_unprotect_crashkres, arch_kexec_kernel_image_probe,
arch_kexec_kernel_image_probe, arch_kimage_file_post_load_cleanup,
arch_kexec_kernel_verify_sig, and arch_kexec_locate_mem_hole as well?

That is everything in kexec that uses a __weak symbol.  If we can't
count on them working we might as well just get rid of the rest
preemptively.

Could you also address Andrew's concerns by using a Kconfig symbol that
architectures implementing the function can select?

I don't want to ask too much of a volunteer, but if you are willing,
addressing both of those would be a great help.

Eric

> Signed-off-by: Naveen N. Rao 
> ---
>  include/linux/kexec.h | 28 
>  kernel/kexec_file.c   | 19 +--
>  2 files changed, 25 insertions(+), 22 deletions(-)
>
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 58d1b58a971e34..e656f981f43a73 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -193,10 +193,6 @@ void *kexec_purgatory_get_symbol_addr(struct kimage 
> *image, const char *name);
>  int arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
> unsigned long buf_len);
>  void *arch_kexec_kernel_image_load(struct kimage *image);
> -int arch_kexec_apply_relocations_add(struct purgatory_info *pi,
> -  Elf_Shdr *section,
> -  const Elf_Shdr *relsec,
> -  const Elf_Shdr *symtab);
>  int arch_kexec_apply_relocations(struct purgatory_info *pi,
>Elf_Shdr *section,
>const Elf_Shdr *relsec,
> @@ -229,6 +225,30 @@ extern int crash_exclude_mem_range(struct crash_mem *mem,
>  unsigned long long mend);
>  extern int crash_prepare_elf64_headers(struct crash_mem *mem, int kernel_map,
>  void **addr, unsigned long *sz);
> +
> +#if defined(CONFIG_X86_64) || defined(CONFIG_S390)
> +int arch_kexec_apply_relocations_add(struct purgatory_info *pi,
> +  Elf_Shdr *section,
> +  const Elf_Shdr *relsec,
> +  const Elf_Shdr *symtab);
> +#else
> +/*
> + * arch_kexec_apply_relocations_add - apply relocations of type RELA
> + * @pi:  Purgatory to be relocated.
> + * @section: Section relocations applying to.
> + * @relsec:  Section containing RELAs.
> + * @symtab:  Corresponding symtab.
> + *
> + * Return: 0 on success, negative errno on error.
> + */
> +static inline int
> +arch_kexec_apply_relocations_add(struct purgatory_info *pi, Elf_Shdr 
> *section,
> +  const Elf_Shdr *relsec, const Elf_Shdr *symtab)
> +{
> + pr_err("RELA relocation unsupported.\n");
> + return -ENOEXEC;
> +}
> +#endif /* CONFIG_X86_64 || CONFIG_S390 */
>  #endif /* CONFIG_KEXEC_FILE */
>  
>  #ifdef CONFIG_KEXEC_ELF
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index 8347fc158d2b96..6bae253b4d315e 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -108,23 +108,6 @@ int __weak arch_kexec_kernel_verify_sig(struct kimage 
> *image, void *buf,
>  }
>  #endif
>  
> -/*
> - * arch_kexec_apply_relocations_add - apply relocations of type RELA
> - * @pi:  Purgatory to be relocated.
> - * @section: Section relocations applying to.
> - * @relsec:  Section containing RELAs.
> - * @symtab:  Corresponding symtab.
> - *
> - * Return: 0 on success, negative errno on error.
> - */
> -int __weak
> -arch_kexec_apply_relocations_add(struct purgatory_info *pi, Elf_Shdr 
> *section,
> -  const Elf_Shdr *relsec, const Elf_Shdr *symtab)
> -{
> - pr_err("RELA relocation unsupported.\n");
> - return -ENOEXEC;
> -}
> 

Re: [PATCH 2/2] drm/tiny: Add ofdrm for Open Firmware framebuffers

2022-05-18 Thread Mark Cave-Ayland

On 18/05/2022 19:30, Thomas Zimmermann wrote:


Open Firmware provides basic display output via the 'display' node.
DT platform code already provides a device that represents the node's
framebuffer. Add a DRM driver for the device. The display mode and
color format is pre-initialized by the system's firmware. Runtime
modesetting via DRM is not possible. The display is useful during
early boot stages or as error fallback.

Similar functionality is already provided by fbdev's offb driver,
which is insufficient for modern userspace. The old driver includes
support for BootX device tree, which can be found on old 32-bit
PowerPC Macintosh systems. If these are still in use, the
functionality can be added to ofdrm or implemented in a new
driver. As with simpledrm, the fbdev driver cannot be selected if
ofdrm is already enabled.

Two notable points about the driver:

  * Reading the framebuffer aperture from the device tree is not
reliable on all systems. Ofdrm takes the heuristics and a comment
from offb to pick the correct range.

  * No resource management may be tied to the underlying PCI device.
Otherwise the handover to the native driver will fail with a resource
conflict. PCI management is therefore done as part of the platform
device's cleanup.

The driver has been tested on qemu's ppc64le emulation. The device
hand-over has been tested with bochs.


Thanks for working on this! Have you tried it on qemu-system-sparc and 
qemu-system-sparc64 at all? At least under QEMU I'd expect it to work for these 
platforms too, unless there is a particular dependency on PCI. A couple of comments 
inline below:



Signed-off-by: Thomas Zimmermann 
---
  MAINTAINERS   |   1 +
  drivers/gpu/drm/tiny/Kconfig  |  12 +
  drivers/gpu/drm/tiny/Makefile |   1 +
  drivers/gpu/drm/tiny/ofdrm.c  | 748 ++
  drivers/video/fbdev/Kconfig   |   1 +
  5 files changed, 763 insertions(+)
  create mode 100644 drivers/gpu/drm/tiny/ofdrm.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 43d833273ae9..090cbe1aa5e3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6395,6 +6395,7 @@ L:dri-de...@lists.freedesktop.org
  S:Maintained
  T:git git://anongit.freedesktop.org/drm/drm-misc
  F:drivers/gpu/drm/drm_aperture.c
+F: drivers/gpu/drm/tiny/ofdrm.c
  F:drivers/gpu/drm/tiny/simpledrm.c
  F:include/drm/drm_aperture.h
  
diff --git a/drivers/gpu/drm/tiny/Kconfig b/drivers/gpu/drm/tiny/Kconfig

index 627d637a1e7e..0bc54af42e7f 100644
--- a/drivers/gpu/drm/tiny/Kconfig
+++ b/drivers/gpu/drm/tiny/Kconfig
@@ -51,6 +51,18 @@ config DRM_GM12U320
 This is a KMS driver for projectors which use the GM12U320 chipset
 for video transfer over USB2/3, such as the Acer C120 mini projector.
  
+config DRM_OFDRM

+   tristate "Open Firmware display driver"
+   depends on DRM && MMU && PPC
+   select DRM_GEM_SHMEM_HELPER
+   select DRM_KMS_HELPER
+   help
+ DRM driver for Open Firmware framebuffers.
+
+ This driver assumes that the display hardware has been initialized
+ by the Open Firmware before the kernel boots. Scanout buffer, size,
+ and display format must be provided via device tree.
+
  config DRM_PANEL_MIPI_DBI
tristate "DRM support for MIPI DBI compatible panels"
depends on DRM && SPI
diff --git a/drivers/gpu/drm/tiny/Makefile b/drivers/gpu/drm/tiny/Makefile
index 1d9d6227e7ab..76dde89a044b 100644
--- a/drivers/gpu/drm/tiny/Makefile
+++ b/drivers/gpu/drm/tiny/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_DRM_ARCPGU)+= arcpgu.o
  obj-$(CONFIG_DRM_BOCHS)   += bochs.o
  obj-$(CONFIG_DRM_CIRRUS_QEMU) += cirrus.o
  obj-$(CONFIG_DRM_GM12U320)+= gm12u320.o
+obj-$(CONFIG_DRM_OFDRM)+= ofdrm.o
  obj-$(CONFIG_DRM_PANEL_MIPI_DBI)  += panel-mipi-dbi.o
  obj-$(CONFIG_DRM_SIMPLEDRM)   += simpledrm.o
  obj-$(CONFIG_TINYDRM_HX8357D) += hx8357d.o
diff --git a/drivers/gpu/drm/tiny/ofdrm.c b/drivers/gpu/drm/tiny/ofdrm.c
new file mode 100644
index ..aca715b36179
--- /dev/null
+++ b/drivers/gpu/drm/tiny/ofdrm.c
@@ -0,0 +1,748 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+
+#define DRIVER_NAME"ofdrm"
+#define DRIVER_DESC"DRM driver for OF platform devices"
+#define DRIVER_DATE"20220501"
+#define DRIVER_MAJOR   1
+#define DRIVER_MINOR   0
+
+/*
+ * Assume a monitor resolution of 96 dpi to
+ * get a somewhat reasonable screen size.
+ */
+#define RES_MM(d)  \
+   (((d) * 254ul) / (96ul * 10ul))
+
+#define OFDRM_MODE(hd, vd) \
+   DRM_SIMPLE_MODE(hd, vd, RES_MM(hd), RES_MM(vd))
+
+/*
+ * Helpers for display nodes
+ */
+
+static int display_get_validated_int(struct drm_device *dev, const char *name, 

Re: [PATCH] kexec_file: Drop weak attribute from arch_kexec_apply_relocations[_add]

2022-05-18 Thread Andrew Morton
On Wed, 18 May 2022 23:48:28 +0530 "Naveen N. Rao" wrote:

> Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
> symbols") [1], binutils (v2.36+) started dropping section symbols that
> it thought were unused.  This isn't an issue in general, but with
> kexec_file.c, gcc is placing arch_kexec_apply_relocations[_add] into a
> separate .text.unlikely section and the section symbol ".text.unlikely"
> is being dropped. Due to this, recordmcount is unable to find a non-weak
> symbol in .text.unlikely to generate a relocation record against.
> 
> Address this by dropping the weak attribute from these functions:
> - arch_kexec_apply_relocations() is not overridden by any architecture
>   today, so just drop the weak attribute.
> - arch_kexec_apply_relocations_add() is only overridden by x86 and s390.
>   Retain the function prototype for those and move the weak
>   implementation into the header as a static inline for other
>   architectures.
> 
> ...
>

Sigh.  This patch demonstrates why I like __weak :<

> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -229,6 +225,30 @@ extern int crash_exclude_mem_range(struct crash_mem *mem,
>  unsigned long long mend);
>  extern int crash_prepare_elf64_headers(struct crash_mem *mem, int kernel_map,
>  void **addr, unsigned long *sz);
> +
> +#if defined(CONFIG_X86_64) || defined(CONFIG_S390)

Let's avoid listing the architectures here?  Better to add

select ARCH_HAVE_ARCH_KEXEC_APPLY_RELOCATIONS_ADD

to arch//Kconfig?
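
A rough sketch of how a header might key off such a Kconfig symbol
(standalone C, not the kernel header; demo_apply_relocations_add is a
made-up stand-in, and the CONFIG_ macro is simply left undefined here to
simulate an architecture that does not select it):

```c
#include <errno.h>
#include <stdio.h>

/*
 * CONFIG_ARCH_HAVE_ARCH_KEXEC_APPLY_RELOCATIONS_ADD would be set by Kconfig
 * when an architecture selects the symbol. It is not defined in this
 * sketch, so the generic fallback below is used.
 */
#ifdef CONFIG_ARCH_HAVE_ARCH_KEXEC_APPLY_RELOCATIONS_ADD
/* Prototype only: the architecture provides the definition. */
int demo_apply_relocations_add(void);
#else
static inline int demo_apply_relocations_add(void)
{
	fprintf(stderr, "RELA relocation unsupported.\n");
	return -ENOEXEC;
}
#endif
```

The header then no longer needs to name individual architectures; each one
opts in from its own Kconfig file.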

Please cc me on any additional work on this.


Re: [PATCH] powerpc: check previous kernel's ima-kexec-buffer against memory bounds

2022-05-18 Thread Lakshmi Ramasubramanian

Hi Vaibhav,

On 5/18/2022 1:05 PM, Vaibhav Jain wrote:

Presently ima_get_kexec_buffer() doesn't check if the previous kernel's
ima-kexec-buffer lies outside the addressable memory range. This can result
in a kernel panic if the new kernel is booted with 'mem=X' arg and the
ima-kexec-buffer was allocated beyond that range by the previous kernel.


Thanks for providing this patch.


Fix this issue by checking returned address/size of previous kernel's
ima-kexec-buffer against memblock's memory bounds.

Fixes: fee3ff99bc67("powerpc: Move arch independent ima kexec functions to
drivers/of/kexec.c")

Cc: Frank Rowand 
Cc: Prakhar Srivastava 
Cc: Lakshmi Ramasubramanian 
Cc: Thiago Jung Bauermann 
Cc: Rob Herring 
Signed-off-by: Vaibhav Jain 
---
  drivers/of/kexec.c | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/drivers/of/kexec.c b/drivers/of/kexec.c
index b9bd1cff1793..c73007eda52d 100644
--- a/drivers/of/kexec.c
+++ b/drivers/of/kexec.c
@@ -140,6 +140,13 @@ int ima_get_kexec_buffer(void **addr, size_t *size)
if (ret)
return ret;
  
+	/* if the ima-kexec-buffer goes beyond the addressable memory */

+   if (!memblock_is_region_memory(tmp_addr, tmp_size)) {
+   pr_warn("IMA buffer at 0x%lx, size = 0x%zx beyond memory\n",
+   tmp_addr, tmp_size);
+   return -EINVAL;
+   }
+

Reviewed-by: Lakshmi Ramasubramanian 


*addr = __va(tmp_addr);
*size = tmp_size;
  


[PATCH] powerpc: check previous kernel's ima-kexec-buffer against memory bounds

2022-05-18 Thread Vaibhav Jain
Presently ima_get_kexec_buffer() doesn't check if the previous kernel's
ima-kexec-buffer lies outside the addressable memory range. This can result
in a kernel panic if the new kernel is booted with 'mem=X' arg and the
ima-kexec-buffer was allocated beyond that range by the previous kernel.
The panic is usually of the form below:

$ sudo kexec --initrd initrd vmlinux --append='mem=16G'


 BUG: Unable to handle kernel data access on read at 0xc000c01fff7f
 Faulting instruction address: 0xc0837974
 Oops: Kernel access of bad area, sig: 11 [#1]

 NIP [c0837974] ima_restore_measurement_list+0x94/0x6c0
 LR [c083b55c] ima_load_kexec_buffer+0xac/0x160
 Call Trace:
 [c371fa80] [c083b55c] ima_load_kexec_buffer+0xac/0x160
 [c371fb00] [c20512c4] ima_init+0x80/0x108
 [c371fb70] [c20514dc] init_ima+0x4c/0x120
 [c371fbf0] [c0012240] do_one_initcall+0x60/0x2c0
 [c371fcc0] [c2004ad0] kernel_init_freeable+0x344/0x3ec
 [c371fda0] [c00128a4] kernel_init+0x34/0x1b0
 [c371fe10] [c000ce64] ret_from_kernel_thread+0x5c/0x64
 Instruction dump:
 f92100b8 f92100c0 90e10090 910100a0 4182050c 282a0017 3bc0 40810330
 7c0802a6 fb610198 7c9b2378 f80101d0  2c090001 40820614 e9240010
 ---[ end trace  ]---

Fix this issue by checking returned address/size of previous kernel's
ima-kexec-buffer against memblock's memory bounds.
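
The idea behind the check can be sketched in plain C (a simplified model,
not memblock itself: it assumes a single contiguous memory range, and
demo_region_is_memory() is a hypothetical helper, whereas
memblock_is_region_memory() consults the kernel's memblock regions):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if [base, base + size) lies entirely inside the memory
 * range [mem_start, mem_end). Written to avoid overflow in base + size.
 */
static bool demo_region_is_memory(uint64_t base, uint64_t size,
				  uint64_t mem_start, uint64_t mem_end)
{
	if (base < mem_start || base > mem_end)
		return false;
	return size <= mem_end - base;
}
```

With a 16G limit, a buffer that starts below the limit but extends past it
is rejected, as is one that starts beyond the limit altogether.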

Fixes: fee3ff99bc67("powerpc: Move arch independent ima kexec functions to
drivers/of/kexec.c")

Cc: Frank Rowand 
Cc: Prakhar Srivastava 
Cc: Lakshmi Ramasubramanian 
Cc: Thiago Jung Bauermann 
Cc: Rob Herring 
Signed-off-by: Vaibhav Jain 
---
 drivers/of/kexec.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/of/kexec.c b/drivers/of/kexec.c
index b9bd1cff1793..c73007eda52d 100644
--- a/drivers/of/kexec.c
+++ b/drivers/of/kexec.c
@@ -140,6 +140,13 @@ int ima_get_kexec_buffer(void **addr, size_t *size)
if (ret)
return ret;
 
+   /* if the ima-kexec-buffer goes beyond the addressable memory */
+   if (!memblock_is_region_memory(tmp_addr, tmp_size)) {
+   pr_warn("IMA buffer at 0x%lx, size = 0x%zx beyond memory\n",
+   tmp_addr, tmp_size);
+   return -EINVAL;
+   }
+
*addr = __va(tmp_addr);
*size = tmp_size;
 
-- 
2.35.1



Re: [PATCH 2/2] drm/tiny: Add ofdrm for Open Firmware framebuffers

2022-05-18 Thread Michal Suchánek
Hello,

On Wed, May 18, 2022 at 08:30:06PM +0200, Thomas Zimmermann wrote:
> Open Firmware provides basic display output via the 'display' node.
> DT platform code already provides a device that represents the node's
> framebuffer. Add a DRM driver for the device. The display mode and
> color format is pre-initialized by the system's firmware. Runtime
> modesetting via DRM is not possible. The display is useful during
> early boot stages or as error fallback.
> 
> Similar functionality is already provided by fbdev's offb driver,
> which is insufficient for modern userspace. The old driver includes
> support for BootX device tree, which can be found on old 32-bit
> PowerPC Macintosh systems. If these are still in use, the
> functionality can be added to ofdrm or implemented in a new
> driver. As with simpledrm, the fbdev driver cannot be selected if
> ofdrm is already enabled.
> 
> Two notable points about the driver:
> 
>  * Reading the framebuffer aperture from the device tree is not
> reliable on all systems. Ofdrm takes the heuristics and a comment
> from offb to pick the correct range.
> 
>  * No resource management may be tied to the underlying PCI device.
> Otherwise the handover to the native driver will fail with a resource
> conflict. PCI management is therefore done as part of the platform
> device's cleanup.
> 
> The driver has been tested on qemu's ppc64le emulation. The device
> hand-over has been tested with bochs.
> 
> Signed-off-by: Thomas Zimmermann 
> ---
>  MAINTAINERS   |   1 +
>  drivers/gpu/drm/tiny/Kconfig  |  12 +
>  drivers/gpu/drm/tiny/Makefile |   1 +
>  drivers/gpu/drm/tiny/ofdrm.c  | 748 ++
>  drivers/video/fbdev/Kconfig   |   1 +
>  5 files changed, 763 insertions(+)
>  create mode 100644 drivers/gpu/drm/tiny/ofdrm.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 43d833273ae9..090cbe1aa5e3 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -6395,6 +6395,7 @@ L:  dri-de...@lists.freedesktop.org
>  S:   Maintained
>  T:   git git://anongit.freedesktop.org/drm/drm-misc
>  F:   drivers/gpu/drm/drm_aperture.c
> +F:   drivers/gpu/drm/tiny/ofdrm.c
>  F:   drivers/gpu/drm/tiny/simpledrm.c
>  F:   include/drm/drm_aperture.h
>  
> diff --git a/drivers/gpu/drm/tiny/Kconfig b/drivers/gpu/drm/tiny/Kconfig
> index 627d637a1e7e..0bc54af42e7f 100644
> --- a/drivers/gpu/drm/tiny/Kconfig
> +++ b/drivers/gpu/drm/tiny/Kconfig
> @@ -51,6 +51,18 @@ config DRM_GM12U320
>This is a KMS driver for projectors which use the GM12U320 chipset
>for video transfer over USB2/3, such as the Acer C120 mini projector.
>  
> +config DRM_OFDRM
> + tristate "Open Firmware display driver"
> + depends on DRM && MMU && PPC

Does this build with !PCI?

The driver uses some PCI functions, so it might possibly break with
randconfig. I don't think there are practical !PCI PPC configurations.

Thanks

Michal


[PATCH 2/2] drm/tiny: Add ofdrm for Open Firmware framebuffers

2022-05-18 Thread Thomas Zimmermann
Open Firmware provides basic display output via the 'display' node.
DT platform code already provides a device that represents the node's
framebuffer. Add a DRM driver for the device. The display mode and
color format is pre-initialized by the system's firmware. Runtime
modesetting via DRM is not possible. The display is useful during
early boot stages or as error fallback.

Similar functionality is already provided by fbdev's offb driver,
which is insufficient for modern userspace. The old driver includes
support for BootX device tree, which can be found on old 32-bit
PowerPC Macintosh systems. If these are still in use, the
functionality can be added to ofdrm or implemented in a new
driver. As with simpledrm, the fbdev driver cannot be selected if
ofdrm is already enabled.

Two notable points about the driver:

 * Reading the framebuffer aperture from the device tree is not
reliable on all systems. Ofdrm takes the heuristics and a comment
from offb to pick the correct range.

 * No resource management may be tied to the underlying PCI device.
Otherwise the handover to the native driver will fail with a resource
conflict. PCI management is therefore done as part of the platform
device's cleanup.

The driver has been tested on qemu's ppc64le emulation. The device
hand-over has been tested with bochs.

Signed-off-by: Thomas Zimmermann 
---
 MAINTAINERS   |   1 +
 drivers/gpu/drm/tiny/Kconfig  |  12 +
 drivers/gpu/drm/tiny/Makefile |   1 +
 drivers/gpu/drm/tiny/ofdrm.c  | 748 ++
 drivers/video/fbdev/Kconfig   |   1 +
 5 files changed, 763 insertions(+)
 create mode 100644 drivers/gpu/drm/tiny/ofdrm.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 43d833273ae9..090cbe1aa5e3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6395,6 +6395,7 @@ L:dri-de...@lists.freedesktop.org
 S: Maintained
 T: git git://anongit.freedesktop.org/drm/drm-misc
 F: drivers/gpu/drm/drm_aperture.c
+F: drivers/gpu/drm/tiny/ofdrm.c
 F: drivers/gpu/drm/tiny/simpledrm.c
 F: include/drm/drm_aperture.h
 
diff --git a/drivers/gpu/drm/tiny/Kconfig b/drivers/gpu/drm/tiny/Kconfig
index 627d637a1e7e..0bc54af42e7f 100644
--- a/drivers/gpu/drm/tiny/Kconfig
+++ b/drivers/gpu/drm/tiny/Kconfig
@@ -51,6 +51,18 @@ config DRM_GM12U320
 This is a KMS driver for projectors which use the GM12U320 chipset
 for video transfer over USB2/3, such as the Acer C120 mini projector.
 
+config DRM_OFDRM
+   tristate "Open Firmware display driver"
+   depends on DRM && MMU && PPC
+   select DRM_GEM_SHMEM_HELPER
+   select DRM_KMS_HELPER
+   help
+ DRM driver for Open Firmware framebuffers.
+
+ This driver assumes that the display hardware has been initialized
+ by the Open Firmware before the kernel boots. Scanout buffer, size,
+ and display format must be provided via device tree.
+
 config DRM_PANEL_MIPI_DBI
tristate "DRM support for MIPI DBI compatible panels"
depends on DRM && SPI
diff --git a/drivers/gpu/drm/tiny/Makefile b/drivers/gpu/drm/tiny/Makefile
index 1d9d6227e7ab..76dde89a044b 100644
--- a/drivers/gpu/drm/tiny/Makefile
+++ b/drivers/gpu/drm/tiny/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_DRM_ARCPGU)+= arcpgu.o
 obj-$(CONFIG_DRM_BOCHS)+= bochs.o
 obj-$(CONFIG_DRM_CIRRUS_QEMU)  += cirrus.o
 obj-$(CONFIG_DRM_GM12U320) += gm12u320.o
+obj-$(CONFIG_DRM_OFDRM)+= ofdrm.o
 obj-$(CONFIG_DRM_PANEL_MIPI_DBI)   += panel-mipi-dbi.o
 obj-$(CONFIG_DRM_SIMPLEDRM)+= simpledrm.o
 obj-$(CONFIG_TINYDRM_HX8357D)  += hx8357d.o
diff --git a/drivers/gpu/drm/tiny/ofdrm.c b/drivers/gpu/drm/tiny/ofdrm.c
new file mode 100644
index ..aca715b36179
--- /dev/null
+++ b/drivers/gpu/drm/tiny/ofdrm.c
@@ -0,0 +1,748 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+
+#define DRIVER_NAME"ofdrm"
+#define DRIVER_DESC"DRM driver for OF platform devices"
+#define DRIVER_DATE"20220501"
+#define DRIVER_MAJOR   1
+#define DRIVER_MINOR   0
+
+/*
+ * Assume a monitor resolution of 96 dpi to
+ * get a somewhat reasonable screen size.
+ */
+#define RES_MM(d)  \
+   (((d) * 254ul) / (96ul * 10ul))
+
+#define OFDRM_MODE(hd, vd) \
+   DRM_SIMPLE_MODE(hd, vd, RES_MM(hd), RES_MM(vd))
+
+/*
+ * Helpers for display nodes
+ */
+
+static int display_get_validated_int(struct drm_device *dev, const char *name, uint32_t value)
+{
+   if (value > INT_MAX) {
+   drm_err(dev, "invalid framebuffer %s of %u\n", name, value);
+   return -EINVAL;
+   }
+   return (int)value;
+}
+
+static int display_get_validated_int0(struct drm_device *dev, const char *name, uint32_t value)
+{
+   if (!value) 
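
Note on the helpers quoted above: the RES_MM() arithmetic and the INT_MAX check can be tried outside the kernel. The following is only a userspace sketch, not part of the patch. The names res_mm() and validated_int() are made up for illustration, and the -1 return stands in for the driver's -EINVAL.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Same arithmetic as RES_MM() in the patch: convert a pixel count to
 * millimetres assuming 96 dpi (1 inch = 25.4 mm, hence 254 / 10).
 */
static unsigned long res_mm(unsigned long pixels)
{
	return (pixels * 254ul) / (96ul * 10ul);
}

/*
 * Userspace stand-in for display_get_validated_int(): a 32-bit value
 * read from the device tree is only accepted if it fits into an int.
 */
static int validated_int(uint32_t value)
{
	if (value > INT_MAX)
		return -1;	/* the driver logs an error and returns -EINVAL */
	return (int)value;
}
```

With these definitions a 1920-pixel-wide mode is reported as 508 mm wide, which is what a 20-inch-wide panel at 96 dpi would measure.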

[PATCH 1/2] MAINTAINERS: Broaden scope of simpledrm entry

2022-05-18 Thread Thomas Zimmermann
There will be more DRM drivers for firmware-provided framebuffers. Use
the existing entry for simpledrm instead of adding a new one for each
driver. Also add DRM's aperture helpers, which are part of the driver's
infrastructure.

Signed-off-by: Thomas Zimmermann 
---
 MAINTAINERS | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5c1fd93d9050..43d833273ae9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6388,13 +6388,15 @@ S:  Orphan / Obsolete
 F: drivers/gpu/drm/savage/
 F: include/uapi/drm/savage_drm.h
 
-DRM DRIVER FOR SIMPLE FRAMEBUFFERS
+DRM DRIVER FOR FIRMWARE FRAMEBUFFERS
 M: Thomas Zimmermann 
 M: Javier Martinez Canillas 
 L: dri-de...@lists.freedesktop.org
 S: Maintained
 T: git git://anongit.freedesktop.org/drm/drm-misc
+F: drivers/gpu/drm/drm_aperture.c
 F: drivers/gpu/drm/tiny/simpledrm.c
+F: include/drm/drm_aperture.h
 
 DRM DRIVER FOR SIS VIDEO CARDS
 S: Orphan / Obsolete
-- 
2.36.1



[PATCH 0/2] drm: Add driver for PowerPC OF displays

2022-05-18 Thread Thomas Zimmermann
PowerPC's Open Firmware offers a simple display buffer for graphics
output. Add ofdrm, a DRM driver for the device. As with the existing
simpledrm driver, the graphics hardware is pre-initialized by the
firmware. The driver only provides blitting, no actual DRM modesetting
is possible.

Thomas Zimmermann (2):
  MAINTAINERS: Broaden scope of simpledrm entry
  drm/tiny: Add ofdrm for Open Firmware framebuffers

 MAINTAINERS   |   5 +-
 drivers/gpu/drm/tiny/Kconfig  |  12 +
 drivers/gpu/drm/tiny/Makefile |   1 +
 drivers/gpu/drm/tiny/ofdrm.c  | 748 ++
 drivers/video/fbdev/Kconfig   |   1 +
 5 files changed, 766 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/tiny/ofdrm.c

-- 
2.36.1



[PATCH] kexec_file: Drop weak attribute from arch_kexec_apply_relocations[_add]

2022-05-18 Thread Naveen N. Rao
Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
symbols") [1], binutils (v2.36+) started dropping section symbols that
it thought were unused.  This isn't an issue in general, but with
kexec_file.c, gcc is placing arch_kexec_apply_relocations[_add] into a
separate .text.unlikely section and the section symbol ".text.unlikely"
is being dropped. Due to this, recordmcount is unable to find a non-weak
symbol in .text.unlikely to generate a relocation record against.

Address this by dropping the weak attribute from these functions:
- arch_kexec_apply_relocations() is not overridden by any architecture
  today, so just drop the weak attribute.
- arch_kexec_apply_relocations_add() is only overridden by x86 and s390.
  Retain the function prototype for those and move the weak
  implementation into the header as a static inline for other
  architectures.

[1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1

Signed-off-by: Naveen N. Rao 
---
 include/linux/kexec.h | 28 
 kernel/kexec_file.c   | 19 +--
 2 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 58d1b58a971e34..e656f981f43a73 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -193,10 +193,6 @@ void *kexec_purgatory_get_symbol_addr(struct kimage *image, const char *name);
 int arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
  unsigned long buf_len);
 void *arch_kexec_kernel_image_load(struct kimage *image);
-int arch_kexec_apply_relocations_add(struct purgatory_info *pi,
-Elf_Shdr *section,
-const Elf_Shdr *relsec,
-const Elf_Shdr *symtab);
 int arch_kexec_apply_relocations(struct purgatory_info *pi,
 Elf_Shdr *section,
 const Elf_Shdr *relsec,
@@ -229,6 +225,30 @@ extern int crash_exclude_mem_range(struct crash_mem *mem,
   unsigned long long mend);
 extern int crash_prepare_elf64_headers(struct crash_mem *mem, int kernel_map,
   void **addr, unsigned long *sz);
+
+#if defined(CONFIG_X86_64) || defined(CONFIG_S390)
+int arch_kexec_apply_relocations_add(struct purgatory_info *pi,
+Elf_Shdr *section,
+const Elf_Shdr *relsec,
+const Elf_Shdr *symtab);
+#else
+/*
+ * arch_kexec_apply_relocations_add - apply relocations of type RELA
+ * @pi:Purgatory to be relocated.
+ * @section:   Section relocations applying to.
+ * @relsec:Section containing RELAs.
+ * @symtab:Corresponding symtab.
+ *
+ * Return: 0 on success, negative errno on error.
+ */
+static inline int
+arch_kexec_apply_relocations_add(struct purgatory_info *pi, Elf_Shdr *section,
+const Elf_Shdr *relsec, const Elf_Shdr *symtab)
+{
+   pr_err("RELA relocation unsupported.\n");
+   return -ENOEXEC;
+}
+#endif /* CONFIG_X86_64 || CONFIG_S390 */
 #endif /* CONFIG_KEXEC_FILE */
 
 #ifdef CONFIG_KEXEC_ELF
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 8347fc158d2b96..6bae253b4d315e 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -108,23 +108,6 @@ int __weak arch_kexec_kernel_verify_sig(struct kimage *image, void *buf,
 }
 #endif
 
-/*
- * arch_kexec_apply_relocations_add - apply relocations of type RELA
- * @pi:Purgatory to be relocated.
- * @section:   Section relocations applying to.
- * @relsec:Section containing RELAs.
- * @symtab:Corresponding symtab.
- *
- * Return: 0 on success, negative errno on error.
- */
-int __weak
-arch_kexec_apply_relocations_add(struct purgatory_info *pi, Elf_Shdr *section,
-const Elf_Shdr *relsec, const Elf_Shdr *symtab)
-{
-   pr_err("RELA relocation unsupported.\n");
-   return -ENOEXEC;
-}
-
 /*
  * arch_kexec_apply_relocations - apply relocations of type REL
  * @pi:Purgatory to be relocated.
@@ -134,7 +117,7 @@ arch_kexec_apply_relocations_add(struct purgatory_info *pi, Elf_Shdr *section,
  *
  * Return: 0 on success, negative errno on error.
  */
-int __weak
+int
 arch_kexec_apply_relocations(struct purgatory_info *pi, Elf_Shdr *section,
 const Elf_Shdr *relsec, const Elf_Shdr *symtab)
 {

base-commit: ef1302160bfb19f804451d0e919266703501c875
-- 
2.36.1
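
The header-fallback pattern described in the commit message (prototype only where an architecture overrides the hook, static inline default everywhere else) can be sketched in plain C as follows. This is an illustration under assumptions, not the kernel code: SKETCH_ARCH_HAS_RELA, struct purgatory_info_sketch and apply_relocations_add() are hypothetical stand-ins, and fprintf() replaces pr_err().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for struct purgatory_info. */
struct purgatory_info_sketch {
	int unused;
};

#ifdef SKETCH_ARCH_HAS_RELA
/* An overriding architecture supplies the definition in its own source file. */
int apply_relocations_add(struct purgatory_info_sketch *pi);
#else
/*
 * Generic fallback lives in the header as a static inline, so no weak
 * function symbol ends up in the object file.
 */
static inline int apply_relocations_add(struct purgatory_info_sketch *pi)
{
	(void)pi;
	fprintf(stderr, "RELA relocation unsupported.\n");
	return -ENOEXEC;
}
#endif
```

Built without SKETCH_ARCH_HAS_RELA, every caller gets the inline default and the error is returned to it directly.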



Re: [Buildroot] [PATCH] linux: Fix powerpc64le defconfig selection

2022-05-18 Thread Arnout Vandecappelle




On 16/05/2022 15:17, Michael Ellerman wrote:

Arnout Vandecappelle  writes:

On 10/05/2022 04:20, Joel Stanley wrote:

The default defconfig target for the 64 bit powerpc kernel is
ppc64_defconfig, the big endian configuration.

When building for powerpc64le users want the little endian kernel as
they can't boot LE userspace on a BE kernel.

Fix up the defconfig used in this case. This will avoid the following
autobuilder failure:

   VDSO32A arch/powerpc/kernel/vdso32/sigtramp.o
   cc1: error: ‘-m32’ not supported in this configuration
   make[4]: *** [arch/powerpc/kernel/vdso32/Makefile:49: arch/powerpc/kernel/vdso32/sigtramp.o] Error 1

   
http://autobuild.buildroot.net/results/dd76d53bab56470c0b83e296872d7bb90f9e8296/

Note that the failure indicates the toolchain is configured to disable
the 32 bit target, causing the kernel to fail when building the 32 bit
VDSO. This is only a problem on the BE kernel as the LE kernel disables
CONFIG_COMPAT, aka 32 bit userspace support, by default.

Signed-off-by: Joel Stanley 


   Applied to master, thanks. However, the defconfig mechanism for *all* powerpc
seems pretty broken. Here's what we have in 5.16, before that there was
something similar:

# If we're on a ppc/ppc64/ppc64le machine use that defconfig, otherwise just use
# ppc64_defconfig because we have nothing better to go on.
uname := $(shell uname -m)
KBUILD_DEFCONFIG := $(if $(filter ppc%,$(uname)),$(uname),ppc64)_defconfig

   So I guess we should use a specific defconfig for *all* powerpc.

   The arch-default defconfig is generally not really reliable, for example for
arm it always takes v7_multi, but that won't work for v7m targets...
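
To make the effect of the quoted Makefile fragment concrete, here is a small C sketch of the same selection rule: use the host's `uname -m` value if it starts with "ppc", otherwise fall back to ppc64. The function name pick_defconfig() is invented for this illustration, and the prefix test is my reading of `$(filter ppc%,...)`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Mirror of: $(if $(filter ppc%,$(uname)),$(uname),ppc64)_defconfig
 * The host machine name is used only when it begins with "ppc".
 */
static void pick_defconfig(const char *uname_m, char *out, size_t len)
{
	const char *base = strncmp(uname_m, "ppc", 3) == 0 ? uname_m : "ppc64";

	snprintf(out, len, "%s_defconfig", base);
}
```

On an x86_64 build host this yields ppc64_defconfig, the big endian configuration, regardless of the target being powerpc64le, which matches the failure described above.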


There's a fundamental problem that just the "arch" is not sufficient
detail when you're building a kernel.


 Yes, which is pretty much unavoidable.


Two CPUs that implement the same user-visible "arch" may differ enough
at the kernel level to require a different defconfig.

Having said that I think we could handle this better in the powerpc
kernel. Other arches allow specifying a different value for ARCH, which
then is fed into the defconfig.


 I don't know if it's worth bothering with that. It certainly would not make 
our life easier, because it would mean we need to set ARCH correctly. If we can 
do that, we can just as well set the defconfig correctly.



That way you could at least pass ARCH=ppc/ppc64/ppc64le, and get an
appropriate defconfig.

I'll work on some kernel changes for that.


 I think the most important thing is that it makes no sense to rely on uname 
when ARCH and/or CROSS_COMPILE are set.


 Regards,
 Arnout



cheers


Re: [PATCH v3 19/25] powerpc/ftrace: Minimise number of #ifdefs

2022-05-18 Thread Naveen N. Rao

Christophe Leroy wrote:



Le 18/05/2022 à 14:03, Michael Ellerman a écrit :

Michael Ellerman  writes:

"Naveen N. Rao"  writes:

Christophe Leroy wrote:

A lot of #ifdefs can be replaced by IS_ENABLED()

Do so.

This requires to have kernel_toc_addr() defined at all time
as well as PPC_INST_LD_TOC and PPC_INST_STD_LR.

Signed-off-by: Christophe Leroy 
---
v2: Moved the setup of pop outside of the big if()/else() in __ftrace_make_nop()
---
  arch/powerpc/include/asm/code-patching.h |   2 -
  arch/powerpc/include/asm/module.h|   2 -
  arch/powerpc/include/asm/sections.h  |  24 +--
  arch/powerpc/kernel/trace/ftrace.c   | 182 +++
  4 files changed, 103 insertions(+), 107 deletions(-)






@@ -710,6 +707,9 @@ void arch_ftrace_update_code(int command)

  #ifdef CONFIG_PPC64
  #define PACATOC offsetof(struct paca_struct, kernel_toc)
+#else
+#define PACATOC 0
+#endif


This conflicts with my fix for the ftrace init tramp:
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220516071422.463738-1-naveen.n@linux.vnet.ibm.com/

It probably makes sense to retain #ifdef CONFIG_PPC64, so that we can
get rid of the PACATOC. Here is an incremental diff:


Where is the incremental diff meant to apply?

It doesn't apply on top of patch 19, or at the end of the series.


Ugh, sorry. I had an additional patch that converts those 
ftrace_[regs_]_caller uses to FTRACE_REGS_ADDR, which prevented one of 
the hunks from applying.




I think I worked out what you meant.

Can you check what's in next-test:

   https://github.com/linuxppc/linux/commits/next-test


Yes that looks fine.


+1



As Naveen mentioned we can also get rid of PACATOC completely and use 
offsetof(struct paca_struct, kernel_toc) directly at the only place 
PACATOC is used.


Yes, or we can send it out as a separate cleanup.


Thanks,
Naveen
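
For readers skimming the thread, the two ideas discussed above (a compile-time constant test instead of #ifdef, and using offsetof() directly instead of a PACATOC macro) can be sketched in userspace C. This is a simplified illustration, not the ftrace code: SKETCH_CONFIG_PPC64 is assumed to always be defined as 0 or 1 (the kernel's IS_ENABLED() also copes with undefined options), and struct paca_sketch is a made-up two-field stand-in for struct paca_struct.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the option is always defined, as either 0 or 1. */
#ifndef SKETCH_CONFIG_PPC64
#define SKETCH_CONFIG_PPC64 1
#endif

/* Hypothetical stand-in for struct paca_struct. */
struct paca_sketch {
	unsigned long lock_token;
	unsigned long kernel_toc;
};

/*
 * Both branches are compiled and type-checked; the compiler discards
 * the one that cannot be taken because the condition is a constant.
 */
static inline size_t toc_offset(void)
{
	if (SKETCH_CONFIG_PPC64)
		return offsetof(struct paca_sketch, kernel_toc);
	return 0;
}
```

The trade-off raised in the thread still applies: with a plain if(), everything referenced in the dead branch must be declared in all configurations.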



Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]

2022-05-18 Thread Naveen N. Rao

Eric W. Biederman wrote:

Michael Ellerman  writes:


"Eric W. Biederman"  writes:

Looking at this the pr_err is absolutely needed.  If an unsupported case
winds up in the purgatory blob and the code can't handle it things
will fail silently much worse later.


It won't fail later, it will fail the syscall.

sys_kexec_file_load()
  kimage_file_alloc_init()
kimage_file_prepare_segments()
  arch_kexec_kernel_image_load()
kexec_image_load_default()
  image->fops->load()
elf64_load()# powerpc
bzImage64_load()# x86
  kexec_load_purgatory()
kexec_apply_relocations()

Which does:

if (relsec->sh_type == SHT_RELA)
ret = arch_kexec_apply_relocations_add(pi, section,
   relsec, symtab);
else if (relsec->sh_type == SHT_REL)
ret = arch_kexec_apply_relocations(pi, section,
   relsec, symtab);
if (ret)
return ret;

And that error is bubbled all the way back up. So as long as
arch_kexec_apply_relocations() returns an error the syscall will fail
back to userspace and there'll be an error message at that level.

It's true that having nothing printed in dmesg makes it harder to work
out why the syscall failed. But it's a kernel bug if there are unhandled
relocations in the kernel-supplied purgatory code, so a user really has
no way to do anything about the error even if it is printed.
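
The error propagation described above can be modelled in a few lines of standalone C. This is only a sketch of the control flow: all sketch_* names are hypothetical, the RELA hook is hard-wired to fail the way the generic fallback does, and the section type values 4 (SHT_RELA) and 9 (SHT_REL) are the ones from the ELF specification.

```c
#include <assert.h>
#include <errno.h>

/* ELF section types for relocation sections (values per the ELF spec). */
enum { SKETCH_SHT_RELA = 4, SKETCH_SHT_REL = 9 };

/* Stand-ins for the arch hooks: RELA unsupported, REL handled. */
static int sketch_apply_rela(void) { return -ENOEXEC; }
static int sketch_apply_rel(void) { return 0; }

/* Walk the relocation sections and stop at the first failure. */
static int sketch_apply_relocations(const unsigned int *sh_types, int n)
{
	for (int i = 0; i < n; i++) {
		int ret = 0;

		if (sh_types[i] == SKETCH_SHT_RELA)
			ret = sketch_apply_rela();
		else if (sh_types[i] == SKETCH_SHT_REL)
			ret = sketch_apply_rel();
		if (ret)
			return ret;	/* bubbles up to the syscall */
	}
	return 0;
}

/* Each caller up the chain just forwards the error code. */
static int sketch_load_purgatory(const unsigned int *sh_types, int n)
{
	return sketch_apply_relocations(sh_types, n);
}
```

Whether or not anything is printed, the caller at the top of the chain sees the negative errno, which is the point being made here.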


Good point.  I really hadn't noticed the error code in there when I
looked.

I still don't think changing the functionality of the code because of
a tool issue is the right solution.


Ok.





"Naveen N. Rao"  writes:


Baoquan He wrote:

On 04/25/22 at 11:11pm, Naveen N. Rao wrote:

kexec_load_purgatory() can fail for many reasons - there is no need to
print an error when encountering unsupported relocations.
This solves a build issue on powerpc with binutils v2.36 and newer [1].
Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
symbols") [2], binutils started dropping section symbols that it thought

I am not familiar with binutils, while wondering if this exists in other
ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we
have problem with it?


I'm not aware of this specific file causing a problem on other architectures -
perhaps the config options differ enough. There are however more reports of
similar issues affecting other architectures with the llvm integrated assembler:
https://github.com/ClangBuiltLinux/linux/issues/981




were unused.  This isn't an issue in general, but with kexec_file.c, gcc
is placing kexec_arch_apply_relocations[_add] into a separate
.text.unlikely section and the section symbol ".text.unlikely" is being
dropped. Due to this, recordmcount is unable to find a non-weak symbol

But arch_kexec_apply_relocations_add is weak symbol on ppc.


Yes. Note that it is just the section symbol that gets dropped. The section is
still present and will continue to hold the symbols for the functions
themselves.


So we have a case where binutils thinks it is doing something useful
and our kernel specific tool gets tripped up by it.


It's not just binutils, the LLVM assembler has the same behavior.


Reading the recordmcount code it looks like it is finding any symbol
within a section but ignoring weak symbols.  So I suspect the only
remaining symbol in the section is __weak and that confuses
recordmcount.

Does removing the __weak annotation on those functions fix the build
error?  If so we can restructure the kexec code to simply not use __weak
symbols.

Otherwise the fix needs to be in recordmcount or binutils, and we should
loop whoever maintains recordmcount in to see what they can do.
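
The behaviour suspected above (pick any symbol in the section as a relocation base, but skip weak ones) can be illustrated with a small standalone C sketch. This is not recordmcount's code: the struct, the enum and find_base_symbol() are invented for the illustration, and the section indices in the usage example are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Symbol bindings, numbered as in the ELF specification. */
enum sketch_bind {
	SKETCH_STB_LOCAL = 0,
	SKETCH_STB_GLOBAL = 1,
	SKETCH_STB_WEAK = 2,
};

/* Hypothetical, simplified symbol table entry. */
struct sketch_sym {
	const char *name;
	unsigned int shndx;	/* index of the section holding the symbol */
	enum sketch_bind bind;
};

/*
 * Return the index of the first non-weak symbol in section @shndx,
 * or -1 if the section holds only weak symbols (or none at all).
 */
static int find_base_symbol(const struct sketch_sym *syms, size_t n,
			    unsigned int shndx)
{
	for (size_t i = 0; i < n; i++) {
		if (syms[i].shndx != shndx)
			continue;
		if (syms[i].bind == SKETCH_STB_WEAK)
			continue;	/* weak symbols may be overridden at link time */
		return (int)i;
	}
	return -1;
}
```

With the section symbol dropped and only two weak functions left in the section, a search like this finds nothing, which is consistent with the build failure being discussed.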


It seems that recordmcount is not really maintained anymore now that x86
uses objtool?

There've been several threads about fixing recordmcount, but none of
them seem to have led to a solution.


That is unfortunate.


These weak symbol vs recordmcount problems have been worked around going
back as far as 2020:

  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/elfcore.h?id=6e7b64b9dd6d96537d816ea07ec26b7dedd397b9


I am more than happy to adopt the kind of solution that was adopted
there in elfcore.h and simply get rid of __weak symbols in the kexec
code.

Using __weak symbols is really not the common kernel way of doing
things.  Using __weak symbols introduces a bit of magic in how the
kernel gets built that is unnecessary.

Can someone verify that deleting __weak is enough to get powerpc to
build?  AKA:

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 8347fc158d2b..7f4ca8dbe26f 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -117,7 +117,7 @@ int __weak arch_kexec_kernel_verify_sig(struct kimage *image, void *buf,
  *
  * Return: 0 on success, negative errno on error.
  */

[powerpc:next-test] BUILD REGRESSION 2e4a9942261f89ad204a8189634029a4b1f0efb6

2022-05-18 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next-test
branch HEAD: 2e4a9942261f89ad204a8189634029a4b1f0efb6  powerpc/irq: Remove arch_local_irq_restore() for !CONFIG_CC_HAS_ASM_GOTO

Error/Warning reports:

https://lore.kernel.org/linuxppc-dev/202205180129.raxtjqf6-...@intel.com
https://lore.kernel.org/llvm/202205180443.cfi2jikj-...@intel.com

Error/Warning: (recently discovered and may have been fixed)

arch/powerpc/kernel/trace/ftrace.c:714:6: error: redefinition of 'ftrace_free_init_tramp'
arch/powerpc/sysdev/mpc5xxx_clocks.c:26:2: error: call to undeclared function 'fwnode_for_each_parent_node'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
arch/powerpc/sysdev/mpc5xxx_clocks.c:26:45: error: expected ';' after expression

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
|-- powerpc-allmodconfig
|   `-- arch-powerpc-kernel-trace-ftrace.c:error:redefinition-of-ftrace_free_init_tramp
|-- powerpc-allyesconfig
|   `-- arch-powerpc-kernel-trace-ftrace.c:error:redefinition-of-ftrace_free_init_tramp
|-- powerpc-buildonly-randconfig-r005-20220516
|   `-- arch-powerpc-kernel-trace-ftrace.c:error:redefinition-of-ftrace_free_init_tramp
`-- powerpc-ppc64_defconfig
    `-- arch-powerpc-kernel-trace-ftrace.c:error:redefinition-of-ftrace_free_init_tramp

clang_recent_errors
`-- powerpc-mpc512x_defconfig
    |-- arch-powerpc-sysdev-mpc5xxx_clocks.c:error:call-to-undeclared-function-fwnode_for_each_parent_node-ISO-C99-and-later-do-not-support-implicit-function-declarations
    `-- arch-powerpc-sysdev-mpc5xxx_clocks.c:error:expected-after-expression

elapsed time: 1505m

configs tested: 100
configs skipped: 3

gcc tested configs:
arm64   defconfig
arm64allyesconfig
arm  allmodconfig
arm defconfig
arm  allyesconfig
i386 randconfig-c001-20220516
armmini2440_defconfig
powerpc mpc834x_mds_defconfig
sparc   sparc64_defconfig
mipsgpr_defconfig
powerpc pq2fads_defconfig
sh ecovec24_defconfig
arcnsimosci_defconfig
powerpc   ppc64_defconfig
ia64defconfig
powerpc mpc8540_ads_defconfig
ia64 allmodconfig
ia64 allyesconfig
m68kdefconfig
m68k allyesconfig
m68k allmodconfig
nios2   defconfig
arc  allyesconfig
alpha   defconfig
cskydefconfig
nios2allyesconfig
alphaallyesconfig
h8300allyesconfig
xtensa   allyesconfig
arc defconfig
sh   allmodconfig
s390defconfig
s390 allmodconfig
parisc  defconfig
parisc64defconfig
parisc   allyesconfig
s390 allyesconfig
i386  debian-10.3
i386defconfig
i386 allyesconfig
sparcallyesconfig
sparc   defconfig
i386   debian-10.3-kselftests
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc   allnoconfig
powerpc  allmodconfig
x86_64   randconfig-a012-20220516
x86_64   randconfig-a011-20220516
x86_64   randconfig-a013-20220516
x86_64   randconfig-a014-20220516
x86_64   randconfig-a016-20220516
x86_64   randconfig-a015-20220516
i386 randconfig-a011-20220516
i386 randconfig-a013-20220516
i386 randconfig-a015-20220516
i386 randconfig-a012-20220516
i386 randconfig-a016-20220516
i386 randconfig-a014-20220516
i386  randconfig-a012
i386  randconfig-a014
i386  randconfig-a016
arc  randconfig-r043-20220516
riscvrandconfig-r042-20220516
s390 randconfig-r044-20220516
riscv   defconfig
riscvnommu_virt_defconfig
riscv  rv32_defconfig
riscvnommu_k210_defconfig
riscv  

Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]

2022-05-18 Thread Eric W. Biederman
Michael Ellerman  writes:

> "Eric W. Biederman"  writes:
>> Looking at this the pr_err is absolutely needed.  If an unsupported case
>> winds up in the purgatory blob and the code can't handle it things
>> will fail silently much worse later.
>
> It won't fail later, it will fail the syscall.
>
> sys_kexec_file_load()
>   kimage_file_alloc_init()
> kimage_file_prepare_segments()
>   arch_kexec_kernel_image_load()
> kexec_image_load_default()
>   image->fops->load()
> elf64_load()# powerpc
> bzImage64_load()# x86
>   kexec_load_purgatory()
> kexec_apply_relocations()
>
> Which does:
>
>   if (relsec->sh_type == SHT_RELA)
>   ret = arch_kexec_apply_relocations_add(pi, section,
>  relsec, symtab);
>   else if (relsec->sh_type == SHT_REL)
>   ret = arch_kexec_apply_relocations(pi, section,
>  relsec, symtab);
>   if (ret)
>   return ret;
>
> And that error is bubbled all the way back up. So as long as
> arch_kexec_apply_relocations() returns an error the syscall will fail
> back to userspace and there'll be an error message at that level.
>
> It's true that having nothing printed in dmesg makes it harder to work
> out why the syscall failed. But it's a kernel bug if there are unhandled
> relocations in the kernel-supplied purgatory code, so a user really has
> no way to do anything about the error even if it is printed.

Good point.  I really hadn't noticed the error code in there when I
looked.

I still don't think changing the functionality of the code because of
a tool issue is the right solution.


>> "Naveen N. Rao"  writes:
>>
>>> Baoquan He wrote:
>>>> On 04/25/22 at 11:11pm, Naveen N. Rao wrote:
>>>>> kexec_load_purgatory() can fail for many reasons - there is no need to
>>>>> print an error when encountering unsupported relocations.
>>>>> This solves a build issue on powerpc with binutils v2.36 and newer [1].
>>>>> Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
>>>>> symbols") [2], binutils started dropping section symbols that it thought
>>>> I am not familiar with binutils, while wondering if this exists in other
>>>> ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we
>>>> have problem with it?
>>>
>>> I'm not aware of this specific file causing a problem on other architectures -
>>> perhaps the config options differ enough. There are however more reports of
>>> similar issues affecting other architectures with the llvm integrated assembler:
>>> https://github.com/ClangBuiltLinux/linux/issues/981
>>>
>>>>
>>>>> were unused.  This isn't an issue in general, but with kexec_file.c, gcc
>>>>> is placing kexec_arch_apply_relocations[_add] into a separate
>>>>> .text.unlikely section and the section symbol ".text.unlikely" is being
>>>>> dropped. Due to this, recordmcount is unable to find a non-weak symbol
>>>> But arch_kexec_apply_relocations_add is weak symbol on ppc.
>>>
>>> Yes. Note that it is just the section symbol that gets dropped. The section is
>>> still present and will continue to hold the symbols for the functions
>>> themselves.
>>
>> So we have a case where binutils thinks it is doing something useful
>> and our kernel specific tool gets tripped up by it.
>
> It's not just binutils, the LLVM assembler has the same behavior.
>
>> Reading the recordmcount code it looks like it is finding any symbol
>> within a section but ignoring weak symbols.  So I suspect the only
>> remaining symbol in the section is __weak and that confuses
>> recordmcount.
>>
>> Does removing the __weak annotation on those functions fix the build
>> error?  If so we can restructure the kexec code to simply not use __weak
>> symbols.
>>
>> Otherwise the fix needs to be in recordmcount or binutils, and we should
>> loop whoever maintains recordmcount in to see what they can do.
>
> It seems that recordmcount is not really maintained anymore now that x86
> uses objtool?
>
> There've been several threads about fixing recordmcount, but none of
> them seem to have led to a solution.

That is unfortunate.

> These weak symbol vs recordmcount problems have been worked around going
> back as far as 2020:
>
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/elfcore.h?id=6e7b64b9dd6d96537d816ea07ec26b7dedd397b9

I am more than happy to adopt the kind of solution that was adopted
there in elfcore.h and simply get rid of __weak symbols in the kexec
code.

Using __weak symbols is really not the common kernel way of doing
things.  Using __weak symbols introduces a bit of magic in how the
kernel gets built that is unnecessary.

Can someone verify that deleting __weak is enough to get powerpc to
build?  AKA:

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 8347fc158d2b..7f4ca8dbe26f 100644

[PATCH] powerpc: Fix all occurrences of "the the"

2022-05-18 Thread Michael Ellerman
Rather than waiting for the bots to fix these one-by-one, fix all
occurrences of "the the" throughout arch/powerpc.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/boot/wrapper  | 2 +-
 arch/powerpc/kernel/eeh_pe.c   | 2 +-
 arch/powerpc/kernel/head_64.S  | 2 +-
 arch/powerpc/kernel/pci-common.c   | 2 +-
 arch/powerpc/kernel/smp.c  | 2 +-
 arch/powerpc/kvm/book3s_64_entry.S | 2 +-
 arch/powerpc/kvm/book3s_xive_native.c  | 2 +-
 arch/powerpc/mm/cacheflush.c   | 2 +-
 arch/powerpc/mm/pgtable.c  | 2 +-
 arch/powerpc/platforms/52xx/mpc52xx_gpt.c  | 2 +-
 arch/powerpc/platforms/chrp/setup.c| 2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c  | 2 +-
 arch/powerpc/platforms/powernv/pci-sriov.c | 2 +-
 arch/powerpc/xmon/xmon.c   | 2 +-
 14 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/boot/wrapper b/arch/powerpc/boot/wrapper
index 9184eda780fd..55978f32fa77 100755
--- a/arch/powerpc/boot/wrapper
+++ b/arch/powerpc/boot/wrapper
@@ -162,7 +162,7 @@ while [ "$#" -gt 0 ]; do
fi
;;
 --no-gzip)
-# a "feature" of the the wrapper script is that it can be used outside
+# a "feature" of the wrapper script is that it can be used outside
 # the kernel tree. So keeping this around for backwards compatibility.
 compression=
uboot_comp=none
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index d7a9cf376831..d2873d17d2b1 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -302,7 +302,7 @@ struct eeh_pe *eeh_pe_get(struct pci_controller *phb, int pe_no)
  * @new_pe_parent.
  *
  * If @new_pe_parent is NULL then the new PE will be inserted under
- * directly under the the PHB.
+ * directly under the PHB.
  */
 int eeh_pe_tree_insert(struct eeh_dev *edev, struct eeh_pe *new_pe_parent)
 {
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index f85660d054bd..d3eea633d11a 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -111,7 +111,7 @@ END_FTR_SECTION(0, 1)
 #ifdef CONFIG_RELOCATABLE
/* This flag is set to 1 by a loader if the kernel should run
 * at the loaded address instead of the linked address.  This
-* is used by kexec-tools to keep the the kdump kernel in the
+* is used by kexec-tools to keep the kdump kernel in the
 * crash_kernel region.  The loader is responsible for
 * observing the alignment requirement.
 */
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 63ed90ba9f0b..068410cd54a3 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -42,7 +42,7 @@
 
 #include "../../../drivers/pci/pci.h"
 
-/* hose_spinlock protects accesses to the the phb_bitmap. */
+/* hose_spinlock protects accesses to the phb_bitmap. */
 static DEFINE_SPINLOCK(hose_spinlock);
 LIST_HEAD(hose_list);
 
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 4335efcb3184..bcefab484ea6 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -874,7 +874,7 @@ static int parse_thread_groups(struct device_node *dn,
  * @tg : The thread-group structure of the CPU node which @cpu belongs
  *   to.
  *
- * Returns the index to tg->thread_list that points to the the start
+ * Returns the index to tg->thread_list that points to the start
  * of the thread_group that @cpu belongs to.
  *
  * Returns -1 if cpu doesn't belong to any of the groups pointed to by
diff --git a/arch/powerpc/kvm/book3s_64_entry.S b/arch/powerpc/kvm/book3s_64_entry.S
index e42d1c609e47..e43704547a1e 100644
--- a/arch/powerpc/kvm/book3s_64_entry.S
+++ b/arch/powerpc/kvm/book3s_64_entry.S
@@ -124,7 +124,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
 /*
  * "Skip" interrupts are part of a trick KVM uses a with hash guests to load
- * the faulting instruction in guest memory from the the hypervisor without
+ * the faulting instruction in guest memory from the hypervisor without
  * walking page tables.
  *
  * When the guest takes a fault that requires the hypervisor to load the
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index f81ba6f84e72..5271c33fe79e 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -209,7 +209,7 @@ static int kvmppc_xive_native_reset_mapped(struct kvm *kvm, unsigned long irq)
 
/*
 * Clear the ESB pages of the IRQ number being mapped (or
-* unmapped) into the guest and let the the VM fault handler
+* unmapped) into the guest and let the VM fault handler
 * repopulate with the appropriate ESB pages (device or IC)
 */
pr_debug("clearing esb pages for girq 0x%lx\n", irq);
diff --git a/arch/powerpc/mm/cacheflush.c 

[PATCH 4/4] powerpc/pseries: Implement CONFIG_PARAVIRT_TIME_ACCOUNTING

2022-05-18 Thread Nicholas Piggin
CONFIG_VIRT_CPU_ACCOUNTING_GEN under pseries does not implement
stolen time accounting. Implement it with the paravirt time
accounting option.

Signed-off-by: Nicholas Piggin 
---
 .../admin-guide/kernel-parameters.txt |  6 +++---
 arch/powerpc/include/asm/paravirt.h   | 12 
 arch/powerpc/platforms/pseries/Kconfig|  8 
 arch/powerpc/platforms/pseries/lpar.c | 11 +++
 arch/powerpc/platforms/pseries/setup.c| 19 +++
 5 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 3f1cc5e317ed..855fc7b02261 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3604,9 +3604,9 @@
[X86,PV_OPS] Disable paravirtualized VMware scheduler
clock and use the default one.
 
-   no-steal-acc[X86,PV_OPS,ARM64] Disable paravirtualized steal time
-   accounting. steal time is computed, but won't
-   influence scheduler behaviour
+   no-steal-acc[X86,PV_OPS,ARM64,PPC/PSERIES] Disable paravirtualized
+   steal time accounting. steal time is computed, but
+   won't influence scheduler behaviour
 
nolapic [X86-32,APIC] Do not enable or use the local APIC.
 
diff --git a/arch/powerpc/include/asm/paravirt.h b/arch/powerpc/include/asm/paravirt.h
index eb7df559ae74..f5ba1a3c41f8 100644
--- a/arch/powerpc/include/asm/paravirt.h
+++ b/arch/powerpc/include/asm/paravirt.h
@@ -21,6 +21,18 @@ static inline bool is_shared_processor(void)
return static_branch_unlikely(&shared_processor);
 }
 
+#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 pseries_paravirt_steal_clock(int cpu);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return pseries_paravirt_steal_clock(cpu);
+}
+#endif
+
 /* If bit 0 is set, the cpu has been ceded, conferred, or preempted */
 static inline u32 yield_count_of(int cpu)
 {
diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index f7fd91d153a4..d4306ebdca5e 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -24,13 +24,21 @@ config PPC_PSERIES
select SWIOTLB
default y
 
+config PARAVIRT
+   bool
+
 config PARAVIRT_SPINLOCKS
bool
 
+config PARAVIRT_TIME_ACCOUNTING
+   select PARAVIRT
+   bool
+
 config PPC_SPLPAR
bool "Support for shared-processor logical partitions"
depends on PPC_PSERIES
select PARAVIRT_SPINLOCKS if PPC_QUEUED_SPINLOCKS
+   select PARAVIRT_TIME_ACCOUNTING if VIRT_CPU_ACCOUNTING_GEN
default y
help
  Enabling this option will make the kernel run more efficiently
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 760581c5752f..1965b7d7d8f1 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -661,6 +661,17 @@ static int __init vcpudispatch_stats_procfs_init(void)
 }
 
 machine_device_initcall(pseries, vcpudispatch_stats_procfs_init);
+
+#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
+u64 pseries_paravirt_steal_clock(int cpu)
+{
+   struct lppaca *lppaca = &lppaca_of(cpu);
+
+   return be64_to_cpu(READ_ONCE(lppaca->enqueue_dispatch_tb)) +
+   be64_to_cpu(READ_ONCE(lppaca->ready_enqueue_tb));
+}
+#endif
+
 #endif /* CONFIG_PPC_SPLPAR */
 
 void vpa_init(int cpu)
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 955ff8aa1644..691c9add4a5a 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -78,6 +78,20 @@
 DEFINE_STATIC_KEY_FALSE(shared_processor);
 EXPORT_SYMBOL(shared_processor);
 
+#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
+struct static_key paravirt_steal_enabled;
+struct static_key paravirt_steal_rq_enabled;
+
+static bool steal_acc = true;
+static int __init parse_no_stealacc(char *arg)
+{
+   steal_acc = false;
+   return 0;
+}
+
+early_param("no-steal-acc", parse_no_stealacc);
+#endif
+
 int CMO_PrPSP = -1;
 int CMO_SecPSP = -1;
 unsigned long CMO_PageSize = (ASM_CONST(1) << IOMMU_PAGE_SHIFT_4K);
@@ -831,6 +845,11 @@ static void __init pSeries_setup_arch(void)
if (lppaca_shared_proc(get_lppaca())) {
static_branch_enable(&shared_processor);
pv_spinlocks_init();
+#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
+   static_key_slow_inc(&paravirt_steal_enabled);
+   if (steal_acc)
+   static_key_slow_inc(&paravirt_steal_rq_enabled);
+#endif
}
 
ppc_md.power_save = pseries_lpar_idle;
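
As a rough userspace illustration of what pseries_paravirt_steal_clock() in this patch computes (not part of the patch itself): the steal clock is the sum of two big-endian 64-bit counters read from the VPA. The struct and helper names below are hypothetical stand-ins, and the byte-wise helpers replace the kernel's be64_to_cpu()/READ_ONCE() so the sketch builds outside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the two VPA fields the patch reads.
 * Each counter is kept as raw big-endian bytes, as the hypervisor
 * would present it to the guest.
 */
struct vpa_counters {
	uint8_t enqueue_dispatch_tb[8];
	uint8_t ready_enqueue_tb[8];
};

/* Decode a big-endian 64-bit value regardless of host endianness. */
static uint64_t load_be64(const uint8_t b[8])
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | b[i];
	return v;
}

/* Encode a 64-bit value as big-endian bytes (used to build test data). */
static void store_be64(uint8_t b[8], uint64_t v)
{
	for (int i = 7; i >= 0; i--) {
		b[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/* Stolen time in timebase ticks: the sum of both wait-interval counters. */
static uint64_t steal_clock(const struct vpa_counters *vpa)
{
	assert(vpa != NULL);
	return load_be64(vpa->enqueue_dispatch_tb) +
	       load_be64(vpa->ready_enqueue_tb);
}
```

Because both counters are cumulative, a caller only needs to subtract two successive readings to get the time stolen in between.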

[PATCH 3/4] KVM: PPC: Book3S HV: Implement scheduling wait interval counters in the VPA

2022-05-18 Thread Nicholas Piggin
PAPR specifies accumulated virtual processor wait intervals that relate
to partition scheduling interval times. Implement these counters in the
same way as they are reported by dtl.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c | 62 
 1 file changed, 41 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 0a0835edb64a..9f8795d2b0c3 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -732,16 +732,15 @@ static u64 vcore_stolen_time(struct kvmppc_vcore *vc, u64 now)
 }
 
 static void __kvmppc_create_dtl_entry(struct kvm_vcpu *vcpu,
+   struct lppaca *vpa,
unsigned int pcpu, u64 now,
unsigned long stolen)
 {
struct dtl_entry *dt;
-   struct lppaca *vpa;
 
dt = vcpu->arch.dtl_ptr;
-   vpa = vcpu->arch.vpa.pinned_addr;
 
-   if (!dt || !vpa)
+   if (!dt)
return;
 
dt->dispatch_reason = 7;
@@ -762,29 +761,23 @@ static void __kvmppc_create_dtl_entry(struct kvm_vcpu *vcpu,
/* order writing *dt vs. writing vpa->dtl_idx */
smp_wmb();
vpa->dtl_idx = cpu_to_be64(++vcpu->arch.dtl_index);
-   vcpu->arch.dtl.dirty = true;
-}
-
-static void kvmppc_create_dtl_entry_p9(struct kvm_vcpu *vcpu,
-  struct kvmppc_vcore *vc,
-  u64 now)
-{
-   unsigned long stolen;
 
-   stolen = vc->stolen_tb - vcpu->arch.stolen_logged;
-   vcpu->arch.stolen_logged = vc->stolen_tb;
-
-   __kvmppc_create_dtl_entry(vcpu, vc->pcpu, now, stolen);
+   /* vcpu->arch.dtl.dirty is set by the caller */
 }
 
-static void kvmppc_create_dtl_entry(struct kvm_vcpu *vcpu,
-   struct kvmppc_vcore *vc)
+static void kvmppc_update_vpa_dispatch(struct kvm_vcpu *vcpu,
+  struct kvmppc_vcore *vc)
 {
+   struct lppaca *vpa;
unsigned long stolen;
unsigned long core_stolen;
u64 now;
unsigned long flags;
 
+   vpa = vcpu->arch.vpa.pinned_addr;
+   if (!vpa)
+   return;
+
now = mftb();
 
core_stolen = vcore_stolen_time(vc, now);
@@ -795,7 +788,34 @@ static void kvmppc_create_dtl_entry(struct kvm_vcpu *vcpu,
vcpu->arch.busy_stolen = 0;
spin_unlock_irqrestore(&vcpu->arch.tbacct_lock, flags);
 
-   __kvmppc_create_dtl_entry(vcpu, vc->pcpu, now + vc->tb_offset, stolen);
+   vpa->enqueue_dispatch_tb = cpu_to_be64(be64_to_cpu(vpa->enqueue_dispatch_tb) + stolen);
+
+   __kvmppc_create_dtl_entry(vcpu, vpa, vc->pcpu, now + vc->tb_offset, stolen);
+
+   vcpu->arch.vpa.dirty = true;
+}
+
+static void kvmppc_update_vpa_dispatch_p9(struct kvm_vcpu *vcpu,
+  struct kvmppc_vcore *vc,
+  u64 now)
+{
+   struct lppaca *vpa;
+   unsigned long stolen;
+   unsigned long stolen_delta;
+
+   vpa = vcpu->arch.vpa.pinned_addr;
+   if (!vpa)
+   return;
+
+   stolen = vc->stolen_tb;
+   stolen_delta = stolen - vcpu->arch.stolen_logged;
+   vcpu->arch.stolen_logged = stolen;
+
+   vpa->enqueue_dispatch_tb = cpu_to_be64(stolen);
+
+   __kvmppc_create_dtl_entry(vcpu, vpa, vc->pcpu, now, stolen_delta);
+
+   vcpu->arch.vpa.dirty = true;
 }
 
 /* See if there is a doorbell interrupt pending for a vcpu */
@@ -3820,7 +3840,7 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 * kvmppc_core_prepare_to_enter.
 */
kvmppc_start_thread(vcpu, pvc);
-   kvmppc_create_dtl_entry(vcpu, pvc);
+   kvmppc_update_vpa_dispatch(vcpu, pvc);
trace_kvm_guest_enter(vcpu);
if (!vcpu->arch.ptid)
thr0_done = true;
@@ -4392,7 +4412,7 @@ static int kvmppc_run_vcpu(struct kvm_vcpu *vcpu)
if ((vc->vcore_state == VCORE_PIGGYBACK ||
 vc->vcore_state == VCORE_RUNNING) &&
   !VCORE_IS_EXITING(vc)) {
-   kvmppc_create_dtl_entry(vcpu, vc);
+   kvmppc_update_vpa_dispatch(vcpu, vc);
kvmppc_start_thread(vcpu, vc);
trace_kvm_guest_enter(vcpu);
} else if (vc->vcore_state == VCORE_SLEEPING) {
@@ -4575,7 +4595,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
 
tb = mftb();
 
-   kvmppc_create_dtl_entry_p9(vcpu, vc, tb + vc->tb_offset);
+   kvmppc_update_vpa_dispatch_p9(vcpu, vc, tb + vc->tb_offset);
 
trace_kvm_guest_enter(vcpu);
 
-- 
2.35.1
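
For readers following the bookkeeping in kvmppc_update_vpa_dispatch_p9() above, here is a minimal sketch of the same idea, assuming a simplified host-endian model: the VPA counter carries the running total of stolen ticks, while each dispatch trace log entry carries only the delta since the previous dispatch. The struct and function names are hypothetical and the byte-order conversions are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model of the per-vcpu state used by the P9 path. */
struct vcpu_model {
	uint64_t stolen_tb;               /* running total of stolen timebase ticks */
	uint64_t stolen_logged;           /* total already reported at earlier dispatches */
	uint64_t vpa_enqueue_dispatch_tb; /* cumulative counter exposed to the guest */
};

/*
 * Called at dispatch time. Publishes the cumulative total in the VPA
 * counter and returns the delta that would go into the trace log entry.
 */
static uint64_t log_dispatch(struct vcpu_model *v)
{
	uint64_t total = v->stolen_tb;
	uint64_t delta;

	/* The running total is only ever added to, so it must not go backwards. */
	assert(total >= v->stolen_logged);

	delta = total - v->stolen_logged;
	v->stolen_logged = total;
	v->vpa_enqueue_dispatch_tb = total;

	return delta;
}
```

Keeping a "logged so far" watermark means no ticks are reported twice in the log, and a dispatch with no new stolen time logs a delta of zero.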



[PATCH 2/4] powerpc/pseries: Add wait interval counters to VPA

2022-05-18 Thread Nicholas Piggin
The hypervisor exposes accumulated partition scheduling interval times
in the VPA (lppaca). These can be used to implement a simple stolen time
in the guest without complex and costly dtl scanning.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/lppaca.h | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/lppaca.h b/arch/powerpc/include/asm/lppaca.h
index c390ec377bae..34d44cb17c87 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -104,14 +104,18 @@ struct lppaca {
volatile __be32 dispersion_count; /* dispatch changed physical cpu */
volatile __be64 cmo_faults; /* CMO page fault count */
volatile __be64 cmo_fault_time; /* CMO page fault time */
-   u8  reserved10[104];
+   u8  reserved10[64]; /* [S]PURR expropriated/donated */
+   volatile __be64 enqueue_dispatch_tb; /* Total TB enqueue->dispatch */
+   volatile __be64 ready_enqueue_tb; /* Total TB ready->enqueue */
+   volatile __be64 wait_ready_tb;  /* Total TB wait->ready */
+   u8  reserved11[16];
 
/* cacheline 4-5 */
 
__be32  page_ins;   /* CMO Hint - # page ins by OS */
-   u8  reserved11[148];
+   u8  reserved12[148];
volatile __be64 dtl_idx;/* Dispatch Trace Log head index */
-   u8  reserved12[96];
+   u8  reserved13[96];
 } cacheline_aligned;
 
 #define lppaca_of(cpu) (*paca_ptrs[cpu]->lppaca_ptr)
-- 
2.35.1
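
The size arithmetic in this hunk can be checked mechanically. The sketch below models only the region that used to be reserved10[104] (the struct names are hypothetical, and plain uint64_t stands in for the kernel's __be64) and asserts at compile time that splitting it into 64 reserved bytes, three 8-byte counters and 16 reserved bytes leaves the overall size unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The region before the patch: one opaque reserved block. */
struct old_region {
	uint8_t reserved10[104];
};

/* The same region after the patch, with the three counters carved out. */
struct new_region {
	uint8_t  reserved10[64];
	uint64_t enqueue_dispatch_tb;
	uint64_t ready_enqueue_tb;
	uint64_t wait_ready_tb;
	uint8_t  reserved11[16];
};

/* 64 + 3 * 8 + 16 == 104, so later fields keep their offsets. */
_Static_assert(sizeof(struct new_region) == sizeof(struct old_region),
	       "carving out the counters must not change the region size");
_Static_assert(offsetof(struct new_region, enqueue_dispatch_tb) == 64,
	       "first counter starts right after the 64 reserved bytes");
_Static_assert(offsetof(struct new_region, reserved11) == 88,
	       "trailing reserved bytes follow the three counters");
```

A check like this catches an off-by-eight in the reserved array sizes at build time rather than as a silently shifted dtl_idx.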



[PATCH 1/4] KVM: PPC: Book3S HV P9: Restore stolen time logging in dtl

2022-05-18 Thread Nicholas Piggin
Stolen time logging in dtl was removed from the P9 path, so guests had
no stolen time accounting. Add it back in a simpler way that still
avoids locks and per-core accounting code.

Fixes: ecb6a7207f92 ("KVM: PPC: Book3S HV P9: Remove most of the vcore logic")
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c | 49 +---
 1 file changed, 45 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6fa518f6501d..0a0835edb64a 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -248,6 +248,7 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
 
 /*
  * We use the vcpu_load/put functions to measure stolen time.
+ *
  * Stolen time is counted as time when either the vcpu is able to
  * run as part of a virtual core, but the task running the vcore
  * is preempted or sleeping, or when the vcpu needs something done
@@ -277,6 +278,12 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
  * lock.  The stolen times are measured in units of timebase ticks.
  * (Note that the != TB_NIL checks below are purely defensive;
  * they should never fail.)
+ *
+ * The POWER9 path is simpler, one vcpu per virtual core so the
+ * former case does not exist. If a vcpu is preempted when it is
+ * BUSY_IN_HOST and not ceded or otherwise blocked, then accumulate
+ * the stolen cycles in busy_stolen. RUNNING is not a preemptible
+ * state in the P9 path.
  */
 
 static void kvmppc_core_start_stolen(struct kvmppc_vcore *vc, u64 tb)
@@ -310,8 +317,14 @@ static void kvmppc_core_vcpu_load_hv(struct kvm_vcpu *vcpu, int cpu)
unsigned long flags;
u64 now;
 
-   if (cpu_has_feature(CPU_FTR_ARCH_300))
+   if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+   if (vcpu->arch.busy_preempt != TB_NIL) {
+   WARN_ON_ONCE(vcpu->arch.state != KVMPPC_VCPU_BUSY_IN_HOST);
+   vc->stolen_tb += mftb() - vcpu->arch.busy_preempt;
+   vcpu->arch.busy_preempt = TB_NIL;
+   }
return;
+   }
 
now = mftb();
 
@@ -339,8 +352,21 @@ static void kvmppc_core_vcpu_put_hv(struct kvm_vcpu *vcpu)
unsigned long flags;
u64 now;
 
-   if (cpu_has_feature(CPU_FTR_ARCH_300))
+   if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+   /*
+* In the P9 path, RUNNABLE is not preemptible
+* (nor takes host interrupts)
+*/
+   WARN_ON_ONCE(vcpu->arch.state == KVMPPC_VCPU_RUNNABLE);
+   /*
+* Account stolen time when preempted while the vcpu task is
+* running in the kernel (but not in qemu, which is INACTIVE).
+*/
+   if (task_is_running(current) &&
+   vcpu->arch.state == KVMPPC_VCPU_BUSY_IN_HOST)
+   vcpu->arch.busy_preempt = mftb();
return;
+   }
 
now = mftb();
 
@@ -739,6 +765,18 @@ static void __kvmppc_create_dtl_entry(struct kvm_vcpu *vcpu,
vcpu->arch.dtl.dirty = true;
 }
 
+static void kvmppc_create_dtl_entry_p9(struct kvm_vcpu *vcpu,
+  struct kvmppc_vcore *vc,
+  u64 now)
+{
+   unsigned long stolen;
+
+   stolen = vc->stolen_tb - vcpu->arch.stolen_logged;
+   vcpu->arch.stolen_logged = vc->stolen_tb;
+
+   __kvmppc_create_dtl_entry(vcpu, vc->pcpu, now, stolen);
+}
+
 static void kvmppc_create_dtl_entry(struct kvm_vcpu *vcpu,
struct kvmppc_vcore *vc)
 {
@@ -4470,7 +4508,6 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
vc = vcpu->arch.vcore;
vcpu->arch.ceded = 0;
vcpu->arch.run_task = current;
-   vcpu->arch.state = KVMPPC_VCPU_RUNNABLE;
vcpu->arch.last_inst = KVM_INST_FETCH_FAILED;
 
/* See if the MMU is ready to go */
@@ -4497,6 +4534,8 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
/* flags save not required, but irq_pmu has no disable/enable API */
powerpc_local_irq_pmu_save(flags);
 
+   vcpu->arch.state = KVMPPC_VCPU_RUNNABLE;
+
if (signal_pending(current))
goto sigpend;
if (need_resched() || !kvm->arch.mmu_ready)
@@ -4536,7 +4575,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
 
tb = mftb();
 
-   __kvmppc_create_dtl_entry(vcpu, pcpu, tb + vc->tb_offset, 0);
+   kvmppc_create_dtl_entry_p9(vcpu, vc, tb + vc->tb_offset);
 
trace_kvm_guest_enter(vcpu);
 
@@ -4577,6 +4616,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
 
vcpu->cpu = -1;
vcpu->arch.thread_cpu = -1;
+   vcpu->arch.state = KVMPPC_VCPU_BUSY_IN_HOST;
 
powerpc_local_irq_pmu_restore(flags);
 
@@ -4639,6 +4679,7 @@ int 

Re: [PATCH v2 0/6] KASAN support for 64-bit Book 3S powerpc

2022-05-18 Thread Christophe Leroy


On 18/05/2022 at 12:03, Paul Mackerras wrote:
> This patch series implements KASAN on 64-bit POWER with radix MMU,
> such as POWER9 or POWER10.  Daniel Axtens posted previous versions of
> these patches, but is no longer working on KASAN, and I have been
> asked to get them ready for inclusion.
> 
> Because of various technical difficulties, mostly around the need to
> allow for code that runs in real mode, we only support "outline" mode
> (as opposed to "inline" mode), where the compiler adds a call to
> a checking procedure before every store to memory.
> 
> This series has known deficiencies, specifically that the kernel will
> crash on boot on a HPT system, and that out-of-bounds accesses to
> module global data are not caught (which leads to one of the KASAN
> tests failing).
> 
> v2: Split the large patch 3/3 of the previous series into three
> patches and addressed review comments; put the generic documentation
> changes in a separate patch at the end of the series; removed the RFC
> tag.
> 
> Comments welcome.

The series looks good to me.

Maybe patch 3 should go after patches 4 and 5 which are preparation patches.

Christophe

> 
> Paul.
> 
>   Documentation/dev-tools/kasan.rst  |   7 +-
>   Documentation/powerpc/kasan.txt|  58 
>   arch/powerpc/Kconfig   |   5 +-
>   arch/powerpc/Kconfig.debug |   3 +-
>   arch/powerpc/include/asm/book3s/64/hash.h  |   4 +
>   arch/powerpc/include/asm/book3s/64/pgtable.h   |   3 +
>   arch/powerpc/include/asm/book3s/64/radix.h |  12 ++-
>   arch/powerpc/include/asm/interrupt.h   |  52 ---
>   arch/powerpc/include/asm/kasan.h   |  22 +
>   arch/powerpc/kernel/Makefile   |  11 +++
>   arch/powerpc/kernel/smp.c  |  22 ++---
>   arch/powerpc/kernel/traps.c|   6 +-
>   arch/powerpc/kexec/Makefile|   2 +
>   arch/powerpc/kvm/Makefile  |   5 +
>   arch/powerpc/lib/Makefile  |   3 +
>   arch/powerpc/mm/book3s64/Makefile  |   9 ++
>   arch/powerpc/mm/kasan/Makefile |   3 +-
>   .../mm/kasan/{kasan_init_32.c => init_32.c}|   0
>   arch/powerpc/mm/kasan/init_book3s_64.c | 103 
> +
>   arch/powerpc/mm/ptdump/ptdump.c|   3 +-
>   arch/powerpc/platforms/Kconfig.cputype |   1 +
>   arch/powerpc/platforms/powernv/Makefile|   8 ++
>   arch/powerpc/platforms/powernv/smp.c   |   2 +-
>   arch/powerpc/platforms/pseries/Makefile|   6 ++
>   arch/powerpc/sysdev/xics/xics-common.c |   4 +-
>   arch/powerpc/sysdev/xive/common.c  |   4 +-
>   26 files changed, 320 insertions(+), 38 deletions(-)

Re: [PATCH 1/2] powerpc: Add generic PAGE_SIZE config symbols

2022-05-18 Thread Christophe Leroy


On 18/05/2022 at 15:21, Arnd Bergmann wrote:
> On Wed, May 18, 2022 at 2:00 PM Michael Ellerman  wrote:
>>
>> Christophe Leroy  writes:
>>> On 05/05/2022 at 14:51, Michael Ellerman wrote:
>>>> Other arches (sh, mips, hexagon) use standard names for PAGE_SIZE
>>>> related config symbols.
>>>>
>>>> Add matching symbols for powerpc, which are enabled by default but
>>>> depend on our architecture specific PAGE_SIZE symbols.
>>>>
>>>> This allows generic/driver code to express dependencies on the PAGE_SIZE
>>>> without needing to refer to architecture specific config symbols.
>>>
>>> I guess next step should be to get rid of powerpc specific symbols and
>>> use generic symbols instead.
>>>
>>> We have (only) 111 occurrences of it.
>>
>> I thought about doing that, but it's quite a bit of churn. Maybe it's
>> worth it though to avoid confusion between the two symbols.
> 
> I have actually done this at some point, but for some reason never sent it out,
> see my old patch at:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/commit/?h=randconfig-5.15-next&id=184c7273ee367fda3626e35f0079f181075690c8
> 
> Feel free to take ideas or the entire patch from that.
> 


Well, at this point I was just talking about renaming the 
CONFIG_PPC_xxK_PAGES symbols to the generic naming while still keeping 
them in powerpc Kconfig.

You are going one step further by making it a generic arch symbol, 
that's also a good idea and can be done more or less independently.

Christophe

Re: [PATCH 00/35] Add group constraints and event code test as part of selftest

2022-05-18 Thread Michael Ellerman
Kajol Jain  writes:
> Patch series extends the perf interface selftests
> to cover scenarios for event code checking,
> group constraints, and also thresholding/branch related
> interface tests in sampling area.

There are build failures in CI:

  https://github.com/ruscur/linux-ci/actions/runs/2317863271

If you follow the instructions here:

  https://github.com/linuxppc/wiki/wiki/Testing-with-GitHub-Actions

You can have the same tests run against your own tree before you post to
the list.

cheers

> In this series, patches 1 to 14 add additional tests under
> "powerpc/sampling_tests". These add support for handling
> sample type PERF_SAMPLE_BRANCH_STACK along with interrupt regs.
> They add utility functions and tests for thresh_cmp and branch
> filters programmed in the control register. Some of the tests need
> to be skipped in the "Generic Compat PMU" environment. Hence utility
> functions are added in "include/utils.c" and "sampling_tests/misc.h"
> to detect the platform based on "auxv" entries.
>
> Currently in other architectures (like x86), the pmu_name is
> exposed via the sysfs caps folder, i.e.
> "sys/bus/event_source/devices//caps". But in powerpc,
> "caps" is not supported. So, though the approach for detecting
> compat mode currently uses auxv, the patchset adds a
> utility function that allows for the possibility of
> "caps" being added for powerpc.
>
> Link to the patch to add support for caps under sysfs in powerpc:
> http://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=297293
>
> Patches 15 to 35 cover tests related to group constraints and event codes.
> This new set of changes is added under a new folder:
> "selftests/powerpc/pmu/event_code_tests"
>
> Patch 15 covers the changes required for the new folder, with Makefile changes.
> The other patches add tests for the perf interface to check the event
> group constraints, valid/invalid event codes, blacklisted events, etc.
> They also add the required utility functions under the header file "misc.h"
> in the sampling_tests folder.
>
> Patches 33 and 34 depend upon the thresh_cmp group constraint fix patches
> sent to the upstream mailing list.
>
> Link to the thresh_cmp fix patchset:
> http://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=298742
>
> Patch 13 of the patchset adds a selftest for mmcr1 pmcxsel/unit/cache fields,
> which was initially dropped from the sampling test patchset (patch number: 16)
>
> Link to the patch:
> http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220127072012.662451-17-kj...@linux.ibm.com/
>
> Athira Rajeev (20):
>   testing/selftests/powerpc: Add support to fetch "platform" and "base
> platform" from auxv to detect platform.
>   selftest/powerpc/pmu: Refactor the platform check and add macros to
> find array size/PVR
>   selftest/powerpc/pmu: Add selftest to check branch stack enablement
> will not crash on any platforms
>   selftest/powerpc/pmu: Add selftest to check PERF_SAMPLE_REGS_INTR
> option will not crash on any platforms
>   selftest/powerpc/pmu: Add selftest for checking valid and invalid bhrb
> filter maps
>   selftest/powerpc/pmu: Add selftest for mmcr1 pmcxsel/unit/cache fields
>   selftest/powerpc/pmu: Add support for perf event code tests
>   selftest/powerpc/pmu: Add selftest for group constraint check for PMC5
> and PMC6
>   selftest/powerpc/pmu: Add selftest to check PMC5/6 is excluded from
> some constraint checks
>   selftest/powerpc/pmu: Add selftest to check constraint for number of
> counters in use.
>   selftest/powerpc/pmu: Add selftest for group constraint check when
> using same PMC
>   selftest/powerpc/pmu: Add selftest for group constraint check for
> radix_scope_qual field
>   selftest/powerpc/pmu: Add selftest for group constraint for MMCRA
> Sampling Mode field
>   selftest/powerpc/pmu: Add selftest for group constraint check MMCRA
> sample bits
>   selftest/powerpc/pmu: Add selftest for checking invalid bits in event
> code
>   selftest/powerpc/pmu: Add selftest for reserved bit check for MMCRA
> thresh_ctl field
>   selftest/powerpc/pmu: Add selftest for blacklist events check in
> power9
>   selftest/powerpc/pmu: Add selftest for event alternatives for power9
>   selftest/powerpc/pmu: Add selftest for event alternatives for power10
>   selftest/powerpc/pmu: Add selftest for PERF_TYPE_HARDWARE events valid
> check
>
> Kajol Jain (15):
>   selftest/powerpc/pmu: Add mask/shift bits for extracting threshold
> compare field
>   selftest/powerpc/pmu: Add interface test for mmcra_thresh_cmp fields
>   selftest/powerpc/pmu: Add support for branch sampling in get_intr_regs
> function
>   selftest/powerpc/pmu: Add interface test for mmcra_ifm field of
> indirect call type
>   selftest/powerpc/pmu: Add interface test for mmcra_ifm field for any
> branch type
>   selftest/powerpc/pmu: Add interface test for mmcra_ifm field for
> conditional branch type
>   selftest/powerpc/pmu: Add interface test for bhrb disable field
>   

Re: [PATCH 1/2] powerpc: Add generic PAGE_SIZE config symbols

2022-05-18 Thread Arnd Bergmann
On Wed, May 18, 2022 at 2:00 PM Michael Ellerman  wrote:
>
> Christophe Leroy  writes:
> > On 05/05/2022 at 14:51, Michael Ellerman wrote:
> >> Other arches (sh, mips, hexagon) use standard names for PAGE_SIZE
> >> related config symbols.
> >>
> >> Add matching symbols for powerpc, which are enabled by default but
> >> depend on our architecture specific PAGE_SIZE symbols.
> >>
> >> This allows generic/driver code to express dependencies on the PAGE_SIZE
> >> without needing to refer to architecture specific config symbols.
> >
> > I guess next step should be to get rid of powerpc specific symbols and
> > use generic symbols instead.
> >
> > We have (only) 111 occurrences of it.
>
> I thought about doing that, but it's quite a bit of churn. Maybe it's
> worth it though to avoid confusion between the two symbols.

I have actually done this at some point, but for some reason never sent it out,
see my old patch at:

https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/commit/?h=randconfig-5.15-next&id=184c7273ee367fda3626e35f0079f181075690c8

Feel free to take ideas or the entire patch from that.

  Arnd


Re: [PATCH] tools/perf/test: Fix perf all PMU test to skip hv_24x7/hv_gpci tests on powerpc

2022-05-18 Thread Michael Ellerman
Athira Rajeev  writes:
> "perf all PMU test" picks the input events from the
> "perf list --raw-dump pmu" list and runs "perf stat -e"
> for each event in the list. On powerpc, the
> PowerVM environment supports events from the hv_24x7 and hv_gpci
> PMUs, which have a format like the examples below:
> - hv_24x7/CPM_ADJUNCT_INST,domain=?,core=?/
> - hv_gpci/event,partition_id=?/
>
> The value for "?" needs to be filled in depending on the
> system and respective event. CPM_ADJUNCT_INST needs to have
> core and domain values. The hv_gpci event needs partition_id.
> Similarly, there are other events for hv_24x7 and hv_gpci
> having "?" in the event format. Hence skip these events on the powerpc
> platform, since values like partition_id and domain are specific
> to the system and event.
>
> Fixes: 3d5ac9effcc6 ("perf test: Workload test of all PMUs")
> Signed-off-by: Athira Rajeev 
> ---
>  tools/perf/tests/shell/stat_all_pmu.sh | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/tools/perf/tests/shell/stat_all_pmu.sh b/tools/perf/tests/shell/stat_all_pmu.sh
> index b30dba455f36..4a854b545bec 100755
> --- a/tools/perf/tests/shell/stat_all_pmu.sh
> +++ b/tools/perf/tests/shell/stat_all_pmu.sh
> @@ -5,6 +5,16 @@
>  set -e
>  
>  for p in $(perf list --raw-dump pmu); do
> +  # In powerpc, skip the events for hv_24x7 and hv_gpci.
> +  # These events needs input values to be filled in for
> +  # core, chip, patition id based on system.
> +  # Example: hv_24x7/CPM_ADJUNCT_INST,domain=?,core=?/
> +  # hv_gpci/event,partition_id=?/
> +  # Hence skip these events for ppc.
> +  if lscpu  |grep ppc && echo "$p" |grep -Eq 'hv_24x7|hv_gpci' ; then

My system doesn't have lscpu installed, why not use `uname -m`?

But why check for ppc at all, the name of the pmu seems unique enough -
no one else is going to call their pmu something so odd :)

cheers


Re: [PATCH 1/2] powerpc: Add generic PAGE_SIZE config symbols

2022-05-18 Thread Michael Ellerman
Christophe Leroy  writes:
> On 05/05/2022 at 14:51, Michael Ellerman wrote:
>> Other arches (sh, mips, hexagon) use standard names for PAGE_SIZE
>> related config symbols.
>> 
>> Add matching symbols for powerpc, which are enabled by default but
>> depend on our architecture specific PAGE_SIZE symbols.
>> 
>> This allows generic/driver code to express dependencies on the PAGE_SIZE
>> without needing to refer to architecture specific config symbols.
>
> I guess next step should be to get rid of powerpc specific symbols and 
> use generic symbols instead.
>
> We have (only) 111 occurrences of it.

I thought about doing that, but it's quite a bit of churn. Maybe it's
worth it though to avoid confusion between the two symbols.

There's probably some that could be converted to IS_ENABLED() at the
same time, especially in hash_utils.c.

cheers
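
To make the IS_ENABLED() suggestion concrete, here is a minimal userspace sketch. It assumes a cut-down imitation of the preprocessor trick in include/linux/kconfig.h that handles built-in options only (the DEMO_* helper names are made up for this sketch), and it hard-codes a hypothetical configuration with 64K pages selected. The point is that both branches are seen by the compiler, unlike with #ifdef, while the dead one is still discarded as a constant-false condition.

```c
#include <assert.h>

/*
 * Cut-down imitation of IS_ENABLED(): expands to 1 when the option is
 * defined to 1, and to 0 when it is not defined at all.
 */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define DEMO_TAKE_SECOND(ignored, val, ...) val
#define DEMO_IS_DEFINED(x) DEMO_IS_DEFINED_(x)
#define DEMO_IS_DEFINED_(val) DEMO_IS_DEFINED__(DEMO_ARG_PLACEHOLDER_##val)
#define DEMO_IS_DEFINED__(arg1_or_junk) DEMO_TAKE_SECOND(arg1_or_junk 1, 0)
#define IS_ENABLED(option) DEMO_IS_DEFINED(option)

/* Hypothetical configuration: 64K pages on, 4K pages left undefined. */
#define CONFIG_PPC_64K_PAGES 1

/* Both branches are parsed and type-checked; only one survives. */
static unsigned int page_shift(void)
{
	unsigned int shift = 0;

	if (IS_ENABLED(CONFIG_PPC_64K_PAGES))
		shift = 16;
	else if (IS_ENABLED(CONFIG_PPC_4K_PAGES))
		shift = 12;

	/* Exactly one page size is expected to be selected. */
	assert(shift != 0);
	return shift;
}
```

The trade-off is that every symbol referenced in either branch must be declared in all configurations, which is why such conversions sometimes need extra stub definitions.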


Re: [PATCH 2/2] powerpc/signal: Report minimum signal frame size to userspace via AT_MINSIGSTKSZ

2022-05-18 Thread Tulio Magno Quites Machado Filho
Nicholas Piggin  writes:

> Implement the AT_MINSIGSTKSZ AUXV entry, allowing userspace to
> dynamically size stack allocations in a manner forward-compatible with
> new processor state saved in the signal frame.
>
> For now these statically find the maximum signal frame size rather than
> doing any runtime testing of features to minimise the size.
>
> glibc 2.34 will take advantage of this, as will applications that use
> _SC_MINSIGSTKSZ and _SC_SIGSTKSZ.
>
> Cc: Alan Modra 
> Cc: Tulio Magno Quites Machado Filho 
> References: 94b07c1f8c39 ("arm64: signal: Report signal frame size to 
> userspace via auxv")
> Signed-off-by: Nicholas Piggin 

Both patches LGTM from a glibc point of view.

Reviewed-by: Tulio Magno Quites Machado Filho 

Thanks!

-- 
Tulio Magno
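
For context, a minimal sketch of how userspace might consume the new AUXV entry directly, assuming glibc's getauxval() from <sys/auxv.h>. The helper name is made up for this sketch, and the fallback value 51 for AT_MINSIGSTKSZ is only used if the libc headers do not already define it. On kernels that do not provide the entry, getauxval() returns 0 and the compile-time MINSIGSTKSZ is used instead.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/auxv.h>

#ifndef AT_MINSIGSTKSZ
#define AT_MINSIGSTKSZ 51	/* assumed value, matching the kernel's auxvec.h */
#endif

/*
 * Smallest signal stack size that is safe to allocate: the value the
 * kernel reports via the auxiliary vector when present, otherwise the
 * compile-time constant from <signal.h>.
 */
static size_t min_signal_stack_size(void)
{
	unsigned long reported = getauxval(AT_MINSIGSTKSZ); /* 0 if absent */
	size_t fallback = MINSIGSTKSZ;

	assert(fallback > 0);
	return reported > fallback ? (size_t)reported : fallback;
}
```

An application would typically add its own handler's stack needs on top of this minimum before calling sigaltstack().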


Re: [PATCH v3 19/25] powerpc/ftrace: Minimise number of #ifdefs

2022-05-18 Thread Christophe Leroy


On 18/05/2022 at 14:03, Michael Ellerman wrote:
> Michael Ellerman  writes:
>> "Naveen N. Rao"  writes:
>>> Christophe Leroy wrote:
>>>> A lot of #ifdefs can be replaced by IS_ENABLED()
>>>>
>>>> Do so.
>>>>
>>>> This requires to have kernel_toc_addr() defined at all time
>>>> as well as PPC_INST_LD_TOC and PPC_INST_STD_LR.
>>>>
>>>> Signed-off-by: Christophe Leroy 
>>>> ---
>>>> v2: Moved the setup of pop outside of the big if()/else() in 
>>>> __ftrace_make_nop()
>>>> ---
>>>>   arch/powerpc/include/asm/code-patching.h |   2 -
>>>>   arch/powerpc/include/asm/module.h|   2 -
>>>>   arch/powerpc/include/asm/sections.h  |  24 +--
>>>>   arch/powerpc/kernel/trace/ftrace.c   | 182 +++
>>>>   4 files changed, 103 insertions(+), 107 deletions(-)
>>>>
>>>
>>> 
>>>
>>>> @@ -710,6 +707,9 @@ void arch_ftrace_update_code(int command)
>>>>
>>>>   #ifdef CONFIG_PPC64
>>>>   #define PACATOC offsetof(struct paca_struct, kernel_toc)
>>>> +#else
>>>> +#define PACATOC 0
>>>> +#endif
>>>
>>> This conflicts with my fix for the ftrace init tramp:
>>> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220516071422.463738-1-naveen.n@linux.vnet.ibm.com/
>>>
>>> It probably makes sense to retain #ifdef CONFIG_PPC64, so that we can
>>> get rid of the PACATOC. Here is an incremental diff:
>>
>> Where is the incremental diff meant to apply?
>>
>> It doesn't apply on top of patch 19, or at the end of the series.
> 
> I think I worked out what you meant.
> 
> Can you check what's in next-test:
> 
>https://github.com/linuxppc/linux/commits/next-test

Yes that looks fine.

As Naveen mentioned we can also get rid of PACATOC completely and use 
offsetof(struct paca_struct, kernel_toc) directly at the only place 
PACATOC is used.

Thanks
Christophe
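
As a rough illustration of using offsetof() directly in place of a one-off
macro, here is a small standalone sketch. The struct layout and every
sketch_* name are hypothetical stand-ins (the real paca_struct is much
larger); only the general shape of a DS-form "ld" encoding (opcode
0xe8000000, RT in bits 21-25, RA in bits 16-20, displacement masked with
0xfffc) is taken from the Power ISA.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily trimmed stand-in for struct paca_struct. */
struct paca_sketch {
	uint64_t lock_token;
	uint64_t data_offset;
	uint64_t kernel_toc;
};

#define SKETCH_INST_LD 0xe8000000u

/* Encode "ld rt, disp(ra)"; a simplified model of a raw-instruction macro. */
uint32_t sketch_raw_ld(unsigned int rt, unsigned int ra, unsigned long disp)
{
	return SKETCH_INST_LD | ((rt & 0x1f) << 21) | ((ra & 0x1f) << 16) |
	       (uint32_t)(disp & 0xfffc);
}

/* The field offset is spelled out at its single use, with no helper macro. */
uint32_t sketch_ld_toc(void)
{
	return sketch_raw_ld(12, 13, offsetof(struct paca_sketch, kernel_toc));
}
```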

Re: [PATCH v3 19/25] powerpc/ftrace: Minimise number of #ifdefs

2022-05-18 Thread Christophe Leroy


Le 18/05/2022 à 13:19, Michael Ellerman a écrit :
> "Naveen N. Rao"  writes:
>> Christophe Leroy wrote:
>>> A lot of #ifdefs can be replaced by IS_ENABLED()
>>>
>>> Do so.
>>>
>>> This requires to have kernel_toc_addr() defined at all time
>>> as well as PPC_INST_LD_TOC and PPC_INST_STD_LR.
>>>
>>> Signed-off-by: Christophe Leroy 
>>> ---
>>> v2: Moved the setup of pop outside of the big if()/else() in 
>>> __ftrace_make_nop()
>>> ---
>>>   arch/powerpc/include/asm/code-patching.h |   2 -
>>>   arch/powerpc/include/asm/module.h|   2 -
>>>   arch/powerpc/include/asm/sections.h  |  24 +--
>>>   arch/powerpc/kernel/trace/ftrace.c   | 182 +++
>>>   4 files changed, 103 insertions(+), 107 deletions(-)
>>>
>>
>> 
>>
>>> @@ -710,6 +707,9 @@ void arch_ftrace_update_code(int command)
>>>
>>>   #ifdef CONFIG_PPC64
>>>   #define PACATOC offsetof(struct paca_struct, kernel_toc)
>>> +#else
>>> +#define PACATOC 0
>>> +#endif
>>
>> This conflicts with my fix for the ftrace init tramp:
>> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220516071422.463738-1-naveen.n@linux.vnet.ibm.com/
>>
>> It probably makes sense to retain #ifdef CONFIG_PPC64, so that we can
>> get rid of the PACATOC. Here is an incremental diff:
> 
> Where is the incremental diff meant to apply?
> 
> It doesn't apply on top of patch 19, or at the end of the series.

White space damage it seems.

I'll send you a proper fixup for that patch in a few minutes.


> 
> cheers
> 
>> diff --git a/arch/powerpc/kernel/trace/ftrace.c 
>> b/arch/powerpc/kernel/trace/ftrace.c
>> index da1a2f8ebb72f3..28169a1ccc7377 100644
>> --- a/arch/powerpc/kernel/trace/ftrace.c
>> +++ b/arch/powerpc/kernel/trace/ftrace.c
>> @@ -701,11 +701,6 @@ void arch_ftrace_update_code(int command)
>>   }
>>   
>>   #ifdef CONFIG_PPC64
>> -#define PACATOC offsetof(struct paca_struct, kernel_toc)
>> -#else
>> -#define PACATOC 0
>> -#endif
>> -
>>   extern unsigned int ftrace_tramp_text[], ftrace_tramp_init[];
>>   
>>   void ftrace_free_init_tramp(void)
>> @@ -724,7 +719,7 @@ int __init ftrace_dyn_arch_init(void)
>>  int i;
>>  unsigned int *tramp[] = { ftrace_tramp_text, ftrace_tramp_init };
>>  u32 stub_insns[] = {
>> -PPC_RAW_LD(_R12, _R13, PACATOC),
>> +PPC_RAW_LD(_R12, _R13, offsetof(struct paca_struct, 
>> kernel_toc)),
>>  PPC_RAW_ADDIS(_R12, _R12, 0),
>>  PPC_RAW_ADDI(_R12, _R12, 0),
>>  PPC_RAW_MTCTR(_R12),
>> @@ -733,9 +728,6 @@ int __init ftrace_dyn_arch_init(void)
>>  unsigned long addr;
>>  long reladdr;
>>   
>> -if (IS_ENABLED(CONFIG_PPC32))
>> -return 0;
>> -
>>  addr = ppc_global_function_entry((void *)FTRACE_REGS_ADDR);
>>  reladdr = addr - kernel_toc_addr();
>>   
>> @@ -754,6 +746,7 @@ int __init ftrace_dyn_arch_init(void)
>>   
>>  return 0;
>>   }
>> +#endif
>>   
>>   #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>>   
>>
>> - Naveen

Re: [PATCH v3 19/25] powerpc/ftrace: Minimise number of #ifdefs

2022-05-18 Thread Michael Ellerman
Michael Ellerman  writes:
> "Naveen N. Rao"  writes:
>> Christophe Leroy wrote:
>>> A lot of #ifdefs can be replaced by IS_ENABLED()
>>> 
>>> Do so.
>>> 
>>> This requires to have kernel_toc_addr() defined at all time
>>> as well as PPC_INST_LD_TOC and PPC_INST_STD_LR.
>>> 
>>> Signed-off-by: Christophe Leroy 
>>> ---
>>> v2: Moved the setup of pop outside of the big if()/else() in 
>>> __ftrace_make_nop()
>>> ---
>>>  arch/powerpc/include/asm/code-patching.h |   2 -
>>>  arch/powerpc/include/asm/module.h|   2 -
>>>  arch/powerpc/include/asm/sections.h  |  24 +--
>>>  arch/powerpc/kernel/trace/ftrace.c   | 182 +++
>>>  4 files changed, 103 insertions(+), 107 deletions(-)
>>> 
>>
>> 
>>
>>> @@ -710,6 +707,9 @@ void arch_ftrace_update_code(int command)
>>> 
>>>  #ifdef CONFIG_PPC64
>>>  #define PACATOC offsetof(struct paca_struct, kernel_toc)
>>> +#else
>>> +#define PACATOC 0
>>> +#endif
>>
>> This conflicts with my fix for the ftrace init tramp:
>> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220516071422.463738-1-naveen.n@linux.vnet.ibm.com/
>>
>> It probably makes sense to retain #ifdef CONFIG_PPC64, so that we can 
>> get rid of the PACATOC. Here is an incremental diff:
>
> Where is the incremental diff meant to apply?
>
> It doesn't apply on top of patch 19, or at the end of the series.

I think I worked out what you meant.

Can you check what's in next-test:

  https://github.com/linuxppc/linux/commits/next-test

cheers


Re: [Buildroot] [PATCH] linux: Fix powerpc64le defconfig selection

2022-05-18 Thread Arnd Bergmann
On Mon, May 16, 2022 at 2:17 PM Michael Ellerman  wrote:
> Having said that I think we could handle this better in the powerpc
> kernel. Other arches allow specifying a different value for ARCH, which
> then is fed into the defconfig.
>
> That way you could at least pass ARCH=ppc/ppc64/ppc64le, and get an
> appropriate defconfig.
>
> I'll work on some kernel changes for that.

I would recommend against that. It's always a bit hacky, and I think this was
mainly done on x86 to avoid breaking user workflows after arch/i386 and
arch/x86_64 got merged.

Since there was never an arch/ppc64le, and arch/{ppc,ppc64}/ have been gone for
so long, I see no point in bringing back those interfaces; just use the right
defconfig for what you want.

Arnd


Re: [PATCH v3 19/25] powerpc/ftrace: Minimise number of #ifdefs

2022-05-18 Thread Michael Ellerman
"Naveen N. Rao"  writes:
> Christophe Leroy wrote:
>> A lot of #ifdefs can be replaced by IS_ENABLED()
>> 
>> Do so.
>> 
>> This requires to have kernel_toc_addr() defined at all time
>> as well as PPC_INST_LD_TOC and PPC_INST_STD_LR.
>> 
>> Signed-off-by: Christophe Leroy 
>> ---
>> v2: Moved the setup of pop outside of the big if()/else() in 
>> __ftrace_make_nop()
>> ---
>>  arch/powerpc/include/asm/code-patching.h |   2 -
>>  arch/powerpc/include/asm/module.h|   2 -
>>  arch/powerpc/include/asm/sections.h  |  24 +--
>>  arch/powerpc/kernel/trace/ftrace.c   | 182 +++
>>  4 files changed, 103 insertions(+), 107 deletions(-)
>> 
>
> 
>
>> @@ -710,6 +707,9 @@ void arch_ftrace_update_code(int command)
>> 
>>  #ifdef CONFIG_PPC64
>>  #define PACATOC offsetof(struct paca_struct, kernel_toc)
>> +#else
>> +#define PACATOC 0
>> +#endif
>
> This conflicts with my fix for the ftrace init tramp:
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220516071422.463738-1-naveen.n@linux.vnet.ibm.com/
>
> It probably makes sense to retain #ifdef CONFIG_PPC64, so that we can 
> get rid of the PACATOC. Here is an incremental diff:

Where is the incremental diff meant to apply?

It doesn't apply on top of patch 19, or at the end of the series.

cheers

> diff --git a/arch/powerpc/kernel/trace/ftrace.c 
> b/arch/powerpc/kernel/trace/ftrace.c
> index da1a2f8ebb72f3..28169a1ccc7377 100644
> --- a/arch/powerpc/kernel/trace/ftrace.c
> +++ b/arch/powerpc/kernel/trace/ftrace.c
> @@ -701,11 +701,6 @@ void arch_ftrace_update_code(int command)
>  }
>  
>  #ifdef CONFIG_PPC64
> -#define PACATOC offsetof(struct paca_struct, kernel_toc)
> -#else
> -#define PACATOC 0
> -#endif
> -
>  extern unsigned int ftrace_tramp_text[], ftrace_tramp_init[];
>  
>  void ftrace_free_init_tramp(void)
> @@ -724,7 +719,7 @@ int __init ftrace_dyn_arch_init(void)
>   int i;
>   unsigned int *tramp[] = { ftrace_tramp_text, ftrace_tramp_init };
>   u32 stub_insns[] = {
> - PPC_RAW_LD(_R12, _R13, PACATOC),
> + PPC_RAW_LD(_R12, _R13, offsetof(struct paca_struct, 
> kernel_toc)),
>   PPC_RAW_ADDIS(_R12, _R12, 0),
>   PPC_RAW_ADDI(_R12, _R12, 0),
>   PPC_RAW_MTCTR(_R12),
> @@ -733,9 +728,6 @@ int __init ftrace_dyn_arch_init(void)
>   unsigned long addr;
>   long reladdr;
>  
> - if (IS_ENABLED(CONFIG_PPC32))
> - return 0;
> -
>   addr = ppc_global_function_entry((void *)FTRACE_REGS_ADDR);
>   reladdr = addr - kernel_toc_addr();
>  
> @@ -754,6 +746,7 @@ int __init ftrace_dyn_arch_init(void)
>  
>   return 0;
>  }
> +#endif
>  
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>  
>
> - Naveen


Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]

2022-05-18 Thread Baoquan He
On 05/18/22 at 02:48pm, Naveen N. Rao wrote:
> Baoquan He wrote:
> > On 05/18/22 at 12:26pm, Michael Ellerman wrote:
> > > 
> > > It seems that recordmcount is not really maintained anymore now that x86
> > > uses objtool?
> > > 
> > > There've been several threads about fixing recordmcount, but none of
> > > them seem to have led to a solution.
> > > 
> > > These weak symbol vs recordmcount problems have been worked around going
> > > back as far as 2020:
> > 
> > It gives me feeling that llvm or recordmcount should make adjustment,
> > but not innocent kernel code, if there are a lot of places reported.
> > I am curious how llvm or recordmcount dev respond to this.
> 
> As Michael stated, this is not just llvm - binutils has also adopted the
> same and "unused" section symbols are being dropped.
> 
> For recordmcount, there were a few threads and approaches that have been
> tried:
> - 
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/cd0f6bdfdf1ee096fb2c07e7b38940921b8e9118.1637764848.git.christophe.le...@csgroup.eu/
> - 
> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=297434=*
> 
> Objtool has picked up a more appropriate fix for this recently, and
> long-term, we would like to move to using objtool for ftrace purposes:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/tools/objtool/elf.c?id=4abff6d48dbcea8200c7ea35ba70c242d128ebf3
> 
> While that is being pursued, we want to unbreak some of the CI and users who
> are hitting this.

I see, thanks for the details. I would pursue a fix in recordmcount if
possible, but I have no objection to fixing it in the kernel, with
justification, if we have to. Given my limited knowledge of linking, I will
leave this to other experts to decide.



Re: [PATCH v3 19/25] powerpc/ftrace: Minimise number of #ifdefs

2022-05-18 Thread Christophe Leroy


Le 18/05/2022 à 11:45, Naveen N. Rao a écrit :
> Christophe Leroy wrote:
>> A lot of #ifdefs can be replaced by IS_ENABLED()
>>
>> Do so.
>>
>> This requires to have kernel_toc_addr() defined at all time
>> as well as PPC_INST_LD_TOC and PPC_INST_STD_LR.
>>
>> Signed-off-by: Christophe Leroy 
>> ---
>> v2: Moved the setup of pop outside of the big if()/else() in 
>> __ftrace_make_nop()
>> ---
>>  arch/powerpc/include/asm/code-patching.h |   2 -
>>  arch/powerpc/include/asm/module.h    |   2 -
>>  arch/powerpc/include/asm/sections.h  |  24 +--
>>  arch/powerpc/kernel/trace/ftrace.c   | 182 +++
>>  4 files changed, 103 insertions(+), 107 deletions(-)
>>
> 
> 
> 
>> @@ -710,6 +707,9 @@ void arch_ftrace_update_code(int command)
>>
>>  #ifdef CONFIG_PPC64
>>  #define PACATOC offsetof(struct paca_struct, kernel_toc)
>> +#else
>> +#define PACATOC 0
>> +#endif
> 
> This conflicts with my fix for the ftrace init tramp:
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220516071422.463738-1-naveen.n@linux.vnet.ibm.com/
>  
> 
> 
> It probably makes sense to retain #ifdef CONFIG_PPC64, so that we can 
> get rid of the PACATOC. Here is an incremental diff:

Ah yes, it makes sense.

Initial purpose was to de-duplicate ftrace_dyn_arch_init(), but as 
ftrace_dyn_arch_init() is defined as a weak nop function in 
kernel/trace/ftrace.c we don't need it for PPC32 at all.

And then kernel_toc_addr() could remain inside #ifdef CONFIG_PPC64 in 
asm/sections.h
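
For anyone unfamiliar with the weak-default pattern mentioned above, here is
a minimal userspace sketch. The sketch_* names are hypothetical; it only
shows that when no strong definition is linked in, the weak no-op is what
gets called, which is why a platform that needs nothing can simply provide
no implementation.

```c
/* Generic fallback: a weak no-op, used when no strong definition exists. */
__attribute__((weak)) int sketch_dyn_arch_init(void)
{
	return 0;
}

/*
 * Generic caller. If another object file provided a non-weak
 * sketch_dyn_arch_init(), the linker would pick that one instead.
 */
int sketch_boot(void)
{
	return sketch_dyn_arch_init();
}
```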

> 
> diff --git a/arch/powerpc/kernel/trace/ftrace.c 
> b/arch/powerpc/kernel/trace/ftrace.c
> index da1a2f8ebb72f3..28169a1ccc7377 100644
> --- a/arch/powerpc/kernel/trace/ftrace.c
> +++ b/arch/powerpc/kernel/trace/ftrace.c
> @@ -701,11 +701,6 @@ void arch_ftrace_update_code(int command)
> }
> 
> #ifdef CONFIG_PPC64
> -#define PACATOC offsetof(struct paca_struct, kernel_toc)
> -#else
> -#define PACATOC 0
> -#endif
> -
> extern unsigned int ftrace_tramp_text[], ftrace_tramp_init[];
> 
> void ftrace_free_init_tramp(void)
> @@ -724,7 +719,7 @@ int __init ftrace_dyn_arch_init(void)
>  int i;
>  unsigned int *tramp[] = { ftrace_tramp_text, ftrace_tramp_init };
>  u32 stub_insns[] = {
> -    PPC_RAW_LD(_R12, _R13, PACATOC),
> +    PPC_RAW_LD(_R12, _R13, offsetof(struct paca_struct, kernel_toc)),
>      PPC_RAW_ADDIS(_R12, _R12, 0),
>      PPC_RAW_ADDI(_R12, _R12, 0),
>      PPC_RAW_MTCTR(_R12),
> @@ -733,9 +728,6 @@ int __init ftrace_dyn_arch_init(void)
>  unsigned long addr;
>  long reladdr;
> 
> -    if (IS_ENABLED(CONFIG_PPC32))
> -    return 0;
> -
>  addr = ppc_global_function_entry((void *)FTRACE_REGS_ADDR);
>  reladdr = addr - kernel_toc_addr();
> 
> @@ -754,6 +746,7 @@ int __init ftrace_dyn_arch_init(void)
> 
>  return 0;
> }
> +#endif
> 
> #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> 
> 
> - Naveen

[PATCH v2 3/6] powerpc: Book3S 64-bit outline-only KASAN support

2022-05-18 Thread Paul Mackerras
From: Daniel Axtens 

Implement a limited form of KASAN for Book3S 64-bit machines running under
the Radix MMU, supporting only outline mode.

 - Enable the compiler instrumentation to check addresses and maintain the
   shadow region. (This is the guts of KASAN which we can easily reuse.)

 - Require kasan-vmalloc support to handle modules and anything else in
   vmalloc space.

 - KASAN needs to be able to validate all pointer accesses, but we can't
   instrument all kernel addresses - only linear map and vmalloc. On boot,
   set up a single page of read-only shadow that marks all iomap and
   vmemmap accesses as valid.

 - Document KASAN in powerpc docs.

Background
--

KASAN support on Book3S is a bit tricky to get right:

 - It would be good to support inline instrumentation so as to be able to
   catch stack issues that cannot be caught with outline mode.

 - Inline instrumentation requires a fixed offset.

 - Book3S runs code with translations off ("real mode") during boot,
   including a lot of generic device-tree parsing code which is used to
   determine MMU features.

[ppc64 mm note: The kernel installs a linear mapping at effective
address c000...-c008 This is a one-to-one mapping with physical
memory from ... onward. Because of how memory accesses work on
powerpc 64-bit Book3S, a kernel pointer in the linear map accesses the
same memory both with translations on (accessing as an 'effective
address'), and with translations off (accessing as a 'real
address'). This works in both guests and the hypervisor. For more
details, see s5.7 of Book III of version 3 of the ISA, in particular
the Storage Control Overview, s5.7.3, and s5.7.5 - noting that this
KASAN implementation currently only supports Radix.]

 - Some code - most notably a lot of KVM code - also runs with translations
   off after boot.

 - Therefore any offset has to point to memory that is valid with
   translations on or off.

One approach is just to give up on inline instrumentation. This way
boot-time checks can be delayed until after the MMU is set up, and we
can just not instrument any code that runs with translations off after
booting. Take this approach for now and require outline instrumentation.

Previous attempts allowed inline instrumentation. However, they came with
some unfortunate restrictions: only physically contiguous memory could be
used and it had to be specified at compile time. Maybe we can do better in
the future.

[pau...@ozlabs.org - Rebased onto 5.17.  Note that a kernel with
 CONFIG_KASAN=y will crash during boot on a machine using HPT
 translation because not all the entry points to the generic
 KASAN code are protected with a call to kasan_arch_is_ready().]

Originally-by: Balbir Singh  # ppc64 out-of-line radix version
Signed-off-by: Daniel Axtens 
Signed-off-by: Paul Mackerras 
---
 Documentation/powerpc/kasan.txt  |  48 -
 arch/powerpc/Kconfig |   5 +-
 arch/powerpc/Kconfig.debug   |   3 +-
 arch/powerpc/include/asm/book3s/64/hash.h|   4 +
 arch/powerpc/include/asm/book3s/64/pgtable.h |   3 +
 arch/powerpc/include/asm/book3s/64/radix.h   |  12 ++-
 arch/powerpc/include/asm/kasan.h |  22 
 arch/powerpc/kernel/Makefile |  11 ++
 arch/powerpc/kvm/Makefile|   5 +
 arch/powerpc/mm/book3s64/Makefile|   9 ++
 arch/powerpc/mm/kasan/Makefile   |   1 +
 arch/powerpc/mm/kasan/init_book3s_64.c   | 103 +++
 arch/powerpc/mm/ptdump/ptdump.c  |   3 +-
 arch/powerpc/platforms/Kconfig.cputype   |   1 +
 arch/powerpc/platforms/powernv/Makefile  |   8 ++
 arch/powerpc/platforms/pseries/Makefile  |   3 +
 16 files changed, 234 insertions(+), 7 deletions(-)
 create mode 100644 arch/powerpc/mm/kasan/init_book3s_64.c

diff --git a/Documentation/powerpc/kasan.txt b/Documentation/powerpc/kasan.txt
index 26bb0e8bb18c..f032b4eaf205 100644
--- a/Documentation/powerpc/kasan.txt
+++ b/Documentation/powerpc/kasan.txt
@@ -1,4 +1,4 @@
-KASAN is supported on powerpc on 32-bit only.
+KASAN is supported on powerpc on 32-bit and Radix 64-bit only.
 
 32 bit support
 ==
@@ -10,3 +10,49 @@ fixmap area and occupies one eighth of the total kernel 
virtual memory space.
 
 Instrumentation of the vmalloc area is optional, unless built with modules,
 in which case it is required.
+
+64 bit support
+==
+
+Currently, only the radix MMU is supported. There have been versions for hash
+and Book3E processors floating around on the mailing list, but nothing has been
+merged.
+
+KASAN support on Book3S is a bit tricky to get right:
+
+ - It would be good to support inline instrumentation so as to be able to catch
+   stack issues that cannot be caught with outline mode.
+
+ - Inline instrumentation requires a fixed offset.
+
+ - Book3S runs code with translations off ("real mode") during 

[PATCH v2 1/6] kasan: Document support on 32-bit powerpc

2022-05-18 Thread Paul Mackerras
From: Daniel Axtens 

KASAN is supported on 32-bit powerpc and the docs should reflect this.

Suggested-by: Christophe Leroy 
Reviewed-by: Christophe Leroy 
Signed-off-by: Daniel Axtens 
Signed-off-by: Paul Mackerras 
---
 Documentation/powerpc/kasan.txt | 12 
 1 file changed, 12 insertions(+)
 create mode 100644 Documentation/powerpc/kasan.txt

diff --git a/Documentation/powerpc/kasan.txt b/Documentation/powerpc/kasan.txt
new file mode 100644
index ..26bb0e8bb18c
--- /dev/null
+++ b/Documentation/powerpc/kasan.txt
@@ -0,0 +1,12 @@
+KASAN is supported on powerpc on 32-bit only.
+
+32 bit support
+==
+
+KASAN is supported on both hash and nohash MMUs on 32-bit.
+
+The shadow area sits at the top of the kernel virtual memory space above the
+fixmap area and occupies one eighth of the total kernel virtual memory space.
+
+Instrumentation of the vmalloc area is optional, unless built with modules,
+in which case it is required.
-- 
2.35.3



[PATCH v2 2/6] powerpc/mm/kasan: rename kasan_init_32.c to init_32.c

2022-05-18 Thread Paul Mackerras
From: Daniel Axtens 

kasan is already implied by the directory name, we don't need to
repeat it.

Suggested-by: Christophe Leroy 
Signed-off-by: Daniel Axtens 
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/mm/kasan/Makefile   | 2 +-
 arch/powerpc/mm/kasan/{kasan_init_32.c => init_32.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename arch/powerpc/mm/kasan/{kasan_init_32.c => init_32.c} (100%)

diff --git a/arch/powerpc/mm/kasan/Makefile b/arch/powerpc/mm/kasan/Makefile
index bb1a5408b86b..bcbfd6f2eca3 100644
--- a/arch/powerpc/mm/kasan/Makefile
+++ b/arch/powerpc/mm/kasan/Makefile
@@ -2,6 +2,6 @@
 
 KASAN_SANITIZE := n
 
-obj-$(CONFIG_PPC32)   += kasan_init_32.o
+obj-$(CONFIG_PPC32)+= init_32.o
 obj-$(CONFIG_PPC_8xx)  += 8xx.o
 obj-$(CONFIG_PPC_BOOK3S_32)+= book3s_32.o
diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c 
b/arch/powerpc/mm/kasan/init_32.c
similarity index 100%
rename from arch/powerpc/mm/kasan/kasan_init_32.c
rename to arch/powerpc/mm/kasan/init_32.c
-- 
2.35.3



[PATCH v2 6/6] Documentation/kasan: Update details of KASAN on powerpc

2022-05-18 Thread Paul Mackerras
From: Daniel Axtens 

Signed-off-by: Paul Mackerras 
---
 Documentation/dev-tools/kasan.rst | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/Documentation/dev-tools/kasan.rst 
b/Documentation/dev-tools/kasan.rst
index 8089c559d339..448995c11bee 100644
--- a/Documentation/dev-tools/kasan.rst
+++ b/Documentation/dev-tools/kasan.rst
@@ -36,7 +36,9 @@ Both software KASAN modes work with SLUB and SLAB memory 
allocators,
 while the hardware tag-based KASAN currently only supports SLUB.
 
 Currently, generic KASAN is supported for the x86_64, arm, arm64, xtensa, s390,
-and riscv architectures, and tag-based KASAN modes are supported only for 
arm64.
+and riscv architectures. It is also supported on powerpc for 32-bit kernels and
+for 64-bit kernels running under the Radix MMU. Tag-based KASAN modes are
+supported only for arm64.
 
 Usage
 -
@@ -351,6 +353,9 @@ With ``CONFIG_KASAN_VMALLOC``, KASAN can cover vmalloc 
space at the
 cost of greater memory usage. Currently, this is supported on x86,
 riscv, s390, and powerpc.
 
+It is optional, except on 64-bit powerpc kernels, and on 32-bit
+powerpc kernels with module support, where it is required.
+
 This works by hooking into vmalloc and vmap and dynamically
 allocating real shadow memory to back the mappings.
 
-- 
2.35.3



[PATCH v2 0/6] KASAN support for 64-bit Book 3S powerpc

2022-05-18 Thread Paul Mackerras
This patch series implements KASAN on 64-bit POWER with radix MMU,
such as POWER9 or POWER10.  Daniel Axtens posted previous versions of
these patches, but is no longer working on KASAN, and I have been
asked to get them ready for inclusion.

Because of various technical difficulties, mostly around the need to
allow for code that runs in real mode, we only support "outline" mode
(as opposed to "inline" mode), where the compiler adds a call to
a checking procedure before every store to memory.
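
To illustrate what "outline" mode means in practice, here is a small
userspace model of a checked store. It is only a sketch: the arena, the
shadow array and all sketch_* names are hypothetical, and the shadow
encoding used (0 = whole 8-byte granule valid, 1..7 = first N bytes valid,
negative = poisoned) is the one generic KASAN uses, with one shadow byte
per 8 bytes of memory.

```c
#include <stdbool.h>
#include <stddef.h>

#define SHADOW_SCALE_SHIFT 3	/* one shadow byte covers 8 bytes */
#define ARENA_SIZE 64

static unsigned char arena[ARENA_SIZE];
static signed char shadow[ARENA_SIZE >> SHADOW_SCALE_SHIFT];

/* Test hook: set the shadow byte for one 8-byte granule. */
void sketch_set_shadow(size_t granule, signed char value)
{
	shadow[granule] = value;
}

/* Is the single byte at this arena offset accessible? */
static bool sketch_byte_ok(size_t off)
{
	signed char s = shadow[off >> SHADOW_SCALE_SHIFT];

	if (s == 0)
		return true;		/* whole granule valid */
	if (s < 0)
		return false;		/* poisoned granule */
	return (off & 7) < (size_t)s;	/* only the first s bytes valid */
}

/* The out-of-line check the compiler would call before the access. */
bool sketch_check_store(size_t off, size_t size)
{
	if (size == 0 || off >= ARENA_SIZE || size > ARENA_SIZE - off)
		return false;
	for (size_t i = 0; i < size; i++)
		if (!sketch_byte_ok(off + i))
			return false;
	return true;
}

/* An instrumented store: call the checker, then do the real store. */
bool sketch_store_byte(size_t off, unsigned char value)
{
	if (!sketch_check_store(off, 1))
		return false;
	arena[off] = value;
	return true;
}
```

In inline mode the compiler would emit the shadow lookup at each access
site instead of calling a function, which is why it needs a fixed shadow
offset known at compile time.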

This series has known deficiencies, specifically that the kernel will
crash on boot on a HPT system, and that out-of-bounds accesses to
module global data are not caught (which leads to one of the KASAN
tests failing).

v2: Split the large patch 3/3 of the previous series into three
patches and addressed review comments; put the generic documentation
changes in a separate patch at the end of the series; removed the RFC
tag.

Comments welcome.

Paul.

 Documentation/dev-tools/kasan.rst  |   7 +-
 Documentation/powerpc/kasan.txt|  58 
 arch/powerpc/Kconfig   |   5 +-
 arch/powerpc/Kconfig.debug |   3 +-
 arch/powerpc/include/asm/book3s/64/hash.h  |   4 +
 arch/powerpc/include/asm/book3s/64/pgtable.h   |   3 +
 arch/powerpc/include/asm/book3s/64/radix.h |  12 ++-
 arch/powerpc/include/asm/interrupt.h   |  52 ---
 arch/powerpc/include/asm/kasan.h   |  22 +
 arch/powerpc/kernel/Makefile   |  11 +++
 arch/powerpc/kernel/smp.c  |  22 ++---
 arch/powerpc/kernel/traps.c|   6 +-
 arch/powerpc/kexec/Makefile|   2 +
 arch/powerpc/kvm/Makefile  |   5 +
 arch/powerpc/lib/Makefile  |   3 +
 arch/powerpc/mm/book3s64/Makefile  |   9 ++
 arch/powerpc/mm/kasan/Makefile |   3 +-
 .../mm/kasan/{kasan_init_32.c => init_32.c}|   0
 arch/powerpc/mm/kasan/init_book3s_64.c | 103 +
 arch/powerpc/mm/ptdump/ptdump.c|   3 +-
 arch/powerpc/platforms/Kconfig.cputype |   1 +
 arch/powerpc/platforms/powernv/Makefile|   8 ++
 arch/powerpc/platforms/powernv/smp.c   |   2 +-
 arch/powerpc/platforms/pseries/Makefile|   6 ++
 arch/powerpc/sysdev/xics/xics-common.c |   4 +-
 arch/powerpc/sysdev/xive/common.c  |   4 +-
 26 files changed, 320 insertions(+), 38 deletions(-)


[PATCH v2 4/6] powerpc/kasan: Don't instrument non-maskable or raw interrupts

2022-05-18 Thread Paul Mackerras
From: Daniel Axtens 

Disable address sanitization for raw and non-maskable interrupt
handlers, because they can run in real mode, where we cannot access
the shadow memory.  (Note that kasan_arch_is_ready() doesn't test for
real mode, since it is a static branch for speed, and in any case not
all the entry points to the generic KASAN code are protected by
kasan_arch_is_ready guards.)

The changes to interrupt_nmi_enter/exit_prepare() look larger than
they actually are.  The changes are equivalent to adding
!IS_ENABLED(CONFIG_KASAN) to the conditions for calling nmi_enter() or
nmi_exit() in real mode.  That is, the code is equivalent to using the
following condition for calling nmi_enter/exit:

	if (((!IS_ENABLED(CONFIG_PPC_BOOK3S_64) ||
	      !firmware_has_feature(FW_FEATURE_LPAR) ||
	      radix_enabled()) &&
	     !IS_ENABLED(CONFIG_KASAN)) ||
	    (mfmsr() & MSR_DR))

That unwieldy condition has been split into several statements with
comments, for easier reading.
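
A quick way to convince oneself that the split form matches the single
condition is an exhaustive check over the five inputs. The sketch below
models each predicate as a plain bool; the function names are hypothetical
and nothing here touches real kernel state.

```c
#include <stdbool.h>

/* The single-expression form of "should we call nmi_enter()/nmi_exit()?" */
bool nmi_call_single(bool book3s_64, bool lpar, bool radix, bool kasan,
		     bool msr_dr)
{
	return ((!book3s_64 || !lpar || radix) && !kasan) || msr_dr;
}

/* The split form, mirroring the sequence of early returns. */
bool nmi_call_split(bool book3s_64, bool lpar, bool radix, bool kasan,
		    bool msr_dr)
{
	if (msr_dr)
		return true;	/* data relocation on: always safe */
	if (book3s_64 && lpar && !radix)
		return false;	/* pseries hash guest in real mode */
	if (kasan)
		return false;	/* shadow not accessible in real mode */
	return true;
}

/* Count the input combinations (out of 32) where the two forms disagree. */
int nmi_call_mismatches(void)
{
	int mismatches = 0;

	for (unsigned int bits = 0; bits < 32; bits++) {
		bool b64 = bits & 1, lpar = bits & 2, radix = bits & 4;
		bool kasan = bits & 8, dr = bits & 16;

		if (nmi_call_single(b64, lpar, radix, kasan, dr) !=
		    nmi_call_split(b64, lpar, radix, kasan, dr))
			mismatches++;
	}
	return mismatches;
}
```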

The nmi_ipi_lock functions that call atomic functions (i.e.,
nmi_ipi_lock_start(), nmi_ipi_lock() and nmi_ipi_unlock()), besides
being marked noinstr, now call arch_atomic_* functions instead of
atomic_* functions because with KASAN enabled, the atomic_* functions
are wrappers which explicitly do address sanitization on their
arguments.  Since we are trying to avoid address sanitization, we have
to use the lower-level arch_atomic_* versions.
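
The layering described above can be sketched in userspace as follows. All
sketch_* names are hypothetical stand-ins: the "instrumented" wrapper
records a check and then defers to the raw operation, while code that must
not be sanitized calls the raw operation directly. The raw operation uses
the GCC/Clang __atomic_compare_exchange_n() builtin.

```c
#include <stdbool.h>

struct sketch_atomic {
	int counter;
};

static unsigned long sketch_checks;	/* number of sanitizer checks made */

/* Stand-in for the address check an instrumented wrapper performs. */
static void sketch_instrument_rw(const void *addr, unsigned long size)
{
	(void)addr;
	(void)size;
	sketch_checks++;
}

/* Raw, uninstrumented operation: returns the value seen before the op. */
int sketch_arch_atomic_cmpxchg(struct sketch_atomic *v, int old, int new_val)
{
	/* On failure the builtin stores the current value into 'old'. */
	__atomic_compare_exchange_n(&v->counter, &old, new_val, false,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return old;
}

/* Instrumented wrapper: check first, then defer to the raw operation. */
int sketch_atomic_cmpxchg(struct sketch_atomic *v, int old, int new_val)
{
	sketch_instrument_rw(v, sizeof(*v));
	return sketch_arch_atomic_cmpxchg(v, old, new_val);
}

unsigned long sketch_check_count(void)
{
	return sketch_checks;
}
```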

In hv_nmi_check_nonrecoverable(), the regs_set_unrecoverable() call
has been open-coded so as to avoid having to either trust the inlining
or mark regs_set_unrecoverable() as noinstr.

[pau...@ozlabs.org: combined a few work-in-progress commits of
 Daniel's and wrote the commit message.]

Signed-off-by: Daniel Axtens 
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/interrupt.h | 52 +---
 arch/powerpc/kernel/smp.c| 22 ++--
 arch/powerpc/kernel/traps.c  |  6 ++--
 arch/powerpc/lib/Makefile|  3 ++
 arch/powerpc/platforms/powernv/smp.c |  2 +-
 5 files changed, 59 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/interrupt.h 
b/arch/powerpc/include/asm/interrupt.h
index fc28f46d2f9d..fb244b6ca7f0 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -327,22 +327,46 @@ static inline void interrupt_nmi_enter_prepare(struct 
pt_regs *regs, struct inte
}
 #endif
 
+   /* If data relocations are enabled, it's safe to use nmi_enter() */
+   if (mfmsr() & MSR_DR) {
+   nmi_enter();
+   return;
+   }
+
/*
-* Do not use nmi_enter() for pseries hash guest taking a real-mode
+* But do not use nmi_enter() for pseries hash guest taking a real-mode
 * NMI because not everything it touches is within the RMA limit.
 */
-   if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64) ||
-   !firmware_has_feature(FW_FEATURE_LPAR) ||
-   radix_enabled() || (mfmsr() & MSR_DR))
-   nmi_enter();
+   if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) &&
+   firmware_has_feature(FW_FEATURE_LPAR) &&
+   !radix_enabled())
+   return;
+
+   /*
+* Likewise, don't use it if we have some form of instrumentation (like
+* KASAN shadow) that is not safe to access in real mode (even on radix)
+*/
+   if (IS_ENABLED(CONFIG_KASAN))
+   return;
+
+   /* Otherwise, it should be safe to call it */
+   nmi_enter();
 }
 
 static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, struct 
interrupt_nmi_state *state)
 {
-   if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64) ||
-   !firmware_has_feature(FW_FEATURE_LPAR) ||
-   radix_enabled() || (mfmsr() & MSR_DR))
+   if (mfmsr() & MSR_DR) {
+   // nmi_exit if relocations are on
nmi_exit();
+   } else if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) &&
+  firmware_has_feature(FW_FEATURE_LPAR) &&
+  !radix_enabled()) {
+   // no nmi_exit for a pseries hash guest taking a real mode 
exception
+   } else if (IS_ENABLED(CONFIG_KASAN)) {
+   // no nmi_exit for KASAN in real mode
+   } else {
+   nmi_exit();
+   }
 
/*
 * nmi does not call nap_adjust_return because nmi should not create
@@ -410,7 +434,8 @@ static inline void interrupt_nmi_exit_prepare(struct 
pt_regs *regs, struct inter
  * Specific handlers may have additional restrictions.
  */
 #define DEFINE_INTERRUPT_HANDLER_RAW(func) \
-static __always_inline long ____##func(struct pt_regs *regs);  \
+static __always_inline __no_sanitize_address __no_kcsan long   \
+____##func(struct pt_regs *regs);  \

[PATCH v2 5/6] powerpc/kasan: Disable address sanitization in kexec paths

2022-05-18 Thread Paul Mackerras
From: Daniel Axtens 

The kexec code paths involve code that necessarily runs in real mode,
as CPUs are disabled and control is transferred to the new kernel.
Disable address sanitization for the kexec code and the functions
called in real mode on CPUs being disabled.

[pau...@ozlabs.org: combined a few work-in-progress commits of
 Daniel's and wrote the commit message.]

Signed-off-by: Daniel Axtens 
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kexec/Makefile | 2 ++
 arch/powerpc/platforms/pseries/Makefile | 3 +++
 arch/powerpc/sysdev/xics/xics-common.c  | 4 ++--
 arch/powerpc/sysdev/xive/common.c   | 4 ++--
 4 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile
index b6c52608cb49..0c2abe7f9908 100644
--- a/arch/powerpc/kexec/Makefile
+++ b/arch/powerpc/kexec/Makefile
@@ -13,3 +13,5 @@ obj-$(CONFIG_KEXEC_FILE)  += file_load.o ranges.o 
file_load_$(BITS).o elf_$(BITS)
 GCOV_PROFILE_core_$(BITS).o := n
 KCOV_INSTRUMENT_core_$(BITS).o := n
 UBSAN_SANITIZE_core_$(BITS).o := n
+KASAN_SANITIZE_core.o := n
+KASAN_SANITIZE_core_$(BITS).o := n
diff --git a/arch/powerpc/platforms/pseries/Makefile 
b/arch/powerpc/platforms/pseries/Makefile
index b407fdeb6e78..98e878c32a21 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -35,3 +35,6 @@ obj-$(CONFIG_ARCH_HAS_CC_PLATFORM)+= cc_platform.o
 
 # nothing that operates in real mode is safe for KASAN
 KASAN_SANITIZE_ras.o := n
+KASAN_SANITIZE_kexec.o := n
+#machine_kexec
+KASAN_SANITIZE_setup.o := n
diff --git a/arch/powerpc/sysdev/xics/xics-common.c 
b/arch/powerpc/sysdev/xics/xics-common.c
index f3fb2a12124c..322b2b8bd467 100644
--- a/arch/powerpc/sysdev/xics/xics-common.c
+++ b/arch/powerpc/sysdev/xics/xics-common.c
@@ -146,7 +146,7 @@ void __init xics_smp_probe(void)
 
 #endif /* CONFIG_SMP */
 
-void xics_teardown_cpu(void)
+noinstr void xics_teardown_cpu(void)
 {
struct xics_cppr *os_cppr = this_cpu_ptr(&xics_cppr);
 
@@ -159,7 +159,7 @@ void xics_teardown_cpu(void)
icp_ops->teardown_cpu();
 }
 
-void xics_kexec_teardown_cpu(int secondary)
+noinstr void xics_kexec_teardown_cpu(int secondary)
 {
xics_teardown_cpu();
 
diff --git a/arch/powerpc/sysdev/xive/common.c 
b/arch/powerpc/sysdev/xive/common.c
index 1ca5564bda9d..87b825b7401d 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -1241,7 +1241,7 @@ static int xive_setup_cpu_ipi(unsigned int cpu)
return 0;
 }
 
-static void xive_cleanup_cpu_ipi(unsigned int cpu, struct xive_cpu *xc)
+noinstr static void xive_cleanup_cpu_ipi(unsigned int cpu, struct xive_cpu *xc)
 {
unsigned int xive_ipi_irq = xive_ipi_cpu_to_irq(cpu);
 
@@ -1634,7 +1634,7 @@ void xive_flush_interrupt(void)
 
 #endif /* CONFIG_SMP */
 
-void xive_teardown_cpu(void)
+noinstr void xive_teardown_cpu(void)
 {
struct xive_cpu *xc = __this_cpu_read(xive_cpu);
unsigned int cpu = smp_processor_id();
-- 
2.35.3



Re: [PATCH] powerpc/vdso: Fix incorrect CFI in gettimeofday.S

2022-05-18 Thread Naveen N. Rao

Michael Ellerman wrote:

"Naveen N. Rao"  writes:

Michael Ellerman wrote:


diff --git a/arch/powerpc/kernel/vdso/gettimeofday.S 
b/arch/powerpc/kernel/vdso/gettimeofday.S
index eb9c81e1c218..0aee255e9cbb 100644
--- a/arch/powerpc/kernel/vdso/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso/gettimeofday.S
@@ -22,12 +22,15 @@
 .macro cvdso_call funct call_time=0
   .cfi_startproc
PPC_STLUr1, -PPC_MIN_STKFRM(r1)
+  .cfi_adjust_cfa_offset PPC_MIN_STKFRM
mflrr0
-  .cfi_register lr, r0
PPC_STLUr1, -PPC_MIN_STKFRM(r1)
+  .cfi_adjust_cfa_offset PPC_MIN_STKFRM
PPC_STL r0, PPC_MIN_STKFRM + PPC_LR_STKOFF(r1)





@@ -46,6 +50,7 @@
mtlrr0
   .cfi_restore lr
addir1, r1, 2 * PPC_MIN_STKFRM
+  .cfi_def_cfa_offset 0


Should this be .cfi_adjust_cfa_offset, given that we used that at the
start of the function?
 
AIUI "adjust x" is offset += x, whereas "def x" is offset = x.


So we could use adjust here, but we'd need to adjust by -(2 * PPC_MIN_STKFRM).

It seemed clearer to just set the offset back to 0, which is what it is
at the start of the function.
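
To make the "adjust" versus "def" distinction concrete, here is a small user-space C model of the CFA offset bookkeeping. It is only a sketch: struct cfa_state, cfi_adjust_cfa_offset(), cfi_def_cfa_offset() and replay_cvdso_call() are hypothetical helpers that mimic the assembler directives, and the frame size of 112 is an assumed stand-in for PPC_MIN_STKFRM, not a value taken from the patch.

```c
/* Toy model of CFA offset tracking; not real assembler or unwinder code. */
struct cfa_state {
	long offset;	/* current CFA offset */
};

#define DEMO_MIN_STKFRM 112	/* assumed stand-in for PPC_MIN_STKFRM */

/* Models ".cfi_adjust_cfa_offset x": offset += x */
static void cfi_adjust_cfa_offset(struct cfa_state *s, long x)
{
	s->offset += x;
}

/* Models ".cfi_def_cfa_offset x": offset = x */
static void cfi_def_cfa_offset(struct cfa_state *s, long x)
{
	s->offset = x;
}

/*
 * Replay the directives of the macro for a given offset on entry and
 * return the offset after the stack frames have been popped.
 */
static long replay_cvdso_call(long initial, int use_def_at_exit)
{
	struct cfa_state s = { .offset = initial };

	cfi_adjust_cfa_offset(&s, DEMO_MIN_STKFRM);	/* first PPC_STLU */
	cfi_adjust_cfa_offset(&s, DEMO_MIN_STKFRM);	/* second PPC_STLU */

	if (use_def_at_exit)
		cfi_def_cfa_offset(&s, 0);
	else
		cfi_adjust_cfa_offset(&s, -(2 * DEMO_MIN_STKFRM));

	return s.offset;
}
```

With an offset of 0 on entry both variants end at 0. They only differ if the offset on entry were non-zero, in which case "adjust" restores the entry value and "def 0" does not.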


I read the first .cfi_adjust_cfa_offset directive (rather than the 
.cfi_def_cfa_offset directive) in this macro to be intentionally 
retaining the offset to what it was before the VDSO. If that is 
desirable, then setting it to 0 here will change it, I _think_.




But I'm not a CFI expert at all, so I'll defer to anyone else who has an
opinion :)


Oh, the above is just my hypothesis. Would be good to get confirmation.


- Naveen


Re: [PATCH v3 19/25] powerpc/ftrace: Minimise number of #ifdefs

2022-05-18 Thread Naveen N. Rao

Christophe Leroy wrote:

A lot of #ifdefs can be replaced by IS_ENABLED()

Do so.

This requires kernel_toc_addr() to be defined at all times,
as well as PPC_INST_LD_TOC and PPC_INST_STD_LR.
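
As an illustration of why everything has to be defined at all times, here is a user-space sketch of the preprocessor trick behind IS_ENABLED(). It is an approximation: the helper macro names, CONFIG_DEMO_PPC64, CONFIG_DEMO_PPC32 and demo_dyn_arch_init() are made up for the example, the kernel's real macro also accounts for modular options, and a dummy third argument is passed so the variadic macro always receives at least one extra argument.

```c
/* Expands to "0," only when the pasted suffix is the token 1. */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define demo_take_second_arg(ignored, val, ...) val
#define demo_is_defined(x)		demo_is_defined_(x)
#define demo_is_defined_(val)		demo_is_defined__(DEMO_ARG_PLACEHOLDER_##val)
#define demo_is_defined__(arg1_or_junk)	demo_take_second_arg(arg1_or_junk 1, 0, 0)
#define IS_ENABLED(option)		demo_is_defined(option)

#define CONFIG_DEMO_PPC64 1	/* hypothetical option, enabled */
/* CONFIG_DEMO_PPC32 is deliberately left undefined. */

/*
 * Unlike an #ifdef block, both branches below are parsed and type-checked,
 * so every symbol they mention must exist in every configuration. The
 * compiler can then drop the dead branch.
 */
static int demo_dyn_arch_init(void)
{
	if (IS_ENABLED(CONFIG_DEMO_PPC32))
		return 0;	/* nothing to set up in this configuration */

	return 1;		/* pretend the 64-bit setup ran */
}
```

An undefined option evaluates to 0 rather than causing a build error, which is what lets the condition sit in ordinary C code.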

Signed-off-by: Christophe Leroy 
---
v2: Moved the setup of pop outside of the big if()/else() in __ftrace_make_nop()
---
 arch/powerpc/include/asm/code-patching.h |   2 -
 arch/powerpc/include/asm/module.h|   2 -
 arch/powerpc/include/asm/sections.h  |  24 +--
 arch/powerpc/kernel/trace/ftrace.c   | 182 +++
 4 files changed, 103 insertions(+), 107 deletions(-)






@@ -710,6 +707,9 @@ void arch_ftrace_update_code(int command)

 #ifdef CONFIG_PPC64
 #define PACATOC offsetof(struct paca_struct, kernel_toc)
+#else
+#define PACATOC 0
+#endif


This conflicts with my fix for the ftrace init tramp:
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220516071422.463738-1-naveen.n@linux.vnet.ibm.com/

It probably makes sense to retain #ifdef CONFIG_PPC64, so that we can 
get rid of the PACATOC. Here is an incremental diff:


diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index da1a2f8ebb72f3..28169a1ccc7377 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -701,11 +701,6 @@ void arch_ftrace_update_code(int command)
}

#ifdef CONFIG_PPC64
-#define PACATOC offsetof(struct paca_struct, kernel_toc)
-#else
-#define PACATOC 0
-#endif
-
extern unsigned int ftrace_tramp_text[], ftrace_tramp_init[];

void ftrace_free_init_tramp(void)
@@ -724,7 +719,7 @@ int __init ftrace_dyn_arch_init(void)
int i;
unsigned int *tramp[] = { ftrace_tramp_text, ftrace_tramp_init };
u32 stub_insns[] = {
-   PPC_RAW_LD(_R12, _R13, PACATOC),
+   PPC_RAW_LD(_R12, _R13, offsetof(struct paca_struct, kernel_toc)),
PPC_RAW_ADDIS(_R12, _R12, 0),
PPC_RAW_ADDI(_R12, _R12, 0),
PPC_RAW_MTCTR(_R12),
@@ -733,9 +728,6 @@ int __init ftrace_dyn_arch_init(void)
unsigned long addr;
long reladdr;

-   if (IS_ENABLED(CONFIG_PPC32))
-   return 0;
-
addr = ppc_global_function_entry((void *)FTRACE_REGS_ADDR);
reladdr = addr - kernel_toc_addr();

@@ -754,6 +746,7 @@ int __init ftrace_dyn_arch_init(void)

return 0;
}
+#endif

#ifdef CONFIG_FUNCTION_GRAPH_TRACER


- Naveen


Re: [PATCH 1/2] powerpc: Add generic PAGE_SIZE config symbols

2022-05-18 Thread Christophe Leroy


Le 05/05/2022 à 14:51, Michael Ellerman a écrit :
> Other arches (sh, mips, hexagon) use standard names for PAGE_SIZE
> related config symbols.
> 
> Add matching symbols for powerpc, which are enabled by default but
> depend on our architecture specific PAGE_SIZE symbols.
> 
> This allows generic/driver code to express dependencies on the PAGE_SIZE
> without needing to refer to architecture specific config symbols.

I guess the next step should be to get rid of the powerpc-specific symbols
and use the generic symbols instead.

We have (only) 111 occurrences of them.


> 
> Signed-off-by: Michael Ellerman 
> ---
>   arch/powerpc/Kconfig | 16 
>   1 file changed, 16 insertions(+)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 145af02df3dc..02994361cc7a 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -759,6 +759,22 @@ config PPC_256K_PAGES
>   
>   endchoice
>   
> +config PAGE_SIZE_4KB
> + def_bool y
> + depends on PPC_4K_PAGES
> +
> +config PAGE_SIZE_16KB
> + def_bool y
> + depends on PPC_16K_PAGES
> +
> +config PAGE_SIZE_64KB
> + def_bool y
> + depends on PPC_64K_PAGES
> +
> +config PAGE_SIZE_256KB
> + def_bool y
> + depends on PPC_256K_PAGES
> +
>   config PPC_PAGE_SHIFT
>   int
>   default 18 if PPC_256K_PAGES

[PATCH] tools/perf/test: Fix perf all PMU test to skip hv_24x7/hv_gpci tests on powerpc

2022-05-18 Thread Athira Rajeev
"perf all PMU test" picks the input events from
"perf list --raw-dump pmu" list and runs "perf stat -e"
for each event in the list. On powerpc, the
PowerVM environment supports events from the hv_24x7 and hv_gpci
PMUs, which use formats like the following:
- hv_24x7/CPM_ADJUNCT_INST,domain=?,core=?/
- hv_gpci/event,partition_id=?/

The value for "?" needs to be filled in depending on the
system and the respective event. CPM_ADJUNCT_INST needs
core and domain values. The hv_gpci event needs a partition_id.
Similarly, other hv_24x7 and hv_gpci events have "?" in
their event format. Hence skip these events on the powerpc
platform, since values like partition_id and domain are specific
to the system and the event.
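
The skip condition can be sketched in C as below. This only mirrors the decision the shell test makes. The helper names needs_system_specific_params() and should_skip_event() are hypothetical, and the check is a plain substring match on the two PMU names, which is an assumption about how strict the match needs to be.

```c
#include <stdbool.h>
#include <string.h>

/*
 * True when the event belongs to a PMU whose events carry "?" placeholders
 * that must be filled in with system-specific values.
 */
static bool needs_system_specific_params(const char *event)
{
	return strstr(event, "hv_24x7") != NULL ||
	       strstr(event, "hv_gpci") != NULL;
}

/* Only skip on powerpc; other platforms test every listed event. */
static bool should_skip_event(const char *event, bool on_powerpc)
{
	return on_powerpc && needs_system_specific_params(event);
}
```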

Fixes: 3d5ac9effcc6 ("perf test: Workload test of all PMUs")
Signed-off-by: Athira Rajeev 
---
 tools/perf/tests/shell/stat_all_pmu.sh | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/tools/perf/tests/shell/stat_all_pmu.sh 
b/tools/perf/tests/shell/stat_all_pmu.sh
index b30dba455f36..4a854b545bec 100755
--- a/tools/perf/tests/shell/stat_all_pmu.sh
+++ b/tools/perf/tests/shell/stat_all_pmu.sh
@@ -5,6 +5,16 @@
 set -e
 
 for p in $(perf list --raw-dump pmu); do
+  # In powerpc, skip the events for hv_24x7 and hv_gpci.
+  # These events need input values to be filled in for
+  # core, chip, partition id based on system.
+  # Example: hv_24x7/CPM_ADJUNCT_INST,domain=?,core=?/
+  # hv_gpci/event,partition_id=?/
+  # Hence skip these events for ppc.
+  if lscpu  |grep ppc && echo "$p" |grep -Eq 'hv_24x7|hv_gpci' ; then
+echo "Skipping: Event '$p' in powerpc"
+continue
+  fi
   echo "Testing $p"
   result=$(perf stat -e "$p" true 2>&1)
   if ! echo "$result" | grep -q "$p" && ! echo "$result" | grep -q "<not supported>" ; then
-- 
2.35.1



Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]

2022-05-18 Thread Naveen N. Rao

Baoquan He wrote:

On 05/18/22 at 12:26pm, Michael Ellerman wrote:


It seems that recordmcount is not really maintained anymore now that x86
uses objtool?

There've been several threads about fixing recordmcount, but none of
them seem to have led to a solution.

These weak symbol vs recordmcount problems have been worked around going
back as far as 2020:


It gives me the feeling that llvm or recordmcount should be adjusted,
rather than innocent kernel code, if this is reported in a lot of places.
I am curious how the llvm or recordmcount developers respond to this.


As Michael stated, this is not just llvm - binutils has also adopted the 
same and "unused" section symbols are being dropped.


For recordmcount, there were a few threads and approaches that have been 
tried:

- https://patchwork.ozlabs.org/project/linuxppc-dev/patch/cd0f6bdfdf1ee096fb2c07e7b38940921b8e9118.1637764848.git.christophe.le...@csgroup.eu/
- https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=297434=*

Objtool has picked up a more appropriate fix for this recently, and 
long-term, we would like to move to using objtool for ftrace purposes:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/tools/objtool/elf.c?id=4abff6d48dbcea8200c7ea35ba70c242d128ebf3

While that is being pursued, we want to unbreak some of the CI and users 
who are hitting this.
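
For reference, the kernel-side workaround discussed in this thread, replacing a __weak default with a header-provided one in the "#define foo foo" style, looks roughly like the sketch below. The name demo_arch_apply_relocations_add and its single int parameter are hypothetical, chosen so the example stands alone. The real hooks take more arguments.

```c
#include <errno.h>

/*
 * An architecture with its own implementation would put something like
 * this in its header (hypothetical names):
 *
 *   int demo_arch_apply_relocations_add(int r_type);
 *   #define demo_arch_apply_relocations_add demo_arch_apply_relocations_add
 */

#ifndef demo_arch_apply_relocations_add
/* Generic fallback: a static inline, so no weak symbol is emitted at all. */
static inline int demo_arch_apply_relocations_add(int r_type)
{
	(void)r_type;
	return -ENOEXEC;	/* relocation type not supported */
}
#endif
```

Because the fallback is an inline function rather than a weak definition, there is no section left that contains only weak symbols.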



- Naveen



[PATCH V2 2/2] docs: ABI: sysfs-bus-event_source-devices: Document sysfs caps entry for PMU

2022-05-18 Thread Athira Rajeev
Add ABI documentation for "caps" attribute group.
Some of the platform specific PMU features can be exposed
in "caps" attribute group/directory:
/sys/bus/event_source/devices//

Signed-off-by: Athira Rajeev 
---
 .../sysfs-bus-event_source-devices-caps| 18 ++
 1 file changed, 18 insertions(+)
 create mode 100644 
Documentation/ABI/testing/sysfs-bus-event_source-devices-caps

diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-caps 
b/Documentation/ABI/testing/sysfs-bus-event_source-devices-caps
new file mode 100644
index ..ef5f537bdd83
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-caps
@@ -0,0 +1,18 @@
+What:  /sys/bus/event_source/devices//caps
+Date:  May 2022
+KernelVersion: 5.19
+Contact:   Linux kernel mailing list 
+Description:
+   Attribute group to describe the capabilities exposed
+   for a particular pmu. Each attribute of this group can
+   expose information specific to a PMU, say pmu_name, so that
+   userspace can understand some of the features which the
+   platform specific PMU supports.
+
+   One example of an available capability on a supported platform
+   like Intel is pmu_name, which exposes the underlying CPU name known
+   to the PMU driver.
+
+   Example output in powerpc:
+   grep -H . /sys/bus/event_source/devices/cpu/caps/*
+   /sys/bus/event_source/devices/cpu/caps/pmu_name:POWER9
-- 
2.31.1



[PATCH V2 1/2] powerpc/perf: Add support for caps under sysfs in powerpc

2022-05-18 Thread Athira Rajeev
Add caps support under "/sys/bus/event_source/devices//"
for powerpc. This directory can be used to expose some of the
specific features that powerpc PMU supports to the user.
Example: pmu_name. The name of PMU registered will depend on
platform, say power9 or power10 or it could be Generic Compat
PMU.

Currently the only way to know which PMU is registered
is from the dmesg logs. But clearing dmesg makes it
difficult to know the exact PMU backend in use. Even extracting
it from dmesg is complicated, as we need to parse the dmesg
logs and add filters for the pmu name. Exposing it via
caps makes this easy, as we just need to read it directly from
sysfs.

Add a caps directory to /sys/bus/event_source/devices/cpu/
for power8, power9, power10 and generic compat PMU in respective
PMU driver code. Update the pmu_name file under caps folder
in core-book3s using "attr_update".

The information exposed currently:
 - pmu_name : Underlying PMU name from the driver

Example result with power9 pmu:

 # ls /sys/bus/event_source/devices/cpu/caps
pmu_name

 # cat /sys/bus/event_source/devices/cpu/caps/pmu_name
POWER9
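
From a program, the same information could be read with a few lines of C. This is a minimal sketch, assuming the attribute is a single line of text as in the example above. read_sysfs_line() is a hypothetical helper, not an existing API.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read a one-line attribute file into buf and drop the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or read.
 */
static int read_sysfs_line(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;

	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

A caller would pass "/sys/bus/event_source/devices/cpu/caps/pmu_name" as the path and treat a -1 return as "caps not available on this kernel".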

Signed-off-by: Athira Rajeev 
---
Changelog:
 v1 -> v2:
 Move the show function as generic in core-book3s
 and update show function using sysfs_emit and ppmu->name
 Added Documentation for this ABI in patch 2.
 Notes: The caps directory is implemented in PMU for other
 architectures already. Reference commit for x86:
 commit b00233b53065 ("perf/x86: Export some PMU attributes in caps/ directory")

 arch/powerpc/perf/core-book3s.c| 31 ++
 arch/powerpc/perf/generic-compat-pmu.c | 10 +
 arch/powerpc/perf/power10-pmu.c| 10 +
 arch/powerpc/perf/power8-pmu.c | 10 +
 arch/powerpc/perf/power9-pmu.c | 10 +
 5 files changed, 71 insertions(+)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index b5b42cf0a703..a208f502a80b 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2488,6 +2488,33 @@ static int power_pmu_prepare_cpu(unsigned int cpu)
return 0;
 }
 
+static ssize_t pmu_name_show(struct device *cdev,
+   struct device_attribute *attr,
+   char *buf)
+{
+   if (ppmu)
+   return sysfs_emit(buf, "%s\n", ppmu->name);
+
+   return 0;
+}
+
+static DEVICE_ATTR_RO(pmu_name);
+
+static struct attribute *pmu_caps_attrs[] = {
+   &dev_attr_pmu_name.attr,
+   NULL
+};
+
+static const struct attribute_group pmu_caps_group = {
+   .name  = "caps",
+   .attrs = pmu_caps_attrs,
+};
+
+static const struct attribute_group *pmu_caps_groups[] = {
+   &pmu_caps_group,
+   NULL,
+};
+
 int __init register_power_pmu(struct power_pmu *pmu)
 {
if (ppmu)
@@ -2498,6 +2525,10 @@ int __init register_power_pmu(struct power_pmu *pmu)
pmu->name);
 
power_pmu.attr_groups = ppmu->attr_groups;
+
+   if (ppmu->flags & PPMU_ARCH_207S)
+   power_pmu.attr_update = pmu_caps_groups;
+
power_pmu.capabilities |= (ppmu->capabilities & 
PERF_PMU_CAP_EXTENDED_REGS);
 
 #ifdef MSR_HV
diff --git a/arch/powerpc/perf/generic-compat-pmu.c 
b/arch/powerpc/perf/generic-compat-pmu.c
index f3db88aee4dd..817c69863038 100644
--- a/arch/powerpc/perf/generic-compat-pmu.c
+++ b/arch/powerpc/perf/generic-compat-pmu.c
@@ -151,9 +151,19 @@ static const struct attribute_group 
generic_compat_pmu_format_group = {
.attrs = generic_compat_pmu_format_attr,
 };
 
+static struct attribute *generic_compat_pmu_caps_attrs[] = {
+   NULL
+};
+
+static struct attribute_group generic_compat_pmu_caps_group = {
+   .name  = "caps",
+   .attrs = generic_compat_pmu_caps_attrs,
+};
+
 static const struct attribute_group *generic_compat_pmu_attr_groups[] = {
&generic_compat_pmu_format_group,
&generic_compat_pmu_events_group,
+   &generic_compat_pmu_caps_group,
NULL,
 };
 
diff --git a/arch/powerpc/perf/power10-pmu.c b/arch/powerpc/perf/power10-pmu.c
index c6d51e7093cf..d1adcd9f52e2 100644
--- a/arch/powerpc/perf/power10-pmu.c
+++ b/arch/powerpc/perf/power10-pmu.c
@@ -258,6 +258,15 @@ static const struct attribute_group 
power10_pmu_format_group = {
.attrs = power10_pmu_format_attr,
 };
 
+static struct attribute *power10_pmu_caps_attrs[] = {
+   NULL
+};
+
+static struct attribute_group power10_pmu_caps_group = {
+   .name  = "caps",
+   .attrs = power10_pmu_caps_attrs,
+};
+
 static const struct attribute_group *power10_pmu_attr_groups_dd1[] = {
&power10_pmu_format_group,
&power10_pmu_events_group_dd1,
@@ -267,6 +276,7 @@ static const struct attribute_group 
*power10_pmu_attr_groups_dd1[] = {
 static const struct attribute_group *power10_pmu_attr_groups[] = {
&power10_pmu_format_group,
&power10_pmu_events_group,
+   &power10_pmu_caps_group,
NULL,
 };
 
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 

[PATCH] powerpc/irq: remove inline assembly in hard_irq_disable macro

2022-05-18 Thread Christophe Leroy
Use WRITE_ONCE() instead of open coding the saving of the current
stack pointer.

Signed-off-by: Christophe Leroy 
---
By the way, is WRITE_ONCE() needed at all ? Could we instead do 
local_paca->saved_r1 = current_stack_pointer;
---
 arch/powerpc/include/asm/hw_irq.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index 6efab00aa1c8..26ede09c521d 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -282,9 +282,7 @@ static inline bool pmi_irq_pending(void)
flags = irq_soft_mask_set_return(IRQS_ALL_DISABLED);\
local_paca->irq_happened |= PACA_IRQ_HARD_DIS;  \
if (!arch_irqs_disabled_flags(flags)) { \
-   asm ("stdx %%r1, 0, %1 ;"   \
-: "=m" (local_paca->saved_r1)  \
-: "b" (&local_paca->saved_r1));\
+   WRITE_ONCE(local_paca->saved_r1, current_stack_pointer);\
trace_hardirqs_off();   \
}   \
 } while(0)
-- 
2.35.3



[PATCH 2/2] powerpc/irq: Replace #ifdefs by IS_ENABLED()

2022-05-18 Thread Christophe Leroy
Replace
  #ifdef CONFIG_PPC_IRQ_SOFT_MASK_DEBUG and
  #ifdef CONFIG_PERF_EVENTS
by IS_ENABLED() in hw_irq.h and plpar_wrappers.h

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/hw_irq.h | 30 +++
 arch/powerpc/include/asm/plpar_wrappers.h |  5 ++--
 2 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index edc569481faf..6efab00aa1c8 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -123,7 +123,6 @@ static inline notrace unsigned long 
irq_soft_mask_return(void)
  */
 static inline notrace void irq_soft_mask_set(unsigned long mask)
 {
-#ifdef CONFIG_PPC_IRQ_SOFT_MASK_DEBUG
/*
 * The irq mask must always include the STD bit if any are set.
 *
@@ -138,8 +137,8 @@ static inline notrace void irq_soft_mask_set(unsigned long 
mask)
 * unmasks to be replayed, among other things. For now, take
 * the simple approach.
 */
-   WARN_ON(mask && !(mask & IRQS_DISABLED));
-#endif
+   if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
+   WARN_ON(mask && !(mask & IRQS_DISABLED));
 
WRITE_ONCE(local_paca->irq_soft_mask, mask);
barrier();
@@ -324,11 +323,13 @@ bool power_pmu_wants_prompt_pmi(void);
  */
 static inline bool should_hard_irq_enable(void)
 {
-#ifdef CONFIG_PPC_IRQ_SOFT_MASK_DEBUG
-   WARN_ON(irq_soft_mask_return() == IRQS_ENABLED);
-   WARN_ON(mfmsr() & MSR_EE);
-#endif
-#ifdef CONFIG_PERF_EVENTS
+   if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG)) {
+   WARN_ON(irq_soft_mask_return() == IRQS_ENABLED);
+   WARN_ON(mfmsr() & MSR_EE);
+   }
+
+   if (!IS_ENABLED(CONFIG_PERF_EVENTS))
+   return false;
/*
 * If the PMU is not running, there is not much reason to enable
 * MSR[EE] in irq handlers because any interrupts would just be
@@ -343,9 +344,6 @@ static inline bool should_hard_irq_enable(void)
return false;
 
return true;
-#else
-   return false;
-#endif
 }
 
 /*
@@ -353,11 +351,11 @@ static inline bool should_hard_irq_enable(void)
  */
 static inline void do_hard_irq_enable(void)
 {
-#ifdef CONFIG_PPC_IRQ_SOFT_MASK_DEBUG
-   WARN_ON(irq_soft_mask_return() == IRQS_ENABLED);
-   WARN_ON(get_paca()->irq_happened & PACA_IRQ_MUST_HARD_MASK);
-   WARN_ON(mfmsr() & MSR_EE);
-#endif
+   if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG)) {
+   WARN_ON(irq_soft_mask_return() == IRQS_ENABLED);
+   WARN_ON(get_paca()->irq_happened & PACA_IRQ_MUST_HARD_MASK);
+   WARN_ON(mfmsr() & MSR_EE);
+   }
/*
 * This allows PMI interrupts (and watchdog soft-NMIs) through.
 * There is no other reason to enable this way.
diff --git a/arch/powerpc/include/asm/plpar_wrappers.h 
b/arch/powerpc/include/asm/plpar_wrappers.h
index 83e0f701ebc6..8239c0af5eb2 100644
--- a/arch/powerpc/include/asm/plpar_wrappers.h
+++ b/arch/powerpc/include/asm/plpar_wrappers.h
@@ -43,11 +43,10 @@ static inline long extended_cede_processor(unsigned long 
latency_hint)
set_cede_latency_hint(latency_hint);
 
rc = cede_processor();
-#ifdef CONFIG_PPC_IRQ_SOFT_MASK_DEBUG
+
/* Ensure that H_CEDE returns with IRQs on */
-   if (WARN_ON(!(mfmsr() & MSR_EE)))
+   if (WARN_ON(IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG) && !(mfmsr() & 
MSR_EE)))
__hard_irq_enable();
-#endif
 
set_cede_latency_hint(old_latency_hint);
 
-- 
2.35.3



[PATCH 1/2] powerpc/irq: Don't open code irq_soft_mask helpers

2022-05-18 Thread Christophe Leroy
Use READ_ONCE() and WRITE_ONCE() instead of open coding
read and write of local PACA irq_soft_mask.

For the write, add a barrier to keep the memory clobber
that was there previously.
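
The resulting helpers can be modelled in user space as follows. This is a sketch under stated assumptions: the READ_ONCE(), WRITE_ONCE() and barrier() definitions are simplified GCC/Clang approximations rather than the kernel's, and struct demo_paca with its single byte-sized field stands in for the real PACA.

```c
/* Simplified user-space approximations (GCC/Clang only). */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))
#define barrier()		__asm__ __volatile__("" : : : "memory")

struct demo_paca {
	unsigned char irq_soft_mask;	/* byte-sized, matching lbz/stb */
};

static struct demo_paca demo_local_paca;

static unsigned long irq_soft_mask_return(void)
{
	return READ_ONCE(demo_local_paca.irq_soft_mask);
}

static void irq_soft_mask_set(unsigned long mask)
{
	WRITE_ONCE(demo_local_paca.irq_soft_mask, (unsigned char)mask);
	barrier();	/* compiler barrier replacing the old "memory" clobber */
}

/* Set a new mask and hand back the previous one. */
static unsigned long irq_soft_mask_set_return(unsigned long mask)
{
	unsigned long flags = irq_soft_mask_return();

	irq_soft_mask_set(mask);

	return flags;
}

/* OR bits into the mask and hand back the previous value. */
static unsigned long irq_soft_mask_or_return(unsigned long mask)
{
	unsigned long flags = irq_soft_mask_return();

	irq_soft_mask_set(flags | mask);

	return flags;
}
```

The two compound helpers are plain compositions of the read and the write, which is what lets the patch drop their separate inline assembly.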

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/hw_irq.h | 43 +--
 1 file changed, 7 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index 674e5aaafcbd..edc569481faf 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -113,14 +113,7 @@ static inline void __hard_RI_enable(void)
 
 static inline notrace unsigned long irq_soft_mask_return(void)
 {
-   unsigned long flags;
-
-   asm volatile(
-   "lbz %0,%1(13)"
-   : "=r" (flags)
-   : "i" (offsetof(struct paca_struct, irq_soft_mask)));
-
-   return flags;
+   return READ_ONCE(local_paca->irq_soft_mask);
 }
 
 /*
@@ -148,46 +141,24 @@ static inline notrace void irq_soft_mask_set(unsigned 
long mask)
WARN_ON(mask && !(mask & IRQS_DISABLED));
 #endif
 
-   asm volatile(
-   "stb %0,%1(13)"
-   :
-   : "r" (mask),
- "i" (offsetof(struct paca_struct, irq_soft_mask))
-   : "memory");
+   WRITE_ONCE(local_paca->irq_soft_mask, mask);
+   barrier();
 }
 
 static inline notrace unsigned long irq_soft_mask_set_return(unsigned long 
mask)
 {
-   unsigned long flags;
-
-#ifdef CONFIG_PPC_IRQ_SOFT_MASK_DEBUG
-   WARN_ON(mask && !(mask & IRQS_DISABLED));
-#endif
+   unsigned long flags = irq_soft_mask_return();
 
-   asm volatile(
-   "lbz %0,%1(13); stb %2,%1(13)"
-   : "=&r" (flags)
-   : "i" (offsetof(struct paca_struct, irq_soft_mask)),
- "r" (mask)
-   : "memory");
+   irq_soft_mask_set(mask);
 
return flags;
 }
 
 static inline notrace unsigned long irq_soft_mask_or_return(unsigned long mask)
 {
-   unsigned long flags, tmp;
-
-   asm volatile(
-   "lbz %0,%2(13); or %1,%0,%3; stb %1,%2(13)"
-   : "=&r" (flags), "=r" (tmp)
-   : "i" (offsetof(struct paca_struct, irq_soft_mask)),
- "r" (mask)
-   : "memory");
+   unsigned long flags = irq_soft_mask_return();
 
-#ifdef CONFIG_PPC_IRQ_SOFT_MASK_DEBUG
-   WARN_ON((mask | flags) && !((mask | flags) & IRQS_DISABLED));
-#endif
+   irq_soft_mask_set(flags | mask);
 
return flags;
 }
-- 
2.35.3



Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]

2022-05-18 Thread Baoquan He
On 05/18/22 at 12:26pm, Michael Ellerman wrote:
> "Eric W. Biederman"  writes:
> > Looking at this the pr_err is absolutely needed.  If an unsupported case
> > winds up in the purgatory blob and the code can't handle it things
> > will fail silently much worse later.
> 
> It won't fail later, it will fail the syscall.
> 
> sys_kexec_file_load()
>   kimage_file_alloc_init()
> kimage_file_prepare_segments()
>   arch_kexec_kernel_image_load()
> kexec_image_load_default()
>   image->fops->load()
> elf64_load()# powerpc
> bzImage64_load()# x86
>   kexec_load_purgatory()
> kexec_apply_relocations()
> 
> Which does:
> 
>   if (relsec->sh_type == SHT_RELA)
>   ret = arch_kexec_apply_relocations_add(pi, section,
>  relsec, symtab);
>   else if (relsec->sh_type == SHT_REL)
>   ret = arch_kexec_apply_relocations(pi, section,
>  relsec, symtab);
>   if (ret)
>   return ret;
> 
> And that error is bubbled all the way back up. So as long as
> arch_kexec_apply_relocations() returns an error the syscall will fail
> back to userspace and there'll be an error message at that level.
> 
> It's true that having nothing printed in dmesg makes it harder to work
> out why the syscall failed. But it's a kernel bug if there are unhandled
> relocations in the kernel-supplied purgatory code, so a user really has
> no way to do anything about the error even if it is printed.
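
The error propagation described above can be sketched in user-space C as below. All demo_* names are hypothetical stand-ins for the call chain, the hooks take no arguments to keep the example short, and it is an arbitrary choice for the illustration that RELA succeeds while REL is unsupported.

```c
#include <errno.h>

/* Section types; the numeric values are the ones the ELF format defines. */
enum demo_sh_type { DEMO_SHT_RELA = 4, DEMO_SHT_REL = 9 };

/* Stand-ins for the arch hooks. */
static int demo_apply_relocations_add(void) { return 0; }
static int demo_apply_relocations(void) { return -ENOEXEC; }

/* Mirrors the quoted dispatch: any non-zero result goes to the caller. */
static int demo_apply_one_section(enum demo_sh_type type)
{
	int ret = 0;

	if (type == DEMO_SHT_RELA)
		ret = demo_apply_relocations_add();
	else if (type == DEMO_SHT_REL)
		ret = demo_apply_relocations();
	if (ret)
		return ret;

	return 0;
}

/* Each layer forwards the error unchanged up to the syscall boundary. */
static int demo_load_purgatory(enum demo_sh_type type)
{
	return demo_apply_one_section(type);
}

static int demo_kexec_file_load(enum demo_sh_type type)
{
	return demo_load_purgatory(type);
}
```

Nothing is printed on the way up. The caller only sees the negative errno value.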
> 
> > "Naveen N. Rao"  writes:
> >
> >> Baoquan He wrote:
> >>> On 04/25/22 at 11:11pm, Naveen N. Rao wrote:
>  kexec_load_purgatory() can fail for many reasons - there is no need to
>  print an error when encountering unsupported relocations.
>  This solves a build issue on powerpc with binutils v2.36 and newer [1].
>  Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
>  symbols") [2], binutils started dropping section symbols that it thought
> >>> I am not familiar with binutils, while wondering if this exists in other
> >>> ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we
> >>> have problem with it?
> >>
> >> I'm not aware of this specific file causing a problem on other 
> >> architectures -
> >> perhaps the config options differ enough. There are however more reports of
> >> similar issues affecting other architectures with the llvm integrated 
> >> assembler:
> >> https://github.com/ClangBuiltLinux/linux/issues/981
> >>
> >>>
>  were unused.  This isn't an issue in general, but with kexec_file.c, gcc
>  is placing kexec_arch_apply_relocations[_add] into a separate
>  .text.unlikely section and the section symbol ".text.unlikely" is being
>  dropped. Due to this, recordmcount is unable to find a non-weak symbol
> >>> But arch_kexec_apply_relocations_add is weak symbol on ppc.
> >>
> >> Yes. Note that it is just the section symbol that gets dropped. The 
> >> section is
> >> still present and will continue to hold the symbols for the functions
> >> themselves.
> >
> > So we have a case where binutils thinks it is doing something useful
> > and our kernel specific tool gets tripped up by it.
> 
> It's not just binutils, the LLVM assembler has the same behavior.
> 
> > Reading the recordmcount code it looks like it is finding any symbol
> > within a section but ignoring weak symbols.  So I suspect the only
> > remaining symbol in the section is __weak and that confuses
> > recordmcount.
> >
> > Does removing the __weak annotation on those functions fix the build
> > error?  If so we can restructure the kexec code to simply not use __weak
> > symbols.
> >
> > Otherwise the fix needs to be in recordmcount or binutils, and we should
> > loop whoever maintains recordmcount in to see what they can do.
> 
> It seems that recordmcount is not really maintained anymore now that x86
> uses objtool?
> 
> There've been several threads about fixing recordmcount, but none of
> them seem to have led to a solution.
> 
> These weak symbol vs recordmcount problems have been worked around going
> back as far as 2020:

It gives me the feeling that llvm or recordmcount should be adjusted,
rather than innocent kernel code, if this is reported in a lot of places.
I am curious how the llvm or recordmcount developers respond to this.

> 
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/elfcore.h?id=6e7b64b9dd6d96537d816ea07ec26b7dedd397b9
> 
> cheers
> 



[PATCH 1/2] powerpc/irq: Split irq.c

2022-05-18 Thread Christophe Leroy
More than half of irq.c is dedicated to PPC64.

Move PPC64 code out of irq.c into irq_64.c

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/Makefile|   2 +-
 arch/powerpc/kernel/irq.c   | 421 
 arch/powerpc/kernel/{irq.c => irq_64.c} | 331 ---
 3 files changed, 1 insertion(+), 753 deletions(-)
 copy arch/powerpc/kernel/{irq.c => irq_64.c} (60%)

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 4ddd161aef32..611e9787a74c 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -50,7 +50,7 @@ obj-y := cputable.o syscalls.o \
   hw_breakpoint_constraints.o interrupt.o \
   kdebugfs.o stacktrace.o
 obj-y  += ptrace/
-obj-$(CONFIG_PPC64)+= setup_64.o \
+obj-$(CONFIG_PPC64)+= setup_64.o irq_64.o\
   paca.o nvram_64.o note.o
 obj-$(CONFIG_COMPAT)   += sys_ppc32.o signal_32.o
 obj-$(CONFIG_VDSO32)   += vdso32_wrapper.o
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index dd09919c3c66..873e6dffb868 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -66,12 +66,6 @@
 #include 
 #include 
 
-#ifdef CONFIG_PPC64
-#include 
-#include 
-#include 
-#include 
-#endif
 #define CREATE_TRACE_POINTS
 #include 
 #include 
@@ -88,411 +82,6 @@ u32 tau_interrupts(unsigned long cpu);
 #endif
 #endif /* CONFIG_PPC32 */
 
-#ifdef CONFIG_PPC64
-
-int distribute_irqs = 1;
-
-static inline notrace unsigned long get_irq_happened(void)
-{
-   unsigned long happened;
-
-   __asm__ __volatile__("lbz %0,%1(13)"
-   : "=r" (happened) : "i" (offsetof(struct paca_struct, irq_happened)));
-
-   return happened;
-}
-
-void replay_soft_interrupts(void)
-{
-   struct pt_regs regs;
-
-   /*
-* Be careful here, calling these interrupt handlers can cause
-* softirqs to be raised, which they may run when calling irq_exit,
-* which will cause local_irq_enable() to be run, which can then
-* recurse into this function. Don't keep any state across
-* interrupt handler calls which may change underneath us.
-*
-* We use local_paca rather than get_paca() to avoid all the
-* debug_smp_processor_id() business in this low level function.
-*/
-
-   ppc_save_regs();
-   regs.softe = IRQS_ENABLED;
-   regs.msr |= MSR_EE;
-
-again:
-   if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
-   WARN_ON_ONCE(mfmsr() & MSR_EE);
-
-   /*
-* Force the delivery of pending soft-disabled interrupts on PS3.
-* Any HV call will have this side effect.
-*/
-   if (firmware_has_feature(FW_FEATURE_PS3_LV1)) {
-   u64 tmp, tmp2;
-   lv1_get_version_info(, );
-   }
-
-   /*
-* Check if an hypervisor Maintenance interrupt happened.
-* This is a higher priority interrupt than the others, so
-* replay it first.
-*/
-   if (IS_ENABLED(CONFIG_PPC_BOOK3S) && (local_paca->irq_happened & 
PACA_IRQ_HMI)) {
-   local_paca->irq_happened &= ~PACA_IRQ_HMI;
-   regs.trap = INTERRUPT_HMI;
-   handle_hmi_exception();
-   if (!(local_paca->irq_happened & PACA_IRQ_HARD_DIS))
-   hard_irq_disable();
-   }
-
-   if (local_paca->irq_happened & PACA_IRQ_DEC) {
-   local_paca->irq_happened &= ~PACA_IRQ_DEC;
-   regs.trap = INTERRUPT_DECREMENTER;
-   timer_interrupt();
-   if (!(local_paca->irq_happened & PACA_IRQ_HARD_DIS))
-   hard_irq_disable();
-   }
-
-   if (local_paca->irq_happened & PACA_IRQ_EE) {
-   local_paca->irq_happened &= ~PACA_IRQ_EE;
-   regs.trap = INTERRUPT_EXTERNAL;
-   do_IRQ();
-   if (!(local_paca->irq_happened & PACA_IRQ_HARD_DIS))
-   hard_irq_disable();
-   }
-
-   if (IS_ENABLED(CONFIG_PPC_DOORBELL) && (local_paca->irq_happened & 
PACA_IRQ_DBELL)) {
-   local_paca->irq_happened &= ~PACA_IRQ_DBELL;
-   regs.trap = INTERRUPT_DOORBELL;
-   doorbell_exception();
-   if (!(local_paca->irq_happened & PACA_IRQ_HARD_DIS))
-   hard_irq_disable();
-   }
-
-   /* Book3E does not support soft-masking PMI interrupts */
-   if (IS_ENABLED(CONFIG_PPC_BOOK3S) && (local_paca->irq_happened & 
PACA_IRQ_PMI)) {
-   local_paca->irq_happened &= ~PACA_IRQ_PMI;
-   regs.trap = INTERRUPT_PERFMON;
-   performance_monitor_exception();
-   if (!(local_paca->irq_happened & PACA_IRQ_HARD_DIS))
-   hard_irq_disable();
-   }
-
-   if (local_paca->irq_happened & 

[PATCH 2/2] powerpc/irq64: Remove get_irq_happened()

2022-05-18 Thread Christophe Leroy
There is no need to open code the read of local_paca->irq_happened in
assembly; READ_ONCE() does the same.

Replace get_irq_happened() by READ_ONCE(local_paca->irq_happened).

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/irq_64.c | 14 ++
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/irq_64.c b/arch/powerpc/kernel/irq_64.c
index 488f6eb4553d..0f0e8bcb368c 100644
--- a/arch/powerpc/kernel/irq_64.c
+++ b/arch/powerpc/kernel/irq_64.c
@@ -67,16 +67,6 @@
 
 int distribute_irqs = 1;
 
-static inline notrace unsigned long get_irq_happened(void)
-{
-   unsigned long happened;
-
-   __asm__ __volatile__("lbz %0,%1(13)"
-   : "=r" (happened) : "i" (offsetof(struct paca_struct, irq_happened)));
-
-   return happened;
-}
-
 void replay_soft_interrupts(void)
 {
struct pt_regs regs;
@@ -233,7 +223,7 @@ notrace void arch_local_irq_restore(unsigned long mask)
return;
 
 happened:
-   irq_happened = get_irq_happened();
+   irq_happened = READ_ONCE(local_paca->irq_happened);
if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
WARN_ON_ONCE(!irq_happened);
 
@@ -256,7 +246,7 @@ notrace void arch_local_irq_restore(unsigned long mask)
 * IRQ_HARD_DIS again and warn if it is still
 * clear.
 */
-   irq_happened = get_irq_happened();
+   irq_happened = 
READ_ONCE(local_paca->irq_happened);
WARN_ON_ONCE(!(irq_happened & 
PACA_IRQ_HARD_DIS));
}
}
-- 
2.35.3



Re: [PATCH kernel] KVM: PPC: Make KVM_CAP_IRQFD_RESAMPLE platform dependent

2022-05-18 Thread Alexey Kardashevskiy




On 5/4/22 17:48, Alexey Kardashevskiy wrote:

When introduced, IRQFD resampling worked on POWER8 with XICS. However
KVM on POWER9 has never implemented it - the compatibility mode code
("XICS-on-XIVE") misses the kvm_notify_acked_irq() call and the native
XIVE mode does not handle INTx in KVM at all.

This moves the capability support advertising to platforms and stops
advertising it on XIVE, i.e. POWER9 and later.
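
The intended behaviour can be sketched as a small stand-alone C function. This is an illustration only: enum demo_cap, its value and demo_check_extension() are made up, with have_irqfd standing in for CONFIG_HAVE_KVM_IRQFD and xive_enabled for the result of xive_enabled().

```c
#include <stdbool.h>

enum demo_cap { DEMO_CAP_IRQFD_RESAMPLE = 1 };	/* made-up value */

/*
 * Per-platform capability check: resampling is only advertised when
 * irqfd support is built in and the platform is not using XIVE.
 */
static int demo_check_extension(enum demo_cap ext, bool have_irqfd,
				bool xive_enabled)
{
	switch (ext) {
	case DEMO_CAP_IRQFD_RESAMPLE:
		return have_irqfd && !xive_enabled;
	default:
		return 0;
	}
}
```

Other architectures in the patch correspond to the case where xive_enabled is always false, so they keep reporting the capability whenever irqfd support is built in.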

Signed-off-by: Alexey Kardashevskiy 
---


Or I could move this one together with KVM_CAP_IRQFD. Thoughts?



Ping?



---
  arch/arm64/kvm/arm.c   | 3 +++
  arch/mips/kvm/mips.c   | 3 +++
  arch/powerpc/kvm/powerpc.c | 6 ++
  arch/riscv/kvm/vm.c| 3 +++
  arch/s390/kvm/kvm-s390.c   | 3 +++
  arch/x86/kvm/x86.c | 3 +++
  virt/kvm/kvm_main.c| 1 -
  7 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 523bc934fe2f..092f0614bae3 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -210,6 +210,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_SET_GUEST_DEBUG:
case KVM_CAP_VCPU_ATTRIBUTES:
case KVM_CAP_PTP_KVM:
+#ifdef CONFIG_HAVE_KVM_IRQFD
+   case KVM_CAP_IRQFD_RESAMPLE:
+#endif
r = 1;
break;
case KVM_CAP_SET_GUEST_DEBUG2:
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index a25e0b73ee70..0f3de470a73e 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -1071,6 +1071,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_READONLY_MEM:
case KVM_CAP_SYNC_MMU:
case KVM_CAP_IMMEDIATE_EXIT:
+#ifdef CONFIG_HAVE_KVM_IRQFD
+   case KVM_CAP_IRQFD_RESAMPLE:
+#endif
r = 1;
break;
case KVM_CAP_NR_VCPUS:
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 875c30c12db0..87698ffef3be 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -591,6 +591,12 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
break;
  #endif
  
+#ifdef CONFIG_HAVE_KVM_IRQFD
+   case KVM_CAP_IRQFD_RESAMPLE:
+   r = !xive_enabled();
+   break;
+#endif
+
case KVM_CAP_PPC_ALLOC_HTAB:
r = hv_enabled;
break;
diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
index c768f75279ef..b58579b386bb 100644
--- a/arch/riscv/kvm/vm.c
+++ b/arch/riscv/kvm/vm.c
@@ -63,6 +63,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_READONLY_MEM:
case KVM_CAP_MP_STATE:
case KVM_CAP_IMMEDIATE_EXIT:
+#ifdef CONFIG_HAVE_KVM_IRQFD
+   case KVM_CAP_IRQFD_RESAMPLE:
+#endif
r = 1;
break;
case KVM_CAP_NR_VCPUS:
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 156d1c25a3c1..85e093fc8d13 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -564,6 +564,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_SET_GUEST_DEBUG:
case KVM_CAP_S390_DIAG318:
case KVM_CAP_S390_MEM_OP_EXTENSION:
+#ifdef CONFIG_HAVE_KVM_IRQFD
+   case KVM_CAP_IRQFD_RESAMPLE:
+#endif
r = 1;
break;
case KVM_CAP_SET_GUEST_DEBUG2:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0c0ca599a353..a0a7b769483d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4273,6 +4273,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_SYS_ATTRIBUTES:
case KVM_CAP_VAPIC:
case KVM_CAP_ENABLE_CAP:
+#ifdef CONFIG_HAVE_KVM_IRQFD
+   case KVM_CAP_IRQFD_RESAMPLE:
+#endif
r = 1;
break;
case KVM_CAP_EXIT_HYPERCALL:
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 70e05af5ebea..885e72e668a5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4293,7 +4293,6 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
  #endif
  #ifdef CONFIG_HAVE_KVM_IRQFD
case KVM_CAP_IRQFD:
-   case KVM_CAP_IRQFD_RESAMPLE:
  #endif
case KVM_CAP_IOEVENTFD_ANY_LENGTH:
case KVM_CAP_CHECK_EXTENSION_VM:


--
Alexey
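
To illustrate the effect of the patch, a rough standalone model of the capability lookup after the change might look like the following. Everything here is hypothetical: the sketch_* names, the enum values and struct sketch_platform are stand-ins for the KVM_CAP_* constants, CONFIG_HAVE_KVM_IRQFD and xive_enabled(), and the two functions only mimic how the generic check falls through to the per-arch one.

```c
#include <stdbool.h>

/* Hypothetical capability IDs; the real KVM_CAP_* values are defined
 * in <linux/kvm.h>. */
enum sketch_cap {
	SKETCH_CAP_IRQFD,
	SKETCH_CAP_IRQFD_RESAMPLE,
	SKETCH_CAP_UNKNOWN,
};

/* Hypothetical platform description standing in for Kconfig options
 * and runtime state. */
struct sketch_platform {
	bool have_kvm_irqfd; /* models CONFIG_HAVE_KVM_IRQFD */
	bool is_powerpc;     /* selects the powerpc variant of the arch hook */
	bool xive_enabled;   /* models xive_enabled() on powerpc */
};

/* Models the per-arch kvm_vm_ioctl_check_extension() after the patch:
 * every arch with irqfd support reports resampling, except that
 * powerpc reports it only when XIVE is not in use. */
static int sketch_arch_check_extension(const struct sketch_platform *p,
				       enum sketch_cap cap)
{
	switch (cap) {
	case SKETCH_CAP_IRQFD_RESAMPLE:
		if (!p->have_kvm_irqfd)
			return 0;
		return p->is_powerpc ? !p->xive_enabled : 1;
	default:
		return 0;
	}
}

/* Models kvm_vm_ioctl_check_extension_generic() after the patch:
 * IRQFD is still answered here, RESAMPLE is left to the arch code. */
static int sketch_check_extension_generic(const struct sketch_platform *p,
					  enum sketch_cap cap)
{
	switch (cap) {
	case SKETCH_CAP_IRQFD:
		if (p->have_kvm_irqfd)
			return 1;
		break;
	default:
		break;
	}
	return sketch_arch_check_extension(p, cap);
}
```

With this model, a POWER9-like platform (is_powerpc and xive_enabled both set) still reports the plain irqfd capability but no longer reports resampling, while the other platforms are unchanged.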