Re: [PATCH] powerpc/bug: Cast to unsigned long before passing to inline asm

2021-08-31 Thread Segher Boessenkool
Hi!

On Tue, Aug 31, 2021 at 11:27:20PM +1000, Michael Ellerman wrote:
> Nathan filed an LLVM bug [2], in which Eli Friedman explained that "if
> you pass a value of a type that's narrower than a register to an inline
> asm, the high bits are undefined". In this case we are passing a bool
> to the inline asm, which is a single bit value, and so the compiler
> thinks it can leave the high bits of r30 unmasked, because only the
> value of bit 0 matters.
> 
> Because the inline asm compares the full width of the register (32 or
> 64-bit) we need to ensure the value passed to the inline asm has all
> bits defined. So fix it by casting to long.
> 
> We also already cast to long for the similar case in BUG_ENTRY(), which
> it turns out was added to fix a similar bug in 2005 in commit
> 32818c2eb6b8 ("[PATCH] ppc64: Fix issue with gcc 4.0 compiled kernels").

That points to , which shows the correct
explanation.

The code as it was did **not** pass a bool.  It perhaps passed an int
(so many macros are in play that it is hard to tell for sure, but it is
int or long int, perhaps unsigned, which does not change anything
here).  But td wants a 64-bit register, not a 32-bit one (those are the
only two possibilities for the "r" constraint on PowerPC).  The cast to
"long" is fine for powerpc64, but it has nothing to do with "narrower
than a register".

If this is not the correct explanation for LLVM, it needs to solve some
other bug.


Segher


Re: [PATCH v6 00/11] DDW + Indirect Mapping

2021-08-31 Thread Leonardo Brás
On Tue, 2021-08-31 at 13:39 -0700, David Christensen wrote:
> > 
> > This series allows Indirect DMA using DDW when available, which
> > usually means bigger pagesizes and more TCEs, and so more DMA space.
> 
> How is the mapping method selected?  LPAR creation via the HMC, Linux
> kernel load parameter, or some other method?

At device/bus probe, if there is enough DMA space available for Direct
DMA, then it's used. If not, it uses indirect DMA.

> 
> The hcall overhead doesn't seem too worrisome when mapping 1GB pages
> so 
> the Indirect DMA method might be best in my situation (DPDK).

Well, it depends on usage.
I mean, the recommended use of an IOMMU is to map, transmit and then
unmap, but this will vary with the implementation of the driver.

If, for example, there is some reuse of the DMA mapping, as in a
previous patchset I sent (IOMMU Pagecache), then the hcall overhead can
be reduced drastically.
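(For reference, the map/transmit/unmap pattern mentioned above is the
kernel's standard streaming DMA API; in the indirect case each map and
unmap ends up as an hcall on pseries, roughly one H_PUT_TCE per page.
A generic sketch:)

	/* Typical per-transmit streaming DMA usage in a driver: */
	dma_addr_t busaddr = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
	if (dma_mapping_error(dev, busaddr))
		return -ENOMEM;
	/* ... hand busaddr to the device and wait for completion ... */
	dma_unmap_single(dev, busaddr, len, DMA_TO_DEVICE);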

> 
> Dave
Best regards,
Leonardo



Re: [PATCH v6 00/11] DDW + Indirect Mapping

2021-08-31 Thread David Christensen




On 8/31/21 1:18 PM, Leonardo Brás wrote:
> Hello David,
> 
> Sorry for the delay, I did not get your mail because I was not CC'd
> in your reply (you sent the mail just to the mailing list).
> 
> Replies below:
> 
> On Mon, 2021-08-30 at 10:48 -0700, David Christensen wrote:
> > On 8/16/21 11:39 PM, Leonardo Bras wrote:
> > > So far it's assumed possible to map the guest RAM 1:1 to the bus,
> > > which works with a small number of devices. SRIOV changes it as
> > > the user can configure hundreds of VFs and since phyp preallocates
> > > TCEs and does not allow IOMMU pages bigger than 64K, it has to
> > > limit the number of TCEs per PE to limit waste of physical pages.
> > > 
> > > As of today, if the assumed direct mapping is not possible, DDW
> > > creation is skipped and the default DMA window "ibm,dma-window" is
> > > used instead.
> > > 
> > > Using the DDW instead of the default DMA window may allow expanding
> > > the amount of memory that can be DMA-mapped, given the number of
> > > pages (TCEs) may stay the same (or increase) and the default DMA
> > > window offers only 4k-pages while DDW may offer larger pages (4k,
> > > 64k, 16M ...).
> > 
> > So if I'm reading this correctly, VFIO applications requiring
> > hugepage DMA mappings (e.g. 16M or 2GB) can be supported on an LPAR
> > or DLPAR after this change, is that correct?
> 
> Different DDW IOMMU page sizes were already supported in Linux (4k,
> 64k, 16M) for a while now, and the remaining page sizes in LoPAR were
> enabled in the following patch:
> http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210408201915.174217-1-leobra...@gmail.com/
> (commit 472724111f0f72042deb6a9dcee9578e5398a1a1)
> 
> The thing is there are two ways of using DMA:
> - Direct DMA, mapping the whole memory space of the host, which
> requires a lot of DMA space if the guest memory is huge. This already
> supports DDW and allows using the bigger pagesizes.
> This happens on device/bus probe.
> 
> - Indirect DMA with IOMMU, mapping memory regions on demand, and
> unmapping after use. This requires much less DMA space, but causes an
> overhead because an hcall is necessary for mapping and unmapping.
> Before this series, Indirect DMA was only possible with the 'default
> DMA window' which allows using only 4k pages.
> 
> This series allows Indirect DMA using DDW when available, which
> usually means bigger pagesizes and more TCEs, and so more DMA space.

How is the mapping method selected?  LPAR creation via the HMC, Linux
kernel load parameter, or some other method?

The hcall overhead doesn't seem too worrisome when mapping 1GB pages so
the Indirect DMA method might be best in my situation (DPDK).


Dave


Re: [PATCH kernel] KVM: PPC: Fix clearing never mapped TCEs in realmode

2021-08-31 Thread Leonardo Brás
Hello Alexey,

On Fri, 2021-08-27 at 14:07 +1000, Alexey Kardashevskiy wrote:
> Since e1a1ef84cd07, pages for TCE tables for KVM guests are allocated
> only when needed. This allows skipping any update when clearing TCEs.
> This works mostly fine as TCE updates are handled when the MMU is
> enabled. The realmode handlers fail with H_TOO_HARD when pages are not
> yet allocated, except when clearing a TCE, in which case KVM prints a
> warning but proceeds to dereference a NULL pointer, which crashes the
> host OS.
> 
> This has not been caught so far as the change is reasonably new, and
> POWER9 runs mostly radix, which does not use the realmode handlers.
> With hash, the default TCE table is memset() by QEMU at machine reset,
> which triggers page faults that the KVM TCE device's
> kvm_spapr_tce_fault() handles with the MMU on. And the huge DMA
> windows are not cleared by VMs, which instead successfully create a
> DMA window big enough to map the VM memory 1:1 and then just map
> everything without clearing.
> 
> This started crashing now because upcoming sriov-under-powervm support
> added a mode where a dynamic DMA window is not big enough to map the
> VM memory 1:1 but is used anyway, and the VM is now the first (i.e.
> not QEMU) to clear a just-created table. Note that upstream QEMU needs
> to be modified for the VM to trigger the host OS crash.
> 
> This replaces WARN_ON_ONCE_RM() with a check and return.
> This adds another warning if TCE is not being cleared.
> 
> Cc: Leonardo Bras 
> Fixes: e1a1ef84cd07 ("KVM: PPC: Book3S: Allocate guest TCEs on demand
> too")
> Signed-off-by: Alexey Kardashevskiy 
> ---
> 
> With recent changes in the printk() department, calling pr_err() with
> the MMU off causes lockdep lockups, which I did not dig into any
> further, so we should start getting rid of the realmode
> WARN_ON_ONCE_RM().
> ---
>  arch/powerpc/kvm/book3s_64_vio_hv.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index 083a4e037718..e5ba96c41f3f 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -173,10 +173,13 @@ static void kvmppc_rm_tce_put(struct kvmppc_spapr_tce_table *stt,
> 	idx -= stt->offset;
> 	page = stt->pages[idx / TCES_PER_PAGE];
> 	/*
> -	 * page must not be NULL in real mode,
> -	 * kvmppc_rm_ioba_validate() must have taken care of this.
> +	 * kvmppc_rm_ioba_validate() allows pages not be allocated if TCE is
> +	 * being cleared, otherwise it returns H_TOO_HARD and we skip this.
> 	 */
> -	WARN_ON_ONCE_RM(!page);
> +	if (!page) {
> +		WARN_ON_ONCE_RM(tce != 0);
> +		return;
> +	}
> 	tbl = kvmppc_page_address(page);
> 
> 	tbl[idx % TCES_PER_PAGE] = tce;

That looks reasonable IMHO.

FWIW:
Reviewed-by: Leonardo Bras 



Re: [PATCH v6 00/11] DDW + Indirect Mapping

2021-08-31 Thread Leonardo Brás
Hello David,

Sorry for the delay, I did not get your mail because I was not CC'd
in your reply (you sent the mail just to the mailing list).

Replies below:

On Mon, 2021-08-30 at 10:48 -0700, David Christensen wrote:
> On 8/16/21 11:39 PM, Leonardo Bras wrote:
> > So far it's assumed possible to map the guest RAM 1:1 to the bus,
> > which works with a small number of devices. SRIOV changes it as the
> > user can configure hundreds of VFs and since phyp preallocates TCEs
> > and does not allow IOMMU pages bigger than 64K, it has to limit the
> > number of TCEs per PE to limit waste of physical pages.
> > 
> > As of today, if the assumed direct mapping is not possible, DDW
> > creation is skipped and the default DMA window "ibm,dma-window" is
> > used instead.
> > 
> > Using the DDW instead of the default DMA window may allow expanding
> > the amount of memory that can be DMA-mapped, given the number of
> > pages (TCEs) may stay the same (or increase) and the default DMA
> > window offers only 4k-pages while DDW may offer larger pages (4k,
> > 64k, 16M ...).
> 
> So if I'm reading this correctly, VFIO applications requiring hugepage 
> DMA mappings (e.g. 16M or 2GB) can be supported on an LPAR or DLPAR 
> after this change, is that correct?  

Different DDW IOMMU page sizes were already supported in Linux (4k,
64k, 16M) for a while now, and the remaining page sizes in LoPAR were
enabled in the following patch:
http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210408201915.174217-1-leobra...@gmail.com/
(commit 472724111f0f72042deb6a9dcee9578e5398a1a1)

The thing is there are two ways of using DMA:
- Direct DMA, mapping the whole memory space of the host, which
requires a lot of DMA space if the guest memory is huge. This already
supports DDW and allows using the bigger pagesizes.
This happens on device/bus probe.

- Indirect DMA with IOMMU, mapping memory regions on demand, and
unmapping after use. This requires much less DMA space, but causes an
overhead because an hcall is necessary for mapping and unmapping.
Before this series, Indirect DMA was only possible with the 'default
DMA window' which allows using only 4k pages.

This series allows Indirect DMA using DDW when available, which usually
means bigger pagesizes and more TCEs, and so more DMA space.


tl;dr this patchset means you can have more DMA space in Indirect DMA,
because you are using DDW instead of the Default DMA window.
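Put as pseudo-code, the probe-time behaviour described above looks
roughly like this (illustrative only; the helper names are made up, the
real logic lives in enable_ddw() in
arch/powerpc/platforms/pseries/iommu.c):

	if (ddw_is_available(dev) && ddw_can_map_all_memory(dev)) {
		/* enough DMA space: 1:1 map everything at probe time */
		setup_direct_mapping(dev);	/* no per-I/O hcalls later */
	} else if (ddw_is_available(dev)) {
		/* new with this series: on-demand mapping through the DDW */
		setup_indirect_mapping(dev);	/* bigger pages, more TCEs */
	} else {
		/* fall back to the default "ibm,dma-window", 4k pages only */
		setup_default_window(dev);
	}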

> Any limitations based on processor 
> or pHyp revision levels?

The IOMMU page size will be limited by the sizes offered by the
processor and the hypervisor. They are announced in the "IO Page Sizes"
output of ibm,query-pe-dma-window, and the biggest page size is now
automatically selected, as of the above-mentioned commit
472724111f0f72042deb6a9dcee9578e5398a1a1.

Hope this helps, please let me know if there is any remaining question.

Best regards,
Leonardo



Re: [PATCH] powerpc/ptdump: Fix generic ptdump for 64-bit

2021-08-31 Thread Nathan Chancellor
On Tue, Aug 31, 2021 at 11:51:51PM +1000, Michael Ellerman wrote:
> Since the conversion to generic ptdump we see crashes on 64-bit:
> 
>   BUG: Unable to handle kernel data access on read at 0xc0eeff7f
>   Faulting instruction address: 0xc045e5fc
>   Oops: Kernel access of bad area, sig: 11 [#1]
>   ...
>   NIP __walk_page_range+0x2bc/0xce0
>   LR  __walk_page_range+0x240/0xce0
>   Call Trace:
> __walk_page_range+0x240/0xce0 (unreliable)
> walk_page_range_novma+0x74/0xb0
> ptdump_walk_pgd+0x98/0x170
> ptdump_check_wx+0x88/0xd0
> mark_rodata_ro+0x48/0x80
> kernel_init+0x74/0x1a0
> ret_from_kernel_thread+0x5c/0x64
> 
> What's happening is that we have walked off the end of the kernel page
> tables, and started dereferencing junk values.
> 
> That happens because we initialised the ptdump_range to span all the way
> up to 0xffffffffffffffff:
> 
> static struct ptdump_range ptdump_range[] __ro_after_init = {
>   {TASK_SIZE_MAX, ~0UL},
> 
> But the kernel page tables don't span that far. So on 64-bit set the end
> of the range to be the address immediately past the end of the kernel
> page tables, to limit the page table walk to valid addresses.
> 
> Fixes: e084728393a5 ("powerpc/ptdump: Convert powerpc to GENERIC_PTDUMP")
> Reported-by: Nathan Chancellor 
> Signed-off-by: Michael Ellerman 

Tested-by: Nathan Chancellor 

> ---
>  arch/powerpc/mm/ptdump/ptdump.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
> index 2d80d775d15e..bf251191e78d 100644
> --- a/arch/powerpc/mm/ptdump/ptdump.c
> +++ b/arch/powerpc/mm/ptdump/ptdump.c
> @@ -359,6 +359,8 @@ static int __init ptdump_init(void)
>   ptdump_range[0].start = KERN_VIRT_START;
>   else
>   ptdump_range[0].start = PAGE_OFFSET;
> +
> + ptdump_range[0].end = PAGE_OFFSET + (PGDIR_SIZE * PTRS_PER_PGD);
>  #endif
>  
>   populate_markers();
> 
> base-commit: e1ab9a730b426fadc018f91b7c98412473e542fb
> prerequisite-patch-id: 942553bda7d83bbae8bf6b2b718033d488ee2410
> prerequisite-patch-id: a14c44e671eba8648c4fe385a2552fd57875ec8a
> prerequisite-patch-id: 94f5c890f54da2b46f06c60562e879171fab2be3
> prerequisite-patch-id: 330af32f2aa34a432d450acc9f6e9fd1cec96417
> prerequisite-patch-id: b46c65afa63944f3fb02f4b9bdf940507bb25de6
> prerequisite-patch-id: c4ba00ee949f70d7745f75bad11bbb2416f329f1
> prerequisite-patch-id: f479601944d0aa615716d5349d93bd6e3d5619c1
> prerequisite-patch-id: 9523cde933393b2d68648cecb740efdba9dd8601
> prerequisite-patch-id: 034afc97c841a6dcd2b9932406f391d65d18bf87
> prerequisite-patch-id: effd7ac8a7db6b59a2677c9c3a7ef8b3ef8bdaf8
> prerequisite-patch-id: 23883cf116ee69b452db3c6e10dd49e756e7b5d5
> prerequisite-patch-id: 37b6695321c96db466b0faba9308bacfb79c7ced
> prerequisite-patch-id: 83420e68ca4476c9ba5a67aa19e1fdc0b6d656a4
> prerequisite-patch-id: 362219acf820b78b83c6c09071a636b28976a1ce
> prerequisite-patch-id: 857513c5f431887d16a59d193834dcec636c73dc
> prerequisite-patch-id: 49f6879a819e205b5361280ab923664fcd29daaf
> prerequisite-patch-id: 5a37bcf70c5cb44d78de63a64e5ce920a0a7e419
> prerequisite-patch-id: 2c06dd3833117b0498baa198694f6c7e84975840
> prerequisite-patch-id: 5794a211ebbf7f0d416ae882443201621c00f615
> prerequisite-patch-id: 19ed5ae34e233079c7f66376b8d309cac2b57dbc
> prerequisite-patch-id: 1d4c82277473e8dbecf83faf6c4a6788538b064d
> prerequisite-patch-id: 8cb5ecc4fe23dafb4a43192f93b669c80a548985
> prerequisite-patch-id: 763b8d98c3aefd120862154b94814e3ef3578b5c
> prerequisite-patch-id: f45e04e6d030eb157be550976b07dc891fa0836d
> prerequisite-patch-id: 07b6fb682675845aca694deff1847bc7a40e1fec
> prerequisite-patch-id: 7f1082effa12b1eba445cef90e4749155662888c
> prerequisite-patch-id: 76743814dd8e6151c27676ae2e318579d658bf8b
> prerequisite-patch-id: 8a6b12c11dbbcd5dda0ccc9877dee1be785e0173
> prerequisite-patch-id: e98f013ce41c27d16f75ac3eb1c7eec4235cca0a
> prerequisite-patch-id: 285e11f96169ec82702a69b2fca5318c0e307508
> prerequisite-patch-id: 9fa89fb9f4ac839177307891bb240009f1d55e88
> prerequisite-patch-id: feebaed3f6e0c15e8fa468d64129fe9aa4411d57
> prerequisite-patch-id: 8f1093cf40180a623439d82e563e1dd18194cc19
> prerequisite-patch-id: d042674595d0678e71e5258d55b93d54b5c4
> prerequisite-patch-id: 286812aaed6630139583fd21d388137b8d5a6931
> prerequisite-patch-id: 54af8aa735a12282bb40a0ed87455e268ae356d9
> prerequisite-patch-id: cc5ee85759d99a6ebf18e39634dde65f15476f84
> prerequisite-patch-id: 3f8437c8bfda23c45839596ec432d81a95505061
> prerequisite-patch-id: f30d6fa2c7c7c417ee4bee0827c0ce587570db34
> prerequisite-patch-id: fa402f5deaa301587ced629dfa523728aece4705
> prerequisite-patch-id: 51f326f5de947cea58003cc8b988b54436689d1b
> prerequisite-patch-id: 4003c9a6b2792e797c333875e63a184df8fcc7e7
> prerequisite-patch-id: f73fd878eb9b65ecbed3c3ee8ca6725f7e55d5d2
> prerequisite-patch-id: 5e55b3e9b3809da22b8742f0ed356df6d6fdd301
> prerequisite-patch-id: 1fde98fffabd6313d1921d8b2f28691e9a191

Re: [PATCH] powerpc/bug: Cast to unsigned long before passing to inline asm

2021-08-31 Thread Nathan Chancellor
On Tue, Aug 31, 2021 at 11:27:20PM +1000, Michael Ellerman wrote:
> In commit 1e688dd2a3d6 ("powerpc/bug: Provide better flexibility to
> WARN_ON/__WARN_FLAGS() with asm goto") we changed WARN_ON(). Previously
> it would take the warning condition, x, and double negate it before
> converting the result to int, and passing that int to the underlying
> inline asm. ie:
> 
>   #define WARN_ON(x) ({
>   int __ret_warn_on = !!(x);
>   if (__builtin_constant_p(__ret_warn_on)) {
>   ...
>   } else {
>   BUG_ENTRY(PPC_TLNEI " %4, 0",
> BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),
> "r" (__ret_warn_on));
> 
> The asm then does a full register width comparison with zero and traps
> if it is non-zero (PPC_TLNEI).
> 
> The new code instead passes the full expression, x, with some unknown
> type, to the inline asm:
> 
>   #define WARN_ON(x) ({
>   ...
>   do {
>   if (__builtin_constant_p((x))) {
>   ...
>   } else {
>   ...
>   WARN_ENTRY(PPC_TLNEI " %4, 0",
>  BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),
>  __label_warn_on, "r" (x));
> 
> This was not seen to cause any issues with GCC, however with clang in at
> least one case it leads to a WARN_ON() that fires incorrectly and
> repeatedly at boot, as reported[1] by Nathan:
> 
>   WARNING: CPU: 0 PID: 1 at lib/klist.c:62 .klist_add_tail+0x3c/0x110
>   Modules linked in:
>   CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW 
> 5.14.0-rc7-next-20210825 #1
>   NIP:  c07ff81c LR: c090a038 CTR: 
>   REGS: c73c32a0 TRAP: 0700   Tainted: GW  
> (5.14.0-rc7-next-20210825)
>   MSR:  82029032   CR: 22000a40  XER: 
>   CFAR: c090a034 IRQMASK: 0
>   GPR00: c090a038 c73c3540 c1be3200 0001
>   GPR04: c72d65c0  c91ba798 c91bb0a0
>   GPR08: 0001  c8581918 fc00
>   GPR12: 44000240 c1dd c0012300 
>   GPR16:    
>   GPR20:    
>   GPR24:  c17e3200  c1a0e778
>   GPR28: c72d65b0 c72d65a8 c7de72c8 c73c35d0
>   NIP .klist_add_tail+0x3c/0x110
>   LR  .bus_add_driver+0x148/0x290
>   Call Trace:
> 0xc73c35d0 (unreliable)
> .bus_add_driver+0x148/0x290
> .driver_register+0xb8/0x190
> .__hid_register_driver+0x70/0xd0
> .redragon_driver_init+0x34/0x58
> .do_one_initcall+0x130/0x3b0
> .do_initcall_level+0xd8/0x188
> .do_initcalls+0x7c/0xdc
> .kernel_init_freeable+0x178/0x21c
> .kernel_init+0x34/0x220
> .ret_from_kernel_thread+0x58/0x60
>   Instruction dump:
>   fba10078 7c7d1b78 3861 fb810070 3b9d0008 fbc10080 7c9e2378 389d0018
>   fb9d0008 fb9d0010 9064 fbdd <0b1e> e87e0018 2823 41820024
> 
> The instruction dump shows that we are trapping because r30 is not zero:
>   tdnei   r30,0
> 
> Where r30 = c7de72c8
> 
> The WARN_ON() comes from:
> 
>   static void knode_set_klist(struct klist_node *knode, struct klist *klist)
>   {
>   knode->n_klist = klist;
>   /* no knode deserves to start its life dead */
>   WARN_ON(knode_dead(knode));
>   ^
> 
> Where:
> 
>   #define KNODE_DEAD  1LU
> 
>   static bool knode_dead(struct klist_node *knode)
>   {
>   return (unsigned long)knode->n_klist & KNODE_DEAD;
>   }
> 
> The full disassembly shows that the compiler has not generated any code
> to apply the "& KNODE_DEAD" to the n_klist pointer, which is surprising.
> 
> Nathan filed an LLVM bug [2], in which Eli Friedman explained that "if
> you pass a value of a type that's narrower than a register to an inline
> asm, the high bits are undefined". In this case we are passing a bool
> to the inline asm, which is a single bit value, and so the compiler
> thinks it can leave the high bits of r30 unmasked, because only the
> value of bit 0 matters.
> 
> Because the inline asm compares the full width of the register (32 or
> 64-bit) we need to ensure the value passed to the inline asm has all
> bits defined. So fix it by casting to long.
> 
> We also already cast to long for the similar case in BUG_ENTRY(), which
> it turns out was added to fix a similar bug in 2005 in commit
> 32818c2eb6b8 ("[PATCH] ppc64: Fix issue with gcc 4.0 compiled kernels").
> 
> [1]: http://lore.kernel.org/r/YSa1O4fcX1nNKqN/@Ryzen-9-3900X.localdomain
> [2]: https://bugs.llvm.org/show_bug.cgi?id=51634
> 
> Fixes: 1e688dd2a3d6 ("powerpc/bug: Provide better flexibility to 
> WARN_ON/__WARN_FLAGS() with asm goto")
> Reported-by: Nathan Chan

Re: [PATCH v3] lockdown,selinux: fix wrong subject in some SELinux lockdown checks

2021-08-31 Thread Paul Moore
On Tue, Aug 31, 2021 at 2:58 PM Dan Williams  wrote:
> On Tue, Aug 31, 2021 at 6:53 AM Paul Moore  wrote:
> > On Tue, Aug 31, 2021 at 5:09 AM Ondrej Mosnacek  wrote:
> > > On Sat, Jun 19, 2021 at 12:18 AM Dan Williams  
> > > wrote:
> > > > On Wed, Jun 16, 2021 at 1:51 AM Ondrej Mosnacek  
> > > > wrote:
> >
> > ...
> >
> > > > > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> > > > > index 2acc6173da36..c1747b6555c7 100644
> > > > > --- a/drivers/cxl/mem.c
> > > > > +++ b/drivers/cxl/mem.c
> > > > > @@ -568,7 +568,7 @@ static bool cxl_mem_raw_command_allowed(u16 opcode)
> > > > > if (!IS_ENABLED(CONFIG_CXL_MEM_RAW_COMMANDS))
> > > > > return false;
> > > > >
> > > > > -   if (security_locked_down(LOCKDOWN_NONE))
> > > > > +   if (security_locked_down(current_cred(), LOCKDOWN_NONE))
> > > >
> > > > Acked-by: Dan Williams 
> > > >
> > > > ...however that usage looks wrong. The expectation is that if kernel
> > > > integrity protections are enabled then raw command access should be
> > > > disabled. So I think that should be equivalent to LOCKDOWN_PCI_ACCESS
> > > > in terms of the command capabilities to filter.
> > >
> > > Yes, the LOCKDOWN_NONE seems wrong here... but it's a pre-existing bug
> > > and I didn't want to go down yet another rabbit hole trying to fix it.
> > > I'll look at this again once this patch is settled - it may indeed be
> > > as simple as replacing LOCKDOWN_NONE with LOCKDOWN_PCI_ACCESS.
> >
> > At this point you should be well aware of my distaste for merging
> > patches that have known bugs in them.  Yes, this is a pre-existing
> > condition, but it seems well within the scope of this work to address
> > it as well.
> >
> > This isn't something that is going to get merged while the merge
> > window is open, so at the very least you've got almost two weeks to
> > sort this out - please do that.
>
> Yes, apologies, I should have sent the fix shortly after noticing the
> problem. I'll get the CXL bug fix out of the way so Ondrej can move
> this along.

Thanks Dan.

-- 
paul moore
www.paul-moore.com


Re: [PATCH v3] lockdown,selinux: fix wrong subject in some SELinux lockdown checks

2021-08-31 Thread Dan Williams
On Tue, Aug 31, 2021 at 6:53 AM Paul Moore  wrote:
>
> On Tue, Aug 31, 2021 at 5:09 AM Ondrej Mosnacek  wrote:
> > On Sat, Jun 19, 2021 at 12:18 AM Dan Williams  
> > wrote:
> > > On Wed, Jun 16, 2021 at 1:51 AM Ondrej Mosnacek  
> > > wrote:
>
> ...
>
> > > > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> > > > index 2acc6173da36..c1747b6555c7 100644
> > > > --- a/drivers/cxl/mem.c
> > > > +++ b/drivers/cxl/mem.c
> > > > @@ -568,7 +568,7 @@ static bool cxl_mem_raw_command_allowed(u16 opcode)
> > > > if (!IS_ENABLED(CONFIG_CXL_MEM_RAW_COMMANDS))
> > > > return false;
> > > >
> > > > -   if (security_locked_down(LOCKDOWN_NONE))
> > > > +   if (security_locked_down(current_cred(), LOCKDOWN_NONE))
> > >
> > > Acked-by: Dan Williams 
> > >
> > > ...however that usage looks wrong. The expectation is that if kernel
> > > integrity protections are enabled then raw command access should be
> > > disabled. So I think that should be equivalent to LOCKDOWN_PCI_ACCESS
> > > in terms of the command capabilities to filter.
> >
> > Yes, the LOCKDOWN_NONE seems wrong here... but it's a pre-existing bug
> > and I didn't want to go down yet another rabbit hole trying to fix it.
> > I'll look at this again once this patch is settled - it may indeed be
> > as simple as replacing LOCKDOWN_NONE with LOCKDOWN_PCI_ACCESS.
>
> At this point you should be well aware of my distaste for merging
> patches that have known bugs in them.  Yes, this is a pre-existing
> condition, but it seems well within the scope of this work to address
> it as well.
>
> This isn't something that is going to get merged while the merge
> window is open, so at the very least you've got almost two weeks to
> sort this out - please do that.

Yes, apologies, I should have sent the fix shortly after noticing the
problem. I'll get the CXL bug fix out of the way so Ondrej can move
this along.
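For reference, the follow-up Dan describes would presumably amount to a
one-line change along these lines (a sketch based on the discussion
above, not the actual merged fix):

	/* drivers/cxl/mem.c, cxl_mem_raw_command_allowed() -- sketch: gate
	 * raw commands on the PCI-access lockdown class, not LOCKDOWN_NONE */
	if (security_locked_down(current_cred(), LOCKDOWN_PCI_ACCESS))
		return false;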


[powerpc:merge] BUILD SUCCESS eba39ec79104b8b498e9862da658748e7539b100

2021-08-31 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
merge
branch HEAD: eba39ec79104b8b498e9862da658748e7539b100  Automatic merge of 
'next' into merge (2021-08-29 12:19)

elapsed time: 2995m

configs tested: 117
configs skipped: 3

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm64   defconfig
arm defconfig
arm64allyesconfig
arm  allyesconfig
arm  allmodconfig
armzeus_defconfig
h8300   h8s-sim_defconfig
sh ap325rxa_defconfig
sh  sdk7786_defconfig
shtitan_defconfig
sh   se7722_defconfig
arc haps_hs_defconfig
sh espt_defconfig
shedosk7705_defconfig
arm  imote2_defconfig
ia64 allmodconfig
ia64defconfig
ia64 allyesconfig
m68k allmodconfig
m68kdefconfig
m68k allyesconfig
nios2   defconfig
arc  allyesconfig
nds32 allnoconfig
nds32   defconfig
nios2allyesconfig
cskydefconfig
alpha   defconfig
alphaallyesconfig
xtensa   allyesconfig
h8300allyesconfig
arc defconfig
sh   allmodconfig
parisc  defconfig
s390 allyesconfig
s390 allmodconfig
parisc   allyesconfig
s390defconfig
i386 allyesconfig
sparcallyesconfig
sparc   defconfig
i386defconfig
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
powerpc   allnoconfig
i386 randconfig-a005-20210831
i386 randconfig-a002-20210831
i386 randconfig-a003-20210831
i386 randconfig-a006-20210831
i386 randconfig-a004-20210831
i386 randconfig-a001-20210831
x86_64   randconfig-a014-20210829
x86_64   randconfig-a016-20210829
x86_64   randconfig-a015-20210829
x86_64   randconfig-a012-20210829
x86_64   randconfig-a013-20210829
x86_64   randconfig-a011-20210829
x86_64   randconfig-a014-20210830
x86_64   randconfig-a015-20210830
x86_64   randconfig-a013-20210830
x86_64   randconfig-a016-20210830
x86_64   randconfig-a012-20210830
x86_64   randconfig-a011-20210830
i386 randconfig-a011-20210829
i386 randconfig-a016-20210829
i386 randconfig-a012-20210829
i386 randconfig-a014-20210829
i386 randconfig-a013-20210829
i386 randconfig-a015-20210829
i386 randconfig-a016-20210830
i386 randconfig-a011-20210830
i386 randconfig-a015-20210830
i386 randconfig-a014-20210830
i386 randconfig-a012-20210830
i386 randconfig-a013-20210830
arc  randconfig-r043-20210829
riscvrandconfig-r042-20210829
s390 randconfig-r044-20210829
s390 randconfig-r044-20210830
arc  randconfig-r043-20210830
riscvrandconfig-r042-20210830
riscvnommu_k210_defconfig
riscvallyesconfig
riscvnommu_virt_defconfig
riscv allnoconfig
riscv   defconfig
riscv  rv32_defconfig
riscvallmodconfig
um   x86_64_defconfig
um i386_defconfig
x86_64   allyesconfig
x86_64rhel-8.3-kselftests
x86_64  defconfig
x86_64   rhel-8.3
x86_64  kexec

clang tested configs:
x86_64   randconfig-a001-20210829
x86_64   randconfig-a006-20210829
x86_64   randc

Re: [PATCH] powerpc/head_check: Fix shellcheck errors

2021-08-31 Thread Michael Ellerman
On Tue, 17 Aug 2021 22:51:54 +1000, Michael Ellerman wrote:
> Replace "cat file | grep pattern" with "grep pattern file", and quote a
> few variables. Together that fixes all shellcheck errors.

Applied to powerpc/next.

[1/1] powerpc/head_check: Fix shellcheck errors
  https://git.kernel.org/powerpc/c/e95ad5f21693a37c0d318b5988c5a0de324eb3e3

cheers


Re: [PATCH v2] powerpc/32: Add support for out-of-line static calls

2021-08-31 Thread Peter Zijlstra
On Tue, Aug 31, 2021 at 01:12:26PM +, Christophe Leroy wrote:
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 36b72d972568..a0fe69d8ec83 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -247,6 +247,7 @@ config PPC
>   select HAVE_SOFTIRQ_ON_OWN_STACK
>   select HAVE_STACKPROTECTOR  if PPC32 && 
> $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2)
>   select HAVE_STACKPROTECTOR  if PPC64 && 
> $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r13)
> + select HAVE_STATIC_CALL if PPC32
>   select HAVE_SYSCALL_TRACEPOINTS
>   select HAVE_VIRT_CPU_ACCOUNTING
>   select HUGETLB_PAGE_SIZE_VARIABLE   if PPC_BOOK3S_64 && HUGETLB_PAGE
> diff --git a/arch/powerpc/include/asm/static_call.h b/arch/powerpc/include/asm/static_call.h
> new file mode 100644
> index ..2402c6d32439
> --- /dev/null
> +++ b/arch/powerpc/include/asm/static_call.h
> @@ -0,0 +1,25 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_POWERPC_STATIC_CALL_H
> +#define _ASM_POWERPC_STATIC_CALL_H
> +
> +#define __POWERPC_SCT(name, inst)\
> + asm(".pushsection .text, \"ax\" \n" \
> + ".align 5   \n" \
> + ".globl " STATIC_CALL_TRAMP_STR(name) " \n" \
> + STATIC_CALL_TRAMP_STR(name) ":  \n" \
> + inst "  \n" \
> + "   lis 12,1f@ha\n" \
> + "   lwz 12,1f@l(12) \n" \
> + "   mtctr   12  \n" \
> + "   bctr\n" \
> + "1: .long 0 \n" \
> + "   nop \n" \
> + "   nop \n" \
> + ".type " STATIC_CALL_TRAMP_STR(name) ", @function   \n" \
> + ".size " STATIC_CALL_TRAMP_STR(name) ", . - " 
> STATIC_CALL_TRAMP_STR(name) " \n" \
> + ".popsection\n")
> +
> +#define ARCH_DEFINE_STATIC_CALL_TRAMP(name, func)	__POWERPC_SCT(name, "b " #func)
> +#define ARCH_DEFINE_STATIC_CALL_NULL_TRAMP(name)	__POWERPC_SCT(name, "blr")
> +
> +#endif /* _ASM_POWERPC_STATIC_CALL_H */

> diff --git a/arch/powerpc/kernel/static_call.c b/arch/powerpc/kernel/static_call.c
> new file mode 100644
> index ..e5e78205ccb4
> --- /dev/null
> +++ b/arch/powerpc/kernel/static_call.c
> @@ -0,0 +1,36 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include 
> +#include 
> +
> +#include 
> +
> +void arch_static_call_transform(void *site, void *tramp, void *func, bool tail)
> +{
> + int err;
> + unsigned long target = (long)func;
> + bool is_short = is_offset_in_branch_range((long)target - (long)tramp);
> +
> + if (!tramp)
> + return;
> +
> + mutex_lock(&text_mutex);
> +
> + if (func && !is_short) {
> + err = patch_instruction(tramp + 20, ppc_inst(target));
> + if (err)
> + goto out;
> + }
> +
> + if (!func)
> + err = patch_instruction(tramp, ppc_inst(PPC_RAW_BLR()));
> + else if (is_short)
> + err = patch_branch(tramp, target, 0);
> + else
> + err = patch_instruction(tramp, ppc_inst(PPC_RAW_NOP()));
> +out:
> + mutex_unlock(&text_mutex);
> +
> + if (err)
> +		panic("%s: patching failed %pS at %pS\n", __func__, func, tramp);
> +}
> +EXPORT_SYMBOL_GPL(arch_static_call_transform);

Yes, this should work nicely!

Since you have the two nop's at the end, you could frob in an
optimization for __static_call_return0 without too much issue.

Replace the two nops with (excuse my ppc asm):

li r3, 0
blr

and augment arch_static_call_transform() with something like:

if (func == &__static_call_return0)
err = patch_branch(tramp, tramp+24, 0);
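Putting the two pieces together, the transformed function might look
like this (a sketch against the v2 code quoted above; untested, and the
tramp+24 offset simply assumes the "li r3,0; blr" pair sits exactly
where the two nops were):

	/* Sketch only: v2's arch_static_call_transform() with the
	 * __static_call_return0 shortcut bolted on. */
	void arch_static_call_transform(void *site, void *tramp, void *func, bool tail)
	{
		int err;
		unsigned long target = (long)func;
		bool is_short = is_offset_in_branch_range((long)target - (long)tramp);

		if (!tramp)
			return;

		mutex_lock(&text_mutex);

		if (func == &__static_call_return0) {
			/* jump to the "li r3,0; blr" stub that replaced the nops */
			err = patch_branch(tramp, (unsigned long)tramp + 24, 0);
			goto out;
		}

		if (func && !is_short) {
			err = patch_instruction(tramp + 20, ppc_inst(target));
			if (err)
				goto out;
		}

		if (!func)
			err = patch_instruction(tramp, ppc_inst(PPC_RAW_BLR()));
		else if (is_short)
			err = patch_branch(tramp, target, 0);
		else
			err = patch_instruction(tramp, ppc_inst(PPC_RAW_NOP()));
	out:
		mutex_unlock(&text_mutex);

		if (err)
			panic("%s: patching failed %pS at %pS\n", __func__, func, tramp);
	}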



Re: [PATCH v3 1/3] powerpc: Remove MSR_PR check in interrupt_exit_{user/kernel}_prepare()

2021-08-31 Thread Michael Ellerman
On Mon, 23 Aug 2021 08:24:20 + (UTC), Christophe Leroy wrote:
> In those hot functions that are called at every interrupt, any saved
> cycle is worth it.
> 
> interrupt_exit_user_prepare() and interrupt_exit_kernel_prepare() are
> called from three places:
> - From entry_32.S
> - From interrupt_64.S
> - From interrupt_exit_user_restart() and interrupt_exit_kernel_restart()
> 
> [...]

Applied to powerpc/next.

[1/3] powerpc: Remove MSR_PR check in interrupt_exit_{user/kernel}_prepare()
  https://git.kernel.org/powerpc/c/133c17a1788d68c9fff59d5f724a4ba14647a16d

cheers


Re: [PATCH v3 0/5] Updates to powerpc for robust CPU online/offline

2021-08-31 Thread Michael Ellerman
On Thu, 26 Aug 2021 15:35:16 +0530, Srikar Dronamraju wrote:
> Changelog v2 -> v3:
> v2: 
> https://lore.kernel.org/linuxppc-dev/20210821102535.169643-1-sri...@linux.vnet.ibm.com/t/#u
> Add patch 1: to drop dbg and numa=debug (Suggested by Michael Ellerman)
> Add patch 2: to convert printk to pr_xxx (Suggested by Michael Ellerman)
>   Use pr_warn instead of pr_debug(WARNING) (Suggested by Laurent)
> 
> Changelog v1 -> v2:
> Moved patch to this series: powerpc/numa: Fill distance_lookup_table for 
> offline nodes
> fixed a missing prototype warning
> 
> [...]

Patches 1-4 applied to powerpc/next.

[1/5] powerpc/numa: Drop dbg in favour of pr_debug
  https://git.kernel.org/powerpc/c/544af6429777cefae2f8af9a9866df5e8cb21763
[2/5] powerpc/numa: convert printk to pr_xxx
  https://git.kernel.org/powerpc/c/506c2075ffd8db352c53201ef166948a272e3bce
[3/5] powerpc/numa: Print debug statements only when required
  https://git.kernel.org/powerpc/c/544a09ee7434b949552266d20ece538d35535bd5
[4/5] powerpc/numa: Update cpu_cpu_map on CPU online/offline
  https://git.kernel.org/powerpc/c/9a245d0e1f006bc7ccf0285d0d520ed304d00c4a

cheers


Re: [PATCH v2 0/3] powerpc/smp: Misc fixes

2021-08-31 Thread Michael Ellerman
On Thu, 26 Aug 2021 15:33:58 +0530, Srikar Dronamraju wrote:
> Changelog : v1 -> v2:
> v1: 
> https://lore.kernel.org/linuxppc-dev/20210821092419.167454-1-sri...@linux.vnet.ibm.com/t/#u``
> [ patch 1: Updated to use DIV_ROUND_UP instead of max to handle more 
> situations ]
> [ patch 2: updated changelog to make it more generic powerpc issue ]
> 
> The 1st patch fixes a regression which causes a crash when booted with
> nr_cpus=2.
> 
> [...]

Applied to powerpc/next.

[1/3] powerpc/smp: Fix a crash while booting kvm guest with nr_cpus=2
  https://git.kernel.org/powerpc/c/8efd249babea2fec268cff90b9f5ca723dbb7499
[2/3] powerpc/smp: Update cpu_core_map on all PowerPc systems
  https://git.kernel.org/powerpc/c/b8b928030332a0ca16d42433eb2c3085600d8704
[3/3] powerpc/smp: Enable CACHE domain for shared processor
  https://git.kernel.org/powerpc/c/5bf63497b8ddf53d8973f68076119e482639b2bb

cheers


Re: [PATCH v6 00/11] DDW + Indirect Mapping

2021-08-31 Thread Michael Ellerman
On Tue, 17 Aug 2021 03:39:18 -0300, Leonardo Bras wrote:
> So far it's assumed possible to map the guest RAM 1:1 to the bus, which
> works with a small number of devices. SRIOV changes it as the user can
> configure hundreds VFs and since phyp preallocates TCEs and does not
> allow IOMMU pages bigger than 64K, it has to limit the number of TCEs
> per a PE to limit waste of physical pages.
> 
> As of today, if the assumed direct mapping is not possible, DDW creation
> is skipped and the default DMA window "ibm,dma-window" is used instead.
> 
> [...]

Applied to powerpc/next.

[01/11] powerpc/pseries/iommu: Replace hard-coded page shift

https://git.kernel.org/powerpc/c/0c634bafe3bbee7a36dca7f1277057e05bf14d91
[02/11] powerpc/kernel/iommu: Add new iommu_table_in_use() helper

https://git.kernel.org/powerpc/c/3c33066a21903076722a2881556a92aa3cd7d359
[03/11] powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helper

https://git.kernel.org/powerpc/c/4ff8677a0b192a58d998d1d34fc5168203041a24
[04/11] powerpc/pseries/iommu: Add ddw_list_new_entry() helper

https://git.kernel.org/powerpc/c/92a23219299cedde52e3298788484f4875d5ce0f
[05/11] powerpc/pseries/iommu: Allow DDW windows starting at 0x00

https://git.kernel.org/powerpc/c/2ca73c54ce24489518a56d816331b774044c2445
[06/11] powerpc/pseries/iommu: Add ddw_property_create() and refactor 
enable_ddw()

https://git.kernel.org/powerpc/c/7ed2ed2db2685a285cb09ab330dc4efea0b64022
[07/11] powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new 
helper

https://git.kernel.org/powerpc/c/fc8cba8f989fb98e496b33a78476861e246c42a0
[08/11] powerpc/pseries/iommu: Update remove_dma_window() to accept property 
name

https://git.kernel.org/powerpc/c/a5fd95120c653962a9e75e260a35436b96d2c991
[09/11] powerpc/pseries/iommu: Find existing DDW with given property name

https://git.kernel.org/powerpc/c/8599395d34f2dd7b77bef42da1d99798e7a3d58f
[10/11] powerpc/pseries/iommu: Make use of DDW for indirect mapping

https://git.kernel.org/powerpc/c/381ceda88c4c4c8345cad1cffa6328892f15dca6
[11/11] powerpc/pseries/iommu: Rename "direct window" to "dma window"

https://git.kernel.org/powerpc/c/57dbbe590f152e5e8a3ff8bf5ba163df34eeae0b

cheers


Re: [PATCH 0/3] powerpc/microwatt: Device tree and config updates

2021-08-31 Thread Michael Ellerman
On Thu, 26 Aug 2021 21:56:50 +0930, Joel Stanley wrote:
> This enables the liteeth network device for microwatt which will be
> merged in v5.15.
> 
> It also turns on some options so the microwatt defconfig can be used to
> boot a userspace with systemd.
> 
> Joel Stanley (3):
>   powerpc/microwatt: Add Ethernet to device tree
>   powerpc/configs/microwattt: Enable Liteeth
>   powerpc/configs/microwatt: Enable options for systemd
> 
> [...]

Applied to powerpc/next.

[1/3] powerpc/microwatt: Add Ethernet to device tree
  https://git.kernel.org/powerpc/c/602d0f96563c2e0b8e1ddb22ac46bf7f58480d64
[2/3] powerpc/configs/microwattt: Enable Liteeth
  https://git.kernel.org/powerpc/c/ef4fcaf99cd27eb48790f2adc4eff456dbe1dec4
[3/3] powerpc/configs/microwatt: Enable options for systemd
  https://git.kernel.org/powerpc/c/3e18e271182206c996a3a7efbbe70c66307ef137

cheers


Re: [PATCH] powerpc: Redefine HMT_xxx macros as empty on PPC32

2021-08-31 Thread Michael Ellerman
On Wed, 25 Aug 2021 13:34:45 + (UTC), Christophe Leroy wrote:
> HMT_xxx macros, used for adjusting thread priority
> (hardware multi-threading), are inherited from PPC64
> via commit 5f7c690728ac ("[PATCH] powerpc: Merged ppc_asm.h")
> 
> Those instructions are pointless on PPC32, but some common
> functions like arch_cpu_idle() use them.
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc: Redefine HMT_xxx macros as empty on PPC32
  https://git.kernel.org/powerpc/c/8149238ffd210875f5a77e3c654bb59b58da35e3

cheers


Re: [PATCH v3] lockdown,selinux: fix wrong subject in some SELinux lockdown checks

2021-08-31 Thread Paul Moore
On Tue, Aug 31, 2021 at 5:09 AM Ondrej Mosnacek  wrote:
> On Sat, Jun 19, 2021 at 12:18 AM Dan Williams  
> wrote:
> > On Wed, Jun 16, 2021 at 1:51 AM Ondrej Mosnacek  wrote:

...

> > > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> > > index 2acc6173da36..c1747b6555c7 100644
> > > --- a/drivers/cxl/mem.c
> > > +++ b/drivers/cxl/mem.c
> > > @@ -568,7 +568,7 @@ static bool cxl_mem_raw_command_allowed(u16 opcode)
> > > if (!IS_ENABLED(CONFIG_CXL_MEM_RAW_COMMANDS))
> > > return false;
> > >
> > > -   if (security_locked_down(LOCKDOWN_NONE))
> > > +   if (security_locked_down(current_cred(), LOCKDOWN_NONE))
> >
> > Acked-by: Dan Williams 
> >
> > ...however that usage looks wrong. The expectation is that if kernel
> > integrity protections are enabled then raw command access should be
> > disabled. So I think that should be equivalent to LOCKDOWN_PCI_ACCESS
> > in terms of the command capabilities to filter.
>
> Yes, the LOCKDOWN_NONE seems wrong here... but it's a pre-existing bug
> and I didn't want to go down yet another rabbit hole trying to fix it.
> I'll look at this again once this patch is settled - it may indeed be
> as simple as replacing LOCKDOWN_NONE with LOCKDOWN_PCI_ACCESS.

At this point you should be well aware of my distaste for merging
patches that have known bugs in them.  Yes, this is a pre-existing
condition, but it seems well within the scope of this work to address
it as well.

This isn't something that is going to get merged while the merge
window is open, so at the very least you've got almost two weeks to
sort this out - please do that.

-- 
paul moore
www.paul-moore.com


[PATCH] powerpc/ptdump: Fix generic ptdump for 64-bit

2021-08-31 Thread Michael Ellerman
Since the conversion to generic ptdump we see crashes on 64-bit:

  BUG: Unable to handle kernel data access on read at 0xc0eeff7f
  Faulting instruction address: 0xc045e5fc
  Oops: Kernel access of bad area, sig: 11 [#1]
  ...
  NIP __walk_page_range+0x2bc/0xce0
  LR  __walk_page_range+0x240/0xce0
  Call Trace:
__walk_page_range+0x240/0xce0 (unreliable)
walk_page_range_novma+0x74/0xb0
ptdump_walk_pgd+0x98/0x170
ptdump_check_wx+0x88/0xd0
mark_rodata_ro+0x48/0x80
kernel_init+0x74/0x1a0
ret_from_kernel_thread+0x5c/0x64

What's happening is that we have walked off the end of the kernel page
tables, and started dereferencing junk values.

That happens because we initialised the ptdump_range to span all the way
up to 0xffffffffffffffff:

static struct ptdump_range ptdump_range[] __ro_after_init = {
{TASK_SIZE_MAX, ~0UL},

But the kernel page tables don't span that far. So on 64-bit set the end
of the range to be the address immediately past the end of the kernel
page tables, to limit the page table walk to valid addresses.

Fixes: e084728393a5 ("powerpc/ptdump: Convert powerpc to GENERIC_PTDUMP")
Reported-by: Nathan Chancellor 
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/mm/ptdump/ptdump.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
index 2d80d775d15e..bf251191e78d 100644
--- a/arch/powerpc/mm/ptdump/ptdump.c
+++ b/arch/powerpc/mm/ptdump/ptdump.c
@@ -359,6 +359,8 @@ static int __init ptdump_init(void)
ptdump_range[0].start = KERN_VIRT_START;
else
ptdump_range[0].start = PAGE_OFFSET;
+
+   ptdump_range[0].end = PAGE_OFFSET + (PGDIR_SIZE * PTRS_PER_PGD);
 #endif
 
populate_markers();

base-commit: e1ab9a730b426fadc018f91b7c98412473e542fb
prerequisite-patch-id: 942553bda7d83bbae8bf6b2b718033d488ee2410
prerequisite-patch-id: a14c44e671eba8648c4fe385a2552fd57875ec8a
prerequisite-patch-id: 94f5c890f54da2b46f06c60562e879171fab2be3
prerequisite-patch-id: 330af32f2aa34a432d450acc9f6e9fd1cec96417
prerequisite-patch-id: b46c65afa63944f3fb02f4b9bdf940507bb25de6
prerequisite-patch-id: c4ba00ee949f70d7745f75bad11bbb2416f329f1
prerequisite-patch-id: f479601944d0aa615716d5349d93bd6e3d5619c1
prerequisite-patch-id: 9523cde933393b2d68648cecb740efdba9dd8601
prerequisite-patch-id: 034afc97c841a6dcd2b9932406f391d65d18bf87
prerequisite-patch-id: effd7ac8a7db6b59a2677c9c3a7ef8b3ef8bdaf8
prerequisite-patch-id: 23883cf116ee69b452db3c6e10dd49e756e7b5d5
prerequisite-patch-id: 37b6695321c96db466b0faba9308bacfb79c7ced
prerequisite-patch-id: 83420e68ca4476c9ba5a67aa19e1fdc0b6d656a4
prerequisite-patch-id: 362219acf820b78b83c6c09071a636b28976a1ce
prerequisite-patch-id: 857513c5f431887d16a59d193834dcec636c73dc
prerequisite-patch-id: 49f6879a819e205b5361280ab923664fcd29daaf
prerequisite-patch-id: 5a37bcf70c5cb44d78de63a64e5ce920a0a7e419
prerequisite-patch-id: 2c06dd3833117b0498baa198694f6c7e84975840
prerequisite-patch-id: 5794a211ebbf7f0d416ae882443201621c00f615
prerequisite-patch-id: 19ed5ae34e233079c7f66376b8d309cac2b57dbc
prerequisite-patch-id: 1d4c82277473e8dbecf83faf6c4a6788538b064d
prerequisite-patch-id: 8cb5ecc4fe23dafb4a43192f93b669c80a548985
prerequisite-patch-id: 763b8d98c3aefd120862154b94814e3ef3578b5c
prerequisite-patch-id: f45e04e6d030eb157be550976b07dc891fa0836d
prerequisite-patch-id: 07b6fb682675845aca694deff1847bc7a40e1fec
prerequisite-patch-id: 7f1082effa12b1eba445cef90e4749155662888c
prerequisite-patch-id: 76743814dd8e6151c27676ae2e318579d658bf8b
prerequisite-patch-id: 8a6b12c11dbbcd5dda0ccc9877dee1be785e0173
prerequisite-patch-id: e98f013ce41c27d16f75ac3eb1c7eec4235cca0a
prerequisite-patch-id: 285e11f96169ec82702a69b2fca5318c0e307508
prerequisite-patch-id: 9fa89fb9f4ac839177307891bb240009f1d55e88
prerequisite-patch-id: feebaed3f6e0c15e8fa468d64129fe9aa4411d57
prerequisite-patch-id: 8f1093cf40180a623439d82e563e1dd18194cc19
prerequisite-patch-id: d042674595d0678e71e5258d55b93d54b5c4
prerequisite-patch-id: 286812aaed6630139583fd21d388137b8d5a6931
prerequisite-patch-id: 54af8aa735a12282bb40a0ed87455e268ae356d9
prerequisite-patch-id: cc5ee85759d99a6ebf18e39634dde65f15476f84
prerequisite-patch-id: 3f8437c8bfda23c45839596ec432d81a95505061
prerequisite-patch-id: f30d6fa2c7c7c417ee4bee0827c0ce587570db34
prerequisite-patch-id: fa402f5deaa301587ced629dfa523728aece4705
prerequisite-patch-id: 51f326f5de947cea58003cc8b988b54436689d1b
prerequisite-patch-id: 4003c9a6b2792e797c333875e63a184df8fcc7e7
prerequisite-patch-id: f73fd878eb9b65ecbed3c3ee8ca6725f7e55d5d2
prerequisite-patch-id: 5e55b3e9b3809da22b8742f0ed356df6d6fdd301
prerequisite-patch-id: 1fde98fffabd6313d1921d8b2f28691e9a191b1d
prerequisite-patch-id: 51c0595fe54ad077c736b7a4351c2f2700ab66d7
prerequisite-patch-id: e490360db8c2dc7cbf693258ca93e4597f165c6f
prerequisite-patch-id: c4354b3226d31d8ddb6992956cf0ed12ea97cb8e
prerequisite-patch-id: c67a26ed658da4b11a3319e0e99c4a84afb68d80
prerequisite-patch-i

Re: [PATCH v3] lockdown,selinux: fix wrong subject in some SELinux lockdown checks

2021-08-31 Thread Paul Moore
On Tue, Aug 31, 2021 at 5:08 AM Ondrej Mosnacek  wrote:
> Can we move this forward somehow, please?

As mentioned previously, I can merge this via the SELinux tree but I
need to see some ACKs from the other subsystems first, not to mention
some resolution to the outstanding questions.

-- 
paul moore
www.paul-moore.com


Re: [PATCH v4 4/4] powerpc/ptdump: Convert powerpc to GENERIC_PTDUMP

2021-08-31 Thread Michael Ellerman
Christophe Leroy  writes:
> Le 30/08/2021 à 13:55, Michael Ellerman a écrit :
>> Christophe Leroy  writes:
>>> Le 30/08/2021 à 09:52, Michael Ellerman a écrit :
 Christophe Leroy  writes:
> Le 29/08/2021 à 20:55, Nathan Chancellor a écrit :
>> On Thu, Jul 08, 2021 at 04:49:43PM +, Christophe Leroy wrote:
>>> This patch converts powerpc to the generic PTDUMP implementation.
>>>
>>
>> This patch as commit e084728393a5 ("powerpc/ptdump: Convert powerpc to
>> GENERIC_PTDUMP") in powerpc/next causes a panic with Fedora's ppc64le
>> config [1] when booting up in QEMU with [2]:
>>
>> [1.621864] BUG: Unable to handle kernel data access on read at 
>> 0xc0eeff7f
>> [1.623058] Faulting instruction address: 0xc045e5fc
>> [1.623832] Oops: Kernel access of bad area, sig: 11 [#1]
>> [1.624318] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
>> [1.625015] Modules linked in:
>> [1.625463] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
>> 5.14.0-rc7-next-20210827 #16
>> [1.626237] NIP:  c045e5fc LR: c045e580 CTR: 
>> c0518220
>> [1.626839] REGS: c752b820 TRAP: 0380   Not tainted  
>> (5.14.0-rc7-next-20210827)
>> [1.627528] MSR:  92009033   CR: 
>> 84002482  XER: 2000
>> [1.628449] CFAR: c0518300 IRQMASK: 0
>> [1.628449] GPR00: c045e580 c752bac0 c28a9300 
>> 
>> [1.628449] GPR04: c2008000  000a 
>> 0001
>> [1.628449] GPR08: c0eeff7f 0012  
>> 
>> [1.628449] GPR12:  c2b2 fffe 
>> c2971a70
>> [1.628449] GPR16: c2960040 c11a8f98 c752bbf0 
>> 
>> [1.628449] GPR20: c2008fff c0eeff7f c2971a68 
>> c00a0003ff00
>> [1.628449] GPR24: c2971a78 0002 0001 
>> c11a8f98
>> [1.628449] GPR28: c11a8f98 c28daef8 c2008000 
>> c2009000
>> [1.634090] NIP [c045e5fc] __walk_page_range+0x2bc/0xce0
>> [1.635117] LR [c045e580] __walk_page_range+0x240/0xce0
>> [1.635755] Call Trace:
>> [1.636018] [c752bac0] [c045e580] 
>> __walk_page_range+0x240/0xce0 (unreliable)
>> [1.636811] [c752bbd0] [c045f234] 
>> walk_page_range_novma+0x74/0xb0
>> [1.637459] [c752bc20] [c0518448] 
>> ptdump_walk_pgd+0x98/0x170
>> [1.638138] [c752bc70] [c00aa988] 
>> ptdump_check_wx+0x88/0xd0
>> [1.638738] [c752bd50] [c008d6d8] 
>> mark_rodata_ro+0x48/0x80
>> [1.639299] [c752bdb0] [c0012a34] 
>> kernel_init+0x74/0x1a0
>> [1.639842] [c752be10] [c000cfd4] 
>> ret_from_kernel_thread+0x5c/0x64
>> [1.640597] Instruction dump:
>> [1.641021] 38e7 39490010 7ce707b4 7fca5436 79081564 7d4a3838 
>> 7908f082 794a1f24
>> [1.641740] 78a8f00e 30e6 7ea85214 7ce73110 <7d48502a> 78f90fa4 
>> 2c2a 39290010
>> [1.642771] ---[ end trace 6cf72b085097ad52 ]---
>> [1.643220]
>> [2.644228] Kernel panic - not syncing: Attempted to kill init! 
>> exitcode=0x000b
>> [2.645523] ---[ end Kernel panic - not syncing: Attempted to kill 
>> init! exitcode=0x000b ]---
>>
>> This is not compiler specific, I can reproduce it with GCC 11.2.0 and
>> binutils 2.37. If there is any additional information I can provide,
>> please let me know.
>
> Can you provide a dissassembly of __walk_page_range() ? Or provide your 
> vmlinux binary.

 It seems to be walking off the end of the pgd.

 [3.373800] walk_p4d_range: addr c00fff00 end c00fff80
 [3.373852] walk_p4d_range: addr c00fff80 end c010  
 <- end of pgd at PAGE_OFFSET + 4PB
 [3.373905] walk_p4d_range: addr c010 end c0100080
>>>
>>> Yes, I want it to walk from TASK_SIZE_MAX up to 0xffffffffffffffff :)
>> 
>> But the page table doesn't span that far? 0_o
>> 
>>> static struct ptdump_range ptdump_range[] __ro_after_init = {
>>> {TASK_SIZE_MAX, ~0UL},
>>> {0, 0}
>>> };
>>>
>>> Ok, well, ppc32 goes up to 0xffffffff
>>>
>>> What's the top address to be used for ppc64 ?
>> 
>> It's different for (hash | radix) x page size.
>> 
>> The below works, and matches what we used to do.
>> 
>> Possibly we can come up with something cleaner, not sure.
>> 
>> cheers
>> 
>> 
>> diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
>> index 2d80d775d15e..3d3778a74969 100644
>> --- a

[PATCH] powerpc/bug: Cast to unsigned long before passing to inline asm

2021-08-31 Thread Michael Ellerman
In commit 1e688dd2a3d6 ("powerpc/bug: Provide better flexibility to
WARN_ON/__WARN_FLAGS() with asm goto") we changed WARN_ON(). Previously
it would take the warning condition, x, and double negate it before
converting the result to int, and passing that int to the underlying
inline asm. ie:

  #define WARN_ON(x) ({
int __ret_warn_on = !!(x);
if (__builtin_constant_p(__ret_warn_on)) {
...
} else {
BUG_ENTRY(PPC_TLNEI " %4, 0",
  BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),
  "r" (__ret_warn_on));

The asm then does a full register width comparison with zero and traps
if it is non-zero (PPC_TLNEI).

The new code instead passes the full expression, x, with some unknown
type, to the inline asm:

  #define WARN_ON(x) ({
...
do {
if (__builtin_constant_p((x))) {
...
} else {
...
WARN_ENTRY(PPC_TLNEI " %4, 0",
   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),
   __label_warn_on, "r" (x));

This was not seen to cause any issues with GCC, however with clang in at
least one case it leads to a WARN_ON() that fires incorrectly and
repeatedly at boot, as reported[1] by Nathan:

  WARNING: CPU: 0 PID: 1 at lib/klist.c:62 .klist_add_tail+0x3c/0x110
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW 
5.14.0-rc7-next-20210825 #1
  NIP:  c07ff81c LR: c090a038 CTR: 
  REGS: c73c32a0 TRAP: 0700   Tainted: GW  
(5.14.0-rc7-next-20210825)
  MSR:  82029032   CR: 22000a40  XER: 
  CFAR: c090a034 IRQMASK: 0
  GPR00: c090a038 c73c3540 c1be3200 0001
  GPR04: c72d65c0  c91ba798 c91bb0a0
  GPR08: 0001  c8581918 fc00
  GPR12: 44000240 c1dd c0012300 
  GPR16:    
  GPR20:    
  GPR24:  c17e3200  c1a0e778
  GPR28: c72d65b0 c72d65a8 c7de72c8 c73c35d0
  NIP .klist_add_tail+0x3c/0x110
  LR  .bus_add_driver+0x148/0x290
  Call Trace:
0xc73c35d0 (unreliable)
.bus_add_driver+0x148/0x290
.driver_register+0xb8/0x190
.__hid_register_driver+0x70/0xd0
.redragon_driver_init+0x34/0x58
.do_one_initcall+0x130/0x3b0
.do_initcall_level+0xd8/0x188
.do_initcalls+0x7c/0xdc
.kernel_init_freeable+0x178/0x21c
.kernel_init+0x34/0x220
.ret_from_kernel_thread+0x58/0x60
  Instruction dump:
  fba10078 7c7d1b78 3861 fb810070 3b9d0008 fbc10080 7c9e2378 389d0018
  fb9d0008 fb9d0010 9064 fbdd <0b1e> e87e0018 2823 41820024

The instruction dump shows that we are trapping because r30 is not zero:
  tdnei   r30,0

Where r30 = c7de72c8

The WARN_ON() comes from:

  static void knode_set_klist(struct klist_node *knode, struct klist *klist)
  {
knode->n_klist = klist;
/* no knode deserves to start its life dead */
WARN_ON(knode_dead(knode));
^

Where:

  #define KNODE_DEAD	1LU

  static bool knode_dead(struct klist_node *knode)
  {
return (unsigned long)knode->n_klist & KNODE_DEAD;
  }

The full disassembly shows that the compiler has not generated any code
to apply the "& KNODE_DEAD" to the n_klist pointer, which is surprising.

Nathan filed an LLVM bug [2], in which Eli Friedman explained that "if
you pass a value of a type that's narrower than a register to an inline
asm, the high bits are undefined". In this case we are passing a bool
to the inline asm, which is a single bit value, and so the compiler
thinks it can leave the high bits of r30 unmasked, because only the
value of bit 0 matters.

Because the inline asm compares the full width of the register (32 or
64-bit) we need to ensure the value passed to the inline asm has all
bits defined. So fix it by casting to long.
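To illustrate (a minimal standalone sketch, not the actual kernel
macros; some_condition() is a placeholder):

	bool cond = some_condition();

	/* Only bit 0 of the register is defined here; tdnei compares all
	 * 64 bits, so this can trap even when cond is false: */
	asm volatile("tdnei %0, 0" : : "r" (cond));

	/* Casting first forces a full-width 0 or 1 into the register: */
	asm volatile("tdnei %0, 0" : : "r" ((long)cond));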

We also already cast to long for the similar case in BUG_ENTRY(), which
it turns out was added to fix a similar bug in 2005 in commit
32818c2eb6b8 ("[PATCH] ppc64: Fix issue with gcc 4.0 compiled kernels").

[1]: http://lore.kernel.org/r/YSa1O4fcX1nNKqN/@Ryzen-9-3900X.localdomain
[2]: https://bugs.llvm.org/show_bug.cgi?id=51634

Fixes: 1e688dd2a3d6 ("powerpc/bug: Provide better flexibility to 
WARN_ON/__WARN_FLAGS() with asm goto")
Reported-by: Nathan Chancellor 
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/bug.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index 1ee0f22313ee..02c08d1492f8 100644
--- a/a

[PATCH v2] powerpc/32: Add support for out-of-line static calls

2021-08-31 Thread Christophe Leroy
Add support for out-of-line static calls on PPC32. This change
improves performance of calls to global function pointers by
using direct calls instead of indirect calls.

The trampoline is initially populated with a 'blr' or branch to target,
followed by an unreachable long jump sequence.

In order to cope with parallel execution, the trampoline needs to
be updated in a way that ensures it remains consistent at all times.
This means we can't use the traditional lis/addi to load r12 with
the target address, otherwise there would be a window during which
the first instruction contains the upper part of the new target
address while the second instruction still contains the lower part of
the old target address. To avoid that the target address is stored
just after the 'bctr' and loaded from there with a single instruction.
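(To make the hazard concrete, this is the kind of torn update the
traditional sequence would risk — illustrative asm, not taken from the
patch:)

	/* A two-instruction immediate load cannot be patched atomically:
	 * a CPU racing with the patcher can execute the new 'lis' paired
	 * with the old 'addi' and jump to a mixed, bogus address. */
	lis	r12, target@ha
	addi	r12, r12, target@l
	mtctr	r12
	bctr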

Then, depending on the target distance, arch_static_call_transform()
will either replace the first instruction by a direct 'bl ' or
'nop' in order to have the trampoline fall through the long jump
sequence.

Performancewise the long jump sequence is probably not better than
the indirect calls set by GCC when we don't use static calls, but
such calls are unlikely to be required on powerpc32: With most
configurations the kernel size is far below 32 Mbytes so only
modules may happen to be too far. And even modules are likely to
be close enough as they are allocated below the kernel core and
as close as possible of the kernel text.

static_call selftest is running successfully with this change.

With this patch, __do_irq() has the following sequence to trace
irq entries:

c0004a00 <__SCT__tp_func_irq_entry>:
c0004a00:   48 00 00 e0 b   c0004ae0 <__traceiter_irq_entry>
c0004a04:   3d 80 c0 00 lis r12,-16384
c0004a08:   81 8c 4a 14 lwz r12,18964(r12)
c0004a0c:   7d 89 03 a6 mtctr   r12
c0004a10:   4e 80 04 20 bctr
c0004a14:   00 00 00 00 .long 0x0
c0004a18:   60 00 00 00 nop
c0004a1c:   60 00 00 00 nop
...
c0005654 <__do_irq>:
...
c0005664:   7c 7f 1b 78 mr  r31,r3
...
c00056a0:   81 22 00 00 lwz r9,0(r2)
c00056a4:   39 29 00 01 addir9,r9,1
c00056a8:   91 22 00 00 stw r9,0(r2)
c00056ac:   3d 20 c0 af lis r9,-16209
c00056b0:   81 29 74 cc lwz r9,29900(r9)
c00056b4:   2c 09 00 00 cmpwi   r9,0
c00056b8:   41 82 00 10 beq c00056c8 <__do_irq+0x74>
c00056bc:   80 69 00 04 lwz r3,4(r9)
c00056c0:   7f e4 fb 78 mr  r4,r31
c00056c4:   4b ff f3 3d bl  c0004a00 
<__SCT__tp_func_irq_entry>

Before this patch, __do_irq() was doing the following to trace irq
entries:

c0005700 <__do_irq>:
...
c0005710:   7c 7e 1b 78 mr  r30,r3
...
c000574c:   93 e1 00 0c stw r31,12(r1)
c0005750:   81 22 00 00 lwz r9,0(r2)
c0005754:   39 29 00 01 addir9,r9,1
c0005758:   91 22 00 00 stw r9,0(r2)
c000575c:   3d 20 c0 af lis r9,-16209
c0005760:   83 e9 f4 cc lwz r31,-2868(r9)
c0005764:   2c 1f 00 00 cmpwi   r31,0
c0005768:   41 82 00 24 beq c000578c <__do_irq+0x8c>
c000576c:   81 3f 00 00 lwz r9,0(r31)
c0005770:   80 7f 00 04 lwz r3,4(r31)
c0005774:   7d 29 03 a6 mtctr   r9
c0005778:   7f c4 f3 78 mr  r4,r30
c000577c:   4e 80 04 21 bctrl
c0005780:   85 3f 00 0c lwzur9,12(r31)
c0005784:   2c 09 00 00 cmpwi   r9,0
c0005788:   40 82 ff e4 bne c000576c <__do_irq+0x6c>

Besides now using a direct 'bl' instead of a
'load/mtctr/bctr' sequence, we can also see that one fewer register
is saved on the stack.

Signed-off-by: Christophe Leroy 
---
v2: Use an indirect load in the long jump sequence to cope with parallel
execution and preemption.
---
 arch/powerpc/Kconfig   |  1 +
 arch/powerpc/include/asm/static_call.h | 25 ++
 arch/powerpc/kernel/Makefile   |  2 +-
 arch/powerpc/kernel/static_call.c  | 36 ++
 4 files changed, 63 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/static_call.h
 create mode 100644 arch/powerpc/kernel/static_call.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 36b72d972568..a0fe69d8ec83 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -247,6 +247,7 @@ config PPC
select HAVE_SOFTIRQ_ON_OWN_STACK
	select HAVE_STACKPROTECTOR		if PPC32 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2)
	select HAVE_STACKPROTECTOR		if PPC64 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r13)

Re: [PATCH] powerpc/64: Avoid link stack corruption in kexec_wait()

2021-08-31 Thread Daniel Axtens
Christophe Leroy  writes:

> On 31/08/2021 at 08:17, Daniel Axtens wrote:
>> Hi Christophe,
>> 
>>> Use bcl 20,31,+4 instead of bl in order to preserve link stack.
>>>
>>> See commit c974809a26a1 ("powerpc/vdso: Avoid link stack corruption
>>> in __get_datapage()") for details.
>> 
>>  From my understanding of that commit message, the change helps to keep
>> the link stack correctly balanced which is helpful for performance,
>> rather than for correctness. If I understand correctly, kexec_wait is
>> not in a hot path - rather it is where CPUs spin while waiting for
>> kexec. Is there any benefit in using the more complicated opcode in this
>> situation?
>
> AFAICS the main benefit is to keep things consistent over the kernel and not
> have to wonder "is it a hot path or not? If it is I use bcl 20,31, if it is
> not I use bl". The best way to keep things in order is to always use the
> right instruction.

Yeah, Nick Piggin convinced me of this offline as well.

>
>> 
>>> Signed-off-by: Christophe Leroy 
>>> ---
>>>   arch/powerpc/kernel/misc_64.S | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
>>> index 4b761a18a74d..613509907166 100644
>>> --- a/arch/powerpc/kernel/misc_64.S
>>> +++ b/arch/powerpc/kernel/misc_64.S
>>> @@ -255,7 +255,7 @@ _GLOBAL(scom970_write)
>>>* Physical (hardware) cpu id should be in r3.
>>>*/
>>>   _GLOBAL(kexec_wait)
>>> -   bl  1f
>>> +   bcl 20,31,1f
>>>   1:	mflr	r5
>> 
>> Would it be better to create a macro of some sort to wrap this unusual
>> special form so that the meaning is more clear?
>
> Not sure, I think people working with assembly will easily recognise that 
> form whereas an obscure 
> macro is always puzzling.
>
> I like macros when they allow you to not repeat again and again the same 
> sequence of several 
> instructions, but here it is a single quite simple instruction which is not 
> worth a macro in my mind.
>


Sure - I was mostly thinking specifically of the bcl; mflr situation but
I agree that for the single instruction it's not needed.

In short, I am convinced, and so:
Reviewed-by: Daniel Axtens 

Kind regards,
Daniel

> Christophe


Re: [PATCH] powerpc/32: Add support for out-of-line static calls

2021-08-31 Thread Ard Biesheuvel
On Tue, 31 Aug 2021 at 10:53, Peter Zijlstra  wrote:
>
> On Tue, Aug 31, 2021 at 08:05:21AM +, Christophe Leroy wrote:
>
> > +#define ARCH_DEFINE_STATIC_CALL_NULL_TRAMP(name) \
> > + asm(".pushsection .text, \"ax\" \n" \
> > + ".align 4   \n" \
> > + ".globl " STATIC_CALL_TRAMP_STR(name) " \n" \
> > + STATIC_CALL_TRAMP_STR(name) ":  \n" \
> > + "   blr \n" \
> > + "   nop \n" \
> > + "   nop \n" \
> > + "   nop \n" \
> > + ".type " STATIC_CALL_TRAMP_STR(name) ", @function   \n" \
> > + ".size " STATIC_CALL_TRAMP_STR(name) ", . - " 
> > STATIC_CALL_TRAMP_STR(name) " \n" \
> > + ".popsection\n")
>
> > +static int patch_trampoline_32(u32 *addr, unsigned long target)
> > +{
> > + int err;
> > +
> > + err = patch_instruction(addr++, ppc_inst(PPC_RAW_LIS(_R12, PPC_HA(target))));
> > + err |= patch_instruction(addr++, ppc_inst(PPC_RAW_ADDI(_R12, _R12, PPC_LO(target))));
> > + err |= patch_instruction(addr++, ppc_inst(PPC_RAW_MTCTR(_R12)));
> > + err |= patch_instruction(addr, ppc_inst(PPC_RAW_BCTR()));
> > +
> > + return err;
> > +}
>
> There can be concurrent execution and modification; the above doesn't
> look safe in that regard. What happens if you've say, done the first
> two, but not the latter two and execution happens (on a different
> CPU or through IRQ context, etc..)?
>
> > +void arch_static_call_transform(void *site, void *tramp, void *func, bool tail)
> > +{
> > + int err;
> > + unsigned long target = (long)func;
> > +
> > + if (!tramp)
> > + return;
> > +
> > + mutex_lock(&text_mutex);
> > +
> > + if (!func)
> > + err = patch_instruction(tramp, ppc_inst(PPC_RAW_BLR()));
> > + else if (is_offset_in_branch_range((long)target - (long)tramp))
> > + err = patch_branch(tramp, target, 0);
>
> These two are single instruction modifications and I'm assuming the
> hardware is sane enough that execution sees either the old or the new
> instruction. So this should work.
>
> > + else if (IS_ENABLED(CONFIG_PPC32))
> > + err = patch_trampoline_32(tramp, target);
> > + else
> > + BUILD_BUG();
> > +
> > + mutex_unlock(&text_mutex);
> > +
> > + if (err)
> > + panic("%s: patching failed %pS at %pS\n", __func__, func, tramp);
> > +}
> > +EXPORT_SYMBOL_GPL(arch_static_call_transform);
>
> One possible solution that we explored on ARM64, was having the
> trampoline be in 2 slots:
>
>
> b 1f
>
> 1:  blr
> nop
> nop
> nop
>
> 2:  blr
> nop
> nop
> nop
>
> Where initially the first slot is active (per "b 1f"), then you write
> the second slot, and as a final act, re-write the initial branch to
> point to slot 2.
>
> Then you execute synchronize_rcu_tasks() under your text mutex
> (careful!) to ensure all users of your slot1 are gone and the next
> modification repeats the whole thing, except for using slot1 etc..
>
> Eventually I think Ard came up with the latest ARM64 proposal which puts
> a literal in a RO section (could be the text section I suppose) and
> loads and branches to that.
>

Yes. The main reason is simply that anything else is premature
optimization: we have a clear use case (CFI) where out-of-line static
calls are faster than compiler generated indirect calls, even if the
static call sequence is based on a literal load and an indirect
branch, but CFI is not upstream [yet].

Once other use cases emerge, we will revisit this.



> Anyway, the thing is, you can really only modify a single instruction at
> the time and need to ensure concurrent execution is correct.


Re: [PATCH v3] lockdown,selinux: fix wrong subject in some SELinux lockdown checks

2021-08-31 Thread Ondrej Mosnacek
On Sat, Jun 19, 2021 at 12:18 AM Dan Williams  wrote:
> On Wed, Jun 16, 2021 at 1:51 AM Ondrej Mosnacek  wrote:
> >
> > Commit 59438b46471a ("security,lockdown,selinux: implement SELinux
> > lockdown") added an implementation of the locked_down LSM hook to
> > SELinux, with the aim to restrict which domains are allowed to perform
> > operations that would breach lockdown.
> >
> > However, in several places the security_locked_down() hook is called in
> > situations where the current task isn't doing any action that would
> > directly breach lockdown, leading to SELinux checks that are basically
> > bogus.
> >
> > To fix this, add an explicit struct cred pointer argument to
> > security_locked_down() and define NULL as a special value to pass instead
> > of current_cred() in such situations. LSMs that take the subject
> > credentials into account can then fall back to some default or ignore
> > such calls altogether. In the SELinux lockdown hook implementation, use
> > SECINITSID_KERNEL in case the cred argument is NULL.
> >
> > Most of the callers are updated to pass current_cred() as the cred
> > pointer, thus maintaining the same behavior. The following callers are
> > modified to pass NULL as the cred pointer instead:
> > 1. arch/powerpc/xmon/xmon.c
> >  Seems to be some interactive debugging facility. It appears that
> >  the lockdown hook is called from interrupt context here, so it
> >  should be more appropriate to request a global lockdown decision.
> > 2. fs/tracefs/inode.c:tracefs_create_file()
> >  Here the call is used to prevent creating new tracefs entries when
> >  the kernel is locked down. Assumes that locking down is one-way -
> >  i.e. if the hook returns non-zero once, it will never return zero
> >  again, thus no point in creating these files. Also, the hook is
> >  often called by a module's init function when it is loaded by
> >  userspace, where it doesn't make much sense to do a check against
> >  the current task's creds, since the task itself doesn't actually
> >  use the tracing functionality (i.e. doesn't breach lockdown), just
> >  indirectly makes some new tracepoints available to whoever is
> >  authorized to use them.
> > 3. net/xfrm/xfrm_user.c:copy_to_user_*()
> >  Here a cryptographic secret is redacted based on the value returned
> >  from the hook. There are two possible actions that may lead here:
> >  a) A netlink message XFRM_MSG_GETSA with NLM_F_DUMP set - here the
> > task context is relevant, since the dumped data is sent back to
> > the current task.
> >  b) When adding/deleting/updating an SA via XFRM_MSG_xxxSA, the
> > dumped SA is broadcasted to tasks subscribed to XFRM events -
> > here the current task context is not relevant as it doesn't
> > represent the tasks that could potentially see the secret.
> >  It doesn't seem worth it to try to keep using the current task's
> >  context in the a) case, since the eventual data leak can be
> >  circumvented anyway via b), plus there is no way for the task to
> >  indicate that it doesn't care about the actual key value, so the
> >  check could generate a lot of "false alert" denials with SELinux.
> >  Thus, let's pass NULL instead of current_cred() here faute de
> >  mieux.
> >
> > Improvements-suggested-by: Casey Schaufler 
> > Improvements-suggested-by: Paul Moore 
> > Fixes: 59438b46471a ("security,lockdown,selinux: implement SELinux 
> > lockdown")
> > Signed-off-by: Ondrej Mosnacek 
> [..]
> > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> > index 2acc6173da36..c1747b6555c7 100644
> > --- a/drivers/cxl/mem.c
> > +++ b/drivers/cxl/mem.c
> > @@ -568,7 +568,7 @@ static bool cxl_mem_raw_command_allowed(u16 opcode)
> > if (!IS_ENABLED(CONFIG_CXL_MEM_RAW_COMMANDS))
> > return false;
> >
> > -   if (security_locked_down(LOCKDOWN_NONE))
> > +   if (security_locked_down(current_cred(), LOCKDOWN_NONE))
>
> Acked-by: Dan Williams 
>
> ...however that usage looks wrong. The expectation is that if kernel
> integrity protections are enabled then raw command access should be
> disabled. So I think that should be equivalent to LOCKDOWN_PCI_ACCESS
> in terms of the command capabilities to filter.

Yes, the LOCKDOWN_NONE seems wrong here... but it's a pre-existing bug
and I didn't want to go down yet another rabbit hole trying to fix it.
I'll look at this again once this patch is settled - it may indeed be
as simple as replacing LOCKDOWN_NONE with LOCKDOWN_PCI_ACCESS.
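
For reference, the SELinux side of the change has roughly this shape (a
sketch of the NULL-cred fallback described above, not the literal hook
body; check_lockdown_sid() is a made-up helper name):

	static int selinux_lockdown(const struct cred *cred,
				    enum lockdown_reason what)
	{
		/* no meaningful subject: attribute the check to the kernel */
		u32 sid = cred ? cred_sid(cred) : SECINITSID_KERNEL;

		return check_lockdown_sid(sid, what);
	}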

--
Ondrej Mosnacek
Software Engineer, Linux Security - SELinux kernel
Red Hat, Inc.



Re: [PATCH v3] lockdown,selinux: fix wrong subject in some SELinux lockdown checks

2021-08-31 Thread Ondrej Mosnacek
On Fri, Jun 18, 2021 at 5:40 AM Paul Moore  wrote:
> On Wed, Jun 16, 2021 at 4:51 AM Ondrej Mosnacek  wrote:
> >
> > Commit 59438b46471a ("security,lockdown,selinux: implement SELinux
> > lockdown") added an implementation of the locked_down LSM hook to
> > SELinux, with the aim to restrict which domains are allowed to perform
> > operations that would breach lockdown.
> >
> > However, in several places the security_locked_down() hook is called in
> > situations where the current task isn't doing any action that would
> > directly breach lockdown, leading to SELinux checks that are basically
> > bogus.
> >
> > To fix this, add an explicit struct cred pointer argument to
> > security_locked_down() and define NULL as a special value to pass instead
> > of current_cred() in such situations. LSMs that take the subject
> > credentials into account can then fall back to some default or ignore
> > such calls altogether. In the SELinux lockdown hook implementation, use
> > SECINITSID_KERNEL in case the cred argument is NULL.
> >
> > Most of the callers are updated to pass current_cred() as the cred
> > pointer, thus maintaining the same behavior. The following callers are
> > modified to pass NULL as the cred pointer instead:
> > 1. arch/powerpc/xmon/xmon.c
> >  Seems to be some interactive debugging facility. It appears that
> >  the lockdown hook is called from interrupt context here, so it
> >  should be more appropriate to request a global lockdown decision.
> > 2. fs/tracefs/inode.c:tracefs_create_file()
> >  Here the call is used to prevent creating new tracefs entries when
> >  the kernel is locked down. Assumes that locking down is one-way -
> >  i.e. if the hook returns non-zero once, it will never return zero
> >  again, thus no point in creating these files. Also, the hook is
> >  often called by a module's init function when it is loaded by
> >  userspace, where it doesn't make much sense to do a check against
> >  the current task's creds, since the task itself doesn't actually
> >  use the tracing functionality (i.e. doesn't breach lockdown), just
> >  indirectly makes some new tracepoints available to whoever is
> >  authorized to use them.
> > 3. net/xfrm/xfrm_user.c:copy_to_user_*()
> >  Here a cryptographic secret is redacted based on the value returned
> >  from the hook. There are two possible actions that may lead here:
> >  a) A netlink message XFRM_MSG_GETSA with NLM_F_DUMP set - here the
> > task context is relevant, since the dumped data is sent back to
> > the current task.
> >  b) When adding/deleting/updating an SA via XFRM_MSG_xxxSA, the
> > dumped SA is broadcasted to tasks subscribed to XFRM events -
> > here the current task context is not relevant as it doesn't
> > represent the tasks that could potentially see the secret.
> >  It doesn't seem worth it to try to keep using the current task's
> >  context in the a) case, since the eventual data leak can be
> >  circumvented anyway via b), plus there is no way for the task to
> >  indicate that it doesn't care about the actual key value, so the
> >  check could generate a lot of "false alert" denials with SELinux.
> >  Thus, let's pass NULL instead of current_cred() here faute de
> >  mieux.
> >
> > Improvements-suggested-by: Casey Schaufler 
> > Improvements-suggested-by: Paul Moore 
> > Fixes: 59438b46471a ("security,lockdown,selinux: implement SELinux 
> > lockdown")
> > Signed-off-by: Ondrej Mosnacek 
>
> This seems reasonable to me, but before I merge it into the SELinux
> tree I think it would be good to get some ACKs from the relevant
> subsystem folks.  I don't believe we ever saw a response to the last
> question for the PPC folks, did we?

Can we move this forward somehow, please?

Quoting the yet-unanswered question from the v2 thread for convenience:

> > > The callers migrated to the new hook, passing NULL as cred:
> > > 1. arch/powerpc/xmon/xmon.c
[...]
> >
> > This definitely sounds like kernel_t based on the description above.
>
> Here I'm a little concerned that the hook might be called from some
> unusual interrupt, which is not masked by spin_lock_irqsave()... We
> ran into this with PMI (Platform Management Interrupt) before, see
> commit 5ae5fbd21079 ("powerpc/perf: Fix handling of privilege level
> checks in perf interrupt context"). While I can't see anything that
> would suggest something like this happening here, the whole thing is
> so foreign to me that I'm wary of making assumptions :)
>
> @Michael/PPC devs, can you confirm to us that xmon_is_locked_down() is
> only called from normal syscall/interrupt context (as opposed to
> something tricky like PMI)?

I strongly suspect the answer will be just "Of course it is, why would
you even ask such a silly question?", but please let's have it on
record so we can finally get this patch merged...


> > ---
> >
> > v3:
> > - add the c

Re: [PATCH] powerpc/64: Avoid link stack corruption in kexec_wait()

2021-08-31 Thread Christophe Leroy




On 31/08/2021 at 08:17, Daniel Axtens wrote:

Hi Christophe,


Use bcl 20,31,+4 instead of bl in order to preserve link stack.

See commit c974809a26a1 ("powerpc/vdso: Avoid link stack corruption
in __get_datapage()") for details.


From my understanding of that commit message, the change helps to keep
the link stack correctly balanced which is helpful for performance,
rather than for correctness. If I understand correctly, kexec_wait is
not in a hot path - rather it is where CPUs spin while waiting for
kexec. Is there any benefit in using the more complicated opcode in this
situation?


AFAICS the main benefit is to keep things consistent over the kernel and not have to wonder "is it a
hot path or not? If it is I use bcl 20,31, if it is not I use bl". The best way to keep things in
order is to always use the right instruction.
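
For reference, the bcl 20,31,$+4 form is special-cased by the CPU's link
stack predictor as a branch-and-link that is not a real subroutine call, so
it does not push an entry onto the link stack. A minimal inline-asm sketch
of reading the current address with that idiom (an illustration, not code
from the patch):

	static inline unsigned long get_text_addr(void)
	{
		unsigned long addr;

		/* bcl 20,31 links without unbalancing the link stack */
		asm volatile("bcl 20,31,1f\n"
			     "1:	mflr %0" : "=r" (addr) : : "lr");
		return addr;	/* address of the mflr instruction */
	}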





Signed-off-by: Christophe Leroy 
---
  arch/powerpc/kernel/misc_64.S | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index 4b761a18a74d..613509907166 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -255,7 +255,7 @@ _GLOBAL(scom970_write)
   * Physical (hardware) cpu id should be in r3.
   */
  _GLOBAL(kexec_wait)
-   bl  1f
+   bcl 20,31,1f
  1:	mflr	r5


Would it be better to create a macro of some sort to wrap this unusual
special form so that the meaning is more clear?


Not sure, I think people working with assembly will easily recognise that form whereas an obscure 
macro is always puzzling.


I like macros when they allow you to not repeat again and again the same sequence of several 
instructions, but here it is a single quite simple instruction which is not worth a macro in my mind.


Christophe


Re: [PATCH] powerpc/32: Add support for out-of-line static calls

2021-08-31 Thread Peter Zijlstra
On Tue, Aug 31, 2021 at 08:05:21AM +, Christophe Leroy wrote:

> +#define ARCH_DEFINE_STATIC_CALL_NULL_TRAMP(name) \
> + asm(".pushsection .text, \"ax\" \n" \
> + ".align 4   \n" \
> + ".globl " STATIC_CALL_TRAMP_STR(name) " \n" \
> + STATIC_CALL_TRAMP_STR(name) ":  \n" \
> + "   blr \n" \
> + "   nop \n" \
> + "   nop \n" \
> + "   nop \n" \
> + ".type " STATIC_CALL_TRAMP_STR(name) ", @function   \n" \
> + ".size " STATIC_CALL_TRAMP_STR(name) ", . - " 
> STATIC_CALL_TRAMP_STR(name) " \n" \
> + ".popsection\n")

> +static int patch_trampoline_32(u32 *addr, unsigned long target)
> +{
> + int err;
> +
> + err = patch_instruction(addr++, ppc_inst(PPC_RAW_LIS(_R12, PPC_HA(target))));
> + err |= patch_instruction(addr++, ppc_inst(PPC_RAW_ADDI(_R12, _R12, PPC_LO(target))));
> + err |= patch_instruction(addr++, ppc_inst(PPC_RAW_MTCTR(_R12)));
> + err |= patch_instruction(addr, ppc_inst(PPC_RAW_BCTR()));
> +
> + return err;
> +}

There can be concurrent execution and modification; the above doesn't
look safe in that regard. What happens if you've say, done the first
two, but not the latter two and execution happens (on a different
CPU or through IRQ context, etc..)?
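
Spelled out, one bad interleaving with the lis/addi pair above:

	/*
	 *  CPU0 (patching)               CPU1 (executing the trampoline)
	 *
	 *  patch lis  -> HA(new)
	 *                                lis  r12,HA(new)     <- new high half
	 *                                addi r12,r12,LO(old) <- old low half!
	 *  patch addi -> LO(new)
	 *                                mtctr r12 ; bctr     -> bogus target
	 */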

> +void arch_static_call_transform(void *site, void *tramp, void *func, bool tail)
> +{
> + int err;
> + unsigned long target = (long)func;
> +
> + if (!tramp)
> + return;
> +
> + mutex_lock(&text_mutex);
> +
> + if (!func)
> + err = patch_instruction(tramp, ppc_inst(PPC_RAW_BLR()));
> + else if (is_offset_in_branch_range((long)target - (long)tramp))
> + err = patch_branch(tramp, target, 0);

These two are single instruction modifications and I'm assuming the
hardware is sane enough that execution sees either the old or the new
instruction. So this should work.

> + else if (IS_ENABLED(CONFIG_PPC32))
> + err = patch_trampoline_32(tramp, target);
> + else
> + BUILD_BUG();
> +
> + mutex_unlock(&text_mutex);
> +
> + if (err)
> + panic("%s: patching failed %pS at %pS\n", __func__, func, tramp);
> +}
> +EXPORT_SYMBOL_GPL(arch_static_call_transform);

One possible solution that we explored on ARM64, was having the
trampoline be in 2 slots:


b 1f

1:  blr
nop
nop
nop

2:  blr
nop
nop
nop

Where initially the first slot is active (per "b 1f"), then you write
the second slot, and as a final act, re-write the initial branch to
point to slot 2.

Then you execute synchronize_rcu_tasks() under your text mutex
(careful!) to ensure all users of your slot1 are gone and the next
modification repeats the whole thing, except for using slot1 etc..
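
In code the toggle would look roughly like this (a sketch assuming the
two-slot layout above; write_slot() and the SLOT offsets are made-up names,
patch_branch() and synchronize_rcu_tasks() are the real interfaces):

	#define SLOT1	1	/* word offset of slot 1, per the layout above */
	#define SLOT2	5	/* word offset of slot 2 */

	static bool slot2_live;

	static int retarget(u32 *tramp, unsigned long target)
	{
		u32 *spare = tramp + (slot2_live ? SLOT1 : SLOT2);
		int err;

		err = write_slot(spare, target);	/* fill the inactive slot */
		if (err)
			return err;

		/* single-instruction switch of the entry branch */
		err = patch_branch(tramp, (unsigned long)spare, 0);
		if (err)
			return err;

		/* wait until nothing can still be running the old slot */
		synchronize_rcu_tasks();
		slot2_live = !slot2_live;
		return 0;
	}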

Eventually I think Ard came up with the latest ARM64 proposal which puts
a literal in a RO section (could be the text section I suppose) and
loads and branches to that.

Anyway, the thing is, you can really only modify a single instruction at
the time and need to ensure concurrent execution is correct.


[PATCH] powerpc/machdep: Remove stale functions from ppc_md structure

2021-08-31 Thread Christophe Leroy
ppc_md.iommu_save() is not set anymore by any platform after
commit c40785ad305b ("powerpc/dart: Use a cachable DART").
So iommu_save() has become a nop and can be removed.

ppc_md.show_percpuinfo() is not set anymore by any platform after
commit 4350147a816b ("[PATCH] ppc64: SMU based macs cpufreq support").

Last users of ppc_md.rtc_read_val() and ppc_md.rtc_write_val() were
removed by commit 0f03a43b8f0f ("[POWERPC] Remove todc code from
ARCH=powerpc")

Last user of kgdb_map_scc() was removed by commit 17ce452f7ea3 ("kgdb,
powerpc: arch specific powerpc kgdb support").

ppc_md.machine_kexec_prepare() has not been used since
commit 8ee3e0d69623 ("powerpc: Remove the main legacy iSeries platform
code"). This allows the removal of machine_kexec_prepare() and the
rename of default_machine_kexec_prepare() into machine_kexec_prepare().

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/iommu.h   |  6 --
 arch/powerpc/include/asm/machdep.h | 13 -
 arch/powerpc/kernel/setup-common.c |  3 ---
 arch/powerpc/kernel/swsusp_64.c|  5 -
 arch/powerpc/kernel/swsusp_asm64.S |  1 -
 arch/powerpc/kexec/core.c  | 13 -
 arch/powerpc/kexec/core_32.c   |  2 +-
 arch/powerpc/kexec/core_64.c   |  2 +-
 8 files changed, 2 insertions(+), 43 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index bf3b84128525..c361212ac160 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -280,12 +280,6 @@ extern void iommu_init_early_dart(struct pci_controller_ops *controller_ops);
 extern void iommu_init_early_pasemi(void);
 
 #if defined(CONFIG_PPC64) && defined(CONFIG_PM)
-static inline void iommu_save(void)
-{
-   if (ppc_md.iommu_save)
-   ppc_md.iommu_save();
-}
-
 static inline void iommu_restore(void)
 {
if (ppc_md.iommu_restore)
diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 764f2732a821..a68311077d32 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -29,7 +29,6 @@ struct machdep_calls {
char*name;
 #ifdef CONFIG_PPC64
 #ifdef CONFIG_PM
-   void(*iommu_save)(void);
void(*iommu_restore)(void);
 #endif
 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
@@ -43,7 +42,6 @@ struct machdep_calls {
void(*setup_arch)(void); /* Optional, may be NULL */
/* Optional, may be NULL. */
void(*show_cpuinfo)(struct seq_file *m);
-   void(*show_percpuinfo)(struct seq_file *m, int i);
/* Returns the current operating frequency of "cpu" in Hz */
unsigned long   (*get_proc_freq)(unsigned int cpu);
 
@@ -74,8 +72,6 @@ struct machdep_calls {
int (*set_rtc_time)(struct rtc_time *);
void(*get_rtc_time)(struct rtc_time *);
time64_t(*get_boot_time)(void);
-   unsigned char   (*rtc_read_val)(int addr);
-   void(*rtc_write_val)(int addr, unsigned char val);
 
void(*calibrate_decr)(void);
 
@@ -141,8 +137,6 @@ struct machdep_calls {
   May be NULL. */
void(*init)(void);
 
-   void(*kgdb_map_scc)(void);
-
/*
 * optional PCI "hooks"
 */
@@ -187,13 +181,6 @@ struct machdep_calls {
 #ifdef CONFIG_KEXEC_CORE
void (*kexec_cpu_down)(int crash_shutdown, int secondary);
 
-   /* Called to do what every setup is needed on image and the
-* reboot code buffer. Returns 0 on success.
-* Provide your own (maybe dummy) implementation if your platform
-* claims to support kexec.
-*/
-   int (*machine_kexec_prepare)(struct kimage *image);
-
/* Called to perform the _real_ kexec.
 * Do NOT allocate memory or fail here. We are past the point of
 * no return.
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index b1e43b69a559..0b7894eed58d 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -278,9 +278,6 @@ static int show_cpuinfo(struct seq_file *m, void *v)
seq_printf(m, "clock\t\t: %lu.%06luMHz\n",
   proc_freq / 100, proc_freq % 100);
 
-   if (ppc_md.show_percpuinfo != NULL)
-   ppc_md.show_percpuinfo(m, cpu_id);
-
/* If we are a Freescale core do a simple check so
 * we dont have to keep adding cases in the future */
if (PVR_VER(pvr) & 0x8000) {
diff --git a/arch/powerpc/kernel/swsusp_64.c b/arch/powerpc/kernel/swsusp_64.c
index aeea97ad85cf..16ee3baaf09a 100644
--- a/arch/powerpc/kernel/swsusp_64.c
+++ b/arch/powerpc/kernel/swsusp_64.c
@@ -17,8 +17,3 @@ void do_after_copyback(void)
touch_softlockup_watchdog();
mb();
 }
-
-void _iommu_save(void)
-{
-   iommu_save();
-}
diff --git a/arch/powerpc/kerne

[PATCH] powerpc/time: Remove generic_suspend_{dis/en}able_irqs()

2021-08-31 Thread Christophe Leroy
Commit d75d68cfef49 ("powerpc: Clean up obsolete code relating to
decrementer and timebase") made generic_suspend_enable_irqs() and
generic_suspend_disable_irqs() static.

Fold them into their only callers.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/time.c | 22 +++---
 1 file changed, 7 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 934d8ae66cc6..cae8f03a44fe 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -631,8 +631,12 @@ void timer_broadcast_interrupt(void)
 #endif
 
 #ifdef CONFIG_SUSPEND
-static void generic_suspend_disable_irqs(void)
+/* Overrides the weak version in kernel/power/main.c */
+void arch_suspend_disable_irqs(void)
 {
+   if (ppc_md.suspend_disable_irqs)
+   ppc_md.suspend_disable_irqs();
+
/* Disable the decrementer, so that it doesn't interfere
 * with suspending.
 */
@@ -642,23 +646,11 @@ static void generic_suspend_disable_irqs(void)
set_dec(decrementer_max);
 }
 
-static void generic_suspend_enable_irqs(void)
-{
-   local_irq_enable();
-}
-
-/* Overrides the weak version in kernel/power/main.c */
-void arch_suspend_disable_irqs(void)
-{
-   if (ppc_md.suspend_disable_irqs)
-   ppc_md.suspend_disable_irqs();
-   generic_suspend_disable_irqs();
-}
-
 /* Overrides the weak version in kernel/power/main.c */
 void arch_suspend_enable_irqs(void)
 {
-   generic_suspend_enable_irqs();
+   local_irq_enable();
+
if (ppc_md.suspend_enable_irqs)
ppc_md.suspend_enable_irqs();
 }
-- 
2.25.0
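
For readers unfamiliar with the override pattern used here: the generic
suspend code provides weak defaults that an architecture may replace with
strong definitions (a sketch of the mechanism; the weak body is the one in
kernel/power/main.c, the strong one mirrors the patch above):

	/* kernel/power/main.c: default when the arch provides nothing */
	void __weak arch_suspend_disable_irqs(void)
	{
		local_irq_disable();
	}

	/* arch/powerpc/kernel/time.c: strong definition overrides the weak one */
	void arch_suspend_disable_irqs(void)
	{
		if (ppc_md.suspend_disable_irqs)
			ppc_md.suspend_disable_irqs();
		/* ... then disable the decrementer, as in the patch ... */
	}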



[PATCH] powerpc/32: Add support for out-of-line static calls

2021-08-31 Thread Christophe Leroy
Add support for out-of-line static calls on PPC32. This change
improves the performance of calls to global function pointers by
using direct calls instead of indirect calls.

The trampoline is initially populated with a 'blr' and 3 'nop'.
Then, depending on the target distance, arch_static_call_transform()
will either replace the 'blr' by a direct 'bl <target>' or by an
indirect jump through the CTR register via a lis/addi/mtctr/bctr sequence.

static_call selftest is running successfully with this change.

With this patch, __do_irq() has the following sequence to trace
irq entries:

c00049c0 <__SCT__tp_func_irq_entry>:
c00049c0:   48 00 00 70 b   c0004a30 <__traceiter_irq_entry>
c00049c4:   60 00 00 00 nop
c00049c8:   60 00 00 00 nop
c00049cc:   60 00 00 00 nop
...
c00055a4 <__do_irq>:
...
c00055b4:   7c 7f 1b 78 mr  r31,r3
...
c00055f0:   81 22 00 00 lwz r9,0(r2)
c00055f4:   39 29 00 01 addi    r9,r9,1
c00055f8:   91 22 00 00 stw r9,0(r2)
c00055fc:   3d 20 c0 af lis r9,-16209
c0005600:   81 29 74 cc lwz r9,29900(r9)
c0005604:   2c 09 00 00 cmpwi   r9,0
c0005608:   41 82 00 10 beq c0005618 <__do_irq+0x74>
c000560c:   80 69 00 04 lwz r3,4(r9)
c0005610:   7f e4 fb 78 mr  r4,r31
c0005614:   4b ff f3 ad bl  c00049c0 <__SCT__tp_func_irq_entry>

Before this patch, __do_irq() was doing the following to trace irq
entries:

c0005700 <__do_irq>:
...
c0005710:   7c 7e 1b 78 mr  r30,r3
...
c000574c:   93 e1 00 0c stw r31,12(r1)
c0005750:   81 22 00 00 lwz r9,0(r2)
c0005754:   39 29 00 01 addi    r9,r9,1
c0005758:   91 22 00 00 stw r9,0(r2)
c000575c:   3d 20 c0 af lis r9,-16209
c0005760:   83 e9 f4 cc lwz r31,-2868(r9)
c0005764:   2c 1f 00 00 cmpwi   r31,0
c0005768:   41 82 00 24 beq c000578c <__do_irq+0x8c>
c000576c:   81 3f 00 00 lwz r9,0(r31)
c0005770:   80 7f 00 04 lwz r3,4(r31)
c0005774:   7d 29 03 a6 mtctr   r9
c0005778:   7f c4 f3 78 mr  r4,r30
c000577c:   4e 80 04 21 bctrl
c0005780:   85 3f 00 0c lwzu    r9,12(r31)
c0005784:   2c 09 00 00 cmpwi   r9,0
c0005788:   40 82 ff e4 bne c000576c <__do_irq+0x6c>

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig   |  1 +
 arch/powerpc/include/asm/static_call.h | 31 +++
 arch/powerpc/kernel/Makefile   |  2 +-
 arch/powerpc/kernel/static_call.c  | 43 ++
 4 files changed, 76 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/static_call.h
 create mode 100644 arch/powerpc/kernel/static_call.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 36b72d972568..a0fe69d8ec83 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -247,6 +247,7 @@ config PPC
select HAVE_SOFTIRQ_ON_OWN_STACK
	select HAVE_STACKPROTECTOR		if PPC32 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2)
	select HAVE_STACKPROTECTOR		if PPC64 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r13)
+   select HAVE_STATIC_CALL if PPC32
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_VIRT_CPU_ACCOUNTING
select HUGETLB_PAGE_SIZE_VARIABLE   if PPC_BOOK3S_64 && HUGETLB_PAGE
diff --git a/arch/powerpc/include/asm/static_call.h b/arch/powerpc/include/asm/static_call.h
new file mode 100644
index ..7cbefd845afc
--- /dev/null
+++ b/arch/powerpc/include/asm/static_call.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_STATIC_CALL_H
+#define _ASM_POWERPC_STATIC_CALL_H
+
+#define ARCH_DEFINE_STATIC_CALL_TRAMP(name, func)  \
+   asm(".pushsection .text, \"ax\" \n" \
+   ".align 4   \n" \
+   ".globl " STATIC_CALL_TRAMP_STR(name) " \n" \
+   STATIC_CALL_TRAMP_STR(name) ":  \n" \
+   "   b " #func " \n" \
+   "   nop \n" \
+   "   nop \n" \
+   "   nop \n" \
+   ".type " STATIC_CALL_TRAMP_STR(name) ", @function   \n" \
+   ".size " STATIC_CALL_TRAMP_STR(name) ", . - " 
STATIC_CALL_TRAMP_STR(name) " \n" \
+   ".popsection

Re: [PATCH v2 1/3] powerpc/smp: Fix a crash while booting kvm guest with nr_cpus=2

2021-08-31 Thread Gautham R Shenoy
On Thu, Aug 26, 2021 at 03:33:59PM +0530, Srikar Dronamraju wrote:
> Aneesh reported a crash with a fairly recent upstream kernel when
> booting kernel whose commandline was appended with nr_cpus=2
> 
> 1:mon> e
> cpu 0x1: Vector: 300 (Data Access) at [c8a67bd0]
> pc: c002557c: cpu_to_chip_id+0x3c/0x100
> lr: c0058380: start_secondary+0x460/0xb00
> sp: c8a67e70
>msr: 80001033
>dar: 10
>  dsisr: 8
>   current = 0xc891bb00
>   paca= 0xc018ff981f80   irqmask: 0x03   irq_happened: 0x01
> pid   = 0, comm = swapper/1
> Linux version 5.13.0-rc3-15704-ga050a6d2b7e8 (kvaneesh@ltc-boston8) (gcc 
> (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) 
> #433 SMP Tue May 25 02:38:49 CDT 2021
> 1:mon> t
> [link register   ] c0058380 start_secondary+0x460/0xb00
> [c8a67e70] c8a67eb0 (unreliable)
> [c8a67eb0] c00589d4 start_secondary+0xab4/0xb00
> [c8a67f90] c000c654 start_secondary_prolog+0x10/0x14
> 
> Current code assumes that num_possible_cpus() is always greater than
> threads_per_core. However, this may not be true when using nr_cpus=2 or
> similar options. Handle the case where num_possible_cpus() is not an
> exact multiple of threads_per_core.
> 
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: Aneesh Kumar K.V 
> Cc: Nathan Lynch 
> Cc: Michael Ellerman 
> Cc: Ingo Molnar 
> Cc: Peter Zijlstra 
> Cc: Valentin Schneider 
> Cc: Gautham R Shenoy 
> Cc: Vincent Guittot 
> Fixes: c1e53367dab1 ("powerpc/smp: Cache CPU to chip lookup")
> Reported-by: Aneesh Kumar K.V 
> Debugged-by: Michael Ellerman 
> Signed-off-by: Srikar Dronamraju 


This looks good to me.
Reviewed-by: Gautham R. Shenoy 

> ---
> Changelog: v1 -> v2:
> v1: - https://lore.kernel.org/linuxppc-dev/20210821092419.167454-2-sri...@linux.vnet.ibm.com/t/#u
> Handled comment from Gautham Shenoy
> [ Updated to use DIV_ROUND_UP instead of max to handle more situations ]
> 
>  arch/powerpc/kernel/smp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 6c6e4d934d86..bf11b3c4eb28 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1074,7 +1074,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
>   }
> 
>   if (cpu_to_chip_id(boot_cpuid) != -1) {
> - int idx = num_possible_cpus() / threads_per_core;
> + int idx = DIV_ROUND_UP(num_possible_cpus(), threads_per_core);
> 
>   /*
>* All threads of a core will all belong to the same core,
> -- 
> 2.18.2
>
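
The arithmetic behind the fix, for clarity (threads_per_core = 8 is an
assumed value for illustration; the report does not state it):

	/* nr_cpus=2 on an SMT8 machine */
	int possible = 2, threads_per_core = 8;

	int idx_old = possible / threads_per_core;		/* 2 / 8 = 0 */
	int idx_new = DIV_ROUND_UP(possible, threads_per_core);	/* = 1 */

With plain integer division no entry at all is reserved for the boot
core's chip-id lookup, which is consistent with the cpu_to_chip_id()
crash above; rounding up keeps one entry for the partial core.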