Re: [5.16.0] build error on unrecognized opcode: ptesync

2022-01-10 Thread Christophe Leroy


Le 10/01/2022 à 13:32, Mike a écrit :
> Hey, so I originally sat down to compile the fast headers V2 patch, but 
> quickly discovered other things at play, and grabbed 5.16.0 a few hours 
> after it lifted off. arch/powerpc/mm/mmu_context.c I had to specifically 
> build with -maltivec or it barfed on a 'dssall'. I'm fine with that, 
> I've spent years in kernel land, I can deal with that. Then came 
> arch/powerpc/lib/step.c with the ptesync. This seems like a totally 
> normal instruction that shouldn't need any extra flags or anything, yet 
> the assembler throws up, and no flag I can think of fixes it. This is a 
> G4 7447. I reverted back to the Debian 5.15 defconfig before dropping 
> this mail as I had tweaked my config to be more G4-specific.
> 

Hi Mike,

Can you provide a bit more detail about your setup and config ?

Are you using GCC or LLVM ?
What version of GCC and BINUTILS or what version of LLVM ?

What is DEBIAN defconfig ? Does it correspond to one of the standard 
mainline kernel defconfigs ? If not can you provide it ?

Thanks
Christophe

Re: [PATCH v2 1/2] powerpc: Fix virt_addr_valid() check

2022-01-10 Thread Christophe Leroy


Le 11/01/2022 à 05:37, Nicholas Piggin a écrit :
> Excerpts from Kefeng Wang's message of January 8, 2022 9:58 pm:
>> Hi PPC maintainers, ping..
> 
> Hmm. I might have confused myself about this. I'm going back and
> trying to work out what I was thinking when I suggested it. This
> works on 64e because vmalloc space is below the kernel linear map,
> right?
> 
> On 64s it is the other way around and it is still possible to enable
> flatmem on 64s. Although we might just not hit the problem there because
> __pa() will not mask away the vmalloc offset for 64s so it will still
> return something that's outside the pfn_valid range for flatmem. That's
> very subtle though.

That's the way it works on PPC32 at least, so for me it's not shocking 
to have it work the same way on PPC64s.

The main issue here is the way __pa() works. On PPC32, __pa() = va - 
PAGE_OFFSET, so it works correctly for any address.
On PPC64, __pa() works by masking out the 2 top bits instead of 
subtracting PAGE_OFFSET, so the test must first verify that the 2 top 
bits are really set. This is what (addr >= PAGE_OFFSET) does. Once this 
first test is done, we can perfectly rely on pfn_valid() just like on 
PPC32; I see absolutely no point in an additional test checking that the 
addr is below KERN_VIRT_START.
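
To make the difference concrete, here is a rough sketch of the two __pa() 
flavours and of the check being discussed (hand-simplified, not the real 
asm/page.h; constants and helpers are assumed from the surrounding thread):

	/* Simplified sketch: PPC32 subtracts PAGE_OFFSET, PPC64 masks the
	 * PAGE_OFFSET bits, so a vmalloc address on 64s still yields a
	 * "plausible" physical address unless addr >= PAGE_OFFSET is
	 * checked first. */
	#ifdef CONFIG_PPC64
	#define __pa(x)		((unsigned long)(x) & ~PAGE_OFFSET)
	#else
	#define __pa(x)		((unsigned long)(x) - PAGE_OFFSET)
	#endif

	#define virt_addr_valid(vaddr) ({				\
		unsigned long _addr = (unsigned long)(vaddr);		\
		_addr >= PAGE_OFFSET && pfn_valid(virt_to_pfn(_addr));	\
	})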


> 
> The checks added to __pa actually don't prevent vmalloc memory from
> being passed to it either on 64s, only a more basic test.

That's correct. It is the role of pfn_valid() to check that.

Christophe

> 
> I think 64s wants (addr >= PAGE_OFFSET && addr < KERN_VIRT_START) as
> the condition.  Could possibly add that check to __pa as well to
> catch vmalloc addresses.
> 
> Thanks,
> Nick
> 
>>
>> On 2021/12/25 20:06, Kefeng Wang wrote:
>>> When running ethtool eth0, the BUG occurred,
>>>
>>> usercopy: Kernel memory exposure attempt detected from SLUB object not 
>>> in SLUB page?! (offset 0, size 1048)!
>>> kernel BUG at mm/usercopy.c:99
>>> ...
>>> usercopy_abort+0x64/0xa0 (unreliable)
>>> __check_heap_object+0x168/0x190
>>> __check_object_size+0x1a0/0x200
>>> dev_ethtool+0x2494/0x2b20
>>> dev_ioctl+0x5d0/0x770
>>> sock_do_ioctl+0xf0/0x1d0
>>> sock_ioctl+0x3ec/0x5a0
>>> __se_sys_ioctl+0xf0/0x160
>>> system_call_exception+0xfc/0x1f0
>>> system_call_common+0xf8/0x200
>>>
>>> The code shows below,
>>>
>>> data = vzalloc(array_size(gstrings.len, ETH_GSTRING_LEN));
>>> copy_to_user(useraddr, data, gstrings.len * ETH_GSTRING_LEN))
>>>
>>> The data is allocated by vmalloc(), but virt_addr_valid(ptr) returns true
>>> on PowerPC64, which leads to the panic.
>>>
>>> As commit 4dd7554a6456 ("powerpc/64: Add VIRTUAL_BUG_ON checks for __va
>>> and __pa addresses") does, make sure the virt addr is above PAGE_OFFSET in
>>> virt_addr_valid().
>>>
>>> Signed-off-by: Kefeng Wang 
>>> ---
>>>arch/powerpc/include/asm/page.h | 5 -
>>>1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/include/asm/page.h 
>>> b/arch/powerpc/include/asm/page.h
>>> index 254687258f42..300d4c105a3a 100644
>>> --- a/arch/powerpc/include/asm/page.h
>>> +++ b/arch/powerpc/include/asm/page.h
>>> @@ -132,7 +132,10 @@ static inline bool pfn_valid(unsigned long pfn)
>>>#define virt_to_page(kaddr)  pfn_to_page(virt_to_pfn(kaddr))
>>>#define pfn_to_kaddr(pfn)__va((pfn) << PAGE_SHIFT)
>>>
>>> -#define virt_addr_valid(kaddr) pfn_valid(virt_to_pfn(kaddr))
>>> +#define virt_addr_valid(vaddr)	({					\
>>> +	unsigned long _addr = (unsigned long)vaddr;			\
>>> +	(unsigned long)(_addr) >= PAGE_OFFSET && pfn_valid(virt_to_pfn(_addr)); \
>>> +})
>>>
>>>/*
>>> * On Book-E parts we need __va to parse the device tree and we can't
>>

Re: [PATCH v2 1/2] powerpc: Fix virt_addr_valid() check

2022-01-10 Thread Nicholas Piggin
Excerpts from Kefeng Wang's message of January 8, 2022 9:58 pm:
> Hi PPC maintainers, ping..

Hmm. I might have confused myself about this. I'm going back and
trying to work out what I was thinking when I suggested it. This
works on 64e because vmalloc space is below the kernel linear map,
right?

On 64s it is the other way around and it is still possible to enable 
flatmem on 64s. Although we might just not hit the problem there because 
__pa() will not mask away the vmalloc offset for 64s so it will still 
return something that's outside the pfn_valid range for flatmem. That's 
very subtle though.

The checks added to __pa actually don't prevent vmalloc memory from
being passed to it either on 64s, only a more basic test.

I think 64s wants (addr >= PAGE_OFFSET && addr < KERN_VIRT_START) as
the condition.  Could possibly add that check to __pa as well to
catch vmalloc addresses.

Thanks,
Nick

> 
> On 2021/12/25 20:06, Kefeng Wang wrote:
>> When running ethtool eth0, the BUG occurred,
>>
>>usercopy: Kernel memory exposure attempt detected from SLUB object not in 
>> SLUB page?! (offset 0, size 1048)!
>>kernel BUG at mm/usercopy.c:99
>>...
>>usercopy_abort+0x64/0xa0 (unreliable)
>>__check_heap_object+0x168/0x190
>>__check_object_size+0x1a0/0x200
>>dev_ethtool+0x2494/0x2b20
>>dev_ioctl+0x5d0/0x770
>>sock_do_ioctl+0xf0/0x1d0
>>sock_ioctl+0x3ec/0x5a0
>>__se_sys_ioctl+0xf0/0x160
>>system_call_exception+0xfc/0x1f0
>>system_call_common+0xf8/0x200
>>
>> The code shows below,
>>
>>data = vzalloc(array_size(gstrings.len, ETH_GSTRING_LEN));
>>copy_to_user(useraddr, data, gstrings.len * ETH_GSTRING_LEN))
>>
>> The data is allocated by vmalloc(), but virt_addr_valid(ptr) returns true
>> on PowerPC64, which leads to the panic.
>>
>> As commit 4dd7554a6456 ("powerpc/64: Add VIRTUAL_BUG_ON checks for __va
>> and __pa addresses") does, make sure the virt addr is above PAGE_OFFSET in
>> virt_addr_valid().
>>
>> Signed-off-by: Kefeng Wang 
>> ---
>>   arch/powerpc/include/asm/page.h | 5 -
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/include/asm/page.h 
>> b/arch/powerpc/include/asm/page.h
>> index 254687258f42..300d4c105a3a 100644
>> --- a/arch/powerpc/include/asm/page.h
>> +++ b/arch/powerpc/include/asm/page.h
>> @@ -132,7 +132,10 @@ static inline bool pfn_valid(unsigned long pfn)
>>   #define virt_to_page(kaddr)pfn_to_page(virt_to_pfn(kaddr))
>>   #define pfn_to_kaddr(pfn)  __va((pfn) << PAGE_SHIFT)
>>   
>> -#define virt_addr_valid(kaddr)  pfn_valid(virt_to_pfn(kaddr))
>> +#define virt_addr_valid(vaddr)	({					\
>> +	unsigned long _addr = (unsigned long)vaddr;			\
>> +	(unsigned long)(_addr) >= PAGE_OFFSET && pfn_valid(virt_to_pfn(_addr)); \
>> +})
>>   
>>   /*
>>* On Book-E parts we need __va to parse the device tree and we can't
> 


Re: [PATCH v3 5/6] KVM: PPC: mmio: Return to guest after emulation failure

2022-01-10 Thread Nicholas Piggin
Excerpts from Alexey Kardashevskiy's message of January 11, 2022 9:51 am:
> 
> 
> On 1/10/22 18:36, Nicholas Piggin wrote:
>> Excerpts from Fabiano Rosas's message of January 8, 2022 7:00 am:
>>> If MMIO emulation fails we don't want to crash the whole guest by
>>> returning to userspace.
>>>
>>> The original commit bbf45ba57eae ("KVM: ppc: PowerPC 440 KVM
>>> implementation") added a todo:
>>>
>>>/* XXX Deliver Program interrupt to guest. */
>>>
>>> and later the commit d69614a295ae ("KVM: PPC: Separate loadstore
>>> emulation from priv emulation") added the Program interrupt injection
>>> but in another file, so I'm assuming it was missed that this block
>>> needed to be altered.
>>>
>>> Signed-off-by: Fabiano Rosas 
>>> Reviewed-by: Alexey Kardashevskiy 
>>> ---
>>>   arch/powerpc/kvm/powerpc.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
>>> index 6daeea4a7de1..56b0faab7a5f 100644
>>> --- a/arch/powerpc/kvm/powerpc.c
>>> +++ b/arch/powerpc/kvm/powerpc.c
>>> @@ -309,7 +309,7 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
>>> kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst);
>>> kvmppc_core_queue_program(vcpu, 0);
>>> pr_info("%s: emulation failed (%08x)\n", __func__, last_inst);
>>> -   r = RESUME_HOST;
>>> +   r = RESUME_GUEST;
>> 
>> So at this point can the pr_info just go away?
>> 
>> I wonder if this shouldn't be a DSI rather than a program check.
>> DSI with DSISR[37] looks a bit more expected. Not that Linux
>> probably does much with it but at least it would give a SIGBUS
>> rather than SIGILL.
> 
> It does not look like it is more expected to me, it is not about wrong memory 
> attributes, it is the instruction itself which cannot execute.

It's not an illegal instruction though, it can't execute because of the
nature of the data / address it is operating on. That says d-side to me.

DSISR[37] isn't perfect but if you squint it's not terrible. It's about
certain instructions that have restrictions operating on other than
normal cacheable mappings.

Thanks,
Nick


> 
> DSISR[37]:
> Set to 1 if the access is due to a lq, stq, lwat, ldat, lbarx, lharx, 
> lwarx, ldarx, lqarx, stwat,
> stdat, stbcx., sthcx., stwcx., stdcx., or stqcx. instruction that 
> addresses storage that is Write
> Through Required or Caching Inhibited; or if the access is due to a copy 
> or paste. instruction
> that addresses storage that is Caching Inhibited; or if the access is 
> due to a lwat, ldat, stwat, or
> stdat instruction that addresses storage that is Guarded; otherwise set 
> to 0.
> 
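
The "37" above uses the ISA's big-endian bit numbering, where DSISR bits are 
counted 32..63 with bit 32 as the most significant bit of the 32-bit register. 
As a quick illustration (my own arithmetic; the DSISR_BIT() helper is made up 
for illustration, not taken from the kernel):

	/* DSISR[n] in ISA numbering maps to the mask 0x80000000 >> (n - 32),
	 * so DSISR[37] corresponds to 0x04000000. */
	#define DSISR_BIT(n)	(0x80000000u >> ((n) - 32))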


Re: [PATCH] powerpc/time: Fix build failure due to do_hard_irq_enable() on PPC32

2022-01-10 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of January 11, 2022 1:29 am:
> CC  arch/powerpc/kernel/time.o
>   In file included from :
>   ./arch/powerpc/include/asm/hw_irq.h: In function 'do_hard_irq_enable':
>   ././include/linux/compiler_types.h:335:45: error: call to 
> '__compiletime_assert_35' declared with attribute error: BUILD_BUG failed
> 335 | _compiletime_assert(condition, msg, 
> __compiletime_assert_, __COUNTER__)
> | ^
>   ././include/linux/compiler_types.h:316:25: note: in definition of macro 
> '__compiletime_assert'
> 316 | prefix ## suffix(); 
> \
> | ^~
>   ././include/linux/compiler_types.h:335:9: note: in expansion of macro 
> '_compiletime_assert'
> 335 | _compiletime_assert(condition, msg, 
> __compiletime_assert_, __COUNTER__)
> | ^~~
>   ./include/linux/build_bug.h:39:37: note: in expansion of macro 
> 'compiletime_assert'
>  39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), 
> msg)
> | ^~
>   ./include/linux/build_bug.h:59:21: note: in expansion of macro 
> 'BUILD_BUG_ON_MSG'
>  59 | #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
> | ^~~~
>   ./arch/powerpc/include/asm/hw_irq.h:483:9: note: in expansion of macro 
> 'BUILD_BUG'
> 483 | BUILD_BUG();
> | ^
> 
> should_hard_irq_enable() returns false on PPC32 so this BUILD_BUG() shouldn't 
> trigger.
> 
> Force inlining of should_hard_irq_enable()
> 
> Signed-off-by: Christophe Leroy 
> Fixes: 0faf20a1ad16 ("powerpc/64s/interrupt: Don't enable MSR[EE] in irq 
> handlers unless perf is in use")
> Cc: Nicholas Piggin 

Acked-by: Nicholas Piggin 

Thanks,
Nick

> ---
>  arch/powerpc/include/asm/hw_irq.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/hw_irq.h 
> b/arch/powerpc/include/asm/hw_irq.h
> index a58fb4aa6c81..674e5aaafcbd 100644
> --- a/arch/powerpc/include/asm/hw_irq.h
> +++ b/arch/powerpc/include/asm/hw_irq.h
> @@ -473,7 +473,7 @@ static inline bool arch_irq_disabled_regs(struct pt_regs 
> *regs)
>   return !(regs->msr & MSR_EE);
>  }
>  
> -static inline bool should_hard_irq_enable(void)
> +static __always_inline bool should_hard_irq_enable(void)
>  {
>   return false;
>  }
> -- 
> 2.33.1
> 
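
For context, the construct that trips the BUILD_BUG() looks roughly like this 
(a hand-simplified sketch of the PPC32 stubs in hw_irq.h and of a caller, not 
the exact source):

	static __always_inline bool should_hard_irq_enable(void)
	{
		return false;
	}

	static inline void do_hard_irq_enable(void)
	{
		BUILD_BUG();	/* must never be reachable on PPC32 */
	}

	/* caller, e.g. in timer_interrupt(): the call below is only discarded
	 * when the compiler can fold should_hard_irq_enable() to a constant
	 * false, which forcing inlining guarantees. */
	if (should_hard_irq_enable())
		do_hard_irq_enable();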


Re: [PATCH 3/6] KVM: Remove opaque from kvm_arch_check_processor_compat

2022-01-10 Thread Chao Gao
On Mon, Jan 10, 2022 at 11:06:44PM +, Sean Christopherson wrote:
>On Mon, Dec 27, 2021, Chao Gao wrote:
>> No arch implementation uses this opaque now.
>
>Except for the RISC-V part, this can be a pure revert of commit b99040853738 
>("KVM:
>Pass kvm_init()'s opaque param to additional arch funcs").  I think it makes 
>sense
>to process it as a revert, with a short blurb in the changelog to note that 
>RISC-V
>is manually modified as RISC-V support came along in the interim.

commit b99040853738 adds opaque param to kvm_arch_hardware_setup(), which isn't
reverted in this patch. I.e., this patch is a partial revert of b99040853738
plus manual changes to RISC-V. Given that, "process it as a revert" means
clearly saying in the changelog that this commit contains a partial revert of commit
b99040853738 ("KVM: Pass kvm_init()'s opaque param to additional arch funcs").

Right?


Re: [PATCH V2 03/17] asm-generic: fcntl: compat: Remove duplicate definitions

2022-01-10 Thread Guo Ren
On Mon, Jan 10, 2022 at 9:35 PM Arnd Bergmann  wrote:
>
> On Tue, Dec 28, 2021 at 3:39 PM  wrote:
> >
> > From: Guo Ren 
> >
> > Remove duplicate F_GETLK64,F_SETLK64,F_SETLKW64 definitions in
> > arch/*/include/asm/compat.h.
> >
> > Signed-off-by: Guo Ren 
> > Signed-off-by: Guo Ren 
> > Cc: Arnd Bergmann 
>
> Unfortunately, this one does not look correct to me:
>
> > @@ -116,7 +116,7 @@
> >  #define F_GETSIG   11  /* for sockets. */
> >  #endif
> >
> > -#ifndef CONFIG_64BIT
> > +#if !defined(CONFIG_64BIT) || defined(CONFIG_COMPAT)
> >  #ifndef F_GETLK64
> >  #define F_GETLK64  12  /*  using 'struct flock64' */
> >  #define F_SETLK64  13
>
> The problem here is that include/uapi/ headers cannot contain checks for
> CONFIG_* symbols because those may have different meanings in user space
> compared to kernel.
>
> This is a preexisting problem in the header, but I think the change
> makes it worse.
>
> With the current behavior, user space will always see the definitions,
> unless it happens to have its own definition for CONFIG_64BIT already.
> On 64-bit parisc, this has the effect of defining the macros to the
> same values as F_SETOWN/F_SETSIG/F_GETSIG, which is potentially
> harmful. On MIPS, it uses values that are different from the 32-bit numbers
> but are otherwise unused. Everywhere else, we get the definition from
> the 32-bit architecture in user space, which will do nothing in the kernel.
>
> The correct check for a uapi header would be to test for
> __BITS_PER_LONG==32. We should probably do that here, but
> this won't help you move the definitions, and it is a user-visible change
> as the incorrect definition will no longer be visible. [Adding Jeff and Bruce
> (the flock maintainers) to Cc for additional feedback on this]
>
> For your series, I would suggest just moving the macro definitions to
> include/linux/compat.h along with the 'struct compat_flock64'
> definition, and leaving the duplicate one in the uapi header unchanged
> until we have decided on a solution.
Okay.

>
> Arnd



-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/
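
For reference, the __BITS_PER_LONG based guard Arnd mentions for the uapi 
header would look roughly like this (a sketch of the direction only, not a 
tested patch; the F_SETLKW64 value is taken from the existing header):

	/* include/uapi/asm-generic/fcntl.h: key off __BITS_PER_LONG, which is
	 * visible to both kernel and userspace, instead of CONFIG_64BIT. */
	#include <asm/bitsperlong.h>

	#if __BITS_PER_LONG == 32
	#ifndef F_GETLK64
	#define F_GETLK64	12	/*  using 'struct flock64' */
	#define F_SETLK64	13
	#define F_SETLKW64	14
	#endif
	#endif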


Re: [PATCH V2 11/17] riscv: compat: Add elf.h implementation

2022-01-10 Thread Guo Ren
On Mon, Jan 10, 2022 at 10:29 PM Arnd Bergmann  wrote:
>
> On Tue, Dec 28, 2021 at 3:39 PM  wrote:
> >
> > From: Guo Ren 
> >
> > Implement necessary type and macro for compat elf. See the code
> > comment for detail.
> >
> > Signed-off-by: Guo Ren 
> > Signed-off-by: Guo Ren 
> > Cc: Arnd Bergmann 
>
> This looks mostly correct,
>
> > +/*
> > + * FIXME: not sure SET_PERSONALITY for compat process is right!
> > + */
> > +#define SET_PERSONALITY(ex)   \
> > +do {if ((ex).e_ident[EI_CLASS] == ELFCLASS32) \
> > +   set_thread_flag(TIF_32BIT);\
> > +   else   \
> > +   clear_thread_flag(TIF_32BIT);  \
> > +   set_personality(PER_LINUX | (current->personality & (~PER_MASK))); \
> > +} while (0)
>
> This means the personality after exec is always set to PER_LINUX, not
> PER_LINUX32, which I think is wrong: you want the PER_LINUX32
> setting to stick, just like the upper bits do in the default implementation.
>
> What the other ones do is:
>
> | arch/parisc/include/asm/elf.h-
> set_personality((current->personality & ~PER_MASK) | PER_LINUX); \
>
> This looks like the same problem you introduce here: always forcing PER_LINUX
> instead of PER_LINUX32 makes it impossible to use PER_LINUX32.
>
> | arch/alpha/include/asm/elf.h:#define SET_PERSONALITY(EX)
>\
> | arch/alpha/include/asm/elf.h-   set_personality(((EX).e_flags &
> EF_ALPHA_32BIT) \
> | arch/alpha/include/asm/elf.h-  ? PER_LINUX_32BIT : PER_LINUX)
> | arch/csky/include/asm/elf.h:#define SET_PERSONALITY(ex)
> set_personality(PER_LINUX)
> | arch/nds32/include/asm/elf.h:#define SET_PERSONALITY(ex)
> set_personality(PER_LINUX)
>
> These look even worse: instead of forcing the lower bits to
> PER_LINUX/PER_LINUX32 and
> leaving the upper bits untouched, these also clear the upper bits
> unconditionally.
>
> | arch/arm64/include/asm/elf.h:#define SET_PERSONALITY(ex)
>\
> | arch/arm64/include/asm/elf.h-   current->personality &=
> ~READ_IMPLIES_EXEC; \
> | arch/x86/um/asm/elf.h:#define SET_PERSONALITY(ex) do {} while(0)
> | arch/x86/include/asm/elf.h:#define set_personality_64bit()  do {
> } while (0)
> | arch/x86/kernel/process_64.c:static void __set_personality_ia32(void)
> | current->personality |= force_personality32;
>
> Inconsistent: does not enforce PER_LINUX/PER_LINUX32 as the default
> implementation
> does, but just leaves the value untouched (other than (re)setting
> READ_IMPLIES_EXEC).
> I think this is harmless otherwise, as we generally ignore the lower
> bits, except for the
> bit of code that checks for PER_LINUX32 in override_architecture() to mangle 
> the
> output of sys_newuname() or in /proc/cpuinfo.
>
> | arch/s390/include/asm/elf.h-if
> (personality(current->personality) != PER_LINUX32)   \
> | arch/s390/include/asm/elf.h-set_personality(PER_LINUX |
>\
> | arch/s390/include/asm/elf.h-
> (current->personality & ~PER_MASK));\
> | arch/powerpc/include/asm/elf.h- if
> (personality(current->personality) != PER_LINUX32)   \
> | arch/powerpc/include/asm/elf.h- set_personality(PER_LINUX |
>\
> | arch/powerpc/include/asm/elf.h-
> (current->personality & (~PER_MASK)));  \
> | arch/sparc/include/asm/elf_64.h-if
> (personality(current->personality) != PER_LINUX32)   \
> | arch/sparc/include/asm/elf_64.h-
> set_personality(PER_LINUX | \
> | arch/sparc/include/asm/elf_64.h-
> (current->personality & (~PER_MASK)));  \
>
> This is probably the behavior you want to copy.
Thank you very much for your detailed explanation. Here is my modification.

+#define SET_PERSONALITY(ex)\
+do {if ((ex).e_ident[EI_CLASS] == ELFCLASS32)  \
+   set_thread_flag(TIF_32BIT); \
+   else\
+   clear_thread_flag(TIF_32BIT);   \
+   if (personality(current->personality) != PER_LINUX32)   \
+   set_personality(PER_LINUX | \
+   (current->personality & (~PER_MASK)));  \
+} while (0)

>
>   Arnd




--
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/


[PATCH kernel v5] KVM: PPC: Merge powerpc's debugfs entry content into generic entry

2022-01-10 Thread Alexey Kardashevskiy
At the moment KVM on PPC creates 4 types of entries under the kvm debugfs:
1) "%pid-%fd" per a KVM instance (for all platforms);
2) "vm%pid" (for PPC Book3s HV KVM);
3) "vm%u_vcpu%u_timing" (for PPC Book3e KVM);
4) "kvm-xive-%p" (for XIVE PPC Book3s KVM, the same for XICS);

The problem with this is that multiple VMs per process are not allowed for
2) and 3), which makes it possible for userspace to trigger errors when
creating duplicated debugfs entries.

This merges all these into 1).

This defines kvm_arch_create_kvm_debugfs() similar to
kvm_arch_create_vcpu_debugfs().

This defines 2 hooks in kvmppc_ops that allow specific KVM implementations
to add the necessary entries, and adds the _e500 suffix to
kvmppc_create_vcpu_debugfs_e500() to make it clear what platform it is for.

This makes use of already existing kvm_arch_create_vcpu_debugfs() on PPC.

This removes no more used debugfs_dir pointers from PPC kvm_arch structs.

This stops removing vcpu entries, as once created vcpus stay around
for the entire life of a VM and are removed when the KVM instance is closed,
see commit d56f5136b010 ("KVM: let kvm_destroy_vm_debugfs clean up vCPU
debugfs directories").

Suggested-by: Fabiano Rosas 
Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v5:
* fixed e500mc2

v4:
* added "kvm-xive-%p"

v3:
* reworked commit log, especially, the bit about removing vcpus

v2:
* handled powerpc-booke
* s/kvm/vm/ in arch hooks
---
 arch/powerpc/include/asm/kvm_host.h|  6 ++---
 arch/powerpc/include/asm/kvm_ppc.h |  2 ++
 arch/powerpc/kvm/timing.h  | 12 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c|  2 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c |  2 +-
 arch/powerpc/kvm/book3s_hv.c   | 31 ++
 arch/powerpc/kvm/book3s_xics.c | 13 ++-
 arch/powerpc/kvm/book3s_xive.c | 13 ++-
 arch/powerpc/kvm/book3s_xive_native.c  | 13 ++-
 arch/powerpc/kvm/e500.c|  1 +
 arch/powerpc/kvm/e500mc.c  |  1 +
 arch/powerpc/kvm/powerpc.c | 16 ++---
 arch/powerpc/kvm/timing.c  | 21 +
 13 files changed, 51 insertions(+), 82 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 17263276189e..f5e14fa683f4 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -26,6 +26,8 @@
 #include 
 #include 
 
+#define __KVM_HAVE_ARCH_VCPU_DEBUGFS
+
 #define KVM_MAX_VCPUS  NR_CPUS
 #define KVM_MAX_VCORES NR_CPUS
 
@@ -295,7 +297,6 @@ struct kvm_arch {
bool dawr1_enabled;
pgd_t *pgtable;
u64 process_table;
-   struct dentry *debugfs_dir;
struct kvm_resize_hpt *resize_hpt; /* protected by kvm->lock */
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
@@ -673,7 +674,6 @@ struct kvm_vcpu_arch {
u64 timing_min_duration[__NUMBER_OF_KVM_EXIT_TYPES];
u64 timing_max_duration[__NUMBER_OF_KVM_EXIT_TYPES];
u64 timing_last_exit;
-   struct dentry *debugfs_exit_timing;
 #endif
 
 #ifdef CONFIG_PPC_BOOK3S
@@ -829,8 +829,6 @@ struct kvm_vcpu_arch {
struct kvmhv_tb_accumulator rm_exit;/* real-mode exit code */
struct kvmhv_tb_accumulator guest_time; /* guest execution */
struct kvmhv_tb_accumulator cede_time;  /* time napping inside guest */
-
-   struct dentry *debugfs_dir;
 #endif /* CONFIG_KVM_BOOK3S_HV_EXIT_TIMING */
 };
 
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 33db83b82fbd..d2b192dea0d2 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -316,6 +316,8 @@ struct kvmppc_ops {
int (*svm_off)(struct kvm *kvm);
int (*enable_dawr1)(struct kvm *kvm);
bool (*hash_v3_possible)(void);
+   int (*create_vm_debugfs)(struct kvm *kvm);
+   int (*create_vcpu_debugfs)(struct kvm_vcpu *vcpu, struct dentry 
*debugfs_dentry);
 };
 
 extern struct kvmppc_ops *kvmppc_hv_ops;
diff --git a/arch/powerpc/kvm/timing.h b/arch/powerpc/kvm/timing.h
index feef7885ba82..45817ab82bb4 100644
--- a/arch/powerpc/kvm/timing.h
+++ b/arch/powerpc/kvm/timing.h
@@ -14,8 +14,8 @@
 #ifdef CONFIG_KVM_EXIT_TIMING
 void kvmppc_init_timing_stats(struct kvm_vcpu *vcpu);
 void kvmppc_update_timing_stats(struct kvm_vcpu *vcpu);
-void kvmppc_create_vcpu_debugfs(struct kvm_vcpu *vcpu, unsigned int id);
-void kvmppc_remove_vcpu_debugfs(struct kvm_vcpu *vcpu);
+int kvmppc_create_vcpu_debugfs_e500(struct kvm_vcpu *vcpu,
+   struct dentry *debugfs_dentry);
 
 static inline void kvmppc_set_exit_type(struct kvm_vcpu *vcpu, int type)
 {
@@ -26,9 +26,11 @@ static inline void kvmppc_set_exit_type(struct kvm_vcpu 
*vcpu, int type)
 /* if exit timing is not configured there is no need to build the c file */
 static inline void kvmppc_init_timing_stats(struct kvm_vcpu *vcpu) {}
 static inline void 

Re: [PATCH v3 5/6] KVM: PPC: mmio: Return to guest after emulation failure

2022-01-10 Thread Alexey Kardashevskiy




On 1/10/22 18:36, Nicholas Piggin wrote:

Excerpts from Fabiano Rosas's message of January 8, 2022 7:00 am:

If MMIO emulation fails we don't want to crash the whole guest by
returning to userspace.

The original commit bbf45ba57eae ("KVM: ppc: PowerPC 440 KVM
implementation") added a todo:

   /* XXX Deliver Program interrupt to guest. */

and later the commit d69614a295ae ("KVM: PPC: Separate loadstore
emulation from priv emulation") added the Program interrupt injection
but in another file, so I'm assuming it was missed that this block
needed to be altered.

Signed-off-by: Fabiano Rosas 
Reviewed-by: Alexey Kardashevskiy 
---
  arch/powerpc/kvm/powerpc.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 6daeea4a7de1..56b0faab7a5f 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -309,7 +309,7 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst);
kvmppc_core_queue_program(vcpu, 0);
pr_info("%s: emulation failed (%08x)\n", __func__, last_inst);
-   r = RESUME_HOST;
+   r = RESUME_GUEST;


So at this point can the pr_info just go away?

I wonder if this shouldn't be a DSI rather than a program check.
DSI with DSISR[37] looks a bit more expected. Not that Linux
probably does much with it but at least it would give a SIGBUS
rather than SIGILL.


It does not look like it is more expected to me, it is not about wrong memory 
attributes, it is the instruction itself which cannot execute.


DSISR[37]:
Set to 1 if the access is due to a lq, stq, lwat, ldat, lbarx, lharx, 
lwarx, ldarx, lqarx, stwat,
stdat, stbcx., sthcx., stwcx., stdcx., or stqcx. instruction that 
addresses storage that is Write
Through Required or Caching Inhibited; or if the access is due to a copy 
or paste. instruction
that addresses storage that is Caching Inhibited; or if the access is 
due to a lwat, ldat, stwat, or
stdat instruction that addresses storage that is Guarded; otherwise set 
to 0.


[5.16.0] build error on unrecognized opcode: ptesync

2022-01-10 Thread Mike
Hey, so I originally sat down to compile the fast headers V2 patch, but
quickly discovered other things at play, and grabbed 5.16.0 a few hours
after it lifted off. arch/powerpc/mm/mmu_context.c I had to specifically
build with -maltivec or it barfed on a 'dssall'. I'm fine with that,
I've spent years in kernel land, I can deal with that. Then came
arch/powerpc/lib/step.c with the ptesync. This seems like a totally normal
instruction that shouldn't need any extra flags or anything, yet the
assembler throws up, and no flag I can think of fixes it. This is a G4
7447. I reverted back to the Debian 5.15 defconfig before dropping this
mail as I had tweaked my config to be more G4-specific.

Best regards.
Michael Heltne


Re: [PATCH 00/16] Remove usage of the deprecated "pci-dma-compat.h" API

2022-01-10 Thread Martin K. Petersen


Christophe,

> This series axes all the remaining usages of the deprecated
> "pci-dma-compat.h" API.

Applied patches 10-15 to 5.17/scsi-staging, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH V2 03/17] asm-generic: fcntl: compat: Remove duplicate definitions

2022-01-10 Thread Christoph Hellwig
On Mon, Jan 10, 2022 at 02:35:19PM +0100, Arnd Bergmann wrote:
> > +#if !defined(CONFIG_64BIT) || defined(CONFIG_COMPAT)
> >  #ifndef F_GETLK64
> >  #define F_GETLK64  12  /*  using 'struct flock64' */
> >  #define F_SETLK64  13
> 
> The problem here is that include/uapi/ headers cannot contain checks for
> CONFIG_* symbols because those may have different meanings in user space
> compared to kernel.
> 
> This is a preexisting problem in the header, but I think the change
> makes it worse.

FYI, this is what I did in my old branch, which also sidesteps the
duplicate value problem on parisc. The rebase is untested so far,
but I can spend some cycles on finishing it:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/fcntl-asm-generic-cleanup


Re: [PATCH V2 03/17] asm-generic: fcntl: compat: Remove duplicate definitions

2022-01-10 Thread Christophe Leroy


Le 28/12/2021 à 15:39, guo...@kernel.org a écrit :
> From: Guo Ren 
> 
> Remove duplicate F_GETLK64,F_SETLK64,F_SETLKW64 definitions in
> arch/*/include/asm/compat.h.
> 
> Signed-off-by: Guo Ren 
> Signed-off-by: Guo Ren 
> Cc: Arnd Bergmann 
> ---
>   arch/arm64/include/asm/compat.h   | 4 
>   arch/powerpc/include/asm/compat.h | 4 
>   arch/s390/include/asm/compat.h| 4 
>   arch/sparc/include/asm/compat.h   | 4 
>   arch/x86/include/asm/compat.h | 4 
>   include/uapi/asm-generic/fcntl.h  | 2 +-
>   6 files changed, 1 insertion(+), 21 deletions(-)
> 

...

> diff --git a/include/uapi/asm-generic/fcntl.h 
> b/include/uapi/asm-generic/fcntl.h
> index ecd0f5bdfc1d..5bc1e51d73b1 100644
> --- a/include/uapi/asm-generic/fcntl.h
> +++ b/include/uapi/asm-generic/fcntl.h
> @@ -116,7 +116,7 @@
>   #define F_GETSIG11  /* for sockets. */
>   #endif
>   
> -#ifndef CONFIG_64BIT
> +#if !defined(CONFIG_64BIT) || defined(CONFIG_COMPAT)
>   #ifndef F_GETLK64
>   #define F_GETLK64   12  /*  using 'struct flock64' */
>   #define F_SETLK64   13

There seems to be a problem with this change:

error: /linux/include/uapi/asm-generic/fcntl.h: leak CONFIG_COMPAT to 
user-space
make[3]: *** [/linux/scripts/Makefile.headersinst:63: 
usr/include/asm-generic/fcntl.h] Error 1
make[3]: *** Waiting for unfinished jobs
make[2]: *** [/linux/Makefile:1283: headers] Error 2
make[1]: *** [Makefile:219: __sub-make] Error 2
make[2]: Leaving directory '/output'
make[1]: Leaving directory '/linux'
make: *** [Makefile:157: khdr] Error 2
make: Leaving directory '/linux/tools/testing/selftests'
## Selftest build completed rc = 2
## Found 2 binaries

!! Error build failed rc 2

Error: Process completed with exit code 2.

[PATCH] powerpc/time: Fix build failure due to do_hard_irq_enable() on PPC32

2022-01-10 Thread Christophe Leroy
  CC  arch/powerpc/kernel/time.o
In file included from :
./arch/powerpc/include/asm/hw_irq.h: In function 'do_hard_irq_enable':
././include/linux/compiler_types.h:335:45: error: call to 
'__compiletime_assert_35' declared with attribute error: BUILD_BUG failed
  335 | _compiletime_assert(condition, msg, 
__compiletime_assert_, __COUNTER__)
  | ^
././include/linux/compiler_types.h:316:25: note: in definition of macro 
'__compiletime_assert'
  316 | prefix ## suffix(); 
\
  | ^~
././include/linux/compiler_types.h:335:9: note: in expansion of macro 
'_compiletime_assert'
  335 | _compiletime_assert(condition, msg, 
__compiletime_assert_, __COUNTER__)
  | ^~~
./include/linux/build_bug.h:39:37: note: in expansion of macro 
'compiletime_assert'
   39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), 
msg)
  | ^~
./include/linux/build_bug.h:59:21: note: in expansion of macro 
'BUILD_BUG_ON_MSG'
   59 | #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
  | ^~~~
./arch/powerpc/include/asm/hw_irq.h:483:9: note: in expansion of macro 
'BUILD_BUG'
  483 | BUILD_BUG();
  | ^

should_hard_irq_enable() returns false on PPC32 so this BUILD_BUG() shouldn't 
trigger.

Force inlining of should_hard_irq_enable()

Signed-off-by: Christophe Leroy 
Fixes: 0faf20a1ad16 ("powerpc/64s/interrupt: Don't enable MSR[EE] in irq 
handlers unless perf is in use")
Cc: Nicholas Piggin 
---
 arch/powerpc/include/asm/hw_irq.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index a58fb4aa6c81..674e5aaafcbd 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -473,7 +473,7 @@ static inline bool arch_irq_disabled_regs(struct pt_regs 
*regs)
return !(regs->msr & MSR_EE);
 }
 
-static inline bool should_hard_irq_enable(void)
+static __always_inline bool should_hard_irq_enable(void)
 {
return false;
 }
-- 
2.33.1


[PATCH v3] powerpc/32s: Fix kasan_init_region() for KASAN

2022-01-10 Thread Christophe Leroy
It has been reported that in some configurations the kernel doesn't
boot with KASAN enabled.

This is due to wrong BAT allocation for the KASAN area:

---[ Data Block Address Translation ]---
0: 0xc0000000-0xcfffffff  0x00000000  256M  Kernel rw  m
1: 0xd0000000-0xdfffffff  0x10000000  256M  Kernel rw  m
2: 0xe0000000-0xefffffff  0x20000000  256M  Kernel rw  m
3: 0xf8000000-0xf9ffffff  0x2a000000   32M  Kernel rw  m
4: 0xfa000000-0xfdffffff  0x2c000000   64M  Kernel rw  m

A BAT must have both its virtual and physical addresses aligned to
the size of the BAT. This is not the case for BAT 4 above.
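
To illustrate with the BAT 4 addresses above (my own worked example, not part 
of the original report): the physical base 0x2c000000 is 64M aligned, but the 
virtual base 0xfa000000 is only 32M aligned, so it cannot back a 64M BAT:

	#define SZ_32M	0x02000000UL
	#define SZ_64M	0x04000000UL

	/* 0x2c000000 & (SZ_64M - 1) == 0          -> phys is 64M aligned     */
	/* 0xfa000000 & (SZ_64M - 1) == 0x02000000 -> virt is NOT 64M aligned */
	/* 0xfa000000 & (SZ_32M - 1) == 0          -> virt is only 32M aligned */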

Fix kasan_init_region() by using block_size() function that is in
book3s32/mmu.c. To be able to reuse it here, make it non static and
change its name to bat_block_size() in order to avoid name conflict
with block_size() defined in 

Also reuse find_free_bat() to avoid an error message from setbat()
when no BAT is available.

And allocate memory outside of linear memory mapping to avoid
wasting that precious space.

With this change we get correct alignment for BATs and KASAN shadow
memory is allocated outside the linear memory space.

---[ Data Block Address Translation ]---
0: 0xc0000000-0xcfffffff  0x00000000  256M  Kernel rw
1: 0xd0000000-0xdfffffff  0x10000000  256M  Kernel rw
2: 0xe0000000-0xefffffff  0x20000000  256M  Kernel rw
3: 0xf8000000-0xfbffffff  0x7c000000   64M  Kernel rw
4: 0xfc000000-0xfdffffff  0x7a000000   32M  Kernel rw

Reported-by: Maxime Bizon 
Fixes: 7974c4732642 ("powerpc/32s: Implement dedicated kasan_init_region()")
Cc: sta...@vger.kernel.org
Signed-off-by: Christophe Leroy 
Tested-by: Maxime Bizon 
---
v3: Rebased

v2:
- Allocate kasan shadow memory outside precious kernel linear memory
- Properly zeroise kasan shadow memory
---
 arch/powerpc/include/asm/book3s/32/mmu-hash.h |  2 +
 arch/powerpc/mm/book3s32/mmu.c| 10 ++--
 arch/powerpc/mm/kasan/book3s_32.c | 59 ++-
 3 files changed, 38 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
index 7be27862329f..78c6a5fde1d6 100644
--- a/arch/powerpc/include/asm/book3s/32/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
@@ -223,6 +223,8 @@ static __always_inline void update_user_segments(u32 val)
update_user_segment(15, val);
 }
 
+int __init find_free_bat(void);
+unsigned int bat_block_size(unsigned long base, unsigned long top);
 #endif /* !__ASSEMBLY__ */
 
 /* We happily ignore the smaller BATs on 601, we don't actually use
diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index 94045b265b6b..203735caf691 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -76,7 +76,7 @@ unsigned long p_block_mapped(phys_addr_t pa)
return 0;
 }
 
-static int __init find_free_bat(void)
+int __init find_free_bat(void)
 {
int b;
int n = mmu_has_feature(MMU_FTR_USE_HIGH_BATS) ? 8 : 4;
@@ -100,7 +100,7 @@ static int __init find_free_bat(void)
  * - block size has to be a power of two. This is calculated by finding the
  *   highest bit set to 1.
  */
-static unsigned int block_size(unsigned long base, unsigned long top)
+unsigned int bat_block_size(unsigned long base, unsigned long top)
 {
unsigned int max_size = SZ_256M;
unsigned int base_shift = (ffs(base) - 1) & 31;
@@ -145,7 +145,7 @@ static unsigned long __init __mmu_mapin_ram(unsigned long 
base, unsigned long to
int idx;
 
while ((idx = find_free_bat()) != -1 && base != top) {
-   unsigned int size = block_size(base, top);
+   unsigned int size = bat_block_size(base, top);
 
if (size < 128 << 10)
break;
@@ -201,12 +201,12 @@ void mmu_mark_initmem_nx(void)
unsigned long size;
 
for (i = 0; i < nb - 1 && base < top;) {
-   size = block_size(base, top);
+   size = bat_block_size(base, top);
setibat(i++, PAGE_OFFSET + base, base, size, PAGE_KERNEL_TEXT);
base += size;
}
if (base < top) {
-   size = block_size(base, top);
+   size = bat_block_size(base, top);
if ((top - base) > size) {
size <<= 1;
if (strict_kernel_rwx_enabled() && base + size > border)
diff --git a/arch/powerpc/mm/kasan/book3s_32.c 
b/arch/powerpc/mm/kasan/book3s_32.c
index 35b287b0a8da..450a67ef0bbe 100644
--- a/arch/powerpc/mm/kasan/book3s_32.c
+++ b/arch/powerpc/mm/kasan/book3s_32.c
@@ -10,48 +10,51 @@ int __init kasan_init_region(void *start, size_t size)
 {
unsigned long k_start = (unsigned long)kasan_mem_to_shadow(start);
unsigned long k_end = (unsigned 

Re: [PATCH V2 11/17] riscv: compat: Add elf.h implementation

2022-01-10 Thread Arnd Bergmann
On Tue, Dec 28, 2021 at 3:39 PM  wrote:
>
> From: Guo Ren 
>
> Implement necessary type and macro for compat elf. See the code
> comment for detail.
>
> Signed-off-by: Guo Ren 
> Signed-off-by: Guo Ren 
> Cc: Arnd Bergmann 

This looks mostly correct,

> +/*
> + * FIXME: not sure SET_PERSONALITY for compat process is right!
> + */
> +#define SET_PERSONALITY(ex)   \
> +do {if ((ex).e_ident[EI_CLASS] == ELFCLASS32) \
> +   set_thread_flag(TIF_32BIT);\
> +   else   \
> +   clear_thread_flag(TIF_32BIT);  \
> +   set_personality(PER_LINUX | (current->personality & (~PER_MASK))); \
> +} while (0)

This means the personality after exec is always set to PER_LINUX, not
PER_LINUX32, which I think is wrong: you want the PER_LINUX32
setting to stick, just like the upper bits do in the default implementation.

What the other ones do is:

| arch/parisc/include/asm/elf.h-
set_personality((current->personality & ~PER_MASK) | PER_LINUX); \

This looks like the same problem you introduce here: always forcing PER_LINUX
instead of PER_LINUX32 makes it impossible to use PER_LINUX32.

| arch/alpha/include/asm/elf.h:#define SET_PERSONALITY(EX)
   \
| arch/alpha/include/asm/elf.h-   set_personality(((EX).e_flags &
EF_ALPHA_32BIT) \
| arch/alpha/include/asm/elf.h-  ? PER_LINUX_32BIT : PER_LINUX)
| arch/csky/include/asm/elf.h:#define SET_PERSONALITY(ex)
set_personality(PER_LINUX)
| arch/nds32/include/asm/elf.h:#define SET_PERSONALITY(ex)
set_personality(PER_LINUX)

These look even worse: instead of forcing the lower bits to
PER_LINUX/PER_LINUX32 and
leaving the upper bits untouched, these also clear the upper bits
unconditionally.

| arch/arm64/include/asm/elf.h:#define SET_PERSONALITY(ex)
   \
| arch/arm64/include/asm/elf.h-   current->personality &=
~READ_IMPLIES_EXEC; \
| arch/x86/um/asm/elf.h:#define SET_PERSONALITY(ex) do {} while(0)
| arch/x86/include/asm/elf.h:#define set_personality_64bit()  do {
} while (0)
| arch/x86/kernel/process_64.c:static void __set_personality_ia32(void)
| current->personality |= force_personality32;

Inconsistent: does not enforce PER_LINUX/PER_LINUX32 as the default
implementation
does, but just leaves the value untouched (other than (re)setting
READ_IMPLIES_EXEC).
I think this is harmless otherwise, as we generally ignore the lower
bits, except for the
bit of code that checks for PER_LINUX32 in override_architecture() to mangle the
output of sys_newuname() or in /proc/cpuinfo.

| arch/s390/include/asm/elf.h-if
(personality(current->personality) != PER_LINUX32)   \
| arch/s390/include/asm/elf.h-set_personality(PER_LINUX |
   \
| arch/s390/include/asm/elf.h-
(current->personality & ~PER_MASK));\
| arch/powerpc/include/asm/elf.h- if
(personality(current->personality) != PER_LINUX32)   \
| arch/powerpc/include/asm/elf.h- set_personality(PER_LINUX |
   \
| arch/powerpc/include/asm/elf.h-
(current->personality & (~PER_MASK)));  \
| arch/sparc/include/asm/elf_64.h-if
(personality(current->personality) != PER_LINUX32)   \
| arch/sparc/include/asm/elf_64.h-
set_personality(PER_LINUX | \
| arch/sparc/include/asm/elf_64.h-
(current->personality & (~PER_MASK)));  \

This is probably the behavior you want to copy.

  Arnd


[PATCH v4 6/6] module: Move duplicate mod_check_sig users code to mod_parse_sig

2022-01-10 Thread Michal Suchanek
Multiple users of mod_check_sig check for the marker, then call
mod_check_sig, extract signature length, and remove the signature.

Put this code in one place together with mod_check_sig.

This changes the error from ENOENT to ENODATA for ima_read_modsig in the
case the signature marker is missing.

This also changes the buffer length in ima_read_modsig from size_t to
unsigned long. This reduces the possible value range on 32bit but the
length refers to kernel in-memory buffer which cannot be longer than
ULONG_MAX.
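
For orientation, a minimal sketch of what a caller looks like after the
refactor (the function name and keyring choice here are made up for
illustration; mod_parse_sig() and verify_pkcs7_signature() are the real
interfaces used by the series):

	/* mod_parse_sig() strips the "~Module signature appended~\n" marker
	 * and the struct module_signature descriptor, leaving 'len' as the
	 * payload size and 'sig_len' as the PKCS#7 blob size. */
	static int example_verify(const void *data, size_t len, struct key *keyring)
	{
		size_t sig_len;
		int ret;

		ret = mod_parse_sig(data, &len, &sig_len, "example");
		if (ret)
			return ret;

		return verify_pkcs7_signature(data, len, data + len, sig_len,
					      keyring, VERIFYING_UNSPECIFIED_SIGNATURE,
					      NULL, NULL);
	}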

Signed-off-by: Michal Suchanek 
---
v3: - Philipp Rudo : Update the commit with note about
  change of return value
- Preserve the EBADMSG error code while moving code araound
v4: - remove unused variable ms in module_signing
- note about buffer length
---
 include/linux/module_signature.h|  1 +
 kernel/module_signature.c   | 56 -
 kernel/module_signing.c | 27 +++---
 security/integrity/ima/ima_modsig.c | 22 ++--
 4 files changed, 63 insertions(+), 43 deletions(-)

diff --git a/include/linux/module_signature.h b/include/linux/module_signature.h
index 7eb4b00381ac..1343879b72b3 100644
--- a/include/linux/module_signature.h
+++ b/include/linux/module_signature.h
@@ -42,5 +42,6 @@ struct module_signature {
 
 int mod_check_sig(const struct module_signature *ms, size_t file_len,
  const char *name);
+int mod_parse_sig(const void *data, size_t *len, size_t *sig_len, const char 
*name);
 
 #endif /* _LINUX_MODULE_SIGNATURE_H */
diff --git a/kernel/module_signature.c b/kernel/module_signature.c
index 00132d12487c..b8eb30182183 100644
--- a/kernel/module_signature.c
+++ b/kernel/module_signature.c
@@ -8,14 +8,36 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 
+/**
+ * mod_check_sig_marker - check that the given data has signature marker at 
the end
+ *
+ * @data:  Data with appended signature
+ * @len:   Length of data. Signature marker length is subtracted on 
success.
+ */
+static inline int mod_check_sig_marker(const void *data, size_t *len)
+{
+   const unsigned long markerlen = sizeof(MODULE_SIG_STRING) - 1;
+
+   if (markerlen > *len)
+   return -ENODATA;
+
+   if (memcmp(data + *len - markerlen, MODULE_SIG_STRING,
+  markerlen))
+   return -ENODATA;
+
+   *len -= markerlen;
+   return 0;
+}
+
 /**
  * mod_check_sig - check that the given signature is sane
  *
  * @ms:Signature to check.
- * @file_len:  Size of the file to which @ms is appended.
+ * @file_len:  Size of the file to which @ms is appended (without the marker).
  * @name:  What is being checked. Used for error messages.
  */
 int mod_check_sig(const struct module_signature *ms, size_t file_len,
@@ -44,3 +66,35 @@ int mod_check_sig(const struct module_signature *ms, size_t 
file_len,
 
return 0;
 }
+
+/**
+ * mod_parse_sig - check that the given signature is sane and determine 
signature length
+ *
+ * @data:  Data with appended signature.
+ * @len:   Length of data. Signature and marker length is subtracted on 
success.
+ * @sig_len:   Length of signature. Filled on success.
+ * @name:  What is being checked. Used for error messages.
+ */
+int mod_parse_sig(const void *data, size_t *len, size_t *sig_len, const char 
*name)
+{
+   const struct module_signature *sig;
+   int rc;
+
+   rc = mod_check_sig_marker(data, len);
+   if (rc)
+   return rc;
+
+   if (*len < sizeof(*sig))
+   return -EBADMSG;
+
+   sig = (const struct module_signature *)(data + (*len - sizeof(*sig)));
+
+   rc = mod_check_sig(sig, *len, name);
+   if (rc)
+   return rc;
+
+   *sig_len = be32_to_cpu(sig->sig_len);
+   *len -= *sig_len + sizeof(*sig);
+
+   return 0;
+}
diff --git a/kernel/module_signing.c b/kernel/module_signing.c
index 20857d2a15ca..1d4cb03cce21 100644
--- a/kernel/module_signing.c
+++ b/kernel/module_signing.c
@@ -25,35 +25,16 @@ int verify_appended_signature(const void *data, unsigned 
long *len,
  struct key *trusted_keys,
  enum key_being_used_for purpose)
 {
-   const unsigned long markerlen = sizeof(MODULE_SIG_STRING) - 1;
-   struct module_signature *ms;
-   unsigned long sig_len, modlen = *len;
+   unsigned long sig_len;
int ret;
 
-   pr_devel("==>%s %s(,%lu)\n", __func__, key_being_used_for[purpose], 
modlen);
+   pr_devel("==>%s %s(,%lu)\n", __func__, key_being_used_for[purpose], 
*len);
 
-   if (markerlen > modlen)
-   return -ENODATA;
-
-   if (memcmp(data + modlen - markerlen, MODULE_SIG_STRING,
-  markerlen))
-   return -ENODATA;
-   modlen -= markerlen;
-
-   if (modlen <= sizeof(*ms))
-   return -EBADMSG;
-
-   ms = data + modlen - sizeof(*ms);
-
-   ret = mod_check_sig(ms, 

[PATCH v4 5/6] module: Use key_being_used_for for log messages in verify_appended_signature

2022-01-10 Thread Michal Suchanek
Add value for kexec appended signature and pass in key_being_used_for
enum rather than a string to verify_appended_signature to produce log
messages about the signature.

Signed-off-by: Michal Suchanek 
---
 arch/powerpc/kexec/elf_64.c  |  2 +-
 arch/s390/kernel/machine_kexec_file.c|  2 +-
 crypto/asymmetric_keys/asymmetric_type.c |  1 +
 include/linux/verification.h |  4 +++-
 kernel/module.c  |  3 ++-
 kernel/module_signing.c  | 11 ++-
 6 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
index 6dec8151ef73..c50869195d51 100644
--- a/arch/powerpc/kexec/elf_64.c
+++ b/arch/powerpc/kexec/elf_64.c
@@ -156,7 +156,7 @@ static void *elf64_load(struct kimage *image, char 
*kernel_buf,
 int elf64_verify_sig(const char *kernel, unsigned long kernel_len)
 {
	return verify_appended_signature(kernel, &kernel_len, 
VERIFY_USE_PLATFORM_KEYRING,
-"kexec_file");
+VERIFYING_KEXEC_APPENDED_SIGNATURE);
 }
 #endif /* CONFIG_KEXEC_SIG */
 
diff --git a/arch/s390/kernel/machine_kexec_file.c 
b/arch/s390/kernel/machine_kexec_file.c
index c3deccf1da83..63eec38e3137 100644
--- a/arch/s390/kernel/machine_kexec_file.c
+++ b/arch/s390/kernel/machine_kexec_file.c
@@ -32,7 +32,7 @@ int s390_verify_sig(const char *kernel, unsigned long 
kernel_len)
return 0;
 
	return verify_appended_signature(kernel, &kernel_len, 
VERIFY_USE_PLATFORM_KEYRING,
-   "kexec_file");
+   VERIFYING_KEXEC_APPENDED_SIGNATURE);
 }
 #endif /* CONFIG_KEXEC_SIG */
 
diff --git a/crypto/asymmetric_keys/asymmetric_type.c 
b/crypto/asymmetric_keys/asymmetric_type.c
index ad8af3d70ac0..6fd20eec3882 100644
--- a/crypto/asymmetric_keys/asymmetric_type.c
+++ b/crypto/asymmetric_keys/asymmetric_type.c
@@ -25,6 +25,7 @@ const char *const key_being_used_for[NR__KEY_BEING_USED_FOR] 
= {
[VERIFYING_KEXEC_PE_SIGNATURE]  = "kexec PE sig",
[VERIFYING_KEY_SIGNATURE]   = "key sig",
[VERIFYING_KEY_SELF_SIGNATURE]  = "key self sig",
+   [VERIFYING_KEXEC_APPENDED_SIGNATURE]= "kexec appended sig",
[VERIFYING_UNSPECIFIED_SIGNATURE]   = "unspec sig",
 };
 EXPORT_SYMBOL_GPL(key_being_used_for);
diff --git a/include/linux/verification.h b/include/linux/verification.h
index 32db9287a7b0..f92c49443b4f 100644
--- a/include/linux/verification.h
+++ b/include/linux/verification.h
@@ -26,6 +26,7 @@ enum key_being_used_for {
VERIFYING_KEXEC_PE_SIGNATURE,
VERIFYING_KEY_SIGNATURE,
VERIFYING_KEY_SELF_SIGNATURE,
+   VERIFYING_KEXEC_APPENDED_SIGNATURE,
VERIFYING_UNSPECIFIED_SIGNATURE,
NR__KEY_BEING_USED_FOR
 };
@@ -61,7 +62,8 @@ extern int verify_pefile_signature(const void *pebuf, 
unsigned pelen,
 #endif
 
 int verify_appended_signature(const void *data, unsigned long *len,
- struct key *trusted_keys, const char *what);
+ struct key *trusted_keys,
+ enum key_being_used_for purpose);
 
 #endif /* CONFIG_SYSTEM_DATA_VERIFICATION */
 #endif /* _LINUX_VERIFY_PEFILE_H */
diff --git a/kernel/module.c b/kernel/module.c
index d91ca0f93a40..0a359dc6b690 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2891,7 +2891,8 @@ static int module_sig_check(struct load_info *info, int 
flags)
 */
if (flags == 0) {
		err = verify_appended_signature(mod, &info->len,
-   VERIFY_USE_SECONDARY_KEYRING, 
"module");
+   VERIFY_USE_SECONDARY_KEYRING,
+   VERIFYING_MODULE_SIGNATURE);
if (!err) {
info->sig_ok = true;
return 0;
diff --git a/kernel/module_signing.c b/kernel/module_signing.c
index 39a6dd7c6dd2..20857d2a15ca 100644
--- a/kernel/module_signing.c
+++ b/kernel/module_signing.c
@@ -19,17 +19,18 @@
  * @data: The data to be verified
  * @len: Size of @data.
  * @trusted_keys: Keyring to use for verification
- * @what: Informational string for log messages
+ * @purpose: The use to which the key is being put
  */
 int verify_appended_signature(const void *data, unsigned long *len,
- struct key *trusted_keys, const char *what)
+ struct key *trusted_keys,
+ enum key_being_used_for purpose)
 {
const unsigned long markerlen = sizeof(MODULE_SIG_STRING) - 1;
struct module_signature *ms;
unsigned long sig_len, modlen = *len;
int ret;
 
-   pr_devel("==>%s(,%lu)\n", __func__, modlen);
+   pr_devel("==>%s %s(,%lu)\n", __func__, key_being_used_for[purpose], 
modlen);
 
if 

[PATCH v4 4/6] module: strip the signature marker in the verification function.

2022-01-10 Thread Michal Suchanek
It is stripped by each caller separately.

Note: this changes the error for kexec_file from EKEYREJECTED to ENODATA
when the signature marker is missing.

Signed-off-by: Michal Suchanek 
---
v3: - Philipp Rudo : Update the commit with note about
  change of return value
- the module_signature.h is now no longer needed for kexec_file
---
 arch/powerpc/kexec/elf_64.c   | 11 ---
 arch/s390/kernel/machine_kexec_file.c | 11 ---
 kernel/module.c   |  7 +--
 kernel/module_signing.c   | 12 ++--
 4 files changed, 11 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
index 64cd314cce0d..6dec8151ef73 100644
--- a/arch/powerpc/kexec/elf_64.c
+++ b/arch/powerpc/kexec/elf_64.c
@@ -24,7 +24,6 @@
 #include 
 #include 
 #include 
-#include 
 
 static void *elf64_load(struct kimage *image, char *kernel_buf,
unsigned long kernel_len, char *initrd,
@@ -156,16 +155,6 @@ static void *elf64_load(struct kimage *image, char 
*kernel_buf,
 #ifdef CONFIG_KEXEC_SIG
 int elf64_verify_sig(const char *kernel, unsigned long kernel_len)
 {
-   const unsigned long marker_len = sizeof(MODULE_SIG_STRING) - 1;
-
-   if (marker_len > kernel_len)
-   return -EKEYREJECTED;
-
-   if (memcmp(kernel + kernel_len - marker_len, MODULE_SIG_STRING,
-  marker_len))
-   return -EKEYREJECTED;
-   kernel_len -= marker_len;
-
	return verify_appended_signature(kernel, &kernel_len, 
VERIFY_USE_PLATFORM_KEYRING,
 "kexec_file");
 }
diff --git a/arch/s390/kernel/machine_kexec_file.c 
b/arch/s390/kernel/machine_kexec_file.c
index 345f2eab6e04..c3deccf1da83 100644
--- a/arch/s390/kernel/machine_kexec_file.c
+++ b/arch/s390/kernel/machine_kexec_file.c
@@ -12,7 +12,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -28,20 +27,10 @@ const struct kexec_file_ops * const kexec_file_loaders[] = {
 #ifdef CONFIG_KEXEC_SIG
 int s390_verify_sig(const char *kernel, unsigned long kernel_len)
 {
-   const unsigned long marker_len = sizeof(MODULE_SIG_STRING) - 1;
-
/* Skip signature verification when not secure IPLed. */
if (!ipl_secure_flag)
return 0;
 
-   if (marker_len > kernel_len)
-   return -EKEYREJECTED;
-
-   if (memcmp(kernel + kernel_len - marker_len, MODULE_SIG_STRING,
-  marker_len))
-   return -EKEYREJECTED;
-   kernel_len -= marker_len;
-
	return verify_appended_signature(kernel, &kernel_len, 
VERIFY_USE_PLATFORM_KEYRING,
"kexec_file");
 }
diff --git a/kernel/module.c b/kernel/module.c
index 8481933dfa92..d91ca0f93a40 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2882,7 +2882,6 @@ static inline void kmemleak_load_module(const struct 
module *mod,
 static int module_sig_check(struct load_info *info, int flags)
 {
int err = -ENODATA;
-   const unsigned long markerlen = sizeof(MODULE_SIG_STRING) - 1;
const char *reason;
const void *mod = info->hdr;
 
@@ -2890,11 +2889,7 @@ static int module_sig_check(struct load_info *info, int 
flags)
 * Require flags == 0, as a module with version information
 * removed is no longer the module that was signed
 */
-   if (flags == 0 &&
-   info->len > markerlen &&
-   memcmp(mod + info->len - markerlen, MODULE_SIG_STRING, markerlen) 
== 0) {
-   /* We truncate the module to discard the signature */
-   info->len -= markerlen;
+   if (flags == 0) {
		err = verify_appended_signature(mod, &info->len,
VERIFY_USE_SECONDARY_KEYRING, 
"module");
if (!err) {
diff --git a/kernel/module_signing.c b/kernel/module_signing.c
index 30149969f21f..39a6dd7c6dd2 100644
--- a/kernel/module_signing.c
+++ b/kernel/module_signing.c
@@ -15,8 +15,7 @@
 #include "module-internal.h"
 
 /**
- * verify_appended_signature - Verify the signature on a module with the
- * signature marker stripped.
+ * verify_appended_signature - Verify the signature on a module
  * @data: The data to be verified
  * @len: Size of @data.
  * @trusted_keys: Keyring to use for verification
@@ -25,12 +24,21 @@
 int verify_appended_signature(const void *data, unsigned long *len,
  struct key *trusted_keys, const char *what)
 {
+   const unsigned long markerlen = sizeof(MODULE_SIG_STRING) - 1;
struct module_signature *ms;
unsigned long sig_len, modlen = *len;
int ret;
 
pr_devel("==>%s(,%lu)\n", __func__, modlen);
 
+   if (markerlen > modlen)
+   return -ENODATA;
+
+   if (memcmp(data + modlen - markerlen, MODULE_SIG_STRING,
+  markerlen))
+   return -ENODATA;
+   modlen -= markerlen;
+
   

[PATCH v4 2/6] powerpc/kexec_file: Add KEXEC_SIG support.

2022-01-10 Thread Michal Suchanek
Copy the code from s390x

Both powerpc and s390x use appended signature format (as opposed to EFI
based platforms using PE format).

Signed-off-by: Michal Suchanek 
---
v3: - Philipp Rudo : Update the commit message with
  explanation why the s390 code is usable on powerpc.
- Include correct header for mod_check_sig
- Nayna : Mention additional IMA features
  in kconfig text
---
 arch/powerpc/Kconfig| 16 
 arch/powerpc/kexec/elf_64.c | 36 
 2 files changed, 52 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index dea74d7717c0..1cde9b6c5987 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -560,6 +560,22 @@ config KEXEC_FILE
 config ARCH_HAS_KEXEC_PURGATORY
def_bool KEXEC_FILE
 
+config KEXEC_SIG
+   bool "Verify kernel signature during kexec_file_load() syscall"
+   depends on KEXEC_FILE && MODULE_SIG_FORMAT
+   help
+ This option makes kernel signature verification mandatory for
+ the kexec_file_load() syscall.
+
+ In addition to that option, you need to enable signature
+ verification for the corresponding kernel image type being
+ loaded in order for this to work.
+
+ Note: on powerpc IMA_ARCH_POLICY also implements kexec'ed kernel
+ verification. In addition IMA adds kernel hashes to the measurement
+ list, extends IMA PCR in the TPM, and implements kernel image
+ blacklist by hash.
+
 config RELOCATABLE
bool "Build a relocatable kernel"
depends on PPC64 || (FLATMEM && (44x || FSL_BOOKE))
diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
index eeb258002d1e..98d1cb5135b4 100644
--- a/arch/powerpc/kexec/elf_64.c
+++ b/arch/powerpc/kexec/elf_64.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static void *elf64_load(struct kimage *image, char *kernel_buf,
unsigned long kernel_len, char *initrd,
@@ -151,7 +152,42 @@ static void *elf64_load(struct kimage *image, char 
*kernel_buf,
return ret ? ERR_PTR(ret) : NULL;
 }
 
+#ifdef CONFIG_KEXEC_SIG
+int elf64_verify_sig(const char *kernel, unsigned long kernel_len)
+{
+   const unsigned long marker_len = sizeof(MODULE_SIG_STRING) - 1;
+   struct module_signature *ms;
+   unsigned long sig_len;
+   int ret;
+
+   if (marker_len > kernel_len)
+   return -EKEYREJECTED;
+
+   if (memcmp(kernel + kernel_len - marker_len, MODULE_SIG_STRING,
+  marker_len))
+   return -EKEYREJECTED;
+   kernel_len -= marker_len;
+
+   ms = (void *)kernel + kernel_len - sizeof(*ms);
+   ret = mod_check_sig(ms, kernel_len, "kexec");
+   if (ret)
+   return ret;
+
+   sig_len = be32_to_cpu(ms->sig_len);
+   kernel_len -= sizeof(*ms) + sig_len;
+
+   return verify_pkcs7_signature(kernel, kernel_len,
+ kernel + kernel_len, sig_len,
+ VERIFY_USE_PLATFORM_KEYRING,
+ VERIFYING_MODULE_SIGNATURE,
+ NULL, NULL);
+}
+#endif /* CONFIG_KEXEC_SIG */
+
 const struct kexec_file_ops kexec_elf64_ops = {
.probe = kexec_elf_probe,
.load = elf64_load,
+#ifdef CONFIG_KEXEC_SIG
+   .verify_sig = elf64_verify_sig,
+#endif
 };
-- 
2.31.1



[PATCH v4 3/6] kexec_file: Don't opencode appended signature verification.

2022-01-10 Thread Michal Suchanek
Module verification already implements appended signature verification.

Reuse it for kexec_file.

Signed-off-by: Michal Suchanek 
---
v3: - Philipp Rudo : Update the dependency on
  MODULE_SIG_FORMAT to MODULE_SIG
- Include linux/verification.h - previously added in earlier patch
v4: - kernel test robot : Use unsigned long rather than size_t 
for data length
- Update the code in kernel/module_signing.c to use pointer rather
  than memcpy as the kexec and IMA code does
---
 arch/powerpc/Kconfig  |  2 +-
 arch/powerpc/kexec/elf_64.c   | 19 +++
 arch/s390/Kconfig |  2 +-
 arch/s390/kernel/machine_kexec_file.c | 18 ++
 include/linux/verification.h  |  3 +++
 kernel/module-internal.h  |  2 --
 kernel/module.c   |  4 +++-
 kernel/module_signing.c   | 34 ---
 8 files changed, 33 insertions(+), 51 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1cde9b6c5987..4092187474ff 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -562,7 +562,7 @@ config ARCH_HAS_KEXEC_PURGATORY
 
 config KEXEC_SIG
bool "Verify kernel signature during kexec_file_load() syscall"
-   depends on KEXEC_FILE && MODULE_SIG_FORMAT
+   depends on KEXEC_FILE && MODULE_SIG
help
  This option makes kernel signature verification mandatory for
  the kexec_file_load() syscall.
diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
index 98d1cb5135b4..64cd314cce0d 100644
--- a/arch/powerpc/kexec/elf_64.c
+++ b/arch/powerpc/kexec/elf_64.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 static void *elf64_load(struct kimage *image, char *kernel_buf,
@@ -156,9 +157,6 @@ static void *elf64_load(struct kimage *image, char 
*kernel_buf,
 int elf64_verify_sig(const char *kernel, unsigned long kernel_len)
 {
const unsigned long marker_len = sizeof(MODULE_SIG_STRING) - 1;
-   struct module_signature *ms;
-   unsigned long sig_len;
-   int ret;
 
if (marker_len > kernel_len)
return -EKEYREJECTED;
@@ -168,19 +166,8 @@ int elf64_verify_sig(const char *kernel, unsigned long 
kernel_len)
return -EKEYREJECTED;
kernel_len -= marker_len;
 
-   ms = (void *)kernel + kernel_len - sizeof(*ms);
-   ret = mod_check_sig(ms, kernel_len, "kexec");
-   if (ret)
-   return ret;
-
-   sig_len = be32_to_cpu(ms->sig_len);
-   kernel_len -= sizeof(*ms) + sig_len;
-
-   return verify_pkcs7_signature(kernel, kernel_len,
- kernel + kernel_len, sig_len,
- VERIFY_USE_PLATFORM_KEYRING,
- VERIFYING_MODULE_SIGNATURE,
- NULL, NULL);
+   return verify_appended_signature(kernel, &kernel_len, 
VERIFY_USE_PLATFORM_KEYRING,
+"kexec_file");
 }
 #endif /* CONFIG_KEXEC_SIG */
 
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 2a5bb4f29cfe..cece7152ea35 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -544,7 +544,7 @@ config ARCH_HAS_KEXEC_PURGATORY
 
 config KEXEC_SIG
bool "Verify kernel signature during kexec_file_load() syscall"
-   depends on KEXEC_FILE && MODULE_SIG_FORMAT
+   depends on KEXEC_FILE && MODULE_SIG
help
  This option makes kernel signature verification mandatory for
  the kexec_file_load() syscall.
diff --git a/arch/s390/kernel/machine_kexec_file.c 
b/arch/s390/kernel/machine_kexec_file.c
index c944d71316c7..345f2eab6e04 100644
--- a/arch/s390/kernel/machine_kexec_file.c
+++ b/arch/s390/kernel/machine_kexec_file.c
@@ -29,9 +29,6 @@ const struct kexec_file_ops * const kexec_file_loaders[] = {
 int s390_verify_sig(const char *kernel, unsigned long kernel_len)
 {
const unsigned long marker_len = sizeof(MODULE_SIG_STRING) - 1;
-   struct module_signature *ms;
-   unsigned long sig_len;
-   int ret;
 
/* Skip signature verification when not secure IPLed. */
if (!ipl_secure_flag)
@@ -45,19 +42,8 @@ int s390_verify_sig(const char *kernel, unsigned long 
kernel_len)
return -EKEYREJECTED;
kernel_len -= marker_len;
 
-   ms = (void *)kernel + kernel_len - sizeof(*ms);
-   ret = mod_check_sig(ms, kernel_len, "kexec");
-   if (ret)
-   return ret;
-
-   sig_len = be32_to_cpu(ms->sig_len);
-   kernel_len -= sizeof(*ms) + sig_len;
-
-   return verify_pkcs7_signature(kernel, kernel_len,
- kernel + kernel_len, sig_len,
- VERIFY_USE_PLATFORM_KEYRING,
- VERIFYING_MODULE_SIGNATURE,
- NULL, NULL);
+   return verify_appended_signature(kernel, &kernel_len, 

[PATCH v4 1/6] s390/kexec_file: Don't opencode appended signature check.

2022-01-10 Thread Michal Suchanek
Module verification already implements appended signature check.

Reuse it for kexec_file.

The kexec_file implementation uses EKEYREJECTED error in some cases when
there is no key and the common implementation uses ENOPKG or EBADMSG
instead.

Signed-off-by: Michal Suchanek 
Acked-by: Heiko Carstens 
---
v3: Philipp Rudo : Update the commit with note about
change of return value
---
 arch/s390/kernel/machine_kexec_file.c | 22 +-
 1 file changed, 5 insertions(+), 17 deletions(-)

diff --git a/arch/s390/kernel/machine_kexec_file.c 
b/arch/s390/kernel/machine_kexec_file.c
index 8f43575a4dd3..c944d71316c7 100644
--- a/arch/s390/kernel/machine_kexec_file.c
+++ b/arch/s390/kernel/machine_kexec_file.c
@@ -31,6 +31,7 @@ int s390_verify_sig(const char *kernel, unsigned long 
kernel_len)
const unsigned long marker_len = sizeof(MODULE_SIG_STRING) - 1;
struct module_signature *ms;
unsigned long sig_len;
+   int ret;
 
/* Skip signature verification when not secure IPLed. */
if (!ipl_secure_flag)
@@ -45,25 +46,12 @@ int s390_verify_sig(const char *kernel, unsigned long 
kernel_len)
kernel_len -= marker_len;
 
ms = (void *)kernel + kernel_len - sizeof(*ms);
-   kernel_len -= sizeof(*ms);
+   ret = mod_check_sig(ms, kernel_len, "kexec");
+   if (ret)
+   return ret;
 
sig_len = be32_to_cpu(ms->sig_len);
-   if (sig_len >= kernel_len)
-   return -EKEYREJECTED;
-   kernel_len -= sig_len;
-
-   if (ms->id_type != PKEY_ID_PKCS7)
-   return -EKEYREJECTED;
-
-   if (ms->algo != 0 ||
-   ms->hash != 0 ||
-   ms->signer_len != 0 ||
-   ms->key_id_len != 0 ||
-   ms->__pad[0] != 0 ||
-   ms->__pad[1] != 0 ||
-   ms->__pad[2] != 0) {
-   return -EBADMSG;
-   }
+   kernel_len -= sizeof(*ms) + sig_len;
 
return verify_pkcs7_signature(kernel, kernel_len,
  kernel + kernel_len, sig_len,
-- 
2.31.1



[PATCH v4 0/6] KEXEC_SIG with appended signature

2022-01-10 Thread Michal Suchanek
Hello,

This is a refresh of the KEXEC_SIG series.

This adds KEXEC_SIG support on powerpc and deduplicates the code dealing
with appended signatures in the kernel.

powerpc supports IMA_KEXEC but that's an exception rather than the norm.
On the other hand, KEXEC_SIG is portable across platforms.

For distributions to have uniform security features across platforms one
option should be used on all platforms.

Thanks

Michal

Previous revision: 
https://lore.kernel.org/linuxppc-dev/cover.1637862358.git.msucha...@suse.de/
Patched kernel tree: https://github.com/hramrach/kernel/tree/kexec_sig

Michal Suchanek (6):
  s390/kexec_file: Don't opencode appended signature check.
  powerpc/kexec_file: Add KEXEC_SIG support.
  kexec_file: Don't opencode appended signature verification.
  module: strip the signature marker in the verification function.
  module: Use key_being_used_for for log messages in
verify_appended_signature
  module: Move duplicate mod_check_sig users code to mod_parse_sig

 arch/powerpc/Kconfig | 16 +++
 arch/powerpc/kexec/elf_64.c  | 12 +
 arch/s390/Kconfig|  2 +-
 arch/s390/kernel/machine_kexec_file.c| 41 +
 crypto/asymmetric_keys/asymmetric_type.c |  1 +
 include/linux/module_signature.h |  1 +
 include/linux/verification.h |  5 +++
 kernel/module-internal.h |  2 -
 kernel/module.c  | 12 +++--
 kernel/module_signature.c| 56 +++-
 kernel/module_signing.c  | 34 +++---
 security/integrity/ima/ima_modsig.c  | 22 ++
 12 files changed, 116 insertions(+), 88 deletions(-)

-- 
2.31.1



Re: [PATCH V2 03/17] asm-generic: fcntl: compat: Remove duplicate definitions

2022-01-10 Thread Arnd Bergmann
On Tue, Dec 28, 2021 at 3:39 PM  wrote:
>
> From: Guo Ren 
>
> Remove duplicate F_GETLK64,F_SETLK64,F_SETLKW64 definitions in
> arch/*/include/asm/compat.h.
>
> Signed-off-by: Guo Ren 
> Signed-off-by: Guo Ren 
> Cc: Arnd Bergmann 

Unfortunately, this one does not look correct to me:

> @@ -116,7 +116,7 @@
>  #define F_GETSIG   11  /* for sockets. */
>  #endif
>
> -#ifndef CONFIG_64BIT
> +#if !defined(CONFIG_64BIT) || defined(CONFIG_COMPAT)
>  #ifndef F_GETLK64
>  #define F_GETLK64  12  /*  using 'struct flock64' */
>  #define F_SETLK64  13

The problem here is that include/uapi/ headers cannot contain checks for
CONFIG_* symbols because those may have different meanings in user space
compared to kernel.

This is a preexisting problem in the header, but I think the change
makes it worse.

With the current behavior, user space will always see the definitions,
unless it happens to have its own definition for CONFIG_64BIT already.
On 64-bit parisc, this has the effect of defining the macros to the
same values as F_SETOWN/F_SETSIG/F_GETSIG, which is potentially
harmful. On MIPS, it uses values that are different from the 32-bit numbers
but are otherwise unused. Everywhere else, we get the definition from
the 32-bit architecture in user space, which will do nothing in the kernel.

The correct check for a uapi header would be to test for
__BITS_PER_LONG==32. We should probably do that here, but
this won't help you move the definitions, and it is a user-visible change
as the incorrect definition will no longer be visible. [Adding Jeff and Bruce
(the flock maintainers) to Cc for additional feedback on this]

For your series, I would suggest just moving the macro definitions to
include/linux/compat.h along with the 'struct compat_flock64'
definition, and leaving the duplicate one in the uapi header unchanged
until we have decided on a solution.

Arnd
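For illustration, the move suggested above could look roughly like this
(placement and guards are assumptions, not a tested change; the separate
__BITS_PER_LONG == 32 question for the uapi header is left open as discussed):

/* include/linux/compat.h -- kernel-internal, so CONFIG_* checks are fine here */
#ifdef CONFIG_COMPAT
#ifndef F_GETLK64
#define F_GETLK64	12	/* using 'struct flock64' */
#define F_SETLK64	13
#define F_SETLKW64	14
#endif
#endif /* CONFIG_COMPAT */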


Re: [PATCH v5] powerpc/pseries: read the lpar name from the firmware

2022-01-10 Thread kernel test robot
Hi Laurent,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on powerpc/next]
[also build test WARNING on linux/master linus/master v5.16 next-20220110]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Laurent-Dufour/powerpc-pseries-read-the-lpar-name-from-the-firmware/20220107-001503
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc64-randconfig-r026-20220106 
(https://download.01.org/0day-ci/archive/20220110/202201102154.a95oqepr-...@intel.com/config)
compiler: powerpc64-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://github.com/0day-ci/linux/commit/5cf0dea6e919e93ff3088f87acd40e84608a6861
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Laurent-Dufour/powerpc-pseries-read-the-lpar-name-from-the-firmware/20220107-001503
git checkout 5cf0dea6e919e93ff3088f87acd40e84608a6861
# save the config file to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross 
O=build_dir ARCH=powerpc SHELL=/bin/bash arch/powerpc/platforms/pseries/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

   arch/powerpc/platforms/pseries/lparcfg.c:257: warning: Function parameter or 
member 'm' not described in 'parse_mpp_data'
   arch/powerpc/platforms/pseries/lparcfg.c:295: warning: Function parameter or 
member 'm' not described in 'parse_mpp_x_data'
   arch/powerpc/platforms/pseries/lparcfg.c:334: warning: Function parameter or 
member 'm' not described in 'read_RTAS_lpar_name'
>> arch/powerpc/platforms/pseries/lparcfg.c:334: warning: expecting prototype 
>> for Read the lpar name using the RTAS ibm,get-system(). Prototype was for 
>> read_RTAS_lpar_name() instead
>> arch/powerpc/platforms/pseries/lparcfg.c:378: warning: This comment starts 
>> with '/**', but isn't a kernel-doc comment. Refer 
>> Documentation/doc-guide/kernel-doc.rst
* Read the LPAR name from the Device Tree.
   arch/powerpc/platforms/pseries/lparcfg.c:678: warning: Function parameter or 
member 'entitlement' not described in 'update_mpp'
   arch/powerpc/platforms/pseries/lparcfg.c:678: warning: Function parameter or 
member 'weight' not described in 'update_mpp'
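For reference, the two new warnings (marked >>) are about kernel-doc formatting
of the comment above read_RTAS_lpar_name(). A comment along these lines would
satisfy them; illustrative only, the exact wording is the patch author's call:

/**
 * read_RTAS_lpar_name() - Read the LPAR name via ibm,get-system-parameter.
 * @m: seq_file to which the partition name is written.
 *
 * The name read through this call is updated if changes are made by the
 * end user on the hypervisor side.
 *
 * Return: 0 on success, a negative errno otherwise.
 */
static int read_RTAS_lpar_name(struct seq_file *m)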


vim +334 arch/powerpc/platforms/pseries/lparcfg.c

   323  
   324  /**
   325   * Read the lpar name using the RTAS ibm,get-system-parameter call.
   326   *
   327   * The name read through this call is updated if changes are made by 
the end
   328   * user on the hypervisor side.
   329   *
   330   * Some hypervisor (like Qemu) may not provide this value. In that 
case, a non
   331   * null value is returned.
   332   */
   333  static int read_RTAS_lpar_name(struct seq_file *m)
 > 334  {
   335  int rc, len, token;
   336  union {
   337  char raw_buffer[GET_SYS_PARM_BUF_SIZE];
   338  struct {
   339  __be16 len;
   340  char name[GET_SYS_PARM_BUF_SIZE-2];
   341  };
   342  } *local_buffer;
   343  
   344  token = rtas_token("ibm,get-system-parameter");
   345  if (token == RTAS_UNKNOWN_SERVICE)
   346  return -EINVAL;
   347  
   348  local_buffer = kmalloc(sizeof(*local_buffer), GFP_KERNEL);
   349  if (!local_buffer)
   350  return -ENOMEM;
   351  
   352  do {
    353  spin_lock(&rtas_data_buf_lock);
   354  memset(rtas_data_buf, 0, sizeof(*local_buffer));
   355  rc = rtas_call(token, 3, 1, NULL, 
SPLPAR_LPAR_NAME_TOKEN,
   356 __pa(rtas_data_buf), 
sizeof(*local_buffer));
   357  if (!rc)
   358  memcpy(local_buffer->raw_buffer, rtas_data_buf,
   359 sizeof(local_buffer->raw_buffer));
    360  spin_unlock(&rtas_data_buf_lock);
   361  } while (rtas_busy_delay(rc));
   362  
   363  if (!rc) {
   364  /* Force end of string */
   365  len = min((int) be16_to_cpu(local_buffer->len),
   366(int) sizeof(local_buffer->name)-1);
   367  local_buffer->name[len] = '\0';
   368  
   369  seq_printf(m, "partition_name=%s\n", 
local_buffer->name);
   370  } else
   371  rc = -ENODATA;
   372

[PATCH] powerpc/bpf: Always reallocate BPF_REG_5, BPF_REG_AX and TMP_REG when possible

2022-01-10 Thread Christophe Leroy
BPF_REG_5, BPF_REG_AX and TMP_REG are mapped on non volatile registers
because there are not enough volatile registers, but they don't need
to be preserved on function calls.

So when some volatile registers become available, those registers can
always be reallocated regardless of whether SEEN_FUNC is set or not.

Suggested-by: Naveen N. Rao 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/net/bpf_jit.h|  3 ---
 arch/powerpc/net/bpf_jit_comp32.c | 14 +++---
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index b20a2a83a6e7..b75507fc8f6b 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -127,9 +127,6 @@
 #define SEEN_FUNC  0x20000000 /* might call external helpers */
 #define SEEN_TAILCALL  0x40000000 /* uses tail calls */
 
-#define SEEN_VREG_MASK 0x1ff80000 /* Volatile registers r3-r12 */
-#define SEEN_NVREG_MASK        0x0003ffff /* Non volatile registers r14-r31 */
-
 #ifdef CONFIG_PPC64
 extern const int b2p[MAX_BPF_JIT_REG + 2];
 #else
diff --git a/arch/powerpc/net/bpf_jit_comp32.c 
b/arch/powerpc/net/bpf_jit_comp32.c
index d3a52cd42f53..cfec42c8a511 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -77,14 +77,22 @@ static int bpf_jit_stack_offsetof(struct codegen_context 
*ctx, int reg)
return BPF_PPC_STACKFRAME(ctx) - 4;
 }
 
+#define SEEN_VREG_MASK 0x1ff80000 /* Volatile registers r3-r12 */
+#define SEEN_NVREG_FULL_MASK   0x0003ffff /* Non volatile registers r14-r31 */
+#define SEEN_NVREG_TEMP_MASK   0x00001e01 /* BPF_REG_5, BPF_REG_AX, TMP_REG */
+
 void bpf_jit_realloc_regs(struct codegen_context *ctx)
 {
+   unsigned int nvreg_mask;
+
if (ctx->seen & SEEN_FUNC)
-   return;
+   nvreg_mask = SEEN_NVREG_TEMP_MASK;
+   else
+   nvreg_mask = SEEN_NVREG_FULL_MASK;
 
-   while (ctx->seen & SEEN_NVREG_MASK &&
+   while (ctx->seen & nvreg_mask &&
  (ctx->seen & SEEN_VREG_MASK) != SEEN_VREG_MASK) {
-   int old = 32 - fls(ctx->seen & (SEEN_NVREG_MASK & 0xaaaaaaab));
+   int old = 32 - fls(ctx->seen & (nvreg_mask & 0xaaaaaaab));
int new = 32 - fls(~ctx->seen & (SEEN_VREG_MASK & 0xaaaaaaaa));
int i;
 
-- 
2.33.1


Re: [PATCH v2 8/8] powerpc/bpf: Reallocate BPF registers to volatile registers when possible on PPC32

2022-01-10 Thread Christophe Leroy


Le 07/01/2022 à 12:51, Naveen N. Rao a écrit :
> Christophe Leroy wrote:
>> When the BPF routine doesn't call any function, the non volatile
>> registers can be reallocated to volatile registers in order to
>> avoid having to save them/restore on the stack.
>>
>> Before this patch, the test #359 ADD default X is:
>>
>>    0:    7c 64 1b 78 mr  r4,r3
>>    4:    38 60 00 00 li  r3,0
>>    8:    94 21 ff b0 stwu    r1,-80(r1)
>>    c:    60 00 00 00 nop
>>   10:    92 e1 00 2c stw r23,44(r1)
>>   14:    93 01 00 30 stw r24,48(r1)
>>   18:    93 21 00 34 stw r25,52(r1)
>>   1c:    93 41 00 38 stw r26,56(r1)
>>   20:    39 80 00 00 li  r12,0
>>   24:    39 60 00 00 li  r11,0
>>   28:    3b 40 00 00 li  r26,0
>>   2c:    3b 20 00 00 li  r25,0
>>   30:    7c 98 23 78 mr  r24,r4
>>   34:    7c 77 1b 78 mr  r23,r3
>>   38:    39 80 00 42 li  r12,66
>>   3c:    39 60 00 00 li  r11,0
>>   40:    7d 8c d2 14 add r12,r12,r26
>>   44:    39 60 00 00 li  r11,0
>>   48:    7d 83 63 78 mr  r3,r12
>>   4c:    82 e1 00 2c lwz r23,44(r1)
>>   50:    83 01 00 30 lwz r24,48(r1)
>>   54:    83 21 00 34 lwz r25,52(r1)
>>   58:    83 41 00 38 lwz r26,56(r1)
>>   5c:    38 21 00 50 addi    r1,r1,80
>>   60:    4e 80 00 20 blr
>>
>> After this patch, the same test has become:
>>
>>    0:    7c 64 1b 78 mr  r4,r3
>>    4:    38 60 00 00 li  r3,0
>>    8:    94 21 ff b0 stwu    r1,-80(r1)
>>    c:    60 00 00 00 nop
>>   10:    39 80 00 00 li  r12,0
>>   14:    39 60 00 00 li  r11,0
>>   18:    39 00 00 00 li  r8,0
>>   1c:    38 e0 00 00 li  r7,0
>>   20:    7c 86 23 78 mr  r6,r4
>>   24:    7c 65 1b 78 mr  r5,r3
>>   28:    39 80 00 42 li  r12,66
>>   2c:    39 60 00 00 li  r11,0
>>   30:    7d 8c 42 14 add r12,r12,r8
>>   34:    39 60 00 00 li  r11,0
>>   38:    7d 83 63 78 mr  r3,r12
>>   3c:    38 21 00 50 addi    r1,r1,80
>>   40:    4e 80 00 20 blr
>>
>> Signed-off-by: Christophe Leroy 
>> ---
>>  arch/powerpc/net/bpf_jit.h    | 16 
>>  arch/powerpc/net/bpf_jit64.h  |  2 +-
>>  arch/powerpc/net/bpf_jit_comp.c   |  2 ++
>>  arch/powerpc/net/bpf_jit_comp32.c | 30 --
>>  arch/powerpc/net/bpf_jit_comp64.c |  4 
>>  5 files changed, 51 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
>> index a45b8266355d..776abef4d2a0 100644
>> --- a/arch/powerpc/net/bpf_jit.h
>> +++ b/arch/powerpc/net/bpf_jit.h
>> @@ -116,6 +116,15 @@ static inline bool is_nearbranch(int offset)
>>  #define SEEN_STACK    0x40000000 /* uses BPF stack */
>>  #define SEEN_TAILCALL    0x80000000 /* uses tail calls */
>>
>> +#define SEEN_VREG_MASK    0x1ff80000 /* Volatile registers r3-r12 */
>> +#define SEEN_NVREG_MASK    0x0003ffff /* Non volatile registers 
>> r14-r31 */
>> +
>> +#ifdef CONFIG_PPC64
>> +extern const int b2p[MAX_BPF_JIT_REG + 2];
>> +#else
>> +extern const int b2p[MAX_BPF_JIT_REG + 1];
>> +#endif
>> +
>>  struct codegen_context {
>>  /*
>>   * This is used to track register usage as well
>> @@ -129,6 +138,7 @@ struct codegen_context {
>>  unsigned int seen;
>>  unsigned int idx;
>>  unsigned int stack_size;
>> +    int b2p[ARRAY_SIZE(b2p)];
>>  };
>>
>>  static inline void bpf_flush_icache(void *start, void *end)
>> @@ -147,11 +157,17 @@ static inline void bpf_set_seen_register(struct 
>> codegen_context *ctx, int i)
>>  ctx->seen |= 1 << (31 - i);
>>  }
>>
>> +static inline void bpf_clear_seen_register(struct codegen_context 
>> *ctx, int i)
>> +{
>> +    ctx->seen &= ~(1 << (31 - i));
>> +}
>> +
>>  void bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context 
>> *ctx, u64 func);
>>  int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct 
>> codegen_context *ctx,
>>     u32 *addrs, bool extra_pass);
>>  void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
>>  void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx);
>> +void bpf_jit_realloc_regs(struct codegen_context *ctx);
>>
>>  #endif
>>
>> diff --git a/arch/powerpc/net/bpf_jit64.h b/arch/powerpc/net/bpf_jit64.h
>> index b05f2e67bba1..7b713edfa7e2 100644
>> --- a/arch/powerpc/net/bpf_jit64.h
>> +++ b/arch/powerpc/net/bpf_jit64.h
>> @@ -39,7 +39,7 @@
>>  #define TMP_REG_2    (MAX_BPF_JIT_REG + 1)
>>
>>  /* BPF to ppc register mappings */
>> -static const int b2p[] = {
>> +const int b2p[MAX_BPF_JIT_REG + 2] = {
>>  /* function return value */
>>  [BPF_REG_0] = 8,
>>  /* function arguments */
>> diff --git a/arch/powerpc/net/bpf_jit_comp.c 
>> b/arch/powerpc/net/bpf_jit_comp.c
>> index efac89964873..798ac4350a82 100644
>> --- a/arch/powerpc/net/bpf_jit_comp.c
>> +++ 

Re: [PATCH V2 02/17] fs: stat: compat: Add __ARCH_WANT_COMPAT_STAT

2022-01-10 Thread Arnd Bergmann
On Tue, Dec 28, 2021 at 3:39 PM  wrote:
>
> From: Guo Ren 
>
> RISC-V doesn't need compat_stat, so use __ARCH_WANT_COMPAT_STAT
> to exclude the unnecessary SYSCALL functions.
>
> Signed-off-by: Guo Ren 
> Signed-off-by: Guo Ren 

Reviewed-by: Arnd Bergmann 


Re: [PATCH V2 01/17] kconfig: Add SYSVIPC_COMPAT for all architectures

2022-01-10 Thread Arnd Bergmann
On Tue, Dec 28, 2021 at 3:39 PM  wrote:
>
> From: Guo Ren 
>
> The existing per-arch definitions are pretty much historic cruft.
> Move SYSVIPC_COMPAT into init/Kconfig.
>
> Signed-off-by: Guo Ren 
> Signed-off-by: Guo Ren 
> Cc: Arnd Bergmann 
> Cc: Christoph Hellwig 

Acked-by: Arnd Bergmann 


Re: [PATCH 03/13] powerpc/bpf: Update ldimm64 instructions during extra pass

2022-01-10 Thread Naveen N. Rao

Christophe Leroy wrote:



Le 06/01/2022 à 12:45, Naveen N. Rao a écrit :

These instructions are updated after the initial JIT, so redo codegen
during the extra pass. Rename bpf_jit_fixup_subprog_calls() to clarify
that this is more than just subprog calls.

Fixes: 69c087ba6225b5 ("bpf: Add bpf_for_each_map_elem() helper")
Cc: sta...@vger.kernel.org # v5.15
Signed-off-by: Naveen N. Rao 
---
  arch/powerpc/net/bpf_jit_comp.c   | 29 +++--
  arch/powerpc/net/bpf_jit_comp32.c |  6 ++
  arch/powerpc/net/bpf_jit_comp64.c |  7 ++-
  3 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index d6ffdd0f2309d0..56dd1f4e3e4447 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -23,15 +23,15 @@ static void bpf_jit_fill_ill_insns(void *area, unsigned int 
size)
memset32(area, BREAKPOINT_INSTRUCTION, size / 4);
  }
  
-/* Fix the branch target addresses for subprog calls */

-static int bpf_jit_fixup_subprog_calls(struct bpf_prog *fp, u32 *image,
-  struct codegen_context *ctx, u32 *addrs)
+/* Fix updated addresses (for subprog calls, ldimm64, et al) during extra pass 
*/
+static int bpf_jit_fixup_addresses(struct bpf_prog *fp, u32 *image,
+  struct codegen_context *ctx, u32 *addrs)
  {
const struct bpf_insn *insn = fp->insnsi;
bool func_addr_fixed;
u64 func_addr;
u32 tmp_idx;
-   int i, ret;
+   int i, j, ret;
  
  	for (i = 0; i < fp->len; i++) {

/*
@@ -66,6 +66,23 @@ static int bpf_jit_fixup_subprog_calls(struct bpf_prog *fp, 
u32 *image,
 * of the JITed sequence remains unchanged.
 */
ctx->idx = tmp_idx;
+   } else if (insn[i].code == (BPF_LD | BPF_IMM | BPF_DW)) {
+   tmp_idx = ctx->idx;
+   ctx->idx = addrs[i] / 4;
+#ifdef CONFIG_PPC32
+   PPC_LI32(ctx->b2p[insn[i].dst_reg] - 1, (u32)insn[i + 
1].imm);
+   PPC_LI32(ctx->b2p[insn[i].dst_reg], (u32)insn[i].imm);
+   for (j = ctx->idx - addrs[i] / 4; j < 4; j++)
+   EMIT(PPC_RAW_NOP());
+#else
+   func_addr = ((u64)(u32)insn[i].imm) | (((u64)(u32)insn[i + 
1].imm) << 32);
+   PPC_LI64(b2p[insn[i].dst_reg], func_addr);
+   /* overwrite rest with nops */
+   for (j = ctx->idx - addrs[i] / 4; j < 5; j++)
+   EMIT(PPC_RAW_NOP());
+#endif


#ifdefs should be avoided as much as possible.

Here it seems we could easily do an

if (IS_ENABLED(CONFIG_PPC32)) {
} else {
}

And it looks like the CONFIG_PPC64 alternative would in fact also work 
on PPC32, wouldn't it ?


We never implemented PPC_LI64() for ppc32:
 /linux/arch/powerpc/net/bpf_jit_comp.c: In function 
 'bpf_jit_fixup_addresses':

 /linux/arch/powerpc/net/bpf_jit_comp.c:81:5: error: this decimal constant is 
unsigned only in ISO C90 [-Werror]
81 | PPC_LI64(b2p[insn[i].dst_reg], func_addr);
| ^~~~
 /linux/arch/powerpc/net/bpf_jit_comp.c:81:5: error: this decimal constant is 
unsigned only in ISO C90 [-Werror]
 In file included from /linux/arch/powerpc/net/bpf_jit_comp.c:19:
 /linux/arch/powerpc/net/bpf_jit.h:78:40: error: right shift count >= width of 
type [-Werror=shift-count-overflow]
78 | EMIT(PPC_RAW_LI(d, ((uintptr_t)(i) >> 32) &   \
|^~


We should move that out from bpf_jit.h


- Naveen



Re: [PATCH 02/13] powerpc32/bpf: Fix codegen for bpf-to-bpf calls

2022-01-10 Thread Naveen N. Rao

Christophe Leroy wrote:



Le 06/01/2022 à 12:45, Naveen N. Rao a écrit :

Pad instructions emitted for BPF_CALL so that the number of instructions
generated does not change for different function addresses. This is
especially important for calls to other bpf functions, whose address
will only be known during extra pass.


In first pass, 'image' is NULL and we emit the 4 instructions sequence 
already, so the code won't grow after first pass, it can only shrink.


Right, but this patch addresses the scenario where the function address 
is only provided during the extra pass. So, even though we will not 
write past the end of the BPF image, the emitted instructions can still 
be wrong.




On PPC32, a huge effort is made to minimise the situations where 'bl' 
cannot be used, see commit 2ec13df16704 ("powerpc/modules: Load modules 
closer to kernel text")


And if you take the 8xx for instance, a NOP is just like any other 
instruction: it takes one cycle.


If it is absolutely needed, then I'd prefer we use an out-of-line 
trampoline for the unlikely case and use 'bl' to that trampoline.


Yes, something like that will be nice to do, but we will still need this 
patch for -stable.


The other option is to redo the whole JIT during the extra pass, but 
only if we can ensure that we have provisioned for the maximum image 
size.



- Naveen



Re: [PATCH 01/13] bpf: Guard against accessing NULL pt_regs in bpf_get_task_stack()

2022-01-10 Thread Naveen N. Rao

Christophe Leroy wrote:



Le 06/01/2022 à 12:45, Naveen N. Rao a écrit :

task_pt_regs() can return NULL on powerpc for kernel threads. This is
then used in __bpf_get_stack() to check for user mode, resulting in a
kernel oops. Guard against this by checking return value of
task_pt_regs() before trying to obtain the call chain.


I started looking at that some time ago, and I'm wondering whether it is 
worth keeping that powerpc particularity.


We used to have a potentially different pt_regs depending on how we 
entered kernel, especially on PPC32, but since the following commits it 
is not the case anymore.


06d67d54741a ("powerpc: make process.c suitable for both 32-bit and 64-bit")
db297c3b07af ("powerpc/32: Don't save thread.regs on interrupt entry")
b5cfc9cd7b04 ("powerpc/32: Fix critical and debug interrupts on BOOKE")

We could therefore just do like other architectures, define

#define task_pt_regs(p) ((struct pt_regs *)(THREAD_SIZE + 
task_stack_page(p)) - 1)


And then remove the regs field we have in thread_struct.


Sure, I don't have an opinion on that, but I think this patch will still 
be needed for -stable.







Fixes: fa28dcb82a38f8 ("bpf: Introduce helper bpf_get_task_stack()")
Cc: sta...@vger.kernel.org # v5.9+
Signed-off-by: Naveen N. Rao 
---
  kernel/bpf/stackmap.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 6e75bbee39f0b5..0dcaed4d3f4cec 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -525,13 +525,14 @@ BPF_CALL_4(bpf_get_task_stack, struct task_struct *, 
task, void *, buf,
   u32, size, u64, flags)
  {
struct pt_regs *regs;
-   long res;
+   long res = -EINVAL;
  
  	if (!try_get_task_stack(task))

return -EFAULT;
  
  	regs = task_pt_regs(task);

-   res = __bpf_get_stack(regs, task, NULL, buf, size, flags);
+   if (regs)
+   res = __bpf_get_stack(regs, task, NULL, buf, size, flags);


Should there be a comment explaining that on powerpc, 'regs' can be NULL 
for a kernel thread ?


I guess this won't be required if we end up with the change you are 
proposing above?



- Naveen



[PATCH] powerpc/security: Provide stubs for when PPC_BARRIER_NOSPEC isn't enabled

2022-01-10 Thread Naveen N. Rao
kernel test robot reported the below build error with a randconfig:
  powerpc64-linux-ld: arch/powerpc/net/bpf_jit_comp64.o:(.toc+0x0):
  undefined reference to `powerpc_security_features'

This can happen if CONFIG_PPC_BARRIER_NOSPEC is not enabled. Address
this by providing stub functions for security_ftr_enabled() and related
helpers when the config option is not enabled.

Reported-by: kernel test robot 
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/include/asm/security_features.h | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/security_features.h 
b/arch/powerpc/include/asm/security_features.h
index 27574f218b371f..f2b990052641a0 100644
--- a/arch/powerpc/include/asm/security_features.h
+++ b/arch/powerpc/include/asm/security_features.h
@@ -8,10 +8,6 @@
 #ifndef _ASM_POWERPC_SECURITY_FEATURES_H
 #define _ASM_POWERPC_SECURITY_FEATURES_H
 
-
-extern u64 powerpc_security_features;
-extern bool rfi_flush;
-
 /* These are bit flags */
 enum stf_barrier_type {
STF_BARRIER_NONE= 0x1,
@@ -20,6 +16,10 @@ enum stf_barrier_type {
STF_BARRIER_SYNC_ORI= 0x8,
 };
 
+#ifdef CONFIG_PPC_BARRIER_NOSPEC
+extern u64 powerpc_security_features;
+extern bool rfi_flush;
+
 void setup_stf_barrier(void);
 void do_stf_barrier_fixups(enum stf_barrier_type types);
 void setup_count_cache_flush(void);
@@ -45,6 +45,13 @@ enum stf_barrier_type stf_barrier_type_get(void);
 static inline enum stf_barrier_type stf_barrier_type_get(void) { return 
STF_BARRIER_NONE; }
 #endif
 
+#else /* CONFIG_PPC_BARRIER_NOSPEC */
+static inline void security_ftr_set(u64 feature) { }
+static inline void security_ftr_clear(u64 feature) { }
+static inline bool security_ftr_enabled(u64 feature) { return false; }
+static inline enum stf_barrier_type stf_barrier_type_get(void) { return 
STF_BARRIER_NONE; }
+#endif /* CONFIG_PPC_BARRIER_NOSPEC */
+
 // Features indicating support for Spectre/Meltdown mitigations
 
 // The L1-D cache can be flushed with ori r30,r30,0

base-commit: bdcf18e133f656b2c97390a594fc95e37849e682
-- 
2.34.1



Re: [PATCH 03/13] powerpc/bpf: Update ldimm64 instructions during extra pass

2022-01-10 Thread Christophe Leroy


Le 06/01/2022 à 12:45, Naveen N. Rao a écrit :
> These instructions are updated after the initial JIT, so redo codegen
> during the extra pass. Rename bpf_jit_fixup_subprog_calls() to clarify
> that this is more than just subprog calls.
> 
> Fixes: 69c087ba6225b5 ("bpf: Add bpf_for_each_map_elem() helper")
> Cc: sta...@vger.kernel.org # v5.15
> Signed-off-by: Naveen N. Rao 
> ---
>   arch/powerpc/net/bpf_jit_comp.c   | 29 +++--
>   arch/powerpc/net/bpf_jit_comp32.c |  6 ++
>   arch/powerpc/net/bpf_jit_comp64.c |  7 ++-
>   3 files changed, 35 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> index d6ffdd0f2309d0..56dd1f4e3e4447 100644
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c
> @@ -23,15 +23,15 @@ static void bpf_jit_fill_ill_insns(void *area, unsigned 
> int size)
>   memset32(area, BREAKPOINT_INSTRUCTION, size / 4);
>   }
>   
> -/* Fix the branch target addresses for subprog calls */
> -static int bpf_jit_fixup_subprog_calls(struct bpf_prog *fp, u32 *image,
> -struct codegen_context *ctx, u32 *addrs)
> +/* Fix updated addresses (for subprog calls, ldimm64, et al) during extra 
> pass */
> +static int bpf_jit_fixup_addresses(struct bpf_prog *fp, u32 *image,
> +struct codegen_context *ctx, u32 *addrs)
>   {
>   const struct bpf_insn *insn = fp->insnsi;
>   bool func_addr_fixed;
>   u64 func_addr;
>   u32 tmp_idx;
> - int i, ret;
> + int i, j, ret;
>   
>   for (i = 0; i < fp->len; i++) {
>   /*
> @@ -66,6 +66,23 @@ static int bpf_jit_fixup_subprog_calls(struct bpf_prog 
> *fp, u32 *image,
>* of the JITed sequence remains unchanged.
>*/
>   ctx->idx = tmp_idx;
> + } else if (insn[i].code == (BPF_LD | BPF_IMM | BPF_DW)) {
> + tmp_idx = ctx->idx;
> + ctx->idx = addrs[i] / 4;
> +#ifdef CONFIG_PPC32
> + PPC_LI32(ctx->b2p[insn[i].dst_reg] - 1, (u32)insn[i + 
> 1].imm);
> + PPC_LI32(ctx->b2p[insn[i].dst_reg], (u32)insn[i].imm);
> + for (j = ctx->idx - addrs[i] / 4; j < 4; j++)
> + EMIT(PPC_RAW_NOP());
> +#else
> + func_addr = ((u64)(u32)insn[i].imm) | 
> (((u64)(u32)insn[i + 1].imm) << 32);
> + PPC_LI64(b2p[insn[i].dst_reg], func_addr);
> + /* overwrite rest with nops */
> + for (j = ctx->idx - addrs[i] / 4; j < 5; j++)
> + EMIT(PPC_RAW_NOP());
> +#endif

#ifdefs should be avoided as much as possible.

Here it seems we could easily do an

if (IS_ENABLED(CONFIG_PPC32)) {
} else {
}

And it looks like the CONFIG_PPC64 alternative would in fact also work 
on PPC32, wouldn't it ?


> + ctx->idx = tmp_idx;
> + i++;
>   }
>   }
>   
> @@ -200,13 +217,13 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog 
> *fp)
>   /*
>* Do not touch the prologue and epilogue as they will remain
>* unchanged. Only fix the branch target address for subprog
> -  * calls in the body.
> +  * calls in the body, and ldimm64 instructions.
>*
>* This does not change the offsets and lengths of the subprog
>* call instruction sequences and hence, the size of the JITed
>* image as well.
>*/
> - bpf_jit_fixup_subprog_calls(fp, code_base, &cgctx, addrs);
> + bpf_jit_fixup_addresses(fp, code_base, &cgctx, addrs);
>   
>   /* There is no need to perform the usual passes. */
>   goto skip_codegen_passes;
> diff --git a/arch/powerpc/net/bpf_jit_comp32.c 
> b/arch/powerpc/net/bpf_jit_comp32.c
> index 997a47fa615b30..2258d3886d02ec 100644
> --- a/arch/powerpc/net/bpf_jit_comp32.c
> +++ b/arch/powerpc/net/bpf_jit_comp32.c
> @@ -293,6 +293,8 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
> struct codegen_context *
>   bool func_addr_fixed;
>   u64 func_addr;
>   u32 true_cond;
> + u32 tmp_idx;
> + int j;
>   
>   /*
>* addrs[] maps a BPF bytecode address into a real offset from
> @@ -908,8 +910,12 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
> struct codegen_context *
>* 16 byte instruction that uses two 'struct bpf_insn'
>*/
>   case BPF_LD | BPF_IMM | BPF_DW: /* dst = (u64) imm */
> + tmp_idx = ctx->idx;
>   PPC_LI32(dst_reg_h, (u32)insn[i + 1].imm);
>   PPC_LI32(dst_reg, (u32)insn[i].imm);
> + /* 

Re: [PATCH 11/13] powerpc64/bpf elfv2: Setup kernel TOC in r2 on entry

2022-01-10 Thread Christophe Leroy


Le 06/01/2022 à 12:45, Naveen N. Rao a écrit :
> In preparation for using kernel TOC, load the same in r2 on entry. With
> elfv1, the kernel TOC is already setup by our caller so we just emit a
> nop. We adjust the number of instructions to skip on a tail call
> accordingly.
> 
> Signed-off-by: Naveen N. Rao 
> ---
>   arch/powerpc/net/bpf_jit_comp64.c | 8 +++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
> b/arch/powerpc/net/bpf_jit_comp64.c
> index ce4fc59bbd6a92..e05b577d95bf11 100644
> --- a/arch/powerpc/net/bpf_jit_comp64.c
> +++ b/arch/powerpc/net/bpf_jit_comp64.c
> @@ -73,6 +73,12 @@ void bpf_jit_build_prologue(u32 *image, struct 
> codegen_context *ctx)
>   {
>   int i;
>   
> +#ifdef PPC64_ELF_ABI_v2
> + PPC_BPF_LL(_R2, _R13, offsetof(struct paca_struct, kernel_toc));
> +#else
> + EMIT(PPC_RAW_NOP());
> +#endif

Can we avoid the #ifdef, using

if (__is_defined(PPC64_ELF_ABI_v2))
PPC_BPF_LL(_R2, _R13, offsetof(struct paca_struct, kernel_toc));
else
EMIT(PPC_RAW_NOP());

> +
>   /*
>* Initialize tail_call_cnt if we do tail calls.
>* Otherwise, put in NOPs so that it can be skipped when we are
> @@ -87,7 +93,7 @@ void bpf_jit_build_prologue(u32 *image, struct 
> codegen_context *ctx)
>   EMIT(PPC_RAW_NOP());
>   }
>   
> -#define BPF_TAILCALL_PROLOGUE_SIZE   8
> +#define BPF_TAILCALL_PROLOGUE_SIZE   12

Why not change that for v2 ABI only instead of adding a NOP ? ABI won't 
change during runtime AFAIU

>   
>   if (bpf_has_stack_frame(ctx)) {
>   /*

Re: [PATCH 02/13] powerpc32/bpf: Fix codegen for bpf-to-bpf calls

2022-01-10 Thread Christophe Leroy


Le 06/01/2022 à 12:45, Naveen N. Rao a écrit :
> Pad instructions emitted for BPF_CALL so that the number of instructions
> generated does not change for different function addresses. This is
> especially important for calls to other bpf functions, whose address
> will only be known during extra pass.

In first pass, 'image' is NULL and we emit the 4 instructions sequence 
already, so the code won't grow after first pass, it can only shrink.

On PPC32, a huge effort is made to minimise the situations where 'bl' 
cannot be used, see commit 2ec13df16704 ("powerpc/modules: Load modules 
closer to kernel text")

And if you take the 8xx for instance, a NOP is just like any other 
instruction: it takes one cycle.

If it is absolutely needed, then I'd prefer we use an out-of-line 
trampoline for the unlikely case and use 'bl' to that trampoline.

> 
> Fixes: 51c66ad849a703 ("powerpc/bpf: Implement extended BPF on PPC32")
> Cc: sta...@vger.kernel.org # v5.13+
> Signed-off-by: Naveen N. Rao 
> ---
>   arch/powerpc/net/bpf_jit_comp32.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/arch/powerpc/net/bpf_jit_comp32.c 
> b/arch/powerpc/net/bpf_jit_comp32.c
> index d3a52cd42f5346..997a47fa615b30 100644
> --- a/arch/powerpc/net/bpf_jit_comp32.c
> +++ b/arch/powerpc/net/bpf_jit_comp32.c
> @@ -191,6 +191,9 @@ void bpf_jit_emit_func_call_rel(u32 *image, struct 
> codegen_context *ctx, u64 fun
>   
>   if (image && rel < 0x200 && rel >= -0x200) {
>   PPC_BL_ABS(func);
> + EMIT(PPC_RAW_NOP());
> + EMIT(PPC_RAW_NOP());
> + EMIT(PPC_RAW_NOP());
>   } else {
>   /* Load function address into r0 */
>   EMIT(PPC_RAW_LIS(_R0, IMM_H(func)));

Re: [PATCH 01/13] bpf: Guard against accessing NULL pt_regs in bpf_get_task_stack()

2022-01-10 Thread Christophe Leroy


Le 06/01/2022 à 12:45, Naveen N. Rao a écrit :
> task_pt_regs() can return NULL on powerpc for kernel threads. This is
> then used in __bpf_get_stack() to check for user mode, resulting in a
> kernel oops. Guard against this by checking return value of
> task_pt_regs() before trying to obtain the call chain.

I started looking at that some time ago, and I'm wondering whether it is 
worth keeping that powerpc particularity.

We used to have a potentially different pt_regs depending on how we 
entered kernel, especially on PPC32, but since the following commits it 
is not the case anymore.

06d67d54741a ("powerpc: make process.c suitable for both 32-bit and 64-bit")
db297c3b07af ("powerpc/32: Don't save thread.regs on interrupt entry")
b5cfc9cd7b04 ("powerpc/32: Fix critical and debug interrupts on BOOKE")

We could therefore just do like other architectures, define

#define task_pt_regs(p) ((struct pt_regs *)(THREAD_SIZE + 
task_stack_page(p)) - 1)

And then remove the regs field we have in thread_struct.


> 
> Fixes: fa28dcb82a38f8 ("bpf: Introduce helper bpf_get_task_stack()")
> Cc: sta...@vger.kernel.org # v5.9+
> Signed-off-by: Naveen N. Rao 
> ---
>   kernel/bpf/stackmap.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index 6e75bbee39f0b5..0dcaed4d3f4cec 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -525,13 +525,14 @@ BPF_CALL_4(bpf_get_task_stack, struct task_struct *, 
> task, void *, buf,
>  u32, size, u64, flags)
>   {
>   struct pt_regs *regs;
> - long res;
> + long res = -EINVAL;
>   
>   if (!try_get_task_stack(task))
>   return -EFAULT;
>   
>   regs = task_pt_regs(task);
> - res = __bpf_get_stack(regs, task, NULL, buf, size, flags);
> + if (regs)
> + res = __bpf_get_stack(regs, task, NULL, buf, size, flags);

Should there be a comment explaining that on powerpc, 'regs' can be NULL 
for a kernel thread ?

>   put_task_stack(task);
>   
>   return res;
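A sketch of what such a comment could look like, purely illustrative:

	regs = task_pt_regs(task);
	/*
	 * On some architectures (currently powerpc), task_pt_regs() returns
	 * NULL for kernel threads; treat that as "no user context" instead
	 * of dereferencing it in __bpf_get_stack().
	 */
	if (regs)
		res = __bpf_get_stack(regs, task, NULL, buf, size, flags);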

Re: [PATCH 02/16] floppy: Remove usage of the deprecated "pci-dma-compat.h" API

2022-01-10 Thread Christoph Hellwig


Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH 06/23] powerpc/membarrier: Remove special barrier on mm switch

2022-01-10 Thread Christophe Leroy


Le 08/01/2022 à 17:43, Andy Lutomirski a écrit :
> powerpc did the following on some, but not all, paths through
> switch_mm_irqs_off():
> 
> /*
>  * Only need the full barrier when switching between processes.
>  * Barrier when switching from kernel to userspace is not
>  * required here, given that it is implied by mmdrop(). Barrier
>  * when switching from userspace to kernel is not needed after
>  * store to rq->curr.
>  */
> if (likely(!(atomic_read(&next->membarrier_state) &
>  (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
>   MEMBARRIER_STATE_GLOBAL_EXPEDITED)) || !prev))
> return;
> 
> This is puzzling: if !prev, then one might expect that we are switching
> from kernel to user, not user to kernel, which is inconsistent with the
> comment.  But this is all nonsense, because the one and only caller would
> never have prev == NULL and would, in fact, OOPS if prev == NULL.
> 
> In any event, this code is unnecessary, since the new generic
> membarrier_finish_switch_mm() provides the same barrier without arch help.

I can't find this function membarrier_finish_switch_mm(), either in 
Linus' tree or in the linux-next tree.

> 
> arch/powerpc/include/asm/membarrier.h remains as an empty header,
> because a later patch in this series will add code to it.
> 
> Cc: Michael Ellerman 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: Nicholas Piggin 
> Cc: Mathieu Desnoyers 
> Cc: Peter Zijlstra 
> Signed-off-by: Andy Lutomirski 
> ---
>   arch/powerpc/include/asm/membarrier.h | 24 
>   arch/powerpc/mm/mmu_context.c |  1 -
>   2 files changed, 25 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/membarrier.h 
> b/arch/powerpc/include/asm/membarrier.h
> index de7f79157918..b90766e95bd1 100644
> --- a/arch/powerpc/include/asm/membarrier.h
> +++ b/arch/powerpc/include/asm/membarrier.h
> @@ -1,28 +1,4 @@
>   #ifndef _ASM_POWERPC_MEMBARRIER_H
>   #define _ASM_POWERPC_MEMBARRIER_H
>   
> -static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
> -  struct mm_struct *next,
> -  struct task_struct *tsk)
> -{
> - /*
> -  * Only need the full barrier when switching between processes.
> -  * Barrier when switching from kernel to userspace is not
> -  * required here, given that it is implied by mmdrop(). Barrier
> -  * when switching from userspace to kernel is not needed after
> -  * store to rq->curr.
> -  */
> - if (IS_ENABLED(CONFIG_SMP) &&
> - likely(!(atomic_read(&next->membarrier_state) &
> -  (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
> -   MEMBARRIER_STATE_GLOBAL_EXPEDITED)) || !prev))
> - return;
> -
> - /*
> -  * The membarrier system call requires a full memory barrier
> -  * after storing to rq->curr, before going back to user-space.
> -  */
> - smp_mb();
> -}
> -
>   #endif /* _ASM_POWERPC_MEMBARRIER_H */
> diff --git a/arch/powerpc/mm/mmu_context.c b/arch/powerpc/mm/mmu_context.c
> index 74246536b832..5f2daa6b0497 100644
> --- a/arch/powerpc/mm/mmu_context.c
> +++ b/arch/powerpc/mm/mmu_context.c
> @@ -84,7 +84,6 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct 
> mm_struct *next,
>   asm volatile ("dssall");
>   
>   if (!new_on_cpu)
> - membarrier_arch_switch_mm(prev, next, tsk);

Are you sure that's what you want ?

It now means you have:

if (!new_on_cpu)
switch_mmu_context(prev, next, tsk);


>   
>   /*
>* The actual HW switching method differs between the various

Re: [PATCH] fs: btrfs: Disable BTRFS on platforms having 256K pages

2022-01-10 Thread Christophe Leroy
Hi Qu,

Le 05/01/2022 à 00:32, Qu Wenruo a écrit :
> Hi Christophe,
> 
> I'm recently enhancing the subpage support for btrfs, and my current
> branch should solve the problem for btrfs to support larger page sizes.
> 
> But unfortunately my current test environment can only provide page size
> with 64K or 4K, no 16K or 128K/256K support.
> 
> Mind to test my new branch on 128K page size systems?
> (256K page size support is still lacking though, which will be addressed
> in the future)


I don't have any system with disk, I only use flashdisks with UBIFS 
filesystem.

The reason why I did this commit was because of a build failure reported 
by Kernel Build Robot, that's it.

Also note that powerpc doesn't have 128K pages, only 4K/16K/64K/256K.

And 256K pages require a special version of ld and binutils that I 
don't have.

I have a board where I can do 16k pages, but again that board has no disk.

Christophe

> 
> https://github.com/adam900710/linux/tree/metadata_subpage_switch
> 
> Thanks,
> Qu
> 
> On 2021/6/10 13:23, Christophe Leroy wrote:
>> With a config having PAGE_SIZE set to 256K, BTRFS build fails
>> with the following message
>>
>>   include/linux/compiler_types.h:326:38: error: call to 
>> '__compiletime_assert_791' declared with attribute error: BUILD_BUG_ON 
>> failed: (BTRFS_MAX_COMPRESSED % PAGE_SIZE) != 0
>>
>> BTRFS_MAX_COMPRESSED being 128K, BTRFS cannot support platforms with
>> 256K pages at the time being.
>>
>> There are two platforms that can select 256K pages:
>>   - hexagon
>>   - powerpc
>>
>> Disable BTRFS when 256K page size is selected.
>>
>> Reported-by: kernel test robot 
>> Signed-off-by: Christophe Leroy 
>> ---
>>   fs/btrfs/Kconfig | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/fs/btrfs/Kconfig b/fs/btrfs/Kconfig
>> index 68b95ad82126..520a0f6a7d9e 100644
>> --- a/fs/btrfs/Kconfig
>> +++ b/fs/btrfs/Kconfig
>> @@ -18,6 +18,8 @@ config BTRFS_FS
>>   select RAID6_PQ
>>   select XOR_BLOCKS
>>   select SRCU
>> +    depends on !PPC_256K_PAGES    # powerpc
>> +    depends on !PAGE_SIZE_256KB    # hexagon
>>
>>   help
>>     Btrfs is a general purpose copy-on-write filesystem with extents,

Re: [PATCH v7 1/3] riscv: Introduce CONFIG_RELOCATABLE

2022-01-10 Thread Alexandre ghiti

Hi Palmer,

Do you think this could go in for-next?

Thanks,

Alex

On 12/6/21 10:44, Alexandre ghiti wrote:

@Palmer, can I do anything for that to be pulled in 5.17?

Thanks,

Alex

On 10/27/21 07:04, Alexandre ghiti wrote:

Hi Palmer,

On 10/26/21 11:29 PM, Palmer Dabbelt wrote:

On Sat, 09 Oct 2021 10:20:20 PDT (-0700), a...@ghiti.fr wrote:

Arf, I have sent this patchset with the wrong email address. @Palmer
tell me if you want me to resend it correctly.

Sorry for being kind of slow here.  It's fine: there's a "From:" in
the patch, and git picks those up so it'll match the signed-off-by
line.  I send pretty much all my patches that way, as I never managed
to get my Google address working correctly.


Thanks,

Alex

On 10/9/21 7:12 PM, Alexandre Ghiti wrote:

From: Alexandre Ghiti 

This config allows to compile 64b kernel as PIE and to relocate it at
any virtual address at runtime: this paves the way to KASLR.
Runtime relocation is possible since relocation metadata are
embedded into
the kernel.

IMO this should really be user selectable, at a bare minimum so it's
testable.
I just sent along a patch to do that (my power's off at home, so email
is a bit
wacky right now).

I haven't put this on for-next yet as I'm not sure if you had a fix
for the
kasan issue (which IIUC would conflict with this).


The kasan issue only revealed that I need to move the kasan shadow
memory around with sv48 support, that's not related to the relocatable
kernel.

Thanks,

Alex



Note that relocating at runtime introduces an overhead even if the
kernel is loaded at the same address it was linked at and that the
compiler
options are those used in arm64 which uses the same RELA relocation
format.

Signed-off-by: Alexandre Ghiti 
---
  arch/riscv/Kconfig  | 12 
  arch/riscv/Makefile |  7 +++--
  arch/riscv/kernel/vmlinux.lds.S |  6 
  arch/riscv/mm/Makefile  |  4 +++
  arch/riscv/mm/init.c    | 54 
-

  5 files changed, 80 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index ea16fa2dd768..043ba92559fa 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -213,6 +213,18 @@ config PGTABLE_LEVELS
  config LOCKDEP_SUPPORT
  def_bool y

+config RELOCATABLE
+    bool
+    depends on MMU && 64BIT && !XIP_KERNEL
+    help
+  This builds a kernel as a Position Independent Executable
(PIE),
+  which retains all relocation metadata required to
relocate the
+  kernel binary at runtime to a different virtual address
than the
+  address it was linked at.
+  Since RISCV uses the RELA relocation format, this 
requires a

+  relocation pass at runtime even if the kernel is loaded
at the
+  same address it was linked at.
+
  source "arch/riscv/Kconfig.socs"
  source "arch/riscv/Kconfig.erratas"

diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index 0eb4568fbd29..2f509915f246 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -9,9 +9,12 @@
  #

  OBJCOPYFLAGS    := -O binary
-LDFLAGS_vmlinux :=
+ifeq ($(CONFIG_RELOCATABLE),y)
+    LDFLAGS_vmlinux += -shared -Bsymbolic -z notext -z norelro
+    KBUILD_CFLAGS += -fPIE
+endif
  ifeq ($(CONFIG_DYNAMIC_FTRACE),y)
-    LDFLAGS_vmlinux := --no-relax
+    LDFLAGS_vmlinux += --no-relax
  KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
  CC_FLAGS_FTRACE := -fpatchable-function-entry=8
  endif
diff --git a/arch/riscv/kernel/vmlinux.lds.S
b/arch/riscv/kernel/vmlinux.lds.S
index 5104f3a871e3..862a8c09723c 100644
--- a/arch/riscv/kernel/vmlinux.lds.S
+++ b/arch/riscv/kernel/vmlinux.lds.S
@@ -133,6 +133,12 @@ SECTIONS

  BSS_SECTION(PAGE_SIZE, PAGE_SIZE, 0)

+    .rela.dyn : ALIGN(8) {
+    __rela_dyn_start = .;
+    *(.rela .rela*)
+    __rela_dyn_end = .;
+    }
+
  #ifdef CONFIG_EFI
  . = ALIGN(PECOFF_SECTION_ALIGNMENT);
  __pecoff_data_virt_size = ABSOLUTE(. - __pecoff_text_end);
diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile
index 7ebaef10ea1b..2d33ec574bbb 100644
--- a/arch/riscv/mm/Makefile
+++ b/arch/riscv/mm/Makefile
@@ -1,6 +1,10 @@
  # SPDX-License-Identifier: GPL-2.0-only

  CFLAGS_init.o := -mcmodel=medany
+ifdef CONFIG_RELOCATABLE
+CFLAGS_init.o += -fno-pie
+endif
+
  ifdef CONFIG_FTRACE
  CFLAGS_REMOVE_init.o = $(CC_FLAGS_FTRACE)
  CFLAGS_REMOVE_cacheflush.o = $(CC_FLAGS_FTRACE)
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index c0cddf0fc22d..42041c12d496 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -20,6 +20,9 @@
  #include 
  #include 
  #include 
+#ifdef CONFIG_RELOCATABLE
+#include 
+#endif

  #include 
  #include 
@@ -103,7 +106,7 @@ static void __init print_vm_layout(void)
  print_mlm("lowmem", (unsigned long)PAGE_OFFSET,
    (unsigned long)high_memory);
  #ifdef CONFIG_64BIT
-    print_mlm("kernel", (unsigned long)KERNEL_LINK_ADDR,
+    print_mlm("kernel", (unsigned 

Re: [PATCH v2 1/2] powerpc: Fix virt_addr_valid() check

2022-01-10 Thread Christophe Leroy


Le 25/12/2021 à 13:06, Kefeng Wang a écrit :
> When run ethtool eth0, the BUG occurred,
> 
>usercopy: Kernel memory exposure attempt detected from SLUB object not in 
> SLUB page?! (offset 0, size 1048)!
>kernel BUG at mm/usercopy.c:99
>...
>usercopy_abort+0x64/0xa0 (unreliable)
>__check_heap_object+0x168/0x190
>__check_object_size+0x1a0/0x200
>dev_ethtool+0x2494/0x2b20
>dev_ioctl+0x5d0/0x770
>sock_do_ioctl+0xf0/0x1d0
>sock_ioctl+0x3ec/0x5a0
>__se_sys_ioctl+0xf0/0x160
>system_call_exception+0xfc/0x1f0
>system_call_common+0xf8/0x200
> 
> The code shows below,
> 
>data = vzalloc(array_size(gstrings.len, ETH_GSTRING_LEN));
>copy_to_user(useraddr, data, gstrings.len * ETH_GSTRING_LEN))
> 
> The data is alloced by vmalloc(), virt_addr_valid(ptr) will return true
> on PowerPC64, which leads to the panic.
> 
> As commit 4dd7554a6456 ("powerpc/64: Add VIRTUAL_BUG_ON checks for __va
> and __pa addresses") does, make sure the virt addr above PAGE_OFFSET in
> the virt_addr_valid().

The change done by that commit only applies to PPC64.

The change you are doing applies to both PPC64 and PPC32. Did you verify 
the impact (or should I say the absence of impact) on PPC32 ?

> 
> Signed-off-by: Kefeng Wang 
> ---
>   arch/powerpc/include/asm/page.h | 5 -
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
> index 254687258f42..300d4c105a3a 100644
> --- a/arch/powerpc/include/asm/page.h
> +++ b/arch/powerpc/include/asm/page.h
> @@ -132,7 +132,10 @@ static inline bool pfn_valid(unsigned long pfn)
>   #define virt_to_page(kaddr) pfn_to_page(virt_to_pfn(kaddr))
>   #define pfn_to_kaddr(pfn)   __va((pfn) << PAGE_SHIFT)
>   
> -#define virt_addr_valid(kaddr)   pfn_valid(virt_to_pfn(kaddr))
> +#define virt_addr_valid(vaddr)   ({  
> \
> + unsigned long _addr = (unsigned long)vaddr; 
> \
> + (unsigned long)(_addr) >= PAGE_OFFSET && pfn_valid(virt_to_pfn(_addr)); 
> \

_addr is already an 'unsigned long' so you shouldn't need to cast it.

> +})
>   
>   /*
>* On Book-E parts we need __va to parse the device tree and we can't
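A sketch of the macro with the redundant cast dropped, as suggested above;
illustrative only, the extra parentheses around vaddr are just macro hygiene:

#define virt_addr_valid(vaddr)	({					\
	unsigned long _addr = (unsigned long)(vaddr);			\
	_addr >= PAGE_OFFSET && pfn_valid(virt_to_pfn(_addr));		\
})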