Re: [PATCH 14/14] uaccess: drop set_fs leftovers

2022-02-14 Thread Helge Deller
On 2/15/22 04:03, Al Viro wrote:
> On Mon, Feb 14, 2022 at 05:34:52PM +0100, Arnd Bergmann wrote:
>> diff --git a/arch/parisc/include/asm/futex.h b/arch/parisc/include/asm/futex.h
>> index b5835325d44b..2f4a1b1ef387 100644
>> --- a/arch/parisc/include/asm/futex.h
>> +++ b/arch/parisc/include/asm/futex.h
>> @@ -99,7 +99,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
>>  /* futex.c wants to do a cmpxchg_inatomic on kernel NULL, which is
>>   * our gateway page, and causes no end of trouble...
>>   */
>> -if (uaccess_kernel() && !uaddr)
>> +if (!uaddr)
>>  return -EFAULT;
>
>   Huh?  uaccess_kernel() is removed since it's now always false,
> so this looks odd.
>
>   AFAICS, the comment above that check refers to futex_detect_cmpxchg()
> -> cmpxchg_futex_value_locked() -> futex_atomic_cmpxchg_inatomic() call chain.
> Which had been gone since commit 3297481d688a (futex: Remove futex_cmpxchg
> detection).  The comment *and* the check should've been killed off back
> then.
>   Let's make sure to get both now...

Right. Arnd, can you drop this if() and the comment above it?

Thanks,
Helge
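
For reference, the removal Al is asking for amounts to deleting both the stale
comment and the now-dead check from futex_atomic_cmpxchg_inatomic(); a rough
sketch derived from the quoted hunk, not Arnd's actual respin:

--- a/arch/parisc/include/asm/futex.h
+++ b/arch/parisc/include/asm/futex.h
@@ futex_atomic_cmpxchg_inatomic
-	/* futex.c wants to do a cmpxchg_inatomic on kernel NULL, which is
-	 * our gateway page, and causes no end of trouble...
-	 */
-	if (uaccess_kernel() && !uaddr)
-		return -EFAULT;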


Re: [PATCH 09/14] m68k: drop custom __access_ok()

2022-02-14 Thread Al Viro
On Tue, Feb 15, 2022 at 07:29:42AM +0100, Christoph Hellwig wrote:
> On Tue, Feb 15, 2022 at 12:37:41AM +, Al Viro wrote:
> > Perhaps simply wrap that sucker into #ifdef CONFIG_CPU_HAS_ADDRESS_SPACES
> > (and trim the comment down to "coldfire and 68000 will pick generic
> > variant")?
> 
> I wonder if we should invert CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE,
> select the separate address space config for s390, sparc64, non-coldfire
> m68k and mips with EVA and then just have one single access_ok for
> overlapping address space (as added by Arnd) and non-overlapping ones
> (always return true).

parisc is also such...  How about

select ALTERNATE_SPACE_USERLAND

for that bunch?  While we are at it, how many unusual access_ok() instances are
left after this series?  arm64, itanic, um, anything else?

FWIW, sparc32 has a slightly unusual instance (see uaccess_32.h there); it's
obviously cheaper than generic and I wonder if the trick is legitimate (and
applicable elsewhere, perhaps)...
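
A minimal sketch of the split being discussed, assuming the separate-address-space
architectures (s390, sparc64, parisc, non-coldfire m68k, MIPS with EVA) select the
new symbol; both the symbol name (taken from Al's suggestion) and the generic header
name are assumptions, not code from the series as posted:

/*
 * Sketch only: CONFIG_ALTERNATE_SPACE_USERLAND is the name Al floats above,
 * and <asm-generic/access_ok.h> is assumed to be Arnd's generic helper.
 */
#ifdef CONFIG_ALTERNATE_SPACE_USERLAND
/* kernel and user live in distinct address spaces: any user pointer is
 * acceptable, the MMU/ASI does the real enforcement */
#define access_ok(addr, size)	(1)
#else
/* overlapping layouts keep the generic range check against TASK_SIZE_MAX */
#include <asm-generic/access_ok.h>
#endif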


Re: [PATCH 09/14] m68k: drop custom __access_ok()

2022-02-14 Thread Christoph Hellwig
On Tue, Feb 15, 2022 at 12:37:41AM +, Al Viro wrote:
> Perhaps simply wrap that sucker into #ifdef CONFIG_CPU_HAS_ADDRESS_SPACES
> (and trim the comment down to "coldfire and 68000 will pick generic
> variant")?

I wonder if we should invert CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE,
select the separate address space config for s390, sparc64, non-coldfire
m68k and mips with EVA and then just have one single access_ok for
overlapping address space (as added by Arnd) and non-overlapping ones
(always return true).


Re: [PATCH] powerpc/module_64: use module_init_section instead of patching names

2022-02-14 Thread Michael Ellerman
On Wed, 2 Feb 2022 05:51:23 +, Wedson Almeida Filho wrote:
> Without this patch, module init sections are disabled by patching their
> names in arch-specific code when they're loaded (which prevents code in
> layout_sections from finding init sections). This patch uses the new
> arch-specific module_init_section instead.
> 
> This allows modules that have .init_array sections to have the
> initialisers properly called (on load, before init). Without this patch,
> the initialisers are not called because .init_array is renamed to
> _init_array, and thus isn't found by code in find_module_sections().
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/module_64: use module_init_section instead of patching names
  https://git.kernel.org/powerpc/c/d4be60fe66b7380530868ceebe549f8eebccacc5

cheers


Re: [PATCH] powerpc: epapr: A typo fix

2022-02-14 Thread Michael Ellerman
On Sun, 21 Mar 2021 03:09:32 +0530, Bhaskar Chowdhury wrote:
> s/parmeters/parameters/
> 
> 

Applied to powerpc/next.

[1/1] powerpc: epapr: A typo fix
  https://git.kernel.org/powerpc/c/a1c414093370ed50e5b952d96d4ae775c7a18420

cheers


Re: [PATCH 1/1] powerpc/e500/qemu-e500: allow core to idle without waiting

2022-02-14 Thread Michael Ellerman
On Wed, 12 Jan 2022 12:24:59 +0100, Joachim Wiberg wrote:
> From: Tobias Waldekranz 
> 
> This means an idle guest won't needlessly consume an entire core on
> the host, waiting for work to show up.
> 
> 

Applied to powerpc/next.

[1/1] powerpc/e500/qemu-e500: allow core to idle without waiting
  https://git.kernel.org/powerpc/c/f529edd1b69ddf832c3257dcd34e15100038d6b7

cheers


Re: [PATCH] powerpc: dts: Fix some I2C unit addresses

2022-02-14 Thread Michael Ellerman
On Mon, 20 Dec 2021 14:40:36 +0100, Thierry Reding wrote:
> From: Thierry Reding 
> 
> The unit-addresses for the Maxim MAX1237 ADCs on XPedite5200 boards don't
> match the value in the "reg" property and cause a DTC warning.
> 
> 

Applied to powerpc/next.

[1/1] powerpc: dts: Fix some I2C unit addresses
  https://git.kernel.org/powerpc/c/d5342fdd163ae0553a14820021a107e03eb1ea72

cheers


Re: [PATCH]powerpc/xive: Export XIVE IPI information for online-only processors.

2022-02-14 Thread Michael Ellerman
On Thu, 06 Jan 2022 16:33:53 +0530, Sachin Sant wrote:
> Cédric pointed out that XIVE IPI information exported via sysfs
> (debug/powerpc/xive) display empty lines for processors which are
> not online.
> 
> Switch to using for_each_online_cpu() so that information is
> displayed for online-only processors.
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/xive: Export XIVE IPI information for online-only processors.
  https://git.kernel.org/powerpc/c/279d1a72c0f8021520f68ddb0a1346ff9ba1ea8c

cheers


Re: [PATCH] powerpc: add link stack flush mitigation status in debugfs.

2022-02-14 Thread Michael Ellerman
On Wed, 27 Nov 2019 23:09:59 +0100, Michal Suchanek wrote:
> The link stack flush status is not visible in debugfs. It can be enabled
> even when count cache flush is disabled. Add separate file for its
> status.
> 
> 

Applied to powerpc/next.

[1/1] powerpc: add link stack flush mitigation status in debugfs.
  https://git.kernel.org/powerpc/c/b2a6f6043577e09d51a4b5577fff9fc9f5b14b1c

cheers


Re: [PATCH] rpadlpar_io:Add MODULE_DESCRIPTION entries to kernel modules

2022-02-14 Thread Michael Ellerman
On Thu, 24 Sep 2020 10:44:16 +0530, Mamatha Inamdar wrote:
> This patch adds a brief MODULE_DESCRIPTION to rpadlpar_io kernel modules
> (descriptions taken from Kconfig file)
> 
> 

Applied to powerpc/next.

[1/1] rpadlpar_io:Add MODULE_DESCRIPTION entries to kernel modules
  https://git.kernel.org/powerpc/c/be7be1c6c6f8bd348f0d83abe7a8f0e21bdaeac8

cheers


Re: [PATCH v5] powerpc/pseries: read the lpar name from the firmware

2022-02-14 Thread Michael Ellerman
On Thu, 6 Jan 2022 17:13:39 +0100, Laurent Dufour wrote:
> The LPAR name may be changed after the LPAR has been started in the HMC.
> In that case lparstat command is not reporting the updated value because it
> reads it from the device tree which is read at boot time.
> 
> However this value could be read from RTAS.
> 
> Adding this value to the /proc/powerpc/lparcfg output allows reading the
> updated value.
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/pseries: read the lpar name from the firmware
  https://git.kernel.org/powerpc/c/eddaa9a402758d379520f6511fb61e89990698aa

cheers


Re: [PATCH] powerpc/spufs: adjust list element pointer type

2022-02-14 Thread Michael Ellerman
On Fri, 08 May 2020 09:12:56 +, Julia Lawall wrote:
> Other uses of >aff_list_head, eg in spufs_assert_affinity, indicate
> that the list elements have type spu_context, not spu as used here.  Change
> the type of tmp accordingly.
> 
> This has no impact on the execution, because tmp is not used in the body of
> the loop.
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/spufs: adjust list element pointer type
  https://git.kernel.org/powerpc/c/925f76c55784fdc17ab41aecde06b30439ceb73a

cheers


Re: [PATCH] powerpc: Fix debug print in smp_setup_cpu_maps

2022-02-14 Thread Michael Ellerman
On Wed, 20 Jan 2021 15:18:47 -0300, Fabiano Rosas wrote:
> When figuring out the number of threads, the debug message prints "1
> thread" for the first iteration of the loop, instead of the actual
> number of threads calculated from the length of the
> "ibm,ppc-interrupt-server#s" property.
> 
>   * /cpus/PowerPC,POWER8@20...
> ibm,ppc-interrupt-server#s -> 1 threads <--- WRONG
> thread 0 -> cpu 0 (hard id 32)
> thread 1 -> cpu 1 (hard id 33)
> thread 2 -> cpu 2 (hard id 34)
> thread 3 -> cpu 3 (hard id 35)
> thread 4 -> cpu 4 (hard id 36)
> thread 5 -> cpu 5 (hard id 37)
> thread 6 -> cpu 6 (hard id 38)
> thread 7 -> cpu 7 (hard id 39)
>   * /cpus/PowerPC,POWER8@28...
> ibm,ppc-interrupt-server#s -> 8 threads
> thread 0 -> cpu 8 (hard id 40)
> thread 1 -> cpu 9 (hard id 41)
> thread 2 -> cpu 10 (hard id 42)
> thread 3 -> cpu 11 (hard id 43)
> thread 4 -> cpu 12 (hard id 44)
> thread 5 -> cpu 13 (hard id 45)
> thread 6 -> cpu 14 (hard id 46)
> thread 7 -> cpu 15 (hard id 47)
> (...)
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc: Fix debug print in smp_setup_cpu_maps
  https://git.kernel.org/powerpc/c/b53c86105919d4136591e3bee198a4829c0f5062

cheers


Re: [PATCH] macintosh: macio_asic: remove useless cast for driver.name

2022-02-14 Thread Michael Ellerman
On Tue, 25 Jan 2022 13:54:21 +, Corentin Labbe wrote:
> pci_driver name is a const char pointer, so the cast is not necessary.
> 
> 

Applied to powerpc/next.

[1/1] macintosh: macio_asic: remove useless cast for driver.name
  https://git.kernel.org/powerpc/c/ccafe7c20b7de330d9091a114c9985305759f1ee

cheers


Re: [PATCH] powerpc/ptdump: Fix sparse warning in hashpagetable.c

2022-02-14 Thread Michael Ellerman
On Sun, 30 Jan 2022 18:39:18 +, Christophe Leroy wrote:
>   arch/powerpc/mm/ptdump/hashpagetable.c:264:29: warning: restricted __be64 
> degrades to integer
>   arch/powerpc/mm/ptdump/hashpagetable.c:265:49: warning: restricted __be64 
> degrades to integer
>   arch/powerpc/mm/ptdump/hashpagetable.c:267:36: warning: incorrect type in 
> assignment (different base types)
>   arch/powerpc/mm/ptdump/hashpagetable.c:267:36:expected unsigned long 
> long [usertype]
>   arch/powerpc/mm/ptdump/hashpagetable.c:267:36:got restricted __be64 
> [usertype] v
>   arch/powerpc/mm/ptdump/hashpagetable.c:268:36: warning: incorrect type in 
> assignment (different base types)
>   arch/powerpc/mm/ptdump/hashpagetable.c:268:36:expected unsigned long 
> long [usertype]
>   arch/powerpc/mm/ptdump/hashpagetable.c:268:36:got restricted __be64 
> [usertype] r
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/ptdump: Fix sparse warning in hashpagetable.c
  https://git.kernel.org/powerpc/c/961f649fb3ad9a9e384c695a050d776d970ddabd

cheers


Re: [PATCH] powerpc/nohash: Remove pte_same()

2022-02-14 Thread Michael Ellerman
On Mon, 31 Jan 2022 08:16:48 +, Christophe Leroy wrote:
> arch/powerpc/include/asm/nohash/{32/64}/pgtable.h has
> 
>   #define __HAVE_ARCH_PTE_SAME
>   #define pte_same(A,B)  ((pte_val(A) ^ pte_val(B)) == 0)
> 
> include/linux/pgtable.h has
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/nohash: Remove pte_same()
  https://git.kernel.org/powerpc/c/535bda36dbf2d271f59e06fe252c32eff452666d

cheers


Re: [PATCH v2] powerpc/xive: Add some error handling code to 'xive_spapr_init()'

2022-02-14 Thread Michael Ellerman
On Tue, 1 Feb 2022 13:31:16 +0100, Christophe JAILLET wrote:
> 'xive_irq_bitmap_add()' can return -ENOMEM.
> In this case, we should free the memory already allocated and return
> 'false' to the caller.
> 
> Also add an error path which undoes the 'tima = ioremap(...)'
> 
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/xive: Add some error handling code to 'xive_spapr_init()'
  https://git.kernel.org/powerpc/c/e414e2938ee26e734f19e92a60cd090ebaff37e6

cheers


Re: [PATCH] powerpc/603: Remove outdated comment

2022-02-14 Thread Michael Ellerman
On Mon, 31 Jan 2022 07:15:12 +, Christophe Leroy wrote:
> Since commit 84de6ab0e904 ("powerpc/603: don't handle PAGE_ACCESSED
> in TLB miss handlers.") page table is not updated anymore by
> TLB miss handlers.
> 
> Remove the comment.
> 
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/603: Remove outdated comment
  https://git.kernel.org/powerpc/c/9872cbfb4558bf68219c5a8a65fd5c29b593323d

cheers


Re: [PATCH] powerpc/603: Clear C bit when PTE is read only

2022-02-14 Thread Michael Ellerman
On Mon, 31 Jan 2022 07:17:57 +, Christophe Leroy wrote:
> On book3s/32 MMU, PP bits don't offer kernel RO protection,
> kernel pages are always RW.
> 
> However, on the 603 a page fault is always generated when the
> C bit (change bit = dirty bit) is not set.
> 
> Enforce kernel RO protection by clearing C bit in TLB miss
> handler when the page doesn't have _PAGE_RW flag.
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/603: Clear C bit when PTE is read only
  https://git.kernel.org/powerpc/c/4634bf4455fe26f07dabf97c3585c9ccb86353c4

cheers


Re: [PATCH] powerpc/32s: Make pte_update() non atomic on 603 core

2022-02-14 Thread Michael Ellerman
On Sun, 30 Jan 2022 10:29:34 +, Christophe Leroy wrote:
> On the 603 core, TLB miss handlers don't make any changes to the
> page tables, so pte_update() doesn't need to be atomic.
> 
> 

Applied to powerpc/next.

[1/1] powerpc/32s: Make pte_update() non atomic on 603 core
  https://git.kernel.org/powerpc/c/4291d085b0b07a78403e845c187428b038c901cd

cheers


Re: [PATCH] powerpc/kasan: Fix early region not updated correctly

2022-02-14 Thread Michael Ellerman
On Wed, 29 Dec 2021 11:52:26 +0800, Chen Jingwen wrote:
> The shadow's page table is not updated when PTE_RPN_SHIFT is 24
> and PAGE_SHIFT is 12. It not only causes false positives but
> also false negatives, as shown in the following text.
> 
> Fix it by bringing the logic of kasan_early_shadow_page_entry here.
> 
> 1. False Positive:
> ==
> BUG: KASAN: vmalloc-out-of-bounds in pcpu_alloc+0x508/0xa50
> Write of size 16 at addr f57f3be0 by task swapper/0/1
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/kasan: Fix early region not updated correctly
  https://git.kernel.org/powerpc/c/dd75080aa8409ce10d50fb58981c6b59bf8707d3

cheers


Re: [PATCH v3] powerpc: dts: t1040rdb: fix ports names for Seville Ethernet switch

2022-02-14 Thread Michael Ellerman
On Fri, 21 Jan 2022 12:14:47 +0300, Maxim Kiselev wrote:
> On board rev A, the network interface labels for the switch ports
> written on the front panel are different than on rev B and later.
> 
> This patch fixes network interface names for the switch ports according
> to labels that are written on the front panel of the board rev B.
> They start from ETH3 and end at ETH10.
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc: dts: t1040rdb: fix ports names for Seville Ethernet switch
  https://git.kernel.org/powerpc/c/5ebb74749202a25da4b3cc2eb15470225a05527c

cheers


Re: [PATCH] powerpc: dts: t104xrdb: fix phy type for FMAN 4/5

2022-02-14 Thread Michael Ellerman
On Thu, 30 Dec 2021 18:11:21 +0300, Maxim Kiselev wrote:
> T1040RDB has two RTL8211E-VB phys which require setting
> internal delays to work correctly.
> 
> Changing the phy-connection-type property to `rgmii-id`
> will fix this issue.
> 
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc: dts: t104xrdb: fix phy type for FMAN 4/5
  https://git.kernel.org/powerpc/c/17846485dff91acce1ad47b508b633dffc32e838

cheers


Re: [PATCH V2] powerpc/perf: Fix task context setting for trace imc

2022-02-14 Thread Michael Ellerman
On Wed, 2 Feb 2022 09:48:37 +0530, Athira Rajeev wrote:
> Trace IMC (In-Memory collection counters) in powerpc is
> useful for application level profiling. For trace_imc,
> presently task context (task_ctx_nr) is set to
> perf_hw_context. But perf_hw_context is to be used for
> cpu PMU. So for trace_imc, even though it is per thread
> PMU, it is preferred to use sw_context in order to be able
> to do application level monitoring. Hence change the
> task_ctx_nr to use perf_sw_context.
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/perf: Fix task context setting for trace imc
  https://git.kernel.org/powerpc/c/0198322379c25215b2778482bf1221743a76e2b5

cheers
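
The fix described above boils down to switching the context type on the
trace_imc PMU, roughly as follows (illustrative only; see the linked commit
for the exact code):

	/* trace_imc is a per-thread PMU; use the software context so it can
	 * be grouped for application-level profiling */
	pmu->task_ctx_nr = perf_sw_context;	/* previously perf_hw_context */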


Re: [PATCH 14/14] uaccess: drop set_fs leftovers

2022-02-14 Thread Al Viro
On Mon, Feb 14, 2022 at 05:34:52PM +0100, Arnd Bergmann wrote:
> diff --git a/arch/parisc/include/asm/futex.h b/arch/parisc/include/asm/futex.h
> index b5835325d44b..2f4a1b1ef387 100644
> --- a/arch/parisc/include/asm/futex.h
> +++ b/arch/parisc/include/asm/futex.h
> @@ -99,7 +99,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
>   /* futex.c wants to do a cmpxchg_inatomic on kernel NULL, which is
>* our gateway page, and causes no end of trouble...
>*/
> - if (uaccess_kernel() && !uaddr)
> + if (!uaddr)
>   return -EFAULT;

Huh?  uaccess_kernel() is removed since it's now always false,
so this looks odd.

AFAICS, the comment above that check refers to futex_detect_cmpxchg()
-> cmpxchg_futex_value_locked() -> futex_atomic_cmpxchg_inatomic() call chain.
Which had been gone since commit 3297481d688a (futex: Remove futex_cmpxchg
detection).  The comment *and* the check should've been killed off back
then.
Let's make sure to get both now...


Re: [PATCH 04/14] x86: use more conventional access_ok() definition

2022-02-14 Thread Al Viro
On Mon, Feb 14, 2022 at 08:17:07PM +, Al Viro wrote:
> On Mon, Feb 14, 2022 at 12:01:05PM -0800, Linus Torvalds wrote:
> > On Mon, Feb 14, 2022 at 11:46 AM Arnd Bergmann  wrote:
> > >
> > > As Al pointed out, they turned out to be necessary on sparc64, but the only
> > > definitions are on sparc64 and x86, so it's possible that they serve a similar
> > > purpose here, in which case changing the limit from TASK_SIZE to
> > > TASK_SIZE_MAX is probably wrong as well.
> > 
> > x86-64 has always(*) used TASK_SIZE_MAX for access_ok(), and the
> > get_user() assembler implementation does the same.
> > 
> > I think any __range_not_ok() users that use TASK_SIZE are entirely
> > historical, and should be just fixed.
> 
> IIRC, that was mostly userland stack trace collection in perf.
> I'll try to dig in archives and see what shows up - it's been
> a while ago...

After some digging:

access_ok() needs only to make sure that MMU won't go anywhere near
the kernel page tables; address limit for 32bit threads is none of its
concern, so TASK_SIZE_MAX is right for it.

valid_user_frame() in arch/x86/events/core.c: used while walking
the userland call chain.  The reason it's not access_ok() is only that
perf_callchain_user() might've been called from interrupt that came while
we'd been under KERNEL_DS.
That had been back in 2015 and it had been obsoleted since 2017, commit
88b0193d9418 (perf/callchain: Force USER_DS when invoking 
perf_callchain_user()).
We had been guaranteed USER_DS ever since.
IOW, it could've reverted to use of access_ok() at any point after that.
TASK_SIZE vs TASK_SIZE_MAX is pretty much an accident there - might've been
TASK_SIZE_MAX from the very beginning.

copy_stack_frame() in arch/x86/kernel/stacktrace.c: similar story,
except the commit that made sure callers will have USER_DS - cac9b9a4b083
(stacktrace: Force USER_DS for stack_trace_save_user()) in this case.
Also could've been using access_ok() just fine.  Amusingly, access_ok()
used to be there, until it had been replaced with explicit check on
Jul 22 2019 - 4 days after that had been made useless by fix in the caller...

copy_from_user_nmi().  That one is a bit more interesting.
We have a call chain from perf_output_sample_ustack() (covered by
force_uaccess_begin() these days, not that it mattered for x86 now),
there's something odd in dumpstack.c:copy_code() (with explicit check
for TASK_SIZE_MAX in the caller) and there's a couple of callers in
Intel PMU code.
AFAICS, there's no reason whatsoever to use TASK_SIZE
in that one - the point is to prevent copyin from the kernel
memory, and in that respect TASK_SIZE_MAX isn't any worse.
The check in copy_code() probably should go.

So all of those guys should be simply switched to access_ok().
Might be worth making that a preliminary patch - it's independent
from everything else and there's no point folding it into any of the
patches in the series.
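
In other words, the direction for those callers is roughly this (a sketch of
the idea, not a submitted patch):

/* arch/x86/events/core.c, sketch: with set_fs() gone, the open-coded
 * __range_not_ok(..., TASK_SIZE) test can become a plain access_ok() */
static bool valid_user_frame(const void __user *fp, unsigned long size)
{
	return access_ok(fp, size);
}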


Re: [PATCH 11/14] sparc64: remove CONFIG_SET_FS support

2022-02-14 Thread Al Viro
On Mon, Feb 14, 2022 at 05:34:49PM +0100, Arnd Bergmann wrote:

> -/*
> - * Sparc64 is segmented, though more like the M68K than the I386.
> - * We use the secondary ASI to address user memory, which references a
> - * completely different VM map, thus there is zero chance of the user
> - * doing something queer and tricking us into poking kernel memory.

Actually, this part of comment probably ought to stay - it is relevant
for understanding what's going on (e.g. why is access_ok() always true, etc.)


Re: [PATCH 09/14] m68k: drop custom __access_ok()

2022-02-14 Thread Al Viro
On Mon, Feb 14, 2022 at 05:34:47PM +0100, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> While most m68k platforms use separate address spaces for user
> and kernel space, at least coldfire does not, and the other
> ones have a TASK_SIZE that is less than the entire 4GB address
> range.
> 
> Using the generic implementation of __access_ok() stops coldfire
> user space from trivially accessing kernel memory, and is probably
> the right thing elsewhere for consistency as well.

Perhaps simply wrap that sucker into #ifdef CONFIG_CPU_HAS_ADDRESS_SPACES
(and trim the comment down to "coldfire and 68000 will pick generic
variant")?

> Signed-off-by: Arnd Bergmann 
> ---
>  arch/m68k/include/asm/uaccess.h | 13 -
>  1 file changed, 13 deletions(-)
> 
> diff --git a/arch/m68k/include/asm/uaccess.h b/arch/m68k/include/asm/uaccess.h
> index d6bb5720365a..64914872a5c9 100644
> --- a/arch/m68k/include/asm/uaccess.h
> +++ b/arch/m68k/include/asm/uaccess.h
> @@ -10,19 +10,6 @@
>  #include 
>  #include 
>  #include 
> -
> -/* We let the MMU do all checking */
> -static inline int __access_ok(const void __user *addr,
> - unsigned long size)
> -{
> - /*
> -  * XXX: for !CONFIG_CPU_HAS_ADDRESS_SPACES this really needs to check
> -  * for TASK_SIZE!
> -  * Removing this helper is probably sufficient.
> -  */
> - return 1;
> -}
> -#define __access_ok __access_ok
>  #include 
>  
>  /*
> -- 
> 2.29.2
> 
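
A sketch of the wrapping Al suggests above, as an alternative to removing the
helper outright (illustrative only, not a submitted patch):

--- a/arch/m68k/include/asm/uaccess.h
+++ b/arch/m68k/include/asm/uaccess.h
@@
+#ifdef CONFIG_CPU_HAS_ADDRESS_SPACES
 /* We let the MMU do all checking */
 static inline int __access_ok(const void __user *addr,
			      unsigned long size)
 {
	return 1;
 }
 #define __access_ok __access_ok
+#endif	/* coldfire and 68000 will pick the generic variant */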


Re: [PATCH 05/14] uaccess: add generic __{get,put}_kernel_nofault

2022-02-14 Thread Al Viro
On Mon, Feb 14, 2022 at 05:34:43PM +0100, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> All architectures that don't provide __{get,put}_kernel_nofault() yet
> can implement this on top of __{get,put}_user.
> 
> Add a generic version that lets everything use the normal
> copy_{from,to}_kernel_nofault() code based on these, removing the last
> use of get_fs()/set_fs() from architecture-independent code.

I'd put the list of those architectures (AFAICS, that's alpha, ia64,
microblaze, nds32, nios2, openrisc, sh, sparc32, xtensa) into commit
message - it's not that hard to find out, but...

And AFAICS, you've missed nios2 - see
#define __put_user(x, ptr) put_user(x, ptr)
in there.  nds32 oddities are dealt with earlier in the series, this
one is not...
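
For context, the generic fallback being added is roughly of this shape (a
sketch of the idea; the exact macro in the series may differ):

/* reuse __get_user() by force-casting the kernel pointer to a user pointer,
 * so copy_from_kernel_nofault() can go through the usual fixup machinery */
#define __get_kernel_nofault(dst, src, type, err_label)			\
do {									\
	type __user *p = (type __force __user *)(src);			\
	type data;							\
	if (__get_user(data, p))					\
		goto err_label;						\
	*(type *)dst = data;						\
} while (0)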


RE: [PATCH 04/14] x86: use more conventional access_ok() definition

2022-02-14 Thread David Laight
From: Linus Torvalds
> Sent: 14 February 2022 20:24
> >
> > x86-64 has always(*) used TASK_SIZE_MAX for access_ok(), and the
> > get_user() assembler implementation does the same.
> 
> Side note: we could just check the sign bit instead, and avoid big
> constants that way.

The cheap test for most 64bit is (addr | size) >> 62 != 0.

I did some tests last week and the compilers correctly optimise
out constant size.

Doesn't sparc64 still need a wrap test?
Or is that assumed because there is always an unmapped page
and transfer are 'adequately' done on increasing addresses?

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)
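
The test David describes can be written as below (a hedged illustration, not
kernel code): with user space confined below 1UL << 62, a single shift of
(addr | size) rejects any huge start address or huge length, and since both
then stay below 2^62 their sum cannot reach a kernel address at or above 2^63,
so no 64-bit constant has to be loaded.

static inline int range_ok_cheap(const void __user *addr, unsigned long size)
{
	/* start and length both below 1UL << 62, so addr + size < 1UL << 63 */
	return (((unsigned long)addr | size) >> 62) == 0;
}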


Re: [PATCH 08/14] arm64: simplify access_ok()

2022-02-14 Thread Robin Murphy

On 2022-02-14 16:34, Arnd Bergmann wrote:

From: Arnd Bergmann 

arm64 has an inline asm implementation of access_ok() that is derived from
the 32-bit arm version and optimized for the case that both the limit and
the size are variable. With set_fs() gone, the limit is always constant,
and the size usually is as well, so just using the default implementation
reduces the check into a comparison against a constant that can be
scheduled by the compiler.


Aww, I still vividly remember the birth of this madness, sat with my 
phone on a Saturday morning waiting for my bike to be MOT'd, staring at 
the 7-instruction sequence that Mark and I had come up with and certain 
that it could be shortened still. Kinda sad to see it go, but at the 
same time, glad that it can.


Acked-by: Robin Murphy 


On a defconfig build, this saves over 28KB of .text.


Not to mention saving those "WTF is going on there... oh yeah, 
access_ok()" moments when looking through disassembly :)


Cheers,
Robin.


Signed-off-by: Arnd Bergmann 
---
  arch/arm64/include/asm/uaccess.h | 28 +---
  1 file changed, 5 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 357f7bd9c981..e8dce0cc5eaa 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -26,6 +26,8 @@
  #include 
  #include 
  
+static inline int __access_ok(const void __user *ptr, unsigned long size);

+
  /*
   * Test whether a block of memory is a valid user space address.
   * Returns 1 if the range is valid, 0 otherwise.
@@ -33,10 +35,8 @@
   * This is equivalent to the following test:
   * (u65)addr + (u65)size <= (u65)TASK_SIZE_MAX
   */
-static inline unsigned long __access_ok(const void __user *addr, unsigned long size)
+static inline int access_ok(const void __user *addr, unsigned long size)
  {
-   unsigned long ret, limit = TASK_SIZE_MAX - 1;
-
/*
 * Asynchronous I/O running in a kernel thread does not have the
 * TIF_TAGGED_ADDR flag of the process owning the mm, so always untag
@@ -46,27 +46,9 @@ static inline unsigned long __access_ok(const void __user *addr, unsigned long size)
(current->flags & PF_KTHREAD || test_thread_flag(TIF_TAGGED_ADDR)))
addr = untagged_addr(addr);
  
-	__chk_user_ptr(addr);

-   asm volatile(
-   // A + B <= C + 1 for all A,B,C, in four easy steps:
-   // 1: X = A + B; X' = X % 2^64
-   "  adds%0, %3, %2\n"
-   // 2: Set C = 0 if X > 2^64, to guarantee X' > C in step 4
-   "  csel%1, xzr, %1, hi\n"
-   // 3: Set X' = ~0 if X >= 2^64. For X == 2^64, this decrements X'
-   //to compensate for the carry flag being set in step 4. For
-   //X > 2^64, X' merely has to remain nonzero, which it does.
-   "  csinv   %0, %0, xzr, cc\n"
-   // 4: For X < 2^64, this gives us X' - C - 1 <= 0, where the -1
-   //comes from the carry in being clear. Otherwise, we are
-   //testing X' - C == 0, subject to the previous adjustments.
-   "  sbcsxzr, %0, %1\n"
-   "  cset%0, ls\n"
-   : "=&r" (ret), "+r" (limit) : "Ir" (size), "0" (addr) : "cc");
-
-   return ret;
+   return likely(__access_ok(addr, size));
  }
-#define __access_ok __access_ok
+#define access_ok access_ok
  
  #include 
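
For comparison, the default check the patch falls back to reduces to roughly
the following once the limit is the compile-time constant TASK_SIZE_MAX (the
"(u65)addr + (u65)size <= (u65)TASK_SIZE_MAX" test in the comment above); this
is an illustration of the idea, not the exact asm-generic code:

static inline int __access_ok_sketch(const void __user *ptr, unsigned long size)
{
	unsigned long limit = TASK_SIZE_MAX;
	unsigned long addr = (unsigned long)ptr;

	/* two unsigned compares, no overflow: equivalent to addr + size <= limit */
	return size <= limit && addr <= limit - size;
}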
  


Re: [PATCH 04/14] x86: use more conventional access_ok() definition

2022-02-14 Thread Linus Torvalds
On Mon, Feb 14, 2022 at 12:01 PM Linus Torvalds
 wrote:
>
> x86-64 has always(*) used TASK_SIZE_MAX for access_ok(), and the
> get_user() assembler implementation does the same.

Side note: we could just check the sign bit instead, and avoid big
constants that way.

Right now we actually have this complexity in the x86-64 user access code:

  #ifdef CONFIG_X86_5LEVEL
  #define LOAD_TASK_SIZE_MINUS_N(n) \
ALTERNATIVE __stringify(mov $((1 << 47) - 4096 - (n)),%rdx), \
__stringify(mov $((1 << 56) - 4096 - (n)),%rdx), X86_FEATURE_LA57
  #else
  #define LOAD_TASK_SIZE_MINUS_N(n) \
  mov $(TASK_SIZE_MAX - (n)),%_ASM_DX
  #endif

just because the code tries to get that TASK_SIZE_MAX boundary just right.

And getting that boundary just right is important on 32-bit x86, but
it's *much* less important on x86-64.

There's still a (weak) reason to do it even for 64-bit code: page
faults outside the valid user space range don't actually cause a #PF
fault - they cause #GP - and then we have the #GP handler warn about
"this address hasn't been checked".

Which is nice and useful for doing syzbot kind of randomization loads
(ie user accesses that didn't go through access_ok() will stand out
nicely), but maybe it's not worth this. syzbot would be fine with only
the "sign bit set" case warning for the same thing.

So on x86-64, we could just check the sign of the address instead, and
simplify and shrink those get/put_user() code sequences (but
array_index_mask_nospec() currently uses the carry flag computation
too, so we'd have to change that part as well, maybe not worth it).

  Linus
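
A hedged sketch of the sign-bit idea (not merged code): user pointers on
x86-64 live in the lower half of the address space, so rejecting anything
whose start, length or end has bit 63 set avoids loading the TASK_SIZE_MAX
constant entirely, and the unmapped gap above TASK_SIZE_MAX catches the rest.

static inline bool ptr_in_user_half(const void __user *ptr, unsigned long size)
{
	unsigned long addr = (unsigned long)ptr;

	/* reject if addr, size, or addr + size reaches the kernel half */
	return (long)(addr | size | (addr + size)) >= 0;
}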


Re: [PATCH 04/14] x86: use more conventional access_ok() definition

2022-02-14 Thread Al Viro
On Mon, Feb 14, 2022 at 12:01:05PM -0800, Linus Torvalds wrote:
> On Mon, Feb 14, 2022 at 11:46 AM Arnd Bergmann  wrote:
> >
> > As Al pointed out, they turned out to be necessary on sparc64, but the only
> > definitions are on sparc64 and x86, so it's possible that they serve a similar
> > purpose here, in which case changing the limit from TASK_SIZE to
> > TASK_SIZE_MAX is probably wrong as well.
> 
> x86-64 has always(*) used TASK_SIZE_MAX for access_ok(), and the
> get_user() assembler implementation does the same.
> 
> I think any __range_not_ok() users that use TASK_SIZE are entirely
> historical, and should be just fixed.

IIRC, that was mostly userland stack trace collection in perf.
I'll try to dig in archives and see what shows up - it's been
a while ago...


Re: [PATCH 04/14] x86: use more conventional access_ok() definition

2022-02-14 Thread Linus Torvalds
On Mon, Feb 14, 2022 at 11:46 AM Arnd Bergmann  wrote:
>
> As Al pointed out, they turned out to be necessary on sparc64, but the only
> definitions are on sparc64 and x86, so it's possible that they serve a similar
> purpose here, in which case changing the limit from TASK_SIZE to
> TASK_SIZE_MAX is probably wrong as well.

x86-64 has always(*) used TASK_SIZE_MAX for access_ok(), and the
get_user() assembler implementation does the same.

I think any __range_not_ok() users that use TASK_SIZE are entirely
historical, and should be just fixed.

 Linus

(*) And by "always" I mean "as far back as I bothered to go". In the
2.6.12 git import, we had

#define USER_DS  MAKE_MM_SEG(PAGE_OFFSET)

so the user access limit was actually not really TASK_SIZE_MAX at all,
but the beginning of the kernel mapping, which on x86-64 is much much
higher.


Re: [PATCH 04/14] x86: use more conventional access_ok() definition

2022-02-14 Thread Christoph Hellwig
On Mon, Feb 14, 2022 at 08:45:52PM +0100, Arnd Bergmann wrote:
> As Al pointed out, they turned out to be necessary on sparc64, but the only
> definitions are on sparc64 and x86, so it's possible that they serve a similar
> purpose here, in which case changing the limit from TASK_SIZE to
> TASK_SIZE_MAX is probably wrong as well.
> 
> So either I need to revert the original definition as I did on sparc64, or
> they can be removed completely. Hopefully Al or the x86 maintainers
> can clarify.

Looking at the x86 users I think:

 - valid_user_frame should go away and the caller should use get_user
   instead of __get_user
 - the one in copy_code can just go away, as there is another check
   in copy_from_user_nmi
 - copy_stack_frame should just use access_ok
 - as does copy_from_user_nmi

but yes, having someone who actually knows this code look over it
would be very helpful.


Re: [PATCH 04/14] x86: use more conventional access_ok() definition

2022-02-14 Thread Arnd Bergmann
On Mon, Feb 14, 2022 at 6:02 PM Christoph Hellwig  wrote:
>
> On Mon, Feb 14, 2022 at 05:34:42PM +0100, Arnd Bergmann wrote:
> > +#define __range_not_ok(addr, size, limit)	(!__access_ok(addr, size))
> > +#define __chk_range_not_ok(addr, size, limit)	(!__access_ok((void __user *)addr, size))
>
> Can we just kill these off instead of letting them obfuscate the code?

As Al pointed out, they turned out to be necessary on sparc64, but the only
definitions are on sparc64 and x86, so it's possible that they serve a similar
purpose here, in which case changing the limit from TASK_SIZE to
TASK_SIZE_MAX is probably wrong as well.

So either I need to revert the original definition as I did on sparc64, or
they can be removed completely. Hopefully Al or the x86 maintainers
can clarify.

 Arnd


Re: [PATCH 10/14] uaccess: remove most CONFIG_SET_FS users

2022-02-14 Thread Arnd Bergmann
On Mon, Feb 14, 2022 at 6:06 PM Christoph Hellwig  wrote:
>
> On Mon, Feb 14, 2022 at 05:34:48PM +0100, Arnd Bergmann wrote:
> > From: Arnd Bergmann 
> >
> > On almost all architectures, there are no remaining callers
> > of set_fs(), so CONFIG_SET_FS can be disabled, along with
> > removing the thread_info field and any references to it.
> >
> > This turns access_ok() into a cheaper check against TASK_SIZE_MAX.
>
> Wouldn't it make more sense to just merge this into the last patch?

Yes, sounds good. I wasn't sure at first if there is enough buy-in to get
all architectures cleaned up, and I hadn't done the ia64 patch, so it
seemed more important to do this part early, but now it seems that it
will all go in at the same time, so doing this as part of a big removal
at the end makes sense.

Arnd


Re: [PATCH 07/14] uaccess: generalize access_ok()

2022-02-14 Thread Arnd Bergmann
On Mon, Feb 14, 2022 at 6:15 PM Al Viro  wrote:
>
> On Mon, Feb 14, 2022 at 05:34:45PM +0100, Arnd Bergmann wrote:
>
> > diff --git a/arch/csky/kernel/signal.c b/arch/csky/kernel/signal.c
> > index c7b763d2f526..8867ddf3e6c7 100644
> > --- a/arch/csky/kernel/signal.c
> > +++ b/arch/csky/kernel/signal.c
> > @@ -136,7 +136,7 @@ static inline void __user *get_sigframe(struct ksignal 
> > *ksig,
> >  static int
> >  setup_rt_frame(struct ksignal *ksig, sigset_t *set, struct pt_regs *regs)
> >  {
> > - struct rt_sigframe *frame;
> > + struct rt_sigframe __user *frame;
> >   int err = 0;
> >
> >   frame = get_sigframe(ksig, regs, sizeof(*frame));
>
> Minor nit: might make sense to separate annotations (here, on nios2, etc.) 
> from the rest...

Done.

> > -}
> > -
> > -static inline int access_ok(const void __user * addr, unsigned long size)
> > -{
> > - return 1;
> > -}
> > +#define __range_not_ok(addr, size, limit) (!__access_ok(addr, size))
>
> is really wrong.  For sparc64, access_ok() should always be true.
> This __range_not_ok() thing is used *only* for valid_user_frame() in
> arch/sparc/kernel/perf_event.c - it's not a part of normal access_ok()
> there.
>
> sparc64 has separate address spaces for kernel and for userland; access_ok()
> had never been useful there.

Ok, fixed as well now. I had the access_ok() bit right, the definition just
moved around here so it comes before the #include, but I missed the
bit about __range_not_ok(), which I have now reverted back to the
correct version in my tree.

Arnd


Re: [PATCH v2 11/13] powerpc/ftrace: directly call of function graph tracer by ftrace caller

2022-02-14 Thread Steven Rostedt
On Mon, 14 Feb 2022 22:54:23 +0530
"Naveen N. Rao"  wrote:

> For x86, commit 0c0593b45c9b4e ("x86/ftrace: Make function graph use 
> ftrace directly") also adds recursion check before the call to 
> function_graph_enter() in prepare_ftrace_return(). Do we need that on
> powerpc as well?

Yes. The function_graph_enter() does not provide any recursion protection,
so if it were to call something that gets function graph traced, it will
crash the machine.

-- Steve
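
For reference, the x86 commit Naveen cites wraps the call in the generic
ftrace recursion API; a hedged sketch of what that guard looks like, modeled
on the x86 version rather than an actual powerpc patch:

#include <linux/ftrace.h>
#include <linux/trace_recursion.h>

extern void return_to_handler(void);

void prepare_ftrace_return(unsigned long ip, unsigned long *parent,
			   unsigned long frame_pointer)
{
	int bit;

	/* bail out if we are already inside the tracer: function_graph_enter()
	 * itself provides no recursion protection */
	bit = ftrace_test_recursion_trylock(ip, *parent);
	if (bit < 0)
		return;

	if (!function_graph_enter(*parent, ip, frame_pointer, parent))
		*parent = (unsigned long)&return_to_handler;

	ftrace_test_recursion_unlock(bit);
}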


Re: [BUG] mtd: cfi_cmdset_0002: write regression since v4.17-rc1

2022-02-14 Thread Tokunori Ikegami

Hi Ahmad-san,

On 2022/02/15 1:22, Ahmad Fatoum wrote:

Hello Tokunori-san,

On 13.02.22 17:47, Tokunori Ikegami wrote:

Hi Ahmad-san,

Thanks for your confirmations. Sorry for late to reply.

No worries. I appreciate you taking the time.


Could you please try the patch attached to disable the chip_good() change as 
before?
I think this should work for S29GL964N since the chip_ready() is used and works 
as mentioned.

yes, this resolves my issue:
Tested-by: Ahmad Fatoum 

Thanks for your testing. I have just sent the patch to review.



Doesn't seem to be a buffered write issue here though as the writes
did work fine before dfeae1073583. Any other ideas?

At first I thought the issue could be resolved by using word writes
instead of buffered writes.
Now I am thinking of partially disabling the dfeae1073583 changes, under
some condition, if possible.

What seems to work for me is checking if chip_good or chip_ready
and map_word is equal to 0xFF. I can't justify why this is ok though.
(Worst case bus is floating at this point of time and Hi-Z is read
as 0xff on CPU data lines...)

Sorry, I am not sure about this.
I thought chip_ready() itself was correct, as it was implemented according to
the data sheet in the past.
But it did not work correctly, so it was changed to use chip_good() instead,
which is also correct.

What exactly in the datasheet makes you believe chip_good is not appropriate?
I was just describing the actual failure behaviours: chip_good() did not
work on S29GL964N, and chip_ready() did not work on MX29GL512FHT2I-11G
before, etc.
Anyway, let me recheck the data sheet details; I only looked at it again
quickly and it needs more investigation to understand.


Regards,
Ikegami



Cheers,
Ahmad




Re: [PATCH] powerpc/boot: Add `otheros-too-big.bld` to .gitignore

2022-02-14 Thread Geoff Levand
Hi Paul,

On 2/13/22 22:55, Paul Menzel wrote:
> Currently, `git status` lists the file as untracked by git, so tell git
> to ignore it.

Thanks for your contribution.

Acked-by: Geoff Levand 



Re: [PATCH v1 2/2] mm: enforce pageblock_order < MAX_ORDER

2022-02-14 Thread Zi Yan
On 14 Feb 2022, at 12:41, David Hildenbrand wrote:

> Some places in the kernel don't really expect pageblock_order >=
> MAX_ORDER, and it looks like this is only possible in corner cases:
>
> 1) CONFIG_DEFERRED_STRUCT_PAGE_INIT we'll end up freeing pageblock_order
>pages via __free_pages_core(), which cannot possibly work.
>
> 2) find_zone_movable_pfns_for_nodes() will roundup the ZONE_MOVABLE
>start PFN to MAX_ORDER_NR_PAGES. Consequently with a bigger
>pageblock_order, we could have a single pageblock partially managed by
>two zones.
>
> 3) compaction code runs into __fragmentation_index() with order
>>= MAX_ORDER, when checking WARN_ON_ONCE(order >= MAX_ORDER). [1]
>
> 4) mm/page_reporting.c won't be reporting any pages with default
>page_reporting_order == pageblock_order, as we'll be skipping the
>reporting loop inside page_reporting_process_zone().
>
> 5) __rmqueue_fallback() will never be able to steal with
>ALLOC_NOFRAGMENT.
>
> pageblock_order >= MAX_ORDER is weird either way: it's a pure
> optimization for making alloc_contig_range(), as used for allocation of
> gigantic pages, a little more reliable to succeed. However, if there is
> demand for somewhat reliable allocation of gigantic pages, affected setups
> should be using CMA or boottime allocations instead.
>
> So let's make sure that pageblock_order < MAX_ORDER and simplify.
>
> [1] https://lkml.kernel.org/r/87r189a2ks@linux.ibm.com
>
> Signed-off-by: David Hildenbrand 
> ---
>  drivers/virtio/virtio_mem.c |  9 +++--
>  include/linux/cma.h |  3 +--
>  include/linux/pageblock-flags.h |  7 +--
>  mm/Kconfig  |  3 +++
>  mm/page_alloc.c | 32 
>  5 files changed, 20 insertions(+), 34 deletions(-)

LGTM. Thanks. Reviewed-by: Zi Yan 

--
Best Regards,
Yan, Zi




Re: [PATCH v1 1/2] cma: factor out minimum alignment requirement

2022-02-14 Thread Zi Yan
On 14 Feb 2022, at 12:41, David Hildenbrand wrote:

> Let's factor out determining the minimum alignment requirement for CMA
> and add a helpful comment.
>
> No functional change intended.
>
> Signed-off-by: David Hildenbrand 
> ---
>  arch/powerpc/include/asm/fadump-internal.h |  5 -
>  arch/powerpc/kernel/fadump.c   |  2 +-
>  drivers/of/of_reserved_mem.c   |  9 +++--
>  include/linux/cma.h|  9 +
>  kernel/dma/contiguous.c|  4 +---
>  mm/cma.c   | 20 +---
>  6 files changed, 19 insertions(+), 30 deletions(-)

LGTM. Thanks. Reviewed-by: Zi Yan 


--
Best Regards,
Yan, Zi




Re: [PATCH v2 12/13] powerpc/ftrace: Prepare ftrace_64_mprofile.S for reuse by PPC32

2022-02-14 Thread Naveen N. Rao

Christophe Leroy wrote:

PPC64 mprofile versions and PPC32 are very similar.

Modify PPC64 version so that if can be reused for PPC32.

Signed-off-by: Christophe Leroy 
---
 .../powerpc/kernel/trace/ftrace_64_mprofile.S | 73 +--
 1 file changed, 51 insertions(+), 22 deletions(-)


While I agree that ppc32 and -mprofile-kernel ftrace code are very
similar, I think this patch adds way too many #ifdefs. IMHO, this
makes the resultant code quite difficult to follow.


- Naveen



diff --git a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
index 6071e0122797..56da60e98327 100644
--- a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
+++ b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
@@ -34,13 +34,16 @@
  */
 _GLOBAL(ftrace_regs_caller)
/* Save the original return address in A's stack frame */
-   std r0,LRSAVE(r1)
+#ifdef CONFIG_MPROFILE_KERNEL
+   PPC_STL r0,LRSAVE(r1)
+#endif
 
 	/* Create our stack frame + pt_regs */

-   stdur1,-SWITCH_FRAME_SIZE(r1)
+   PPC_STLUr1,-SWITCH_FRAME_SIZE(r1)
 
 	/* Save all gprs to pt_regs */

SAVE_GPR(0, r1)
+#ifdef CONFIG_PPC64
SAVE_GPRS(2, 11, r1)
 
 	/* Ok to continue? */

@@ -49,10 +52,13 @@ _GLOBAL(ftrace_regs_caller)
beq ftrace_no_trace
 
 	SAVE_GPRS(12, 31, r1)

+#else
+   stmwr2, GPR2(r1)
+#endif
 
 	/* Save previous stack pointer (r1) */

addir8, r1, SWITCH_FRAME_SIZE
-   std r8, GPR1(r1)
+   PPC_STL r8, GPR1(r1)
 
 	/* Load special regs for save below */

mfmsr   r8
@@ -63,10 +69,11 @@ _GLOBAL(ftrace_regs_caller)
/* Get the _mcount() call site out of LR */
mflrr7
/* Save it as pt_regs->nip */
-   std r7, _NIP(r1)
+   PPC_STL r7, _NIP(r1)
/* Save the read LR in pt_regs->link */
-   std r0, _LINK(r1)
+   PPC_STL r0, _LINK(r1)
 
+#ifdef CONFIG_PPC64

/* Save callee's TOC in the ABI compliant location */
std r2, 24(r1)
ld  r2,PACATOC(r13) /* get kernel TOC in r2 */
@@ -74,8 +81,12 @@ _GLOBAL(ftrace_regs_caller)
addis   r3,r2,function_trace_op@toc@ha
addir3,r3,function_trace_op@toc@l
ld  r5,0(r3)
+#else
+   lis r3,function_trace_op@ha
+   lwz r5,function_trace_op@l(r3)
+#endif
 
-#ifdef CONFIG_LIVEPATCH

+#ifdef CONFIG_LIVEPATCH_64
mr  r14,r7  /* remember old NIP */
 #endif
/* Calculate ip from nip-4 into r3 for call below */
@@ -85,10 +96,10 @@ _GLOBAL(ftrace_regs_caller)
mr  r4, r0
 
 	/* Save special regs */

-   std r8, _MSR(r1)
-   std r9, _CTR(r1)
-   std r10, _XER(r1)
-   std r11, _CCR(r1)
+   PPC_STL r8, _MSR(r1)
+   PPC_STL r9, _CTR(r1)
+   PPC_STL r10, _XER(r1)
+   PPC_STL r11, _CCR(r1)
 
 	/* Load _regs in r6 for call below */

addir6, r1 ,STACK_FRAME_OVERHEAD
@@ -100,27 +111,32 @@ ftrace_regs_call:
nop
 
 	/* Load ctr with the possibly modified NIP */

-   ld  r3, _NIP(r1)
+   PPC_LL  r3, _NIP(r1)
mtctr   r3
-#ifdef CONFIG_LIVEPATCH
+#ifdef CONFIG_LIVEPATCH_64
cmpdr14, r3 /* has NIP been altered? */
 #endif
 
 	/* Restore gprs */

-   REST_GPR(0, r1)
+#ifdef CONFIG_PPC64
REST_GPRS(2, 31, r1)
+#else
+   lmw r2, GPR2(r1)
+#endif
 
 	/* Restore possibly modified LR */

-   ld  r0, _LINK(r1)
+   PPC_LL  r0, _LINK(r1)
mtlrr0
 
+#ifdef CONFIG_PPC64

/* Restore callee's TOC */
ld  r2, 24(r1)
+#endif
 
 	/* Pop our stack frame */

addi r1, r1, SWITCH_FRAME_SIZE
 
-#ifdef CONFIG_LIVEPATCH

+#ifdef CONFIG_LIVEPATCH_64
 /* Based on the cmpd above, if the NIP was altered handle livepatch */
bne-livepatch_handler
 #endif
@@ -129,6 +145,7 @@ ftrace_regs_call:
 _GLOBAL(ftrace_stub)
blr
 
+#ifdef CONFIG_PPC64

 ftrace_no_trace:
mflrr3
mtctr   r3
@@ -136,25 +153,31 @@ ftrace_no_trace:
addir1, r1, SWITCH_FRAME_SIZE
mtlrr0
bctr
+#endif
 
 _GLOBAL(ftrace_caller)

/* Save the original return address in A's stack frame */
-   std r0, LRSAVE(r1)
+#ifdef CONFIG_MPROFILE_KERNEL
+   PPC_STL r0, LRSAVE(r1)
+#endif
 
 	/* Create our stack frame + pt_regs */

-   stdur1, -SWITCH_FRAME_SIZE(r1)
+   PPC_STLUr1, -SWITCH_FRAME_SIZE(r1)
 
 	/* Save all gprs to pt_regs */

SAVE_GPRS(3, 10, r1)
 
+#ifdef CONFIG_PPC64

lbz r3, PACA_FTRACE_ENABLED(r13)
cmpdi   r3, 0
beq ftrace_no_trace
+#endif
 
 	/* Get the _mcount() call site out of LR */

mflrr7
-   std r7, _NIP(r1)
+   PPC_STL r7, _NIP(r1)
 
+#ifdef CONFIG_PPC64

/* Save callee's TOC in the ABI compliant location */
std r2, 24(r1)
ld  r2, PACATOC(r13)/* get kernel TOC in r2 */
@@ -162,6 

[PATCH v1 2/2] mm: enforce pageblock_order < MAX_ORDER

2022-02-14 Thread David Hildenbrand
Some places in the kernel don't really expect pageblock_order >=
MAX_ORDER, and it looks like this is only possible in corner cases:

1) CONFIG_DEFERRED_STRUCT_PAGE_INIT we'll end up freeing pageblock_order
   pages via __free_pages_core(), which cannot possibly work.

2) find_zone_movable_pfns_for_nodes() will roundup the ZONE_MOVABLE
   start PFN to MAX_ORDER_NR_PAGES. Consequently with a bigger
   pageblock_order, we could have a single pageblock partially managed by
   two zones.

3) compaction code runs into __fragmentation_index() with order
   >= MAX_ORDER, when checking WARN_ON_ONCE(order >= MAX_ORDER). [1]

4) mm/page_reporting.c won't be reporting any pages with default
   page_reporting_order == pageblock_order, as we'll be skipping the
   reporting loop inside page_reporting_process_zone().

5) __rmqueue_fallback() will never be able to steal with
   ALLOC_NOFRAGMENT.

pageblock_order >= MAX_ORDER is weird either way: it's a pure
optimization for making alloc_contig_range(), as used for allocation of
gigantic pages, a little more reliable to succeed. However, if there is
demand for somewhat reliable allocation of gigantic pages, affected setups
should be using CMA or boottime allocations instead.

So let's make sure that pageblock_order < MAX_ORDER and simplify.

[1] https://lkml.kernel.org/r/87r189a2ks@linux.ibm.com

Signed-off-by: David Hildenbrand 
---
 drivers/virtio/virtio_mem.c |  9 +++--
 include/linux/cma.h |  3 +--
 include/linux/pageblock-flags.h |  7 +--
 mm/Kconfig  |  3 +++
 mm/page_alloc.c | 32 
 5 files changed, 20 insertions(+), 34 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 38becd8d578c..e7d6b679596d 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -2476,13 +2476,10 @@ static int virtio_mem_init_hotplug(struct virtio_mem *vm)
  VIRTIO_MEM_DEFAULT_OFFLINE_THRESHOLD);
 
/*
-* We want subblocks to span at least MAX_ORDER_NR_PAGES and
-* pageblock_nr_pages pages. This:
-* - Is required for now for alloc_contig_range() to work reliably -
-*   it doesn't properly handle smaller granularity on ZONE_NORMAL.
+* TODO: once alloc_contig_range() works reliably with pageblock
+* granularity on ZONE_NORMAL, use pageblock_nr_pages instead.
 */
-   sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
-   pageblock_nr_pages) * PAGE_SIZE;
+   sb_size = PAGE_SIZE * MAX_ORDER_NR_PAGES;
sb_size = max_t(uint64_t, vm->device_block_size, sb_size);
 
if (sb_size < memory_block_size_bytes() && !force_bbm) {
diff --git a/include/linux/cma.h b/include/linux/cma.h
index 75fe188ec4a1..b1ba94f1cc9c 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -25,8 +25,7 @@
  * -- can deal with only some pageblocks of a higher-order page being
  *  MIGRATE_CMA, we can use pageblock_nr_pages.
  */
-#define CMA_MIN_ALIGNMENT_PAGES max_t(phys_addr_t, MAX_ORDER_NR_PAGES, \
- pageblock_nr_pages)
+#define CMA_MIN_ALIGNMENT_PAGES MAX_ORDER_NR_PAGES
 #define CMA_MIN_ALIGNMENT_BYTES (PAGE_SIZE * CMA_MIN_ALIGNMENT_PAGES)
 
 struct cma;
diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h
index 973fd731a520..83c7248053a1 100644
--- a/include/linux/pageblock-flags.h
+++ b/include/linux/pageblock-flags.h
@@ -37,8 +37,11 @@ extern unsigned int pageblock_order;
 
 #else /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
 
-/* Huge pages are a constant size */
-#define pageblock_order	HUGETLB_PAGE_ORDER
+/*
+ * Huge pages are a constant size, but don't exceed the maximum allocation
+ * granularity.
+ */
+#define pageblock_order	min_t(unsigned int, HUGETLB_PAGE_ORDER, MAX_ORDER - 1)
 
 #endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
 
diff --git a/mm/Kconfig b/mm/Kconfig
index 3326ee3903f3..4c91b92e7537 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -262,6 +262,9 @@ config HUGETLB_PAGE_SIZE_VARIABLE
  HUGETLB_PAGE_ORDER when there are multiple HugeTLB page sizes 
available
  on a platform.
 
+ Note that the pageblock_order cannot exceed MAX_ORDER - 1 and will be
+ clamped down to MAX_ORDER - 1.
+
 config CONTIG_ALLOC
def_bool (MEMORY_ISOLATION && COMPACTION) || CMA
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3589febc6d31..04cf964b57b5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1072,14 +1072,12 @@ static inline void __free_one_page(struct page *page,
int migratetype, fpi_t fpi_flags)
 {
struct capture_control *capc = task_capc(zone);
+   unsigned int max_order = pageblock_order;
unsigned long buddy_pfn;
unsigned long combined_pfn;
-   unsigned int max_order;
struct page *buddy;
bool to_tail;
 
-   max_order = 

[PATCH v1 1/2] cma: factor out minimum alignment requirement

2022-02-14 Thread David Hildenbrand
Let's factor out determining the minimum alignment requirement for CMA
and add a helpful comment.

No functional change intended.

Signed-off-by: David Hildenbrand 
---
 arch/powerpc/include/asm/fadump-internal.h |  5 -
 arch/powerpc/kernel/fadump.c   |  2 +-
 drivers/of/of_reserved_mem.c   |  9 +++--
 include/linux/cma.h|  9 +
 kernel/dma/contiguous.c|  4 +---
 mm/cma.c   | 20 +---
 6 files changed, 19 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump-internal.h b/arch/powerpc/include/asm/fadump-internal.h
index 52189928ec08..81bcb9abb371 100644
--- a/arch/powerpc/include/asm/fadump-internal.h
+++ b/arch/powerpc/include/asm/fadump-internal.h
@@ -19,11 +19,6 @@
 
 #define memblock_num_regions(memblock_type)(memblock.memblock_type.cnt)
 
-/* Alignment per CMA requirement. */
-#define FADUMP_CMA_ALIGNMENT   (PAGE_SIZE <<   \
-max_t(unsigned long, MAX_ORDER - 1,\
-pageblock_order))
-
 /* FAD commands */
 #define FADUMP_REGISTER1
 #define FADUMP_UNREGISTER  2
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index d03e488cfe9c..7eb67201ea41 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -544,7 +544,7 @@ int __init fadump_reserve_mem(void)
if (!fw_dump.nocma) {
fw_dump.boot_memory_size =
ALIGN(fw_dump.boot_memory_size,
- FADUMP_CMA_ALIGNMENT);
+ CMA_MIN_ALIGNMENT_BYTES);
}
 #endif
 
diff --git a/drivers/of/of_reserved_mem.c b/drivers/of/of_reserved_mem.c
index 9c0fb962c22b..75caa6f5d36f 100644
--- a/drivers/of/of_reserved_mem.c
+++ b/drivers/of/of_reserved_mem.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "of_private.h"
 
@@ -116,12 +117,8 @@ static int __init __reserved_mem_alloc_size(unsigned long 
node,
if (IS_ENABLED(CONFIG_CMA)
&& of_flat_dt_is_compatible(node, "shared-dma-pool")
&& of_get_flat_dt_prop(node, "reusable", NULL)
-   && !nomap) {
-   unsigned long order =
-   max_t(unsigned long, MAX_ORDER - 1, pageblock_order);
-
-   align = max(align, (phys_addr_t)PAGE_SIZE << order);
-   }
+   && !nomap)
+   align = max_t(phys_addr_t, align, CMA_MIN_ALIGNMENT_BYTES);
 
	prop = of_get_flat_dt_prop(node, "alloc-ranges", &len);
if (prop) {
diff --git a/include/linux/cma.h b/include/linux/cma.h
index bd801023504b..75fe188ec4a1 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -20,6 +20,15 @@
 
 #define CMA_MAX_NAME 64
 
+/*
+ * TODO: once the buddy -- especially pageblock merging and 
alloc_contig_range()
+ * -- can deal with only some pageblocks of a higher-order page being
+ *  MIGRATE_CMA, we can use pageblock_nr_pages.
+ */
+#define CMA_MIN_ALIGNMENT_PAGES max_t(phys_addr_t, MAX_ORDER_NR_PAGES, \
+ pageblock_nr_pages)
+#define CMA_MIN_ALIGNMENT_BYTES (PAGE_SIZE * CMA_MIN_ALIGNMENT_PAGES)
+
 struct cma;
 
 extern unsigned long totalcma_pages;
diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index 3d63d91cba5c..6ea80ae42622 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -399,8 +399,6 @@ static const struct reserved_mem_ops rmem_cma_ops = {
 
 static int __init rmem_cma_setup(struct reserved_mem *rmem)
 {
-   phys_addr_t align = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order);
-   phys_addr_t mask = align - 1;
unsigned long node = rmem->fdt_node;
bool default_cma = of_get_flat_dt_prop(node, "linux,cma-default", NULL);
struct cma *cma;
@@ -416,7 +414,7 @@ static int __init rmem_cma_setup(struct reserved_mem *rmem)
of_get_flat_dt_prop(node, "no-map", NULL))
return -EINVAL;
 
-   if ((rmem->base & mask) || (rmem->size & mask)) {
+   if (!IS_ALIGNED(rmem->base | rmem->size, CMA_MIN_ALIGNMENT_BYTES)) {
pr_err("Reserved memory: incorrect alignment of CMA region\n");
return -EINVAL;
}
diff --git a/mm/cma.c b/mm/cma.c
index bc9ca8f3c487..5a2cd5851658 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -168,7 +168,6 @@ int __init cma_init_reserved_mem(phys_addr_t base, 
phys_addr_t size,
 struct cma **res_cma)
 {
struct cma *cma;
-   phys_addr_t alignment;
 
/* Sanity checks */
if (cma_area_count == ARRAY_SIZE(cma_areas)) {
@@ -179,15 +178,12 @@ int __init cma_init_reserved_mem(phys_addr_t base, 
phys_addr_t size,
if (!size || !memblock_is_region_reserved(base, size))
return -EINVAL;
 
-   /* ensure minimal 

[PATCH v1 0/2] mm: enforce pageblock_order < MAX_ORDER

2022-02-14 Thread David Hildenbrand
Having pageblock_order >= MAX_ORDER seems to be able to happen in corner
cases and some parts of the kernel are not prepared for it.

For example, Aneesh has shown [1] that such kernels can be compiled on
ppc64 with 64k base pages by setting FORCE_MAX_ZONEORDER=8, which will run
into a WARN_ON_ONCE(order >= MAX_ORDER) in compaction code right during
boot.

We can get pageblock_order >= MAX_ORDER when the default hugetlb size is
bigger than the maximum allocation granularity of the buddy, in which case
we are no longer talking about huge pages but instead gigantic pages.

Having pageblock_order >= MAX_ORDER can only make alloc_contig_range() of
such gigantic pages more likely to succeed.

Reliable use of gigantic pages either requires boot time allocation or CMA,
no need to overcomplicate some places in the kernel to optimize for corner
cases that are broken in other areas of the kernel.

Let's enforce pageblock_order < MAX_ORDER and simplify.

Especially patch #1 can be regarded a cleanup before:
[PATCH v5 0/6] Use pageblock_order for cma and alloc_contig_range
alignment. [2]

[1] https://lkml.kernel.org/r/87r189a2ks@linux.ibm.com
[2] https://lkml.kernel.org/r/20220211164135.1803616-1-zi@sent.com

Cc: Andrew Morton 
Cc: Aneesh Kumar K V 
Cc: Zi Yan 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Rob Herring 
Cc: Frank Rowand 
Cc: "Michael S. Tsirkin" 
Cc: Christoph Hellwig 
Cc: Marek Szyprowski 
Cc: Robin Murphy 
Cc: Minchan Kim 
Cc: Vlastimil Babka 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: devicet...@vger.kernel.org
Cc: virtualizat...@lists.linux-foundation.org
Cc: io...@lists.linux-foundation.org
Cc: linux...@kvack.org

David Hildenbrand (2):
  cma: factor out minimum alignment requirement
  mm: enforce pageblock_order < MAX_ORDER

 arch/powerpc/include/asm/fadump-internal.h |  5 
 arch/powerpc/kernel/fadump.c   |  2 +-
 drivers/of/of_reserved_mem.c   |  9 ++
 drivers/virtio/virtio_mem.c|  9 ++
 include/linux/cma.h|  8 ++
 include/linux/pageblock-flags.h|  7 +++--
 kernel/dma/contiguous.c|  4 +--
 mm/Kconfig |  3 ++
 mm/cma.c   | 20 --
 mm/page_alloc.c| 32 ++
 10 files changed, 37 insertions(+), 62 deletions(-)


base-commit: 754e0b0e35608ed5206d6a67a791563c631cec07
-- 
2.34.1



Re: [PATCH 00/14] clean up asm/uaccess.h, kill set_fs for good

2022-02-14 Thread Linus Torvalds
On Mon, Feb 14, 2022 at 8:35 AM Arnd Bergmann  wrote:
>
> I did a patch for microblaze at some point, which turned out to be fairly
> generic, and now ported it to most other architectures, using new generic
> implementations of access_ok() and __{get,put}_kernel_nocheck().

Thanks for doing this.

Apart from the sparc64 issue with completely separate address spaces
(so access_ok() should always return true like Al pointed out), this
looks excellent to me.

Somebody should check that there aren't other cases like sparc64, but
let's merge this asap other than that.

  Linus


Re: [PATCH 07/14] uaccess: generalize access_ok()

2022-02-14 Thread Al Viro
On Mon, Feb 14, 2022 at 05:34:45PM +0100, Arnd Bergmann wrote:

> diff --git a/arch/csky/kernel/signal.c b/arch/csky/kernel/signal.c
> index c7b763d2f526..8867ddf3e6c7 100644
> --- a/arch/csky/kernel/signal.c
> +++ b/arch/csky/kernel/signal.c
> @@ -136,7 +136,7 @@ static inline void __user *get_sigframe(struct ksignal 
> *ksig,
>  static int
>  setup_rt_frame(struct ksignal *ksig, sigset_t *set, struct pt_regs *regs)
>  {
> - struct rt_sigframe *frame;
> + struct rt_sigframe __user *frame;
>   int err = 0;
>  
>   frame = get_sigframe(ksig, regs, sizeof(*frame));

Minor nit: might make sense to separate annotations (here, on nios2, etc.) from 
the rest...

This, OTOH,

> diff --git a/arch/sparc/include/asm/uaccess_64.h 
> b/arch/sparc/include/asm/uaccess_64.h
> index 5c12fb46bc61..000bac67cf31 100644
> --- a/arch/sparc/include/asm/uaccess_64.h
> +++ b/arch/sparc/include/asm/uaccess_64.h
...
> -static inline bool __chk_range_not_ok(unsigned long addr, unsigned long 
> size, unsigned long limit)
> -{
> - if (__builtin_constant_p(size))
> - return addr > limit - size;
> -
> - addr += size;
> - if (addr < size)
> - return true;
> -
> - return addr > limit;
> -}
> -
> -#define __range_not_ok(addr, size, limit)   \
> -({  \
> - __chk_user_ptr(addr);   \
> - __chk_range_not_ok((unsigned long __force)(addr), size, limit); \
> -})
> -
> -static inline int __access_ok(const void __user * addr, unsigned long size)
> -{
> - return 1;
> -}
> -
> -static inline int access_ok(const void __user * addr, unsigned long size)
> -{
> - return 1;
> -}
> +#define __range_not_ok(addr, size, limit) (!__access_ok(addr, size))

is really wrong.  For sparc64, access_ok() should always be true.
This __range_not_ok() thing is used *only* for valid_user_frame() in
arch/sparc/kernel/perf_event.c - it's not a part of normal access_ok()
there.

sparc64 has separate address spaces for kernel and for userland; access_ok()
had never been useful there.  
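
IOW, all such an architecture needs is the trivial definition it already
had - a sketch matching the code being deleted above:

static inline int access_ok(const void __user *addr, unsigned long size)
{
	/*
	 * Kernel and userland live in separate address spaces (separate
	 * ASIs on sparc64), so a user pointer can never reach kernel
	 * memory and there is nothing to range-check here.
	 */
	return 1;
}
#define access_ok access_ok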


Re: [PATCH v2 11/13] powerpc/ftrace: directly call of function graph tracer by ftrace caller

2022-02-14 Thread Naveen N. Rao

Christophe Leroy wrote:

Modify function graph tracer to be handled directly by the standard
ftrace caller.

This is made possible as powerpc now supports
CONFIG_DYNAMIC_FTRACE_WITH_ARGS.

This change simplifies the call of function graph ftrace.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/ftrace.h |  6 ++
 arch/powerpc/kernel/trace/ftrace.c| 11 
 arch/powerpc/kernel/trace/ftrace_32.S | 53 +--
 .../powerpc/kernel/trace/ftrace_64_mprofile.S | 64 +--
 4 files changed, 20 insertions(+), 114 deletions(-)

diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index 45c3d6f11daa..70b457097098 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -38,6 +38,12 @@ static __always_inline void 
ftrace_instruction_pointer_set(struct ftrace_regs *f
 {
regs_set_return_ip(>regs, ip);
 }
+
+struct ftrace_ops;
+
+#define ftrace_graph_func ftrace_graph_func
+void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
+  struct ftrace_ops *op, struct ftrace_regs *fregs);
 #endif
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c

index ce673764cb69..74a176e394ef 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -917,6 +917,9 @@ static int ftrace_modify_ftrace_graph_caller(bool enable)
unsigned long stub = (unsigned long)(_graph_stub);
ppc_inst_t old, new;
 
+	if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_ARGS))

+   return 0;
+
old = ftrace_call_replace(ip, enable ? stub : addr, 0);
new = ftrace_call_replace(ip, enable ? addr : stub, 0);
 
@@ -955,6 +958,14 @@ unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,

 out:
return parent;
 }


For x86, commit 0c0593b45c9b4e ("x86/ftrace: Make function graph use
ftrace directly") also adds a recursion check before the call to
function_graph_enter() in prepare_ftrace_return(). Do we need that on
powerpc as well?
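
Something along these lines would mirror the x86 change, I think
(untested sketch; the body below is the existing powerpc code, only the
ftrace_test_recursion_trylock()/unlock() guard is new):

unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
				    unsigned long sp)
{
	unsigned long return_hooker;
	int bit;

	if (unlikely(ftrace_graph_is_dead()))
		goto out;

	if (unlikely(atomic_read(&current->tracing_graph_pause)))
		goto out;

	/* new: bail out if the graph tracer recurses, as x86 does */
	bit = ftrace_test_recursion_trylock(ip, parent);
	if (bit < 0)
		goto out;

	return_hooker = ppc_function_entry(return_to_handler);

	if (!function_graph_enter(parent, ip, 0, (unsigned long *)sp))
		parent = return_hooker;

	ftrace_test_recursion_unlock(bit);
out:
	return parent;
}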


- Naveen



Re: [PATCH v5 2/6] powerpc/kexec_file: Add KEXEC_SIG support.

2022-02-14 Thread Mimi Zohar
On Mon, 2022-02-14 at 16:55 +0100, Michal Suchánek wrote:
> Hello,
> 
> On Mon, Feb 14, 2022 at 10:14:16AM -0500, Mimi Zohar wrote:
> > Hi Michal,
> > 
> > On Sun, 2022-02-13 at 21:59 -0500, Mimi Zohar wrote:
> > 
> > > 
> > > On Tue, 2022-01-11 at 12:37 +0100, Michal Suchanek wrote:
> > > > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > > > index dea74d7717c0..1cde9b6c5987 100644
> > > > --- a/arch/powerpc/Kconfig
> > > > +++ b/arch/powerpc/Kconfig
> > > > @@ -560,6 +560,22 @@ config KEXEC_FILE
> > > >  config ARCH_HAS_KEXEC_PURGATORY
> > > > def_bool KEXEC_FILE
> > > >  
> > > > +config KEXEC_SIG
> > > > +   bool "Verify kernel signature during kexec_file_load() syscall"
> > > > +   depends on KEXEC_FILE && MODULE_SIG_FORMAT
> > > > +   help
> > > > + This option makes kernel signature verification mandatory for
> 
> This is actually wrong. KEXEC_SIG makes it mandatory that any signature
> that is appended is valid and made by a key that is part of the platform
> keyring (which is also wrong; built-in keys should also be accepted).
> KEXEC_SIG_FORCE or an IMA policy makes it mandatory that the signature
> is present.

I'm aware of MODULE_SIG_FORCE, which isn't normally enabled by distros,
but enabling MODULE_SIG allows MODULE_SIG_FORCE to be enabled on the
boot command line.  In the IMA arch policies, if MODULE_SIG is enabled,
it is then enforced, otherwise an IMA "appraise" policy rule is
defined.  This rule would prevent the module_load syscall.

I'm not aware of KEXEC_SIG_FORCE.  If there is such a Kconfig, then I
assume it could work similarly.

> 
> > > > + the kexec_file_load() syscall.
> > > 
> > > When KEXEC_SIG is enabled on other architectures, IMA does not define a
> > > kexec 'appraise' policy rule.  Refer to the policy rules in
> > > security/ima/ima_efi.c.  Similarly the kexec 'appraise' policy rule in
> 
> I suppose you mean security/integrity/ima/ima_efi.c

Yes

> 
> I also think it's misguided because KEXEC_SIG in itself does not enforce
> the signature. KEXEC_SIG_FORCE does.

Right, which is why the IMA efi policy calls set_module_sig_enforced().
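
For reference, the rough shape of that logic (reproduced from memory for
illustration, so treat the exact rule strings as approximate):

/* approximate shape of security/integrity/ima/ima_efi.c */
static const char * const sb_arch_rules[] = {
#if !IS_ENABLED(CONFIG_KEXEC_SIG)
	"appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig",
#endif
	"measure func=KEXEC_KERNEL_CHECK",
#if !IS_ENABLED(CONFIG_MODULE_SIG)
	"appraise func=MODULE_CHECK appraise_type=imasig",
#endif
	"measure func=MODULE_CHECK",
	NULL
};

const char * const *arch_get_ima_policy(void)
{
	if (IS_ENABLED(CONFIG_IMA_ARCH_POLICY) && arch_ima_get_secureboot()) {
		if (IS_ENABLED(CONFIG_MODULE_SIG))
			set_module_sig_enforced();
		return sb_arch_rules;
	}
	return NULL;
}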

> 
> > > arch/powerpc/kernel/ima_policy.c should not be defined.
> 
> I suppose you mean arch/powerpc/kernel/ima_arch.c - see above.

Sorry, yes.  

> 
> 
> Thanks for taking the time to research and summarize the differences.
> 
> > The discussion shouldn't only be about IMA vs. KEXEC_SIG kernel image
> > signature verification.  Let's try and reframe the problem a bit.
> > 
> > 1. Unify and simply the existing kexec signature verification so
> > verifying the KEXEC kernel image signature works irrespective of
> > signature type - PE, appended signature.
> > 
> > solution: enable KEXEC_SIG  (This patch set, with the above powerpc IMA
> > policy changes.)
> > 
> > 2. Measure and include the kexec kernel image in a log for attestation,
> > if desired.
> > 
> > solution: enable IMA_ARCH_POLICY 
> > - Powerpc: requires trusted boot to be enabled.
> > - EFI:   requires  secure boot to be enabled.  The IMA efi policy
> > doesn't differentiate between secure and trusted boot.
> > 
> > 3. Carry the kexec kernel image measurement across kexec, if desired
> > and supported on the architecture.
> > 
> > solution: enable IMA_KEXEC
> > 
> > Comparison: 
> > - Are there any differences between IMA vs. KEXEC_SIG measuring the
> > kexec kernel image?
> > 
> > One of the main differences is "what" is included in the measurement
> > list differs.  In both cases, the 'd-ng' field of the IMA measurement
> > list template (e.g. ima-ng, ima-sig, ima-modsig) is the full file hash
> > including the appended signature.  With IMA and the 'ima-modsig'
> > template, an additional hash without the appended signature is defined,
> > as well as including the appended signature in the 'sig' field.
> > 
> > Including the file hash and appended signature in the measurement list
> > allows an attestation server, for example, to verify the appended
> > signature without having to know the file hash without the signature.
> 
> I don't understand this part. Isn't the hash *with* signature always
> included, and the distinguishing part about IMA is the hash *without*
> signature which is the same irrespective of signature type (PE, appended
> xattr) and irrespective of the key used for signing?

Roberto Sassu added support for IMA templates.  These are the
definitions of 'ima-sig' and 'ima-modsig'.

{.name = "ima-sig", .fmt = "d-ng|n-ng|sig"},
{.name = "ima-modsig", .fmt = "d-ng|n-ng|sig|d-modsig|modsig"}

d-ng: is the file hash.  With the proposed IMA support for fs-verity
digests, the 'd-ng' field may also include the fsverity digest, based
on policy.

n-ng: is the file pathname.
sig: is the file signature stored as a 'security.ima' xattr (may be
NULL).
d-modsig: is the file hash without the appended signature (may be
NULL).

FYI, changing from "module signature" to "appended signature", might
impact the template field and 

RE: [PATCH 03/14] nds32: fix access_ok() checks in get/put_user

2022-02-14 Thread David Laight
From: Christoph Hellwig
> Sent: 14 February 2022 17:01
> 
> On Mon, Feb 14, 2022 at 05:34:41PM +0100, Arnd Bergmann wrote:
> > From: Arnd Bergmann 
> >
> > The get_user()/put_user() functions are meant to check for
> > access_ok(), while the __get_user()/__put_user() functions
> > don't.
> >
> > This broke in 4.19 for nds32, when it gained an extraneous
> > check in __get_user(), but lost the check it needs in
> > __put_user().
> 
> Can we follow the lead of MIPS (which this was originally copied
> from I think) and kill the pointless __get/put_user_check wrapper
> that just obfuscates the code?

Is it possible to make all these architectures fall back to
a common definition somewhere?

Maybe they need to define ACCESS_OK_USER_LIMIT - which can be
different from TASK_SIZE.

There'll be a few special cases, but most architectures have
kernel addresses above userspace ones.
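
Something like this as the shared fallback, perhaps (just a sketch;
ACCESS_OK_USER_LIMIT is only the name suggested above, not an existing
kernel symbol):

/* default assumes the usual "kernel above user" layout */
#ifndef ACCESS_OK_USER_LIMIT
#define ACCESS_OK_USER_LIMIT	TASK_SIZE_MAX
#endif

static inline int __access_ok(const void __user *ptr, unsigned long size)
{
	unsigned long limit = ACCESS_OK_USER_LIMIT;
	unsigned long addr = (unsigned long)ptr;

	return (size <= limit) && (addr <= (limit - size));
}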

David




Re: [PATCH 11/14] sparc64: remove CONFIG_SET_FS support

2022-02-14 Thread Christoph Hellwig
>  void prom_world(int enter)
>  {
> - if (!enter)
> - set_fs(get_fs());
> -
>   __asm__ __volatile__("flushw");
>  }

The enter argument is now unused.
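
Presumably it can then lose the argument altogether - untested sketch,
callers and the prototype would need the same treatment:

void prom_world(void)
{
	/* only the register window flush is left after set_fs() is gone */
	__asm__ __volatile__("flushw");
}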


Re: [PATCH 10/14] uaccess: remove most CONFIG_SET_FS users

2022-02-14 Thread Christoph Hellwig
On Mon, Feb 14, 2022 at 05:34:48PM +0100, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> On almost all architectures, there are no remaining callers
> of set_fs(), so CONFIG_SET_FS can be disabled, along with
> removing the thread_info field and any references to it.
> 
> This turns access_ok() into a cheaper check against TASK_SIZE_MAX.

Wouldn't it make more sense to just merge this into the last patch?


Re: [PATCH 07/14] uaccess: generalize access_ok()

2022-02-14 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH 05/14] uaccess: add generic __{get,put}_kernel_nofault

2022-02-14 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH 04/14] x86: use more conventional access_ok() definition

2022-02-14 Thread Christoph Hellwig
On Mon, Feb 14, 2022 at 05:34:42PM +0100, Arnd Bergmann wrote:
> +#define __range_not_ok(addr, size, limit)(!__access_ok(addr, size))
> +#define __chk_range_not_ok(addr, size, limit)(!__access_ok((void 
> __user *)addr, size))

Can we just kill these off instead of letting them obfuscate the code?
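
The few remaining callers could then just use __access_ok() with the
inverted sense, e.g. (sketch, using the perf frame-walk helper mentioned
elsewhere in the thread; the body is illustrative, not a quoted hunk):

static bool valid_user_frame(const void __user *fp, unsigned long size)
{
	/* was: !__range_not_ok(fp, size, TASK_SIZE) */
	return __access_ok(fp, size);
}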


Re: [PATCH 03/14] nds32: fix access_ok() checks in get/put_user

2022-02-14 Thread Christoph Hellwig
On Mon, Feb 14, 2022 at 05:34:41PM +0100, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> The get_user()/put_user() functions are meant to check for
> access_ok(), while the __get_user()/__put_user() functions
> don't.
> 
> This broke in 4.19 for nds32, when it gained an extraneous
> check in __get_user(), but lost the check it needs in
> __put_user().

Can we follow the lead of MIPS (which this was originally copied
from I think) and kill the pointless __get/put_user_check wrapper
that just obfuscates the code?


Re: [PATCH 01/14] uaccess: fix integer overflow on access_ok()

2022-02-14 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 


[PATCH 14/14] uaccess: drop set_fs leftovers

2022-02-14 Thread Arnd Bergmann
From: Arnd Bergmann 

There are no more users of CONFIG_SET_FS left, so drop all
remaining references to set_fs()/get_fs(), mm_segment_t
and uaccess_kernel().

Signed-off-by: Arnd Bergmann 
---
 arch/Kconfig   |  3 ---
 arch/arm/lib/uaccess_with_memcpy.c | 10 -
 arch/nds32/kernel/process.c|  5 ++---
 arch/parisc/include/asm/futex.h|  2 +-
 arch/parisc/lib/memcpy.c   |  2 +-
 drivers/hid/uhid.c |  2 +-
 drivers/scsi/sg.c  |  5 -
 fs/exec.c  |  6 --
 include/asm-generic/access_ok.h| 10 +
 include/linux/syscalls.h   |  4 
 include/linux/uaccess.h| 33 --
 include/rdma/ib.h  |  2 +-
 kernel/events/callchain.c  |  4 
 kernel/events/core.c   |  3 ---
 kernel/exit.c  | 14 -
 kernel/kthread.c   |  5 -
 kernel/stacktrace.c|  3 ---
 kernel/trace/bpf_trace.c   |  4 
 mm/maccess.c   | 11 --
 mm/memory.c|  8 
 net/bpfilter/bpfilter_kern.c   |  2 +-
 21 files changed, 8 insertions(+), 130 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 678a80713b21..96075a12c720 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -24,9 +24,6 @@ config KEXEC_ELF
 config HAVE_IMA_KEXEC
bool
 
-config SET_FS
-   bool
-
 config HOTPLUG_SMT
bool
 
diff --git a/arch/arm/lib/uaccess_with_memcpy.c 
b/arch/arm/lib/uaccess_with_memcpy.c
index 106f83a5ea6d..c30b689bec2e 100644
--- a/arch/arm/lib/uaccess_with_memcpy.c
+++ b/arch/arm/lib/uaccess_with_memcpy.c
@@ -92,11 +92,6 @@ __copy_to_user_memcpy(void __user *to, const void *from, 
unsigned long n)
unsigned long ua_flags;
int atomic;
 
-   if (uaccess_kernel()) {
-   memcpy((void *)to, from, n);
-   return 0;
-   }
-
/* the mmap semaphore is taken only if not in an atomic context */
atomic = faulthandler_disabled();
 
@@ -165,11 +160,6 @@ __clear_user_memset(void __user *addr, unsigned long n)
 {
unsigned long ua_flags;
 
-   if (uaccess_kernel()) {
-   memset((void *)addr, 0, n);
-   return 0;
-   }
-
mmap_read_lock(current->mm);
while (n) {
pte_t *pte;
diff --git a/arch/nds32/kernel/process.c b/arch/nds32/kernel/process.c
index 49fab9e39cbf..d35c1f63fa11 100644
--- a/arch/nds32/kernel/process.c
+++ b/arch/nds32/kernel/process.c
@@ -119,9 +119,8 @@ void show_regs(struct pt_regs *regs)
regs->uregs[7], regs->uregs[6], regs->uregs[5], regs->uregs[4]);
pr_info("r3 : %08lx  r2 : %08lx  r1 : %08lx  r0 : %08lx\n",
regs->uregs[3], regs->uregs[2], regs->uregs[1], regs->uregs[0]);
-   pr_info("  IRQs o%s  Segment %s\n",
-   interrupts_enabled(regs) ? "n" : "ff",
-   uaccess_kernel() ? "kernel" : "user");
+   pr_info("  IRQs o%s  Segment user\n",
+   interrupts_enabled(regs) ? "n" : "ff");
 }
 
 EXPORT_SYMBOL(show_regs);
diff --git a/arch/parisc/include/asm/futex.h b/arch/parisc/include/asm/futex.h
index b5835325d44b..2f4a1b1ef387 100644
--- a/arch/parisc/include/asm/futex.h
+++ b/arch/parisc/include/asm/futex.h
@@ -99,7 +99,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
/* futex.c wants to do a cmpxchg_inatomic on kernel NULL, which is
 * our gateway page, and causes no end of trouble...
 */
-   if (uaccess_kernel() && !uaddr)
+   if (!uaddr)
return -EFAULT;
 
if (!access_ok(uaddr, sizeof(u32)))
diff --git a/arch/parisc/lib/memcpy.c b/arch/parisc/lib/memcpy.c
index ea70a0e08321..468704ce8a1c 100644
--- a/arch/parisc/lib/memcpy.c
+++ b/arch/parisc/lib/memcpy.c
@@ -13,7 +13,7 @@
 #include 
 #include 
 
-#define get_user_space() (uaccess_kernel() ? 0 : mfsp(3))
+#define get_user_space() (mfsp(3))
 #define get_kernel_space() (0)
 
 /* Returns 0 for success, otherwise, returns number of bytes not transferred. 
*/
diff --git a/drivers/hid/uhid.c b/drivers/hid/uhid.c
index 614adb510dbd..2a918aeb0af1 100644
--- a/drivers/hid/uhid.c
+++ b/drivers/hid/uhid.c
@@ -747,7 +747,7 @@ static ssize_t uhid_char_write(struct file *file, const 
char __user *buffer,
 * copied from, so it's unsafe to allow this with elevated
 * privileges (e.g. from a setuid binary) or via kernel_write().
 */
-   if (file->f_cred != current_cred() || uaccess_kernel()) {
+   if (file->f_cred != current_cred()) {
pr_err_once("UHID_CREATE from different security 
context by process %d (%s), this is not allowed.\n",
task_tgid_vnr(current), current->comm);
ret = -EACCES;
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 

[PATCH 13/14] ia64: remove CONFIG_SET_FS support

2022-02-14 Thread Arnd Bergmann
From: Arnd Bergmann 

ia64 only uses set_fs() in one file to handle unaligned access for
both user space and kernel instructions. Rewrite this to explicitly
pass around a flag about which one it is and drop the feature from
the architecture.

Signed-off-by: Arnd Bergmann 
---
 arch/ia64/Kconfig   |  1 -
 arch/ia64/include/asm/processor.h   |  4 --
 arch/ia64/include/asm/thread_info.h |  2 -
 arch/ia64/include/asm/uaccess.h | 21 +++---
 arch/ia64/kernel/unaligned.c| 60 +++--
 5 files changed, 45 insertions(+), 43 deletions(-)

diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index a7e01573abd8..6b6a35b3d959 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -61,7 +61,6 @@ config IA64
select NEED_SG_DMA_LENGTH
select NUMA if !FLATMEM
select PCI_MSI_ARCH_FALLBACKS if PCI_MSI
-   select SET_FS
select ZONE_DMA32
default y
help
diff --git a/arch/ia64/include/asm/processor.h 
b/arch/ia64/include/asm/processor.h
index 45365c2ef598..7cbce290f4e5 100644
--- a/arch/ia64/include/asm/processor.h
+++ b/arch/ia64/include/asm/processor.h
@@ -243,10 +243,6 @@ DECLARE_PER_CPU(struct cpuinfo_ia64, ia64_cpu_info);
 
 extern void print_cpu_info (struct cpuinfo_ia64 *);
 
-typedef struct {
-   unsigned long seg;
-} mm_segment_t;
-
 #define SET_UNALIGN_CTL(task,value)
\
 ({ 
\
(task)->thread.flags = (((task)->thread.flags & ~IA64_THREAD_UAC_MASK)  
\
diff --git a/arch/ia64/include/asm/thread_info.h 
b/arch/ia64/include/asm/thread_info.h
index 51d20cb37706..ef83493e6778 100644
--- a/arch/ia64/include/asm/thread_info.h
+++ b/arch/ia64/include/asm/thread_info.h
@@ -27,7 +27,6 @@ struct thread_info {
__u32 cpu;  /* current CPU */
__u32 last_cpu; /* Last CPU thread ran on */
__u32 status;   /* Thread synchronous flags */
-   mm_segment_t addr_limit;/* user-level address space limit */
int preempt_count;  /* 0=premptable, <0=BUG; will also 
serve as bh-counter */
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
__u64 utime;
@@ -48,7 +47,6 @@ struct thread_info {
.task   = , \
.flags  = 0,\
.cpu= 0,\
-   .addr_limit = KERNEL_DS,\
.preempt_count  = INIT_PREEMPT_COUNT,   \
 }
 
diff --git a/arch/ia64/include/asm/uaccess.h b/arch/ia64/include/asm/uaccess.h
index e242a3cc1330..60adadeb3e9e 100644
--- a/arch/ia64/include/asm/uaccess.h
+++ b/arch/ia64/include/asm/uaccess.h
@@ -42,26 +42,17 @@
 #include 
 
 /*
- * For historical reasons, the following macros are grossly misnamed:
- */
-#define KERNEL_DS  ((mm_segment_t) { ~0UL })   /* cf. 
access_ok() */
-#define USER_DS((mm_segment_t) { TASK_SIZE-1 })/* cf. 
access_ok() */
-
-#define get_fs()  (current_thread_info()->addr_limit)
-#define set_fs(x) (current_thread_info()->addr_limit = (x))
-
-/*
- * When accessing user memory, we need to make sure the entire area really is 
in
- * user-level space.  In order to do this efficiently, we make sure that the 
page at
- * address TASK_SIZE is never valid.  We also need to make sure that the 
address doesn't
+ * When accessing user memory, we need to make sure the entire area really is
+ * in user-level space.  We also need to make sure that the address doesn't
  * point inside the virtually mapped linear page table.
  */
 static inline int __access_ok(const void __user *p, unsigned long size)
 {
+   unsigned long limit = TASK_SIZE;
unsigned long addr = (unsigned long)p;
-   unsigned long seg = get_fs().seg;
-   return likely(addr <= seg) &&
-(seg == KERNEL_DS.seg || likely(REGION_OFFSET(addr) < RGN_MAP_LIMIT));
+
+   return likely((size <= limit) && (addr <= (limit - size)) &&
+likely(REGION_OFFSET(addr) < RGN_MAP_LIMIT));
 }
 #define __access_ok __access_ok
 #include 
diff --git a/arch/ia64/kernel/unaligned.c b/arch/ia64/kernel/unaligned.c
index 6c1a8951dfbb..0acb5a0cd7ab 100644
--- a/arch/ia64/kernel/unaligned.c
+++ b/arch/ia64/kernel/unaligned.c
@@ -749,9 +749,25 @@ emulate_load_updates (update_t type, load_store_t ld, 
struct pt_regs *regs, unsi
}
 }
 
+static int emulate_store(unsigned long ifa, void *val, int len, bool 
kernel_mode)
+{
+   if (kernel_mode)
+   return copy_to_kernel_nofault((void *)ifa, val, len);
+
+   return copy_to_user((void __user *)ifa, val, len);
+}
+
+static int emulate_load(void *val, unsigned long ifa, int len, bool 
kernel_mode)
+{
+   if (kernel_mode)
+  return copy_from_kernel_nofault(val, (void *)ifa, len);
+
+   return copy_from_user(val, (void 

[PATCH 12/14] sh: remove CONFIG_SET_FS support

2022-02-14 Thread Arnd Bergmann
From: Arnd Bergmann 

sh uses set_fs/get_fs only in one file, to handle address
errors in both user and kernel memory.

It already has an abstraction to differentiate between I/O
and memory, so adding a third class for kernel memory fits
into the same scheme and lets us kill off CONFIG_SET_FS.

Signed-off-by: Arnd Bergmann 
---
 arch/sh/Kconfig   |  1 -
 arch/sh/include/asm/processor.h   |  1 -
 arch/sh/include/asm/segment.h | 33 ---
 arch/sh/include/asm/thread_info.h |  2 --
 arch/sh/include/asm/uaccess.h |  4 
 arch/sh/kernel/io_trapped.c   |  9 ++---
 arch/sh/kernel/process_32.c   |  2 --
 arch/sh/kernel/traps_32.c | 30 +---
 8 files changed, 21 insertions(+), 61 deletions(-)
 delete mode 100644 arch/sh/include/asm/segment.h

diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index 2474a04ceac4..f676e92b7d5b 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -65,7 +65,6 @@ config SUPERH
select PERF_EVENTS
select PERF_USE_VMALLOC
select RTC_LIB
-   select SET_FS
select SPARSE_IRQ
select TRACE_IRQFLAGS_SUPPORT
help
diff --git a/arch/sh/include/asm/processor.h b/arch/sh/include/asm/processor.h
index 3820d698846e..85a6c1c3c16e 100644
--- a/arch/sh/include/asm/processor.h
+++ b/arch/sh/include/asm/processor.h
@@ -3,7 +3,6 @@
 #define __ASM_SH_PROCESSOR_H
 
 #include 
-#include 
 #include 
 
 #ifndef __ASSEMBLY__
diff --git a/arch/sh/include/asm/segment.h b/arch/sh/include/asm/segment.h
deleted file mode 100644
index 02e54a3335d6..
--- a/arch/sh/include/asm/segment.h
+++ /dev/null
@@ -1,33 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __ASM_SH_SEGMENT_H
-#define __ASM_SH_SEGMENT_H
-
-#ifndef __ASSEMBLY__
-
-typedef struct {
-   unsigned long seg;
-} mm_segment_t;
-
-#define MAKE_MM_SEG(s) ((mm_segment_t) { (s) })
-
-/*
- * The fs value determines whether argument validity checking should be
- * performed or not.  If get_fs() == USER_DS, checking is performed, with
- * get_fs() == KERNEL_DS, checking is bypassed.
- *
- * For historical reasons, these macros are grossly misnamed.
- */
-#define KERNEL_DS  MAKE_MM_SEG(0xUL)
-#ifdef CONFIG_MMU
-#define USER_DSMAKE_MM_SEG(PAGE_OFFSET)
-#else
-#define USER_DSKERNEL_DS
-#endif
-
-#define uaccess_kernel() (get_fs().seg == KERNEL_DS.seg)
-
-#define get_fs()   (current_thread_info()->addr_limit)
-#define set_fs(x)  (current_thread_info()->addr_limit = (x))
-
-#endif /* __ASSEMBLY__ */
-#endif /* __ASM_SH_SEGMENT_H */
diff --git a/arch/sh/include/asm/thread_info.h 
b/arch/sh/include/asm/thread_info.h
index 598d0184ffea..b119b859a0a3 100644
--- a/arch/sh/include/asm/thread_info.h
+++ b/arch/sh/include/asm/thread_info.h
@@ -30,7 +30,6 @@ struct thread_info {
__u32   status; /* thread synchronous flags */
__u32   cpu;
int preempt_count; /* 0 => preemptable, <0 => BUG */
-   mm_segment_taddr_limit; /* thread address space */
unsigned long   previous_sp;/* sp of previous stack in case
   of nested IRQ stacks */
__u8supervisor_stack[0];
@@ -58,7 +57,6 @@ struct thread_info {
.status = 0,\
.cpu= 0,\
.preempt_count  = INIT_PREEMPT_COUNT,   \
-   .addr_limit = KERNEL_DS,\
 }
 
 /* how to get the current stack pointer from C */
diff --git a/arch/sh/include/asm/uaccess.h b/arch/sh/include/asm/uaccess.h
index ccd219d74851..a79609eb14be 100644
--- a/arch/sh/include/asm/uaccess.h
+++ b/arch/sh/include/asm/uaccess.h
@@ -2,11 +2,7 @@
 #ifndef __ASM_SH_UACCESS_H
 #define __ASM_SH_UACCESS_H
 
-#include 
 #include 
-
-#define user_addr_max()(current_thread_info()->addr_limit.seg)
-
 #include 
 
 /*
diff --git a/arch/sh/kernel/io_trapped.c b/arch/sh/kernel/io_trapped.c
index 004ad0130b10..e803b14ef12e 100644
--- a/arch/sh/kernel/io_trapped.c
+++ b/arch/sh/kernel/io_trapped.c
@@ -270,7 +270,6 @@ static struct mem_access trapped_io_access = {
 
 int handle_trapped_io(struct pt_regs *regs, unsigned long address)
 {
-   mm_segment_t oldfs;
insn_size_t instruction;
int tmp;
 
@@ -281,16 +280,12 @@ int handle_trapped_io(struct pt_regs *regs, unsigned long 
address)
 
WARN_ON(user_mode(regs));
 
-   oldfs = get_fs();
-   set_fs(KERNEL_DS);
-   if (copy_from_user(, (void *)(regs->pc),
-  sizeof(instruction))) {
-   set_fs(oldfs);
+   if (copy_from_kernel_nofault(, (void *)(regs->pc),
+sizeof(instruction))) {
return 0;
}
 
tmp = handle_unaligned_access(instruction, regs,
  _io_access, 

[PATCH 11/14] sparc64: remove CONFIG_SET_FS support

2022-02-14 Thread Arnd Bergmann
From: Arnd Bergmann 

sparc64 uses address space identifiers to differentiate between kernel
and user space, using ASI_P for kernel threads but ASI_AIUS for normal
user space, with the option of changing between them.

As nothing really changes the ASI any more, just hardcode ASI_AIUS
everywhere. Kernel threads are not allowed to access __user pointers
anyway.

Signed-off-by: Arnd Bergmann 
---
 arch/sparc/Kconfig  |  1 -
 arch/sparc/include/asm/processor_64.h   |  4 
 arch/sparc/include/asm/switch_to_64.h   |  4 +---
 arch/sparc/include/asm/thread_info_64.h |  4 +---
 arch/sparc/include/asm/uaccess_64.h | 24 
 arch/sparc/kernel/process_64.c  | 12 
 arch/sparc/kernel/traps_64.c|  2 --
 arch/sparc/lib/NGmemcpy.S   |  3 +--
 arch/sparc/mm/init_64.c |  3 ---
 9 files changed, 3 insertions(+), 54 deletions(-)

diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 875388835a58..5f08e4d16ad8 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -99,7 +99,6 @@ config SPARC64
select HAVE_SETUP_PER_CPU_AREA
select NEED_PER_CPU_EMBED_FIRST_CHUNK
select NEED_PER_CPU_PAGE_FIRST_CHUNK
-   select SET_FS
 
 config ARCH_PROC_KCORE_TEXT
def_bool y
diff --git a/arch/sparc/include/asm/processor_64.h 
b/arch/sparc/include/asm/processor_64.h
index ae851e8fce4c..89850dff6b03 100644
--- a/arch/sparc/include/asm/processor_64.h
+++ b/arch/sparc/include/asm/processor_64.h
@@ -47,10 +47,6 @@
 
 #ifndef __ASSEMBLY__
 
-typedef struct {
-   unsigned char seg;
-} mm_segment_t;
-
 /* The Sparc processor specific thread struct. */
 /* XXX This should die, everything can go into thread_info now. */
 struct thread_struct {
diff --git a/arch/sparc/include/asm/switch_to_64.h 
b/arch/sparc/include/asm/switch_to_64.h
index b1d4e2e3210f..14f3c49bfdbc 100644
--- a/arch/sparc/include/asm/switch_to_64.h
+++ b/arch/sparc/include/asm/switch_to_64.h
@@ -20,10 +20,8 @@ do { \
 */
 #define switch_to(prev, next, last)\
 do {   save_and_clear_fpu();   \
-   /* If you are tempted to conditionalize the following */\
-   /* so that ASI is only written if it changes, think again. */   \
__asm__ __volatile__("wr %%g0, %0, %%asi"   \
-   : : "r" (task_thread_info(next)->current_ds));\
+   : : "r" (ASI_AIUS));\
trap_block[current_thread_info()->cpu].thread = \
task_thread_info(next); \
__asm__ __volatile__(   \
diff --git a/arch/sparc/include/asm/thread_info_64.h 
b/arch/sparc/include/asm/thread_info_64.h
index 8047a9caab2f..1a44372e2bc0 100644
--- a/arch/sparc/include/asm/thread_info_64.h
+++ b/arch/sparc/include/asm/thread_info_64.h
@@ -46,7 +46,7 @@ struct thread_info {
struct pt_regs  *kregs;
int preempt_count;  /* 0 => preemptable, <0 => BUG 
*/
__u8new_child;
-   __u8current_ds;
+   __u8__pad;
__u16   cpu;
 
unsigned long   *utraps;
@@ -81,7 +81,6 @@ struct thread_info {
 #define TI_KREGS   0x0028
 #define TI_PRE_COUNT   0x0030
 #define TI_NEW_CHILD   0x0034
-#define TI_CURRENT_DS  0x0035
 #define TI_CPU 0x0036
 #define TI_UTRAPS  0x0038
 #define TI_REG_WINDOW  0x0040
@@ -116,7 +115,6 @@ struct thread_info {
 #define INIT_THREAD_INFO(tsk)  \
 {  \
.task   =   ,   \
-   .current_ds =   ASI_P,  \
.preempt_count  =   INIT_PREEMPT_COUNT, \
.kregs  =   (struct pt_regs *)(init_stack+THREAD_SIZE)-1 \
 }
diff --git a/arch/sparc/include/asm/uaccess_64.h 
b/arch/sparc/include/asm/uaccess_64.h
index 000bac67cf31..617a462d1f56 100644
--- a/arch/sparc/include/asm/uaccess_64.h
+++ b/arch/sparc/include/asm/uaccess_64.h
@@ -13,24 +13,6 @@
 
 #include 
 
-/*
- * Sparc64 is segmented, though more like the M68K than the I386.
- * We use the secondary ASI to address user memory, which references a
- * completely different VM map, thus there is zero chance of the user
- * doing something queer and tricking us into poking kernel memory.
- *
- * What is left here is basically what is needed for the other parts of
- * the kernel that expect to be able to manipulate, erum, "segments".
- * Or perhaps more properly, permissions.
- *
- * "For historical reasons, these macros are grossly misnamed." -Linus
- */
-
-#define KERNEL_DS   ((mm_segment_t) { ASI_P })
-#define USER_DS ((mm_segment_t) { ASI_AIUS })  /* 

[PATCH 10/14] uaccess: remove most CONFIG_SET_FS users

2022-02-14 Thread Arnd Bergmann
From: Arnd Bergmann 

On almost all architectures, there are no remaining callers
of set_fs(), so CONFIG_SET_FS can be disabled, along with
removing the thread_info field and any references to it.

This turns access_ok() into a cheaper check against TASK_SIZE_MAX.

Signed-off-by: Arnd Bergmann 
---
 arch/alpha/Kconfig|  1 -
 arch/alpha/include/asm/processor.h|  4 --
 arch/alpha/include/asm/thread_info.h  |  2 -
 arch/alpha/include/asm/uaccess.h  | 19 --
 arch/arc/Kconfig  |  1 -
 arch/arc/include/asm/segment.h| 20 ---
 arch/arc/include/asm/thread_info.h|  3 -
 arch/arc/include/asm/uaccess.h|  1 -
 arch/csky/Kconfig |  1 -
 arch/csky/include/asm/processor.h |  2 -
 arch/csky/include/asm/segment.h   | 10 
 arch/csky/include/asm/thread_info.h   |  2 -
 arch/csky/include/asm/uaccess.h   |  3 -
 arch/csky/kernel/asm-offsets.c|  1 -
 arch/h8300/Kconfig|  1 -
 arch/h8300/include/asm/processor.h|  1 -
 arch/h8300/include/asm/segment.h  | 40 -
 arch/h8300/include/asm/thread_info.h  |  3 -
 arch/h8300/kernel/entry.S |  1 -
 arch/h8300/kernel/head_ram.S  |  1 -
 arch/h8300/mm/init.c  |  6 --
 arch/h8300/mm/memory.c|  1 -
 arch/hexagon/Kconfig  |  1 -
 arch/hexagon/include/asm/thread_info.h|  6 --
 arch/hexagon/kernel/process.c |  1 -
 arch/microblaze/Kconfig   |  1 -
 arch/microblaze/include/asm/thread_info.h |  6 --
 arch/microblaze/include/asm/uaccess.h | 24 
 arch/microblaze/kernel/asm-offsets.c  |  1 -
 arch/microblaze/kernel/process.c  |  1 -
 arch/nds32/Kconfig|  1 -
 arch/nds32/include/asm/thread_info.h  |  4 --
 arch/nds32/include/asm/uaccess.h  | 15 +
 arch/nds32/mm/alignment.c |  3 -
 arch/nios2/Kconfig|  1 -
 arch/nios2/include/asm/thread_info.h  |  9 ---
 arch/nios2/include/asm/uaccess.h  | 12 
 arch/openrisc/Kconfig |  1 -
 arch/openrisc/include/asm/thread_info.h   |  7 ---
 arch/openrisc/include/asm/uaccess.h   | 23 
 arch/sparc/Kconfig|  2 +-
 arch/sparc/include/asm/processor_32.h |  6 --
 arch/sparc/include/asm/uaccess_32.h   | 13 -
 arch/sparc/kernel/process_32.c|  2 -
 arch/xtensa/Kconfig   |  1 -
 arch/xtensa/include/asm/asm-uaccess.h | 71 ---
 arch/xtensa/include/asm/processor.h   |  7 ---
 arch/xtensa/include/asm/thread_info.h |  3 -
 arch/xtensa/include/asm/uaccess.h | 16 -
 arch/xtensa/kernel/asm-offsets.c  |  3 -
 include/asm-generic/uaccess.h | 25 +---
 51 files changed, 3 insertions(+), 387 deletions(-)
 delete mode 100644 arch/arc/include/asm/segment.h
 delete mode 100644 arch/csky/include/asm/segment.h
 delete mode 100644 arch/h8300/include/asm/segment.h

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index 4e87783c90ad..eee8b5b0a58b 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -35,7 +35,6 @@ config ALPHA
select OLD_SIGSUSPEND
select CPU_NO_EFFICIENT_FFS if !ALPHA_EV67
select MMU_GATHER_NO_RANGE
-   select SET_FS
select SPARSEMEM_EXTREME if SPARSEMEM
select ZONE_DMA
help
diff --git a/arch/alpha/include/asm/processor.h 
b/arch/alpha/include/asm/processor.h
index 090499c99c1c..43e234c518b1 100644
--- a/arch/alpha/include/asm/processor.h
+++ b/arch/alpha/include/asm/processor.h
@@ -26,10 +26,6 @@
 #define TASK_UNMAPPED_BASE \
   ((current->personality & ADDR_LIMIT_32BIT) ? 0x4000 : TASK_SIZE / 2)
 
-typedef struct {
-   unsigned long seg;
-} mm_segment_t;
-
 /* This is dead.  Everything has been moved to thread_info.  */
 struct thread_struct { };
 #define INIT_THREAD  { }
diff --git a/arch/alpha/include/asm/thread_info.h 
b/arch/alpha/include/asm/thread_info.h
index 2592356e3215..fdc485d7787a 100644
--- a/arch/alpha/include/asm/thread_info.h
+++ b/arch/alpha/include/asm/thread_info.h
@@ -19,7 +19,6 @@ struct thread_info {
unsigned intflags;  /* low level flags */
unsigned intieee_state; /* see fpu.h */
 
-   mm_segment_taddr_limit; /* thread address space */
unsignedcpu;/* current CPU */
int preempt_count; /* 0 => preemptable, <0 => BUG */
unsigned intstatus; /* thread-synchronous flags */
@@ -35,7 +34,6 @@ struct thread_info {
 #define INIT_THREAD_INFO(tsk)  \
 {  \
.task   = , \
-   .addr_limit = KERNEL_DS,\
   

[PATCH 09/14] m68k: drop custom __access_ok()

2022-02-14 Thread Arnd Bergmann
From: Arnd Bergmann 

While most m68k platforms use separate address spaces for user
and kernel space, at least coldfire does not, and the other
ones have a TASK_SIZE that is less than the entire 4GB address
range.

Using the generic implementation of __access_ok() stops coldfire
user space from trivially accessing kernel memory, and is probably
the right thing elsewhere for consistency as well.

Signed-off-by: Arnd Bergmann 
---
 arch/m68k/include/asm/uaccess.h | 13 -
 1 file changed, 13 deletions(-)

diff --git a/arch/m68k/include/asm/uaccess.h b/arch/m68k/include/asm/uaccess.h
index d6bb5720365a..64914872a5c9 100644
--- a/arch/m68k/include/asm/uaccess.h
+++ b/arch/m68k/include/asm/uaccess.h
@@ -10,19 +10,6 @@
 #include 
 #include 
 #include 
-
-/* We let the MMU do all checking */
-static inline int __access_ok(const void __user *addr,
-   unsigned long size)
-{
-   /*
-* XXX: for !CONFIG_CPU_HAS_ADDRESS_SPACES this really needs to check
-* for TASK_SIZE!
-* Removing this helper is probably sufficient.
-*/
-   return 1;
-}
-#define __access_ok __access_ok
 #include 
 
 /*
-- 
2.29.2



[PATCH 08/14] arm64: simplify access_ok()

2022-02-14 Thread Arnd Bergmann
From: Arnd Bergmann 

arm64 has an inline asm implementation of access_ok() that is derived from
the 32-bit arm version and optimized for the case that both the limit and
the size are variable. With set_fs() gone, the limit is always constant,
and the size usually is as well, so just using the default implementation
reduces the check into a comparison against a constant that can be
scheduled by the compiler.

On a defconfig build, this saves over 28KB of .text.

Signed-off-by: Arnd Bergmann 
---
 arch/arm64/include/asm/uaccess.h | 28 +---
 1 file changed, 5 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 357f7bd9c981..e8dce0cc5eaa 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -26,6 +26,8 @@
 #include 
 #include 
 
+static inline int __access_ok(const void __user *ptr, unsigned long size);
+
 /*
  * Test whether a block of memory is a valid user space address.
  * Returns 1 if the range is valid, 0 otherwise.
@@ -33,10 +35,8 @@
  * This is equivalent to the following test:
  * (u65)addr + (u65)size <= (u65)TASK_SIZE_MAX
  */
-static inline unsigned long __access_ok(const void __user *addr, unsigned long 
size)
+static inline int access_ok(const void __user *addr, unsigned long size)
 {
-   unsigned long ret, limit = TASK_SIZE_MAX - 1;
-
/*
 * Asynchronous I/O running in a kernel thread does not have the
 * TIF_TAGGED_ADDR flag of the process owning the mm, so always untag
@@ -46,27 +46,9 @@ static inline unsigned long __access_ok(const void __user 
*addr, unsigned long s
(current->flags & PF_KTHREAD || test_thread_flag(TIF_TAGGED_ADDR)))
addr = untagged_addr(addr);
 
-   __chk_user_ptr(addr);
-   asm volatile(
-   // A + B <= C + 1 for all A,B,C, in four easy steps:
-   // 1: X = A + B; X' = X % 2^64
-   "   adds%0, %3, %2\n"
-   // 2: Set C = 0 if X > 2^64, to guarantee X' > C in step 4
-   "   csel%1, xzr, %1, hi\n"
-   // 3: Set X' = ~0 if X >= 2^64. For X == 2^64, this decrements X'
-   //to compensate for the carry flag being set in step 4. For
-   //X > 2^64, X' merely has to remain nonzero, which it does.
-   "   csinv   %0, %0, xzr, cc\n"
-   // 4: For X < 2^64, this gives us X' - C - 1 <= 0, where the -1
-   //comes from the carry in being clear. Otherwise, we are
-   //testing X' - C == 0, subject to the previous adjustments.
-   "   sbcsxzr, %0, %1\n"
-   "   cset%0, ls\n"
-   : "=" (ret), "+r" (limit) : "Ir" (size), "0" (addr) : "cc");
-
-   return ret;
+   return likely(__access_ok(addr, size));
 }
-#define __access_ok __access_ok
+#define access_ok access_ok
 
 #include 
 
-- 
2.29.2



[PATCH 07/14] uaccess: generalize access_ok()

2022-02-14 Thread Arnd Bergmann
From: Arnd Bergmann 

There are many different ways that access_ok() is defined across
architectures, but in the end, they all just compare against the
user_addr_max() value or they accept anything.

Provide one definition that works for most architectures, checking
against TASK_SIZE_MAX for user processes or skipping the check inside
of uaccess_kernel() sections.

For architectures without CONFIG_SET_FS(), this should be the fastest
check, as it comes down to a single comparison of a pointer against a
compile-time constant, while the architecture specific versions tend to
do something more complex for historic reasons or get something wrong.

Type checking for __user annotations is handled inconsistently across
architectures, but this is easily simplified as well by using an inline
function that takes a 'const void __user *' argument. A handful of
callers need an extra __user annotation for this.

Some architectures had a trick of using 33-bit or 65-bit arithmetic on the
addresses to calculate the overflow; however, this simpler version uses
fewer registers, which means it can produce better object code in the
end despite needing a second (statically predicted) branch.

Signed-off-by: Arnd Bergmann 
---
 arch/alpha/include/asm/uaccess.h  | 34 +++
 arch/arc/include/asm/uaccess.h| 29 -
 arch/arm/include/asm/uaccess.h| 20 +
 arch/arm/kernel/swp_emulate.c |  2 +-
 arch/arm/kernel/traps.c   |  2 +-
 arch/arm64/include/asm/uaccess.h  |  5 ++-
 arch/csky/include/asm/uaccess.h   |  8 
 arch/csky/kernel/signal.c |  2 +-
 arch/hexagon/include/asm/uaccess.h| 25 
 arch/ia64/include/asm/uaccess.h   |  5 +--
 arch/m68k/include/asm/uaccess.h   |  5 ++-
 arch/microblaze/include/asm/uaccess.h |  8 +---
 arch/mips/include/asm/uaccess.h   | 29 +
 arch/nds32/include/asm/uaccess.h  |  7 +---
 arch/nios2/include/asm/uaccess.h  | 11 +
 arch/nios2/kernel/signal.c| 20 +
 arch/openrisc/include/asm/uaccess.h   | 19 +
 arch/parisc/include/asm/uaccess.h | 10 +++--
 arch/powerpc/include/asm/uaccess.h| 11 +
 arch/powerpc/lib/sstep.c  |  4 +-
 arch/riscv/include/asm/uaccess.h  | 31 +-
 arch/riscv/kernel/perf_callchain.c|  2 +-
 arch/s390/include/asm/uaccess.h   | 11 ++---
 arch/sh/include/asm/uaccess.h | 22 +-
 arch/sparc/include/asm/uaccess.h  |  3 --
 arch/sparc/include/asm/uaccess_32.h   | 18 ++--
 arch/sparc/include/asm/uaccess_64.h   | 35 
 arch/sparc/kernel/signal_32.c |  2 +-
 arch/um/include/asm/uaccess.h |  5 ++-
 arch/x86/include/asm/uaccess.h| 14 +--
 arch/xtensa/include/asm/uaccess.h | 10 +
 include/asm-generic/access_ok.h   | 59 +++
 include/asm-generic/uaccess.h | 21 +-
 include/linux/uaccess.h   |  7 
 34 files changed, 130 insertions(+), 366 deletions(-)
 create mode 100644 include/asm-generic/access_ok.h

diff --git a/arch/alpha/include/asm/uaccess.h b/arch/alpha/include/asm/uaccess.h
index 1b6f25efa247..82c5743fc9cd 100644
--- a/arch/alpha/include/asm/uaccess.h
+++ b/arch/alpha/include/asm/uaccess.h
@@ -20,28 +20,7 @@
 #define get_fs()  (current_thread_info()->addr_limit)
 #define set_fs(x) (current_thread_info()->addr_limit = (x))
 
-#define uaccess_kernel()   (get_fs().seg == KERNEL_DS.seg)
-
-/*
- * Is a address valid? This does a straightforward calculation rather
- * than tests.
- *
- * Address valid if:
- *  - "addr" doesn't have any high-bits set
- *  - AND "size" doesn't have any high-bits set
- *  - AND "addr+size-(size != 0)" doesn't have any high-bits set
- *  - OR we are in kernel mode.
- */
-#define __access_ok(addr, size) ({ \
-   unsigned long __ao_a = (addr), __ao_b = (size); \
-   unsigned long __ao_end = __ao_a + __ao_b - !!__ao_b;\
-   (get_fs().seg & (__ao_a | __ao_b | __ao_end)) == 0; })
-
-#define access_ok(addr, size)  \
-({ \
-   __chk_user_ptr(addr);   \
-   __access_ok(((unsigned long)(addr)), (size));   \
-})
+#include 
 
 /*
  * These are the main single-value transfer routines.  They automatically
@@ -105,7 +84,7 @@ extern void __get_user_unknown(void);
long __gu_err = -EFAULT;\
unsigned long __gu_val = 0; \
const __typeof__(*(ptr)) __user *__gu_addr = (ptr); \
-   if (__access_ok((unsigned long)__gu_addr, size)) {  \
+   if (__access_ok(__gu_addr, size)) { \
__gu_err = 0;   \
switch (size) { \
  case 1: __get_user_8(__gu_addr); break;   \
@@ -200,7 

[PATCH 06/14] mips: use simpler access_ok()

2022-02-14 Thread Arnd Bergmann
From: Arnd Bergmann 

Before unifying the mips version of __access_ok() with the generic
code, this converts it to the same algorithm. This is a change in
behavior on mips64, as now any address in the user segment, the lower
2^62 bytes, is taken to be valid, relying on a page fault for
addresses that are within that segment but not valid on that CPU.

The new version should be the most efficient way to do this, but
it gets rid of the special handling for size=0 that most other
architectures ignore as well.

Signed-off-by: Arnd Bergmann 
---
 arch/mips/include/asm/uaccess.h | 22 --
 1 file changed, 4 insertions(+), 18 deletions(-)

diff --git a/arch/mips/include/asm/uaccess.h b/arch/mips/include/asm/uaccess.h
index db9a8e002b62..d7c89dc3426c 100644
--- a/arch/mips/include/asm/uaccess.h
+++ b/arch/mips/include/asm/uaccess.h
@@ -19,6 +19,7 @@
 #ifdef CONFIG_32BIT
 
 #define __UA_LIMIT 0x8000UL
+#define TASK_SIZE_MAX  __UA_LIMIT
 
 #define __UA_ADDR  ".word"
 #define __UA_LA"la"
@@ -33,6 +34,7 @@
 extern u64 __ua_limit;
 
 #define __UA_LIMIT __ua_limit
+#define TASK_SIZE_MAX  XKSSEG
 
 #define __UA_ADDR  ".dword"
 #define __UA_LA"dla"
@@ -42,22 +44,6 @@ extern u64 __ua_limit;
 
 #endif /* CONFIG_64BIT */
 
-/*
- * Is a address valid? This does a straightforward calculation rather
- * than tests.
- *
- * Address valid if:
- *  - "addr" doesn't have any high-bits set
- *  - AND "size" doesn't have any high-bits set
- *  - AND "addr+size" doesn't have any high-bits set
- *  - OR we are in kernel mode.
- *
- * __ua_size() is a trick to avoid runtime checking of positive constant
- * sizes; for those we already know at compile time that the size is ok.
- */
-#define __ua_size(size)
\
-   ((__builtin_constant_p(size) && (signed long) (size) > 0) ? 0 : (size))
-
 /*
  * access_ok: - Checks if a user space pointer is valid
  * @addr: User space pointer to start of block to check
@@ -79,9 +65,9 @@ extern u64 __ua_limit;
 static inline int __access_ok(const void __user *p, unsigned long size)
 {
unsigned long addr = (unsigned long)p;
-   unsigned long end = addr + size - !!size;
+   unsigned long limit = TASK_SIZE_MAX;
 
-   return (__UA_LIMIT & (addr | end | __ua_size(size))) == 0;
+   return (size <= limit) && (addr <= (limit - size));
 }
 
 #define access_ok(addr, size)  \
-- 
2.29.2



[PATCH 05/14] uaccess: add generic __{get,put}_kernel_nofault

2022-02-14 Thread Arnd Bergmann
From: Arnd Bergmann 

All architectures that don't provide __{get,put}_kernel_nofault() yet
can implement this on top of __{get,put}_user.

Add a generic version that lets everything use the normal
copy_{from,to}_kernel_nofault() code based on these, removing the last
use of get_fs()/set_fs() from architecture-independent code.

Signed-off-by: Arnd Bergmann 
---
 arch/arm/include/asm/uaccess.h  |   2 -
 arch/arm64/include/asm/uaccess.h|   2 -
 arch/m68k/include/asm/uaccess.h |   2 -
 arch/mips/include/asm/uaccess.h |   2 -
 arch/parisc/include/asm/uaccess.h   |   1 -
 arch/powerpc/include/asm/uaccess.h  |   2 -
 arch/riscv/include/asm/uaccess.h|   2 -
 arch/s390/include/asm/uaccess.h |   2 -
 arch/sparc/include/asm/uaccess_64.h |   2 -
 arch/um/include/asm/uaccess.h   |   2 -
 arch/x86/include/asm/uaccess.h  |   2 -
 include/asm-generic/uaccess.h   |   2 -
 include/linux/uaccess.h |  19 +
 mm/maccess.c| 108 
 14 files changed, 19 insertions(+), 131 deletions(-)

diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h
index 32dbfd81f42a..d20d78c34b94 100644
--- a/arch/arm/include/asm/uaccess.h
+++ b/arch/arm/include/asm/uaccess.h
@@ -476,8 +476,6 @@ do {
\
: "r" (x), "i" (-EFAULT)\
: "cc")
 
-#define HAVE_GET_KERNEL_NOFAULT
-
 #define __get_kernel_nofault(dst, src, type, err_label)
\
 do {   \
const type *__pk_ptr = (src);   \
diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 3a5ff5e20586..2e20879fe3cf 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -26,8 +26,6 @@
 #include 
 #include 
 
-#define HAVE_GET_KERNEL_NOFAULT
-
 /*
  * Test whether a block of memory is a valid user space address.
  * Returns 1 if the range is valid, 0 otherwise.
diff --git a/arch/m68k/include/asm/uaccess.h b/arch/m68k/include/asm/uaccess.h
index ba670523885c..79617c0b2f91 100644
--- a/arch/m68k/include/asm/uaccess.h
+++ b/arch/m68k/include/asm/uaccess.h
@@ -390,8 +390,6 @@ raw_copy_to_user(void __user *to, const void *from, 
unsigned long n)
 #define INLINE_COPY_FROM_USER
 #define INLINE_COPY_TO_USER
 
-#define HAVE_GET_KERNEL_NOFAULT
-
 #define __get_kernel_nofault(dst, src, type, err_label)
\
 do {   \
type *__gk_dst = (type *)(dst); \
diff --git a/arch/mips/include/asm/uaccess.h b/arch/mips/include/asm/uaccess.h
index f8f74f9f5883..db9a8e002b62 100644
--- a/arch/mips/include/asm/uaccess.h
+++ b/arch/mips/include/asm/uaccess.h
@@ -296,8 +296,6 @@ struct __large_struct { unsigned long buf[100]; };
(val) = __gu_tmp.t; \
 }
 
-#define HAVE_GET_KERNEL_NOFAULT
-
 #define __get_kernel_nofault(dst, src, type, err_label)
\
 do {   \
int __gu_err;   \
diff --git a/arch/parisc/include/asm/uaccess.h 
b/arch/parisc/include/asm/uaccess.h
index ebf8a845b017..0925bbd6db67 100644
--- a/arch/parisc/include/asm/uaccess.h
+++ b/arch/parisc/include/asm/uaccess.h
@@ -95,7 +95,6 @@ struct exception_table_entry {
(val) = (__force __typeof__(*(ptr))) __gu_val;  \
 }
 
-#define HAVE_GET_KERNEL_NOFAULT
 #define __get_kernel_nofault(dst, src, type, err_label)\
 {  \
type __z;   \
diff --git a/arch/powerpc/include/asm/uaccess.h 
b/arch/powerpc/include/asm/uaccess.h
index 63316100080c..a0032c2e7550 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -467,8 +467,6 @@ do {
\
unsafe_put_user(*(u8*)(_src + _i), (u8 __user *)(_dst + _i), 
e); \
 } while (0)
 
-#define HAVE_GET_KERNEL_NOFAULT
-
 #define __get_kernel_nofault(dst, src, type, err_label)
\
__get_user_size_goto(*((type *)(dst)),  \
(__force type __user *)(src), sizeof(type), err_label)
diff --git a/arch/riscv/include/asm/uaccess.h b/arch/riscv/include/asm/uaccess.h
index c701a5e57a2b..4407b9e48d2c 100644
--- a/arch/riscv/include/asm/uaccess.h
+++ b/arch/riscv/include/asm/uaccess.h
@@ -346,8 +346,6 @@ unsigned long __must_check clear_user(void __user *to, 
unsigned long n)
__clear_user(to, n) : n;
 }
 
-#define HAVE_GET_KERNEL_NOFAULT
-
 #define __get_kernel_nofault(dst, src, type, err_label)
\
 

[PATCH 04/14] x86: use more conventional access_ok() definition

2022-02-14 Thread Arnd Bergmann
From: Arnd Bergmann 

The way that access_ok() is defined on x86 is slightly different from
most other architectures, and a bit more complex.

The generic version tends to result in the best output on all
architectures, as it results in single comparison against a constant
limit for calls with a known size.

There are a few callers of __range_not_ok(), all of which use TASK_SIZE
as the limit rather than TASK_SIZE_MAX, but I could not see any reason
for picking this. Changing these to call __access_ok() instead uses the
default limit, but keeps the behavior otherwise.

x86 is the only architecture with a WARN_ON_IN_IRQ() checking
access_ok(), but it's probably best to leave that in place.

Signed-off-by: Arnd Bergmann 
---
 arch/x86/include/asm/uaccess.h | 38 +++---
 1 file changed, 12 insertions(+), 26 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index ac96f9b2d64b..6956a63291b6 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -16,30 +16,13 @@
  * Test whether a block of memory is a valid user space address.
  * Returns 0 if the range is valid, nonzero otherwise.
  */
-static inline bool __chk_range_not_ok(unsigned long addr, unsigned long size, 
unsigned long limit)
+static inline bool __access_ok(void __user *ptr, unsigned long size)
 {
-   /*
-* If we have used "sizeof()" for the size,
-* we know it won't overflow the limit (but
-* it might overflow the 'addr', so it's
-* important to subtract the size from the
-* limit, not add it to the address).
-*/
-   if (__builtin_constant_p(size))
-   return unlikely(addr > limit - size);
-
-   /* Arbitrary sizes? Be careful about overflow */
-   addr += size;
-   if (unlikely(addr < size))
-   return true;
-   return unlikely(addr > limit);
-}
+   unsigned long limit = TASK_SIZE_MAX;
+   unsigned long addr = ptr;
 
-#define __range_not_ok(addr, size, limit)  \
-({ \
-   __chk_user_ptr(addr);   \
-   __chk_range_not_ok((unsigned long __force)(addr), size, limit); \
-})
+   return (size <= limit) && (addr <= (limit - size));
+}
 
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
 static inline bool pagefault_disabled(void);
@@ -66,12 +49,15 @@ static inline bool pagefault_disabled(void);
  * Return: true (nonzero) if the memory block may be valid, false (zero)
  * if it is definitely invalid.
  */
-#define access_ok(addr, size)  \
-({ \
-   WARN_ON_IN_IRQ();   \
-   likely(!__range_not_ok(addr, size, TASK_SIZE_MAX)); \
+#define access_ok(addr, size)  \
+({ \
+   WARN_ON_IN_IRQ();   \
+   likely(__access_ok(addr, size));\
 })
 
+#define __range_not_ok(addr, size, limit)  (!__access_ok(addr, size))
+#define __chk_range_not_ok(addr, size, limit)  (!__access_ok((void __user 
*)addr, size))
+
 extern int __get_user_1(void);
 extern int __get_user_2(void);
 extern int __get_user_4(void);
-- 
2.29.2



[PATCH 03/14] nds32: fix access_ok() checks in get/put_user

2022-02-14 Thread Arnd Bergmann
From: Arnd Bergmann 

The get_user()/put_user() functions are meant to check for
access_ok(), while the __get_user()/__put_user() functions
don't.

This broke in 4.19 for nds32, when it gained an extraneous
check in __get_user(), but lost the check it needs in
__put_user().

Fixes: 487913ab18c2 ("nds32: Extract the checking and getting pointer to a 
macro")
Cc: sta...@vger.kernel.org @ v4.19+
Signed-off-by: Arnd Bergmann 
---
 arch/nds32/include/asm/uaccess.h | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/arch/nds32/include/asm/uaccess.h b/arch/nds32/include/asm/uaccess.h
index d4cbf069dc22..37a40981deb3 100644
--- a/arch/nds32/include/asm/uaccess.h
+++ b/arch/nds32/include/asm/uaccess.h
@@ -70,9 +70,7 @@ static inline void set_fs(mm_segment_t fs)
  * versions are void (ie, don't return a value as such).
  */
 
-#define get_user   __get_user  \
-
-#define __get_user(x, ptr) \
+#define get_user(x, ptr)   \
 ({ \
long __gu_err = 0;  \
__get_user_check((x), (ptr), __gu_err); \
@@ -85,6 +83,14 @@ static inline void set_fs(mm_segment_t fs)
(void)0;\
 })
 
+#define __get_user(x, ptr) \
+({ \
+   long __gu_err = 0;  \
+   const __typeof__(*(ptr)) __user *__p = (ptr);   \
+   __get_user_err((x), __p, (__gu_err));   \
+   __gu_err;   \
+})
+
 #define __get_user_check(x, ptr, err)  \
 ({ \
const __typeof__(*(ptr)) __user *__p = (ptr);   \
@@ -165,12 +171,18 @@ do {  
\
: "r"(addr), "i"(-EFAULT)   \
: "cc")
 
-#define put_user   __put_user  \
+#define put_user(x, ptr)   \
+({ \
+   long __pu_err = 0;  \
+   __put_user_check((x), (ptr), __pu_err); \
+   __pu_err;   \
+})
 
 #define __put_user(x, ptr) \
 ({ \
long __pu_err = 0;  \
-   __put_user_err((x), (ptr), __pu_err);   \
+   __typeof__(*(ptr)) __user *__p = (ptr); \
+   __put_user_err((x), __p, __pu_err); \
__pu_err;   \
 })
 
-- 
2.29.2



[PATCH 02/14] sparc64: add __{get,put}_kernel_nocheck()

2022-02-14 Thread Arnd Bergmann
From: Arnd Bergmann 

sparc64 is one of the architectures that uses separate address
spaces for kernel and user addresses, so __get_kernel_nofault()
can not just call into the normal __get_user() without the
access_ok() check.

Instead duplicate __get_user() and __put_user() into their
in-kernel versions, with minor changes for the calling conventions
and leaving out the address space modifier on the assembler
instruction.

This could surely be written more elegantly, but duplicating it
gets the job done.

Signed-off-by: Arnd Bergmann 
---
 arch/sparc/include/asm/uaccess_64.h | 78 +
 1 file changed, 78 insertions(+)

diff --git a/arch/sparc/include/asm/uaccess_64.h 
b/arch/sparc/include/asm/uaccess_64.h
index 30eb4c6414d1..b283798315b1 100644
--- a/arch/sparc/include/asm/uaccess_64.h
+++ b/arch/sparc/include/asm/uaccess_64.h
@@ -100,6 +100,42 @@ void __retl_efault(void);
 struct __large_struct { unsigned long buf[100]; };
 #define __m(x) ((struct __large_struct *)(x))
 
+#define __put_kernel_nofault(dst, src, type, label)\
+do {   \
+   type *addr = (type __force *)(dst); \
+   type data = *(type *)src;   \
+   register int __pu_ret;  \
+   switch (sizeof(type)) { \
+   case 1: __put_kernel_asm(data, b, addr, __pu_ret); break;   \
+   case 2: __put_kernel_asm(data, h, addr, __pu_ret); break;   \
+   case 4: __put_kernel_asm(data, w, addr, __pu_ret); break;   \
+   case 8: __put_kernel_asm(data, x, addr, __pu_ret); break;   \
+   default: __pu_ret = __put_user_bad(); break;\
+   }   \
+   if (__pu_ret)   \
+   goto label; \
+} while (0)
+
+#define __put_kernel_asm(x, size, addr, ret)   \
+__asm__ __volatile__(  \
+   "/* Put kernel asm, inline. */\n"   \
+   "1:\t"  "st"#size " %1, [%2]\n\t"   \
+   "clr%0\n"   \
+   "2:\n\n\t"  \
+   ".section .fixup,#alloc,#execinstr\n\t" \
+   ".align 4\n"\
+   "3:\n\t"\
+   "sethi  %%hi(2b), %0\n\t"   \
+   "jmpl   %0 + %%lo(2b), %%g0\n\t"\
+   " mov   %3, %0\n\n\t"   \
+   ".previous\n\t" \
+   ".section __ex_table,\"a\"\n\t" \
+   ".align 4\n\t"  \
+   ".word  1b, 3b\n\t" \
+   ".previous\n\n\t"   \
+  : "=r" (ret) : "r" (x), "r" (__m(addr)), \
+"i" (-EFAULT))
+
 #define __put_user_nocheck(data, addr, size) ({\
register int __pu_ret;  \
switch (size) { \
@@ -134,6 +170,48 @@ __asm__ __volatile__(  
\
 
 int __put_user_bad(void);
 
+#define __get_kernel_nofault(dst, src, type, label) \
+do {\
+   type *addr = (type __force *)(src);  \
+   register int __gu_ret;   \
+   register unsigned long __gu_val; \
+   switch (sizeof(type)) {  \
+   case 1: __get_kernel_asm(__gu_val, ub, addr, __gu_ret); break; \
+   case 2: __get_kernel_asm(__gu_val, uh, addr, __gu_ret); break; \
+   case 4: __get_kernel_asm(__gu_val, uw, addr, __gu_ret); break; \
+   case 8: __get_kernel_asm(__gu_val, x, addr, __gu_ret); break;  \
+   default: \
+   __gu_val = 0;\
+   __gu_ret = __get_user_bad(); \
+   break;   \
+   }\
+   if (__gu_ret)   

[PATCH 01/14] uaccess: fix integer overflow on access_ok()

2022-02-14 Thread Arnd Bergmann
From: Arnd Bergmann 

Three architectures check the end of a user access against the
address limit without taking a possible overflow into account.
Passing a negative length or another overflow in here returns
success when it should not.

Use the most common correct implementation here, which optimizes
for a constant 'size' argument, and turns the common case into a
single comparison.
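
As a rough illustration (hypothetical numbers, not taken from an actual
report) of how the open-coded checks go wrong:

	unsigned long limit = 0x0000800000000000UL;	/* a TASK_SIZE-style limit */
	unsigned long addr  = 0x0000100000000000UL;
	unsigned long size  = -4096UL;			/* bogus "negative" length */
	int ok;

	/* old style check: addr + size wraps and compares as a small value */
	ok = (addr < limit) && ((addr + size) < limit);   /* 1: wrongly allowed */

	/* new check: cannot overflow, so the access is rejected */
	ok = (size <= limit) && (addr <= (limit - size)); /* 0: correctly rejected */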

Cc: stable@vger.kernel.org
Fixes: da551281947c ("csky: User access")
Fixes: f663b60f5215 ("microblaze: Fix uaccess_ok macro")
Fixes: 7567746e1c0d ("Hexagon: Add user access functions")
Reported-by: David Laight 
Signed-off-by: Arnd Bergmann 
---
 arch/csky/include/asm/uaccess.h   |  7 +++
 arch/hexagon/include/asm/uaccess.h| 18 +-
 arch/microblaze/include/asm/uaccess.h | 19 ---
 3 files changed, 16 insertions(+), 28 deletions(-)

diff --git a/arch/csky/include/asm/uaccess.h b/arch/csky/include/asm/uaccess.h
index c40f06ee8d3e..ac5a54f57d40 100644
--- a/arch/csky/include/asm/uaccess.h
+++ b/arch/csky/include/asm/uaccess.h
@@ -3,14 +3,13 @@
 #ifndef __ASM_CSKY_UACCESS_H
 #define __ASM_CSKY_UACCESS_H
 
-#define user_addr_max() \
-   (uaccess_kernel() ? KERNEL_DS.seg : get_fs().seg)
+#define user_addr_max() (current_thread_info()->addr_limit.seg)
 
 static inline int __access_ok(unsigned long addr, unsigned long size)
 {
-   unsigned long limit = current_thread_info()->addr_limit.seg;
+   unsigned long limit = user_addr_max();
 
-   return ((addr < limit) && ((addr + size) < limit));
+   return (size <= limit) && (addr <= (limit - size));
 }
 #define __access_ok __access_ok
 
diff --git a/arch/hexagon/include/asm/uaccess.h 
b/arch/hexagon/include/asm/uaccess.h
index ef5bfef8d490..719ba3f3c45c 100644
--- a/arch/hexagon/include/asm/uaccess.h
+++ b/arch/hexagon/include/asm/uaccess.h
@@ -25,17 +25,17 @@
  * Returns true (nonzero) if the memory block *may* be valid, false (zero)
  * if it is definitely invalid.
  *
- * User address space in Hexagon, like x86, goes to 0xbfff, so the
- * simple MSB-based tests used by MIPS won't work.  Some further
- * optimization is probably possible here, but for now, keep it
- * reasonably simple and not *too* slow.  After all, we've got the
- * MMU for backup.
  */
+#define uaccess_kernel() (get_fs().seg == KERNEL_DS.seg)
+#define user_addr_max() (uaccess_kernel() ? ~0UL : TASK_SIZE)
 
-#define __access_ok(addr, size) \
-   ((get_fs().seg == KERNEL_DS.seg) || \
-   (((unsigned long)addr < get_fs().seg) && \
- (unsigned long)size < (get_fs().seg - (unsigned long)addr)))
+static inline int __access_ok(unsigned long addr, unsigned long size)
+{
+   unsigned long limit = TASK_SIZE;
+
+   return (size <= limit) && (addr <= (limit - size));
+}
+#define __access_ok __access_ok
 
 /*
  * When a kernel-mode page fault is taken, the faulting instruction
diff --git a/arch/microblaze/include/asm/uaccess.h 
b/arch/microblaze/include/asm/uaccess.h
index d2a8ef9f8978..5b6e0e7788f4 100644
--- a/arch/microblaze/include/asm/uaccess.h
+++ b/arch/microblaze/include/asm/uaccess.h
@@ -39,24 +39,13 @@
 
 # define uaccess_kernel()  (get_fs().seg == KERNEL_DS.seg)
 
-static inline int access_ok(const void __user *addr, unsigned long size)
+static inline int __access_ok(unsigned long addr, unsigned long size)
 {
-   if (!size)
-   goto ok;
+   unsigned long limit = user_addr_max();
 
-   if ((get_fs().seg < ((unsigned long)addr)) ||
-   (get_fs().seg < ((unsigned long)addr + size - 1))) {
-   pr_devel("ACCESS fail at 0x%08x (size 0x%x), seg 0x%08x\n",
-   (__force u32)addr, (u32)size,
-   (u32)get_fs().seg);
-   return 0;
-   }
-ok:
-   pr_devel("ACCESS OK at 0x%08x (size 0x%x), seg 0x%08x\n",
-   (__force u32)addr, (u32)size,
-   (u32)get_fs().seg);
-   return 1;
+   return (size <= limit) && (addr <= (limit - size));
 }
+#define access_ok(addr, size) __access_ok((unsigned long)addr, size)
 
 # define __FIXUP_SECTION   ".section .fixup,\"ax\"\n"
 # define __EX_TABLE_SECTION".section __ex_table,\"a\"\n"
-- 
2.29.2



[PATCH 00/14] clean up asm/uaccess.h, kill set_fs for good

2022-02-14 Thread Arnd Bergmann
From: Arnd Bergmann 

Christoph Hellwig and a few others spent a huge effort on removing
set_fs() from most of the important architectures, but the conversion was
never completed for about half of the other architectures, even though most
of them don't actually use set_fs() at all.

I did a patch for microblaze at some point, which turned out to be fairly
generic, and now ported it to most other architectures, using new generic
implementations of access_ok() and __{get,put}_kernel_nocheck().

Three architectures (sparc64, ia64, and sh) needed some extra work,
which I also completed.

The final series contains extra cleanup changes that touch all
architectures. Please review and test these, so we can merge them
for v5.18.

The series is available at
https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/log/?h=set_fs
for testing.

   Arnd

Arnd Bergmann (14):
  uaccess: fix integer overflow on access_ok()
  sparc64: add __{get,put}_kernel_nocheck()
  nds32: fix access_ok() checks in get/put_user
  x86: use more conventional access_ok() definition
  uaccess: add generic __{get,put}_kernel_nofault
  mips: use simpler access_ok()
  uaccess: generalize access_ok()
  arm64: simplify access_ok()
  m68k: drop custom __access_ok()
  uaccess: remove most CONFIG_SET_FS users
  sparc64: remove CONFIG_SET_FS support
  sh: remove CONFIG_SET_FS support
  ia64: remove CONFIG_SET_FS support
  uaccess: drop set_fs leftovers

 arch/Kconfig  |   3 -
 arch/alpha/Kconfig|   1 -
 arch/alpha/include/asm/processor.h|   4 -
 arch/alpha/include/asm/thread_info.h  |   2 -
 arch/alpha/include/asm/uaccess.h  |  53 +
 arch/arc/Kconfig  |   1 -
 arch/arc/include/asm/segment.h|  20 
 arch/arc/include/asm/thread_info.h|   3 -
 arch/arc/include/asm/uaccess.h|  30 -
 arch/arm/include/asm/uaccess.h|  22 +---
 arch/arm/kernel/swp_emulate.c |   2 +-
 arch/arm/kernel/traps.c   |   2 +-
 arch/arm/lib/uaccess_with_memcpy.c|  10 --
 arch/arm64/include/asm/uaccess.h  |  29 +
 arch/csky/Kconfig |   1 -
 arch/csky/include/asm/processor.h |   2 -
 arch/csky/include/asm/segment.h   |  10 --
 arch/csky/include/asm/thread_info.h   |   2 -
 arch/csky/include/asm/uaccess.h   |  12 --
 arch/csky/kernel/asm-offsets.c|   1 -
 arch/csky/kernel/signal.c |   2 +-
 arch/h8300/Kconfig|   1 -
 arch/h8300/include/asm/processor.h|   1 -
 arch/h8300/include/asm/segment.h  |  40 ---
 arch/h8300/include/asm/thread_info.h  |   3 -
 arch/h8300/kernel/entry.S |   1 -
 arch/h8300/kernel/head_ram.S  |   1 -
 arch/h8300/mm/init.c  |   6 -
 arch/h8300/mm/memory.c|   1 -
 arch/hexagon/Kconfig  |   1 -
 arch/hexagon/include/asm/thread_info.h|   6 -
 arch/hexagon/include/asm/uaccess.h|  25 
 arch/hexagon/kernel/process.c |   1 -
 arch/ia64/Kconfig |   1 -
 arch/ia64/include/asm/processor.h |   4 -
 arch/ia64/include/asm/thread_info.h   |   2 -
 arch/ia64/include/asm/uaccess.h   |  26 ++---
 arch/ia64/kernel/unaligned.c  |  60 ++
 arch/m68k/include/asm/uaccess.h   |  14 +--
 arch/microblaze/Kconfig   |   1 -
 arch/microblaze/include/asm/thread_info.h |   6 -
 arch/microblaze/include/asm/uaccess.h |  43 +--
 arch/microblaze/kernel/asm-offsets.c  |   1 -
 arch/microblaze/kernel/process.c  |   1 -
 arch/mips/include/asm/uaccess.h   |  47 +---
 arch/nds32/Kconfig|   1 -
 arch/nds32/include/asm/thread_info.h  |   4 -
 arch/nds32/include/asm/uaccess.h  |  40 +++
 arch/nds32/kernel/process.c   |   5 +-
 arch/nds32/mm/alignment.c |   3 -
 arch/nios2/Kconfig|   1 -
 arch/nios2/include/asm/thread_info.h  |   9 --
 arch/nios2/include/asm/uaccess.h  |  23 +---
 arch/nios2/kernel/signal.c|  20 ++--
 arch/openrisc/Kconfig |   1 -
 arch/openrisc/include/asm/thread_info.h   |   7 --
 arch/openrisc/include/asm/uaccess.h   |  42 +--
 arch/parisc/include/asm/futex.h   |   2 +-
 arch/parisc/include/asm/uaccess.h |  11 +-
 arch/parisc/lib/memcpy.c  |   2 +-
 arch/powerpc/include/asm/uaccess.h|  13 +--
 arch/powerpc/lib/sstep.c  |   4 +-
 arch/riscv/include/asm/uaccess.h  |  33 +-
 arch/riscv/kernel/perf_callchain.c|   2 +-
 arch/s390/include/asm/uaccess.h   |  13 +--
 arch/sh/Kconfig   |   1 -
 arch/sh/include/asm/processor.h   |   1 -
 arch/sh/include/asm/segment.h |  33 --

Re: [BUG] mtd: cfi_cmdset_0002: write regression since v4.17-rc1

2022-02-14 Thread Ahmad Fatoum
Hello Tokunori-san,

On 13.02.22 17:47, Tokunori Ikegami wrote:
> Hi Ahmad-san,
> 
> Thanks for your confirmations. Sorry for the late reply.

No worries. I appreciate you taking the time.

> Could you please try the patch attached to disable the chip_good() change as 
> before?
> I think this should work for S29GL964N since the chip_ready() is used and 
> works as mentioned.

yes, this resolves my issue:
Tested-by: Ahmad Fatoum 

 Doesn't seem to be a buffered write issue here though as the writes
 did work fine before dfeae1073583. Any other ideas?
>>> At first I thought the issue is possible to be resolved by using the word 
>>> write instead of the buffered writes.
>>> Now I am thinking to disable the changes dfeae1073583 partially with any 
>>> condition if possible.
>> What seems to work for me is checking if chip_good or chip_ready
>> and map_word is equal to 0xFF. I can't justify why this is ok though.
>> (Worst case bus is floating at this point of time and Hi-Z is read
>> as 0xff on CPU data lines...)
> 
> Sorry I am not sure about this.
> In the past I thought the chip_ready() itself was correct, as it is
> implemented per the data sheet.
> But it did not work correctly, so I changed it to use chip_good() instead,
> as that is also correct.

What exactly in the datasheet makes you believe chip_good is not appropriate?
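
For reference, something like the below is what I meant (only a sketch of
the idea, not properly tested; chip_good()/chip_ready() are local helpers
in drivers/mtd/chips/cfi_cmdset_0002.c and their exact signatures differ
between kernel versions):

	/* in the write completion poll loop: accept "good", or "ready and erased" */
	if (chip_good(map, chip, adr, datum) ||
	    (chip_ready(map, chip, adr) &&
	     map_word_equal(map, map_read(map, adr), map_word_ff(map))))
		break;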

Cheers,
Ahmad


-- 
Pengutronix e.K.   | |
Steuerwalder Str. 21   | http://www.pengutronix.de/  |
31137 Hildesheim, Germany  | Phone: +49-5121-206917-0|
Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |


Re: [PATCH v5 3/6] mm: make alloc_contig_range work at pageblock granularity

2022-02-14 Thread Zi Yan
On 14 Feb 2022, at 2:59, Christophe Leroy wrote:

> On 11/02/2022 at 17:41, Zi Yan wrote:
>> From: Zi Yan 
>>
>> alloc_contig_range() worked at MAX_ORDER-1 granularity to avoid merging
>> pageblocks with different migratetypes. It might unnecessarily convert
>> extra pageblocks at the beginning and at the end of the range. Change
>> alloc_contig_range() to work at pageblock granularity.
>>
>> Special handling is needed for free pages and in-use pages across the
>> boundaries of the range specified by alloc_contig_range(), because these
>> partially isolated pages cause free page accounting issues. The free
>> pages will be split and freed into separate migratetype lists; the
>> in-use pages will be migrated then the freed pages will be handled.
>>
>> Signed-off-by: Zi Yan 
>> ---
>>   include/linux/page-isolation.h |   2 +-
>>   mm/internal.h  |   3 +
>>   mm/memory_hotplug.c|   3 +-
>>   mm/page_alloc.c| 235 +
>>   mm/page_isolation.c|  33 -
>>   5 files changed, 211 insertions(+), 65 deletions(-)
>>
>> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
>> index 4ef7be6def83..78ff940cc169 100644
>> --- a/include/linux/page-isolation.h
>> +++ b/include/linux/page-isolation.h
>> @@ -54,7 +54,7 @@ int move_freepages_block(struct zone *zone, struct page 
>> *page,
>>*/
>>   int
>>   start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>> - unsigned migratetype, int flags);
>> + unsigned migratetype, int flags, gfp_t gfp_flags);
>>
>>   /*
>>* Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
>> diff --git a/mm/internal.h b/mm/internal.h
>> index 0d240e876831..509cbdc25992 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -319,6 +319,9 @@ isolate_freepages_range(struct compact_control *cc,
>>   int
>>   isolate_migratepages_range(struct compact_control *cc,
>> unsigned long low_pfn, unsigned long end_pfn);
>> +
>> +int
>> +isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags, int 
>> isolate_before_boundary);
>>   #endif
>>   int find_suitable_fallback(struct free_area *area, unsigned int order,
>>  int migratetype, bool only_stealable, bool *can_steal);
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index ce68098832aa..82406d2f3e46 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1863,7 +1863,8 @@ int __ref offline_pages(unsigned long start_pfn, 
>> unsigned long nr_pages,
>>  /* set above range as isolated */
>>  ret = start_isolate_page_range(start_pfn, end_pfn,
>> MIGRATE_MOVABLE,
>> -   MEMORY_OFFLINE | REPORT_FAILURE);
>> +   MEMORY_OFFLINE | REPORT_FAILURE,
>> +   GFP_USER | __GFP_MOVABLE | 
>> __GFP_RETRY_MAYFAIL);
>>  if (ret) {
>>  reason = "failure to isolate range";
>>  goto failed_removal_pcplists_disabled;
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 62ef78f3d771..7a4fa21aea5c 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -8985,7 +8985,7 @@ static inline void alloc_contig_dump_pages(struct 
>> list_head *page_list)
>>   #endif
>>
>>   /* [start, end) must belong to a single zone. */
>> -static int __alloc_contig_migrate_range(struct compact_control *cc,
>> +int __alloc_contig_migrate_range(struct compact_control *cc,
>>  unsigned long start, unsigned long end)
>>   {
>>  /* This function is based on compact_zone() from compaction.c. */
>> @@ -9043,6 +9043,167 @@ static int __alloc_contig_migrate_range(struct 
>> compact_control *cc,
>>  return 0;
>>   }
>>
>> +/**
>> + * split_free_page() -- split a free page at split_pfn_offset
>> + * @free_page:  the original free page
>> + * @order:  the order of the page
>> + * @split_pfn_offset:   split offset within the page
>> + *
>> + * It is used when the free page crosses two pageblocks with different 
>> migratetypes
>> + * at split_pfn_offset within the page. The split free page will be put into
>> + * separate migratetype lists afterwards. Otherwise, the function achieves
>> + * nothing.
>> + */
>> +static inline void split_free_page(struct page *free_page,
>> +int order, unsigned long split_pfn_offset)
>> +{
>> +struct zone *zone = page_zone(free_page);
>> +unsigned long free_page_pfn = page_to_pfn(free_page);
>> +unsigned long pfn;
>> +unsigned long flags;
>> +int free_page_order;
>> +
>> +spin_lock_irqsave(&zone->lock, flags);
>> +del_page_from_free_list(free_page, zone, order);
>> +for (pfn = free_page_pfn;
>> + pfn < free_page_pfn + (1UL << order);) {
>> +int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
>> +
>> + 

Re: [PATCH v5 2/6] powerpc/kexec_file: Add KEXEC_SIG support.

2022-02-14 Thread Michal Suchánek
Hello,

On Mon, Feb 14, 2022 at 10:14:16AM -0500, Mimi Zohar wrote:
> Hi Michal,
> 
> On Sun, 2022-02-13 at 21:59 -0500, Mimi Zohar wrote:
> 
> > 
> > On Tue, 2022-01-11 at 12:37 +0100, Michal Suchanek wrote:
> > > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > > index dea74d7717c0..1cde9b6c5987 100644
> > > --- a/arch/powerpc/Kconfig
> > > +++ b/arch/powerpc/Kconfig
> > > @@ -560,6 +560,22 @@ config KEXEC_FILE
> > >  config ARCH_HAS_KEXEC_PURGATORY
> > > def_bool KEXEC_FILE
> > >  
> > > +config KEXEC_SIG
> > > +   bool "Verify kernel signature during kexec_file_load() syscall"
> > > +   depends on KEXEC_FILE && MODULE_SIG_FORMAT
> > > +   help
> > > + This option makes kernel signature verification mandatory for

This is actually wrong. KEXEC_SIG makes it mandatory that any signature
that is appended is valid and made by a key that is part of the platform
keyring (which is also wrong, built-in keys should also be accepted).
KEXEC_SIG_FORCE or an IMA policy makes it mandatory that the signature
is present.

> > > + the kexec_file_load() syscall.
> > 
> > When KEXEC_SIG is enabled on other architectures, IMA does not define a
> > kexec 'appraise' policy rule.  Refer to the policy rules in
> > security/ima/ima_efi.c.  Similarly the kexec 'appraise' policy rule in

I suppose you mean security/integrity/ima/ima_efi.c

I also think it's misguided because KEXEC_SIG in itself does not enforce
the signature. KEXEC_SIG_FORCE does.

> > arch/powerpc/kernel/ima_policy.c should not be defined.

I suppose you mean arch/powerpc/kernel/ima_arch.c - see above.


Thanks for taking the time to reseach and summarize the differences.

> The discussion shouldn't only be about IMA vs. KEXEC_SIG kernel image
> signature verification.  Let's try and reframe the problem a bit.
> 
> 1. Unify and simplify the existing kexec signature verification so
> verifying the KEXEC kernel image signature works irrespective of
> signature type - PE, appended signature.
> 
> solution: enable KEXEC_SIG  (This patch set, with the above powerpc IMA
> policy changes.)
> 
> 2. Measure and include the kexec kernel image in a log for attestation,
> if desired.
> 
> solution: enable IMA_ARCH_POLICY 
> - Powerpc: requires trusted boot to be enabled.
> - EFI:   requires  secure boot to be enabled.  The IMA efi policy
> doesn't differentiate between secure and trusted boot.
> 
> 3. Carry the kexec kernel image measurement across kexec, if desired
> and supported on the architecture.
> 
> solution: enable IMA_KEXEC
> 
> Comparison: 
> - Are there any differences between IMA vs. KEXEC_SIG measuring the
> kexec kernel image?
> 
> One of the main differences is "what" is included in the measurement
> list differs.  In both cases, the 'd-ng' field of the IMA measurement
> list template (e.g. ima-ng, ima-sig, ima-modsig) is the full file hash
> including the appended signature.  With IMA and the 'ima-modsig'
> template, an additional hash without the appended signature is defined,
> as well as including the appended signature in the 'sig' field.
> 
> Including the file hash and appended signature in the measurement list
> allows an attestation server, for example, to verify the appended
> signature without having to know the file hash without the signature.

I don't understand this part. Isn't the hash *with* signature always
included, and the distinguishing part about IMA is the hash *without*
signature which is the same irrespective of signature type (PE, appended
xattr) and irrespective of the key used for signing?

> Other differences are already included in the Kconfig KEXEC_SIG "Notes"
> section.

Which besides what is already described above would be blacklisting
specific binaries, which is much more effective if you have hashes of
binaries without signature.

Thanks

Michal


Re: [PATCH v5 3/6] mm: make alloc_contig_range work at pageblock granularity

2022-02-14 Thread Zi Yan
On 14 Feb 2022, at 2:26, Christoph Hellwig wrote:

>> +int
>> +isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags, int 
>> isolate_before_boundary);
>
> Please avoid the completely unreadably long line. i.e.
>
> int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
>   int isolate_before_boundary);
>
> Same in various other spots.

OK. Thanks for pointing it out. checkpatch.pl did not report any
warning about this. It seems that the column limit has been relaxed
to 100. Anyway, I will make it shorter.

--
Best Regards,
Yan, Zi




Re: [PATCH v2 09/13] powerpc/ftrace: Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS

2022-02-14 Thread Naveen N. Rao

Christophe Leroy wrote:

Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS. It accelerates the call
of livepatching.

Also note that powerpc being the last one to convert to
CONFIG_DYNAMIC_FTRACE_WITH_ARGS, it will now be possible to remove
klp_arch_set_pc() on all architectures.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig |  1 +
 arch/powerpc/include/asm/ftrace.h| 17 +
 arch/powerpc/include/asm/livepatch.h |  4 +---
 3 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index cdac2115eb00..e2b1792b2aae 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -210,6 +210,7 @@ config PPC
select HAVE_DEBUG_KMEMLEAK
select HAVE_DEBUG_STACKOVERFLOW
select HAVE_DYNAMIC_FTRACE
+   select HAVE_DYNAMIC_FTRACE_WITH_ARGSif MPROFILE_KERNEL || PPC32
select HAVE_DYNAMIC_FTRACE_WITH_REGSif MPROFILE_KERNEL || PPC32
select HAVE_EBPF_JIT
select HAVE_EFFICIENT_UNALIGNED_ACCESS  if !(CPU_LITTLE_ENDIAN && 
POWER7_CPU)
diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index b3f6184f77ea..45c3d6f11daa 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -22,6 +22,23 @@ static inline unsigned long ftrace_call_adjust(unsigned long 
addr)
 struct dyn_arch_ftrace {
struct module *mod;
 };
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
+struct ftrace_regs {
+   struct pt_regs regs;
+};
+
+static __always_inline struct pt_regs *arch_ftrace_get_regs(struct ftrace_regs 
*fregs)
+{
+   return &fregs->regs;
+}


I think this is wrong. We need to differentiate between ftrace_caller() 
and ftrace_regs_caller() here, and only return pt_regs if coming in 
through ftrace_regs_caller() (i.e., FL_SAVE_REGS is set).
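
Something along the below lines is what I have in mind (only a sketch; how
exactly we detect that we entered via ftrace_regs_caller() is an open
question here, checking a field such as regs.msr that would only be
non-zero when the full pt_regs were saved is just one possibility, not
something this patch does today):

	static __always_inline struct pt_regs *arch_ftrace_get_regs(struct ftrace_regs *fregs)
	{
		/* Only ftrace_regs_caller() saves the complete pt_regs */
		if (!fregs->regs.msr)
			return NULL;
		return &fregs->regs;
	}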



+
+static __always_inline void ftrace_instruction_pointer_set(struct ftrace_regs 
*fregs,
+  unsigned long ip)
+{
+   regs_set_return_ip(&fregs->regs, ip);


Should we use that helper here? regs_set_return_ip() also updates some 
other state related to taking interrupts and I don't think it makes 
sense for use with ftrace.
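
That is, a plain update of the saved NIP may be all that is needed here
(sketch only, assuming we don't want the extra MSR handling that
regs_set_return_ip() performs):

	static __always_inline void ftrace_instruction_pointer_set(struct ftrace_regs *fregs,
								   unsigned long ip)
	{
		fregs->regs.nip = ip;
	}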



- Naveen



Re: [PATCH v2 08/13] powerpc/ftrace: Prepare PPC64's ftrace_caller() for CONFIG_DYNAMIC_FTRACE_WITH_ARGS

2022-02-14 Thread Naveen N. Rao

Hi Christophe,
Thanks for your work enabling DYNAMIC_FTRACE_WITH_ARGS on powerpc. Sorry 
for the late review on this series, but I have a few comments below.



Christophe Leroy wrote:

In order to implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS, change ftrace_caller()
to handle LIVEPATCH the same way as frace_caller_regs().

Signed-off-by: Christophe Leroy 
---
 .../powerpc/kernel/trace/ftrace_64_mprofile.S | 25 ++-
 1 file changed, 19 insertions(+), 6 deletions(-)


I think we also need to save r1 into pt_regs so that the stack pointer 
is available in the callbacks.


Other than that, a few minor nits below...



diff --git a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S 
b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
index d636fc755f60..f6f787819273 100644
--- a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
+++ b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
@@ -172,14 +172,19 @@ _GLOBAL(ftrace_caller)
addir3, r3, function_trace_op@toc@l
ld  r5, 0(r3)
 
+#ifdef CONFIG_LIVEPATCH_64

+   SAVE_GPR(14, r1)
+   mr  r14,r7  /* remember old NIP */

   ^ add a space

+#endif


Please add a blank line here, to match the formatting for the rest of 
this file.



/* Calculate ip from nip-4 into r3 for call below */
subir3, r7, MCOUNT_INSN_SIZE
 
 	/* Put the original return address in r4 as parent_ip */

+   std r0, _LINK(r1)
mr  r4, r0
 
-	/* Set pt_regs to NULL */

-   li  r6, 0
+   /* Load _regs in r6 for call below */
+   addir6, r1 ,STACK_FRAME_OVERHEAD

 ^^ incorrect spacing
 
 	/* ftrace_call(r3, r4, r5, r6) */

 .globl ftrace_call
@@ -189,6 +194,10 @@ ftrace_call:
 
 	ld	r3, _NIP(r1)

mtctr   r3


Another blank line here.


+#ifdef CONFIG_LIVEPATCH_64
+   cmpdr14, r3 /* has NIP been altered? */
+   REST_GPR(14, r1)
+#endif
 
 	/* Restore gprs */

REST_GPRS(3, 10, r1)
@@ -196,13 +205,17 @@ ftrace_call:
/* Restore callee's TOC */
ld  r2, 24(r1)
 
+	/* Restore possibly modified LR */

+   ld  r0, _LINK(r1)
+   mtlrr0
+
/* Pop our stack frame */
addir1, r1, SWITCH_FRAME_SIZE
 
-	/* Reload original LR */

-   ld  r0, LRSAVE(r1)
-   mtlrr0
-
+#ifdef CONFIG_LIVEPATCH_64
+/* Based on the cmpd above, if the NIP was altered handle livepatch */
+   bne-livepatch_handler
+#endif


Here too.


/* Handle function_graph or go back */
b   ftrace_caller_common
 



- Naveen



Re: [PATCH v5 2/6] powerpc/kexec_file: Add KEXEC_SIG support.

2022-02-14 Thread Mimi Zohar
Hi Michal,

On Sun, 2022-02-13 at 21:59 -0500, Mimi Zohar wrote:

> 
> On Tue, 2022-01-11 at 12:37 +0100, Michal Suchanek wrote:
> > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > index dea74d7717c0..1cde9b6c5987 100644
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -560,6 +560,22 @@ config KEXEC_FILE
> >  config ARCH_HAS_KEXEC_PURGATORY
> > def_bool KEXEC_FILE
> >  
> > +config KEXEC_SIG
> > +   bool "Verify kernel signature during kexec_file_load() syscall"
> > +   depends on KEXEC_FILE && MODULE_SIG_FORMAT
> > +   help
> > + This option makes kernel signature verification mandatory for
> > + the kexec_file_load() syscall.
> 
> When KEXEC_SIG is enabled on other architectures, IMA does not define a
> kexec 'appraise' policy rule.  Refer to the policy rules in
> security/ima/ima_efi.c.  Similarly the kexec 'appraise' policy rule in
> arch/powerpc/kernel/ima_policy.c should not be defined.

The discussion shouldn't only be about IMA vs. KEXEC_SIG kernel image
signature verification.  Let's try and reframe the problem a bit.

1. Unify and simplify the existing kexec signature verification so
verifying the KEXEC kernel image signature works irrespective of
signature type - PE, appended signature.

solution: enable KEXEC_SIG  (This patch set, with the above powerpc IMA
policy changes.)

2. Measure and include the kexec kernel image in a log for attestation,
if desired.

solution: enable IMA_ARCH_POLICY 
- Powerpc: requires trusted boot to be enabled.
- EFI:   requires  secure boot to be enabled.  The IMA efi policy
doesn't differentiate between secure and trusted boot.

3. Carry the kexec kernel image measurement across kexec, if desired
and supported on the architecture.

solution: enable IMA_KEXEC

Comparison: 
- Are there any differences between IMA vs. KEXEC_SIG measuring the
kexec kernel image?

One of the main differences is "what" is included in the measurement
list differs.  In both cases, the 'd-ng' field of the IMA measurement
list template (e.g. ima-ng, ima-sig, ima-modsig) is the full file hash
including the appended signature.  With IMA and the 'ima-modsig'
template, an additional hash without the appended signature is defined,
as well as including the appended signature in the 'sig' field.

Including the file hash and appended signature in the measurement list
allows an attestation server, for example, to verify the appended
signature without having to know the file hash without the signature.

Other differences are already included in the Kconfig KEXEC_SIG "Notes"
section.

-- 
thanks,

Mimi



Re: No Linux logs when doing `ppc64_cpu --smt=off/8`

2022-02-14 Thread Michal Suchánek
On Mon, Feb 14, 2022 at 01:33:24PM +0100, Paul Menzel wrote:
> Dear Michal,
> 
> 
> Thank you for your reply.
> 
> On 14.02.22 at 10:43, Michal Suchánek wrote:
> 
> > On Mon, Feb 14, 2022 at 07:08:07AM +0100, Paul Menzel wrote:
> > > Dear PPC folks,
> > > 
> > > 
> > > On the POWER8 server IBM S822LC running `ppc64_cpu --smt=off` or 
> > > `ppc64_cpu
> > > --smt=8`, Linux 5.17-rc4 does not log anything. I would have expected a
> > > message about the change in number of processing units.
> > 
> > IIRC it was considered too noisy for systems with many CPUs and the
> > message was dropped. You can always check the resulting state with
> > ppc64_cpu or examining sysfs.
> 
> Yes, a simple `nproc` suffices, but I was thinking more that the Linux log is
> often used for debugging, and changes in the number of processing units might
> be good to have. `ppc64_cpu --smt=off` or `=8` seems to block for quite some
> time, and each thread/processing unit seems to be powered down/on
> sequentially, so it takes quite some time and it blocks. So 140 messages would
> indeed be quite noisy. No idea how `ppc64_cpu` works, and whether it could log
> a message at the beginning and end.

Yes, it enables/disables threads one by one. AFAICT the kernel cannot know that
ppc64_cpu will enable/disable more threads later, it can either log each
or none. Rate limiting would not show the whole picture, so it's not a
great solution either.

Thanks

Michal


Re: No Linux logs when doing `ppc64_cpu --smt=off/8`

2022-02-14 Thread Paul Menzel

Dear Michal,


Thank you for your reply.

On 14.02.22 at 10:43, Michal Suchánek wrote:


On Mon, Feb 14, 2022 at 07:08:07AM +0100, Paul Menzel wrote:

Dear PPC folks,


On the POWER8 server IBM S822LC running `ppc64_cpu --smt=off` or `ppc64_cpu
--smt=8`, Linux 5.17-rc4 does not log anything. I would have expected a
message about the change in number of processing units.


IIRC it was considered too noisy for systems with many CPUs and the
message was dropped. You can always check the resulting state with
ppc64_cpu or examining sysfs.


Yes, a simple `nproc` suffices, but I was thinking more that the Linux log 
is often used for debugging, and changes in the number of processing units 
might be good to have. `ppc64_cpu --smt=off` or `=8` seems to block for 
quite some time, and each thread/processing unit seems to be powered 
down/on sequentially, so it takes quite some time and it blocks. So 140 
messages would indeed be quite noisy. No idea how `ppc64_cpu` works, and 
whether it could log a message at the beginning and end.



Kind regards,

Paul


Re: [RFC PATCH 0/3] powerpc64/bpf: Add support for BPF Trampolines

2022-02-14 Thread Naveen N. Rao

Christophe Leroy wrote:



On 07/02/2022 at 08:07, Naveen N. Rao wrote:

This is an early RFC series that adds support for BPF Trampolines on
powerpc64. Some of the selftests are passing for me, but this needs more
testing and I've likely missed a few things as well. A review of the
patches and feedback about the overall approach will be great.

This series depends on some of the other BPF JIT fixes and enhancements
posted previously, as well as on ftrace direct enablement on powerpc
which has also been posted in the past.


Is there any reason to limit this to powerpc64 ?


I have limited this to elf v2, and we won't be able to get this working 
on elf v1, since we don't have DYNAMIC_FTRACE_WITH_REGS supported there. 
We should be able to get this working on ppc32 though.



- Naveen



Re: [PATCH v5 1/6] mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c

2022-02-14 Thread Mike Rapoport
On Fri, Feb 11, 2022 at 11:41:30AM -0500, Zi Yan wrote:
> From: Zi Yan 
> 
> has_unmovable_pages() is only used in mm/page_isolation.c. Move it from
> mm/page_alloc.c and make it static.
> 
> Signed-off-by: Zi Yan 
> Reviewed-by: Oscar Salvador 

Reviewed-by: Mike Rapoport 

> ---
>  include/linux/page-isolation.h |   2 -
>  mm/page_alloc.c| 119 -
>  mm/page_isolation.c| 119 +
>  3 files changed, 119 insertions(+), 121 deletions(-)
> 
> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
> index 572458016331..e14eddf6741a 100644
> --- a/include/linux/page-isolation.h
> +++ b/include/linux/page-isolation.h
> @@ -33,8 +33,6 @@ static inline bool is_migrate_isolate(int migratetype)
>  #define MEMORY_OFFLINE   0x1
>  #define REPORT_FAILURE   0x2
>  
> -struct page *has_unmovable_pages(struct zone *zone, struct page *page,
> -  int migratetype, int flags);
>  void set_pageblock_migratetype(struct page *page, int migratetype);
>  int move_freepages_block(struct zone *zone, struct page *page,
>   int migratetype, int *num_movable);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index cface1d38093..e2c6a67fc386 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8962,125 +8962,6 @@ void *__init alloc_large_system_hash(const char 
> *tablename,
>   return table;
>  }
>  
> -/*
> - * This function checks whether pageblock includes unmovable pages or not.
> - *
> - * PageLRU check without isolation or lru_lock could race so that
> - * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
> - * check without lock_page also may miss some movable non-lru pages at
> - * race condition. So you can't expect this function should be exact.
> - *
> - * Returns a page without holding a reference. If the caller wants to
> - * dereference that page (e.g., dumping), it has to make sure that it
> - * cannot get removed (e.g., via memory unplug) concurrently.
> - *
> - */
> -struct page *has_unmovable_pages(struct zone *zone, struct page *page,
> -  int migratetype, int flags)
> -{
> - unsigned long iter = 0;
> - unsigned long pfn = page_to_pfn(page);
> - unsigned long offset = pfn % pageblock_nr_pages;
> -
> - if (is_migrate_cma_page(page)) {
> - /*
> -  * CMA allocations (alloc_contig_range) really need to mark
> -  * isolate CMA pageblocks even when they are not movable in fact
> -  * so consider them movable here.
> -  */
> - if (is_migrate_cma(migratetype))
> - return NULL;
> -
> - return page;
> - }
> -
> - for (; iter < pageblock_nr_pages - offset; iter++) {
> - page = pfn_to_page(pfn + iter);
> -
> - /*
> -  * Both, bootmem allocations and memory holes are marked
> -  * PG_reserved and are unmovable. We can even have unmovable
> -  * allocations inside ZONE_MOVABLE, for example when
> -  * specifying "movablecore".
> -  */
> - if (PageReserved(page))
> - return page;
> -
> - /*
> -  * If the zone is movable and we have ruled out all reserved
> -  * pages then it should be reasonably safe to assume the rest
> -  * is movable.
> -  */
> - if (zone_idx(zone) == ZONE_MOVABLE)
> - continue;
> -
> - /*
> -  * Hugepages are not in LRU lists, but they're movable.
> -  * THPs are on the LRU, but need to be counted as #small pages.
> -  * We need not scan over tail pages because we don't
> -  * handle each tail page individually in migration.
> -  */
> - if (PageHuge(page) || PageTransCompound(page)) {
> - struct page *head = compound_head(page);
> - unsigned int skip_pages;
> -
> - if (PageHuge(page)) {
> - if 
> (!hugepage_migration_supported(page_hstate(head)))
> - return page;
> - } else if (!PageLRU(head) && !__PageMovable(head)) {
> - return page;
> - }
> -
> - skip_pages = compound_nr(head) - (page - head);
> - iter += skip_pages - 1;
> - continue;
> - }
> -
> - /*
> -  * We can't use page_count without pin a page
> -  * because another CPU can free compound page.
> -  * This check already skips compound tails of THP
> -  * because their page->_refcount is zero at all time.
> -  */
> - if (!page_ref_count(page)) {
> -  

[PATCH powerpc/next 17/17] powerpc/bpf: Simplify bpf_to_ppc() and adopt it for powerpc64

2022-02-14 Thread Naveen N. Rao
Convert bpf_to_ppc() to a macro to help simplify its usage since
codegen_context is available in all places it is used. Adopt it also for
powerpc64 for uniformity and get rid of the global b2p structure.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/net/bpf_jit.h| 11 ++--
 arch/powerpc/net/bpf_jit_comp.c   |  8 +--
 arch/powerpc/net/bpf_jit_comp32.c | 90 ++
 arch/powerpc/net/bpf_jit_comp64.c | 93 ---
 4 files changed, 98 insertions(+), 104 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index d9bdc9df2e48ed..c16271a456b343 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -122,12 +122,6 @@
 #define SEEN_VREG_MASK 0x1ff8 /* Volatile registers r3-r12 */
 #define SEEN_NVREG_MASK0x0003 /* Non volatile registers r14-r31 */
 
-#ifdef CONFIG_PPC64
-extern const int b2p[MAX_BPF_JIT_REG + 2];
-#else
-extern const int b2p[MAX_BPF_JIT_REG + 1];
-#endif
-
 struct codegen_context {
/*
 * This is used to track register usage as well
@@ -141,11 +135,13 @@ struct codegen_context {
unsigned int seen;
unsigned int idx;
unsigned int stack_size;
-   int b2p[ARRAY_SIZE(b2p)];
+   int b2p[MAX_BPF_JIT_REG + 2];
unsigned int exentry_idx;
unsigned int alt_exit_addr;
 };
 
+#define bpf_to_ppc(r)  (ctx->b2p[r])
+
 #ifdef CONFIG_PPC32
 #define BPF_FIXUP_LEN  3 /* Three instructions => 12 bytes */
 #else
@@ -173,6 +169,7 @@ static inline void bpf_clear_seen_register(struct 
codegen_context *ctx, int i)
ctx->seen &= ~(1 << (31 - i));
 }
 
+void bpf_jit_init_reg_mapping(struct codegen_context *ctx);
 int bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 
func);
 int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context 
*ctx,
   u32 *addrs, int pass);
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 635f7448ff7952..fc160d33c83960 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -72,13 +72,13 @@ static int bpf_jit_fixup_addresses(struct bpf_prog *fp, u32 
*image,
tmp_idx = ctx->idx;
ctx->idx = addrs[i] / 4;
 #ifdef CONFIG_PPC32
-   PPC_LI32(ctx->b2p[insn[i].dst_reg] - 1, (u32)insn[i + 
1].imm);
-   PPC_LI32(ctx->b2p[insn[i].dst_reg], (u32)insn[i].imm);
+   PPC_LI32(bpf_to_ppc(insn[i].dst_reg) - 1, (u32)insn[i + 
1].imm);
+   PPC_LI32(bpf_to_ppc(insn[i].dst_reg), (u32)insn[i].imm);
for (j = ctx->idx - addrs[i] / 4; j < 4; j++)
EMIT(PPC_RAW_NOP());
 #else
func_addr = ((u64)(u32)insn[i].imm) | 
(((u64)(u32)insn[i + 1].imm) << 32);
-   PPC_LI64(b2p[insn[i].dst_reg], func_addr);
+   PPC_LI64(bpf_to_ppc(insn[i].dst_reg), func_addr);
/* overwrite rest with nops */
for (j = ctx->idx - addrs[i] / 4; j < 5; j++)
EMIT(PPC_RAW_NOP());
@@ -179,7 +179,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
}
 
memset(&cgctx, 0, sizeof(struct codegen_context));
-   memcpy(cgctx.b2p, b2p, sizeof(cgctx.b2p));
+   bpf_jit_init_reg_mapping(&cgctx);
 
/* Make sure that the stack is quadword aligned. */
cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
diff --git a/arch/powerpc/net/bpf_jit_comp32.c 
b/arch/powerpc/net/bpf_jit_comp32.c
index fe4e0eca017ede..7dc716cd64bcbc 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -33,42 +33,38 @@
 /* stack frame, ensure this is quadword aligned */
 #define BPF_PPC_STACKFRAME(ctx)(STACK_FRAME_MIN_SIZE + 
BPF_PPC_STACK_SAVE + (ctx)->stack_size)
 
-/* BPF register usage */
-#define TMP_REG(MAX_BPF_JIT_REG + 0)
-
 #define PPC_EX32(r, i) EMIT(PPC_RAW_LI((r), (i) < 0 ? -1 : 0))
 
+/* PPC NVR range -- update this if we ever use NVRs below r17 */
+#define BPF_PPC_NVR_MIN_R17
+#define BPF_PPC_TC _R16
+
+/* BPF register usage */
+#define TMP_REG(MAX_BPF_JIT_REG + 0)
+
 /* BPF to ppc register mappings */
-const int b2p[MAX_BPF_JIT_REG + 1] = {
+void bpf_jit_init_reg_mapping(struct codegen_context *ctx)
+{
/* function return value */
-   [BPF_REG_0] = _R12,
+   ctx->b2p[BPF_REG_0] = _R12;
/* function arguments */
-   [BPF_REG_1] = _R4,
-   [BPF_REG_2] = _R6,
-   [BPF_REG_3] = _R8,
-   [BPF_REG_4] = _R10,
-   [BPF_REG_5] = _R22,
+   ctx->b2p[BPF_REG_1] = _R4;
+   ctx->b2p[BPF_REG_2] = _R6;
+   ctx->b2p[BPF_REG_3] = _R8;
+   ctx->b2p[BPF_REG_4] = _R10;
+   ctx->b2p[BPF_REG_5] = _R22;
/* non volatile registers */
-   [BPF_REG_6] = _R24,

[PATCH powerpc/next 16/17] powerpc64/bpf: Store temp registers' bpf to ppc mapping

2022-02-14 Thread Naveen N. Rao
From: Jordan Niethe 

In bpf_jit_build_body(), the mapping of TMP_REG_1 and TMP_REG_2's bpf
register to ppc register is evaluated at every use despite not
changing. Instead, determine the ppc register once and store the result.

Signed-off-by: Jordan Niethe 
[Rebased, converted additional usage sites]
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/net/bpf_jit_comp64.c | 197 +-
 1 file changed, 86 insertions(+), 111 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index ac06efa7022379..b4de0c35c8a4ab 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -357,6 +357,8 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
struct codegen_context *
u32 dst_reg = b2p[insn[i].dst_reg];
u32 src_reg = b2p[insn[i].src_reg];
u32 size = BPF_SIZE(code);
+   u32 tmp1_reg = b2p[TMP_REG_1];
+   u32 tmp2_reg = b2p[TMP_REG_2];
s16 off = insn[i].off;
s32 imm = insn[i].imm;
bool func_addr_fixed;
@@ -407,8 +409,8 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
struct codegen_context *
} else if (imm >= -32768 && imm < 32768) {
EMIT(PPC_RAW_ADDI(dst_reg, dst_reg, 
IMM_L(imm)));
} else {
-   PPC_LI32(b2p[TMP_REG_1], imm);
-   EMIT(PPC_RAW_ADD(dst_reg, dst_reg, 
b2p[TMP_REG_1]));
+   PPC_LI32(tmp1_reg, imm);
+   EMIT(PPC_RAW_ADD(dst_reg, dst_reg, tmp1_reg));
}
goto bpf_alu32_trunc;
case BPF_ALU | BPF_SUB | BPF_K: /* (u32) dst -= (u32) imm */
@@ -418,8 +420,8 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
struct codegen_context *
} else if (imm > -32768 && imm <= 32768) {
EMIT(PPC_RAW_ADDI(dst_reg, dst_reg, 
IMM_L(-imm)));
} else {
-   PPC_LI32(b2p[TMP_REG_1], imm);
-   EMIT(PPC_RAW_SUB(dst_reg, dst_reg, 
b2p[TMP_REG_1]));
+   PPC_LI32(tmp1_reg, imm);
+   EMIT(PPC_RAW_SUB(dst_reg, dst_reg, tmp1_reg));
}
goto bpf_alu32_trunc;
case BPF_ALU | BPF_MUL | BPF_X: /* (u32) dst *= (u32) src */
@@ -434,32 +436,28 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
struct codegen_context *
if (imm >= -32768 && imm < 32768)
EMIT(PPC_RAW_MULI(dst_reg, dst_reg, 
IMM_L(imm)));
else {
-   PPC_LI32(b2p[TMP_REG_1], imm);
+   PPC_LI32(tmp1_reg, imm);
if (BPF_CLASS(code) == BPF_ALU)
-   EMIT(PPC_RAW_MULW(dst_reg, dst_reg,
-   b2p[TMP_REG_1]));
+   EMIT(PPC_RAW_MULW(dst_reg, dst_reg, 
tmp1_reg));
else
-   EMIT(PPC_RAW_MULD(dst_reg, dst_reg,
-   b2p[TMP_REG_1]));
+   EMIT(PPC_RAW_MULD(dst_reg, dst_reg, 
tmp1_reg));
}
goto bpf_alu32_trunc;
case BPF_ALU | BPF_DIV | BPF_X: /* (u32) dst /= (u32) src */
case BPF_ALU | BPF_MOD | BPF_X: /* (u32) dst %= (u32) src */
if (BPF_OP(code) == BPF_MOD) {
-   EMIT(PPC_RAW_DIVWU(b2p[TMP_REG_1], dst_reg, 
src_reg));
-   EMIT(PPC_RAW_MULW(b2p[TMP_REG_1], src_reg,
-   b2p[TMP_REG_1]));
-   EMIT(PPC_RAW_SUB(dst_reg, dst_reg, 
b2p[TMP_REG_1]));
+   EMIT(PPC_RAW_DIVWU(tmp1_reg, dst_reg, src_reg));
+   EMIT(PPC_RAW_MULW(tmp1_reg, src_reg, tmp1_reg));
+   EMIT(PPC_RAW_SUB(dst_reg, dst_reg, tmp1_reg));
} else
EMIT(PPC_RAW_DIVWU(dst_reg, dst_reg, src_reg));
goto bpf_alu32_trunc;
case BPF_ALU64 | BPF_DIV | BPF_X: /* dst /= src */
case BPF_ALU64 | BPF_MOD | BPF_X: /* dst %= src */
if (BPF_OP(code) == BPF_MOD) {
-   EMIT(PPC_RAW_DIVDU(b2p[TMP_REG_1], dst_reg, 
src_reg));
-   EMIT(PPC_RAW_MULD(b2p[TMP_REG_1], src_reg,
-   b2p[TMP_REG_1]));
-   

[PATCH powerpc/next 14/17] powerpc/bpf: Move bpf_jit64.h into bpf_jit_comp64.c

2022-02-14 Thread Naveen N. Rao
There is no need for a separate header anymore. Move the contents of
bpf_jit64.h into bpf_jit_comp64.c

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/net/bpf_jit64.h  | 69 ---
 arch/powerpc/net/bpf_jit_comp64.c | 54 +++-
 2 files changed, 53 insertions(+), 70 deletions(-)
 delete mode 100644 arch/powerpc/net/bpf_jit64.h

diff --git a/arch/powerpc/net/bpf_jit64.h b/arch/powerpc/net/bpf_jit64.h
deleted file mode 100644
index 199348b7296653..00
--- a/arch/powerpc/net/bpf_jit64.h
+++ /dev/null
@@ -1,69 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * bpf_jit64.h: BPF JIT compiler for PPC64
- *
- * Copyright 2016 Naveen N. Rao 
- *   IBM Corporation
- */
-#ifndef _BPF_JIT64_H
-#define _BPF_JIT64_H
-
-#include "bpf_jit.h"
-
-/*
- * Stack layout:
- * Ensure the top half (upto local_tmp_var) stays consistent
- * with our redzone usage.
- *
- * [   prev sp ] <-
- * [   nv gpr save area] 5*8   |
- * [tail_call_cnt  ] 8 |
- * [local_tmp_var  ] 16|
- * fp (r31) -->[   ebpf stack space] upto 512  |
- * [ frame header  ] 32/112|
- * sp (r1) --->[stack pointer  ] --
- */
-
-/* for gpr non volatile registers BPG_REG_6 to 10 */
-#define BPF_PPC_STACK_SAVE (5*8)
-/* for bpf JIT code internal usage */
-#define BPF_PPC_STACK_LOCALS   24
-/* stack frame excluding BPF stack, ensure this is quadword aligned */
-#define BPF_PPC_STACKFRAME (STACK_FRAME_MIN_SIZE + \
-BPF_PPC_STACK_LOCALS + BPF_PPC_STACK_SAVE)
-
-#ifndef __ASSEMBLY__
-
-/* BPF register usage */
-#define TMP_REG_1  (MAX_BPF_JIT_REG + 0)
-#define TMP_REG_2  (MAX_BPF_JIT_REG + 1)
-
-/* BPF to ppc register mappings */
-const int b2p[MAX_BPF_JIT_REG + 2] = {
-   /* function return value */
-   [BPF_REG_0] = 8,
-   /* function arguments */
-   [BPF_REG_1] = 3,
-   [BPF_REG_2] = 4,
-   [BPF_REG_3] = 5,
-   [BPF_REG_4] = 6,
-   [BPF_REG_5] = 7,
-   /* non volatile registers */
-   [BPF_REG_6] = 27,
-   [BPF_REG_7] = 28,
-   [BPF_REG_8] = 29,
-   [BPF_REG_9] = 30,
-   /* frame pointer aka BPF_REG_10 */
-   [BPF_REG_FP] = 31,
-   /* eBPF jit internal registers */
-   [BPF_REG_AX] = 12,
-   [TMP_REG_1] = 9,
-   [TMP_REG_2] = 10
-};
-
-/* PPC NVR range -- update this if we ever use NVRs below r27 */
-#define BPF_PPC_NVR_MIN27
-
-#endif /* !__ASSEMBLY__ */
-
-#endif
diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index eeda636cd7be64..3e4ed556094770 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -17,7 +17,59 @@
 #include 
 #include 
 
-#include "bpf_jit64.h"
+#include "bpf_jit.h"
+
+/*
+ * Stack layout:
+ * Ensure the top half (upto local_tmp_var) stays consistent
+ * with our redzone usage.
+ *
+ * [   prev sp ] <-
+ * [   nv gpr save area] 5*8   |
+ * [tail_call_cnt  ] 8 |
+ * [local_tmp_var  ] 16|
+ * fp (r31) -->[   ebpf stack space] upto 512  |
+ * [ frame header  ] 32/112|
+ * sp (r1) --->[stack pointer  ] --
+ */
+
+/* for gpr non volatile registers BPG_REG_6 to 10 */
+#define BPF_PPC_STACK_SAVE (5*8)
+/* for bpf JIT code internal usage */
+#define BPF_PPC_STACK_LOCALS   24
+/* stack frame excluding BPF stack, ensure this is quadword aligned */
+#define BPF_PPC_STACKFRAME (STACK_FRAME_MIN_SIZE + \
+BPF_PPC_STACK_LOCALS + BPF_PPC_STACK_SAVE)
+
+/* BPF register usage */
+#define TMP_REG_1  (MAX_BPF_JIT_REG + 0)
+#define TMP_REG_2  (MAX_BPF_JIT_REG + 1)
+
+/* BPF to ppc register mappings */
+const int b2p[MAX_BPF_JIT_REG + 2] = {
+   /* function return value */
+   [BPF_REG_0] = 8,
+   /* function arguments */
+   [BPF_REG_1] = 3,
+   [BPF_REG_2] = 4,
+   [BPF_REG_3] = 5,
+   [BPF_REG_4] = 6,
+   [BPF_REG_5] = 7,
+   /* non volatile registers */
+   [BPF_REG_6] = 27,
+   [BPF_REG_7] = 28,
+   [BPF_REG_8] = 29,
+   [BPF_REG_9] = 30,
+   /* frame pointer aka BPF_REG_10 */
+   [BPF_REG_FP] = 31,
+   /* eBPF jit internal registers */
+   [BPF_REG_AX] = 12,
+   [TMP_REG_1] = 9,
+   [TMP_REG_2] = 10
+};
+
+/* PPC NVR range -- update this if we ever use NVRs below r27 */
+#define BPF_PPC_NVR_MIN27
 
 static inline bool bpf_has_stack_frame(struct codegen_context *ctx)
 {
-- 
2.35.1



[PATCH powerpc/next 15/17] powerpc/bpf: Use _Rn macros for GPRs

2022-02-14 Thread Naveen N. Rao
Use _Rn macros to specify register names to make their usage clear.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/net/bpf_jit_comp32.c | 30 +++---
 arch/powerpc/net/bpf_jit_comp64.c | 68 +++
 2 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp32.c 
b/arch/powerpc/net/bpf_jit_comp32.c
index 063e3a1be9270d..fe4e0eca017ede 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -41,23 +41,23 @@
 /* BPF to ppc register mappings */
 const int b2p[MAX_BPF_JIT_REG + 1] = {
/* function return value */
-   [BPF_REG_0] = 12,
+   [BPF_REG_0] = _R12,
/* function arguments */
-   [BPF_REG_1] = 4,
-   [BPF_REG_2] = 6,
-   [BPF_REG_3] = 8,
-   [BPF_REG_4] = 10,
-   [BPF_REG_5] = 22,
+   [BPF_REG_1] = _R4,
+   [BPF_REG_2] = _R6,
+   [BPF_REG_3] = _R8,
+   [BPF_REG_4] = _R10,
+   [BPF_REG_5] = _R22,
/* non volatile registers */
-   [BPF_REG_6] = 24,
-   [BPF_REG_7] = 26,
-   [BPF_REG_8] = 28,
-   [BPF_REG_9] = 30,
+   [BPF_REG_6] = _R24,
+   [BPF_REG_7] = _R26,
+   [BPF_REG_8] = _R28,
+   [BPF_REG_9] = _R30,
/* frame pointer aka BPF_REG_10 */
-   [BPF_REG_FP] = 18,
+   [BPF_REG_FP] = _R18,
/* eBPF jit internal registers */
-   [BPF_REG_AX] = 20,
-   [TMP_REG] = 31, /* 32 bits */
+   [BPF_REG_AX] = _R20,
+   [TMP_REG] = _R31,   /* 32 bits */
 };
 
 static int bpf_to_ppc(struct codegen_context *ctx, int reg)
@@ -66,8 +66,8 @@ static int bpf_to_ppc(struct codegen_context *ctx, int reg)
 }
 
 /* PPC NVR range -- update this if we ever use NVRs below r17 */
-#define BPF_PPC_NVR_MIN17
-#define BPF_PPC_TC 16
+#define BPF_PPC_NVR_MIN_R17
+#define BPF_PPC_TC _R16
 
 static int bpf_jit_stack_offsetof(struct codegen_context *ctx, int reg)
 {
diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index 3e4ed556094770..ac06efa7022379 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -48,28 +48,28 @@
 /* BPF to ppc register mappings */
 const int b2p[MAX_BPF_JIT_REG + 2] = {
/* function return value */
-   [BPF_REG_0] = 8,
+   [BPF_REG_0] = _R8,
/* function arguments */
-   [BPF_REG_1] = 3,
-   [BPF_REG_2] = 4,
-   [BPF_REG_3] = 5,
-   [BPF_REG_4] = 6,
-   [BPF_REG_5] = 7,
+   [BPF_REG_1] = _R3,
+   [BPF_REG_2] = _R4,
+   [BPF_REG_3] = _R5,
+   [BPF_REG_4] = _R6,
+   [BPF_REG_5] = _R7,
/* non volatile registers */
-   [BPF_REG_6] = 27,
-   [BPF_REG_7] = 28,
-   [BPF_REG_8] = 29,
-   [BPF_REG_9] = 30,
+   [BPF_REG_6] = _R27,
+   [BPF_REG_7] = _R28,
+   [BPF_REG_8] = _R29,
+   [BPF_REG_9] = _R30,
/* frame pointer aka BPF_REG_10 */
-   [BPF_REG_FP] = 31,
+   [BPF_REG_FP] = _R31,
/* eBPF jit internal registers */
-   [BPF_REG_AX] = 12,
-   [TMP_REG_1] = 9,
-   [TMP_REG_2] = 10
+   [BPF_REG_AX] = _R12,
+   [TMP_REG_1] = _R9,
+   [TMP_REG_2] = _R10
 };
 
 /* PPC NVR range -- update this if we ever use NVRs below r27 */
-#define BPF_PPC_NVR_MIN27
+#define BPF_PPC_NVR_MIN_R27
 
 static inline bool bpf_has_stack_frame(struct codegen_context *ctx)
 {
@@ -136,7 +136,7 @@ void bpf_jit_build_prologue(u32 *image, struct 
codegen_context *ctx)
if (ctx->seen & SEEN_TAILCALL) {
EMIT(PPC_RAW_LI(b2p[TMP_REG_1], 0));
/* this goes in the redzone */
-   EMIT(PPC_RAW_STD(b2p[TMP_REG_1], 1, -(BPF_PPC_STACK_SAVE + 8)));
+   EMIT(PPC_RAW_STD(b2p[TMP_REG_1], _R1, -(BPF_PPC_STACK_SAVE + 
8)));
} else {
EMIT(PPC_RAW_NOP());
EMIT(PPC_RAW_NOP());
@@ -149,10 +149,10 @@ void bpf_jit_build_prologue(u32 *image, struct 
codegen_context *ctx)
 */
if (ctx->seen & SEEN_FUNC) {
EMIT(PPC_RAW_MFLR(_R0));
-   EMIT(PPC_RAW_STD(0, 1, PPC_LR_STKOFF));
+   EMIT(PPC_RAW_STD(_R0, _R1, PPC_LR_STKOFF));
}
 
-   EMIT(PPC_RAW_STDU(1, 1, -(BPF_PPC_STACKFRAME + 
ctx->stack_size)));
+   EMIT(PPC_RAW_STDU(_R1, _R1, -(BPF_PPC_STACKFRAME + 
ctx->stack_size)));
}
 
/*
@@ -162,11 +162,11 @@ void bpf_jit_build_prologue(u32 *image, struct 
codegen_context *ctx)
 */
for (i = BPF_REG_6; i <= BPF_REG_10; i++)
if (bpf_is_seen_register(ctx, b2p[i]))
-   EMIT(PPC_RAW_STD(b2p[i], 1, bpf_jit_stack_offsetof(ctx, 
b2p[i])));
+   EMIT(PPC_RAW_STD(b2p[i], _R1, 
bpf_jit_stack_offsetof(ctx, b2p[i])));
 
/* Setup frame pointer to point to the bpf stack area */
if (bpf_is_seen_register(ctx, 

[PATCH powerpc/next 13/17] powerpc/bpf: Cleanup bpf_jit.h

2022-02-14 Thread Naveen N. Rao
- PPC_EX32() is only used by ppc32 JIT. Move it to bpf_jit_comp32.c
- PPC_LI64() is only valid in ppc64. #ifdef it
- PPC_FUNC_ADDR() is not used anymore. Remove it.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/net/bpf_jit.h| 10 +-
 arch/powerpc/net/bpf_jit_comp32.c |  2 ++
 2 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 0832235a274983..d9bdc9df2e48ed 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -59,10 +59,7 @@
EMIT(PPC_RAW_ORI(d, d, IMM_L(i)));\
} } while(0)
 
-#ifdef CONFIG_PPC32
-#define PPC_EX32(r, i) EMIT(PPC_RAW_LI((r), (i) < 0 ? -1 : 0))
-#endif
-
+#ifdef CONFIG_PPC64
 #define PPC_LI64(d, i) do {  \
if ((long)(i) >= -2147483648 &&   \
(long)(i) < 2147483648)   \
@@ -85,11 +82,6 @@
EMIT(PPC_RAW_ORI(d, d, (uintptr_t)(i) &   \
0x)); \
} } while (0)
-
-#ifdef CONFIG_PPC64
-#define PPC_FUNC_ADDR(d,i) do { PPC_LI64(d, i); } while(0)
-#else
-#define PPC_FUNC_ADDR(d,i) do { PPC_LI32(d, i); } while(0)
 #endif
 
 /*
diff --git a/arch/powerpc/net/bpf_jit_comp32.c 
b/arch/powerpc/net/bpf_jit_comp32.c
index cf66b25ed7c865..063e3a1be9270d 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -36,6 +36,8 @@
 /* BPF register usage */
 #define TMP_REG(MAX_BPF_JIT_REG + 0)
 
+#define PPC_EX32(r, i) EMIT(PPC_RAW_LI((r), (i) < 0 ? -1 : 0))
+
 /* BPF to ppc register mappings */
 const int b2p[MAX_BPF_JIT_REG + 1] = {
/* function return value */
-- 
2.35.1



[PATCH powerpc/next 12/17] powerpc64/bpf: Get rid of PPC_BPF_[LL|STL|STLU] macros

2022-02-14 Thread Naveen N. Rao
All these macros now have a single user. Expand their usage in place.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/net/bpf_jit64.h  | 22 --
 arch/powerpc/net/bpf_jit_comp64.c | 21 +++--
 2 files changed, 15 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit64.h b/arch/powerpc/net/bpf_jit64.h
index 82cdfee412784a..199348b7296653 100644
--- a/arch/powerpc/net/bpf_jit64.h
+++ b/arch/powerpc/net/bpf_jit64.h
@@ -64,28 +64,6 @@ const int b2p[MAX_BPF_JIT_REG + 2] = {
 /* PPC NVR range -- update this if we ever use NVRs below r27 */
 #define BPF_PPC_NVR_MIN27
 
-/*
- * WARNING: These can use TMP_REG_2 if the offset is not at word boundary,
- * so ensure that it isn't in use already.
- */
-#define PPC_BPF_LL(r, base, i) do {  \
-   if ((i) % 4) {\
-   EMIT(PPC_RAW_LI(b2p[TMP_REG_2], (i)));\
-   EMIT(PPC_RAW_LDX(r, base, \
-   b2p[TMP_REG_2])); \
-   } else\
-   EMIT(PPC_RAW_LD(r, base, i)); \
-   } while(0)
-#define PPC_BPF_STL(r, base, i) do { \
-   if ((i) % 4) {\
-   EMIT(PPC_RAW_LI(b2p[TMP_REG_2], (i)));\
-   EMIT(PPC_RAW_STDX(r, base,\
-   b2p[TMP_REG_2])); \
-   } else\
-   EMIT(PPC_RAW_STD(r, base, i));\
-   } while(0)
-#define PPC_BPF_STLU(r, base, i) do { EMIT(PPC_RAW_STDU(r, base, i)); } 
while(0)
-
 #endif /* !__ASSEMBLY__ */
 
 #endif
diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index 411ac41dba4293..eeda636cd7be64 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -100,7 +100,7 @@ void bpf_jit_build_prologue(u32 *image, struct 
codegen_context *ctx)
EMIT(PPC_RAW_STD(0, 1, PPC_LR_STKOFF));
}
 
-   PPC_BPF_STLU(1, 1, -(BPF_PPC_STACKFRAME + ctx->stack_size));
+   EMIT(PPC_RAW_STDU(1, 1, -(BPF_PPC_STACKFRAME + 
ctx->stack_size)));
}
 
/*
@@ -726,7 +726,12 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
struct codegen_context *
PPC_LI32(b2p[TMP_REG_1], imm);
src_reg = b2p[TMP_REG_1];
}
-   PPC_BPF_STL(src_reg, dst_reg, off);
+   if (off % 4) {
+   EMIT(PPC_RAW_LI(b2p[TMP_REG_2], off));
+   EMIT(PPC_RAW_STDX(src_reg, dst_reg, 
b2p[TMP_REG_2]));
+   } else {
+   EMIT(PPC_RAW_STD(src_reg, dst_reg, off));
+   }
break;
 
/*
@@ -802,9 +807,8 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
struct codegen_context *
PPC_BCC_SHORT(COND_GT, (ctx->idx + 3) * 4);
EMIT(PPC_RAW_LI(dst_reg, 0));
/*
-* Check if 'off' is word aligned because 
PPC_BPF_LL()
-* (BPF_DW case) generates two instructions if 
'off' is not
-* word-aligned and one instruction otherwise.
+* Check if 'off' is word aligned for BPF_DW, 
because
+* we might generate two instructions.
 */
if (BPF_SIZE(code) == BPF_DW && (off & 3))
PPC_JMP((ctx->idx + 3) * 4);
@@ -823,7 +827,12 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
struct codegen_context *
EMIT(PPC_RAW_LWZ(dst_reg, src_reg, off));
break;
case BPF_DW:
-   PPC_BPF_LL(dst_reg, src_reg, off);
+   if (off % 4) {
+   EMIT(PPC_RAW_LI(b2p[TMP_REG_1], off));
+   EMIT(PPC_RAW_LDX(dst_reg, src_reg, 
b2p[TMP_REG_1]));
+   } else {
+   EMIT(PPC_RAW_LD(dst_reg, src_reg, off));
+   }
break;
}
 
-- 

[PATCH powerpc/next 03/17] powerpc/bpf: Handle large branch ranges with BPF_EXIT

2022-02-14 Thread Naveen N. Rao
In some scenarios, it is possible that the program epilogue is outside
the branch range for a BPF_EXIT instruction. Instead of rejecting such
programs, emit epilogue as an alternate exit point from the program.
Track the location of the same so that subsequent exits can take either
of the two paths.

Reported-by: Jordan Niethe 
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/net/bpf_jit.h|  2 ++
 arch/powerpc/net/bpf_jit_comp.c   | 22 +-
 arch/powerpc/net/bpf_jit_comp32.c |  7 +--
 arch/powerpc/net/bpf_jit_comp64.c |  7 +--
 4 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 9cdd33d6be4cc0..3b5c44c0b6638d 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -151,6 +151,7 @@ struct codegen_context {
unsigned int stack_size;
int b2p[ARRAY_SIZE(b2p)];
unsigned int exentry_idx;
+   unsigned int alt_exit_addr;
 };
 
 #ifdef CONFIG_PPC32
@@ -186,6 +187,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
struct codegen_context *
 void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_realloc_regs(struct codegen_context *ctx);
+int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int 
tmp_reg, long exit_addr);
 
 int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, int pass, struct 
codegen_context *ctx,
  int insn_idx, int jmp_off, int dst_reg);
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 56dd1f4e3e4447..141e64585b6458 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -89,6 +89,22 @@ static int bpf_jit_fixup_addresses(struct bpf_prog *fp, u32 
*image,
return 0;
 }
 
+int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int 
tmp_reg, long exit_addr)
+{
+   if (!exit_addr || is_offset_in_branch_range(exit_addr - (ctx->idx * 
4))) {
+   PPC_JMP(exit_addr);
+   } else if (ctx->alt_exit_addr) {
+   if (WARN_ON(!is_offset_in_branch_range((long)ctx->alt_exit_addr 
- (ctx->idx * 4
+   return -1;
+   PPC_JMP(ctx->alt_exit_addr);
+   } else {
+   ctx->alt_exit_addr = ctx->idx * 4;
+   bpf_jit_build_epilogue(image, ctx);
+   }
+
+   return 0;
+}
+
 struct powerpc64_jit_data {
struct bpf_binary_header *header;
u32 *addrs;
@@ -177,8 +193,10 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 * If we have seen a tail call, we need a second pass.
 * This is because bpf_jit_emit_common_epilogue() is called
 * from bpf_jit_emit_tail_call() with a not yet stable ctx->seen.
+* We also need a second pass if we ended up with too large
+* a program so as to ensure BPF_EXIT branches are in range.
 */
-   if (cgctx.seen & SEEN_TAILCALL) {
+   if (cgctx.seen & SEEN_TAILCALL || 
!is_offset_in_branch_range((long)cgctx.idx * 4)) {
cgctx.idx = 0;
if (bpf_jit_build_body(fp, 0, , addrs, 0)) {
fp = org_fp;
@@ -193,6 +211,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 * calculate total size from idx.
 */
bpf_jit_build_prologue(0, );
+   addrs[fp->len] = cgctx.idx * 4;
bpf_jit_build_epilogue(0, );
 
fixup_len = fp->aux->num_exentries * BPF_FIXUP_LEN * 4;
@@ -233,6 +252,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
for (pass = 1; pass < 3; pass++) {
/* Now build the prologue, body code & epilogue for real. */
cgctx.idx = 0;
+   cgctx.alt_exit_addr = 0;
bpf_jit_build_prologue(code_base, );
if (bpf_jit_build_body(fp, code_base, , addrs, pass)) {
bpf_jit_binary_free(bpf_hdr);
diff --git a/arch/powerpc/net/bpf_jit_comp32.c 
b/arch/powerpc/net/bpf_jit_comp32.c
index 81e0c56661ddf2..f401bfc5a67684 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -929,8 +929,11 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
struct codegen_context *
 * the epilogue. If we _are_ the last instruction,
 * we'll just fall through to the epilogue.
 */
-   if (i != flen - 1)
-   PPC_JMP(exit_addr);
+   if (i != flen - 1) {
+   ret = bpf_jit_emit_exit_insn(image, ctx, _R0, 
exit_addr);
+   if (ret)
+   return ret;
+   }
/* else fall through to the epilogue */
break;
 
diff --git 

[PATCH powerpc/next 11/17] powerpc64/bpf: Convert some of the uses of PPC_BPF_[LL|STL] to PPC_RAW_[LD|STD]

2022-02-14 Thread Naveen N. Rao
PPC_BPF_[LL|STL] are macros meant for scenarios where we may have to
deal with a non-word-aligned offset. Limit their usage to only those
scenarios by converting the rest to just use PPC_RAW_[LD|STD].
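
For background, ld/std are DS-form instructions: the low two bits of
the displacement field encode the extended opcode, so the immediate
offset must be a multiple of 4. PPC_BPF_[LL|STL] fall back to the
indexed (X-form) variants when it is not. Roughly what PPC_BPF_LL()
expands to (abridged, with tmp standing in for a JIT scratch register):

	if (off % 4) {
		/* offset not word aligned: materialize it and use ldx */
		EMIT(PPC_RAW_LI(tmp, off));
		EMIT(PPC_RAW_LDX(dst_reg, src_reg, tmp));
	} else {
		EMIT(PPC_RAW_LD(dst_reg, src_reg, off));
	}

The call sites converted below all pass offsets that are known to be
word aligned, so the plain PPC_RAW_[LD|STD] form is sufficient there.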

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/net/bpf_jit_comp64.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index bff200723e7282..411ac41dba4293 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -74,7 +74,7 @@ void bpf_jit_build_prologue(u32 *image, struct 
codegen_context *ctx)
int i;
 
if (__is_defined(PPC64_ELF_ABI_v2))
-   PPC_BPF_LL(_R2, _R13, offsetof(struct paca_struct, kernel_toc));
+   EMIT(PPC_RAW_LD(_R2, _R13, offsetof(struct paca_struct, 
kernel_toc)));
 
/*
 * Initialize tail_call_cnt if we do tail calls.
@@ -84,7 +84,7 @@ void bpf_jit_build_prologue(u32 *image, struct 
codegen_context *ctx)
if (ctx->seen & SEEN_TAILCALL) {
EMIT(PPC_RAW_LI(b2p[TMP_REG_1], 0));
/* this goes in the redzone */
-   PPC_BPF_STL(b2p[TMP_REG_1], 1, -(BPF_PPC_STACK_SAVE + 8));
+   EMIT(PPC_RAW_STD(b2p[TMP_REG_1], 1, -(BPF_PPC_STACK_SAVE + 8)));
} else {
EMIT(PPC_RAW_NOP());
EMIT(PPC_RAW_NOP());
@@ -97,7 +97,7 @@ void bpf_jit_build_prologue(u32 *image, struct 
codegen_context *ctx)
 */
if (ctx->seen & SEEN_FUNC) {
EMIT(PPC_RAW_MFLR(_R0));
-   PPC_BPF_STL(0, 1, PPC_LR_STKOFF);
+   EMIT(PPC_RAW_STD(0, 1, PPC_LR_STKOFF));
}
 
PPC_BPF_STLU(1, 1, -(BPF_PPC_STACKFRAME + ctx->stack_size));
@@ -110,7 +110,7 @@ void bpf_jit_build_prologue(u32 *image, struct 
codegen_context *ctx)
 */
for (i = BPF_REG_6; i <= BPF_REG_10; i++)
if (bpf_is_seen_register(ctx, b2p[i]))
-   PPC_BPF_STL(b2p[i], 1, bpf_jit_stack_offsetof(ctx, 
b2p[i]));
+   EMIT(PPC_RAW_STD(b2p[i], 1, bpf_jit_stack_offsetof(ctx, 
b2p[i])));
 
/* Setup frame pointer to point to the bpf stack area */
if (bpf_is_seen_register(ctx, b2p[BPF_REG_FP]))
@@ -125,13 +125,13 @@ static void bpf_jit_emit_common_epilogue(u32 *image, 
struct codegen_context *ctx
/* Restore NVRs */
for (i = BPF_REG_6; i <= BPF_REG_10; i++)
if (bpf_is_seen_register(ctx, b2p[i]))
-   PPC_BPF_LL(b2p[i], 1, bpf_jit_stack_offsetof(ctx, 
b2p[i]));
+   EMIT(PPC_RAW_LD(b2p[i], 1, bpf_jit_stack_offsetof(ctx, 
b2p[i])));
 
/* Tear down our stack frame */
if (bpf_has_stack_frame(ctx)) {
EMIT(PPC_RAW_ADDI(1, 1, BPF_PPC_STACKFRAME + ctx->stack_size));
if (ctx->seen & SEEN_FUNC) {
-   PPC_BPF_LL(0, 1, PPC_LR_STKOFF);
+   EMIT(PPC_RAW_LD(0, 1, PPC_LR_STKOFF));
EMIT(PPC_RAW_MTLR(0));
}
}
@@ -229,7 +229,7 @@ static int bpf_jit_emit_tail_call(u32 *image, struct 
codegen_context *ctx, u32 o
 * if (tail_call_cnt >= MAX_TAIL_CALL_CNT)
 *   goto out;
 */
-   PPC_BPF_LL(b2p[TMP_REG_1], 1, bpf_jit_stack_tailcallcnt(ctx));
+   EMIT(PPC_RAW_LD(b2p[TMP_REG_1], 1, bpf_jit_stack_tailcallcnt(ctx)));
EMIT(PPC_RAW_CMPLWI(b2p[TMP_REG_1], MAX_TAIL_CALL_CNT));
PPC_BCC_SHORT(COND_GE, out);
 
@@ -237,12 +237,12 @@ static int bpf_jit_emit_tail_call(u32 *image, struct 
codegen_context *ctx, u32 o
 * tail_call_cnt++;
 */
EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], b2p[TMP_REG_1], 1));
-   PPC_BPF_STL(b2p[TMP_REG_1], 1, bpf_jit_stack_tailcallcnt(ctx));
+   EMIT(PPC_RAW_STD(b2p[TMP_REG_1], 1, bpf_jit_stack_tailcallcnt(ctx)));
 
/* prog = array->ptrs[index]; */
EMIT(PPC_RAW_MULI(b2p[TMP_REG_1], b2p_index, 8));
EMIT(PPC_RAW_ADD(b2p[TMP_REG_1], b2p[TMP_REG_1], b2p_bpf_array));
-   PPC_BPF_LL(b2p[TMP_REG_1], b2p[TMP_REG_1], offsetof(struct bpf_array, 
ptrs));
+   EMIT(PPC_RAW_LD(b2p[TMP_REG_1], b2p[TMP_REG_1], offsetof(struct 
bpf_array, ptrs)));
 
/*
 * if (prog == NULL)
@@ -252,7 +252,7 @@ static int bpf_jit_emit_tail_call(u32 *image, struct 
codegen_context *ctx, u32 o
PPC_BCC_SHORT(COND_EQ, out);
 
/* goto *(prog->bpf_func + prologue_size); */
-   PPC_BPF_LL(b2p[TMP_REG_1], b2p[TMP_REG_1], offsetof(struct bpf_prog, 
bpf_func));
+   EMIT(PPC_RAW_LD(b2p[TMP_REG_1], b2p[TMP_REG_1], offsetof(struct 
bpf_prog, bpf_func)));
EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], b2p[TMP_REG_1],
FUNCTION_DESCR_SIZE + bpf_tailcall_prologue_size));
EMIT(PPC_RAW_MTCTR(b2p[TMP_REG_1]));
@@ -628,7 +628,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 

[PATCH powerpc/next 10/17] powerpc/bpf: Rename PPC_BL_ABS() to PPC_BL()

2022-02-14 Thread Naveen N. Rao
PPC_BL_ABS() is just doing a relative branch with link. The name
suggests that it is for branching to an absolute address, which is
incorrect. Rename the macro to a more appropriate PPC_BL().

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/net/bpf_jit.h| 6 +++---
 arch/powerpc/net/bpf_jit_comp32.c | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 5cb3efd76715a9..0832235a274983 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -34,9 +34,9 @@
EMIT(PPC_RAW_BRANCH(offset)); \
} while (0)
 
-/* blr; (unconditional 'branch' with link) to absolute address */
-#define PPC_BL_ABS(dest)   EMIT(PPC_INST_BL |\
-(((dest) - (unsigned long)(image + 
ctx->idx)) & 0x03fc))
+/* bl (unconditional 'branch' with link) */
+#define PPC_BL(dest)   EMIT(PPC_INST_BL | (((dest) - (unsigned long)(image + 
ctx->idx)) & 0x03fc))
+
 /* "cond" here covers BO:BI fields. */
 #define PPC_BCC_SHORT(cond, dest)\
do {  \
diff --git a/arch/powerpc/net/bpf_jit_comp32.c 
b/arch/powerpc/net/bpf_jit_comp32.c
index 014cf893ce90d6..cf66b25ed7c865 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -190,7 +190,7 @@ int bpf_jit_emit_func_call_rel(u32 *image, struct 
codegen_context *ctx, u64 func
s32 rel = (s32)func - (s32)(image + ctx->idx);
 
if (image && rel < 0x200 && rel >= -0x200) {
-   PPC_BL_ABS(func);
+   PPC_BL(func);
EMIT(PPC_RAW_NOP());
EMIT(PPC_RAW_NOP());
EMIT(PPC_RAW_NOP());
-- 
2.35.1



[PATCH powerpc/next 09/17] powerpc64/bpf: Optimize instruction sequence used for function calls

2022-02-14 Thread Naveen N. Rao
When calling BPF helpers, we load the function address to call into a
register. This can result in up to 5 instructions. Optimize this by
instead using the kernel TOC in r2 and the helper's offset from it.
This works since all BPF helpers are part of kernel text, and all BPF
programs/functions utilize the kernel TOC.

Furthermore:
- load the actual function entry address with ELF ABI v1, rather than
  loading it through the function descriptor address.
- load the Local Entry Point (LEP) with ELF ABI v2, skipping TOC setup.
- consolidate code across ELF ABI v1 and v2 by using r12 on both.
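
Because addi sign-extends its 16-bit immediate, the high half of the
offset has to be pre-adjusted whenever the low half has its top bit
set; that is what the PPC_HA()/PPC_LO() helpers used below take care of
(as commonly defined, PPC_LO(v) is (v) & 0xffff and PPC_HA(v) is
(((v) + 0x8000) >> 16) & 0xffff). A small worked example with a
hypothetical offset:

	/* reladdr = 0x1234a000: PPC_LO() = 0xa000, sign-extends to -0x6000    */
	/* PPC_HA(0x1234a000) = ((0x1234a000 + 0x8000) >> 16) & 0xffff = 0x1235 */
	/* addis r12, r2, 0x1235   ->  r12 = r2 + 0x12350000                   */
	/* addi  r12, r12, 0xa000  ->  r12 = r2 + 0x12350000 - 0x6000          */
	/*                             = r2 + 0x1234a000 = r2 + reladdr        */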

Reported-by: Anton Blanchard 
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/net/bpf_jit_comp64.c | 30 +-
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index e9fd4694226fe0..bff200723e7282 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -150,22 +150,20 @@ void bpf_jit_build_epilogue(u32 *image, struct 
codegen_context *ctx)
 static int bpf_jit_emit_func_call_hlp(u32 *image, struct codegen_context *ctx, 
u64 func)
 {
unsigned long func_addr = func ? ppc_function_entry((void *)func) : 0;
+   long reladdr;
 
if (WARN_ON_ONCE(!core_kernel_text(func_addr)))
return -EINVAL;
 
-#ifdef PPC64_ELF_ABI_v1
-   /* func points to the function descriptor */
-   PPC_LI64(b2p[TMP_REG_2], func);
-   /* Load actual entry point from function descriptor */
-   PPC_BPF_LL(b2p[TMP_REG_1], b2p[TMP_REG_2], 0);
-   /* ... and move it to CTR */
-   EMIT(PPC_RAW_MTCTR(b2p[TMP_REG_1]));
-#else
-   /* We can clobber r12 */
-   PPC_FUNC_ADDR(12, func);
-   EMIT(PPC_RAW_MTCTR(12));
-#endif
+   reladdr = func_addr - kernel_toc_addr();
+   if (reladdr > 0x7FFF || reladdr < -(0x8000L)) {
+   pr_err("eBPF: address of %ps out of range of kernel_toc.\n", 
(void *)func);
+   return -ERANGE;
+   }
+
+   EMIT(PPC_RAW_ADDIS(_R12, _R2, PPC_HA(reladdr)));
+   EMIT(PPC_RAW_ADDI(_R12, _R12, PPC_LO(reladdr)));
+   EMIT(PPC_RAW_MTCTR(_R12));
EMIT(PPC_RAW_BCTRL());
 
return 0;
@@ -178,6 +176,9 @@ int bpf_jit_emit_func_call_rel(u32 *image, struct 
codegen_context *ctx, u64 func
if (WARN_ON_ONCE(func && is_module_text_address(func)))
return -EINVAL;
 
+   /* skip past descriptor if elf v1 */
+   func += FUNCTION_DESCR_SIZE;
+
/* Load function address into r12 */
PPC_LI64(12, func);
 
@@ -194,11 +195,6 @@ int bpf_jit_emit_func_call_rel(u32 *image, struct 
codegen_context *ctx, u64 func
for (i = ctx->idx - ctx_idx; i < 5; i++)
EMIT(PPC_RAW_NOP());
 
-#ifdef PPC64_ELF_ABI_v1
-   /* Load actual entry point from function descriptor */
-   PPC_BPF_LL(12, 12, 0);
-#endif
-
EMIT(PPC_RAW_MTCTR(12));
EMIT(PPC_RAW_BCTRL());
 
-- 
2.35.1



[PATCH powerpc/next 08/17] powerpc64/bpf elfv1: Do not load TOC before calling functions

2022-02-14 Thread Naveen N. Rao
BPF helpers always reside in the core kernel and all BPF programs use
the kernel TOC. As such, there is no need to load the TOC before
calling helpers or other BPF functions. Drop the code doing so.

Add a check to ensure we don't proceed if this assumption ever changes
in the future.
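
For context, with ELF ABI v1 a function pointer refers to a descriptor
rather than to the code itself; the kernel's func_descr_t is roughly:

	typedef struct {
		unsigned long entry;	/* address of the first instruction     */
		unsigned long toc;	/* TOC pointer the callee expects in r2 */
		unsigned long env;	/* environment pointer, unused from C   */
	} func_descr_t;

Since BPF helpers and BPF programs all run with the kernel TOC already
in r2, reloading r2 from the 'toc' slot around these calls is
redundant, which is the load this patch drops.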

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/net/bpf_jit.h|  2 +-
 arch/powerpc/net/bpf_jit_comp.c   |  4 +++-
 arch/powerpc/net/bpf_jit_comp32.c |  8 +--
 arch/powerpc/net/bpf_jit_comp64.c | 39 ---
 4 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 3b5c44c0b6638d..5cb3efd76715a9 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -181,7 +181,7 @@ static inline void bpf_clear_seen_register(struct 
codegen_context *ctx, int i)
ctx->seen &= ~(1 << (31 - i));
 }
 
-void bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 
func);
+int bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 
func);
 int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context 
*ctx,
   u32 *addrs, int pass);
 void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 141e64585b6458..635f7448ff7952 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -59,7 +59,9 @@ static int bpf_jit_fixup_addresses(struct bpf_prog *fp, u32 
*image,
 */
tmp_idx = ctx->idx;
ctx->idx = addrs[i] / 4;
-   bpf_jit_emit_func_call_rel(image, ctx, func_addr);
+   ret = bpf_jit_emit_func_call_rel(image, ctx, func_addr);
+   if (ret)
+   return ret;
 
/*
 * Restore ctx->idx here. This is safe as the length
diff --git a/arch/powerpc/net/bpf_jit_comp32.c 
b/arch/powerpc/net/bpf_jit_comp32.c
index f401bfc5a67684..014cf893ce90d6 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -185,7 +185,7 @@ void bpf_jit_build_epilogue(u32 *image, struct 
codegen_context *ctx)
EMIT(PPC_RAW_BLR());
 }
 
-void bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 
func)
+int bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 
func)
 {
s32 rel = (s32)func - (s32)(image + ctx->idx);
 
@@ -201,6 +201,8 @@ void bpf_jit_emit_func_call_rel(u32 *image, struct 
codegen_context *ctx, u64 fun
EMIT(PPC_RAW_MTCTR(_R0));
EMIT(PPC_RAW_BCTRL());
}
+
+   return 0;
 }
 
 static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 
out)
@@ -953,7 +955,9 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
struct codegen_context *
EMIT(PPC_RAW_STW(bpf_to_ppc(ctx, BPF_REG_5), 
_R1, 12));
}
 
-   bpf_jit_emit_func_call_rel(image, ctx, func_addr);
+   ret = bpf_jit_emit_func_call_rel(image, ctx, func_addr);
+   if (ret)
+   return ret;
 
EMIT(PPC_RAW_MR(bpf_to_ppc(ctx, BPF_REG_0) - 1, _R3));
EMIT(PPC_RAW_MR(bpf_to_ppc(ctx, BPF_REG_0), _R4));
diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index 44314ee60155e4..e9fd4694226fe0 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -147,9 +147,13 @@ void bpf_jit_build_epilogue(u32 *image, struct 
codegen_context *ctx)
EMIT(PPC_RAW_BLR());
 }
 
-static void bpf_jit_emit_func_call_hlp(u32 *image, struct codegen_context *ctx,
-  u64 func)
+static int bpf_jit_emit_func_call_hlp(u32 *image, struct codegen_context *ctx, 
u64 func)
 {
+   unsigned long func_addr = func ? ppc_function_entry((void *)func) : 0;
+
+   if (WARN_ON_ONCE(!core_kernel_text(func_addr)))
+   return -EINVAL;
+
 #ifdef PPC64_ELF_ABI_v1
/* func points to the function descriptor */
PPC_LI64(b2p[TMP_REG_2], func);
@@ -157,25 +161,23 @@ static void bpf_jit_emit_func_call_hlp(u32 *image, struct 
codegen_context *ctx,
PPC_BPF_LL(b2p[TMP_REG_1], b2p[TMP_REG_2], 0);
/* ... and move it to CTR */
EMIT(PPC_RAW_MTCTR(b2p[TMP_REG_1]));
-   /*
-* Load TOC from function descriptor at offset 8.
-* We can clobber r2 since we get called through a
-* function pointer (so caller will save/restore r2)
-* and since we don't use a TOC ourself.
-*/
-   PPC_BPF_LL(2, b2p[TMP_REG_2], 8);
 #else
/* We can clobber r12 */
PPC_FUNC_ADDR(12, func);
EMIT(PPC_RAW_MTCTR(12));
 #endif

[PATCH powerpc/next 07/17] powerpc64/bpf elfv2: Setup kernel TOC in r2 on entry

2022-02-14 Thread Naveen N. Rao
In preparation for using the kernel TOC, load it into r2 on entry. With
ELF ABI v1, the kernel TOC is already set up by our caller.

We adjust the number of instructions to skip on a tail call accordingly,
and get rid of the #ifdef in bpf_jit_emit_tail_call() since
FUNCTION_DESCR_SIZE is itself under an #ifdef.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/net/bpf_jit_comp64.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index 27ac2fc7670298..44314ee60155e4 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -73,6 +73,9 @@ void bpf_jit_build_prologue(u32 *image, struct 
codegen_context *ctx)
 {
int i;
 
+   if (__is_defined(PPC64_ELF_ABI_v2))
+   PPC_BPF_LL(_R2, _R13, offsetof(struct paca_struct, kernel_toc));
+
/*
 * Initialize tail_call_cnt if we do tail calls.
 * Otherwise, put in NOPs so that it can be skipped when we are
@@ -87,8 +90,6 @@ void bpf_jit_build_prologue(u32 *image, struct 
codegen_context *ctx)
EMIT(PPC_RAW_NOP());
}
 
-#define BPF_TAILCALL_PROLOGUE_SIZE 8
-
if (bpf_has_stack_frame(ctx)) {
/*
 * We need a stack frame, but we don't necessarily need to
@@ -217,6 +218,10 @@ static int bpf_jit_emit_tail_call(u32 *image, struct 
codegen_context *ctx, u32 o
 */
int b2p_bpf_array = b2p[BPF_REG_2];
int b2p_index = b2p[BPF_REG_3];
+   int bpf_tailcall_prologue_size = 8;
+
+   if (__is_defined(PPC64_ELF_ABI_v2))
+   bpf_tailcall_prologue_size += 4; /* skip past the toc load */
 
/*
 * if (index >= array->map.max_entries)
@@ -255,13 +260,8 @@ static int bpf_jit_emit_tail_call(u32 *image, struct 
codegen_context *ctx, u32 o
 
/* goto *(prog->bpf_func + prologue_size); */
PPC_BPF_LL(b2p[TMP_REG_1], b2p[TMP_REG_1], offsetof(struct bpf_prog, 
bpf_func));
-#ifdef PPC64_ELF_ABI_v1
-   /* skip past the function descriptor */
EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], b2p[TMP_REG_1],
-   FUNCTION_DESCR_SIZE + BPF_TAILCALL_PROLOGUE_SIZE));
-#else
-   EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], b2p[TMP_REG_1], 
BPF_TAILCALL_PROLOGUE_SIZE));
-#endif
+   FUNCTION_DESCR_SIZE + bpf_tailcall_prologue_size));
EMIT(PPC_RAW_MTCTR(b2p[TMP_REG_1]));
 
/* tear down stack, restore NVRs, ... */
-- 
2.35.1



[PATCH powerpc/next 06/17] powerpc64: Set PPC64_ELF_ABI_v[1|2] macros to 1

2022-02-14 Thread Naveen N. Rao
Set the macros to 1 so that they can be used with __is_defined().
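
For context, __is_defined() (from include/linux/kconfig.h) evaluates to
1 only when the tested macro expands to exactly 1, which is why an
empty definition is not enough. Condensed, the mechanism looks like
this:

	#define __ARG_PLACEHOLDER_1			0,
	#define __take_second_arg(__ignored, val, ...)	val
	#define __is_defined(x)			___is_defined(x)
	#define ___is_defined(val)		____is_defined(__ARG_PLACEHOLDER_##val)
	#define ____is_defined(arg1_or_junk)	__take_second_arg(arg1_or_junk 1, 0)

	/*
	 * PPC64_ELF_ABI_v2 defined to 1: __ARG_PLACEHOLDER_1 expands to "0,"
	 * and __take_second_arg(0, 1, 0) picks 1.
	 * PPC64_ELF_ABI_v2 not defined: the pasted token is left as-is and
	 * __take_second_arg(<junk> 1, 0) picks 0.
	 */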

Suggested-by: Christophe Leroy 
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/include/asm/types.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/types.h b/arch/powerpc/include/asm/types.h
index f1630c553efe43..6c0411ce406255 100644
--- a/arch/powerpc/include/asm/types.h
+++ b/arch/powerpc/include/asm/types.h
@@ -13,9 +13,9 @@
 
 #ifdef __powerpc64__
 #if defined(_CALL_ELF) && _CALL_ELF == 2
-#define PPC64_ELF_ABI_v2
+#define PPC64_ELF_ABI_v2 1
 #else
-#define PPC64_ELF_ABI_v1
+#define PPC64_ELF_ABI_v1 1
 #endif
 #endif /* __powerpc64__ */
 
-- 
2.35.1



[PATCH powerpc/next 05/17] powerpc64/bpf: Use r12 for constant blinding

2022-02-14 Thread Naveen N. Rao
In preparation for preserving the kernel TOC in r2, switch BPF_REG_AX
from r2 to r12. r12 is not used by the BPF JIT except during external
helper/bpf calls, or with BPF_NOSPEC, and those sequences are not
emitted while BPF_REG_AX is in use for constant blinding and other
purposes.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/net/bpf_jit64.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/net/bpf_jit64.h b/arch/powerpc/net/bpf_jit64.h
index b63b35e45e558c..82cdfee412784a 100644
--- a/arch/powerpc/net/bpf_jit64.h
+++ b/arch/powerpc/net/bpf_jit64.h
@@ -56,7 +56,7 @@ const int b2p[MAX_BPF_JIT_REG + 2] = {
/* frame pointer aka BPF_REG_10 */
[BPF_REG_FP] = 31,
/* eBPF jit internal registers */
-   [BPF_REG_AX] = 2,
+   [BPF_REG_AX] = 12,
[TMP_REG_1] = 9,
[TMP_REG_2] = 10
 };
-- 
2.35.1



[PATCH powerpc/next 04/17] powerpc64/bpf: Do not save/restore LR on each call to bpf_stf_barrier()

2022-02-14 Thread Naveen N. Rao
Instead of saving and restoring LR around each call to
bpf_stf_barrier(), set the SEEN_FUNC flag so that LR is saved/restored
in the prologue/epilogue.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/net/bpf_jit_comp64.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index 371bd5a16859c7..27ac2fc7670298 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -690,11 +690,10 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
struct codegen_context *
EMIT(PPC_RAW_ORI(_R31, _R31, 0));
break;
case STF_BARRIER_FALLBACK:
-   EMIT(PPC_RAW_MFLR(b2p[TMP_REG_1]));
+   ctx->seen |= SEEN_FUNC;
PPC_LI64(12, 
dereference_kernel_function_descriptor(bpf_stf_barrier));
EMIT(PPC_RAW_MTCTR(12));
EMIT(PPC_RAW_BCTRL());
-   EMIT(PPC_RAW_MTLR(b2p[TMP_REG_1]));
break;
case STF_BARRIER_NONE:
break;
-- 
2.35.1


