ppc64le bzImage and Build_id elf note

2022-06-03 Thread Donald Zickus
Hi Michael,

I am working on two packaging issues with Fedora and CKI that I am hoping
you can give me some guidance on.

1 - Fedora has always packaged an eu-strip'd vmlinux file for powerpc.  The
other arches we support use native compressed images.  I was looking into
using powerpc's zImage (pseries) binary to remove the powerpc workarounds
in our rpm spec file.

However, the rpmbuild fails because it can't find a build-id with
eu-readelf -n zImage.  Sure enough the build-id is found in vmlinux and
vmlinux.stripped but disappears with vmlinux.stripped.gz.  I had hoped
arch/powerpc/boot/addnote would stick it back in but it doesn't (I am
ignorant of how addnote works).

eu-readelf -n data:
vmlinux:

Displaying notes found in: .notes
  Owner            Data size    Description
  GNU              0x0014       NT_GNU_BUILD_ID (unique build ID bitstring)
    Build ID: b4c026d72ead7b4316a221cddb7f2b10d75fb313
  Linux            0x0004       func
    description data: 00 00 00 00
  Linux            0x0001       OPEN
    description data: 00
  PowerPC          0x0004       NT_VERSION (version)
    description data: 01 00 00 00

zImage:

Displaying notes found at file offset 0x0158 with length 0x002c:
  Owner            Data size    Description
  PowerPC          0x0018       Unknown note type: (0x1275)
    description data: ff ff ff ff 02 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff 00 00 40 00

Displaying notes found at file offset 0x0184 with length 0x0044:
  Owner                Data size    Description
  IBM,RPA-Client-[...] 0x0020       Unknown note type: (0x1275)
    description data: 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 28 00 00 00 01 ff ff ff ff 00 00 00 00 00 00 00 01

Is this something that can be addressed?  Or should I/we expect the
build-id to never make it into the zImage and just continue with our
current vmlinux process?


2 - CKI builds kernels using 'make targz-pkg'.  The arches we support
(x86_64, s390, aarch64) provide compressed binaries to package using
KBUILD_IMAGE or a specific entry in scripts/package/buildtar.  Because
powerpc doesn't have a KBUILD_IMAGE variable defined, the script builds
vmlinux and cp's that to vmlinux-kbuild.  The problem with powerpc is that
vmlinux for us is huge (>256MB), and cp'ing that to vmlinux-kbuild
occupies >512MB of /boot, so our distro runs out of disk space on that
partition.

I was hoping to add a patch to arch/powerpc/Makefile that defines
KBUILD_IMAGE:=$(boot)/zImage (mimicking arch/s390), which I believe would
solve our problem.  However, that circles back to our first problem above.
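
For reference, the change I have in mind is just the one-liner below (a
sketch only; the exact placement within arch/powerpc/Makefile is my
assumption, mirroring what arch/s390/Makefile does):

# Hand packaging targets the compressed boot image instead of letting
# them fall back to copying the raw vmlinux.
KBUILD_IMAGE := $(boot)/zImage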

Thoughts?  Help?

Cheers,
Don


Re: [PATCH] PCI/AER: Iterate over error counters instead of error

2022-06-03 Thread Bjorn Helgaas
On Fri, Jun 03, 2022 at 10:12:47PM +, Mohamed Khalfella wrote:
> Is there any chance for this to land in 5.19?

Too late for v5.19, since the merge window will end in a couple days.
Remind me again if you don't see it in -next by v5.20-rc5 or so.

> On 5/10/22 14:17, Mohamed Khalfella wrote:
> > > Thanks for catching this; it definitely looks like a real issue!  I
> > > guess you're probably seeing junk in the sysfs files?
> > 
> That is correct. The initial report was seeing junk when reading sysfs
> files. As described, this is happening because we are reading data past the
> end of the stats counters array.
> > 
> > 
> > > I think maybe we should populate the currently NULL entries in the
> > > string[] arrays and simplify the code here, e.g.,
> > > 
> > > static const char *aer_correctable_error_string[] = {
> > >"RxErr",/* Bit Position 0   */
> > >"dev_cor_errs_bit[1]",
> > >   ...
> > >
> > >  if (stats[i])
> > >    len += sysfs_emit_at(buf, len, "%s %llu\n", strings_array[i], stats[i]);
> > 
> > Doing it this way will change the output format. In this case we will show
> > stats only if their value is greater than zero. The current code shows all
> > the stats that have names (regardless of their value) plus those that have
> > non-zero values.
> > 
> > >> @@ -1342,6 +1342,11 @@ static int aer_probe(struct pcie_device *dev)
> > >>  struct device *device = &dev->device;
> > >>  struct pci_dev *port = dev->port;
> > >>
> > >> +BUILD_BUG_ON(ARRAY_SIZE(aer_correctable_error_string) <
> > >> + AER_MAX_TYPEOF_COR_ERRS);
> > >> +BUILD_BUG_ON(ARRAY_SIZE(aer_uncorrectable_error_string) <
> > >> + AER_MAX_TYPEOF_UNCOR_ERRS);
> > >
> > > And make these check for "!=" instead of "<".
> 
> I am happy to remove these BUILD_BUG_ON() if you think it is a good
> idea to do so.

I think it's good to enforce correctness there somehow, so let's leave
them there unless somebody has a better idea.

> > This will require unnecessarily extending the stats arrays to have 32 entries
> > in order to match the names arrays. If you don't feel strongly about changing
> > "<" to "!=", I prefer to keep the code as it is.


[PATCH 00/14] mtd: Fix platform remove callbacks to always return 0

2022-06-03 Thread Uwe Kleine-König
Hello,

this series prepares to make platform remove callbacks return void.
Therefore, first update them to always return 0. The rationale is that the
Linux device model doesn't handle failures on remove: if a remove
callback returns an error, the core just emits a quite generic error message
and still removes the device.
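
(For context, this is roughly what the driver core does with a non-zero
return value today; paraphrased and slightly simplified from
drivers/base/platform.c:)

static void platform_remove(struct device *dev)
{
	struct platform_driver *drv = to_platform_driver(dev->driver);

	if (drv->remove) {
		int ret = drv->remove(to_platform_device(dev));

		if (ret)
			dev_warn(dev, "remove callback returned a non-zero value. This will be ignored.\n");
	}
	dev_pm_domain_detach(dev, true);
}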

Best regards
Uwe

Uwe Kleine-König (14):
  mtd: hyperbus: Make hyperbus_unregister_device() return void
  mtd: spi-nor: aspeed-smc: Make aspeed_smc_unregister() return void
  mtd: powernv_flash: Warn about failure to unregister mtd device
  mtd: st-spi_fsm: Warn about failure to unregister mtd device
  mtd: lpddr2_nvm: Warn about failure to unregister mtd device
  mtd: spear_smi: Don't skip cleanup after mtd_device_unregister()
failed
  mtd: spear_smi: Drop if with an always false condition
  mtd: rawnand: atmel: Warn about failure to unregister mtd device
  mtd: rawnand: omap2: Suppress error message after WARN in .remove()
  mtd: rawnand: tegra: Don't skip cleanup after mtd_device_unregister()
failed
  mtd: rawnand: meson: Don't skip cleanup after mtd_device_unregister()
failed
  mtd: rawnand: meson: Drop cleaning platform data in .remove()
  mtd: physmap: Don't skip cleanup after mtd_device_unregister() failed
  mtd: physmap: Drop if with an always false condition

 drivers/mtd/devices/powernv_flash.c  |  4 +++-
 drivers/mtd/devices/spear_smi.c  | 10 ++
 drivers/mtd/devices/st_spi_fsm.c |  4 +++-
 drivers/mtd/hyperbus/hbmc-am654.c|  6 +++---
 drivers/mtd/hyperbus/hyperbus-core.c |  8 ++--
 drivers/mtd/hyperbus/rpc-if.c|  5 +++--
 drivers/mtd/lpddr/lpddr2_nvm.c   |  4 +++-
 drivers/mtd/maps/physmap-core.c  | 13 +++--
 drivers/mtd/nand/raw/atmel/nand-controller.c |  5 -
 drivers/mtd/nand/raw/meson_nand.c| 16 +++-
 drivers/mtd/nand/raw/omap2.c |  6 ++
 drivers/mtd/nand/raw/tegra_nand.c|  5 +
 drivers/mtd/spi-nor/controllers/aspeed-smc.c |  8 
 include/linux/mtd/hyperbus.h |  4 +---
 14 files changed, 37 insertions(+), 61 deletions(-)

base-commit: 4b0986a3613c92f4ec1bdc7f60ec66fea135991f
-- 
2.36.1



[PATCH 03/14] mtd: powernv_flash: Warn about failure to unregister mtd device

2022-06-03 Thread Uwe Kleine-König
mtd_device_unregister() shouldn't fail. Wail loudly if it does anyhow.

This matches how other drivers (e.g. nand/raw/nandsim.c) use
mtd_device_unregister().

By returning 0 in the platform remove callback a generic error message
by the device core is suppressed, nothing else changes.

This is a preparation for making platform remove callbacks return void.

Signed-off-by: Uwe Kleine-König 
---
 drivers/mtd/devices/powernv_flash.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/mtd/devices/powernv_flash.c b/drivers/mtd/devices/powernv_flash.c
index 6950a8764815..36e060386e59 100644
--- a/drivers/mtd/devices/powernv_flash.c
+++ b/drivers/mtd/devices/powernv_flash.c
@@ -270,7 +270,9 @@ static int powernv_flash_release(struct platform_device *pdev)
struct powernv_flash *data = dev_get_drvdata(&(pdev->dev));
 
/* All resources should be freed automatically */
-   return mtd_device_unregister(&(data->mtd));
+   WARN_ON(mtd_device_unregister(&data->mtd));
+
+   return 0;
 }
 
 static const struct of_device_id powernv_flash_match[] = {
-- 
2.36.1



Re: [PATCH v2 1/4] of: constify of_property_check_flags() prop argument

2022-06-03 Thread Rob Herring
On Wed, 01 Jun 2022 10:17:58 +0200, Clément Léger wrote:
> This argument is not modified and thus can be set as const.
> 
> Signed-off-by: Clément Léger 
> ---
>  include/linux/of.h | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 

As this looks independent of the rest of the series, applied, thanks!


Re: [PATCH v2 4/4] powerpc/pseries: use of_property_alloc/free() and of_node_alloc()

2022-06-03 Thread Rob Herring
On Wed, Jun 01, 2022 at 10:18:01AM +0200, Clément Léger wrote:
> Use of_property_alloc/free() and of_node_alloc() to create and free
> device-tree nodes and properties.
> 
> Signed-off-by: Clément Léger 
> ---
>  arch/powerpc/platforms/pseries/dlpar.c| 51 +++
>  .../platforms/pseries/hotplug-memory.c| 21 +---
>  arch/powerpc/platforms/pseries/reconfig.c | 45 +---
>  3 files changed, 21 insertions(+), 96 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
> index 498d6efcb5ae..5a04566e98a4 100644
> --- a/arch/powerpc/platforms/pseries/dlpar.c
> +++ b/arch/powerpc/platforms/pseries/dlpar.c
> @@ -38,61 +38,25 @@ struct cc_workarea {
>   __be32  prop_offset;
>  };
>  
> -void dlpar_free_cc_property(struct property *prop)
> -{
> - kfree(prop->name);
> - kfree(prop->value);
> - kfree(prop);
> -}
> -
>  static struct property *dlpar_parse_cc_property(struct cc_workarea *ccwa)
>  {
> - struct property *prop;
> - char *name;
> - char *value;
> -
> - prop = kzalloc(sizeof(*prop), GFP_KERNEL);
> - if (!prop)
> - return NULL;
> + int length;
> + char *name, *value;
>  
>   name = (char *)ccwa + be32_to_cpu(ccwa->name_offset);
> - prop->name = kstrdup(name, GFP_KERNEL);
> - if (!prop->name) {
> - dlpar_free_cc_property(prop);
> - return NULL;
> - }
> -
> - prop->length = be32_to_cpu(ccwa->prop_length);
> + length = be32_to_cpu(ccwa->prop_length);
>   value = (char *)ccwa + be32_to_cpu(ccwa->prop_offset);
> - prop->value = kmemdup(value, prop->length, GFP_KERNEL);
> - if (!prop->value) {
> - dlpar_free_cc_property(prop);
> - return NULL;
> - }
>  
> - return prop;
> + return of_property_alloc(name, value, length, GFP_KERNEL);
>  }
>  
>  static struct device_node *dlpar_parse_cc_node(struct cc_workarea *ccwa)
>  {
> - struct device_node *dn;
>   const char *name;
>  
> - dn = kzalloc(sizeof(*dn), GFP_KERNEL);
> - if (!dn)
> - return NULL;
> -
>   name = (const char *)ccwa + be32_to_cpu(ccwa->name_offset);
> - dn->full_name = kstrdup(name, GFP_KERNEL);
> - if (!dn->full_name) {
> - kfree(dn);
> - return NULL;
> - }
>  
> - of_node_set_flag(dn, OF_DYNAMIC);
> - of_node_init(dn);
> -
> - return dn;
> + return of_node_alloc(name, GFP_KERNEL);

Do you have any need for different flags? I can't really see a need for 
atomic or dma allocs or ???, so drop it I think.

>  }
>  
>  static void dlpar_free_one_cc_node(struct device_node *dn)
> @@ -102,11 +66,10 @@ static void dlpar_free_one_cc_node(struct device_node *dn)
>   while (dn->properties) {
>   prop = dn->properties;
>   dn->properties = prop->next;
> - dlpar_free_cc_property(prop);
> + of_property_free(prop);

We should be able to just put the node and all the properties already 
attached will be freed.

Looking at the history of this code, it originally did the kref_init 
much later in dlpar_attach_node(). So there was a window of allocating 
the node and adding properties where you'd need to manually free 
everything. Now that the node is referenced from the start, a put should 
free everything.
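
(A minimal sketch of that simplification; hypothetical, not part of the
posted series:)

static void dlpar_free_one_cc_node(struct device_node *dn)
{
	/* Dropping the last reference frees the node together with any
	 * properties already attached to it. */
	of_node_put(dn);
}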

>   }
>  
> - kfree(dn->full_name);
> - kfree(dn);
> + of_node_put(dn);
>  }
>  
>  void dlpar_free_cc_nodes(struct device_node *dn)
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index 2e3a317722a8..2ddf2a0ba048 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -69,33 +69,16 @@ unsigned long pseries_memory_block_size(void)
>   return memblock_size;
>  }
>  
> -static void dlpar_free_property(struct property *prop)
> -{
> - kfree(prop->name);
> - kfree(prop->value);
> - kfree(prop);
> -}
> -
>  static struct property *dlpar_clone_property(struct property *prop,
>u32 prop_size)
>  {
> - struct property *new_prop;
> -
> - new_prop = kzalloc(sizeof(*new_prop), GFP_KERNEL);
> + struct property *new_prop = of_property_alloc(prop->name, NULL,
> +   prop_size, GFP_KERNEL);
>   if (!new_prop)
>   return NULL;
>  
> - new_prop->name = kstrdup(prop->name, GFP_KERNEL);
> - new_prop->value = kzalloc(prop_size, GFP_KERNEL);
> - if (!new_prop->name || !new_prop->value) {
> - dlpar_free_property(new_prop);
> - return NULL;
> - }
> -
>   memcpy(new_prop->value, prop->value, prop->length);
> - new_prop->length = prop_size;
>  
> - of_property_set_flag(new_prop, OF_DYNAMIC);
>   return new_prop;
>  }
>  
> diff --git a/arch/powerpc/platforms/pseries/r

[PATCH v2] powerpc/irq: Increase stack_overflow detection limit when KASAN is enabled

2022-06-03 Thread Christophe Leroy
When KASAN is enabled, as shown by the Oops below, the 2k limit is not
enough to allow stack dump after a stack overflow detection when
CONFIG_DEBUG_STACKOVERFLOW is selected:

do_IRQ: stack overflow: 1984
CPU: 0 PID: 126 Comm: systemd-udevd Not tainted 5.18.0-gentoo-PMacG4 #1
Call Trace:
Oops: Kernel stack overflow, sig: 11 [#1]
BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac
Modules linked in: sr_mod cdrom radeon(+) ohci_pci(+) hwmon 
i2c_algo_bit drm_ttm_helper ttm drm_dp_helper snd_aoa_i2sbus snd_aoa_soundbus 
snd_pcm ehci_pci snd_timer ohci_hcd snd ssb ehci_hcd 8250_pci soundcore 
drm_kms_helper pcmcia 8250 pcmcia_core syscopyarea usbcore sysfillrect 
8250_base sysimgblt serial_mctrl_gpio fb_sys_fops usb_common pkcs8_key_parser 
fuse drm drm_panel_orientation_quirks configfs
CPU: 0 PID: 126 Comm: systemd-udevd Not tainted 5.18.0-gentoo-PMacG4 #1
NIP:  c02e5558 LR: c07eb3bc CTR: c07f46a8
REGS: e7fe9f50 TRAP:    Not tainted  (5.18.0-gentoo-PMacG4)
MSR:  1032   CR: 44a14824  XER: 2000

GPR00: c07eb3bc eaa1c000 c26baea0 eaa1c0a0 0008  c07eb3bc eaa1c010
GPR08: eaa1c0a8 04f3f3f3 f1f1f1f1 c07f4c84 44a14824 0080f7e4 0005 0010
GPR16: 0025 eaa1c154 eaa1c158 c0dbad64 0020 fd543810 eaa1c0a0 eaa1c29e
GPR24: c0dbad44 c0db8740 05ff fd543802 eaa1c150 c0c9a3c0 eaa1c0a0 c0c9a3c0
NIP [c02e5558] kasan_check_range+0xc/0x2b4
LR [c07eb3bc] format_decode+0x80/0x604
Call Trace:
[eaa1c000] [c07eb3bc] format_decode+0x80/0x604 (unreliable)
[eaa1c070] [c07f4dac] vsnprintf+0x128/0x938
[eaa1c110] [c07f5788] sprintf+0xa0/0xc0
[eaa1c180] [c0154c1c] __sprint_symbol.constprop.0+0x170/0x198
[eaa1c230] [c07ee71c] symbol_string+0xf8/0x260
[eaa1c430] [c07f46d0] pointer+0x15c/0x710
[eaa1c4b0] [c07f4fbc] vsnprintf+0x338/0x938
[eaa1c550] [c00e8fa0] vprintk_store+0x2a8/0x678
[eaa1c690] [c00e94e4] vprintk_emit+0x174/0x378
[eaa1c6d0] [c00ea008] _printk+0x9c/0xc0
[eaa1c750] [c000ca94] show_stack+0x21c/0x260
[eaa1c7a0] [c07d0bd4] dump_stack_lvl+0x60/0x90
[eaa1c7c0] [c0009234] __do_IRQ+0x170/0x174
[eaa1c800] [c0009258] do_IRQ+0x20/0x34
[eaa1c820] [c00045b4] HardwareInterrupt_virt+0x108/0x10c
...

An investigation shows that on PPC32, calling dump_stack() requires
more than 1k bytes when KASAN is not selected and a bit more than 2k bytes
when KASAN is selected.

On PPC64 the registers are twice the size of PPC32 registers, so the
need should be approximately twice the need on PPC32.

In the meantime we have THREAD_SIZE, which is twice as large on PPC64 as
on PPC32, and twice as large again when KASAN is selected.

So we can easily use the value of THREAD_SIZE to set the limit.

On PPC32, THREAD_SIZE is 8k without KASAN and 16k with KASAN.
On PPC64, THREAD_SIZE is 16k without KASAN.

To be on the safe side, leave 2k free on PPC32 without KASAN, 4k with
KASAN, and 4k on PPC64 without KASAN. That means the limit should be
one fourth of THREAD_SIZE.

Reported-by: Erhard Furtner 
Cc: Arnd Bergmann 
Signed-off-by: Christophe Leroy 
---
v2: Use a ratio of THREAD_SIZE
---
 arch/powerpc/kernel/irq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 32409cdcbed0..72abf438a22e 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -191,8 +191,8 @@ static inline void check_stack_overflow(unsigned long sp)
 
sp = current_stack_pointer & (THREAD_SIZE - 1);
 
-   /* check for stack overflow: is there less than 2KB free? */
-   if (unlikely(sp < 2048)) {
+   /* check for stack overflow: is there less than 1/4th free? */
+   if (unlikely(sp < THREAD_SIZE / 4)) {
pr_err("do_IRQ: stack overflow: %ld\n", sp);
dump_stack();
}
-- 
2.35.3



Re: [PATCH] powerpc/64: Drop ppc_inst_as_str()

2022-06-03 Thread Segher Boessenkool
Hi!

On Fri, Jun 03, 2022 at 03:03:05PM +1000, Jordan Niethe wrote:
> On Thu, Jun 2, 2022 at 6:49 PM Segher Boessenkool
>  wrote:
> > On Thu, Jun 02, 2022 at 01:01:04PM +1000, Jordan Niethe wrote:
> > > > What about the more fundamental thing?  Have the order of the two halves
> > > > of a prefixed insn as ulong not depend on endianness?  It really is two
> > > > opcodes, and the prefixed one is first, always, even in LE.
> > > The reason would be that the value as a ulong is then used to write a
> > > prefixed instruction to memory with std.
> > > If both endiannesses had the halves the same, one of them would store
> > > the suffix in front of the prefix.
> >
> > You cannot do such a (possibly) unaligned access from C though, not
> > without invoking undefined behaviour.  The compiler usually lets you get
> > away with it, but there are no guarantees.  You can make sure you only
> > ever do such an access from assembler code of course.
> 
> Would using inline assembly to do it be ok?

You cannot refer to the instruction words as one 8-byte integer in
memory (with "m" or similar), since such a thing is not valid C.  You
can input the address of it to the inline asm of course, and use a
clobber "memory", or also give it the actual memory as input, but as
bytes or words or such, with something like
  "m"(*(const u32 (*)[2]) ptr)

> > Swapping the two halves of a register costs at most one insn.  It is
> > harmful premature optimisation to make this single cycle advantage
> > override more important consideration (almost everything else :-) )
> 
> I'm not sure I follow. We are not doing this as an optimisation, but out
> of the necessity of writing the prefixed instruction to memory in a single
> instruction so that we don't end up with half an instruction in the kernel
> image.

Ah.  That does not necessitate having this different for LE at all
though!  The function that does the patching has to do it as one
atomic memory access (which an ld only is because prefixed insns cannot
cross 64-byte address boundaries btw), but that does not mean the kernel
has to use the packing into one u64 it needs for that anywhere else,
certainly not if that just complicates matters!


Segher


Re: [PATCH] powerpc/spufs: Fix refcount leak in spufs_init_isolated_loader

2022-06-03 Thread Arnd Bergmann
On Fri, Jun 3, 2022 at 2:15 PM Miaoqian Lin  wrote:
>
> of_find_node_by_path() returns a remote device node pointer with its
> refcount incremented; we should use of_node_put() on it when done.
> Add the missing of_node_put() to avoid a refcount leak.
>
> Fixes: 0afacde3df4c ("[POWERPC] spufs: allow isolated mode apps by starting the SPE loader")
> Signed-off-by: Miaoqian Lin 


Acked-by: Arnd Bergmann 


Re: [PATCH 2/6] s390/mm: Enable ARCH_HAS_VM_GET_PAGE_PROT

2022-06-03 Thread Christophe Leroy


On 03/06/2022 at 12:14, Anshuman Khandual wrote:
> This defines and exports a platform specific custom vm_get_page_prot() via
> subscribing ARCH_HAS_VM_GET_PAGE_PROT. Subsequently all __SXXX and __PXXX
> macros can be dropped which are no longer needed.
> 
> Cc: Heiko Carstens 
> Cc: Vasily Gorbik 
> Cc: linux-s...@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> Acked-by: Sven Schnelle 
> Acked-by: Alexander Gordeev 
> Signed-off-by: Anshuman Khandual 
> ---
>   arch/s390/Kconfig   |  1 +
>   arch/s390/include/asm/pgtable.h | 17 -
>   arch/s390/mm/mmap.c | 33 +
>   3 files changed, 34 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> index b17239ae7bd4..cdcf678deab1 100644
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -81,6 +81,7 @@ config S390
>   select ARCH_HAS_SYSCALL_WRAPPER
>   select ARCH_HAS_UBSAN_SANITIZE_ALL
>   select ARCH_HAS_VDSO_DATA
> + select ARCH_HAS_VM_GET_PAGE_PROT
>   select ARCH_HAVE_NMI_SAFE_CMPXCHG
>   select ARCH_INLINE_READ_LOCK
>   select ARCH_INLINE_READ_LOCK_BH
> diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
> index a397b072a580..c63a05b5368a 100644
> --- a/arch/s390/include/asm/pgtable.h
> +++ b/arch/s390/include/asm/pgtable.h
> @@ -424,23 +424,6 @@ static inline int is_module_addr(void *addr)
>* implies read permission.
>*/
>/*xwr*/
> -#define __P000   PAGE_NONE
> -#define __P001   PAGE_RO
> -#define __P010   PAGE_RO
> -#define __P011   PAGE_RO
> -#define __P100   PAGE_RX
> -#define __P101   PAGE_RX
> -#define __P110   PAGE_RX
> -#define __P111   PAGE_RX
> -
> -#define __S000   PAGE_NONE
> -#define __S001   PAGE_RO
> -#define __S010   PAGE_RW
> -#define __S011   PAGE_RW
> -#define __S100   PAGE_RX
> -#define __S101   PAGE_RX
> -#define __S110   PAGE_RWX
> -#define __S111   PAGE_RWX
>   
>   /*
>* Segment entry (large page) protection definitions.
> diff --git a/arch/s390/mm/mmap.c b/arch/s390/mm/mmap.c
> index d545f5c39f7e..11d75b8d5ec0 100644
> --- a/arch/s390/mm/mmap.c
> +++ b/arch/s390/mm/mmap.c
> @@ -188,3 +188,36 @@ void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
>   mm->get_unmapped_area = arch_get_unmapped_area_topdown;
>   }
>   }
> +
> +pgprot_t vm_get_page_prot(unsigned long vm_flags)
> +{
> + switch (vm_flags & (VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)) {
> + case VM_NONE:
> + return PAGE_NONE;
> + case VM_READ:
> + case VM_WRITE:
> + case VM_WRITE | VM_READ:
> + return PAGE_RO;
> + case VM_EXEC:
> + case VM_EXEC | VM_READ:
> + case VM_EXEC | VM_WRITE:
> + case VM_EXEC | VM_WRITE | VM_READ:
> + return PAGE_RX;
> + case VM_SHARED:
> + return PAGE_NONE;
> + case VM_SHARED | VM_READ:
> + return PAGE_RO;
> + case VM_SHARED | VM_WRITE:
> + case VM_SHARED | VM_WRITE | VM_READ:
> + return PAGE_RW;
> + case VM_SHARED | VM_EXEC:
> + case VM_SHARED | VM_EXEC | VM_READ:
> + return PAGE_RX;
> + case VM_SHARED | VM_EXEC | VM_WRITE:
> + case VM_SHARED | VM_EXEC | VM_WRITE | VM_READ:
> + return PAGE_RWX;
> + default:
> + BUILD_BUG();
> + }
> +}
> +EXPORT_SYMBOL(vm_get_page_prot);

Wasn't it demonstrated in previous discussions that a switch/case is
suboptimal compared to a table cell read?

In order to get rid of the __Sxxx/__Pxxx macros, my preference would go to
having architectures provide their own protection_map[] table, and keeping
the generic vm_get_page_prot() for the architectures that don't need a
specific version of it.

This comment applies to all following patches as well.
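
(For reference, the generic helper is already little more than a table
lookup; a simplified sketch of mm/mmap.c's version:)

pgprot_t vm_get_page_prot(unsigned long vm_flags)
{
	return protection_map[vm_flags &
			(VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)];
}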

Re: [PATCH 1/6] mm/mmap: Restrict generic protection_map[] array visibility

2022-06-03 Thread Christophe Leroy


On 03/06/2022 at 12:14, Anshuman Khandual wrote:
> Restrict generic protection_map[] array visibility only to platforms which
> do not enable ARCH_HAS_VM_GET_PAGE_PROT. Other platforms, which define
> their own vm_get_page_prot() by enabling ARCH_HAS_VM_GET_PAGE_PROT, can
> keep a private static protection_map[] still implementing an array look-up.
> These private protection_map[] arrays can do without the __PXXX/__SXXX
> macros, making those redundant, so they are dropped.
> 
> But platforms which do not define a custom vm_get_page_prot() enabling
> ARCH_HAS_VM_GET_PAGE_PROT will still have to provide __PXXX/__SXXX macros.
> Although this now provides a method for other willing platforms to drop the
> __PXXX/__SXXX macros completely.
> 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Michael Ellerman 
> Cc: Paul Mackerras 
> Cc: "David S. Miller" 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Andrew Morton 
> Cc: x...@kernel.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: sparcli...@vger.kernel.org
> Cc: linux...@kvack.org
> Cc: linux-ker...@vger.kernel.org
> Signed-off-by: Anshuman Khandual 
> ---
>   arch/arm64/include/asm/pgtable-prot.h | 18 --
>   arch/arm64/mm/mmap.c  | 21 +
>   arch/powerpc/include/asm/pgtable.h|  2 ++
>   arch/powerpc/mm/book3s64/pgtable.c| 20 
>   arch/sparc/include/asm/pgtable_32.h   |  2 ++
>   arch/sparc/include/asm/pgtable_64.h   | 19 ---
>   arch/sparc/mm/init_64.c   | 20 
>   arch/x86/include/asm/pgtable_types.h  | 19 ---
>   arch/x86/mm/pgprot.c  | 19 +++
>   include/linux/mm.h|  2 ++
>   mm/mmap.c |  2 +-
>   11 files changed, 87 insertions(+), 57 deletions(-)
> 

> diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
> index d564d0ecd4cd..8ed2a80c896e 100644
> --- a/arch/powerpc/include/asm/pgtable.h
> +++ b/arch/powerpc/include/asm/pgtable.h
> @@ -21,6 +21,7 @@ struct mm_struct;
>   #endif /* !CONFIG_PPC_BOOK3S */
>   
>   /* Note due to the way vm flags are laid out, the bits are XWR */
> +#ifndef CONFIG_ARCH_HAS_VM_GET_PAGE_PROT

Ok, so until now it was common to all powerpc platforms. Now you define it
differently depending on whether it is PPC_BOOK3S_64 or another platform?
What's the point?

>   #define __P000  PAGE_NONE
>   #define __P001  PAGE_READONLY
>   #define __P010  PAGE_COPY
> @@ -38,6 +39,7 @@ struct mm_struct;
>   #define __S101  PAGE_READONLY_X
>   #define __S110  PAGE_SHARED_X
>   #define __S111  PAGE_SHARED_X
> +#endif
>   
>   #ifndef __ASSEMBLY__
>   
> diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
> index 7b9966402b25..2cf10a17c0a9 100644
> --- a/arch/powerpc/mm/book3s64/pgtable.c
> +++ b/arch/powerpc/mm/book3s64/pgtable.c
> @@ -551,6 +551,26 @@ unsigned long memremap_compat_align(void)
>   EXPORT_SYMBOL_GPL(memremap_compat_align);
>   #endif
>   
> +/* Note due to the way vm flags are laid out, the bits are XWR */
> +static pgprot_t protection_map[16] __ro_after_init = {

I don't think powerpc modifies that at all. It could be const instead of
__ro_after_init.

> + [VM_NONE]   = PAGE_NONE,
> + [VM_READ]   = PAGE_READONLY,
> + [VM_WRITE]  = PAGE_COPY,
> + [VM_WRITE | VM_READ]= PAGE_COPY,
> + [VM_EXEC]   = PAGE_READONLY_X,
> + [VM_EXEC | VM_READ] = PAGE_READONLY_X,
> + [VM_EXEC | VM_WRITE]= PAGE_COPY_X,
> + [VM_EXEC | VM_WRITE | VM_READ]  = PAGE_COPY_X,
> + [VM_SHARED] = PAGE_NONE,
> + [VM_SHARED | VM_READ]   = PAGE_READONLY,
> + [VM_SHARED | VM_WRITE]  = PAGE_SHARED,
> + [VM_SHARED | VM_WRITE | VM_READ]= PAGE_SHARED,
> + [VM_SHARED | VM_EXEC]   = PAGE_READONLY_X,
> + [VM_SHARED | VM_EXEC | VM_READ] = PAGE_READONLY_X,
> + [VM_SHARED | VM_EXEC | VM_WRITE]= PAGE_SHARED_X,
> + [VM_SHARED | VM_EXEC | VM_WRITE | VM_READ]  = PAGE_SHARED_X
> +};

That's nice, but it could apply to all powerpc platforms. Why restrict it
to book3s/64?

> +
>   pgprot_t vm_get_page_prot(unsigned long vm_flags)
>   {
>   unsigned long prot = pgprot_val(protection_map[vm_flags &
> diff --git a/arch/sparc/include/asm/pgtable_32.h b/arch/sparc/include/asm/pgtable_32.h
> index 4866625da314..bca98b280fdd 100644
> --- a/arch/sparc/include/asm/pgtable_32.h
> +++ b/arch/sparc/include/asm/pgtable_32.h
> @@ -65,6 +65,7 @@ void paging_init(void);
>   extern

[PATCH v2] powerpc/85xx: Fix reference leak in xes_mpc85xx_setup_arch

2022-06-03 Thread Miaoqian Lin
of_find_node_by_path() returns a remote device node pointer with its
refcount incremented; we should use of_node_put() on it when done.
Add the missing of_node_put() to avoid a refcount leak.

Fixes: 3038acf9091f ("powerpc/85xx: Add platform support for X-ES MPC85xx boards")
Signed-off-by: Miaoqian Lin 
---
changes in v2:
- update Fixes tag.
v1 Link: https://lore.kernel.org/all/20220603120907.1-1-linmq...@gmail.com
---
 arch/powerpc/platforms/85xx/xes_mpc85xx.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/85xx/xes_mpc85xx.c b/arch/powerpc/platforms/85xx/xes_mpc85xx.c
index 5836e4ecb7a0..93f67b430714 100644
--- a/arch/powerpc/platforms/85xx/xes_mpc85xx.c
+++ b/arch/powerpc/platforms/85xx/xes_mpc85xx.c
@@ -130,6 +130,8 @@ static void __init xes_mpc85xx_setup_arch(void)
mpc85xx_smp_init();
 
fsl_pci_assign_primary();
+
+   of_node_put(root);
 }
 
 machine_arch_initcall(xes_mpc8572, mpc85xx_common_publish_devices);
-- 
2.25.1



[PATCH] powerpc/spufs: Fix refcount leak in spufs_init_isolated_loader

2022-06-03 Thread Miaoqian Lin
of_find_node_by_path() returns a remote device node pointer with its
refcount incremented; we should use of_node_put() on it when done.
Add the missing of_node_put() to avoid a refcount leak.

Fixes: 0afacde3df4c ("[POWERPC] spufs: allow isolated mode apps by starting the SPE loader")
Signed-off-by: Miaoqian Lin 
---
 arch/powerpc/platforms/cell/spufs/inode.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/cell/spufs/inode.c b/arch/powerpc/platforms/cell/spufs/inode.c
index 34334c32b7f5..320008528edd 100644
--- a/arch/powerpc/platforms/cell/spufs/inode.c
+++ b/arch/powerpc/platforms/cell/spufs/inode.c
@@ -660,6 +660,7 @@ spufs_init_isolated_loader(void)
return;
 
loader = of_get_property(dn, "loader", &size);
+   of_node_put(dn);
if (!loader)
return;
 
-- 
2.25.1



[PATCH] powerpc/85xx: Fix reference leak in xes_mpc85xx_setup_arch

2022-06-03 Thread Miaoqian Lin
of_find_node_by_path() returns a remote device node pointer with its
refcount incremented; we should use of_node_put() on it when done.
Add the missing of_node_put() to avoid a refcount leak.

Fixes: format:3038acf9091f ("powerpc/85xx: Add platform support for X-ES MPC85xx boards")
Signed-off-by: Miaoqian Lin 
---
 arch/powerpc/platforms/85xx/xes_mpc85xx.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/85xx/xes_mpc85xx.c b/arch/powerpc/platforms/85xx/xes_mpc85xx.c
index 5836e4ecb7a0..93f67b430714 100644
--- a/arch/powerpc/platforms/85xx/xes_mpc85xx.c
+++ b/arch/powerpc/platforms/85xx/xes_mpc85xx.c
@@ -130,6 +130,8 @@ static void __init xes_mpc85xx_setup_arch(void)
mpc85xx_smp_init();
 
fsl_pci_assign_primary();
+
+   of_node_put(root);
 }
 
 machine_arch_initcall(xes_mpc8572, mpc85xx_common_publish_devices);
-- 
2.25.1



Re: [PATCH 06/10] scsi/ibmvscsi_tgt: Replace work tasklet with threaded irq

2022-06-03 Thread Sebastian Andrzej Siewior
On 2022-05-30 16:15:08 [-0700], Davidlohr Bueso wrote:
> diff --git a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
> index eee1a24f7e15..fafadb7158a3 100644
> --- a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
> +++ b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
> @@ -2948,9 +2948,8 @@ static irqreturn_t ibmvscsis_interrupt(int dummy, void *data)
>   struct scsi_info *vscsi = data;
>  
>   vio_disable_interrupts(vscsi->dma_dev);
> - tasklet_schedule(&vscsi->work_task);
looks good.

> - return IRQ_HANDLED;
> + return IRQ_WAKE_THREAD;
>  }
>  
>  /**
> @@ -3340,7 +3339,7 @@ static void ibmvscsis_handle_crq(unsigned long data)
>   dev_dbg(&vscsi->dev, "handle_crq, don't process: flags 0x%x, state 0x%hx\n",
>   vscsi->flags, vscsi->state);
>   spin_unlock_bh(&vscsi->intr_lock);

So if you move it away from the tasklet you can replace the spin_lock_bh()
with spin_lock(), since BH no longer matters. Except if there is a timer,
because it still matters for a timer_list timer, which is invoked in
softirq context. That isn't the case here, except that there is a hrtimer
invoking ibmvscsis_service_wait_q(). This is bad, because a hrtimer is
invoked in hard-irq context, so a BH lock must not be acquired there.
lockdep would scream here. You could use HRTIMER_MODE_REL_SOFT, which
would make it a BH timer. Then you could keep the BH locking, but
actually you want to get rid of it ;)
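
(Illustration of that option; a hypothetical sketch, the timer field name
is made up rather than taken from the driver:)

	/* Initialize as a "soft" hrtimer so the handler runs in softirq
	 * context, where taking BH spinlocks is still legal. */
	hrtimer_init(&vscsi->wait_q_timer, CLOCK_MONOTONIC,
		     HRTIMER_MODE_REL_SOFT);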

> - return;
> + goto done;
>   }
>  
>   rc = vscsi->flags & SCHEDULE_DISCONNECT;

Sebastian


Re: [PATCH 4/4] powerpc/pseries: Implement CONFIG_PARAVIRT_TIME_ACCOUNTING

2022-06-03 Thread shrikanth suresh hegde

On 5/18/22 7:09 PM, Nicholas Piggin wrote:

CONFIG_VIRT_CPU_ACCOUNTING_GEN under pseries does not implement
stolen time accounting. Implement it with the paravirt time
accounting option.

Signed-off-by: Nicholas Piggin 


Tested-by: Shrikanth Hegde 

The patch fails to compile with CONFIG_PARAVIRT=y, with the below error:

In file included from kernel/sched/core.c:81:
kernel/sched/sched.h:87:11: fatal error: asm/paravirt_api_clock.h: No
such file or directory
   87 | # include <asm/paravirt_api_clock.h>
compilation terminated.

After adding the file, it compiled. Please add the file as well; the patch
I applied is below.

diff --git a/arch/powerpc/include/asm/paravirt_api_clock.h
b/arch/powerpc/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/powerpc/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include <asm/paravirt.h>


After successful compilation, it was tested on a Power10 shared LPAR. The
system has two LPARs; we will call the first one LPAR1 and the second one
LPAR2. The test was carried out with SMT=1. Similar observations were seen
with SMT=8 as well.

The LPAR config header from each LPAR is below. LPAR1 is twice as big as
LPAR2. Since both share the same underlying hardware, work stealing will
happen when both LPARs are contending for the same resource.
LPAR1:
type=Shared mode=Uncapped smt=Off lcpu=40 mem=2094637056 kB cpus=40 ent=20.00
LPAR2:
type=Shared mode=Uncapped smt=Off lcpu=20 mem=2083908608 kB cpus=40 ent=10.00

mpstat was used to check the utilization, and stress-ng was used as the
workload. A few cases were tested. When both LPARs are idle there is no steal
time. When LPAR1 starts running at 100%, which consumes all of the physical
resources, steal time starts to get accounted. With LPAR1 running at 100% and
LPAR2 starting to run, steal time starts increasing. This is as expected. When
the LPAR2 load is increased further, steal time increases further.

Case 1: 0% LPAR1; 0% LPAR2
CPU    %usr    %nice   %sys   %iowait   %irq   %soft   %steal   %guest   %gnice   %idle
all     0.00    0.00   0.05      0.00   0.00    0.00     0.00     0.00     0.00   99.95

Case 2: 100% LPAR1; 0% LPAR2
CPU    %usr    %nice   %sys   %iowait   %irq   %soft   %steal   %guest   %gnice   %idle
all    97.68    0.00   0.00      0.00   0.00    0.00     2.32     0.00     0.00    0.00

Case 3: 100% LPAR1; 50% LPAR2
CPU    %usr    %nice   %sys   %iowait   %irq   %soft   %steal   %guest   %gnice   %idle
all    86.34    0.00   0.10      0.00   0.00    0.03    13.54     0.00     0.00    0.00

Case 4: 100% LPAR1; 100% LPAR2
CPU    %usr    %nice   %sys   %iowait   %irq   %soft   %steal   %guest   %gnice   %idle
all    78.54    0.00   0.07      0.00   0.00    0.02    21.36     0.00     0.00    0.00

Case 5: 50% LPAR1; 100% LPAR2
CPU    %usr    %nice   %sys   %iowait   %irq   %soft   %steal   %guest   %gnice   %idle
all    49.37    0.00   0.00      0.00   0.00    0.00     1.17     0.00     0.00   49.47

The patch accounts for the steal time, and the basic tests hold good.

-- Shrikanth Hegde


[PATCH 6/6] openrisc/mm: Enable ARCH_HAS_VM_GET_PAGE_PROT

2022-06-03 Thread Anshuman Khandual
This defines and exports a platform specific custom vm_get_page_prot() via
subscribing ARCH_HAS_VM_GET_PAGE_PROT. Subsequently all __SXXX and __PXXX
macros can be dropped which are no longer needed.

Cc: Jonas Bonn 
Cc: openr...@lists.librecores.org
Cc: linux-ker...@vger.kernel.org
Acked-by: Stafford Horne 
Signed-off-by: Anshuman Khandual 
---
 arch/openrisc/Kconfig   |  1 +
 arch/openrisc/include/asm/pgtable.h | 18 -
 arch/openrisc/mm/init.c | 41 +
 3 files changed, 42 insertions(+), 18 deletions(-)

diff --git a/arch/openrisc/Kconfig b/arch/openrisc/Kconfig
index e814df4c483c..fe0dfb50eb86 100644
--- a/arch/openrisc/Kconfig
+++ b/arch/openrisc/Kconfig
@@ -10,6 +10,7 @@ config OPENRISC
select ARCH_HAS_DMA_SET_UNCACHED
select ARCH_HAS_DMA_CLEAR_UNCACHED
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
+   select ARCH_HAS_VM_GET_PAGE_PROT
select COMMON_CLK
select OF
select OF_EARLY_FLATTREE
diff --git a/arch/openrisc/include/asm/pgtable.h b/arch/openrisc/include/asm/pgtable.h
index c3abbf71e09f..dcae8aea132f 100644
--- a/arch/openrisc/include/asm/pgtable.h
+++ b/arch/openrisc/include/asm/pgtable.h
@@ -176,24 +176,6 @@ extern void paging_init(void);
__pgprot(_PAGE_ALL | _PAGE_SRE | _PAGE_SWE \
 | _PAGE_SHARED | _PAGE_DIRTY | _PAGE_EXEC | _PAGE_CI)
 
-#define __P000 PAGE_NONE
-#define __P001 PAGE_READONLY_X
-#define __P010 PAGE_COPY
-#define __P011 PAGE_COPY_X
-#define __P100 PAGE_READONLY
-#define __P101 PAGE_READONLY_X
-#define __P110 PAGE_COPY
-#define __P111 PAGE_COPY_X
-
-#define __S000 PAGE_NONE
-#define __S001 PAGE_READONLY_X
-#define __S010 PAGE_SHARED
-#define __S011 PAGE_SHARED_X
-#define __S100 PAGE_READONLY
-#define __S101 PAGE_READONLY_X
-#define __S110 PAGE_SHARED
-#define __S111 PAGE_SHARED_X
-
 /* zero page used for uninitialized stuff */
 extern unsigned long empty_zero_page[2048];
 #define ZERO_PAGE(vaddr) (virt_to_page(empty_zero_page))
diff --git a/arch/openrisc/mm/init.c b/arch/openrisc/mm/init.c
index 3a021ab6f1ae..266dc68c32e5 100644
--- a/arch/openrisc/mm/init.c
+++ b/arch/openrisc/mm/init.c
@@ -208,3 +208,44 @@ void __init mem_init(void)
mem_init_done = 1;
return;
 }
+
+pgprot_t vm_get_page_prot(unsigned long vm_flags)
+{
+   switch (vm_flags & (VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)) {
+   case VM_NONE:
+   return PAGE_NONE;
+   case VM_READ:
+   return PAGE_READONLY_X;
+   case VM_WRITE:
+   return PAGE_COPY;
+   case VM_WRITE | VM_READ:
+   return PAGE_COPY_X;
+   case VM_EXEC:
+   return PAGE_READONLY;
+   case VM_EXEC | VM_READ:
+   return PAGE_READONLY_X;
+   case VM_EXEC | VM_WRITE:
+   return PAGE_COPY;
+   case VM_EXEC | VM_WRITE | VM_READ:
+   return PAGE_COPY_X;
+   case VM_SHARED:
+   return PAGE_NONE;
+   case VM_SHARED | VM_READ:
+   return PAGE_READONLY_X;
+   case VM_SHARED | VM_WRITE:
+   return PAGE_SHARED;
+   case VM_SHARED | VM_WRITE | VM_READ:
+   return PAGE_SHARED_X;
+   case VM_SHARED | VM_EXEC:
+   return PAGE_READONLY;
+   case VM_SHARED | VM_EXEC | VM_READ:
+   return PAGE_READONLY_X;
+   case VM_SHARED | VM_EXEC | VM_WRITE:
+   return PAGE_SHARED;
+   case VM_SHARED | VM_EXEC | VM_WRITE | VM_READ:
+   return PAGE_SHARED_X;
+   default:
+   BUILD_BUG();
+   }
+}
+EXPORT_SYMBOL(vm_get_page_prot);
-- 
2.25.1



[PATCH 5/6] nios2/mm: Enable ARCH_HAS_VM_GET_PAGE_PROT

2022-06-03 Thread Anshuman Khandual
This defines and exports a platform specific custom vm_get_page_prot() via
subscribing ARCH_HAS_VM_GET_PAGE_PROT. Subsequently all __SXXX and __PXXX
macros can be dropped which are no longer needed.

Cc: Dinh Nguyen 
Cc: linux-ker...@vger.kernel.org
Acked-by: Dinh Nguyen 
Signed-off-by: Anshuman Khandual 
---
 arch/nios2/Kconfig   |  1 +
 arch/nios2/include/asm/pgtable.h | 24 
 arch/nios2/mm/init.c | 47 
 3 files changed, 48 insertions(+), 24 deletions(-)

diff --git a/arch/nios2/Kconfig b/arch/nios2/Kconfig
index 4167f1eb4cd8..e0459dffd218 100644
--- a/arch/nios2/Kconfig
+++ b/arch/nios2/Kconfig
@@ -6,6 +6,7 @@ config NIOS2
select ARCH_HAS_SYNC_DMA_FOR_CPU
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
select ARCH_HAS_DMA_SET_UNCACHED
+   select ARCH_HAS_VM_GET_PAGE_PROT
select ARCH_NO_SWAP
select COMMON_CLK
select TIMER_OF
diff --git a/arch/nios2/include/asm/pgtable.h b/arch/nios2/include/asm/pgtable.h
index 262d0609268c..3c9f83c22733 100644
--- a/arch/nios2/include/asm/pgtable.h
+++ b/arch/nios2/include/asm/pgtable.h
@@ -34,30 +34,6 @@ struct mm_struct;
((x) ? _PAGE_EXEC : 0) |\
((r) ? _PAGE_READ : 0) |\
((w) ? _PAGE_WRITE : 0))
-/*
- * These are the macros that generic kernel code needs
- * (to populate protection_map[])
- */
-
-/* Remove W bit on private pages for COW support */
-#define __P000 MKP(0, 0, 0)
-#define __P001 MKP(0, 0, 1)
-#define __P010 MKP(0, 0, 0)/* COW */
-#define __P011 MKP(0, 0, 1)/* COW */
-#define __P100 MKP(1, 0, 0)
-#define __P101 MKP(1, 0, 1)
-#define __P110 MKP(1, 0, 0)/* COW */
-#define __P111 MKP(1, 0, 1)/* COW */
-
-/* Shared pages can have exact HW mapping */
-#define __S000 MKP(0, 0, 0)
-#define __S001 MKP(0, 0, 1)
-#define __S010 MKP(0, 1, 0)
-#define __S011 MKP(0, 1, 1)
-#define __S100 MKP(1, 0, 0)
-#define __S101 MKP(1, 0, 1)
-#define __S110 MKP(1, 1, 0)
-#define __S111 MKP(1, 1, 1)
 
 /* Used all over the kernel */
 #define PAGE_KERNEL __pgprot(_PAGE_PRESENT | _PAGE_CACHED | _PAGE_READ | \
diff --git a/arch/nios2/mm/init.c b/arch/nios2/mm/init.c
index 613fcaa5988a..e867f5d85580 100644
--- a/arch/nios2/mm/init.c
+++ b/arch/nios2/mm/init.c
@@ -124,3 +124,50 @@ const char *arch_vma_name(struct vm_area_struct *vma)
 {
return (vma->vm_start == KUSER_BASE) ? "[kuser]" : NULL;
 }
+
+pgprot_t vm_get_page_prot(unsigned long vm_flags)
+{
+   switch (vm_flags & (VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)) {
+   /* Remove W bit on private pages for COW support */
+   case VM_NONE:
+   return MKP(0, 0, 0);
+   case VM_READ:
+   return MKP(0, 0, 1);
+   /* COW */
+   case VM_WRITE:
+   return MKP(0, 0, 0);
+   /* COW */
+   case VM_WRITE | VM_READ:
+   return MKP(0, 0, 1);
+   case VM_EXEC:
+   return MKP(1, 0, 0);
+   case VM_EXEC | VM_READ:
+   return MKP(1, 0, 1);
+   /* COW */
+   case VM_EXEC | VM_WRITE:
+   return MKP(1, 0, 0);
+   /* COW */
+   case VM_EXEC | VM_WRITE | VM_READ:
+   return MKP(1, 0, 1);
+   /* Shared pages can have exact HW mapping */
+   case VM_SHARED:
+   return MKP(0, 0, 0);
+   case VM_SHARED | VM_READ:
+   return MKP(0, 0, 1);
+   case VM_SHARED | VM_WRITE:
+   return MKP(0, 1, 0);
+   case VM_SHARED | VM_WRITE | VM_READ:
+   return MKP(0, 1, 1);
+   case VM_SHARED | VM_EXEC:
+   return MKP(1, 0, 0);
+   case VM_SHARED | VM_EXEC | VM_READ:
+   return MKP(1, 0, 1);
+   case VM_SHARED | VM_EXEC | VM_WRITE:
+   return MKP(1, 1, 0);
+   case VM_SHARED | VM_EXEC | VM_WRITE | VM_READ:
+   return MKP(1, 1, 1);
+   default:
+   BUILD_BUG();
+   }
+}
+EXPORT_SYMBOL(vm_get_page_prot);
-- 
2.25.1



[PATCH 4/6] csky/mm: Enable ARCH_HAS_VM_GET_PAGE_PROT

2022-06-03 Thread Anshuman Khandual
This defines and exports a platform specific custom vm_get_page_prot() via
subscribing ARCH_HAS_VM_GET_PAGE_PROT. Subsequently all __SXXX and __PXXX
macros can be dropped which are no longer needed.

Cc: Geert Uytterhoeven 
Cc: linux-c...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Acked-by: Guo Ren 
Signed-off-by: Anshuman Khandual 
---
 arch/csky/Kconfig   |  1 +
 arch/csky/include/asm/pgtable.h | 18 --
 arch/csky/mm/init.c | 32 
 3 files changed, 33 insertions(+), 18 deletions(-)

diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig
index 21d72b078eef..588b8a9c68ed 100644
--- a/arch/csky/Kconfig
+++ b/arch/csky/Kconfig
@@ -6,6 +6,7 @@ config CSKY
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_SYNC_DMA_FOR_CPU
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
+   select ARCH_HAS_VM_GET_PAGE_PROT
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_WANT_FRAME_POINTERS if !CPU_CK610 && $(cc-option,-mbacktrace)
diff --git a/arch/csky/include/asm/pgtable.h b/arch/csky/include/asm/pgtable.h
index bbe24511..229a5f4ad7fc 100644
--- a/arch/csky/include/asm/pgtable.h
+++ b/arch/csky/include/asm/pgtable.h
@@ -77,24 +77,6 @@
 #define MAX_SWAPFILES_CHECK() \
BUILD_BUG_ON(MAX_SWAPFILES_SHIFT != 5)
 
-#define __P000 PAGE_NONE
-#define __P001 PAGE_READ
-#define __P010 PAGE_READ
-#define __P011 PAGE_READ
-#define __P100 PAGE_READ
-#define __P101 PAGE_READ
-#define __P110 PAGE_READ
-#define __P111 PAGE_READ
-
-#define __S000 PAGE_NONE
-#define __S001 PAGE_READ
-#define __S010 PAGE_WRITE
-#define __S011 PAGE_WRITE
-#define __S100 PAGE_READ
-#define __S101 PAGE_READ
-#define __S110 PAGE_WRITE
-#define __S111 PAGE_WRITE
-
 extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
 #define ZERO_PAGE(vaddr)   (virt_to_page(empty_zero_page))
 
diff --git a/arch/csky/mm/init.c b/arch/csky/mm/init.c
index bf2004aa811a..f9babbed17d4 100644
--- a/arch/csky/mm/init.c
+++ b/arch/csky/mm/init.c
@@ -197,3 +197,35 @@ void __init fixaddr_init(void)
vaddr = __fix_to_virt(__end_of_fixed_addresses - 1) & PMD_MASK;
fixrange_init(vaddr, vaddr + PMD_SIZE, swapper_pg_dir);
 }
+
+pgprot_t vm_get_page_prot(unsigned long vm_flags)
+{
+   switch (vm_flags & (VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)) {
+   case VM_NONE:
+   return PAGE_NONE;
+   case VM_READ:
+   case VM_WRITE:
+   case VM_WRITE | VM_READ:
+   case VM_EXEC:
+   case VM_EXEC | VM_READ:
+   case VM_EXEC | VM_WRITE:
+   case VM_EXEC | VM_WRITE | VM_READ:
+   return PAGE_READ;
+   case VM_SHARED:
+   return PAGE_NONE;
+   case VM_SHARED | VM_READ:
+   return PAGE_READ;
+   case VM_SHARED | VM_WRITE:
+   case VM_SHARED | VM_WRITE | VM_READ:
+   return PAGE_WRITE;
+   case VM_SHARED | VM_EXEC:
+   case VM_SHARED | VM_EXEC | VM_READ:
+   return PAGE_READ;
+   case VM_SHARED | VM_EXEC | VM_WRITE:
+   case VM_SHARED | VM_EXEC | VM_WRITE | VM_READ:
+   return PAGE_WRITE;
+   default:
+   BUILD_BUG();
+   }
+}
+EXPORT_SYMBOL(vm_get_page_prot);
-- 
2.25.1



[PATCH 3/6] mips/mm: Enable ARCH_HAS_VM_GET_PAGE_PROT

2022-06-03 Thread Anshuman Khandual
This defines and exports a platform specific custom vm_get_page_prot() via
subscribing ARCH_HAS_VM_GET_PAGE_PROT. Subsequently all __SXXX and __PXXX
macros can be dropped which are no longer needed.

Cc: Thomas Bogendoerfer 
Cc: linux-m...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Acked-by: Thomas Bogendoerfer 
Signed-off-by: Anshuman Khandual 
---
 arch/mips/Kconfig   |  1 +
 arch/mips/include/asm/pgtable.h | 22 
 arch/mips/mm/cache.c| 60 +++--
 3 files changed, 36 insertions(+), 47 deletions(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index db09d45d59ec..d0b7eb11ec81 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -14,6 +14,7 @@ config MIPS
select ARCH_HAS_STRNLEN_USER
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAS_UBSAN_SANITIZE_ALL
+   select ARCH_HAS_VM_GET_PAGE_PROT
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_KEEP_MEMBLOCK
select ARCH_SUPPORTS_UPROBES
diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
index 374c6322775d..6caec386ad2f 100644
--- a/arch/mips/include/asm/pgtable.h
+++ b/arch/mips/include/asm/pgtable.h
@@ -41,28 +41,6 @@ struct vm_area_struct;
  * by reasonable means..
  */
 
-/*
- * Dummy values to fill the table in mmap.c
- * The real values will be generated at runtime
- */
-#define __P000 __pgprot(0)
-#define __P001 __pgprot(0)
-#define __P010 __pgprot(0)
-#define __P011 __pgprot(0)
-#define __P100 __pgprot(0)
-#define __P101 __pgprot(0)
-#define __P110 __pgprot(0)
-#define __P111 __pgprot(0)
-
-#define __S000 __pgprot(0)
-#define __S001 __pgprot(0)
-#define __S010 __pgprot(0)
-#define __S011 __pgprot(0)
-#define __S100 __pgprot(0)
-#define __S101 __pgprot(0)
-#define __S110 __pgprot(0)
-#define __S111 __pgprot(0)
-
 extern unsigned long _page_cachable_default;
 extern void __update_cache(unsigned long address, pte_t pte);
 
diff --git a/arch/mips/mm/cache.c b/arch/mips/mm/cache.c
index 7be7240f7703..012862004431 100644
--- a/arch/mips/mm/cache.c
+++ b/arch/mips/mm/cache.c
@@ -159,30 +159,6 @@ EXPORT_SYMBOL(_page_cachable_default);
 
 #define PM(p)  __pgprot(_page_cachable_default | (p))
 
-static inline void setup_protection_map(void)
-{
-   protection_map[0]  = PM(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_NO_READ);
-   protection_map[1]  = PM(_PAGE_PRESENT | _PAGE_NO_EXEC);
-   protection_map[2]  = PM(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_NO_READ);
-   protection_map[3]  = PM(_PAGE_PRESENT | _PAGE_NO_EXEC);
-   protection_map[4]  = PM(_PAGE_PRESENT);
-   protection_map[5]  = PM(_PAGE_PRESENT);
-   protection_map[6]  = PM(_PAGE_PRESENT);
-   protection_map[7]  = PM(_PAGE_PRESENT);
-
-   protection_map[8]  = PM(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_NO_READ);
-   protection_map[9]  = PM(_PAGE_PRESENT | _PAGE_NO_EXEC);
-   protection_map[10] = PM(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_WRITE |
-   _PAGE_NO_READ);
-   protection_map[11] = PM(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_WRITE);
-   protection_map[12] = PM(_PAGE_PRESENT);
-   protection_map[13] = PM(_PAGE_PRESENT);
-   protection_map[14] = PM(_PAGE_PRESENT | _PAGE_WRITE);
-   protection_map[15] = PM(_PAGE_PRESENT | _PAGE_WRITE);
-}
-
-#undef PM
-
 void cpu_cache_init(void)
 {
if (cpu_has_3k_cache) {
@@ -201,6 +177,40 @@ void cpu_cache_init(void)
 
octeon_cache_init();
}
+}
 
-   setup_protection_map();
+pgprot_t vm_get_page_prot(unsigned long vm_flags)
+{
+   switch (vm_flags & (VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)) {
+   case VM_NONE:
+   return PM(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_NO_READ);
+   case VM_READ:
+   return PM(_PAGE_PRESENT | _PAGE_NO_EXEC);
+   case VM_WRITE:
+   return PM(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_NO_READ);
+   case VM_WRITE | VM_READ:
+   return PM(_PAGE_PRESENT | _PAGE_NO_EXEC);
+   case VM_EXEC:
+   case VM_EXEC | VM_READ:
+   case VM_EXEC | VM_WRITE:
+   case VM_EXEC | VM_WRITE | VM_READ:
+   return PM(_PAGE_PRESENT);
+   case VM_SHARED:
+   return PM(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_NO_READ);
+   case VM_SHARED | VM_READ:
+   return PM(_PAGE_PRESENT | _PAGE_NO_EXEC);
+   case VM_SHARED | VM_WRITE:
+   return PM(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_WRITE | _PAGE_NO_READ);
+   case VM_SHARED | VM_WRITE | VM_READ:
+   return PM(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_WRITE);
+   case VM_SHARED | VM_EXEC:
+   case VM_SHARED | VM_EXEC | VM_READ:
+   return PM(_PAGE_PRESENT);
+   case VM_SHARED | VM_EXEC | VM_WRITE:
+   case VM_SHARED | VM_EXEC | VM_WRITE | VM_READ:
+   return PM(_PAGE_PRESENT | _PAGE_WRITE);
+   default:
+   BUILD_BUG();
+   }
 }
+EXPORT_SYMBOL(vm_get_page_prot);

[PATCH 2/6] s390/mm: Enable ARCH_HAS_VM_GET_PAGE_PROT

2022-06-03 Thread Anshuman Khandual
This defines and exports a platform specific custom vm_get_page_prot() via
subscribing ARCH_HAS_VM_GET_PAGE_PROT. Subsequently all __SXXX and __PXXX
macros can be dropped which are no longer needed.

Cc: Heiko Carstens 
Cc: Vasily Gorbik 
Cc: linux-s...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Acked-by: Sven Schnelle 
Acked-by: Alexander Gordeev 
Signed-off-by: Anshuman Khandual 
---
 arch/s390/Kconfig   |  1 +
 arch/s390/include/asm/pgtable.h | 17 -
 arch/s390/mm/mmap.c | 33 +
 3 files changed, 34 insertions(+), 17 deletions(-)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index b17239ae7bd4..cdcf678deab1 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -81,6 +81,7 @@ config S390
select ARCH_HAS_SYSCALL_WRAPPER
select ARCH_HAS_UBSAN_SANITIZE_ALL
select ARCH_HAS_VDSO_DATA
+   select ARCH_HAS_VM_GET_PAGE_PROT
select ARCH_HAVE_NMI_SAFE_CMPXCHG
select ARCH_INLINE_READ_LOCK
select ARCH_INLINE_READ_LOCK_BH
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index a397b072a580..c63a05b5368a 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -424,23 +424,6 @@ static inline int is_module_addr(void *addr)
  * implies read permission.
  */
  /*xwr*/
-#define __P000 PAGE_NONE
-#define __P001 PAGE_RO
-#define __P010 PAGE_RO
-#define __P011 PAGE_RO
-#define __P100 PAGE_RX
-#define __P101 PAGE_RX
-#define __P110 PAGE_RX
-#define __P111 PAGE_RX
-
-#define __S000 PAGE_NONE
-#define __S001 PAGE_RO
-#define __S010 PAGE_RW
-#define __S011 PAGE_RW
-#define __S100 PAGE_RX
-#define __S101 PAGE_RX
-#define __S110 PAGE_RWX
-#define __S111 PAGE_RWX
 
 /*
  * Segment entry (large page) protection definitions.
diff --git a/arch/s390/mm/mmap.c b/arch/s390/mm/mmap.c
index d545f5c39f7e..11d75b8d5ec0 100644
--- a/arch/s390/mm/mmap.c
+++ b/arch/s390/mm/mmap.c
@@ -188,3 +188,36 @@ void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
mm->get_unmapped_area = arch_get_unmapped_area_topdown;
}
 }
+
+pgprot_t vm_get_page_prot(unsigned long vm_flags)
+{
+   switch (vm_flags & (VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)) {
+   case VM_NONE:
+   return PAGE_NONE;
+   case VM_READ:
+   case VM_WRITE:
+   case VM_WRITE | VM_READ:
+   return PAGE_RO;
+   case VM_EXEC:
+   case VM_EXEC | VM_READ:
+   case VM_EXEC | VM_WRITE:
+   case VM_EXEC | VM_WRITE | VM_READ:
+   return PAGE_RX;
+   case VM_SHARED:
+   return PAGE_NONE;
+   case VM_SHARED | VM_READ:
+   return PAGE_RO;
+   case VM_SHARED | VM_WRITE:
+   case VM_SHARED | VM_WRITE | VM_READ:
+   return PAGE_RW;
+   case VM_SHARED | VM_EXEC:
+   case VM_SHARED | VM_EXEC | VM_READ:
+   return PAGE_RX;
+   case VM_SHARED | VM_EXEC | VM_WRITE:
+   case VM_SHARED | VM_EXEC | VM_WRITE | VM_READ:
+   return PAGE_RWX;
+   default:
+   BUILD_BUG();
+   }
+}
+EXPORT_SYMBOL(vm_get_page_prot);
-- 
2.25.1



[PATCH 1/6] mm/mmap: Restrict generic protection_map[] array visibility

2022-06-03 Thread Anshuman Khandual
Restrict generic protection_map[] array visibility only to platforms which
do not enable ARCH_HAS_VM_GET_PAGE_PROT. Other platforms, which define
their own vm_get_page_prot() by enabling ARCH_HAS_VM_GET_PAGE_PROT, can
keep a private static protection_map[] still implementing an array look-up.
These private protection_map[] arrays can do without the __PXXX/__SXXX
macros, making those redundant, so they are dropped.

But platforms which do not define a custom vm_get_page_prot() enabling
ARCH_HAS_VM_GET_PAGE_PROT will still have to provide __PXXX/__SXXX macros.
Although this now provides a method for other willing platforms to drop the
__PXXX/__SXXX macros completely.

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: "David S. Miller" 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Andrew Morton 
Cc: x...@kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Anshuman Khandual 
---
 arch/arm64/include/asm/pgtable-prot.h | 18 --
 arch/arm64/mm/mmap.c  | 21 +
 arch/powerpc/include/asm/pgtable.h|  2 ++
 arch/powerpc/mm/book3s64/pgtable.c| 20 
 arch/sparc/include/asm/pgtable_32.h   |  2 ++
 arch/sparc/include/asm/pgtable_64.h   | 19 ---
 arch/sparc/mm/init_64.c   | 20 
 arch/x86/include/asm/pgtable_types.h  | 19 ---
 arch/x86/mm/pgprot.c  | 19 +++
 include/linux/mm.h|  2 ++
 mm/mmap.c |  2 +-
 11 files changed, 87 insertions(+), 57 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index 62e0ebeed720..9b165117a454 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -89,24 +89,6 @@ extern bool arm64_use_ng_mappings;
#define PAGE_READONLY_EXEC __pgprot(_PAGE_DEFAULT | PTE_USER | PTE_RDONLY | PTE_NG | PTE_PXN)
#define PAGE_EXECONLY  __pgprot(_PAGE_DEFAULT | PTE_RDONLY | PTE_NG | PTE_PXN)
 
-#define __P000  PAGE_NONE
-#define __P001  PAGE_READONLY
-#define __P010  PAGE_READONLY
-#define __P011  PAGE_READONLY
-#define __P100  PAGE_READONLY_EXEC /* PAGE_EXECONLY if Enhanced PAN */
-#define __P101  PAGE_READONLY_EXEC
-#define __P110  PAGE_READONLY_EXEC
-#define __P111  PAGE_READONLY_EXEC
-
-#define __S000  PAGE_NONE
-#define __S001  PAGE_READONLY
-#define __S010  PAGE_SHARED
-#define __S011  PAGE_SHARED
-#define __S100  PAGE_READONLY_EXEC /* PAGE_EXECONLY if Enhanced PAN */
-#define __S101  PAGE_READONLY_EXEC
-#define __S110  PAGE_SHARED_EXEC
-#define __S111  PAGE_SHARED_EXEC
-
 #endif /* __ASSEMBLY__ */
 
 #endif /* __ASM_PGTABLE_PROT_H */
diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c
index 78e9490f748d..8f5b7ce857ed 100644
--- a/arch/arm64/mm/mmap.c
+++ b/arch/arm64/mm/mmap.c
@@ -13,6 +13,27 @@
 #include 
 #include 
 
+static pgprot_t protection_map[16] __ro_after_init = {
+   [VM_NONE]   = PAGE_NONE,
+   [VM_READ]   = PAGE_READONLY,
+   [VM_WRITE]  = PAGE_READONLY,
+   [VM_WRITE | VM_READ]= PAGE_READONLY,
+   /* PAGE_EXECONLY if Enhanced PAN */
+   [VM_EXEC]   = PAGE_READONLY_EXEC,
+   [VM_EXEC | VM_READ] = PAGE_READONLY_EXEC,
+   [VM_EXEC | VM_WRITE]= PAGE_READONLY_EXEC,
+   [VM_EXEC | VM_WRITE | VM_READ]  = PAGE_READONLY_EXEC,
+   [VM_SHARED] = PAGE_NONE,
+   [VM_SHARED | VM_READ]   = PAGE_READONLY,
+   [VM_SHARED | VM_WRITE]  = PAGE_SHARED,
+   [VM_SHARED | VM_WRITE | VM_READ]= PAGE_SHARED,
+   /* PAGE_EXECONLY if Enhanced PAN */
+   [VM_SHARED | VM_EXEC]   = PAGE_READONLY_EXEC,
+   [VM_SHARED | VM_EXEC | VM_READ] = PAGE_READONLY_EXEC,
+   [VM_SHARED | VM_EXEC | VM_WRITE]= PAGE_SHARED_EXEC,
+   [VM_SHARED | VM_EXEC | VM_WRITE | VM_READ]  = PAGE_SHARED_EXEC
+};
+
 /*
  * You really shouldn't be using read() or write() on /dev/mem.  This might go
  * away in the future.
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index d564d0ecd4cd..8ed2a80c896e 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -21,6 +21,7 @@ struct mm_struct;
 #endif /* !CONFIG_PPC_BOOK3S */
 
 /* Note due to the way vm flags are laid out, the bits are XWR */
+#ifndef CONFIG_ARCH_HAS_VM_GET_PAGE_PROT
 #define __P000 PAGE_NONE
 #define __P001 PAGE_READONLY
 #define __P010 PAGE_

[PATCH 0/6] mm/mmap: Enable more platforms with ARCH_HAS_VM_GET_PAGE_PROT

2022-06-03 Thread Anshuman Khandual
From the last discussion [1], some more platforms (s390, mips, csky, nios2,
openrisc) were willing to enable ARCH_HAS_VM_GET_PAGE_PROT and to provide a
custom vm_get_page_prot() via a switch-case implementation, without any
objection. All those platform-specific patches have already been acked.

This series makes the protection_map[] array private on platforms which
define their own vm_get_page_prot() via ARCH_HAS_VM_GET_PAGE_PROT, and drops
their __PXXX/__SXXX macros. The new platforms enabled in this series can then
drop their __PXXX/__SXXX macros as well, since the generic protection_map[]
is no longer visible to them.
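
For illustration, the switch-case style referred to above looks roughly like
the sketch below. The case grouping mirrors the arm64 protection values from
patch 1/6 purely as an example; each platform's acked patch substitutes its
own PAGE_* definitions, so treat this as a pattern sketch rather than any
platform's actual code.

pgprot_t vm_get_page_prot(unsigned long vm_flags)
{
	/* Only these four VM_* bits select a protection */
	switch (vm_flags & (VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)) {
	case VM_NONE:
	case VM_SHARED:
		return PAGE_NONE;
	case VM_READ:
	case VM_WRITE:
	case VM_WRITE | VM_READ:
	case VM_SHARED | VM_READ:
		return PAGE_READONLY;
	case VM_SHARED | VM_WRITE:
	case VM_SHARED | VM_WRITE | VM_READ:
		return PAGE_SHARED;
	case VM_EXEC:
	case VM_EXEC | VM_READ:
	case VM_EXEC | VM_WRITE:
	case VM_EXEC | VM_WRITE | VM_READ:
	case VM_SHARED | VM_EXEC:
	case VM_SHARED | VM_EXEC | VM_READ:
		return PAGE_READONLY_EXEC;
	case VM_SHARED | VM_EXEC | VM_WRITE:
	case VM_SHARED | VM_EXEC | VM_WRITE | VM_READ:
		return PAGE_SHARED_EXEC;
	default:
		/* All 16 combinations of the mask are covered above */
		BUILD_BUG();
	}
}
EXPORT_SYMBOL(vm_get_page_prot);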

[1] 
https://lore.kernel.org/all/1646045273-9343-2-git-send-email-anshuman.khand...@arm.com/

This series applies on current mainline and has also been build-tested on
multiple platforms.

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: "David S. Miller" 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Heiko Carstens 
Cc: Vasily Gorbik 
Cc: Thomas Bogendoerfer 
Cc: Geert Uytterhoeven 
Cc: Dinh Nguyen 
Cc: Jonas Bonn 
Cc: Andrew Morton 
Cc: x...@kernel.org
Cc: openr...@lists.librecores.org
Cc: linux-c...@vger.kernel.org
Cc: linux-m...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-ker...@vger.kernel.org

Anshuman Khandual (6):
  mm/mmap: Restrict generic protection_map[] array visibility
  s390/mm: Enable ARCH_HAS_VM_GET_PAGE_PROT
  mips/mm: Enable ARCH_HAS_VM_GET_PAGE_PROT
  csky/mm: Enable ARCH_HAS_VM_GET_PAGE_PROT
  nios2/mm: Enable ARCH_HAS_VM_GET_PAGE_PROT
  openrisc/mm: Enable ARCH_HAS_VM_GET_PAGE_PROT

 arch/arm64/include/asm/pgtable-prot.h | 18 
 arch/arm64/mm/mmap.c  | 21 ++
 arch/csky/Kconfig |  1 +
 arch/csky/include/asm/pgtable.h   | 18 
 arch/csky/mm/init.c   | 32 ++
 arch/mips/Kconfig |  1 +
 arch/mips/include/asm/pgtable.h   | 22 --
 arch/mips/mm/cache.c  | 60 ---
 arch/nios2/Kconfig|  1 +
 arch/nios2/include/asm/pgtable.h  | 24 ---
 arch/nios2/mm/init.c  | 47 +
 arch/openrisc/Kconfig |  1 +
 arch/openrisc/include/asm/pgtable.h   | 18 
 arch/openrisc/mm/init.c   | 41 ++
 arch/powerpc/include/asm/pgtable.h|  2 +
 arch/powerpc/mm/book3s64/pgtable.c| 20 +
 arch/s390/Kconfig |  1 +
 arch/s390/include/asm/pgtable.h   | 17 
 arch/s390/mm/mmap.c   | 33 +++
 arch/sparc/include/asm/pgtable_32.h   |  2 +
 arch/sparc/include/asm/pgtable_64.h   | 19 -
 arch/sparc/mm/init_64.c   | 20 +
 arch/x86/include/asm/pgtable_types.h  | 19 -
 arch/x86/mm/pgprot.c  | 19 +
 include/linux/mm.h|  2 +
 mm/mmap.c |  2 +-
 26 files changed, 280 insertions(+), 181 deletions(-)

-- 
2.25.1



Re: [PATCH 2/6] powerpc: Provide syscall wrapper

2022-06-03 Thread Arnd Bergmann
On Wed, Jun 1, 2022 at 7:48 AM Rohan McLure  wrote:
>
> Syscall wrapper implemented as per s390, x86, arm64, providing the
> option for gprs to be cleared on entry to the kernel, reducing caller
> influence on speculation within the syscall routine. The wrapper
> is a macro that emits syscall handler implementations with parameters
> passed by stack pointer.
>
> For platforms supporting this syscall wrapper, emit symbols with usual
> in-register parameters (`sys...`) to support calls to syscall handlers
> from within the kernel.

Nice work!

> Syscalls are wrapped on all platforms except the Cell processor. SPUs require
> access to syscall prototypes, which are omitted when ARCH_HAS_SYSCALL_WRAPPER
> is enabled.

Right, I think it's ok to leave out the SPU side. In the long run, I
would like to
go back to requiring the prototypes for everything on all architectures, to
enforce type checking, but that's a separate piece of work.

> +/*
> + * For PowerPC specific syscall implementations, wrapper takes exact name and
> + * return type for a given function.
> + */
> +
> +#ifdef CONFIG_ARCH_HAS_SYSCALL_WRAPPER
> +#define PPC_SYSCALL_DEFINE(x, type, name, ...)				\
> +	asmlinkage type __powerpc_##name(const struct pt_regs *regs);	\
> +	ALLOW_ERROR_INJECTION(__powerpc_##name, ERRNO);			\
> +	type sys_##name(__MAP(x,__SC_DECL,__VA_ARGS__));		\
> +	static type __se_##name(__MAP(x,__SC_LONG,__VA_ARGS__));	\
> +	static inline type __do_##name(__MAP(x,__SC_DECL,__VA_ARGS__));	\

What is the benefit of having a separate set of macros for this? I think that
adds more complexity than it saves in the end.

> @@ -68,52 +69,63 @@ unsigned long compat_sys_mmap2(unsigned long addr, size_t len,
>  #define merge_64(high, low) ((u64)high << 32) | low
>  #endif
>
> -compat_ssize_t compat_sys_pread64(unsigned int fd, char __user *ubuf, compat_size_t count,
> -				  u32 reg6, u32 pos1, u32 pos2)
> +PPC_SYSCALL_DEFINE(6, compat_ssize_t, compat_sys_pread64,
> +		   unsigned int, fd,
> +		   char __user *, ubuf, compat_size_t, count,
> +		   u32, reg6, u32, pos1, u32, pos2)
>  {
>  	return ksys_pread64(fd, ubuf, count, merge_64(pos1, pos2));
>  }

We now have generalized versions of most of these system calls, as of 5.19-rc1
with the addition of the riscv compat support. I think it would be best to try
removing the powerpc private versions wherever possible and use the common
version, modifying it further where necessary.

If this doesn't work for some of the syscalls, can you use the
COMPAT_SYSCALL_DEFINE for those in place of PPC_SYSCALL_DEFINE?
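
As a concrete sketch of that suggestion, the quoted compat_sys_pread64 hunk
would then look something like this, assuming merge_64() and ksys_pread64()
stay as they are today:

COMPAT_SYSCALL_DEFINE6(pread64, unsigned int, fd, char __user *, ubuf,
		       compat_size_t, count, u32, reg6, u32, pos1, u32, pos2)
{
	/* reg6 only pads the 64-bit register pair; pos1/pos2 carry the offset */
	return ksys_pread64(fd, ubuf, count, merge_64(pos1, pos2));
}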

 Arnd


Re: [PATCH 0/2] Disabling NMI watchdog during LPM's memory transfer

2022-06-03 Thread Laurent Dufour
On 02/06/2022, 19:58:31, Nathan Lynch wrote:
> Laurent Dufour  writes:
>> When a partition is transferred, once it arrives at the destination node,
>> the partition is active but much of its memory must be transferred from the
>> start node.
>>
>> It depends on the activity in the partition, but the more CPUs the partition
>> has, the more memory is likely to need transferring. This causes latency
>> when accessing pages that need to be transferred, and often, for large
>> partitions, it triggers the NMI watchdog.
> 
> It also triggers warnings from other watchdogs and subsystems that
> have soft latency requirements - softlockup, RCU, workqueue. The issue
> is more general than the NMI watchdog.

I agree, but, as you can read in the title, this series focuses on the
NMI watchdog, which may have dangerous side effects.

>> The NMI watchdog causes the CPU stack to dump where it appears to be
>> stuck. In this case, it does not bring much information since it can happen
>> during any memory access of the kernel.
> 
> When the site of a watchdog backtrace shows a thread stuck on a routine
> memory access as opposed to something like a lock acquisition, that is
> actually useful information that shouldn't be discarded. It tells us the
> platform is failing to adequately virtualize partition memory. This
> isn't a benign situation and it's likely to unacceptably affect real
> workloads. The kernel is ideally situated to detect and warn about this.
> 

I agree, but the information provided is misleading most of the time,
pointing at various parts of the kernel where the last page fault of a
kernel-generated series happened. There is no real added value, since it is
well known that the memory transfer introduces latency that is detected by
the kernel. Furthermore, soft lockups are still triggered and report this
latency as well, without any side effect.

>> In addition, the NMI interrupt mechanism is not safe and can generate a
>> system dump in the event that the interrupt is taken while
>> MSR[RI]=0.
> 
> This sounds like a general problem with that facility that isn't
> specific to partition migration? Maybe it should be disabled altogether
> until that can be fixed?

We already discussed that with Nick, and it sounds like it is not so easy to
fix. Furthermore, the NMI watchdog is considered a last resort for analyzing
a potentially dying system, so taking the risk of generating a crash because
of the NMI interrupt looks acceptable there. But disabling it entirely
because of that is not the right option.

In the LPM case, the system is only suffering from the LPM's latency; it is
not really dying or in a really bad shape, so that risk is too expensive.

Fixing the latency at the source is definitely the best option, and the
PHYP team is already investigating that. But, in the meantime, there is a
way to prevent the system from dying because of that side effect: disabling
the NMI watchdog during the memory transfer.
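
To illustrate the intended workaround, a minimal sketch follows. It assumes
the watchdog_nmi_stop()/watchdog_nmi_start() hooks that powerpc already
implements are acceptable call sites, and do_migration() is a hypothetical
stand-in for the actual suspend/resume sequence; the real patches may fence
a different region and detect the end of the transfer differently.

#include <linux/nmi.h>

static int pseries_migrate_partition(u64 handle)
{
	int ret;

	watchdog_nmi_stop();		/* quiesce the hard lockup watchdog */
	ret = do_migration(handle);	/* hypothetical: suspend + memory pull */
	watchdog_nmi_start();		/* re-arm once the transfer has settled */

	return ret;
}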

> 
>> Given how often hard lockups are detected when transferring large
>> partitions, it seems best to disable the NMI watchdog until the memory
>> transfer from the start node is complete.
> 
> At this time, I'm far from convinced. Disabling the watchdog is going to
> make the underlying problems in the platform and/or network harder to
> understand.

I was also reluctant, and would like the NMI watchdog to remain active
during LPM. But there is currently no other way to work around the LPM's
latency and its potential risk of system crash.

I've spent a lot of time analyzing many crashes happening during LPM, and
all of them now point to the NMI watchdog issue. Furthermore, on a system
with thousands of CPUs, I saw a system crash because a CPU was not able to
respond in time (1s) to the NMI interrupt, which generated the panic.

In addition, we now know that an RTAS call, made right after the system is
running again on the arrival side, takes ages and most of the time triggers
the NMI watchdog.

There are ongoing investigations to clarify where and how this latency
happens. I'm not excluding other issues in the Linux kernel, but right now,
this looks to be the best option to prevent system crashes during LPM.

I'm hoping that the PHYP team will be able to improve that latency. At that
point, this commit can be reverted, but until then, I don't see how we can
do without this workaround.

Laurent.


Re: [PATCH 2/6] powerpc: Provide syscall wrapper

2022-06-03 Thread Christophe Leroy


On 03/06/2022 at 09:09, Andrew Donnellan wrote:
> On Fri, 2022-06-03 at 13:24 +1000, Rohan McLure wrote:
>> The implementation of ppc_personality can be immediately reworked to
>> call ksys_personality, but I can’t do the same for sys_old_select for
>> example, which is required to implement ppc_select. As such we emit
>> both
> 
> For ppc_select, I suggest we resurrect
> https://lore.kernel.org/lkml/5811950d-ef14-d416-35e6-d694ef920...@csgroup.eu/T/#u
> and just get rid of the hack.
> 

Not sure I understand, what would you like to resurrect? Do you want to
resurrect the discussion, or revert commit fd69d544b0e7
("powerpc/syscalls: Use sys_old_select() in ppc_select()")?

We are talking about a 25-year-old hack.

However, calling sys_old_select() from ppc_select() is just a tail call,
so I can't see why it would require duplicating the pt_regs.

c0004ddc <ppc_select>:
c0004ddc:   28 03 0f ff     cmplwi  r3,4095
c0004de0:   41 81 00 08     bgt     c0004de8 <ppc_select+0xc>
c0004de4:   48 15 b9 98     b       c016077c <__se_sys_select>
c0004de8:   48 15 bc c0     b       c0160aa8 <__se_sys_old_select>
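
For reference, that object code corresponds to roughly the following C shape
(reconstructed from the disassembly above, so treat the exact signature as
approximate):

int ppc_select(int n, fd_set __user *inp, fd_set __user *outp,
	       fd_set __user *exp, struct __kernel_old_timeval __user *tvp)
{
	/* Old binaries pass a pointer to the argument block in r3 (>= 4096) */
	if ((unsigned long)n >= 4096)
		return sys_old_select((void __user *)n);

	return sys_select(n, inp, outp, exp, tvp);
}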

Christophe

Re: [PATCH 2/6] powerpc: Provide syscall wrapper

2022-06-03 Thread Andrew Donnellan
On Fri, 2022-06-03 at 13:24 +1000, Rohan McLure wrote:
> The implementation of ppc_personality can be immediately reworked to
> call ksys_personality, but I can’t do the same for sys_old_select for
> example, which is required to implement ppc_select. As such we emit
> both 

For ppc_select, I suggest we resurrect
https://lore.kernel.org/lkml/5811950d-ef14-d416-35e6-d694ef920...@csgroup.eu/T/#u
and just get rid of the hack.


Andrew

-- 
Andrew DonnellanOzLabs, ADL Canberra
a...@linux.ibm.com   IBM Australia Limited