ppc64le bzImage and build-id ELF note
Hi Michael,

I am working on two packaging issues with Fedora and CKI that I am hoping you can give me some guidance on.

1 - Fedora has always packaged an eu-strip'd vmlinux file for powerpc. The other arches we support use native compressed images. I was looking into using powerpc's zImage (pseries) binary to remove the powerpc workarounds in our rpm spec file. However, the rpmbuild fails because it can't find a build-id with 'eu-readelf -n zImage'. Sure enough, the build-id is found in vmlinux and vmlinux.stripped but disappears with vmlinux.stripped.gz. I had hoped arch/powerpc/boot/addnote would stick it back in, but it doesn't (I am ignorant of how addnote works).

eu-readelf -n data:

vmlinux:
Displaying notes found in: .notes
  Owner     Data size  Description
  GNU       0x0014     NT_GNU_BUILD_ID (unique build ID bitstring)
    Build ID: b4c026d72ead7b4316a221cddb7f2b10d75fb313
  Linux     0x0004     func
    description data: 00 00 00 00
  Linux     0x0001     OPEN
    description data: 00
  PowerPC   0x0004     NT_VERSION (version)
    description data: 01 00 00 00

zImage:
Displaying notes found at file offset 0x0158 with length 0x002c:
  Owner     Data size  Description
  PowerPC   0x0018     Unknown note type: (0x1275)
    description data: ff ff ff ff 02 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff 00 00 40 00
Displaying notes found at file offset 0x0184 with length 0x0044:
  Owner                 Data size  Description
  IBM,RPA-Client-[...]  0x0020     Unknown note type: (0x1275)
    description data: 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 28 00 00 00 01 ff ff ff ff 00 00 00 00 00 00 00 01

Is this something that can be addressed? Or should I/we expect the build-id to never make it into the zImage and just continue with our current vmlinux process?

2 - CKI builds kernels using 'make targz-pkg'. The arches we support (x86_64, s390, aarch64) provide compressed binaries to package using KBUILD_IMAGE or a specific entry in scripts/package/buildtar. Because powerpc doesn't have a KBUILD_IMAGE variable defined, the script builds vmlinux and cp's that to vmlinux-kbuild. The problem for powerpc is that our vmlinux is huge (>256MB), so cp'ing it to vmlinux-kbuild occupies >512MB of /boot, and our distro runs out of disk space on that partition. I was hoping to add a patch to arch/powerpc/Makefile that defines KBUILD_IMAGE := $(boot)/zImage (mimicking arch/s390), which I believe would solve our problem. However, that circles back to our first problem above.

Thoughts? Help?

Cheers,
Don
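P.S. For concreteness, the arch/powerpc/Makefile change I have in mind is roughly the following untested sketch (modelled on arch/s390/Makefile; $(boot) is the existing boot directory variable, and whether zImage is the right image for every powerpc config is part of my question):

    # untested sketch: let 'make targz-pkg' package the compressed image
    # instead of copying the huge vmlinux into /boot
    KBUILD_IMAGE := $(boot)/zImage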
Re: [PATCH] PCI/AER: Iterate over error counters instead of error strings
On Fri, Jun 03, 2022 at 10:12:47PM +, Mohamed Khalfella wrote:
> Is there any chance for this to land in 5.19?

Too late for v5.19, since the merge window will end in a couple days. Remind me again if you don't see it in -next by v5.20-rc5 or so.

> On 5/10/22 14:17, Mohamed Khalfella wrote:
> > > Thanks for catching this; it definitely looks like a real issue! I
> > > guess you're probably seeing junk in the sysfs files?
> >
> > That is correct. The initial report was seeing junk when reading sysfs
> > files. As described, this is happening because we are reading data past
> > the end of the stats counters array.
> >
> > > I think maybe we should populate the currently NULL entries in the
> > > string[] arrays and simplify the code here, e.g.,
> > >
> > >   static const char *aer_correctable_error_string[] = {
> > >           "RxErr",        /* Bit Position 0 */
> > >           "dev_cor_errs_bit[1]",
> > >           ...
> > >
> > >   if (stats[i])
> > >           len += sysfs_emit_at(buf, len, "%s %llu\n",
> > >                                strings_array[i], stats[i]);
> >
> > Doing it this way will change the output format. In this case we will
> > show stats only if their value is greater than zero. The current code
> > shows all the stats that have names (regardless of their value) plus
> > those that have non-zero values.
> >
> > >> @@ -1342,6 +1342,11 @@ static int aer_probe(struct pcie_device *dev)
> > >>    struct device *device = &dev->device;
> > >>    struct pci_dev *port = dev->port;
> > >>
> > >> +  BUILD_BUG_ON(ARRAY_SIZE(aer_correctable_error_string) <
> > >> +               AER_MAX_TYPEOF_COR_ERRS);
> > >> +  BUILD_BUG_ON(ARRAY_SIZE(aer_uncorrectable_error_string) <
> > >> +               AER_MAX_TYPEOF_UNCOR_ERRS);
> > >
> > > And make these check for "!=" instead of "<".
>
> I am happy to remove these BUILD_BUG_ON() if you think it is a good
> idea to do so.

I think it's good to enforce correctness there somehow, so let's leave them there unless somebody has a better idea.

> > This will require unnecessarily extending the stats arrays to have 32
> > entries in order to match the names arrays. If you don't feel strongly
> > about changing "<" to "!=", I prefer to keep the code as it is.
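FWIW, I don't think populating the arrays has to change the output format; the show loop can keep the current behavior with something like this (rough sketch, not the actual code in drivers/pci/pcie/aer.c; "strings_array"/"stats"/"array_size" stand for whichever pair the show routine uses):

	for (i = 0; i < array_size; i++) {
		if (strings_array[i])
			/* named counter: print even when its value is zero */
			len += sysfs_emit_at(buf, len, "%s %llu\n",
					     strings_array[i], stats[i]);
		else if (stats[i])
			/* unnamed bit: print only when non-zero */
			len += sysfs_emit_at(buf, len, "Bit[%d] %llu\n",
					     i, stats[i]);
	}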
[PATCH 00/14] mtd: Fix platform remove callbacks to always return 0
Hello,

this series prepares to make platform remove callbacks return void. Therefore first update them to always return 0. The rationale is that the Linux device model doesn't handle failures on remove: if a remove callback returns an error, the core just emits a quite generic error message and still removes the device.

Best regards
Uwe

Uwe Kleine-König (14):
  mtd: hyperbus: Make hyperbus_unregister_device() return void
  mtd: spi-nor: aspeed-smc: Make aspeed_smc_unregister() return void
  mtd: powernv_flash: Warn about failure to unregister mtd device
  mtd: st-spi_fsm: Warn about failure to unregister mtd device
  mtd: lpddr2_nvm: Warn about failure to unregister mtd device
  mtd: spear_smi: Don't skip cleanup after mtd_device_unregister() failed
  mtd: spear_smi: Drop if with an always false condition
  mtd: rawnand: atmel: Warn about failure to unregister mtd device
  mtd: rawnand: omap2: Suppress error message after WARN in .remove()
  mtd: rawnand: tegra: Don't skip cleanup after mtd_device_unregister() failed
  mtd: rawnand: meson: Don't skip cleanup after mtd_device_unregister() failed
  mtd: rawnand: meson: Drop cleaning platform data in .remove()
  mtd: physmap: Don't skip cleanup after mtd_device_unregister() failed
  mtd: physmap: Drop if with an always false condition

 drivers/mtd/devices/powernv_flash.c          |  4 +++-
 drivers/mtd/devices/spear_smi.c              | 10 ++
 drivers/mtd/devices/st_spi_fsm.c             |  4 +++-
 drivers/mtd/hyperbus/hbmc-am654.c            |  6 +++---
 drivers/mtd/hyperbus/hyperbus-core.c         |  8 ++--
 drivers/mtd/hyperbus/rpc-if.c                |  5 +++--
 drivers/mtd/lpddr/lpddr2_nvm.c               |  4 +++-
 drivers/mtd/maps/physmap-core.c              | 13 +++--
 drivers/mtd/nand/raw/atmel/nand-controller.c |  5 -
 drivers/mtd/nand/raw/meson_nand.c            | 16 +++-
 drivers/mtd/nand/raw/omap2.c                 |  6 ++
 drivers/mtd/nand/raw/tegra_nand.c            |  5 +
 drivers/mtd/spi-nor/controllers/aspeed-smc.c |  8
 include/linux/mtd/hyperbus.h                 |  4 +---
 14 files changed, 37 insertions(+), 61 deletions(-)

base-commit: 4b0986a3613c92f4ec1bdc7f60ec66fea135991f
-- 
2.36.1
[PATCH 03/14] mtd: powernv_flash: Warn about failure to unregister mtd device
mtd_device_unregister() shouldn't fail. Wail loudly if it does anyhow. This matches how other drivers (e.g. nand/raw/nandsim.c) use mtd_device_unregister().

By returning 0 in the platform remove callback a generic error message by the device core is suppressed, nothing else changes.

This is a preparation for making platform remove callbacks return void.

Signed-off-by: Uwe Kleine-König
---
 drivers/mtd/devices/powernv_flash.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/mtd/devices/powernv_flash.c b/drivers/mtd/devices/powernv_flash.c
index 6950a8764815..36e060386e59 100644
--- a/drivers/mtd/devices/powernv_flash.c
+++ b/drivers/mtd/devices/powernv_flash.c
@@ -270,7 +270,9 @@ static int powernv_flash_release(struct platform_device *pdev)
 	struct powernv_flash *data = dev_get_drvdata(&(pdev->dev));
 
 	/* All resources should be freed automatically */
-	return mtd_device_unregister(&(data->mtd));
+	WARN_ON(mtd_device_unregister(&data->mtd));
+
+	return 0;
 }
 
 static const struct of_device_id powernv_flash_match[] = {
-- 
2.36.1
Re: [PATCH v2 1/4] of: constify of_property_check_flags() prop argument
On Wed, 01 Jun 2022 10:17:58 +0200, Clément Léger wrote:
> This argument is not modified and thus can be set as const.
>
> Signed-off-by: Clément Léger
> ---
>  include/linux/of.h | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>

As this looks independent of the rest of the series, applied, thanks!
Re: [PATCH v2 4/4] powerpc/pseries: use of_property_alloc/free() and of_node_alloc()
On Wed, Jun 01, 2022 at 10:18:01AM +0200, Clément Léger wrote:
> Use of_property_alloc/free() and of_node_alloc() to create and free
> device-tree nodes and properties.
>
> Signed-off-by: Clément Léger
> ---
>  arch/powerpc/platforms/pseries/dlpar.c        | 51 +++
>  .../platforms/pseries/hotplug-memory.c        | 21 +---
>  arch/powerpc/platforms/pseries/reconfig.c     | 45 +---
>  3 files changed, 21 insertions(+), 96 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
> index 498d6efcb5ae..5a04566e98a4 100644
> --- a/arch/powerpc/platforms/pseries/dlpar.c
> +++ b/arch/powerpc/platforms/pseries/dlpar.c
> @@ -38,61 +38,25 @@ struct cc_workarea {
>  	__be32	prop_offset;
>  };
>
> -void dlpar_free_cc_property(struct property *prop)
> -{
> -	kfree(prop->name);
> -	kfree(prop->value);
> -	kfree(prop);
> -}
> -
>  static struct property *dlpar_parse_cc_property(struct cc_workarea *ccwa)
>  {
> -	struct property *prop;
> -	char *name;
> -	char *value;
> -
> -	prop = kzalloc(sizeof(*prop), GFP_KERNEL);
> -	if (!prop)
> -		return NULL;
> +	int length;
> +	char *name, *value;
>
>  	name = (char *)ccwa + be32_to_cpu(ccwa->name_offset);
> -	prop->name = kstrdup(name, GFP_KERNEL);
> -	if (!prop->name) {
> -		dlpar_free_cc_property(prop);
> -		return NULL;
> -	}
> -
> -	prop->length = be32_to_cpu(ccwa->prop_length);
> +	length = be32_to_cpu(ccwa->prop_length);
>  	value = (char *)ccwa + be32_to_cpu(ccwa->prop_offset);
> -	prop->value = kmemdup(value, prop->length, GFP_KERNEL);
> -	if (!prop->value) {
> -		dlpar_free_cc_property(prop);
> -		return NULL;
> -	}
>
> -	return prop;
> +	return of_property_alloc(name, value, length, GFP_KERNEL);
>  }
>
>  static struct device_node *dlpar_parse_cc_node(struct cc_workarea *ccwa)
>  {
> -	struct device_node *dn;
>  	const char *name;
>
> -	dn = kzalloc(sizeof(*dn), GFP_KERNEL);
> -	if (!dn)
> -		return NULL;
> -
>  	name = (const char *)ccwa + be32_to_cpu(ccwa->name_offset);
> -	dn->full_name = kstrdup(name, GFP_KERNEL);
> -	if (!dn->full_name) {
> -		kfree(dn);
> -		return NULL;
> -	}
>
> -	of_node_set_flag(dn, OF_DYNAMIC);
> -	of_node_init(dn);
> -
> -	return dn;
> +	return of_node_alloc(name, GFP_KERNEL);

Do you have any need for different flags? I can't really see a need for atomic or dma allocs or ???, so drop it I think.

>  }
>
>  static void dlpar_free_one_cc_node(struct device_node *dn)
> @@ -102,11 +66,10 @@ static void dlpar_free_one_cc_node(struct device_node *dn)
>  	while (dn->properties) {
>  		prop = dn->properties;
>  		dn->properties = prop->next;
> -		dlpar_free_cc_property(prop);
> +		of_property_free(prop);

We should be able to just put the node and all the properties already attached will be freed. Looking at the history of this code, it originally did the kref_init much later in dlpar_attach_node(). So there was a window of allocating the node and adding properties where you'd need to manually free everything. Now that the node is referenced from the start, a put should free everything.
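I.e. something like this (untested sketch):

static void dlpar_free_one_cc_node(struct device_node *dn)
{
	/*
	 * The node came from of_node_alloc() and is refcounted from the
	 * start; dropping the last reference should release the node
	 * together with any properties still attached to it.
	 */
	of_node_put(dn);
}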
>  	}
>
> -	kfree(dn->full_name);
> -	kfree(dn);
> +	of_node_put(dn);
>  }
>
>  void dlpar_free_cc_nodes(struct device_node *dn)
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index 2e3a317722a8..2ddf2a0ba048 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -69,33 +69,16 @@ unsigned long pseries_memory_block_size(void)
>  	return memblock_size;
>  }
>
> -static void dlpar_free_property(struct property *prop)
> -{
> -	kfree(prop->name);
> -	kfree(prop->value);
> -	kfree(prop);
> -}
> -
>  static struct property *dlpar_clone_property(struct property *prop,
>  					     u32 prop_size)
>  {
> -	struct property *new_prop;
> -
> -	new_prop = kzalloc(sizeof(*new_prop), GFP_KERNEL);
> +	struct property *new_prop = of_property_alloc(prop->name, NULL,
> +						      prop_size, GFP_KERNEL);
>  	if (!new_prop)
>  		return NULL;
>
> -	new_prop->name = kstrdup(prop->name, GFP_KERNEL);
> -	new_prop->value = kzalloc(prop_size, GFP_KERNEL);
> -	if (!new_prop->name || !new_prop->value) {
> -		dlpar_free_property(new_prop);
> -		return NULL;
> -	}
> -
>  	memcpy(new_prop->value, prop->value, prop->length);
> -	new_prop->length = prop_size;
>
> -	of_property_set_flag(new_prop, OF_DYNAMIC);
>  	return new_prop;
>  }
>
> diff --git a/arch/powerpc/platforms/pseries/r
[PATCH v2] powerpc/irq: Increase stack_overflow detection limit when KASAN is enabled
When KASAN is enabled, as shown by the Oops below, the 2k limit is not enough to allow a stack dump after a stack overflow detection when CONFIG_DEBUG_STACKOVERFLOW is selected:

  do_IRQ: stack overflow: 1984
  CPU: 0 PID: 126 Comm: systemd-udevd Not tainted 5.18.0-gentoo-PMacG4 #1
  Call Trace:
  Oops: Kernel stack overflow, sig: 11 [#1]
  BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac
  Modules linked in: sr_mod cdrom radeon(+) ohci_pci(+) hwmon i2c_algo_bit drm_ttm_helper ttm drm_dp_helper snd_aoa_i2sbus snd_aoa_soundbus snd_pcm ehci_pci snd_timer ohci_hcd snd ssb ehci_hcd 8250_pci soundcore drm_kms_helper pcmcia 8250 pcmcia_core syscopyarea usbcore sysfillrect 8250_base sysimgblt serial_mctrl_gpio fb_sys_fops usb_common pkcs8_key_parser fuse drm drm_panel_orientation_quirks configfs
  CPU: 0 PID: 126 Comm: systemd-udevd Not tainted 5.18.0-gentoo-PMacG4 #1
  NIP: c02e5558 LR: c07eb3bc CTR: c07f46a8
  REGS: e7fe9f50 TRAP: Not tainted (5.18.0-gentoo-PMacG4)
  MSR: 1032 CR: 44a14824 XER: 2000
  GPR00: c07eb3bc eaa1c000 c26baea0 eaa1c0a0 0008 c07eb3bc eaa1c010
  GPR08: eaa1c0a8 04f3f3f3 f1f1f1f1 c07f4c84 44a14824 0080f7e4 0005 0010
  GPR16: 0025 eaa1c154 eaa1c158 c0dbad64 0020 fd543810 eaa1c0a0 eaa1c29e
  GPR24: c0dbad44 c0db8740 05ff fd543802 eaa1c150 c0c9a3c0 eaa1c0a0 c0c9a3c0
  NIP [c02e5558] kasan_check_range+0xc/0x2b4
  LR [c07eb3bc] format_decode+0x80/0x604
  Call Trace:
  [eaa1c000] [c07eb3bc] format_decode+0x80/0x604 (unreliable)
  [eaa1c070] [c07f4dac] vsnprintf+0x128/0x938
  [eaa1c110] [c07f5788] sprintf+0xa0/0xc0
  [eaa1c180] [c0154c1c] __sprint_symbol.constprop.0+0x170/0x198
  [eaa1c230] [c07ee71c] symbol_string+0xf8/0x260
  [eaa1c430] [c07f46d0] pointer+0x15c/0x710
  [eaa1c4b0] [c07f4fbc] vsnprintf+0x338/0x938
  [eaa1c550] [c00e8fa0] vprintk_store+0x2a8/0x678
  [eaa1c690] [c00e94e4] vprintk_emit+0x174/0x378
  [eaa1c6d0] [c00ea008] _printk+0x9c/0xc0
  [eaa1c750] [c000ca94] show_stack+0x21c/0x260
  [eaa1c7a0] [c07d0bd4] dump_stack_lvl+0x60/0x90
  [eaa1c7c0] [c0009234] __do_IRQ+0x170/0x174
  [eaa1c800] [c0009258] do_IRQ+0x20/0x34
  [eaa1c820] [c00045b4] HardwareInterrupt_virt+0x108/0x10c
  ...

An investigation shows that on PPC32, calling dump_stack() requires more than 1k when KASAN is not selected and a bit more than 2k bytes when KASAN is selected. On PPC64 the registers are twice the size of PPC32 registers, so the need should be approximately twice the need on PPC32.

In the meantime we have THREAD_SIZE, which is twice larger on PPC64 than on PPC32, and twice larger when KASAN is selected. So we can easily use the value of THREAD_SIZE to set the limit.

On PPC32, THREAD_SIZE is 8k without KASAN and 16k with KASAN.
On PPC64, THREAD_SIZE is 16k without KASAN.

To be on the safe side, leave 2k on PPC32 without KASAN, 4k with KASAN, and 4k on PPC64 without KASAN. It means the limit should be one fourth of THREAD_SIZE.

Reported-by: Erhard Furtner
Cc: Arnd Bergmann
Signed-off-by: Christophe Leroy
---
v2: Use a ratio of THREAD_SIZE
---
 arch/powerpc/kernel/irq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 32409cdcbed0..72abf438a22e 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -191,8 +191,8 @@ static inline void check_stack_overflow(unsigned long sp)
 
 	sp = current_stack_pointer & (THREAD_SIZE - 1);
 
-	/* check for stack overflow: is there less than 2KB free? */
-	if (unlikely(sp < 2048)) {
+	/* check for stack overflow: is there less than 1/4th free? */
+	if (unlikely(sp < THREAD_SIZE / 4)) {
 		pr_err("do_IRQ: stack overflow: %ld\n", sp);
 		dump_stack();
 	}
-- 
2.35.3
Re: [PATCH] powerpc/64: Drop ppc_inst_as_str()
Hi!

On Fri, Jun 03, 2022 at 03:03:05PM +1000, Jordan Niethe wrote:
> On Thu, Jun 2, 2022 at 6:49 PM Segher Boessenkool wrote:
> > On Thu, Jun 02, 2022 at 01:01:04PM +1000, Jordan Niethe wrote:
> > > > What about the more fundamental thing? Have the order of the two halves
> > > > of a prefixed insn as ulong not depend on endianness? It really is two
> > > > opcodes, and the prefixed one is first, always, even in LE.
> > > The reason would be that the value as ulong is then used to write a
> > > prefixed instruction to memory with std.
> > > If both endiannesses had the halves the same, one of them would store
> > > the suffix in front of the prefix.
> >
> > You cannot do such a (possibly) unaligned access from C though, not
> > without invoking undefined behaviour. The compiler usually lets you get
> > away with it, but there are no guarantees. You can make sure you only
> > ever do such an access from assembler code of course.
>
> Would using inline assembly to do it be ok?

You cannot refer to the instruction words as one 8-byte integer in memory (with "m" or similar), since such a thing is not valid C. You can input the address of it to the inline asm of course, and use a clobber "memory", or also give it the actual memory as input, but as bytes or words or such, with something like
  "m"(*(const u32 (*)[2]) ptr)

> > Swapping the two halves of a register costs at most one insn. It is
> > harmful premature optimisation to make this single cycle advantage
> > override more important considerations (almost everything else :-) )
>
> I'm not sure I follow. We are not doing this as an optimisation, but
> out of the necessity of writing the prefixed instruction to memory in
> a single instruction so that we don't end up with half an instruction
> in the kernel image.

Ah. That does not necessitate having this different for LE at all though! The function that does the patching has to do it as one atomic memory access (which an ld only is because prefixed insns cannot cross 64-byte address boundaries btw), but that does not mean the kernel has to use the packing into one u64 it needs for that anywhere else, certainly not if that just complicates matters!

Segher
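PS: as a completely untested sketch of the store side, with the memory given to the asm as an array of words so the compiler knows exactly what is written:

static inline void store_prefixed_insn(u32 *dst, u64 image)
{
	/*
	 * One 8-byte store so prefix and suffix become visible together.
	 * "b" keeps the base address out of r0; the array-typed output
	 * operand ties the asm to the two instruction words instead of
	 * using a blanket "memory" clobber.
	 */
	asm volatile("std %1, 0(%2)"
		     : "=m" (*(u32 (*)[2]) dst)
		     : "r" (image), "b" (dst));
}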
Re: [PATCH] powerpc/spufs: Fix refcount leak in spufs_init_isolated_loader
On Fri, Jun 3, 2022 at 2:15 PM Miaoqian Lin wrote:
>
> of_find_node_by_path() returns the remote device node pointer with
> refcount incremented, we should use of_node_put() on it when done.
> Add missing of_node_put() to avoid refcount leak.
>
> Fixes: 0afacde3df4c ("[POWERPC] spufs: allow isolated mode apps by starting the SPE loader")
> Signed-off-by: Miaoqian Lin

Acked-by: Arnd Bergmann
Re: [PATCH 2/6] s390/mm: Enable ARCH_HAS_VM_GET_PAGE_PROT
Le 03/06/2022 à 12:14, Anshuman Khandual a écrit :
> This defines and exports a platform specific custom vm_get_page_prot() via
> subscribing ARCH_HAS_VM_GET_PAGE_PROT. Subsequently all __SXXX and __PXXX
> macros can be dropped which are no longer needed.
>
> Cc: Heiko Carstens
> Cc: Vasily Gorbik
> Cc: linux-s...@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> Acked-by: Sven Schnelle
> Acked-by: Alexander Gordeev
> Signed-off-by: Anshuman Khandual
> ---
>  arch/s390/Kconfig               |  1 +
>  arch/s390/include/asm/pgtable.h | 17 -
>  arch/s390/mm/mmap.c             | 33 +
>  3 files changed, 34 insertions(+), 17 deletions(-)
>
> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> index b17239ae7bd4..cdcf678deab1 100644
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -81,6 +81,7 @@ config S390
>  	select ARCH_HAS_SYSCALL_WRAPPER
>  	select ARCH_HAS_UBSAN_SANITIZE_ALL
>  	select ARCH_HAS_VDSO_DATA
> +	select ARCH_HAS_VM_GET_PAGE_PROT
>  	select ARCH_HAVE_NMI_SAFE_CMPXCHG
>  	select ARCH_INLINE_READ_LOCK
>  	select ARCH_INLINE_READ_LOCK_BH
> diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
> index a397b072a580..c63a05b5368a 100644
> --- a/arch/s390/include/asm/pgtable.h
> +++ b/arch/s390/include/asm/pgtable.h
> @@ -424,23 +424,6 @@ static inline int is_module_addr(void *addr)
>   * implies read permission.
>   */
>           /*xwr*/
> -#define __P000	PAGE_NONE
> -#define __P001	PAGE_RO
> -#define __P010	PAGE_RO
> -#define __P011	PAGE_RO
> -#define __P100	PAGE_RX
> -#define __P101	PAGE_RX
> -#define __P110	PAGE_RX
> -#define __P111	PAGE_RX
> -
> -#define __S000	PAGE_NONE
> -#define __S001	PAGE_RO
> -#define __S010	PAGE_RW
> -#define __S011	PAGE_RW
> -#define __S100	PAGE_RX
> -#define __S101	PAGE_RX
> -#define __S110	PAGE_RWX
> -#define __S111	PAGE_RWX
>
>  /*
>   * Segment entry (large page) protection definitions.
> diff --git a/arch/s390/mm/mmap.c b/arch/s390/mm/mmap.c
> index d545f5c39f7e..11d75b8d5ec0 100644
> --- a/arch/s390/mm/mmap.c
> +++ b/arch/s390/mm/mmap.c
> @@ -188,3 +188,36 @@ void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
>  		mm->get_unmapped_area = arch_get_unmapped_area_topdown;
>  	}
>  }
> +
> +pgprot_t vm_get_page_prot(unsigned long vm_flags)
> +{
> +	switch (vm_flags & (VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)) {
> +	case VM_NONE:
> +		return PAGE_NONE;
> +	case VM_READ:
> +	case VM_WRITE:
> +	case VM_WRITE | VM_READ:
> +		return PAGE_RO;
> +	case VM_EXEC:
> +	case VM_EXEC | VM_READ:
> +	case VM_EXEC | VM_WRITE:
> +	case VM_EXEC | VM_WRITE | VM_READ:
> +		return PAGE_RX;
> +	case VM_SHARED:
> +		return PAGE_NONE;
> +	case VM_SHARED | VM_READ:
> +		return PAGE_RO;
> +	case VM_SHARED | VM_WRITE:
> +	case VM_SHARED | VM_WRITE | VM_READ:
> +		return PAGE_RW;
> +	case VM_SHARED | VM_EXEC:
> +	case VM_SHARED | VM_EXEC | VM_READ:
> +		return PAGE_RX;
> +	case VM_SHARED | VM_EXEC | VM_WRITE:
> +	case VM_SHARED | VM_EXEC | VM_WRITE | VM_READ:
> +		return PAGE_RWX;
> +	default:
> +		BUILD_BUG();
> +	}
> +}
> +EXPORT_SYMBOL(vm_get_page_prot);

Wasn't it demonstrated in previous discussions that a switch/case is suboptimal compared to a table cell read?

In order to get rid of the _Sxxx/_Pxxx macros, my preference would go to having architectures provide their own protection_map[] table, and keep the generic vm_get_page_prot() for the architectures which don't need a specific version of it.

This comment applies to all following patches as well.
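To illustrate, for the s390 case above that would be roughly (sketch only, reusing the protections from the patch):

static const pgprot_t protection_map[16] = {
	[VM_NONE]					= PAGE_NONE,
	[VM_READ]					= PAGE_RO,
	[VM_WRITE]					= PAGE_RO,
	[VM_WRITE | VM_READ]				= PAGE_RO,
	[VM_EXEC]					= PAGE_RX,
	[VM_EXEC | VM_READ]				= PAGE_RX,
	[VM_EXEC | VM_WRITE]				= PAGE_RX,
	[VM_EXEC | VM_WRITE | VM_READ]			= PAGE_RX,
	[VM_SHARED]					= PAGE_NONE,
	[VM_SHARED | VM_READ]				= PAGE_RO,
	[VM_SHARED | VM_WRITE]				= PAGE_RW,
	[VM_SHARED | VM_WRITE | VM_READ]		= PAGE_RW,
	[VM_SHARED | VM_EXEC]				= PAGE_RX,
	[VM_SHARED | VM_EXEC | VM_READ]			= PAGE_RX,
	[VM_SHARED | VM_EXEC | VM_WRITE]		= PAGE_RWX,
	[VM_SHARED | VM_EXEC | VM_WRITE | VM_READ]	= PAGE_RWX
};

pgprot_t vm_get_page_prot(unsigned long vm_flags)
{
	/* one table read instead of a branch/jump-table sequence */
	return protection_map[vm_flags &
			      (VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)];
}
EXPORT_SYMBOL(vm_get_page_prot);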
Re: [PATCH 1/6] mm/mmap: Restrict generic protection_map[] array visibility
Le 03/06/2022 à 12:14, Anshuman Khandual a écrit :
> Restrict generic protection_map[] array visibility only for platforms which
> do not enable ARCH_HAS_VM_GET_PAGE_PROT. Other platforms that do define
> their own vm_get_page_prot(), enabling ARCH_HAS_VM_GET_PAGE_PROT, can have
> a private static protection_map[] still implementing an array look up.
> These private protection_map[] arrays can do without the __PXXX/__SXXX
> macros, making them redundant and allowing them to be dropped.
>
> But platforms which do not define their custom vm_get_page_prot() enabling
> ARCH_HAS_VM_GET_PAGE_PROT will still have to provide __PXXX/__SXXX macros.
> Although this now provides a method for other willing platforms to drop
> off __PXXX/__SXXX macros completely.
>
> Cc: Catalin Marinas
> Cc: Will Deacon
> Cc: Michael Ellerman
> Cc: Paul Mackerras
> Cc: "David S. Miller"
> Cc: Thomas Gleixner
> Cc: Ingo Molnar
> Cc: Andrew Morton
> Cc: x...@kernel.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: sparcli...@vger.kernel.org
> Cc: linux...@kvack.org
> Cc: linux-ker...@vger.kernel.org
> Signed-off-by: Anshuman Khandual
> ---
>  arch/arm64/include/asm/pgtable-prot.h | 18 --
>  arch/arm64/mm/mmap.c                  | 21 +
>  arch/powerpc/include/asm/pgtable.h    |  2 ++
>  arch/powerpc/mm/book3s64/pgtable.c    | 20
>  arch/sparc/include/asm/pgtable_32.h   |  2 ++
>  arch/sparc/include/asm/pgtable_64.h   | 19 ---
>  arch/sparc/mm/init_64.c               | 20
>  arch/x86/include/asm/pgtable_types.h  | 19 ---
>  arch/x86/mm/pgprot.c                  | 19 +++
>  include/linux/mm.h                    |  2 ++
>  mm/mmap.c                             |  2 +-
>  11 files changed, 87 insertions(+), 57 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
> index d564d0ecd4cd..8ed2a80c896e 100644
> --- a/arch/powerpc/include/asm/pgtable.h
> +++ b/arch/powerpc/include/asm/pgtable.h
> @@ -21,6 +21,7 @@ struct mm_struct;
>  #endif /* !CONFIG_PPC_BOOK3S */
>
>  /* Note due to the way vm flags are laid out, the bits are XWR */
> +#ifndef CONFIG_ARCH_HAS_VM_GET_PAGE_PROT

Ok, so until now it was common to all powerpc platforms. Now you define it a different way depending on whether it is PPC_BOOK3S_64 or another platform? What's the point?

>  #define __P000	PAGE_NONE
>  #define __P001	PAGE_READONLY
>  #define __P010	PAGE_COPY
> @@ -38,6 +39,7 @@ struct mm_struct;
>  #define __S101	PAGE_READONLY_X
>  #define __S110	PAGE_SHARED_X
>  #define __S111	PAGE_SHARED_X
> +#endif
>
>  #ifndef __ASSEMBLY__
>
> diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
> index 7b9966402b25..2cf10a17c0a9 100644
> --- a/arch/powerpc/mm/book3s64/pgtable.c
> +++ b/arch/powerpc/mm/book3s64/pgtable.c
> @@ -551,6 +551,26 @@ unsigned long memremap_compat_align(void)
>  EXPORT_SYMBOL_GPL(memremap_compat_align);
>  #endif
>
> +/* Note due to the way vm flags are laid out, the bits are XWR */
> +static pgprot_t protection_map[16] __ro_after_init = {

I don't think powerpc modifies that at all. Could be const instead of __ro_after_init.

> +	[VM_NONE]					= PAGE_NONE,
> +	[VM_READ]					= PAGE_READONLY,
> +	[VM_WRITE]					= PAGE_COPY,
> +	[VM_WRITE | VM_READ]				= PAGE_COPY,
> +	[VM_EXEC]					= PAGE_READONLY_X,
> +	[VM_EXEC | VM_READ]				= PAGE_READONLY_X,
> +	[VM_EXEC | VM_WRITE]				= PAGE_COPY_X,
> +	[VM_EXEC | VM_WRITE | VM_READ]			= PAGE_COPY_X,
> +	[VM_SHARED]					= PAGE_NONE,
> +	[VM_SHARED | VM_READ]				= PAGE_READONLY,
> +	[VM_SHARED | VM_WRITE]				= PAGE_SHARED,
> +	[VM_SHARED | VM_WRITE | VM_READ]		= PAGE_SHARED,
> +	[VM_SHARED | VM_EXEC]				= PAGE_READONLY_X,
> +	[VM_SHARED | VM_EXEC | VM_READ]			= PAGE_READONLY_X,
> +	[VM_SHARED | VM_EXEC | VM_WRITE]		= PAGE_SHARED_X,
> +	[VM_SHARED | VM_EXEC | VM_WRITE | VM_READ]	= PAGE_SHARED_X
> +};

That's nice but it could apply to all powerpc platforms. Why restrict it to book3s/64?

> +
>  pgprot_t vm_get_page_prot(unsigned long vm_flags)
>  {
>  	unsigned long prot = pgprot_val(protection_map[vm_flags &
> diff --git a/arch/sparc/include/asm/pgtable_32.h b/arch/sparc/include/asm/pgtable_32.h
> index 4866625da314..bca98b280fdd 100644
> --- a/arch/sparc/include/asm/pgtable_32.h
> +++ b/arch/sparc/include/asm/pgtable_32.h
> @@ -65,6 +65,7 @@ void paging_init(void);
>  extern
[PATCH v2] powerpc/85xx: Fix reference leak in xes_mpc85xx_setup_arch
of_find_node_by_path() returns the remote device node pointer with refcount incremented, we should use of_node_put() on it when done. Add missing of_node_put() to avoid refcount leak.

Fixes: 3038acf9091f ("powerpc/85xx: Add platform support for X-ES MPC85xx boards")
Signed-off-by: Miaoqian Lin
---
changes in v2:
- update Fixes tag.
v1 Link: https://lore.kernel.org/all/20220603120907.1-1-linmq...@gmail.com
---
 arch/powerpc/platforms/85xx/xes_mpc85xx.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/85xx/xes_mpc85xx.c b/arch/powerpc/platforms/85xx/xes_mpc85xx.c
index 5836e4ecb7a0..93f67b430714 100644
--- a/arch/powerpc/platforms/85xx/xes_mpc85xx.c
+++ b/arch/powerpc/platforms/85xx/xes_mpc85xx.c
@@ -130,6 +130,8 @@ static void __init xes_mpc85xx_setup_arch(void)
 	mpc85xx_smp_init();
 
 	fsl_pci_assign_primary();
+
+	of_node_put(root);
 }
 
 machine_arch_initcall(xes_mpc8572, mpc85xx_common_publish_devices);
-- 
2.25.1
[PATCH] powerpc/spufs: Fix refcount leak in spufs_init_isolated_loader
of_find_node_by_path() returns the remote device node pointer with refcount incremented, we should use of_node_put() on it when done. Add missing of_node_put() to avoid refcount leak.

Fixes: 0afacde3df4c ("[POWERPC] spufs: allow isolated mode apps by starting the SPE loader")
Signed-off-by: Miaoqian Lin
---
 arch/powerpc/platforms/cell/spufs/inode.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/cell/spufs/inode.c b/arch/powerpc/platforms/cell/spufs/inode.c
index 34334c32b7f5..320008528edd 100644
--- a/arch/powerpc/platforms/cell/spufs/inode.c
+++ b/arch/powerpc/platforms/cell/spufs/inode.c
@@ -660,6 +660,7 @@ spufs_init_isolated_loader(void)
 		return;
 
 	loader = of_get_property(dn, "loader", &size);
+	of_node_put(dn);
 	if (!loader)
 		return;
-- 
2.25.1
[PATCH] powerpc/85xx: Fix reference leak in xes_mpc85xx_setup_arch
of_find_node_by_path() returns the remote device node pointer with refcount incremented, we should use of_node_put() on it when done. Add missing of_node_put() to avoid refcount leak.

Fixes: format:3038acf9091f ("powerpc/85xx: Add platform support for X-ES MPC85xx boards")
Signed-off-by: Miaoqian Lin
---
 arch/powerpc/platforms/85xx/xes_mpc85xx.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/85xx/xes_mpc85xx.c b/arch/powerpc/platforms/85xx/xes_mpc85xx.c
index 5836e4ecb7a0..93f67b430714 100644
--- a/arch/powerpc/platforms/85xx/xes_mpc85xx.c
+++ b/arch/powerpc/platforms/85xx/xes_mpc85xx.c
@@ -130,6 +130,8 @@ static void __init xes_mpc85xx_setup_arch(void)
 	mpc85xx_smp_init();
 
 	fsl_pci_assign_primary();
+
+	of_node_put(root);
 }
 
 machine_arch_initcall(xes_mpc8572, mpc85xx_common_publish_devices);
-- 
2.25.1
Re: [PATCH 06/10] scsi/ibmvscsi_tgt: Replace work tasklet with threaded irq
On 2022-05-30 16:15:08 [-0700], Davidlohr Bueso wrote:
> diff --git a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
> index eee1a24f7e15..fafadb7158a3 100644
> --- a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
> +++ b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
> @@ -2948,9 +2948,8 @@ static irqreturn_t ibmvscsis_interrupt(int dummy, void *data)
>  	struct scsi_info *vscsi = data;
>
>  	vio_disable_interrupts(vscsi->dma_dev);
> -	tasklet_schedule(&vscsi->work_task);

looks good.

> -	return IRQ_HANDLED;
> +	return IRQ_WAKE_THREAD;
>  }
>
>  /**
> @@ -3340,7 +3339,7 @@ static void ibmvscsis_handle_crq(unsigned long data)
>  		dev_dbg(&vscsi->dev, "handle_crq, don't process: flags 0x%x, state 0x%hx\n",
>  			vscsi->flags, vscsi->state);
>  		spin_unlock_bh(&vscsi->intr_lock);

So if you move it away from the tasklet, you can replace the spin_lock_bh() with spin_lock() since BH does not matter anymore. Except if there is a timer, because BH matters for a timer_list timer, which is invoked in softirq context. That isn't the case here, except that there is a hrtimer invoking ibmvscsis_service_wait_q(). This is bad because a hrtimer is invoked in hard-irq context, so a BH lock must not be acquired; lockdep would scream here. You could use HRTIMER_MODE_REL_SOFT, which would make it a BH timer. Then you could keep the BH locking, but actually you want to get rid of it ;)

> -	return;
> +	goto done;
>  }
>
>  	rc = vscsi->flags & SCHEDULE_DISCONNECT;

Sebastian
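PS: the soft-hrtimer variant is a one-liner at init time, something like this (untested; whether the driver's timer really is vscsi->rsp_q_timer.timer I did not double-check):

	/*
	 * Make the wait-queue timer expire in softirq context so its
	 * callback may keep taking BH locks.
	 */
	hrtimer_init(&vscsi->rsp_q_timer.timer, CLOCK_MONOTONIC,
		     HRTIMER_MODE_REL_SOFT);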
Re: [PATCH 4/4] powerpc/pseries: Implement CONFIG_PARAVIRT_TIME_ACCOUNTING
On 5/18/22 7:09 PM, Nicholas Piggin wrote:
> CONFIG_VIRT_CPU_ACCOUNTING_GEN under pseries does not implement stolen
> time accounting. Implement it with the paravirt time accounting option.
>
> Signed-off-by: Nicholas Piggin

Tested-by: Shrikanth Hegde

The patch fails to compile with CONFIG_PARAVIRT=y, with the below error:

  In file included from kernel/sched/core.c:81:
  kernel/sched/sched.h:87:11: fatal error: asm/paravirt_api_clock.h: No such file or directory
     87 | # include <asm/paravirt_api_clock.h>
  compilation terminated.

After adding the file, it compiled. Please add the file as well. The patch I did:

diff --git a/arch/powerpc/include/asm/paravirt_api_clock.h b/arch/powerpc/include/asm/paravirt_api_clock.h
new file mode 100644
index 000000000000..65ac7cee0dad
--- /dev/null
+++ b/arch/powerpc/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include <asm/paravirt.h>

After successful compilation, it was tested on a Power10 shared LPAR system. The system has two LPARs; we will call the first one LPAR1 and the second one LPAR2. The test was carried out with SMT=1; similar observations were seen with SMT=8 as well. The LPAR config header from each LPAR is below. LPAR1 is twice as big as LPAR2. Since both share the same underlying hardware, work stealing happens when both LPARs contend for the same resources.

LPAR1: type=Shared mode=Uncapped smt=Off lcpu=40 mem=2094637056 kB cpus=40 ent=20.00
LPAR2: type=Shared mode=Uncapped smt=Off lcpu=20 mem=2083908608 kB cpus=40 ent=10.00

mpstat was used to check the utilization; stress-ng was used as the workload. A few cases were tested. When both LPARs are idle there is no steal time. When LPAR1 starts running at 100%, which consumes all of the physical resource, steal time starts to be accounted. With LPAR1 running at 100% and LPAR2 starting to run, steal time starts increasing; this is as expected. When the LPAR2 load is increased further, steal time increases further.

Case 1: 0% LPAR1; 0% LPAR2
CPU    %usr  %nice  %sys  %iowait  %irq  %soft  %steal  %guest  %gnice  %idle
all    0.00   0.00  0.05     0.00  0.00   0.00    0.00    0.00    0.00  99.95

Case 2: 100% LPAR1; 0% LPAR2
CPU    %usr  %nice  %sys  %iowait  %irq  %soft  %steal  %guest  %gnice  %idle
all   97.68   0.00  0.00     0.00  0.00   0.00    2.32    0.00    0.00   0.00

Case 3: 100% LPAR1; 50% LPAR2
CPU    %usr  %nice  %sys  %iowait  %irq  %soft  %steal  %guest  %gnice  %idle
all   86.34   0.00  0.10     0.00  0.00   0.03   13.54    0.00    0.00   0.00

Case 4: 100% LPAR1; 100% LPAR2
CPU    %usr  %nice  %sys  %iowait  %irq  %soft  %steal  %guest  %gnice  %idle
all   78.54   0.00  0.07     0.00  0.00   0.02   21.36    0.00    0.00   0.00

Case 5: 50% LPAR1; 100% LPAR2
CPU    %usr  %nice  %sys  %iowait  %irq  %soft  %steal  %guest  %gnice  %idle
all   49.37   0.00  0.00     0.00  0.00   0.00    1.17    0.00    0.00  49.47

The patch accounts for the steal time, and the basic tests hold good.

--
Shrikanth Hegde
[PATCH 6/6] openrisc/mm: Enable ARCH_HAS_VM_GET_PAGE_PROT
This defines and exports a platform specific custom vm_get_page_prot() via subscribing ARCH_HAS_VM_GET_PAGE_PROT. Subsequently all __SXXX and __PXXX macros can be dropped which are no longer needed.

Cc: Jonas Bonn
Cc: openr...@lists.librecores.org
Cc: linux-ker...@vger.kernel.org
Acked-by: Stafford Horne
Signed-off-by: Anshuman Khandual
---
 arch/openrisc/Kconfig               |  1 +
 arch/openrisc/include/asm/pgtable.h | 18 -
 arch/openrisc/mm/init.c             | 41 +
 3 files changed, 42 insertions(+), 18 deletions(-)

diff --git a/arch/openrisc/Kconfig b/arch/openrisc/Kconfig
index e814df4c483c..fe0dfb50eb86 100644
--- a/arch/openrisc/Kconfig
+++ b/arch/openrisc/Kconfig
@@ -10,6 +10,7 @@ config OPENRISC
 	select ARCH_HAS_DMA_SET_UNCACHED
 	select ARCH_HAS_DMA_CLEAR_UNCACHED
 	select ARCH_HAS_SYNC_DMA_FOR_DEVICE
+	select ARCH_HAS_VM_GET_PAGE_PROT
 	select COMMON_CLK
 	select OF
 	select OF_EARLY_FLATTREE
diff --git a/arch/openrisc/include/asm/pgtable.h b/arch/openrisc/include/asm/pgtable.h
index c3abbf71e09f..dcae8aea132f 100644
--- a/arch/openrisc/include/asm/pgtable.h
+++ b/arch/openrisc/include/asm/pgtable.h
@@ -176,24 +176,6 @@ extern void paging_init(void);
 	__pgprot(_PAGE_ALL | _PAGE_SRE | _PAGE_SWE \
 		 | _PAGE_SHARED | _PAGE_DIRTY | _PAGE_EXEC | _PAGE_CI)
 
-#define __P000	PAGE_NONE
-#define __P001	PAGE_READONLY_X
-#define __P010	PAGE_COPY
-#define __P011	PAGE_COPY_X
-#define __P100	PAGE_READONLY
-#define __P101	PAGE_READONLY_X
-#define __P110	PAGE_COPY
-#define __P111	PAGE_COPY_X
-
-#define __S000	PAGE_NONE
-#define __S001	PAGE_READONLY_X
-#define __S010	PAGE_SHARED
-#define __S011	PAGE_SHARED_X
-#define __S100	PAGE_READONLY
-#define __S101	PAGE_READONLY_X
-#define __S110	PAGE_SHARED
-#define __S111	PAGE_SHARED_X
-
 /* zero page used for uninitialized stuff */
 extern unsigned long empty_zero_page[2048];
 #define ZERO_PAGE(vaddr) (virt_to_page(empty_zero_page))
diff --git a/arch/openrisc/mm/init.c b/arch/openrisc/mm/init.c
index 3a021ab6f1ae..266dc68c32e5 100644
--- a/arch/openrisc/mm/init.c
+++ b/arch/openrisc/mm/init.c
@@ -208,3 +208,44 @@ void __init mem_init(void)
 	mem_init_done = 1;
 	return;
 }
+
+pgprot_t vm_get_page_prot(unsigned long vm_flags)
+{
+	switch (vm_flags & (VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)) {
+	case VM_NONE:
+		return PAGE_NONE;
+	case VM_READ:
+		return PAGE_READONLY_X;
+	case VM_WRITE:
+		return PAGE_COPY;
+	case VM_WRITE | VM_READ:
+		return PAGE_COPY_X;
+	case VM_EXEC:
+		return PAGE_READONLY;
+	case VM_EXEC | VM_READ:
+		return PAGE_READONLY_X;
+	case VM_EXEC | VM_WRITE:
+		return PAGE_COPY;
+	case VM_EXEC | VM_WRITE | VM_READ:
+		return PAGE_COPY_X;
+	case VM_SHARED:
+		return PAGE_NONE;
+	case VM_SHARED | VM_READ:
+		return PAGE_READONLY_X;
+	case VM_SHARED | VM_WRITE:
+		return PAGE_SHARED;
+	case VM_SHARED | VM_WRITE | VM_READ:
+		return PAGE_SHARED_X;
+	case VM_SHARED | VM_EXEC:
+		return PAGE_READONLY;
+	case VM_SHARED | VM_EXEC | VM_READ:
+		return PAGE_READONLY_X;
+	case VM_SHARED | VM_EXEC | VM_WRITE:
+		return PAGE_SHARED;
+	case VM_SHARED | VM_EXEC | VM_WRITE | VM_READ:
+		return PAGE_SHARED_X;
+	default:
+		BUILD_BUG();
+	}
+}
+EXPORT_SYMBOL(vm_get_page_prot);
-- 
2.25.1
[PATCH 5/6] nios2/mm: Enable ARCH_HAS_VM_GET_PAGE_PROT
This defines and exports a platform specific custom vm_get_page_prot() via subscribing ARCH_HAS_VM_GET_PAGE_PROT. Subsequently all __SXXX and __PXXX macros can be dropped which are no longer needed.

Cc: Dinh Nguyen
Cc: linux-ker...@vger.kernel.org
Acked-by: Dinh Nguyen
Signed-off-by: Anshuman Khandual
---
 arch/nios2/Kconfig               |  1 +
 arch/nios2/include/asm/pgtable.h | 24 ---
 arch/nios2/mm/init.c             | 47 +
 3 files changed, 48 insertions(+), 24 deletions(-)

diff --git a/arch/nios2/Kconfig b/arch/nios2/Kconfig
index 4167f1eb4cd8..e0459dffd218 100644
--- a/arch/nios2/Kconfig
+++ b/arch/nios2/Kconfig
@@ -6,6 +6,7 @@ config NIOS2
 	select ARCH_HAS_SYNC_DMA_FOR_CPU
 	select ARCH_HAS_SYNC_DMA_FOR_DEVICE
 	select ARCH_HAS_DMA_SET_UNCACHED
+	select ARCH_HAS_VM_GET_PAGE_PROT
 	select ARCH_NO_SWAP
 	select COMMON_CLK
 	select TIMER_OF
diff --git a/arch/nios2/include/asm/pgtable.h b/arch/nios2/include/asm/pgtable.h
index 262d0609268c..3c9f83c22733 100644
--- a/arch/nios2/include/asm/pgtable.h
+++ b/arch/nios2/include/asm/pgtable.h
@@ -34,30 +34,6 @@ struct mm_struct;
 	((x) ? _PAGE_EXEC : 0) |	\
 	((r) ? _PAGE_READ : 0) |	\
 	((w) ? _PAGE_WRITE : 0))
-/*
- * These are the macros that generic kernel code needs
- * (to populate protection_map[])
- */
-
-/* Remove W bit on private pages for COW support */
-#define __P000	MKP(0, 0, 0)
-#define __P001	MKP(0, 0, 1)
-#define __P010	MKP(0, 0, 0)	/* COW */
-#define __P011	MKP(0, 0, 1)	/* COW */
-#define __P100	MKP(1, 0, 0)
-#define __P101	MKP(1, 0, 1)
-#define __P110	MKP(1, 0, 0)	/* COW */
-#define __P111	MKP(1, 0, 1)	/* COW */
-
-/* Shared pages can have exact HW mapping */
-#define __S000	MKP(0, 0, 0)
-#define __S001	MKP(0, 0, 1)
-#define __S010	MKP(0, 1, 0)
-#define __S011	MKP(0, 1, 1)
-#define __S100	MKP(1, 0, 0)
-#define __S101	MKP(1, 0, 1)
-#define __S110	MKP(1, 1, 0)
-#define __S111	MKP(1, 1, 1)
 
 /* Used all over the kernel */
 #define PAGE_KERNEL __pgprot(_PAGE_PRESENT | _PAGE_CACHED | _PAGE_READ | \
diff --git a/arch/nios2/mm/init.c b/arch/nios2/mm/init.c
index 613fcaa5988a..e867f5d85580 100644
--- a/arch/nios2/mm/init.c
+++ b/arch/nios2/mm/init.c
@@ -124,3 +124,50 @@ const char *arch_vma_name(struct vm_area_struct *vma)
 {
 	return (vma->vm_start == KUSER_BASE) ? "[kuser]" : NULL;
 }
+
+pgprot_t vm_get_page_prot(unsigned long vm_flags)
+{
+	switch (vm_flags & (VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)) {
+	/* Remove W bit on private pages for COW support */
+	case VM_NONE:
+		return MKP(0, 0, 0);
+	case VM_READ:
+		return MKP(0, 0, 1);
+	/* COW */
+	case VM_WRITE:
+		return MKP(0, 0, 0);
+	/* COW */
+	case VM_WRITE | VM_READ:
+		return MKP(0, 0, 1);
+	case VM_EXEC:
+		return MKP(1, 0, 0);
+	case VM_EXEC | VM_READ:
+		return MKP(1, 0, 1);
+	/* COW */
+	case VM_EXEC | VM_WRITE:
+		return MKP(1, 0, 0);
+	/* COW */
+	case VM_EXEC | VM_WRITE | VM_READ:
+		return MKP(1, 0, 1);
+	/* Shared pages can have exact HW mapping */
+	case VM_SHARED:
+		return MKP(0, 0, 0);
+	case VM_SHARED | VM_READ:
+		return MKP(0, 0, 1);
+	case VM_SHARED | VM_WRITE:
+		return MKP(0, 1, 0);
+	case VM_SHARED | VM_WRITE | VM_READ:
+		return MKP(0, 1, 1);
+	case VM_SHARED | VM_EXEC:
+		return MKP(1, 0, 0);
+	case VM_SHARED | VM_EXEC | VM_READ:
+		return MKP(1, 0, 1);
+	case VM_SHARED | VM_EXEC | VM_WRITE:
+		return MKP(1, 1, 0);
+	case VM_SHARED | VM_EXEC | VM_WRITE | VM_READ:
+		return MKP(1, 1, 1);
+	default:
+		BUILD_BUG();
+	}
+}
+EXPORT_SYMBOL(vm_get_page_prot);
-- 
2.25.1
[PATCH 4/6] csky/mm: Enable ARCH_HAS_VM_GET_PAGE_PROT
This defines and exports a platform specific custom vm_get_page_prot() via subscribing ARCH_HAS_VM_GET_PAGE_PROT. Subsequently all __SXXX and __PXXX macros can be dropped which are no longer needed.

Cc: Geert Uytterhoeven
Cc: linux-c...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Acked-by: Guo Ren
Signed-off-by: Anshuman Khandual
---
 arch/csky/Kconfig               |  1 +
 arch/csky/include/asm/pgtable.h | 18 --
 arch/csky/mm/init.c             | 32 ++
 3 files changed, 33 insertions(+), 18 deletions(-)

diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig
index 21d72b078eef..588b8a9c68ed 100644
--- a/arch/csky/Kconfig
+++ b/arch/csky/Kconfig
@@ -6,6 +6,7 @@ config CSKY
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_SYNC_DMA_FOR_CPU
 	select ARCH_HAS_SYNC_DMA_FOR_DEVICE
+	select ARCH_HAS_VM_GET_PAGE_PROT
 	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_USE_QUEUED_RWLOCKS
 	select ARCH_WANT_FRAME_POINTERS if !CPU_CK610 && $(cc-option,-mbacktrace)
diff --git a/arch/csky/include/asm/pgtable.h b/arch/csky/include/asm/pgtable.h
index bbe24511..229a5f4ad7fc 100644
--- a/arch/csky/include/asm/pgtable.h
+++ b/arch/csky/include/asm/pgtable.h
@@ -77,24 +77,6 @@
 #define MAX_SWAPFILES_CHECK() \
 	BUILD_BUG_ON(MAX_SWAPFILES_SHIFT != 5)
 
-#define __P000	PAGE_NONE
-#define __P001	PAGE_READ
-#define __P010	PAGE_READ
-#define __P011	PAGE_READ
-#define __P100	PAGE_READ
-#define __P101	PAGE_READ
-#define __P110	PAGE_READ
-#define __P111	PAGE_READ
-
-#define __S000	PAGE_NONE
-#define __S001	PAGE_READ
-#define __S010	PAGE_WRITE
-#define __S011	PAGE_WRITE
-#define __S100	PAGE_READ
-#define __S101	PAGE_READ
-#define __S110	PAGE_WRITE
-#define __S111	PAGE_WRITE
-
 extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
 #define ZERO_PAGE(vaddr)	(virt_to_page(empty_zero_page))
diff --git a/arch/csky/mm/init.c b/arch/csky/mm/init.c
index bf2004aa811a..f9babbed17d4 100644
--- a/arch/csky/mm/init.c
+++ b/arch/csky/mm/init.c
@@ -197,3 +197,35 @@ void __init fixaddr_init(void)
 	vaddr = __fix_to_virt(__end_of_fixed_addresses - 1) & PMD_MASK;
 	fixrange_init(vaddr, vaddr + PMD_SIZE, swapper_pg_dir);
 }
+
+pgprot_t vm_get_page_prot(unsigned long vm_flags)
+{
+	switch (vm_flags & (VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)) {
+	case VM_NONE:
+		return PAGE_NONE;
+	case VM_READ:
+	case VM_WRITE:
+	case VM_WRITE | VM_READ:
+	case VM_EXEC:
+	case VM_EXEC | VM_READ:
+	case VM_EXEC | VM_WRITE:
+	case VM_EXEC | VM_WRITE | VM_READ:
+		return PAGE_READ;
+	case VM_SHARED:
+		return PAGE_NONE;
+	case VM_SHARED | VM_READ:
+		return PAGE_READ;
+	case VM_SHARED | VM_WRITE:
+	case VM_SHARED | VM_WRITE | VM_READ:
+		return PAGE_WRITE;
+	case VM_SHARED | VM_EXEC:
+	case VM_SHARED | VM_EXEC | VM_READ:
+		return PAGE_READ;
+	case VM_SHARED | VM_EXEC | VM_WRITE:
+	case VM_SHARED | VM_EXEC | VM_WRITE | VM_READ:
+		return PAGE_WRITE;
+	default:
+		BUILD_BUG();
+	}
+}
+EXPORT_SYMBOL(vm_get_page_prot);
-- 
2.25.1
[PATCH 3/6] mips/mm: Enable ARCH_HAS_VM_GET_PAGE_PROT
This defines and exports a platform specific custom vm_get_page_prot() via subscribing ARCH_HAS_VM_GET_PAGE_PROT. Subsequently all __SXXX and __PXXX macros can be dropped which are no longer needed.

Cc: Thomas Bogendoerfer
Cc: linux-m...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Acked-by: Thomas Bogendoerfer
Signed-off-by: Anshuman Khandual
---
 arch/mips/Kconfig               |  1 +
 arch/mips/include/asm/pgtable.h | 22 --
 arch/mips/mm/cache.c            | 60 +++++---
 3 files changed, 36 insertions(+), 47 deletions(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index db09d45d59ec..d0b7eb11ec81 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -14,6 +14,7 @@ config MIPS
 	select ARCH_HAS_STRNLEN_USER
 	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
+	select ARCH_HAS_VM_GET_PAGE_PROT
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_KEEP_MEMBLOCK
 	select ARCH_SUPPORTS_UPROBES
diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
index 374c6322775d..6caec386ad2f 100644
--- a/arch/mips/include/asm/pgtable.h
+++ b/arch/mips/include/asm/pgtable.h
@@ -41,28 +41,6 @@ struct vm_area_struct;
  * by reasonable means..
  */
 
-/*
- * Dummy values to fill the table in mmap.c
- * The real values will be generated at runtime
- */
-#define __P000 __pgprot(0)
-#define __P001 __pgprot(0)
-#define __P010 __pgprot(0)
-#define __P011 __pgprot(0)
-#define __P100 __pgprot(0)
-#define __P101 __pgprot(0)
-#define __P110 __pgprot(0)
-#define __P111 __pgprot(0)
-
-#define __S000 __pgprot(0)
-#define __S001 __pgprot(0)
-#define __S010 __pgprot(0)
-#define __S011 __pgprot(0)
-#define __S100 __pgprot(0)
-#define __S101 __pgprot(0)
-#define __S110 __pgprot(0)
-#define __S111 __pgprot(0)
-
 extern unsigned long _page_cachable_default;
 extern void __update_cache(unsigned long address, pte_t pte);
 
diff --git a/arch/mips/mm/cache.c b/arch/mips/mm/cache.c
index 7be7240f7703..012862004431 100644
--- a/arch/mips/mm/cache.c
+++ b/arch/mips/mm/cache.c
@@ -159,30 +159,6 @@ EXPORT_SYMBOL(_page_cachable_default);
 
 #define PM(p)	__pgprot(_page_cachable_default | (p))
 
-static inline void setup_protection_map(void)
-{
-	protection_map[0] = PM(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_NO_READ);
-	protection_map[1] = PM(_PAGE_PRESENT | _PAGE_NO_EXEC);
-	protection_map[2] = PM(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_NO_READ);
-	protection_map[3] = PM(_PAGE_PRESENT | _PAGE_NO_EXEC);
-	protection_map[4] = PM(_PAGE_PRESENT);
-	protection_map[5] = PM(_PAGE_PRESENT);
-	protection_map[6] = PM(_PAGE_PRESENT);
-	protection_map[7] = PM(_PAGE_PRESENT);
-
-	protection_map[8] = PM(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_NO_READ);
-	protection_map[9] = PM(_PAGE_PRESENT | _PAGE_NO_EXEC);
-	protection_map[10] = PM(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_WRITE |
-				_PAGE_NO_READ);
-	protection_map[11] = PM(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_WRITE);
-	protection_map[12] = PM(_PAGE_PRESENT);
-	protection_map[13] = PM(_PAGE_PRESENT);
-	protection_map[14] = PM(_PAGE_PRESENT | _PAGE_WRITE);
-	protection_map[15] = PM(_PAGE_PRESENT | _PAGE_WRITE);
-}
-
-#undef PM
-
 void cpu_cache_init(void)
 {
 	if (cpu_has_3k_cache) {
@@ -201,6 +177,40 @@ void cpu_cache_init(void)
 		octeon_cache_init();
 	}
+}
 
-	setup_protection_map();
+pgprot_t vm_get_page_prot(unsigned long vm_flags)
+{
+	switch (vm_flags & (VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)) {
+	case VM_NONE:
+		return PM(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_NO_READ);
+	case VM_READ:
+		return PM(_PAGE_PRESENT | _PAGE_NO_EXEC);
+	case VM_WRITE:
+		return PM(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_NO_READ);
+	case VM_WRITE | VM_READ:
+		return PM(_PAGE_PRESENT | _PAGE_NO_EXEC);
+	case VM_EXEC:
+	case VM_EXEC | VM_READ:
+	case VM_EXEC | VM_WRITE:
+	case VM_EXEC | VM_WRITE | VM_READ:
+		return PM(_PAGE_PRESENT);
+	case VM_SHARED:
+		return PM(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_NO_READ);
+	case VM_SHARED | VM_READ:
+		return PM(_PAGE_PRESENT | _PAGE_NO_EXEC);
+	case VM_SHARED | VM_WRITE:
+		return PM(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_WRITE | _PAGE_NO_READ);
+	case VM_SHARED | VM_WRITE | VM_READ:
+		return PM(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_WRITE);
+	case VM_SHARED | VM_EXEC:
+	case VM_SHARED | VM_EXEC | VM_READ:
+		return PM(_PAGE_PRESENT);
+	case VM_SHARED | VM_EXEC | VM_WRITE:
+	case VM_SHARED | VM_EXEC | VM_WRITE | VM_READ:
+		return PM(_PAGE_PRESENT | _PAGE_WRITE);
+	default:
+		BUILD_BUG();
+	}
 }
+EXPORT_SYMBOL(vm_get_page_prot);
[PATCH 2/6] s390/mm: Enable ARCH_HAS_VM_GET_PAGE_PROT
This defines and exports a platform specific custom vm_get_page_prot() via subscribing ARCH_HAS_VM_GET_PAGE_PROT. Subsequently all __SXXX and __PXXX macros can be dropped which are no longer needed.

Cc: Heiko Carstens
Cc: Vasily Gorbik
Cc: linux-s...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Acked-by: Sven Schnelle
Acked-by: Alexander Gordeev
Signed-off-by: Anshuman Khandual
---
 arch/s390/Kconfig               |  1 +
 arch/s390/include/asm/pgtable.h | 17 -
 arch/s390/mm/mmap.c             | 33 +
 3 files changed, 34 insertions(+), 17 deletions(-)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index b17239ae7bd4..cdcf678deab1 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -81,6 +81,7 @@ config S390
 	select ARCH_HAS_SYSCALL_WRAPPER
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
 	select ARCH_HAS_VDSO_DATA
+	select ARCH_HAS_VM_GET_PAGE_PROT
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select ARCH_INLINE_READ_LOCK
 	select ARCH_INLINE_READ_LOCK_BH
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index a397b072a580..c63a05b5368a 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -424,23 +424,6 @@ static inline int is_module_addr(void *addr)
  * implies read permission.
  */
          /*xwr*/
-#define __P000	PAGE_NONE
-#define __P001	PAGE_RO
-#define __P010	PAGE_RO
-#define __P011	PAGE_RO
-#define __P100	PAGE_RX
-#define __P101	PAGE_RX
-#define __P110	PAGE_RX
-#define __P111	PAGE_RX
-
-#define __S000	PAGE_NONE
-#define __S001	PAGE_RO
-#define __S010	PAGE_RW
-#define __S011	PAGE_RW
-#define __S100	PAGE_RX
-#define __S101	PAGE_RX
-#define __S110	PAGE_RWX
-#define __S111	PAGE_RWX
 
 /*
  * Segment entry (large page) protection definitions.
diff --git a/arch/s390/mm/mmap.c b/arch/s390/mm/mmap.c
index d545f5c39f7e..11d75b8d5ec0 100644
--- a/arch/s390/mm/mmap.c
+++ b/arch/s390/mm/mmap.c
@@ -188,3 +188,36 @@ void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
 		mm->get_unmapped_area = arch_get_unmapped_area_topdown;
 	}
 }
+
+pgprot_t vm_get_page_prot(unsigned long vm_flags)
+{
+	switch (vm_flags & (VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)) {
+	case VM_NONE:
+		return PAGE_NONE;
+	case VM_READ:
+	case VM_WRITE:
+	case VM_WRITE | VM_READ:
+		return PAGE_RO;
+	case VM_EXEC:
+	case VM_EXEC | VM_READ:
+	case VM_EXEC | VM_WRITE:
+	case VM_EXEC | VM_WRITE | VM_READ:
+		return PAGE_RX;
+	case VM_SHARED:
+		return PAGE_NONE;
+	case VM_SHARED | VM_READ:
+		return PAGE_RO;
+	case VM_SHARED | VM_WRITE:
+	case VM_SHARED | VM_WRITE | VM_READ:
+		return PAGE_RW;
+	case VM_SHARED | VM_EXEC:
+	case VM_SHARED | VM_EXEC | VM_READ:
+		return PAGE_RX;
+	case VM_SHARED | VM_EXEC | VM_WRITE:
+	case VM_SHARED | VM_EXEC | VM_WRITE | VM_READ:
+		return PAGE_RWX;
+	default:
+		BUILD_BUG();
+	}
+}
+EXPORT_SYMBOL(vm_get_page_prot);
-- 
2.25.1
[PATCH 1/6] mm/mmap: Restrict generic protection_map[] array visibility
Restrict generic protection_map[] array visibility only for platforms which do not enable ARCH_HAS_VM_GET_PAGE_PROT. Other platforms that do define their own vm_get_page_prot(), enabling ARCH_HAS_VM_GET_PAGE_PROT, can have a private static protection_map[] still implementing an array look up. These private protection_map[] arrays can do without the __PXXX/__SXXX macros, making them redundant and allowing them to be dropped.

But platforms which do not define their custom vm_get_page_prot() enabling ARCH_HAS_VM_GET_PAGE_PROT will still have to provide __PXXX/__SXXX macros. Although this now provides a method for other willing platforms to drop off __PXXX/__SXXX macros completely.

Cc: Catalin Marinas
Cc: Will Deacon
Cc: Michael Ellerman
Cc: Paul Mackerras
Cc: "David S. Miller"
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Andrew Morton
Cc: x...@kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Anshuman Khandual
---
 arch/arm64/include/asm/pgtable-prot.h | 18 --
 arch/arm64/mm/mmap.c                  | 21 +
 arch/powerpc/include/asm/pgtable.h    |  2 ++
 arch/powerpc/mm/book3s64/pgtable.c    | 20
 arch/sparc/include/asm/pgtable_32.h   |  2 ++
 arch/sparc/include/asm/pgtable_64.h   | 19 ---
 arch/sparc/mm/init_64.c               | 20
 arch/x86/include/asm/pgtable_types.h  | 19 ---
 arch/x86/mm/pgprot.c                  | 19 +++
 include/linux/mm.h                    |  2 ++
 mm/mmap.c                             |  2 +-
 11 files changed, 87 insertions(+), 57 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index 62e0ebeed720..9b165117a454 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -89,24 +89,6 @@ extern bool arm64_use_ng_mappings;
 #define PAGE_READONLY_EXEC	__pgprot(_PAGE_DEFAULT | PTE_USER | PTE_RDONLY | PTE_NG | PTE_PXN)
 #define PAGE_EXECONLY		__pgprot(_PAGE_DEFAULT | PTE_RDONLY | PTE_NG | PTE_PXN)
 
-#define __P000  PAGE_NONE
-#define __P001  PAGE_READONLY
-#define __P010  PAGE_READONLY
-#define __P011  PAGE_READONLY
-#define __P100  PAGE_READONLY_EXEC	/* PAGE_EXECONLY if Enhanced PAN */
-#define __P101  PAGE_READONLY_EXEC
-#define __P110  PAGE_READONLY_EXEC
-#define __P111  PAGE_READONLY_EXEC
-
-#define __S000  PAGE_NONE
-#define __S001  PAGE_READONLY
-#define __S010  PAGE_SHARED
-#define __S011  PAGE_SHARED
-#define __S100  PAGE_READONLY_EXEC	/* PAGE_EXECONLY if Enhanced PAN */
-#define __S101  PAGE_READONLY_EXEC
-#define __S110  PAGE_SHARED_EXEC
-#define __S111  PAGE_SHARED_EXEC
-
 #endif /* __ASSEMBLY__ */
 #endif /* __ASM_PGTABLE_PROT_H */
diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c
index 78e9490f748d..8f5b7ce857ed 100644
--- a/arch/arm64/mm/mmap.c
+++ b/arch/arm64/mm/mmap.c
@@ -13,6 +13,27 @@
 #include
 #include
 
+static pgprot_t protection_map[16] __ro_after_init = {
+	[VM_NONE]					= PAGE_NONE,
+	[VM_READ]					= PAGE_READONLY,
+	[VM_WRITE]					= PAGE_READONLY,
+	[VM_WRITE | VM_READ]				= PAGE_READONLY,
+	/* PAGE_EXECONLY if Enhanced PAN */
+	[VM_EXEC]					= PAGE_READONLY_EXEC,
+	[VM_EXEC | VM_READ]				= PAGE_READONLY_EXEC,
+	[VM_EXEC | VM_WRITE]				= PAGE_READONLY_EXEC,
+	[VM_EXEC | VM_WRITE | VM_READ]			= PAGE_READONLY_EXEC,
+	[VM_SHARED]					= PAGE_NONE,
+	[VM_SHARED | VM_READ]				= PAGE_READONLY,
+	[VM_SHARED | VM_WRITE]				= PAGE_SHARED,
+	[VM_SHARED | VM_WRITE | VM_READ]		= PAGE_SHARED,
+	/* PAGE_EXECONLY if Enhanced PAN */
+	[VM_SHARED | VM_EXEC]				= PAGE_READONLY_EXEC,
+	[VM_SHARED | VM_EXEC | VM_READ]			= PAGE_READONLY_EXEC,
+	[VM_SHARED | VM_EXEC | VM_WRITE]		= PAGE_SHARED_EXEC,
+	[VM_SHARED | VM_EXEC | VM_WRITE | VM_READ]	= PAGE_SHARED_EXEC
+};
+
 /*
  * You really shouldn't be using read() or write() on /dev/mem. This might go
  * away in the future.
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index d564d0ecd4cd..8ed2a80c896e 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -21,6 +21,7 @@ struct mm_struct;
 #endif /* !CONFIG_PPC_BOOK3S */
 
 /* Note due to the way vm flags are laid out, the bits are XWR */
+#ifndef CONFIG_ARCH_HAS_VM_GET_PAGE_PROT
 #define __P000	PAGE_NONE
 #define __P001	PAGE_READONLY
 #define __P010	PAGE_COPY
[PATCH 0/6] mm/mmap: Enable more platforms with ARCH_HAS_VM_GET_PAGE_PROT
From the last discussion [1], some more platforms (s390, mips, csky, nios2, openrisc) were willing to enable ARCH_HAS_VM_GET_PAGE_PROT and also provide custom vm_get_page_prot() via switch case statement implementation without any objection. All those platform specific patches have already been acked.

This series makes protection_map[] array private on platforms which define their own vm_get_page_prot() via ARCH_HAS_VM_GET_PAGE_PROT, and also drops off their __PXXX/__SXXX macros. This also enables new platforms, as in this series, to drop off their __PXXX/__SXXX macros as the generic protection_map[] is no longer visible to them.

[1] https://lore.kernel.org/all/1646045273-9343-2-git-send-email-anshuman.khand...@arm.com/

This series applies on current mainline and also has been build tested on multiple platforms.

Cc: Catalin Marinas
Cc: Will Deacon
Cc: Michael Ellerman
Cc: Paul Mackerras
Cc: "David S. Miller"
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Heiko Carstens
Cc: Vasily Gorbik
Cc: Thomas Bogendoerfer
Cc: Geert Uytterhoeven
Cc: Dinh Nguyen
Cc: Jonas Bonn
Cc: Andrew Morton
Cc: x...@kernel.org
Cc: openr...@lists.librecores.org
Cc: linux-c...@vger.kernel.org
Cc: linux-m...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-ker...@vger.kernel.org

Anshuman Khandual (6):
  mm/mmap: Restrict generic protection_map[] array visibility
  s390/mm: Enable ARCH_HAS_VM_GET_PAGE_PROT
  mips/mm: Enable ARCH_HAS_VM_GET_PAGE_PROT
  csky/mm: Enable ARCH_HAS_VM_GET_PAGE_PROT
  nios2/mm: Enable ARCH_HAS_VM_GET_PAGE_PROT
  openrisc/mm: Enable ARCH_HAS_VM_GET_PAGE_PROT

 arch/arm64/include/asm/pgtable-prot.h | 18
 arch/arm64/mm/mmap.c                  | 21 ++
 arch/csky/Kconfig                     |  1 +
 arch/csky/include/asm/pgtable.h       | 18
 arch/csky/mm/init.c                   | 32 ++
 arch/mips/Kconfig                     |  1 +
 arch/mips/include/asm/pgtable.h       | 22 --
 arch/mips/mm/cache.c                  | 60 ---
 arch/nios2/Kconfig                    |  1 +
 arch/nios2/include/asm/pgtable.h      | 24 ---
 arch/nios2/mm/init.c                  | 47 +
 arch/openrisc/Kconfig                 |  1 +
 arch/openrisc/include/asm/pgtable.h   | 18
 arch/openrisc/mm/init.c               | 41 ++
 arch/powerpc/include/asm/pgtable.h    |  2 +
 arch/powerpc/mm/book3s64/pgtable.c    | 20 +
 arch/s390/Kconfig                     |  1 +
 arch/s390/include/asm/pgtable.h       | 17
 arch/s390/mm/mmap.c                   | 33 +++
 arch/sparc/include/asm/pgtable_32.h   |  2 +
 arch/sparc/include/asm/pgtable_64.h   | 19 -
 arch/sparc/mm/init_64.c               | 20 +
 arch/x86/include/asm/pgtable_types.h  | 19 -
 arch/x86/mm/pgprot.c                  | 19 +
 include/linux/mm.h                    |  2 +
 mm/mmap.c                             |  2 +-
 26 files changed, 280 insertions(+), 181 deletions(-)

-- 
2.25.1
Re: [PATCH 2/6] powerpc: Provide syscall wrapper
On Wed, Jun 1, 2022 at 7:48 AM Rohan McLure wrote:
>
> Syscall wrapper implemented as per s390, x86, arm64, providing the
> option for gprs to be cleared on entry to the kernel, reducing caller
> influence on speculation within the syscall routine. The wrapper is a
> macro that emits syscall handler implementations with parameters passed
> via the stack pointer.
>
> For platforms supporting this syscall wrapper, emit symbols with the usual
> in-register parameters (`sys...`) to support calls to syscall handlers
> from within the kernel.

Nice work!

> Syscalls are wrapped on all platforms except the Cell processor. SPUs require
> access to syscall prototypes, which are omitted when ARCH_HAS_SYSCALL_WRAPPER
> is enabled.

Right, I think it's ok to leave out the SPU side. In the long run, I
would like to go back to requiring the prototypes for everything on all
architectures, to enforce type checking, but that's a separate piece of
work.

> +/*
> + * For PowerPC specific syscall implementations, wrapper takes exact name and
> + * return type for a given function.
> + */
> +
> +#ifdef CONFIG_ARCH_HAS_SYSCALL_WRAPPER
> +#define PPC_SYSCALL_DEFINE(x, type, name, ...)				\
> +	asmlinkage type __powerpc_##name(const struct pt_regs *regs);	\
> +	ALLOW_ERROR_INJECTION(__powerpc_##name, ERRNO);			\
> +	type sys_##name(__MAP(x,__SC_DECL,__VA_ARGS__));		\
> +	static type __se_##name(__MAP(x,__SC_LONG,__VA_ARGS__));	\
> +	static inline type __do_##name(__MAP(x,__SC_DECL,__VA_ARGS__));	\

What is the benefit of having a separate set of macros for this? I
think that adds more complexity than it saves in the end.

> @@ -68,52 +69,63 @@ unsigned long compat_sys_mmap2(unsigned long addr, size_t len,
>  #define merge_64(high, low) ((u64)high << 32) | low
>  #endif
>  
> -compat_ssize_t compat_sys_pread64(unsigned int fd, char __user *ubuf, compat_size_t count,
> -				  u32 reg6, u32 pos1, u32 pos2)
> +PPC_SYSCALL_DEFINE(6, compat_ssize_t, compat_sys_pread64,
> +		   unsigned int, fd,
> +		   char __user *, ubuf, compat_size_t, count,
> +		   u32, reg6, u32, pos1, u32, pos2)
>  {
>  	return ksys_pread64(fd, ubuf, count, merge_64(pos1, pos2));
>  }

We now have generalized versions of most of these system calls, as of
5.19-rc1 with the addition of the riscv compat support. I think it would
be best to try removing the powerpc private versions wherever possible
and use the common version, modifying it further where necessary.

If this doesn't work for some of the syscalls, can you use the
COMPAT_SYSCALL_DEFINE for those in place of PPC_SYSCALL_DEFINE?

Arnd
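To make that last suggestion concrete, here is a sketch of the pread64 handler defined with the generic macro instead of PPC_SYSCALL_DEFINE. The ppc_pread64 name is illustrative; the argument layout follows the quoted hunk, and merge_64()/ksys_pread64() are as above:

/*
 * Sketch: a powerpc-private compat handler (kept because of its
 * register-pair argument layout) defined via the generic
 * COMPAT_SYSCALL_DEFINE6 rather than a new PPC_SYSCALL_DEFINE.
 */
COMPAT_SYSCALL_DEFINE6(ppc_pread64,
		       unsigned int, fd, char __user *, ubuf,
		       compat_size_t, count, u32, reg6, u32, pos1, u32, pos2)
{
	return ksys_pread64(fd, ubuf, count, merge_64(pos1, pos2));
}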
Re: [PATCH 0/2] Disabling NMI watchdog during LPM's memory transfer
On 02/06/2022, 19:58:31, Nathan Lynch wrote:
> Laurent Dufour writes:
>> When a partition is transferred, once it arrives at the destination node,
>> the partition is active but much of its memory must be transferred from
>> the start node.
>>
>> It depends on the activity in the partition, but the more CPUs the
>> partition has, the more memory is likely to have to be transferred. This
>> causes latency when accessing pages that need to be transferred, and
>> often, for large partitions, it triggers the NMI watchdog.
>
> It also triggers warnings from other watchdogs and subsystems that
> have soft latency requirements - softlockup, RCU, workqueue. The issue
> is more general than the NMI watchdog.

I agree, but, as you can read in the title, this series focuses on the
NMI watchdog, which may have dangerous side effects.

>> The NMI watchdog causes the CPU stack to dump where it appears to be
>> stuck. In this case, it does not bring much information since it can
>> happen during any memory access of the kernel.
>
> When the site of a watchdog backtrace shows a thread stuck on a routine
> memory access as opposed to something like a lock acquisition, that is
> actually useful information that shouldn't be discarded. It tells us the
> platform is failing to adequately virtualize partition memory. This
> isn't a benign situation and it's likely to unacceptably affect real
> workloads. The kernel is ideally situated to detect and warn about this.

I agree, but the information provided is misleading most of the time,
pointing at whichever part of the kernel happened to take the last of a
series of page faults. There is no real added value, since it is well
known that the memory transfer introduces latency that the kernel
detects. Furthermore, soft lockups are still triggered and report this
latency as well, without any dangerous side effect.

>> In addition, the NMI interrupt mechanism is not safe and can crash the
>> system if the interrupt is taken while MSR[RI]=0.
>
> This sounds like a general problem with that facility that isn't
> specific to partition migration? Maybe it should be disabled altogether
> until that can be fixed?

We already discussed that with Nick, and it sounds like it is not easy
to fix. Furthermore, the NMI watchdog is considered a last-resort option
for analyzing a potentially dying system, so the risk of generating a
crash through the NMI interrupt looks acceptable there; disabling it
entirely because of that is not the right option. In the LPM case,
however, the system is only suffering from the LPM's latency; it is not
really dying or in bad shape, so that risk is too expensive.

Fixing the latency at the source is definitely the best option, and the
PHYP team is already investigating that. But, in the meantime, there is
a way to prevent the system from dying because of that side effect:
disabling the NMI watchdog during the memory transfer.

>> Given how often hard lockups are detected when transferring large
>> partitions, it seems best to disable the watchdog NMI until the memory
>> transfer from the start node is complete.
>
> At this time, I'm far from convinced. Disabling the watchdog is going to
> make the underlying problems in the platform and/or network harder to
> understand.

I was also reluctant, and would like the NMI watchdog to remain active
during LPM. But there is currently no other way to work around the LPM's
latency and its attendant risk of a system crash.

I've spent a lot of time analyzing many crashes happening during LPM, and
all of them now point to the NMI watchdog issue. Furthermore, on a system
with thousands of CPUs, I saw a system crash because a CPU was not able
to respond to the NMI interrupt in time (1s), thus triggering a panic.

In addition, we now know that an RTAS call made right after the system is
running again on the arrival side takes ages and triggers the NMI
watchdog most of the time. There are ongoing investigations to clarify
where and how this latency is happening.

I'm not excluding any other issue in the Linux kernel, but right now,
disabling the NMI watchdog looks to be the best option to prevent a
system crash during LPM. I'm hoping that the PHYP team will be able to
improve that latency. At that time, this commit can be reverted, but
until then, I don't see how we can do without that workaround.

Laurent.
Re: [PATCH 2/6] powerpc: Provide syscall wrapper
On 03/06/2022 at 09:09, Andrew Donnellan wrote:
> On Fri, 2022-06-03 at 13:24 +1000, Rohan McLure wrote:
>> The implementation of ppc_personality can be immediately reworked to
>> call ksys_personality, but I can't do the same for sys_old_select, for
>> example, which is required to implement ppc_select. As such we emit
>> both.
>
> For ppc_select, I suggest we resurrect
> https://lore.kernel.org/lkml/5811950d-ef14-d416-35e6-d694ef920...@csgroup.eu/T/#u
> and just get rid of the hack.

Not sure I understand: what would you like to resurrect? Do you want to
resurrect the discussion, or revert commit fd69d544b0e7
("powerpc/syscalls: Use sys_old_select() in ppc_select()")?

We are talking about a 25-year-old hack.

However, calling sys_old_select() from ppc_select() is just a tail call,
so I can't see why it would require duplicating the pt_regs:

c0004ddc <ppc_select>:
c0004ddc:	28 03 0f ff 	cmplwi  r3,4095
c0004de0:	41 81 00 08 	bgt     c0004de8 <ppc_select+0xc>
c0004de4:	48 15 b9 98 	b       c016077c <__se_sys_select>
c0004de8:	48 15 bc c0 	b       c0160aa8 <__se_sys_old_select>

Christophe
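For reference, the C source corresponding to that disassembly, as a sketch of ppc_select after commit fd69d544b0e7 (types per the 32-bit kernel of that era):

/*
 * Sketch of ppc_select: the legacy calling convention (first argument
 * >= 4096 means it is really a pointer to an argument block) is
 * forwarded to sys_old_select(), otherwise this is a plain select(2).
 * Either path is a tail call, matching the two unconditional branches
 * in the disassembly above.
 */
int ppc_select(int n, fd_set __user *inp, fd_set __user *outp,
	       fd_set __user *exp, struct __kernel_old_timeval __user *tvp)
{
	if ((unsigned long)n >= 4096)
		return sys_old_select((void __user *)n);

	return sys_select(n, inp, outp, exp, tvp);
}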
Re: [PATCH 2/6] powerpc: Provide syscall wrapper
On Fri, 2022-06-03 at 13:24 +1000, Rohan McLure wrote:
> The implementation of ppc_personality can be immediately reworked to
> call ksys_personality, but I can't do the same for sys_old_select, for
> example, which is required to implement ppc_select. As such we emit
> both.

For ppc_select, I suggest we resurrect
https://lore.kernel.org/lkml/5811950d-ef14-d416-35e6-d694ef920...@csgroup.eu/T/#u
and just get rid of the hack.

Andrew

-- 
Andrew Donnellan              OzLabs, ADL Canberra
a...@linux.ibm.com             IBM Australia Limited