Re: [PATCH] powerpc/vdso32: Drop -mabi=elfv1 for 32 bit objects
On 10/01/2019 at 02:42, Joel Stanley wrote:
> From: Daniel Axtens
>
> All 64-bit objects need to specify the flag to be compiled correctly,
> we just don't need it for 32-bit objects. GCC just ignored it, but
> clang doesn't.
>
> Link: https://github.com/ClangBuiltLinux/linux/issues/240
> Signed-off-by: Daniel Axtens
> Signed-off-by: Joel Stanley
> ---
>  arch/powerpc/kernel/vdso32/Makefile | 14 ++
>  1 file changed, 14 insertions(+)
>
> diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile
> index 50112d4473bb..6bd41756e0c7 100644
> --- a/arch/powerpc/kernel/vdso32/Makefile
> +++ b/arch/powerpc/kernel/vdso32/Makefile
> @@ -34,6 +34,20 @@ obj-y += vdso32_wrapper.o
>  extra-y += vdso32.lds
>  CPPFLAGS_vdso32.lds += -P -C -Upowerpc
> +# clang refuses to accept -mabi=elfv1 for when using the
> +# 64-bit target in 32-bit mode
> +ifdef CONFIG_CC_IS_CLANG

If -mabi=elfv1 is unneeded even for GCC, why depend on CLANG?

> +ifdef CONFIG_PPC64
> +AFLAGS_REMOVE_getcpu.o += -mabi=elfv1
> +endif

Why is only this one inside the ifdef? The powerpc Makefile only sets
-mabi=elfv1 when CONFIG_PPC64 is set, so all objects should be handled
the same way.

And would it do any harm to just do it all the time, regardless of
CONFIG_PPC64?

Christophe

> +AFLAGS_REMOVE_sigtramp.o += -mabi=elfv1
> +AFLAGS_REMOVE_gettimeofday.o += -mabi=elfv1
> +AFLAGS_REMOVE_datapage.o += -mabi=elfv1
> +AFLAGS_REMOVE_cacheflush.o += -mabi=elfv1
> +AFLAGS_REMOVE_note.o += -mabi=elfv1
> +endif
> +
> +
>  # Force dependency (incbin is bad)
>  $(obj)/vdso32_wrapper.o : $(obj)/vdso32.so
Re: [PATCH v2 18/34] dt-bindings: arm: Convert FSL board/soc bindings to json-schema
On Sat, Dec 08, 2018 at 09:58:37AM +0800, Shawn Guo wrote:
> On Thu, Dec 06, 2018 at 05:33:13PM -0600, Rob Herring wrote:
> > On Wed, Dec 5, 2018 at 8:32 PM Shawn Guo wrote:
> > >
> > > On Mon, Dec 03, 2018 at 03:32:07PM -0600, Rob Herring wrote:
> > > > Convert Freescale SoC bindings to DT schema format using json-schema.
> > > >
> > > > Cc: Shawn Guo
> > > > Cc: Mark Rutland
> > > > Cc: devicet...@vger.kernel.org
> > > > Signed-off-by: Rob Herring
> > > > ---
> > > >  .../devicetree/bindings/arm/armadeus.txt      |   6 -
> > > >  Documentation/devicetree/bindings/arm/bhf.txt |   6 -
> > > >  .../bindings/arm/compulab-boards.txt          |  25 --
> > > >  Documentation/devicetree/bindings/arm/fsl.txt | 229 --
> > > >  .../devicetree/bindings/arm/fsl.yaml          | 214
> > >
> > > Rob,
> > >
> > > I do not have any changes on bindings/arm/fsl.txt queued for 4.21 on my
> > > tree, so please send it via your tree.
> >
> > What about:
> >
> > c386f362957b dt-bindings: Add compatible string for LS1028A-QDS
> > 3671cd57de06 dt-bindings: ls1012a: Add FRWY-LS1012A device tree binding
>
> Ah, sorry, I only checked on imx/dt branch and forgot imx/dt64. I will
> drop the changes on fsl.txt and update fsl.yaml after it hits mainline.

What happened to this? It seems the patch did not hit v5.0-rc1.

Shawn
[RFC PATCH kernel] powerpc/stack_protector: Fix external modules building
Commit c3ff2a519 ("powerpc/32: add stack protector support") added stack
protector support, so powerpc's "prepare" target now depends on prepare0
(via the stack_protector_prepare target). This works fine until we try to
build an external module, where it fails with:

Run: 'make -j128 SYSSRC=/home/aik/p/kernel SYSOUT=/home/aik/pbuild/kernel-le-pseries/ ARCH=powerpc'
make[1]: Entering directory '/home/aik/p/kernel'
make[2]: Entering directory '/home/aik/pbuild/kernel-le-pseries'
make[2]: *** No rule to make target 'prepare0', needed by 'stack_protector_prepare'. Stop.

The reason is that the main Linux Makefile defines "prepare0" only if
KBUILD_EXTMOD == "".

This hacks powerpc's Makefile to make external modules build again.

Fixes: c3ff2a519 ("powerpc/32: add stack protector support")
Signed-off-by: Alexey Kardashevskiy
---

It has been suggested that there is a better way of fixing this, hence RFC.

---
 arch/powerpc/Makefile | 4
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 488c9ed..0492f62 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -419,7 +419,11 @@ archheaders:
 ifdef CONFIG_STACKPROTECTOR
 prepare: stack_protector_prepare
+ifeq ($(KBUILD_EXTMOD),)
 stack_protector_prepare: prepare0
+else
+stack_protector_prepare:
+endif
 ifdef CONFIG_PPC64
 	$(eval KBUILD_CFLAGS += -mstack-protector-guard-offset=$(shell awk '{if ($$2 == "PACA_CANARY") print $$3;}' include/generated/asm-offsets.h))
 else
--
2.17.1
[PATCH] ibmvscsi: use GFP_KERNEL with dma_alloc_coherent in initialize_event_pool
During driver probe we allocate a DMA region for our event pool.
Currently, zero is passed for the gfp_flags parameter. Driver probe
callbacks run in process context and we hold no locks, so we can sleep
here if necessary.

Fix by passing GFP_KERNEL explicitly to dma_alloc_coherent().

Signed-off-by: Tyrel Datwyler
---
 drivers/scsi/ibmvscsi/ibmvscsi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
index cb8535e..10d5e77 100644
--- a/drivers/scsi/ibmvscsi/ibmvscsi.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
@@ -465,7 +465,7 @@ static int initialize_event_pool(struct event_pool *pool,
 	pool->iu_storage =
 		dma_alloc_coherent(hostdata->dev,
 				   pool->size * sizeof(*pool->iu_storage),
-				   &pool->iu_token, 0);
+				   &pool->iu_token, GFP_KERNEL);
 	if (!pool->iu_storage) {
 		kfree(pool->events);
 		return -ENOMEM;
--
1.8.3.1
[PATCH] ibmvscsi: use GFP_ATOMIC with dma_alloc_coherent in map_sg_data
While mapping DMA for the scatter list when a SCSI command is queued,
the existing call to dma_alloc_coherent() in our map_sg_data() function
passes zero for the gfp_flags parameter. We are most definitely in
atomic context at this point, as queue_command() is called in softirq
context and, further, we are holding the SCSI host lock, which is a
spinlock.

Fix this by passing GFP_ATOMIC to dma_alloc_coherent() to prevent any
sort of sleeping-in-atomic-context deadlock.

Fixes: 4dddbc26c389 ("[SCSI] ibmvscsi: handle large scatter/gather lists")
Cc: sta...@vger.kernel.org
Signed-off-by: Tyrel Datwyler
---
 drivers/scsi/ibmvscsi/ibmvscsi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
index 1135e74..cb8535e 100644
--- a/drivers/scsi/ibmvscsi/ibmvscsi.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
@@ -731,7 +731,7 @@ static int map_sg_data(struct scsi_cmnd *cmd,
 		evt_struct->ext_list = (struct srp_direct_buf *)
 			dma_alloc_coherent(dev,
 					   SG_ALL * sizeof(struct srp_direct_buf),
-					   &evt_struct->ext_list_token, 0);
+					   &evt_struct->ext_list_token, GFP_ATOMIC);
 		if (!evt_struct->ext_list) {
 			if (!firmware_has_feature(FW_FEATURE_CMO))
 				sdev_printk(KERN_ERR, cmd->device,
--
1.8.3.1
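Both ibmvscsi patches apply the same rule: a sleeping allocation (GFP_KERNEL) is only acceptable in process context with no spinlock held, and anything else needs GFP_ATOMIC. The sketch below is a hypothetical userspace model of that decision, not kernel code: the enum, the struct and model_pick_gfp() are made-up stand-ins, and the real kernel rules cover more cases (for example hard IRQ context or disabled preemption).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two GFP choices discussed above. */
enum model_gfp {
	MODEL_GFP_KERNEL, /* allocation may sleep to reclaim memory */
	MODEL_GFP_ATOMIC, /* allocation must never sleep */
};

/* Simplified description of the calling context (assumption). */
struct model_context {
	bool in_softirq;     /* e.g. the queue_command() path, per the patch */
	bool holds_spinlock; /* e.g. the SCSI host lock */
};

/* Sleeping is only allowed in process context with no spinlock held. */
enum model_gfp model_pick_gfp(const struct model_context *ctx)
{
	if (ctx->in_softirq || ctx->holds_spinlock)
		return MODEL_GFP_ATOMIC;
	return MODEL_GFP_KERNEL;
}
```

In this model, the probe path described for initialize_event_pool() maps to MODEL_GFP_KERNEL, while the queuing path described for map_sg_data() maps to MODEL_GFP_ATOMIC.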
[PATCH] powerpc/vdso32: Drop -mabi=elfv1 for 32 bit objects
From: Daniel Axtens

All 64-bit objects need to specify the flag to be compiled correctly,
we just don't need it for 32-bit objects. GCC just ignored it, but
clang doesn't.

Link: https://github.com/ClangBuiltLinux/linux/issues/240
Signed-off-by: Daniel Axtens
Signed-off-by: Joel Stanley
---
 arch/powerpc/kernel/vdso32/Makefile | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile
index 50112d4473bb..6bd41756e0c7 100644
--- a/arch/powerpc/kernel/vdso32/Makefile
+++ b/arch/powerpc/kernel/vdso32/Makefile
@@ -34,6 +34,20 @@ obj-y += vdso32_wrapper.o
 extra-y += vdso32.lds
 CPPFLAGS_vdso32.lds += -P -C -Upowerpc
+# clang refuses to accept -mabi=elfv1 for when using the
+# 64-bit target in 32-bit mode
+ifdef CONFIG_CC_IS_CLANG
+ifdef CONFIG_PPC64
+AFLAGS_REMOVE_getcpu.o += -mabi=elfv1
+endif
+AFLAGS_REMOVE_sigtramp.o += -mabi=elfv1
+AFLAGS_REMOVE_gettimeofday.o += -mabi=elfv1
+AFLAGS_REMOVE_datapage.o += -mabi=elfv1
+AFLAGS_REMOVE_cacheflush.o += -mabi=elfv1
+AFLAGS_REMOVE_note.o += -mabi=elfv1
+endif
+
+
 # Force dependency (incbin is bad)
 $(obj)/vdso32_wrapper.o : $(obj)/vdso32.so
--
2.19.1
Re: [PATCH 2/2] powerpc: Show PAGE_SIZE in __die() output
Christophe Leroy writes:
> On 08/01/2019 at 13:21, Christophe Leroy wrote:
>> On 08/01/2019 at 13:05, Michael Ellerman wrote:
>>> The page size the kernel is built with is useful info when debugging a
>>> crash, so add it to the output in __die().
>>>
>>> Result looks like eg:
>>>
>>>   kernel BUG at drivers/misc/lkdtm/bugs.c:63!
>>>   Oops: Exception in kernel mode, sig: 5 [#1]
>>>   LE PAGE_SIZE=64K SMP NR_CPUS=2048 NUMA pSeries
>>>   Modules linked in: vmx_crypto kvm binfmt_misc ip_tables
>>>
>>> Signed-off-by: Michael Ellerman
>>> ---
>>>  arch/powerpc/kernel/traps.c | 12
>>>  1 file changed, 12 insertions(+)
>>>
>>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
>>> index 431a86d3f772..fc972e4eee5f 100644
>>> --- a/arch/powerpc/kernel/traps.c
>>> +++ b/arch/powerpc/kernel/traps.c
>>> @@ -268,6 +268,18 @@ static int __die(const char *str, struct pt_regs *regs, long err)
>>>  	else
>>>  		seq_buf_puts(, "BE ");
>>> +	seq_buf_puts(, "PAGE_SIZE=");
>>> +	if (IS_ENABLED(CONFIG_PPC_4K_PAGES))
>>> +		seq_buf_puts(, "4K ");
>>> +	else if (IS_ENABLED(CONFIG_PPC_16K_PAGES))
>>> +		seq_buf_puts(, "16K ");
>>> +	else if (IS_ENABLED(CONFIG_PPC_64K_PAGES))
>>> +		seq_buf_puts(, "64K ");
>>> +	else if (IS_ENABLED(CONFIG_PPC_256K_PAGES))
>>> +		seq_buf_puts(, "256K ");
>>
>> Can't we build all the above at once using PAGE_SHIFT?
>>
>> Something like (untested):
>>
>>     "%dK ", 1 << (PAGE_SHIFT - 10)
>
> Or even simpler:
>
>     "%dK ", PAGE_SIZE / 1024

Yep, good point.

Clearly I have forgotten how to program over the break (if I ever knew).

cheers
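The two formulas suggested in the review should agree for every page size the patch lists. As a quick userspace check, the sketch below computes both from a page shift (an assumed stand-in for PAGE_SHIFT; shifts 12, 14, 16 and 18 correspond to 4K, 16K, 64K and 256K). Because PAGE_SIZE on powerpc is built from an unsigned long constant, the sketch formats the value with %lu rather than the %d shown in the thread; format_page_size() is a hypothetical helper, not the __die() code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* 1 << (PAGE_SHIFT - 10): the first suggestion, computed from the shift. */
unsigned long kb_from_shift(unsigned int page_shift)
{
	return 1UL << (page_shift - 10);
}

/* PAGE_SIZE / 1024: the second suggestion, computed from the size. */
unsigned long kb_from_size(unsigned int page_shift)
{
	unsigned long page_size = 1UL << page_shift; /* models PAGE_SIZE */

	return page_size / 1024;
}

/* Hypothetical helper producing the "PAGE_SIZE=64K " fragment of the banner. */
int format_page_size(char *buf, size_t len, unsigned int page_shift)
{
	return snprintf(buf, len, "PAGE_SIZE=%luK ", kb_from_size(page_shift));
}
```

Either expression yields the same number; dividing PAGE_SIZE reads more directly, while the shift form avoids the division.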
Re: [PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()
On 10/1/19 2:13 am, Frederic Barrat wrote:
> With a recent change around IOMMU group, a system with an opencapi
> adapter is no longer booting and we get a kernel oops:
>
> BUG: Kernel NULL pointer dereference at 0x0028
> Faulting instruction address: 0xc00aa38c
> Oops: Kernel access of bad area, sig: 7 [#1]
> LE SMP NR_CPUS=2048 NUMA PowerNV
> Modules linked in:
> CPU: 5 PID: 1 Comm: swapper/4 Not tainted 5.0.0-rc1-fxb-1-g3bd6e94bec12
> NIP: c00aa38c LR: c00a6608 CTR: c0097480
> REGS: c5783700 TRAP: 0300 Not tainted (5.0.0-rc1-fxb-1-g3bd6
> MSR: 92009033 CR: 28000228 XER: 20
> CFAR: c00a6604 DAR: 0028 DSISR: 0008 IRQMASK: 0
> GPR00: c00a6608 c5783990 c1036100 c007bf761860
> GPR04: c5783834
> GPR08: 69626d2c6e707500 92001003
> GPR12: c007bfff8300 c0010450
> GPR16: c0ced938 0100 c0ced948 000a
> GPR20: 000bfffe c0ced9a8 0200 c0ced978
> GPR24: 006080c0 c00716d09828 c0002e6fd000
> GPR28: c007bf4aff68 c007bf8d0080 c0f23938 c007bf761860
> NIP [c00aa38c] pnv_try_setup_npu_table_group+0x1c/0x1a0
> LR [c00a6608] pnv_pci_ioda_fixup+0x1f8/0x660
> Call Trace:
> [c5783990] [c00aa3d0] pnv_try_setup_npu_table_group+0x60/0x
> [c57839d0] [c00a661c] pnv_pci_ioda_fixup+0x20c/0x660
> [c5783ab0] [c0e1d4c0] pcibios_resource_survey+0x2c8/0x31c
> [c5783b90] [c0e1caf4] pcibios_init+0xb0/0xe4
> [c5783c10] [c0010054] do_one_initcall+0x64/0x264
> [c5783ce0] [c0e1132c] kernel_init_freeable+0x36c/0x468
> [c5783db0] [c0010474] kernel_init+0x2c/0x148
> [c5783e20] [c000b794] ret_from_kernel_thread+0x5c/0x68
>
> An opencapi device is using a device PE, so the current code breaks
> because pe->pbus is not defined.
>
> More generally, there's no need to define an IOMMU group for opencapi,
> as the device sends real addresses directly (admittedly, the
> virtualization story is yet to be written). So let's fix it by
> skipping the IOMMU group setup for opencapi PHBs.
>
> Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
> Signed-off-by: Frederic Barrat

Reviewed-by: Andrew Donnellan

> ---
>  arch/powerpc/platforms/powernv/pci-ioda.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 1d6406a051f1..7db3119f8a5b 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2681,7 +2681,8 @@ static void pnv_pci_ioda_setup_iommu_api(void)
>  	list_for_each_entry(hose, &hose_list, list_node) {
>  		phb = hose->private_data;
>
> -		if (phb->type == PNV_PHB_NPU_NVLINK)
> +		if (phb->type == PNV_PHB_NPU_NVLINK ||
> +		    phb->type == PNV_PHB_NPU_OCAPI)
>  			continue;
>
>  		list_for_each_entry(pe, &phb->ioda.pe_list, list) {

--
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited
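The fix extends an existing skip condition so that both NPU PHB flavours bypass IOMMU group setup. The sketch below is a simplified userspace model of that loop, assuming a plain array in place of hose_list and a made-up enum whose names only mirror the PHB types in the diff; it is not the pci-ioda.c code and ignores the per-PE inner loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical PHB types, named after the ones in the diff. */
enum model_phb_type {
	MODEL_PHB_IODA2,
	MODEL_PHB_NPU_NVLINK,
	MODEL_PHB_NPU_OCAPI,
};

/* After the fix, both NPU flavours are excluded from IOMMU group setup. */
bool model_needs_iommu_group(enum model_phb_type type)
{
	return type != MODEL_PHB_NPU_NVLINK && type != MODEL_PHB_NPU_OCAPI;
}

/* Count the PHBs that would still go through IOMMU group setup. */
size_t model_count_iommu_phbs(const enum model_phb_type *phbs, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (!model_needs_iommu_group(phbs[i]))
			continue; /* same role as the 'continue' in the patch */
		count++;
	}
	return count;
}
```

For a list holding two regular PHBs plus one NVLink and one OpenCAPI NPU, only the two regular PHBs are counted.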
[PATCH] powerpc/8xx: fix setting of pagetable for Abatron BDI debug tool.
Commit 8c8c10b90d88 ("powerpc/8xx: fix handling of early NULL pointer
dereference") moved the loading of r6 earlier in the code. As some
functions are called in between, r6 needs to be loaded again with the
address of swapper_pg_dir in order to set the PTE pointers for the
Abatron BDI.

Fixes: 8c8c10b90d88 ("powerpc/8xx: fix handling of early NULL pointer dereference")
Signed-off-by: Christophe Leroy
---
 arch/powerpc/kernel/head_8xx.S | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index aea5f367e4fe..ab0e6f1c98b0 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -885,11 +885,12 @@ start_here:
 	/* set up the PTE pointers for the Abatron bdiGDB. */
-	tovirt(r6,r6)
 	lis	r5, abatron_pteptrs@h
 	ori	r5, r5, abatron_pteptrs@l
 	stw	r5, 0xf0(0)	/* Must match your Abatron config file */
 	tophys(r5,r5)
+	lis	r6, swapper_pg_dir@h
+	ori	r6, r6, swapper_pg_dir@l
 	stw	r6, 0(r5)

 	/* Now turn on the MMU for real! */
--
2.13.3
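The two added instructions rebuild the 32-bit address of swapper_pg_dir from its halves: lis loads sym@h into the upper 16 bits and clears the lower ones, then ori ORs in sym@l. Because ori zero-extends its immediate, the unadjusted @h is the right high part here (the carry-adjusted @ha form is the one paired with addi, which sign-extends). The sketch below models that arithmetic in userspace; the function names and example addresses are hypothetical, not the real location of swapper_pg_dir.

```c
#include <assert.h>
#include <stdint.h>

/* lis rD, hi: rD = hi << 16, low halfword cleared (32-bit view). */
uint32_t model_lis(uint16_t hi)
{
	return (uint32_t)hi << 16;
}

/* ori rD, rS, lo: OR in the zero-extended 16-bit immediate. */
uint32_t model_ori(uint32_t rs, uint16_t lo)
{
	return rs | lo;
}

/* Models "lis rX, sym@h; ori rX, rX, sym@l". */
uint32_t model_load_address(uint32_t sym)
{
	uint16_t hi = (uint16_t)(sym >> 16);     /* sym@h */
	uint16_t lo = (uint16_t)(sym & 0xffffu); /* sym@l */

	return model_ori(model_lis(hi), lo);
}
```

An address whose low half has its top bit set (for example 0xc081abcd) still round-trips with this pair, which is exactly the case where a lis/addi sequence would need @ha instead.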
Re: [PATCH] lkdtm: Add a tests for NULL pointer dereference
On Wed, Jan 9, 2019 at 7:16 AM Kees Cook wrote:
>
> On Tue, Jan 8, 2019 at 10:31 PM Christophe Leroy
> wrote:
> >
> > On 09/01/2019 at 02:14, Kees Cook wrote:
> > > On Fri, Dec 14, 2018 at 7:26 AM Christophe Leroy
> > > wrote:
> > >>
> > >> Introduce lkdtm tests for NULL pointer dereference: check
> > >> access or exec at NULL address.
> > >
> > > Why is this not already covered by the existing tests? (Is there
> > > something special about NULL that is being missed?) I'd expect SMAP
> > > and SMEP to cover NULL as well.
> >
> > Most arches print a different message depending on whether the faulty
> > address is above or below PAGE_SIZE. Below is an example from x86:
> >
> >         pr_alert("BUG: unable to handle kernel %s at %px\n",
> >                  address < PAGE_SIZE ? "NULL pointer dereference" : "paging request",
> >                  (void *)address);
> >
> > Until recently, the powerpc arch didn't do it. When I implemented it
> > (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=49a502ea23bf9dec47f8f3c3960909ff409cd1bb),
> > I needed a way to test it and couldn't find an existing one, hence this
> > new LKDTM test.
> >
> > But maybe I missed something?
>
> Okay, gotcha. You're getting more complete reporting coverage. Sounds
> good to me. Thanks!
>
> Acked-by: Kees Cook

Applied to my lkdtm -next tree.

--
Kees Cook
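The x86 check quoted above boils down to a single comparison against PAGE_SIZE. A minimal userspace sketch of that classification follows; MODEL_PAGE_SIZE is an assumed 4K value standing in for the kernel's configured PAGE_SIZE, and fault_kind() is a hypothetical name. By this rule, a low fault address such as the 0x28 in the powernv oops elsewhere in this archive counts as a NULL pointer dereference.

```c
#include <assert.h>
#include <string.h>

/* Assumed page size for the model; the kernel uses its configured PAGE_SIZE. */
#define MODEL_PAGE_SIZE 4096UL

/* Mirrors: address < PAGE_SIZE ? "NULL pointer dereference" : "paging request" */
const char *fault_kind(unsigned long address)
{
	return address < MODEL_PAGE_SIZE ? "NULL pointer dereference"
					 : "paging request";
}
```

The NULL tests added by the patch are meant to exercise exactly the first branch, which the existing userspace-address tests do not reach.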
Re: [PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()
On Wed, 9 Jan 2019 17:45:53 +0100
Frederic Barrat wrote:

> On 09/01/2019 at 17:25, Greg Kurz wrote:
> > On Wed, 9 Jan 2019 16:13:42 +0100
> > Frederic Barrat wrote:
> >
> >> With a recent change around IOMMU group, a system with an opencapi
> >> adapter is no longer booting and we get a kernel oops:
> >>
> >> [... oops log and commit message snipped ...]
> >>
> >> Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
> >> Signed-off-by: Frederic Barrat
> >> ---
> >
> > Reviewed-by: Greg Kurz
> >
> > and
> >
> > Cc: sta...@vger.kernel.org # v4.20
>
> Thanks for the review! But why did you add stable? That problem is only
> seen on 5.0-rc1, isn't it?
>

Based on the fact that 0bd971676e68 was committed in 4.20... but I haven't tested :)

>   Fred
>
> [... quoted diff snipped ...]
Re: [PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()
On Wed, Jan 09, 2019 at 05:45:53PM +0100, Frederic Barrat wrote:
>
> On 09/01/2019 at 17:25, Greg Kurz wrote:
> > On Wed, 9 Jan 2019 16:13:42 +0100
> > Frederic Barrat wrote:
> >
> > > With a recent change around IOMMU group, a system with an opencapi
> > > adapter is no longer booting and we get a kernel oops:
> > >
> > > [... oops log and commit message snipped ...]
> > >
> > > Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
> > > Signed-off-by: Frederic Barrat
> > > ---
> >
> > Reviewed-by: Greg Kurz
> >
> > and
> >
> > Cc: sta...@vger.kernel.org # v4.20
>
> Thanks for the review! But why did you add stable? That problem is only seen
> on 5.0-rc1, isn't it?

No, this is fixing a patch that got backported to stable. Well, attempted
to be backported; I dropped it because of the problem :)

thanks,

greg k-h
Re: [PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()
On 09/01/2019 at 17:25, Greg Kurz wrote:
> On Wed, 9 Jan 2019 16:13:42 +0100
> Frederic Barrat wrote:
>
>> With a recent change around IOMMU group, a system with an opencapi
>> adapter is no longer booting and we get a kernel oops:
>>
>> [... oops log and commit message snipped ...]
>>
>> Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
>> Signed-off-by: Frederic Barrat
>> ---
>
> Reviewed-by: Greg Kurz
>
> and
>
> Cc: sta...@vger.kernel.org # v4.20

Thanks for the review! But why did you add stable? That problem is only
seen on 5.0-rc1, isn't it?

  Fred

> [... quoted diff snipped ...]
Re: [PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()
On Wed, 9 Jan 2019 16:13:42 +0100
Frederic Barrat wrote:

> With a recent change around IOMMU group, a system with an opencapi
> adapter is no longer booting and we get a kernel oops:
>
> [... oops log snipped ...]
>
> An opencapi device is using a device PE, so the current code breaks
> because pe->pbus is not defined.
>
> More generally, there's no need to define an IOMMU group for opencapi,
> as the device sends real addresses directly (admittedly, the
> virtualization story is yet to be written). So let's fix it by

The current plan is to go for mediated VFIO. The real HW stays under the
control of the host ocxl driver, and we still don't need an IOMMU group.

> skipping the IOMMU group setup for opencapi PHBs.
>
> Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
> Signed-off-by: Frederic Barrat
> ---

Reviewed-by: Greg Kurz

and

Cc: sta...@vger.kernel.org # v4.20

> [... quoted diff snipped ...]
Re: [PATCH v2 13/34] dt-bindings: arm: amlogic: Move 'amlogic,meson-gx-ao-secure' binding to its own file
On Tue, Dec 4, 2018 at 10:18 PM Rob Herring wrote:
>
> On Tue, Dec 4, 2018 at 7:01 PM Kevin Hilman wrote:
> >
> > Rob Herring writes:
> >
> > > It is best practice to have 1 binding per file, so board level bindings
> > > should be separate for various misc SoC bindings.
> > >
> > > Cc: Mark Rutland
> > > Cc: Carlo Caione
> > > Cc: Kevin Hilman
> > > Cc: devicet...@vger.kernel.org
> > > Cc: linux-arm-ker...@lists.infradead.org
> > > Cc: linux-amlo...@lists.infradead.org
> > > Signed-off-by: Rob Herring
> > > ---
> > >  .../devicetree/bindings/arm/amlogic.txt       | 29 ---
> > >  .../amlogic/amlogic,meson-gx-ao-secure.txt    | 28 ++
> > >  2 files changed, 28 insertions(+), 29 deletions(-)
> > >  create mode 100644 Documentation/devicetree/bindings/arm/amlogic/amlogic,meson-gx-ao-secure.txt
> >
> > Acked-by: Kevin Hilman
> >
> > But this isn't really related to the schema series, is it? If you
> > prefer, I can just queue this one separately via my tree.
>
> Yes, you can take it.

Hey Kevin, doesn't look like this got applied.

Rob
Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]
On Wed, Jan 09, 2019 at 04:09:02PM +1100, Benjamin Herrenschmidt wrote:
> > POWER 8 firmware is good? If the link does eventually come back, is
> > the POWER8's D3 resumption timeout long enough?
> >
> > If this doesn't lead to an obvious conclusion you'll probably need to
> > connect to IBM's Mellanox support team to get more information from
> > the card side.
>
> We are IBM :-) So far, it seems to be that the card is doing something
> not quite right, but we don't know what. We might need to engage
> Mellanox themselves.

Sorry, that was unclear; I meant the support team for IBM inside
Mellanox. There might be internal debugging available that can show
whether the card is detecting the beacon, how far it gets in
renegotiation, etc.

From all the mails, it really has the feel of a PCI-E interop problem
between these two specific chips.

Jason
Re: [PATCH] lkdtm: Add a tests for NULL pointer dereference
On Tue, Jan 8, 2019 at 10:31 PM Christophe Leroy wrote:
>
> On 09/01/2019 at 02:14, Kees Cook wrote:
> > On Fri, Dec 14, 2018 at 7:26 AM Christophe Leroy
> > wrote:
> >>
> >> Introduce lkdtm tests for NULL pointer dereference: check
> >> access or exec at NULL address.
> >
> > Why is this not already covered by the existing tests? (Is there
> > something special about NULL that is being missed?) I'd expect SMAP
> > and SMEP to cover NULL as well.
>
> Most arches print a different message depending on whether the faulty
> address is above or below PAGE_SIZE. Below is an example from x86:
>
>         pr_alert("BUG: unable to handle kernel %s at %px\n",
>                  address < PAGE_SIZE ? "NULL pointer dereference" : "paging request",
>                  (void *)address);
>
> Until recently, the powerpc arch didn't do it. When I implemented it
> (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=49a502ea23bf9dec47f8f3c3960909ff409cd1bb),
> I needed a way to test it and couldn't find an existing one, hence this
> new LKDTM test.
>
> But maybe I missed something?

Okay, gotcha. You're getting more complete reporting coverage. Sounds
good to me. Thanks!
Acked-by: Kees Cook -Kees > > Christophe > > > > > -Kees > > > >> > >> Signed-off-by: Christophe Leroy > >> --- > >> drivers/misc/lkdtm/core.c | 2 ++ > >> drivers/misc/lkdtm/lkdtm.h | 2 ++ > >> drivers/misc/lkdtm/perms.c | 18 ++ > >> 3 files changed, 22 insertions(+) > >> > >> diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c > >> index bc76756b7eda..36910e1d5c09 100644 > >> --- a/drivers/misc/lkdtm/core.c > >> +++ b/drivers/misc/lkdtm/core.c > >> @@ -157,7 +157,9 @@ static const struct crashtype crashtypes[] = { > >> CRASHTYPE(EXEC_VMALLOC), > >> CRASHTYPE(EXEC_RODATA), > >> CRASHTYPE(EXEC_USERSPACE), > >> + CRASHTYPE(EXEC_NULL), > >> CRASHTYPE(ACCESS_USERSPACE), > >> + CRASHTYPE(ACCESS_NULL), > >> CRASHTYPE(WRITE_RO), > >> CRASHTYPE(WRITE_RO_AFTER_INIT), > >> CRASHTYPE(WRITE_KERN), > >> diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h > >> index 3c6fd327e166..b69ee004a3f7 100644 > >> --- a/drivers/misc/lkdtm/lkdtm.h > >> +++ b/drivers/misc/lkdtm/lkdtm.h > >> @@ -45,7 +45,9 @@ void lkdtm_EXEC_KMALLOC(void); > >> void lkdtm_EXEC_VMALLOC(void); > >> void lkdtm_EXEC_RODATA(void); > >> void lkdtm_EXEC_USERSPACE(void); > >> +void lkdtm_EXEC_NULL(void); > >> void lkdtm_ACCESS_USERSPACE(void); > >> +void lkdtm_ACCESS_NULL(void); > >> > >> /* lkdtm_refcount.c */ > >> void lkdtm_REFCOUNT_INC_OVERFLOW(void); > >> diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c > >> index fa54add6375a..62f76d506f04 100644 > >> --- a/drivers/misc/lkdtm/perms.c > >> +++ b/drivers/misc/lkdtm/perms.c > >> @@ -164,6 +164,11 @@ void lkdtm_EXEC_USERSPACE(void) > >> vm_munmap(user_addr, PAGE_SIZE); > >> } > >> > >> +void lkdtm_EXEC_NULL(void) > >> +{ > >> + execute_location(NULL, CODE_AS_IS); > >> +} > >> + > >> void lkdtm_ACCESS_USERSPACE(void) > >> { > >> unsigned long user_addr, tmp = 0; > >> @@ -195,6 +200,19 @@ void lkdtm_ACCESS_USERSPACE(void) > >> vm_munmap(user_addr, PAGE_SIZE); > >> } > >> > >> +void lkdtm_ACCESS_NULL(void) > >> +{ > 
>> + unsigned long tmp; > >> + unsigned long *ptr = (unsigned long *)NULL; > >> + > >> + pr_info("attempting bad read at %px\n", ptr); > >> + tmp = *ptr; > >> + tmp += 0xc0dec0de; > >> + > >> + pr_info("attempting bad write at %px\n", ptr); > >> + *ptr = tmp; > >> +} > >> + > >> void __init lkdtm_perms_init(void) > >> { > >> /* Make sure we can write to __ro_after_init values during __init > >> */ > >> -- > >> 2.13.3 > >> > > > > -- Kees Cook
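Adding EXEC_NULL and ACCESS_NULL touches three places: core.c, lkdtm.h and perms.c. That is because lkdtm, as we understand it, dispatches crash types through a table built by the CRASHTYPE() macro and looks entries up by name. The userspace sketch below shows that table-and-lookup pattern. The macro body, the struct layout and the stub handlers are hypothetical stand-ins, not lkdtm's actual code, and the stubs only print instead of faulting.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the lkdtm handlers; the real ones fault on purpose. */
static void lkdtm_EXEC_NULL(void)   { puts("would execute at NULL"); }
static void lkdtm_ACCESS_NULL(void) { puts("would read/write at NULL"); }

struct crashtype {
	const char *name;
	void (*func)(void);
};

/* Stringify the bare name and paste it onto the lkdtm_ prefix. */
#define CRASHTYPE(_name) { .name = #_name, .func = lkdtm_##_name }

static const struct crashtype crashtypes[] = {
	CRASHTYPE(EXEC_NULL),
	CRASHTYPE(ACCESS_NULL),
};

/* Linear search by name; returns NULL when the crash type is unknown. */
static const struct crashtype *find_crashtype(const char *name)
{
	for (size_t i = 0; i < sizeof(crashtypes) / sizeof(crashtypes[0]); i++)
		if (strcmp(crashtypes[i].name, name) == 0)
			return &crashtypes[i];
	return NULL;
}
```

With this layout, find_crashtype("ACCESS_NULL")->func() runs the handler, and a new test only needs its function plus one CRASHTYPE() line.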
[PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()
With a recent change around IOMMU groups, a system with an opencapi adapter no longer boots and we get a kernel oops:

BUG: Kernel NULL pointer dereference at 0x0028
Faulting instruction address: 0xc00aa38c
Oops: Kernel access of bad area, sig: 7 [#1]
LE SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in:
CPU: 5 PID: 1 Comm: swapper/4 Not tainted 5.0.0-rc1-fxb-1-g3bd6e94bec12
NIP: c00aa38c LR: c00a6608 CTR: c0097480
REGS: c5783700 TRAP: 0300 Not tainted (5.0.0-rc1-fxb-1-g3bd6
MSR: 92009033 CR: 28000228 XER: 20
CFAR: c00a6604 DAR: 0028 DSISR: 0008 IRQMASK: 0
GPR00: c00a6608 c5783990 c1036100 c007bf761860
GPR04: c5783834
GPR08: 69626d2c6e707500 92001003
GPR12: c007bfff8300 c0010450
GPR16: c0ced938 0100 c0ced948 000a
GPR20: 000bfffe c0ced9a8 0200 c0ced978
GPR24: 006080c0 c00716d09828 c0002e6fd000
GPR28: c007bf4aff68 c007bf8d0080 c0f23938 c007bf761860
NIP [c00aa38c] pnv_try_setup_npu_table_group+0x1c/0x1a0
LR [c00a6608] pnv_pci_ioda_fixup+0x1f8/0x660
Call Trace:
[c5783990] [c00aa3d0] pnv_try_setup_npu_table_group+0x60/0x
[c57839d0] [c00a661c] pnv_pci_ioda_fixup+0x20c/0x660
[c5783ab0] [c0e1d4c0] pcibios_resource_survey+0x2c8/0x31c
[c5783b90] [c0e1caf4] pcibios_init+0xb0/0xe4
[c5783c10] [c0010054] do_one_initcall+0x64/0x264
[c5783ce0] [c0e1132c] kernel_init_freeable+0x36c/0x468
[c5783db0] [c0010474] kernel_init+0x2c/0x148
[c5783e20] [c000b794] ret_from_kernel_thread+0x5c/0x68

An opencapi device uses a device PE, so the current code breaks because pe->pbus is not defined. More generally, there's no need to define an IOMMU group for opencapi, as the device sends real addresses directly (admittedly, the virtualization story is yet to be written). So let's fix it by skipping the IOMMU group setup for opencapi PHBs.
Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups") Signed-off-by: Frederic Barrat --- arch/powerpc/platforms/powernv/pci-ioda.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 1d6406a051f1..7db3119f8a5b 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -2681,7 +2681,8 @@ static void pnv_pci_ioda_setup_iommu_api(void) list_for_each_entry(hose, _list, list_node) { phb = hose->private_data; - if (phb->type == PNV_PHB_NPU_NVLINK) + if (phb->type == PNV_PHB_NPU_NVLINK || + phb->type == PNV_PHB_NPU_OCAPI) continue; list_for_each_entry(pe, >ioda.pe_list, list) { -- 2.19.1
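The fix is a one-line widening of the skip condition in pnv_pci_ioda_setup_iommu_api(). The sketch below models only that decision over a list of PHB types. The enum is a simplified stand-in for the kernel's PHB type enum, and the helper names phb_wants_iommu_api() and count_iommu_phbs() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's PHB type enum (illustrative only). */
enum pnv_phb_type {
	PNV_PHB_IODA2,
	PNV_PHB_NPU_NVLINK,
	PNV_PHB_NPU_OCAPI,
};

/*
 * Mirrors the patched condition: NVLink NPU PHBs are skipped as before,
 * and OpenCAPI PHBs are now skipped too, since their device PEs have no
 * pe->pbus and the device sends real addresses.
 */
static bool phb_wants_iommu_api(enum pnv_phb_type type)
{
	return !(type == PNV_PHB_NPU_NVLINK || type == PNV_PHB_NPU_OCAPI);
}

/* Count the PHBs whose PEs would go through IOMMU group setup. */
static size_t count_iommu_phbs(const enum pnv_phb_type *types, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (phb_wants_iommu_api(types[i]))
			count++;
	return count;
}
```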
Re: [Bug 202149] New: NULL Pointer Dereference in __split_huge_pmd on PPC64LE
It's normal daily usage on a workstation (TALOS 2). I've seen it at least twice, both times in rustc, though I've run rustc more times than I can count. Note that the program that triggered it was running in lxc and it only happened after upgrading to 4.19. > On Jan 9, 2019, at 06:50, Aneesh Kumar K.V wrote: > > Matt Corallo writes: > >> .config follows. I have not tested with 64K pages as, sadly, I have a >> large BTRFS volume that was formatted on x86, and am thus stuck with 4K >> pages. Note that this is roughly the Debian kernel, so it has whatever >> patches Debian defaults to applying, a list of which follows. >> > > What is the test you are running? I tried a 4K page size config on P9. I > am running ltp test suite there. Also tried few thp memremap tests. > Nothing hit that. > > root@:~/tests/ltp/testcases/kernel/mem/thp# getconf PAGESIZE > 4096 > root@ltc-boston123:~/tests/ltp/testcases/kernel/mem/thp# grep thp > /proc/vmstat > thp_fault_alloc 641141 > thp_fault_fallback 0 > thp_collapse_alloc 90 > thp_collapse_alloc_failed 0 > thp_file_alloc 0 > thp_file_mapped 0 > thp_split_page 1 > thp_split_page_failed 0 > thp_deferred_split_page 641150 > thp_split_pmd 24 > thp_zero_page_alloc 1 > thp_zero_page_alloc_failed 0 > thp_swpout 0 > thp_swpout_fallback 0 > root@:~/tests/ltp/testcases/kernel/mem/thp# > > -aneesh >
Re: [PATCH 01/19] powerpc/xive: export flags for the XIVE native exploitation mode hcalls
On 1/9/19 2:08 PM, Michael Ellerman wrote:
> Cédric Le Goater writes:
>
>> These flags are shared between Linux/KVM implementing the hypervisor
>> calls for the XIVE native exploitation mode and the driver for the
>> sPAPR guests.
>>
>> Signed-off-by: Cédric Le Goater
>> ---
>> arch/powerpc/include/asm/xive.h | 23 +++
>> arch/powerpc/sysdev/xive/spapr.c | 28
>> 2 files changed, 31 insertions(+), 20 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/xive.h
>> b/arch/powerpc/include/asm/xive.h
>> index 3c704f5dd3ae..32f033bfbf42 100644
>> --- a/arch/powerpc/include/asm/xive.h
>> +++ b/arch/powerpc/include/asm/xive.h
>> @@ -93,6 +93,29 @@ extern void xive_flush_interrupt(void);
>> /* xmon hook */
>> extern void xmon_xive_do_dump(int cpu);
>>
>> +/*
>> + * Hcall flags shared by the sPAPR backend and KVM
>> + */
>> +
>> +/* H_INT_GET_SOURCE_INFO */
>> +#define XIVE_SPAPR_SRC_H_INT_ESB PPC_BIT(60)
>> +#define XIVE_SPAPR_SRC_LSI PPC_BIT(61)
>> +#define XIVE_SPAPR_SRC_TRIGGER PPC_BIT(62)
>> +#define XIVE_SPAPR_SRC_STORE_EOI PPC_BIT(63)
>
> I have an (irrational) hatred of PPC_BIT, because it obfuscates what's
> going on and makes PPC seem weirder than it needs to be. It could at
> least be called IBM_BIT().
>
> I know it helps people compare the code vs the documentation, but
> basically no one has the documentation, and everyone has the code.
>
> Anyway it's not a show stopper, just a pet-peeve of mine :)

Only the define matters, I can change that back to the non-PPC_BIT version in v2. Not a problem.

Cheers,

C.
[PATCH] powerpc/tm: Limit TM code inside PPC_TRANSACTIONAL_MEM
Commit e1c3743e1a20 ("powerpc/tm: Set MSR[TS] just prior to recheckpoint") moved a code block that uses the 'msr' variable outside of the CONFIG_PPC_TRANSACTIONAL_MEM block. However, 'msr' is declared inside a CONFIG_PPC_TRANSACTIONAL_MEM block, which can cause the following error when CONFIG_PPC_TRANSACTIONAL_MEM is not defined:

  error: 'msr' undeclared (first use in this function)

This does not cause a compilation error in the mainline kernel, because 'msr' is used there as an argument of MSR_TM_ACTIVE(), which is defined as follows when CONFIG_PPC_TRANSACTIONAL_MEM is *not* set, and therefore discards its argument:

  #define MSR_TM_ACTIVE(x) 0

This patch fixes the issue by not using the 'msr' variable outside the CONFIG_PPC_TRANSACTIONAL_MEM block, so the code no longer relies on the MSR_TM_ACTIVE() definition.

Cc: sta...@vger.kernel.org
Reported-by: Christoph Biedl
Fixes: e1c3743e1a20 ("powerpc/tm: Set MSR[TS] just prior to recheckpoint")
Signed-off-by: Breno Leitao
---
NB: Since stable kernels did not cherry-pick 5c784c8414fba ("powerpc/tm: Remove msr_tm_active()"), MSR_TM_ACTIVE() is not defined as 0 for the CONFIG_PPC_TRANSACTIONAL_MEM=n case, which triggers the compilation error above.
Tested against stable kernel 4.19.13-rc2 and problem is now fixed when CONFIG_PPC_TRANSACTIONAL_MEM=n arch/powerpc/kernel/signal_64.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c index daa28cb72272..8fe698162ab9 100644 --- a/arch/powerpc/kernel/signal_64.c +++ b/arch/powerpc/kernel/signal_64.c @@ -739,11 +739,12 @@ SYSCALL_DEFINE0(rt_sigreturn) if (restore_tm_sigcontexts(current, >uc_mcontext, _transact->uc_mcontext)) goto badframe; - } + } else #endif - /* Fall through, for non-TM restore */ - if (!MSR_TM_ACTIVE(msr)) { + { /* +* Fall through, for non-TM restore +* * Unset MSR[TS] on the thread regs since MSR from user * context does not have MSR active, and recheckpoint was * not called since restore_tm_sigcontexts() was not called -- 2.19.0
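The fix relies on C's if/else pairing across the #endif. With the option enabled, the trailing brace block becomes the else branch of the TM check. With it disabled, the block runs unconditionally and never names 'msr'. The mainline build only got away without this because the preprocessor drops the argument of "#define MSR_TM_ACTIVE(x) 0" before the compiler ever sees the undeclared identifier. The toy model below reproduces the control flow. TOY_MSR_TS_MASK and pick_restore_path() are hypothetical names, not the kernel's MSR_TS_MASK or rt_sigreturn().

```c
/* Hypothetical stand-in for the MSR[TS] field mask; not the real value. */
#define TOY_MSR_TS_MASK 0x3UL

enum restore_path { RESTORE_TM, RESTORE_NON_TM };

/* Build with -DCONFIG_PPC_TRANSACTIONAL_MEM to compile the TM branch. */
static enum restore_path pick_restore_path(unsigned long user_msr)
{
	enum restore_path path;

#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
	/* 'msr' exists only in TM builds, as in signal_64.c. */
	unsigned long msr = user_msr;

	if (msr & TOY_MSR_TS_MASK) {
		path = RESTORE_TM;	/* restore_tm_sigcontexts() in the kernel */
	} else
#endif
	{
		/* Non-TM restore: never touches 'msr', so =n builds compile. */
		path = RESTORE_NON_TM;
	}
	(void)user_msr;	/* silences unused warnings when TM is off */
	return path;
}
```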
Re: [PATCH 01/19] powerpc/xive: export flags for the XIVE native exploitation mode hcalls
Cédric Le Goater writes: > These flags are shared between Linux/KVM implementing the hypervisor > calls for the XIVE native exploitation mode and the driver for the > sPAPR guests. > > Signed-off-by: Cédric Le Goater > --- > arch/powerpc/include/asm/xive.h | 23 +++ > arch/powerpc/sysdev/xive/spapr.c | 28 > 2 files changed, 31 insertions(+), 20 deletions(-) > > diff --git a/arch/powerpc/include/asm/xive.h b/arch/powerpc/include/asm/xive.h > index 3c704f5dd3ae..32f033bfbf42 100644 > --- a/arch/powerpc/include/asm/xive.h > +++ b/arch/powerpc/include/asm/xive.h > @@ -93,6 +93,29 @@ extern void xive_flush_interrupt(void); > /* xmon hook */ > extern void xmon_xive_do_dump(int cpu); > > +/* > + * Hcall flags shared by the sPAPR backend and KVM > + */ > + > +/* H_INT_GET_SOURCE_INFO */ > +#define XIVE_SPAPR_SRC_H_INT_ESB PPC_BIT(60) > +#define XIVE_SPAPR_SRC_LSI PPC_BIT(61) > +#define XIVE_SPAPR_SRC_TRIGGER PPC_BIT(62) > +#define XIVE_SPAPR_SRC_STORE_EOI PPC_BIT(63) I have an (irrational) hatred of PPC_BIT, because it obfuscates what's going on and makes PPC seem weirder than it needs to be. It could at least be called IBM_BIT(). I know it helps people compare the code vs the documentation, but basically no one has the documentation, and everyone has the code. Anyway it's not a show stopper, just a pet-peeve of mine :) cheers
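For readers without the documentation, PPC_BIT() uses IBM (MSB-0) bit numbering. Bit 0 is the most significant bit of the 64-bit doubleword and bit 63 is the least significant, so the four H_INT_GET_SOURCE_INFO flags are simply 0x8, 0x4, 0x2 and 0x1. The sketch below mirrors the shape of the kernel macro for 64-bit builds as we understand it. SKETCH_PPC_BIT and the SRC_* names are local to the example.

```c
#include <stdint.h>

/* MSB-0 numbering: bit 0 is the top bit of a 64-bit value, bit 63 the bottom. */
#define SKETCH_PPC_BIT(bit)  (UINT64_C(1) << (63 - (bit)))

/* The H_INT_GET_SOURCE_INFO flags from the patch, with their plain values. */
#define SRC_H_INT_ESB   SKETCH_PPC_BIT(60)	/* 0x8 */
#define SRC_LSI         SKETCH_PPC_BIT(61)	/* 0x4 */
#define SRC_TRIGGER     SKETCH_PPC_BIT(62)	/* 0x2 */
#define SRC_STORE_EOI   SKETCH_PPC_BIT(63)	/* 0x1 */

/* Compile-time check that MSB-0 bit 60 is ordinary (LSB-0) bit 3. */
_Static_assert(SKETCH_PPC_BIT(60) == (UINT64_C(1) << 3), "MSB-0 bit 60 is LSB-0 bit 3");
```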
[PATCH 14/14] syscall_get_arch: add "struct task_struct *" argument
This argument is required to extend the generic ptrace API with PTRACE_GET_SYSCALL_INFO request: syscall_get_arch() is going to be called from ptrace_request() along with syscall_get_nr(), syscall_get_arguments(), syscall_get_error(), and syscall_get_return_value() functions with a tracee as their argument. The primary intent is that the triple (audit_arch, syscall_nr, arg1..arg6) should describe what system call is being called and what its arguments are. Reverts: 5e937a9ae913 ("syscall_get_arch: remove useless function arguments") Reverts: 1002d94d3076 ("syscall.h: fix doc text for syscall_get_arch()") Reviewed-by: Andy Lutomirski # for x86 Reviewed-by: Palmer Dabbelt Acked-by: Paul Moore Acked-by: Paul Burton # MIPS parts Acked-by: Michael Ellerman (powerpc) Acked-by: Kees Cook # seccomp parts Acked-by: Mark Salter # for the c6x bit Cc: Elvira Khabirova Cc: Eugene Syromyatnikov Cc: Oleg Nesterov Cc: x...@kernel.org Cc: linux-al...@vger.kernel.org Cc: linux-snps-...@lists.infradead.org Cc: linux-arm-ker...@lists.infradead.org Cc: linux-c6x-...@linux-c6x.org Cc: uclinux-h8-de...@lists.sourceforge.jp Cc: linux-hexa...@vger.kernel.org Cc: linux-i...@vger.kernel.org Cc: linux-m...@lists.linux-m68k.org Cc: linux-m...@vger.kernel.org Cc: nios2-...@lists.rocketboards.org Cc: openr...@lists.librecores.org Cc: linux-par...@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-ri...@lists.infradead.org Cc: linux-s...@vger.kernel.org Cc: linux...@vger.kernel.org Cc: sparcli...@vger.kernel.org Cc: linux...@lists.infradead.org Cc: linux-xte...@linux-xtensa.org Cc: linux-a...@vger.kernel.org Cc: linux-au...@redhat.com Signed-off-by: Dmitry V. 
Levin --- arch/alpha/include/asm/syscall.h | 2 +- arch/arc/include/asm/syscall.h| 2 +- arch/arm/include/asm/syscall.h| 2 +- arch/arm64/include/asm/syscall.h | 4 ++-- arch/c6x/include/asm/syscall.h| 2 +- arch/csky/include/asm/syscall.h | 2 +- arch/h8300/include/asm/syscall.h | 2 +- arch/hexagon/include/asm/syscall.h| 2 +- arch/ia64/include/asm/syscall.h | 2 +- arch/m68k/include/asm/syscall.h | 2 +- arch/microblaze/include/asm/syscall.h | 2 +- arch/mips/include/asm/syscall.h | 6 +++--- arch/mips/kernel/ptrace.c | 2 +- arch/nds32/include/asm/syscall.h | 2 +- arch/nios2/include/asm/syscall.h | 2 +- arch/openrisc/include/asm/syscall.h | 2 +- arch/parisc/include/asm/syscall.h | 4 ++-- arch/powerpc/include/asm/syscall.h| 10 -- arch/riscv/include/asm/syscall.h | 2 +- arch/s390/include/asm/syscall.h | 4 ++-- arch/sh/include/asm/syscall_32.h | 2 +- arch/sh/include/asm/syscall_64.h | 2 +- arch/sparc/include/asm/syscall.h | 5 +++-- arch/unicore32/include/asm/syscall.h | 2 +- arch/x86/include/asm/syscall.h| 8 +--- arch/x86/um/asm/syscall.h | 2 +- arch/xtensa/include/asm/syscall.h | 2 +- include/asm-generic/syscall.h | 5 +++-- kernel/auditsc.c | 4 ++-- kernel/seccomp.c | 4 ++-- 30 files changed, 52 insertions(+), 42 deletions(-) diff --git a/arch/alpha/include/asm/syscall.h b/arch/alpha/include/asm/syscall.h index d73a6fcb519c..11c688c1d7ec 100644 --- a/arch/alpha/include/asm/syscall.h +++ b/arch/alpha/include/asm/syscall.h @@ -4,7 +4,7 @@ #include -static inline int syscall_get_arch(void) +static inline int syscall_get_arch(struct task_struct *task) { return AUDIT_ARCH_ALPHA; } diff --git a/arch/arc/include/asm/syscall.h b/arch/arc/include/asm/syscall.h index c7fc4c0c3bcb..caf2697ef5b7 100644 --- a/arch/arc/include/asm/syscall.h +++ b/arch/arc/include/asm/syscall.h @@ -70,7 +70,7 @@ syscall_get_arguments(struct task_struct *task, struct pt_regs *regs, } static inline int -syscall_get_arch(void) +syscall_get_arch(struct task_struct *task) { return 
IS_ENABLED(CONFIG_ISA_ARCOMPACT) ? (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN) diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h index 06dea6bce293..3940ceac0bdc 100644 --- a/arch/arm/include/asm/syscall.h +++ b/arch/arm/include/asm/syscall.h @@ -104,7 +104,7 @@ static inline void syscall_set_arguments(struct task_struct *task, memcpy(>ARM_r0 + i, args, n * sizeof(args[0])); } -static inline int syscall_get_arch(void) +static inline int syscall_get_arch(struct task_struct *task) { /* ARM tasks don't change audit architectures on the fly. */ return AUDIT_ARCH_ARM; diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h index ad8be16a39c9..1870df03f774 100644 --- a/arch/arm64/include/asm/syscall.h +++ b/arch/arm64/include/asm/syscall.h @@ -117,9 +117,9 @@ static inline void syscall_set_arguments(struct task_struct *task, * We don't care about endianness (__AUDIT_ARCH_LE bit) here because * AArch64 has the same system calls both on little- and
Re: Kconfig label updates
Hi Bjorn, Bjorn Helgaas writes: > Hi, > > I want to update the PCI Kconfig labels so they're more consistent and > useful to users, something like the patch below. IIUC, the items > below are all IBM-related; please correct me if not. > > I'd also like to expand (or remove) "RPA" because Google doesn't find > anything about "IBM RPA", except Robotic Process Automation, which I > think must be something else. Yeah I think just remove it, it's not a well known term and is unlikely to help anyone these days. It stands for "RISC Platform Architecture", which was some kind of specification for Power machines back in the day, but from what I can tell it was never used in marketing or manuals much (hence so few hits on Google). > Is there some text expansion of RPA that we could use that would be > meaningful to a user, i.e., something he/she might find on a nameplate > or in a user manual? No I don't think so. > Ideally the PCI Kconfig labels would match the terms used in > arch/.../Kconfig, e.g., > > config PPC_POWERNV > bool "IBM PowerNV (Non-Virtualized) platform support" > > config PPC_PSERIES > bool "IBM pSeries & new (POWER5-based) iSeries" TBH these are pretty unhelpful too. PowerNV is not a marketing name and so doesn't appear anywhere much in official manuals or brochures and it's also used on non-IBM branded machines. And pSeries & iSeries were marketing names but are no longer used. We should probably update that text, but we can do that later, rather than blocking this patch. > diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig > index e9f78eb390d2..1c1d145bfd84 100644 > --- a/drivers/pci/hotplug/Kconfig > +++ b/drivers/pci/hotplug/Kconfig > @@ -112,7 +112,7 @@ config HOTPLUG_PCI_SHPC > When in doubt, say N. > > config HOTPLUG_PCI_POWERNV > - tristate "PowerPC PowerNV PCI Hotplug driver" > + tristate "IBM PowerNV PCI Hotplug driver" This is used in non-IBM machines as well. So perhaps: ? 
tristate "IBM/OpenPower PowerNV (bare metal) PCI Hotplug driver" > @@ -125,10 +125,11 @@ config HOTPLUG_PCI_POWERNV > When in doubt, say N. > > config HOTPLUG_PCI_RPA > - tristate "RPA PCI Hotplug driver" > + tristate "IBM Power Systems RPA PCI Hotplug driver" I think just drop RPA here. > depends on PPC_PSERIES && EEH > help > Say Y here if you have a RPA system that supports PCI Hotplug. s/RPA/IBM Power Systems/ > + This includes the earlier pSeries and iSeries. To be complete: This includes the earlier System p, System i, pSeries and iSeries. > > To compile this driver as a module, choose M here: the > module will be called rpaphp. > @@ -136,7 +137,7 @@ config HOTPLUG_PCI_RPA > When in doubt, say N. > > config HOTPLUG_PCI_RPA_DLPAR > - tristate "RPA Dynamic Logical Partitioning for I/O slots" > + tristate "IBM RPA Dynamic Logical Partitioning for I/O slots" Again just drop RPA. cheers
[PATCH -next] powerpc/mm: Fix debugfs_simple_attr.cocci warnings
Use DEFINE_DEBUGFS_ATTRIBUTE rather than DEFINE_SIMPLE_ATTRIBUTE for debugfs files. Semantic patch information: Rationale: DEFINE_SIMPLE_ATTRIBUTE + debugfs_create_file() imposes some significant overhead as compared to DEFINE_DEBUGFS_ATTRIBUTE + debugfs_create_file_unsafe(). Generated by: scripts/coccinelle/api/debugfs/debugfs_simple_attr.cocci Signed-off-by: YueHaibing --- arch/powerpc/mm/hash_utils_64.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c index bc6be44..22f14e1 100644 --- a/arch/powerpc/mm/hash_utils_64.c +++ b/arch/powerpc/mm/hash_utils_64.c @@ -1889,12 +1889,13 @@ static int hpt_order_set(void *data, u64 val) return mmu_hash_ops.resize_hpt(val); } -DEFINE_SIMPLE_ATTRIBUTE(fops_hpt_order, hpt_order_get, hpt_order_set, "%llu\n"); +DEFINE_DEBUGFS_ATTRIBUTE(fops_hpt_order, hpt_order_get, hpt_order_set, +"%llu\n"); static int __init hash64_debugfs(void) { - if (!debugfs_create_file("hpt_order", 0600, powerpc_debugfs_root, -NULL, _hpt_order)) { + if (!debugfs_create_file_unsafe("hpt_order", 0600, powerpc_debugfs_root, + NULL, _hpt_order)) { pr_err("lpar: unable to create hpt_order debugsfs file\n"); }
Re: [PATCH] powerpc/powernv/npu: Allocate enough memory in pnv_try_setup_npu_table_group()
Dan Carpenter writes: > There is a typo so we accidentally allocate enough memory for a pointer > when we wanted to allocate enough for a struct. > > Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups") > Signed-off-by: Dan Carpenter > --- > arch/powerpc/platforms/powernv/npu-dma.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Thanks, I've applied this to my fixes-test tree. Alexey can you send me an ack? cheers > diff --git a/arch/powerpc/platforms/powernv/npu-dma.c > b/arch/powerpc/platforms/powernv/npu-dma.c > index d7f742ed48ba..3f58c7dbd581 100644 > --- a/arch/powerpc/platforms/powernv/npu-dma.c > +++ b/arch/powerpc/platforms/powernv/npu-dma.c > @@ -564,7 +564,7 @@ struct iommu_table_group > *pnv_try_setup_npu_table_group(struct pnv_ioda_pe *pe) > } > } else { > /* Create a group for 1 GPU and attached NPUs for POWER8 */ > - pe->npucomp = kzalloc(sizeof(pe->npucomp), GFP_KERNEL); > + pe->npucomp = kzalloc(sizeof(*pe->npucomp), GFP_KERNEL); > table_group = >npucomp->table_group; > table_group->ops = _npu_peers_ops; > iommu_register_group(table_group, hose->global_number, > -- > 2.17.1
Re: [Bug 202149] New: NULL Pointer Dereference in __split_huge_pmd on PPC64LE
Matt Corallo writes:
> .config follows. I have not tested with 64K pages as, sadly, I have a
> large BTRFS volume that was formatted on x86, and am thus stuck with 4K
> pages. Note that this is roughly the Debian kernel, so it has whatever
> patches Debian defaults to applying, a list of which follows.
>

What test are you running? I tried a 4K page size config on P9 and I am running the LTP test suite there. I also tried a few THP memremap tests. Nothing hit that.

root@:~/tests/ltp/testcases/kernel/mem/thp# getconf PAGESIZE
4096
root@ltc-boston123:~/tests/ltp/testcases/kernel/mem/thp# grep thp /proc/vmstat
thp_fault_alloc 641141
thp_fault_fallback 0
thp_collapse_alloc 90
thp_collapse_alloc_failed 0
thp_file_alloc 0
thp_file_mapped 0
thp_split_page 1
thp_split_page_failed 0
thp_deferred_split_page 641150
thp_split_pmd 24
thp_zero_page_alloc 1
thp_zero_page_alloc_failed 0
thp_swpout 0
thp_swpout_fallback 0
root@:~/tests/ltp/testcases/kernel/mem/thp#

-aneesh
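When comparing runs, a specific counter such as thp_split_pmd is easier to track than the whole "grep thp" output. The minimal sketch below pulls one named counter out of /proc/vmstat-style text, which has one "name value" pair per line. vmstat_lookup() is a hypothetical helper that takes the text as a string, for example a saved snapshot, so reading the live file is left to the caller.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the value of counter 'name' in /proc/vmstat-style text, or -1 if
 * absent. Requires a space right after the name, so "thp_split_page" does
 * not match "thp_split_page_failed".
 */
static long long vmstat_lookup(const char *buf, const char *name)
{
	size_t len = strlen(name);
	const char *line = buf;

	while (line && *line) {
		if (strncmp(line, name, len) == 0 && line[len] == ' ') {
			long long value;

			if (sscanf(line + len + 1, "%lld", &value) == 1)
				return value;
			return -1;
		}
		line = strchr(line, '\n');	/* advance to the next line */
		if (line)
			line++;
	}
	return -1;
}
```

Taking a snapshot before and after a workload and diffing thp_split_pmd shows whether the PMD split path was exercised at all.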
[PATCH] powerpc/powernv/npu: Allocate enough memory in pnv_try_setup_npu_table_group()
There is a typo so we accidentally allocate enough memory for a pointer when we wanted to allocate enough for a struct. Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups") Signed-off-by: Dan Carpenter --- arch/powerpc/platforms/powernv/npu-dma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c index d7f742ed48ba..3f58c7dbd581 100644 --- a/arch/powerpc/platforms/powernv/npu-dma.c +++ b/arch/powerpc/platforms/powernv/npu-dma.c @@ -564,7 +564,7 @@ struct iommu_table_group *pnv_try_setup_npu_table_group(struct pnv_ioda_pe *pe) } } else { /* Create a group for 1 GPU and attached NPUs for POWER8 */ - pe->npucomp = kzalloc(sizeof(pe->npucomp), GFP_KERNEL); + pe->npucomp = kzalloc(sizeof(*pe->npucomp), GFP_KERNEL); table_group = >npucomp->table_group; table_group->ops = _npu_peers_ops; iommu_register_group(table_group, hose->global_number, -- 2.17.1
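The bug is the classic sizeof(ptr) versus sizeof(*ptr) slip. sizeof applied to the pointer member yields the size of a pointer (8 bytes on LP64), while sizeof(*pe->npucomp) yields the size of the structure it points to. The userspace sketch below uses hypothetical stand-in structures and calloc() in place of kzalloc(). sizeof does not evaluate its operand, so neither size helper dereferences anything.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the npucomp structure; only its size matters. */
struct npu_comp_sketch {
	void *table_group_ops;
	long tables[4];
	int pe_num;
};

struct pe_sketch {
	struct npu_comp_sketch *npucomp;
};

/* Buggy form: size of the pointer member itself. */
static size_t buggy_alloc_size(const struct pe_sketch *pe)
{
	return sizeof(pe->npucomp);
}

/* Fixed form: size of the object the pointer refers to. */
static size_t fixed_alloc_size(const struct pe_sketch *pe)
{
	return sizeof(*pe->npucomp);
}

/* Zeroed allocation of the full structure, as kzalloc() would do in-kernel. */
static struct npu_comp_sketch *alloc_npucomp(struct pe_sketch *pe)
{
	pe->npucomp = calloc(1, sizeof(*pe->npucomp));
	return pe->npucomp;
}
```

Writing the allocation as sizeof(*p) rather than sizeof(struct ...) keeps it correct even if the member's type later changes.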
Re: use generic DMA mapping code in powerpc V4
Next step: a64e18ba191ba9102fb174f27d707485ffd9389c (powerpc/dma: remove dma_nommu_get_required_mask) git clone git://git.infradead.org/users/hch/misc.git -b powerpc-dma.6 a git checkout a64e18ba191ba9102fb174f27d707485ffd9389c Link to the Git: http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/powerpc-dma.6 Results: PASEMI onboard ethernet works and the X5000 (P5020 board) boots. I also successfully tested sound, hardware 3D acceleration, Bluetooth, network, booting with a label etc. The uImages work also in a virtual e5500 quad-core QEMU machine. -- Christian On 05 January 2019 at 5:03PM, Christian Zigotzky wrote: Next step: c446404b041130fbd9d1772d184f24715cf2362f (powerpc/dma: remove dma_nommu_mmap_coherent) git clone git://git.infradead.org/users/hch/misc.git -b powerpc-dma.6 a git checkout c446404b041130fbd9d1772d184f24715cf2362f Output: Note: checking out 'c446404b041130fbd9d1772d184f24715cf2362f'. You are in 'detached HEAD' state. You can look around, make experimental changes and commit them, and you can discard any commits you make in this state without impacting any branches by performing another checkout. If you want to create a new branch to retain commits you create, you may do so (now or later) by using -b with the checkout command again. Example: git checkout -b HEAD is now at c446404... powerpc/dma: remove dma_nommu_mmap_coherent - Link to the Git: http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/powerpc-dma.6 Result: PASEMI onboard ethernet works and the X5000 (P5020 board) boots. -- Christian
Re: [PATCH V6 0/4] mm/kvm/vfio/ppc64: Migrate compound pages out of CMA region
Andrew Morton writes:
> On Tue, 8 Jan 2019 10:21:06 +0530 "Aneesh Kumar K.V"
> wrote:
>
>> ppc64 use CMA area for the allocation of guest page table (hash page table).
>> We won't be able to start guest if we fail to allocate hash page table. We
>> have observed hash table allocation failure because we failed to migrate
>> pages out of CMA region because they were pinned. This happen when we are
>> using VFIO. VFIO on ppc64 pins the entire guest RAM. If the guest RAM pages
>> get allocated out of CMA region, we won't be able to migrate those pages.
>> The pages are also pinned for the lifetime of the guest.
>>
>> Currently we support migration of non-compound pages. With THP and with the
>> addition of hugetlb migration we can end up allocating compound pages from
>> CMA region. This patch series add support for migrating compound pages. The
>> first path adds the helper get_user_pages_cma_migrate() which pin the page
>> making sure we migrate them out of CMA region before incrementing the
>> reference count.
>
> Does this code do anything for architectures other than powerpc? If
> not, should we be adding the ifdefs to avoid burdening other
> architectures with unused code?

Any architecture enabling CMA may need this. I will move most of this below CONFIG_CMA.

-aneesh
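Moving the code under CONFIG_CMA would typically follow the same #ifdef/stub pattern that the gup.c diff later in this thread uses for check_dax_vmas(). The real helper is compiled only when the option is set; otherwise a static inline stub returns a constant, which the compiler can usually fold away, so architectures without CMA carry no extra code. The sketch below is hypothetical throughout: page_sketch and count_cma_pages() stand in for struct page and the is_migrate_cma_page() checks.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct page; in_cma models is_migrate_cma_page(). */
struct page_sketch {
	bool in_cma;
};

#ifdef CONFIG_CMA
/* Real work only when CMA is configured: count pinned pages sitting in CMA. */
static long count_cma_pages(struct page_sketch **pages, long nr_pages)
{
	long i, n = 0;

	for (i = 0; i < nr_pages; i++)
		if (pages[i]->in_cma)
			n++;
	return n;
}
#else
/* Stub for !CONFIG_CMA: nothing can live in a CMA region, so report zero. */
static inline long count_cma_pages(struct page_sketch **pages, long nr_pages)
{
	(void)pages;
	(void)nr_pages;
	return 0;
}
#endif
```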
Re: [PATCH V6 3/4] powerpc/mm/iommu: Allow migration of cma allocated pages during mm_iommu_get
Andrea Arcangeli writes:
> Hello,
>
> On Tue, Jan 08, 2019 at 10:21:09AM +0530, Aneesh Kumar K.V wrote:
>> @@ -187,41 +149,25 @@ static long mm_iommu_do_alloc(struct mm_struct *mm,
>> unsigned long ua,
>> goto unlock_exit;
>> }
>>
>> +ret = get_user_pages_cma_migrate(ua, entries, 1, mem->hpages);
>
> In terms of gup APIs, I've been wondering if this shall become
> get_user_pages_longerm(FOLL_CMA_MIGRATE). So basically moving this
> CMA migrate logic inside get_user_pages_longerm.

Do we need the FOLL_CMA_MIGRATE flag? I wonder whether a long-term pin should simply imply a CMA migration. What would a FOLL_CMA_MIGRATE flag buy us? We can do better by taking a list of pages for migration, and I guess it is much simpler if we limit the migration logic to get_user_pages_longterm(). I ended up with something like the diff below.

Do you suggest we should add the isolate_lru and other details via the FOLL_CMA_MIGRATE flag and do that when we take the page reference, instead of iterating over the page array in get_user_pages_longterm() as in the diff below?
diff --git a/mm/gup.c b/mm/gup.c index 05acd7e2eb22..6e8152594e83 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -13,6 +13,9 @@ #include #include #include +#include +#include +#include #include #include @@ -1126,7 +1129,167 @@ long get_user_pages(unsigned long start, unsigned long nr_pages, } EXPORT_SYMBOL(get_user_pages); +#if defined(CONFIG_FS_DAX) || defined (CONFIG_CMA) + #ifdef CONFIG_FS_DAX +static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages) +{ + long i; + struct vm_area_struct *vma_prev = NULL; + + for (i = 0; i < nr_pages; i++) { + struct vm_area_struct *vma = vmas[i]; + + if (vma == vma_prev) + continue; + + vma_prev = vma; + + if (vma_is_fsdax(vma)) + return true; + } + return false; +} +#else +static inline bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages) +{ + return false; +} +#endif + +#ifdef CONFIG_CMA +static struct page *new_non_cma_page(struct page *page, unsigned long private) +{ + /* +* We want to make sure we allocate the new page from the same node +* as the source page. +*/ + int nid = page_to_nid(page); + /* +* Trying to allocate a page for migration. Ignore allocation +* failure warnings. We don't force __GFP_THISNODE here because +* this node here is the node where we have CMA reservation and +* in some case these nodes will have really less non movable +* allocation memory. +*/ + gfp_t gfp_mask = GFP_USER | __GFP_NOWARN; + + if (PageHighMem(page)) + gfp_mask |= __GFP_HIGHMEM; + +#ifdef CONFIG_HUGETLB_PAGE + if (PageHuge(page)) { + struct hstate *h = page_hstate(page); + /* +* We don't want to dequeue from the pool because pool pages will +* mostly be from the CMA region. +*/ + return alloc_migrate_huge_page(h, gfp_mask, nid, NULL); + } +#endif + if (PageTransHuge(page)) { + struct page *thp; + /* +* ignore allocation failure warnings +*/ + gfp_t thp_gfpmask = GFP_TRANSHUGE | __GFP_NOWARN; + + /* +* Remove the movable mask so that we don't allocate from +* CMA area again. 
+*/ + thp_gfpmask &= ~__GFP_MOVABLE; + thp = __alloc_pages_node(nid, thp_gfpmask, HPAGE_PMD_ORDER); + if (!thp) + return NULL; + prep_transhuge_page(thp); + return thp; + } + + return __alloc_pages_node(nid, gfp_mask, 0); +} + +static long check_and_migrate_cma_pages(unsigned long start, long nr_pages, + unsigned int gup_flags, + struct page **pages, + struct vm_area_struct **vmas) +{ + long i; + bool drain_allow = true; + bool migrate_allow = true; + LIST_HEAD(cma_page_list); + +check_again: + for (i = 0; i < nr_pages; i++) { + /* +* If we get a page from the CMA zone, since we are going to +* be pinning these entries, we might as well move them out +* of the CMA zone if possible. +*/ + if (is_migrate_cma_page(pages[i])) { + + struct page *head = compound_head(pages[i]); + + if (PageHuge(head)) { + isolate_huge_page(head, _page_list); + } else { + if (!PageLRU(head) && drain_allow) { + lru_add_drain_all(); +
Re: Kconfig label updates
On Tue, 8 Jan 2019 16:30:24 -0600 Bjorn Helgaas wrote: > Hi, > > I want to update the PCI Kconfig labels so they're more consistent and > useful to users, something like the patch below. IIUC, the items > below are all IBM-related; please correct me if not. > > I'd also like to expand (or remove) "RPA" because Google doesn't find > anything about "IBM RPA", except Robotic Process Automation, which I > think must be something else. > > Is there some text expansion of RPA that we could use that would be > meaningful to a user, i.e., something he/she might find on a nameplate > or in a user manual? > > Ideally the PCI Kconfig labels would match the terms used in > arch/.../Kconfig, e.g., > > config PPC_POWERNV > bool "IBM PowerNV (Non-Virtualized) platform support" > > config PPC_PSERIES > bool "IBM pSeries & new (POWER5-based) iSeries" > > config MARCH_Z900 > bool "IBM zSeries model z800 and z900" > > config MARCH_Z9_109 > bool "IBM System z9" > > Bjorn > > > diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig > index e9f78eb390d2..1c1d145bfd84 100644 > --- a/drivers/pci/hotplug/Kconfig > +++ b/drivers/pci/hotplug/Kconfig > @@ -112,7 +112,7 @@ config HOTPLUG_PCI_SHPC > When in doubt, say N. > > config HOTPLUG_PCI_POWERNV > - tristate "PowerPC PowerNV PCI Hotplug driver" > + tristate "IBM PowerNV PCI Hotplug driver" > depends on PPC_POWERNV && EEH > select OF_DYNAMIC > help > @@ -125,10 +125,11 @@ config HOTPLUG_PCI_POWERNV > When in doubt, say N. > > config HOTPLUG_PCI_RPA > - tristate "RPA PCI Hotplug driver" > + tristate "IBM Power Systems RPA PCI Hotplug driver" > depends on PPC_PSERIES && EEH > help > Say Y here if you have a RPA system that supports PCI Hotplug. > + This includes the earlier pSeries and iSeries. > > To compile this driver as a module, choose M here: the > module will be called rpaphp. > @@ -136,7 +137,7 @@ config HOTPLUG_PCI_RPA > When in doubt, say N. 
> > config HOTPLUG_PCI_RPA_DLPAR > - tristate "RPA Dynamic Logical Partitioning for I/O slots" > + tristate "IBM RPA Dynamic Logical Partitioning for I/O slots" > depends on HOTPLUG_PCI_RPA > help > Say Y here if your system supports Dynamic Logical Partitioning > @@ -157,7 +158,7 @@ config HOTPLUG_PCI_SGI > When in doubt, say N. > > config HOTPLUG_PCI_S390 > - bool "System z PCI Hotplug Support" > + bool "IBM System z PCI Hotplug Support" > depends on S390 && 64BIT > help > Say Y here if you want to use the System z PCI Hotplug > The rewording of the HOTPLUG_PCI_S390 entry is fine with me. Acked-by: Martin Schwidefsky -- blue skies, Martin. "Reality continues to ruin my life." - Calvin.
Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]
On 09/01/2019 18:24, Benjamin Herrenschmidt wrote:
> On Wed, 2019-01-09 at 15:53 +1100, Alexey Kardashevskiy wrote:
>> "A PCI completion timeout occurred for an outstanding PCI-E transaction"
>> it is.
>>
>> This is how I bind the device to vfio:
>>
>> echo vfio-pci > '/sys/bus/pci/devices/:01:00.0/driver_override'
>> echo vfio-pci > '/sys/bus/pci/devices/:01:00.1/driver_override'
>> echo ':01:00.0' > '/sys/bus/pci/devices/:01:00.0/driver/unbind'
>> echo ':01:00.1' > '/sys/bus/pci/devices/:01:00.1/driver/unbind'
>> echo ':01:00.0' > /sys/bus/pci/drivers/vfio-pci/bind
>> echo ':01:00.1' > /sys/bus/pci/drivers/vfio-pci/bind
>>
>>
>> and I noticed that EEH only happens with the last command. The order
>> (.0,.1 or .1,.0) does not matter, it seems that putting one function to
>> D3 is fine but putting another one when the first one is already in D3 -
>> produces EEH. And I do not recall ever seeing this on the firestone
>> machine. Weird.
>
> Putting all functions into D3 is what allows the device to actually go
> into D3.
>
> Does it work with other devices ?

It works fine on the very same garrison:

0009:07:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0009:07:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)

Bizarre.

> We do have that bug on early P9
> revisions where the attempt of bringing the link to L1 as part of the
> D3 process fails in horrible ways, I thought P8 would be ok but maybe
> not ...
> Otherwise, it might be that our timeouts are too low (you may want to > talk to our PCIe guys internally) This increases the "Outbound non-posted transactions timeout configuration" from 16ms to 1s, and it still does not help: diff --git a/hw/phb3.c b/hw/phb3.c index 38b8f46..cb14909 100644 --- a/hw/phb3.c +++ b/hw/phb3.c @@ -4065,7 +4065,7 @@ static void phb3_init_utl(struct phb3 *p) /* Init_82: PCI Express port control * SW283991: Set Outbound Non-Posted request timeout to 16ms (RTOS). */ - out_be64(p->regs + UTL_PCIE_PORT_CONTROL, 0x85880070); + out_be64(p->regs + UTL_PCIE_PORT_CONTROL, 0x858800d0); -- Alexey
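The mail does not say which bits of UTL_PCIE_PORT_CONTROL hold the timeout. As a quick arithmetic check on the diff, the sketch below XORs the old and new values. The change is confined to bits 4..7, where the nibble goes from 0x7 to 0xd. The helper names are hypothetical, and reading that nibble as the non-posted timeout selector is an inference from the diff, not something the mail or the skiboot comment states.

```c
#include <stdint.h>

/* Old and new UTL_PCIE_PORT_CONTROL values from the skiboot diff above. */
#define PORT_CONTROL_OLD 0x85880070ULL
#define PORT_CONTROL_NEW 0x858800d0ULL

/* Bits that differ between two register values. */
static uint64_t changed_bits(uint64_t old_val, uint64_t new_val)
{
    return old_val ^ new_val;
}

/* 4-bit field starting at bit 'shift'. */
static unsigned nibble(uint64_t v, unsigned shift)
{
    return (unsigned)((v >> shift) & 0xf);
}
```

Here changed_bits(PORT_CONTROL_OLD, PORT_CONTROL_NEW) is 0xa0, so no bit outside mask 0xf0 is touched by the patch. nibble() reads that field as 0x7 before the change and 0xd after it.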
Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]
On 09/01/2019 18:25, Benjamin Herrenschmidt wrote: > On Wed, 2019-01-09 at 17:32 +1100, Alexey Kardashevskiy wrote: >> I have just moved the "Mellanox Technologies MT27700 Family >> [ConnectX-4]" from garrison to firestone machine and there it does not >> produce an EEH, with the same kernel and skiboot (both upstream + my >> debug). Hm. I cannot really blame the card but I cannot see what could >> cause the difference in skiboot either. I even tried disabling NPU so >> garrison would look like firestone, still EEH'ing. > > The systems have a different chip though, firestone is P8 and garrison > is P8', which a slightly different PHB revision. Worth checking if we > have anything significantly different in our inits and poke at the HW > guys. Nope, we do not have anything different for these machines. Asking HW guys never worked for me :-/ I think the easiest option is to do what we did for PHB4 and ignore these D3 requests on garrisons. > BTW. Are the cards behind a switch in either case ? No, they are connected directly to the root port on both: garrison: :00:00.0 PCI bridge: IBM Device 03dc (rev ff) :01:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4] (rev ff) :01:00.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4] (rev ff) firestone (phb #0 is taken by nvidia gpu): 0001:00:00.0 PCI bridge: IBM POWER8 Host Bridge (PHB3) 0001:01:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4] 0001:01:00.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4] -- Alexey
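Earlier in the thread Ben notes that the device as a whole only enters D3 once all of its functions are in D3. The small C sketch below parses the full firestone addresses from the listing above. It confirms that the two ConnectX-4 entries are functions 0 and 1 of the same device, sitting on bus 01 below the PHB3 bridge at 0001:00:00.0. parse_bdf() and same_device() are hypothetical helpers, not part of lspci or the kernel, and the parser assumes the "DDDD:BB:DD.F" form that `lspci -D` prints.

```c
#include <stdio.h>

/* One PCI function address: domain:bus:device.function, all hex. */
struct pci_addr {
    unsigned int domain, bus, dev, fn;
};

/* Parse "DDDD:BB:DD.F"; returns 0 on success, -1 on malformed input. */
static int parse_bdf(const char *s, struct pci_addr *a)
{
    char extra;

    /* Exactly four fields must match; a fifth match means trailing junk. */
    if (sscanf(s, "%x:%x:%x.%x%c", &a->domain, &a->bus, &a->dev, &a->fn, &extra) != 4)
        return -1;
    /* Conventional widths: 8-bit bus, 5-bit device, 3-bit function. */
    if (a->bus > 0xff || a->dev > 0x1f || a->fn > 0x7)
        return -1;
    return 0;
}

/* Functions of one physical device share domain, bus and device number. */
static int same_device(const struct pci_addr *x, const struct pci_addr *y)
{
    return x->domain == y->domain && x->bus == y->bus && x->dev == y->dev;
}
```

With the firestone addresses, same_device() is true for 0001:01:00.0 and 0001:01:00.1 and false for either of them against the bridge at 0001:00:00.0. The garrison addresses cannot be used as-is because their domain was cut off in the archive.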