Re: [PATCH] powerpc/vdso32: Drop -mabi=elfv1 for 32 bit objects

2019-01-09 Thread Christophe Leroy




On 10/01/2019 at 02:42, Joel Stanley wrote:
> From: Daniel Axtens 
>
> All 64-bit objects need to specify the flag to be compiled correctly, we
> just don't need it for 32-bit objects. GCC just ignored it, but clang
> doesn't.
>
> Link: https://github.com/ClangBuiltLinux/linux/issues/240
> Signed-off-by: Daniel Axtens 
> Signed-off-by: Joel Stanley 
> ---
>   arch/powerpc/kernel/vdso32/Makefile | 14 ++
>   1 file changed, 14 insertions(+)
>
> diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile
> index 50112d4473bb..6bd41756e0c7 100644
> --- a/arch/powerpc/kernel/vdso32/Makefile
> +++ b/arch/powerpc/kernel/vdso32/Makefile
> @@ -34,6 +34,20 @@ obj-y += vdso32_wrapper.o
>   extra-y += vdso32.lds
>   CPPFLAGS_vdso32.lds += -P -C -Upowerpc
>   
> +# clang refuses to accept -mabi=elfv1 for when using the
> +# 64-bit target in 32-bit mode
> +ifdef CONFIG_CC_IS_CLANG

If -mabi=elfv1 is unneeded even for GCC, why depend on CLANG?

> +ifdef CONFIG_PPC64
> +AFLAGS_REMOVE_getcpu.o += -mabi=elfv1
> +endif

Why is only this one inside the ifdef? The powerpc Makefile only sets
-mabi=elfv1 when CONFIG_PPC64 is set, so all objects should be handled
the same way.

And would it harm to just do it all the time, regardless of CONFIG_PPC64?

Christophe

> +AFLAGS_REMOVE_sigtramp.o += -mabi=elfv1
> +AFLAGS_REMOVE_gettimeofday.o += -mabi=elfv1
> +AFLAGS_REMOVE_datapage.o += -mabi=elfv1
> +AFLAGS_REMOVE_cacheflush.o += -mabi=elfv1
> +AFLAGS_REMOVE_note.o += -mabi=elfv1
> +endif
> +
> +
>   # Force dependency (incbin is bad)
>   $(obj)/vdso32_wrapper.o : $(obj)/vdso32.so
  



Re: [PATCH v2 18/34] dt-bindings: arm: Convert FSL board/soc bindings to json-schema

2019-01-09 Thread Shawn Guo
On Sat, Dec 08, 2018 at 09:58:37AM +0800, Shawn Guo wrote:
> On Thu, Dec 06, 2018 at 05:33:13PM -0600, Rob Herring wrote:
> > On Wed, Dec 5, 2018 at 8:32 PM Shawn Guo  wrote:
> > >
> > > On Mon, Dec 03, 2018 at 03:32:07PM -0600, Rob Herring wrote:
> > > > Convert Freescale SoC bindings to DT schema format using json-schema.
> > > >
> > > > Cc: Shawn Guo 
> > > > Cc: Mark Rutland 
> > > > Cc: devicet...@vger.kernel.org
> > > > Signed-off-by: Rob Herring 
> > > > ---
> > > >  .../devicetree/bindings/arm/armadeus.txt  |   6 -
> > > >  Documentation/devicetree/bindings/arm/bhf.txt |   6 -
> > > >  .../bindings/arm/compulab-boards.txt  |  25 --
> > > >  Documentation/devicetree/bindings/arm/fsl.txt | 229 --
> > > >  .../devicetree/bindings/arm/fsl.yaml  | 214 
> > >
> > > Rob,
> > >
> > > I do not have any changes on bindings/arm/fsl.txt queued for 4.21 on my
> > > tree, so please send it via your tree.
> > 
> > What about:
> > 
> > c386f362957b dt-bindings: Add compatible string for LS1028A-QDS
> > 3671cd57de06 dt-bindings: ls1012a: Add FRWY-LS1012A device tree binding
> 
> Ah, sorry, I only checked on imx/dt branch and forgot imx/dt64.  I will
> drop the changes on fsl.txt and update fsl.yaml after it hits mainline.

What happened to this?  It seems the patch did not hit v5.0-rc1.

Shawn


[RFC PATCH kernel] powerpc/stack_protector: Fix external modules building

2019-01-09 Thread Alexey Kardashevskiy
c3ff2a519 "powerpc/32: add stack protector support" added stack protector
support, so now powerpc's "prepare" target depends on prepare0 (via the
stack_protector_prepare target).

It works fine until we try to build an external module, where it fails with:
Run: 'make -j128 SYSSRC=/home/aik/p/kernel 
SYSOUT=/home/aik/pbuild/kernel-le-pseries/ ARCH=powerpc'
make[1]: Entering directory '/home/aik/p/kernel'
make[2]: Entering directory '/home/aik/pbuild/kernel-le-pseries'
make[2]: *** No rule to make target 'prepare0', needed by 
'stack_protector_prepare'.  Stop.

The reason for that is that the main Linux Makefile defines "prepare0"
only if KBUILD_EXTMOD=="".

This hacks powerpc's Makefile to make external modules build again.

Fixes: c3ff2a519 "powerpc/32: add stack protector support"
Signed-off-by: Alexey Kardashevskiy 
---


It has been suggested that there is a better way of fixing this hence RFC.


---
 arch/powerpc/Makefile | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 488c9ed..0492f62 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -419,7 +419,11 @@ archheaders:
 ifdef CONFIG_STACKPROTECTOR
 prepare: stack_protector_prepare
 
+ifeq ($(KBUILD_EXTMOD),)
 stack_protector_prepare: prepare0
+else
+stack_protector_prepare:
+endif
 ifdef CONFIG_PPC64
$(eval KBUILD_CFLAGS += -mstack-protector-guard-offset=$(shell awk '{if 
($$2 == "PACA_CANARY") print $$3;}' include/generated/asm-offsets.h))
 else
-- 
2.17.1



[PATCH] ibmvscsi: use GFP_KERNEL with dma_alloc_coherent in initialize_event_pool

2019-01-09 Thread Tyrel Datwyler
During driver probe we allocate a dma region for our event pool.
Currently, zero is passed for the gfp_flags parameter. Driver probe
callbacks run in process context and we hold no locks so we can sleep
here if necessary.

Fix by passing GFP_KERNEL explicitly to dma_alloc_coherent().

Signed-off-by: Tyrel Datwyler 
---
 drivers/scsi/ibmvscsi/ibmvscsi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
index cb8535e..10d5e77 100644
--- a/drivers/scsi/ibmvscsi/ibmvscsi.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
@@ -465,7 +465,7 @@ static int initialize_event_pool(struct event_pool *pool,
pool->iu_storage =
dma_alloc_coherent(hostdata->dev,
   pool->size * sizeof(*pool->iu_storage),
-  &pool->iu_token, 0);
+  &pool->iu_token, GFP_KERNEL);
if (!pool->iu_storage) {
kfree(pool->events);
return -ENOMEM;
-- 
1.8.3.1



[PATCH] ibmvscsi: use GFP_ATOMIC with dma_alloc_coherent in map_sg_data

2019-01-09 Thread Tyrel Datwyler
While mapping DMA for a scatter list when a scsi command is queued, the
existing call to dma_alloc_coherent() in our map_sg_data() function
passes zero for the gfp_flags parameter. We are most definitely in atomic
context at this point, as queue_command() is called in softirq context
and, further, we hold the scsi host lock, which is a spinlock.

Fix this by passing GFP_ATOMIC to dma_alloc_coherent() to prevent any
sort of sleeping in atomic context deadlock.

Fixes: 4dddbc26c389 ("[SCSI] ibmvscsi: handle large scatter/gather lists")
Cc: sta...@vger.kernel.org
Signed-off-by: Tyrel Datwyler 
---
 drivers/scsi/ibmvscsi/ibmvscsi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
index 1135e74..cb8535e 100644
--- a/drivers/scsi/ibmvscsi/ibmvscsi.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
@@ -731,7 +731,7 @@ static int map_sg_data(struct scsi_cmnd *cmd,
evt_struct->ext_list = (struct srp_direct_buf *)
dma_alloc_coherent(dev,
   SG_ALL * sizeof(struct srp_direct_buf),
-  &evt_struct->ext_list_token, 0);
+  &evt_struct->ext_list_token, GFP_ATOMIC);
if (!evt_struct->ext_list) {
if (!firmware_has_feature(FW_FEATURE_CMO))
sdev_printk(KERN_ERR, cmd->device,
-- 
1.8.3.1



[PATCH] powerpc/vdso32: Drop -mabi=elfv1 for 32 bit objects

2019-01-09 Thread Joel Stanley
From: Daniel Axtens 

All 64-bit objects need to specify the flag to be compiled correctly, we
just don't need it for 32-bit objects. GCC just ignored it, but clang
doesn't.

Link: https://github.com/ClangBuiltLinux/linux/issues/240
Signed-off-by: Daniel Axtens 
Signed-off-by: Joel Stanley 
---
 arch/powerpc/kernel/vdso32/Makefile | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/powerpc/kernel/vdso32/Makefile 
b/arch/powerpc/kernel/vdso32/Makefile
index 50112d4473bb..6bd41756e0c7 100644
--- a/arch/powerpc/kernel/vdso32/Makefile
+++ b/arch/powerpc/kernel/vdso32/Makefile
@@ -34,6 +34,20 @@ obj-y += vdso32_wrapper.o
 extra-y += vdso32.lds
 CPPFLAGS_vdso32.lds += -P -C -Upowerpc
 
+# clang refuses to accept -mabi=elfv1 for when using the
+# 64-bit target in 32-bit mode
+ifdef CONFIG_CC_IS_CLANG
+ifdef CONFIG_PPC64
+AFLAGS_REMOVE_getcpu.o += -mabi=elfv1
+endif
+AFLAGS_REMOVE_sigtramp.o += -mabi=elfv1
+AFLAGS_REMOVE_gettimeofday.o += -mabi=elfv1
+AFLAGS_REMOVE_datapage.o += -mabi=elfv1
+AFLAGS_REMOVE_cacheflush.o += -mabi=elfv1
+AFLAGS_REMOVE_note.o += -mabi=elfv1
+endif
+
+
 # Force dependency (incbin is bad)
 $(obj)/vdso32_wrapper.o : $(obj)/vdso32.so
 
-- 
2.19.1



Re: [PATCH 2/2] powerpc: Show PAGE_SIZE in __die() output

2019-01-09 Thread Michael Ellerman
Christophe Leroy  writes:
> On 08/01/2019 at 13:21, Christophe Leroy wrote:
>> On 08/01/2019 at 13:05, Michael Ellerman wrote:
>>> The page size the kernel is built with is useful info when debugging a
>>> crash, so add it to the output in __die().
>>>
>>> Result looks like eg:
>>>
>>>    kernel BUG at drivers/misc/lkdtm/bugs.c:63!
>>>    Oops: Exception in kernel mode, sig: 5 [#1]
>>>    LE PAGE_SIZE=64K SMP NR_CPUS=2048 NUMA pSeries
>>>    Modules linked in: vmx_crypto kvm binfmt_misc ip_tables
>>>
>>> Signed-off-by: Michael Ellerman 
>>> ---
>>>   arch/powerpc/kernel/traps.c | 12 
>>>   1 file changed, 12 insertions(+)
>>>
>>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
>>> index 431a86d3f772..fc972e4eee5f 100644
>>> --- a/arch/powerpc/kernel/traps.c
>>> +++ b/arch/powerpc/kernel/traps.c
>>> @@ -268,6 +268,18 @@ static int __die(const char *str, struct pt_regs 
>>> *regs, long err)
>>>   else
>>>   seq_buf_puts(, "BE ");
>>> +    seq_buf_puts(, "PAGE_SIZE=");
>>> +    if (IS_ENABLED(CONFIG_PPC_4K_PAGES))
>>> +    seq_buf_puts(, "4K ");
>>> +    else if (IS_ENABLED(CONFIG_PPC_16K_PAGES))
>>> +    seq_buf_puts(, "16K ");
>>> +    else if (IS_ENABLED(CONFIG_PPC_64K_PAGES))
>>> +    seq_buf_puts(, "64K ");
>>> +    else if (IS_ENABLED(CONFIG_PPC_256K_PAGES))
>>> +    seq_buf_puts(, "256K ");
>> 
>> Can't we build all the above at once using PAGE_SHIFT ?
>> 
>> Something like (untested):
>> 
>> "%dK ", 1 << (PAGE_SHIFT - 10)
>
> Or even simpler:
>
> "%dK ", PAGE_SIZE / 1024

Yep, good point.

Clearly I have forgotten how to program over the break (if I ever knew).

cheers
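
For reference, the two suggestions quoted above give the same value whenever
PAGE_SIZE equals 1 << PAGE_SHIFT. What follows is only a userspace sketch
under that assumption, not the kernel change: the helper names are made up
for illustration, the shift is passed in as a parameter instead of coming
from the kernel configuration, and snprintf() stands in for whatever
formatting call the final patch uses.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: page size in KiB computed from the shift,
 * i.e. the "1 << (PAGE_SHIFT - 10)" form. */
unsigned long sketch_kib_from_shift(unsigned int page_shift)
{
	/* Shifts below 10 would describe pages smaller than 1 KiB. */
	assert(page_shift >= 10 && page_shift < 8 * sizeof(unsigned long));
	return 1UL << (page_shift - 10);
}

/* Hypothetical helper: page size in KiB computed from the size,
 * i.e. the "PAGE_SIZE / 1024" form. */
unsigned long sketch_kib_from_size(unsigned long page_size)
{
	return page_size / 1024;
}

/* Formats "PAGE_SIZE=<n>K " into buf and returns snprintf()'s result. */
int sketch_format_page_size(char *buf, size_t len, unsigned int page_shift)
{
	unsigned long page_size = 1UL << page_shift;

	return snprintf(buf, len, "PAGE_SIZE=%luK ",
			sketch_kib_from_size(page_size));
}
```

With a shift of 16 (64K pages), sketch_format_page_size() produces the
"PAGE_SIZE=64K " fragment shown in the example oops output, without needing
one branch per supported page size.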


Re: [PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()

2019-01-09 Thread Andrew Donnellan

On 10/1/19 2:13 am, Frederic Barrat wrote:

With a recent change around IOMMU group, a system with an opencapi
adapter is no longer booting and we get a kernel oops:

BUG: Kernel NULL pointer dereference at 0x0028
Faulting instruction address: 0xc00aa38c
Oops: Kernel access of bad area, sig: 7 [#1]
LE SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in:
CPU: 5 PID: 1 Comm: swapper/4 Not tainted 5.0.0-rc1-fxb-1-g3bd6e94bec12
NIP:  c00aa38c LR: c00a6608 CTR: c0097480
REGS: c5783700 TRAP: 0300   Not tainted  (5.0.0-rc1-fxb-1-g3bd6
MSR:  92009033   CR: 28000228  XER: 20
CFAR: c00a6604 DAR: 0028 DSISR: 0008 IRQMASK: 0
GPR00: c00a6608 c5783990 c1036100 c007bf761860
GPR04:  c5783834  
GPR08: 69626d2c6e707500   92001003
GPR12:  c007bfff8300 c0010450 
GPR16: c0ced938 0100 c0ced948 000a
GPR20: 000bfffe c0ced9a8 0200 c0ced978
GPR24: 006080c0 c00716d09828 c0002e6fd000 
GPR28: c007bf4aff68 c007bf8d0080 c0f23938 c007bf761860
NIP [c00aa38c] pnv_try_setup_npu_table_group+0x1c/0x1a0
LR [c00a6608] pnv_pci_ioda_fixup+0x1f8/0x660
Call Trace:
[c5783990] [c00aa3d0] pnv_try_setup_npu_table_group+0x60/0x
[c57839d0] [c00a661c] pnv_pci_ioda_fixup+0x20c/0x660
[c5783ab0] [c0e1d4c0] pcibios_resource_survey+0x2c8/0x31c
[c5783b90] [c0e1caf4] pcibios_init+0xb0/0xe4
[c5783c10] [c0010054] do_one_initcall+0x64/0x264
[c5783ce0] [c0e1132c] kernel_init_freeable+0x36c/0x468
[c5783db0] [c0010474] kernel_init+0x2c/0x148
[c5783e20] [c000b794] ret_from_kernel_thread+0x5c/0x68

An opencapi device is using a device PE, so the current code breaks
because pe->pbus is not defined.

More generally, there's no need to define an IOMMU group for opencapi,
as the device sends real addresses directly (admittedly, the
virtualization story is yet to be written). So let's fix it by
skipping the IOMMU group setup for opencapi PHBs.

Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
Signed-off-by: Frederic Barrat 


Reviewed-by: Andrew Donnellan 


---
  arch/powerpc/platforms/powernv/pci-ioda.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 1d6406a051f1..7db3119f8a5b 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2681,7 +2681,8 @@ static void pnv_pci_ioda_setup_iommu_api(void)
 	list_for_each_entry(hose, &hose_list, list_node) {
 		phb = hose->private_data;
 
-		if (phb->type == PNV_PHB_NPU_NVLINK)
+		if (phb->type == PNV_PHB_NPU_NVLINK ||
+		    phb->type == PNV_PHB_NPU_OCAPI)
 			continue;
 
 		list_for_each_entry(pe, &phb->ioda.pe_list, list) {




--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited
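
The effect of the one-line change can be illustrated outside the kernel.
This is a rough sketch only: the enum, the function names and the counting
loop are invented for illustration and merely mirror the shape of the loop
in pnv_pci_ioda_setup_iommu_api(), where NVLink and OpenCAPI PHBs are
skipped before any per-PE IOMMU group setup is attempted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PHB types named in the patch. */
enum sketch_phb_type {
	SKETCH_PHB_IODA2,
	SKETCH_PHB_NPU_NVLINK,
	SKETCH_PHB_NPU_OCAPI,
};

/* True when the loop body would go on to set up IOMMU groups. */
bool sketch_phb_needs_iommu_groups(enum sketch_phb_type type)
{
	return type != SKETCH_PHB_NPU_NVLINK &&
	       type != SKETCH_PHB_NPU_OCAPI;
}

/* Counts how many PHBs in the array would reach the group setup. */
size_t sketch_count_group_setups(const enum sketch_phb_type *types, size_t n)
{
	size_t count = 0;

	assert(types != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!sketch_phb_needs_iommu_groups(types[i]))
			continue;	/* same early skip as in the patch */
		count++;
	}
	return count;
}
```

Before the fix, only the NVLink case was skipped, so an OpenCAPI PHB fell
through to code that assumes a bus PE.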



[PATCH] powerpc/8xx: fix setting of pagetable for Abatron BDI debug tool.

2019-01-09 Thread Christophe Leroy
Commit 8c8c10b90d88 ("powerpc/8xx: fix handling of early NULL pointer
dereference") moved the loading of r6 earlier in the code. As some
functions are called in between, r6 needs to be loaded again with the
address of swapper_pg_dir in order to set PTE pointers for
the Abatron BDI.

Fixes: 8c8c10b90d88 ("powerpc/8xx: fix handling of early NULL pointer 
dereference")
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_8xx.S | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index aea5f367e4fe..ab0e6f1c98b0 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -885,11 +885,12 @@ start_here:
 
/* set up the PTE pointers for the Abatron bdiGDB.
*/
-   tovirt(r6,r6)
lis r5, abatron_pteptrs@h
ori r5, r5, abatron_pteptrs@l
stw r5, 0xf0(0) /* Must match your Abatron config file */
tophys(r5,r5)
+   lis r6, swapper_pg_dir@h
+   ori r6, r6, swapper_pg_dir@l
stw r6, 0(r5)
 
 /* Now turn on the MMU for real! */
-- 
2.13.3



Re: [PATCH] lkdtm: Add a tests for NULL pointer dereference

2019-01-09 Thread Kees Cook
On Wed, Jan 9, 2019 at 7:16 AM Kees Cook  wrote:
>
> On Tue, Jan 8, 2019 at 10:31 PM Christophe Leroy
>  wrote:
> >
> >
> >
> > On 09/01/2019 at 02:14, Kees Cook wrote:
> > > On Fri, Dec 14, 2018 at 7:26 AM Christophe Leroy
> > >  wrote:
> > >>
> > >> Introduce lkdtm tests for NULL pointer dereference: check
> > >> access or exec at NULL address.
> > >
> > > Why is this not already covered by the existing tests? (Is there
> > > something special about NULL that is being missed?) I'd expect SMAP
> > > and SMEP to cover NULL as well.
> >
> > Most arches print a different message depending on whether the faulty
> > address is above or below PAGE_SIZE. Below is an example from x86:
> >
> > pr_alert("BUG: unable to handle kernel %s at %px\n",
> >  address < PAGE_SIZE ? "NULL pointer dereference" : "paging 
> > request",
> >  (void *)address);
> >
> >
> > Until recently, the powerpc arch didn't do it. When I implemented it
> > (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=49a502ea23bf9dec47f8f3c3960909ff409cd1bb),
> > I needed a way to test it and couldn't find an existing one, hence this
> > new LKDTM test.
> >
> > But maybe I missed something ?
>
> Okay, gotcha. You're getting more complete reporting coverage. Sounds
> good to me. Thanks!
>
> Acked-by: Kees Cook 

Applied to my lkdtm -next tree.

-- 
Kees Cook
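
The reporting distinction discussed in this thread can be tried in
userspace. This is a sketch under stated assumptions, not the x86 or
powerpc fault handler: the function names and the fixed 4K page size are
hypothetical, and "0x%lx" is used in place of the kernel-only "%px" format.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption for this sketch: 4K pages. The kernel takes PAGE_SIZE
 * from its configuration. */
#define SKETCH_PAGE_SIZE 4096UL

/* Picks the wording by comparing the faulting address with the page size. */
const char *sketch_fault_kind(unsigned long address)
{
	return address < SKETCH_PAGE_SIZE ? "NULL pointer dereference"
					  : "paging request";
}

/* Builds a message of the same shape as the quoted pr_alert() call. */
int sketch_format_fault(char *buf, size_t len, unsigned long address)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "BUG: unable to handle kernel %s at 0x%lx",
			sketch_fault_kind(address), address);
}
```

An address such as 0x28, typical of a field access through a NULL structure
pointer, falls in the first page and is reported as a NULL pointer
dereference. That is the case the new lkdtm tests are meant to exercise.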


Re: [PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()

2019-01-09 Thread Greg Kurz
On Wed, 9 Jan 2019 17:45:53 +0100
Frederic Barrat  wrote:

> On 09/01/2019 at 17:25, Greg Kurz wrote:
> > On Wed,  9 Jan 2019 16:13:42 +0100
> > Frederic Barrat  wrote:
> >   
> >> With a recent change around IOMMU group, a system with an opencapi
> >> adapter is no longer booting and we get a kernel oops:
> >>
> >> BUG: Kernel NULL pointer dereference at 0x0028
> >> Faulting instruction address: 0xc00aa38c
> >> Oops: Kernel access of bad area, sig: 7 [#1]
> >> LE SMP NR_CPUS=2048 NUMA PowerNV
> >> Modules linked in:
> >> CPU: 5 PID: 1 Comm: swapper/4 Not tainted 5.0.0-rc1-fxb-1-g3bd6e94bec12
> >> NIP:  c00aa38c LR: c00a6608 CTR: c0097480
> >> REGS: c5783700 TRAP: 0300   Not tainted  (5.0.0-rc1-fxb-1-g3bd6
> >> MSR:  92009033   CR: 28000228  XER: 20
> >> CFAR: c00a6604 DAR: 0028 DSISR: 0008 IRQMASK: 0
> >> GPR00: c00a6608 c5783990 c1036100 c007bf761860
> >> GPR04:  c5783834  
> >> GPR08: 69626d2c6e707500   92001003
> >> GPR12:  c007bfff8300 c0010450 
> >> GPR16: c0ced938 0100 c0ced948 000a
> >> GPR20: 000bfffe c0ced9a8 0200 c0ced978
> >> GPR24: 006080c0 c00716d09828 c0002e6fd000 
> >> GPR28: c007bf4aff68 c007bf8d0080 c0f23938 c007bf761860
> >> NIP [c00aa38c] pnv_try_setup_npu_table_group+0x1c/0x1a0
> >> LR [c00a6608] pnv_pci_ioda_fixup+0x1f8/0x660
> >> Call Trace:
> >> [c5783990] [c00aa3d0] pnv_try_setup_npu_table_group+0x60/0x
> >> [c57839d0] [c00a661c] pnv_pci_ioda_fixup+0x20c/0x660
> >> [c5783ab0] [c0e1d4c0] pcibios_resource_survey+0x2c8/0x31c
> >> [c5783b90] [c0e1caf4] pcibios_init+0xb0/0xe4
> >> [c5783c10] [c0010054] do_one_initcall+0x64/0x264
> >> [c5783ce0] [c0e1132c] kernel_init_freeable+0x36c/0x468
> >> [c5783db0] [c0010474] kernel_init+0x2c/0x148
> >> [c5783e20] [c000b794] ret_from_kernel_thread+0x5c/0x68
> >>
> >> An opencapi device is using a device PE, so the current code breaks
> >> because pe->pbus is not defined.
> >>
> >> More generally, there's no need to define an IOMMU group for opencapi,
> >> as the device sends real addresses directly (admittedly, the
> >> virtualization story is yet to be written). So let's fix it by  
> > 
> > Current plan is to go for mediated VFIO. The real HW stays under the control
> > of the host ocxl driver, and we still don't need an IOMMU group.
> >   
> >> skipping the IOMMU group setup for opencapi PHBs.
> >>
> >> Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
> >> Signed-off-by: Frederic Barrat 
> >> ---  
> > 
> > Reviewed-by: Greg Kurz 
> > 
> > and
> > 
> > Cc: sta...@vger.kernel.org  # v4.20  
> 
> Thanks for the review! But why did you add stable? that problem is only 
> seen on 5.0-rc1, isn't it?
> 

Based on the fact that 0bd971676e68 was committed in 4.20... but I haven't
tested :)

>Fred
> 
> 
> >>   arch/powerpc/platforms/powernv/pci-ioda.c | 3 ++-
> >>   1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
> >> b/arch/powerpc/platforms/powernv/pci-ioda.c
> >> index 1d6406a051f1..7db3119f8a5b 100644
> >> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> >> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> >> @@ -2681,7 +2681,8 @@ static void pnv_pci_ioda_setup_iommu_api(void)
> >> 	list_for_each_entry(hose, &hose_list, list_node) {
> >> 		phb = hose->private_data;
> >>   
> >> -		if (phb->type == PNV_PHB_NPU_NVLINK)
> >> +		if (phb->type == PNV_PHB_NPU_NVLINK ||
> >> +		    phb->type == PNV_PHB_NPU_OCAPI)
> >> 			continue;
> >>   
> >> 		list_for_each_entry(pe, &phb->ioda.pe_list, list) {
> >   
> 



Re: [PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()

2019-01-09 Thread Greg KH
On Wed, Jan 09, 2019 at 05:45:53PM +0100, Frederic Barrat wrote:
> 
> 
> On 09/01/2019 at 17:25, Greg Kurz wrote:
> > On Wed,  9 Jan 2019 16:13:42 +0100
> > Frederic Barrat  wrote:
> > 
> > > With a recent change around IOMMU group, a system with an opencapi
> > > adapter is no longer booting and we get a kernel oops:
> > > 
> > > BUG: Kernel NULL pointer dereference at 0x0028
> > > Faulting instruction address: 0xc00aa38c
> > > Oops: Kernel access of bad area, sig: 7 [#1]
> > > LE SMP NR_CPUS=2048 NUMA PowerNV
> > > Modules linked in:
> > > CPU: 5 PID: 1 Comm: swapper/4 Not tainted 
> > > 5.0.0-rc1-fxb-1-g3bd6e94bec12
> > > NIP:  c00aa38c LR: c00a6608 CTR: c0097480
> > > REGS: c5783700 TRAP: 0300   Not tainted  
> > > (5.0.0-rc1-fxb-1-g3bd6
> > > MSR:  92009033   CR: 28000228  XER: 
> > > 20
> > > CFAR: c00a6604 DAR: 0028 DSISR: 0008 IRQMASK: 0
> > > GPR00: c00a6608 c5783990 c1036100 c007bf761860
> > > GPR04:  c5783834  
> > > GPR08: 69626d2c6e707500   92001003
> > > GPR12:  c007bfff8300 c0010450 
> > > GPR16: c0ced938 0100 c0ced948 000a
> > > GPR20: 000bfffe c0ced9a8 0200 c0ced978
> > > GPR24: 006080c0 c00716d09828 c0002e6fd000 
> > > GPR28: c007bf4aff68 c007bf8d0080 c0f23938 c007bf761860
> > > NIP [c00aa38c] pnv_try_setup_npu_table_group+0x1c/0x1a0
> > > LR [c00a6608] pnv_pci_ioda_fixup+0x1f8/0x660
> > > Call Trace:
> > > [c5783990] [c00aa3d0] 
> > > pnv_try_setup_npu_table_group+0x60/0x
> > > [c57839d0] [c00a661c] pnv_pci_ioda_fixup+0x20c/0x660
> > > [c5783ab0] [c0e1d4c0] pcibios_resource_survey+0x2c8/0x31c
> > > [c5783b90] [c0e1caf4] pcibios_init+0xb0/0xe4
> > > [c5783c10] [c0010054] do_one_initcall+0x64/0x264
> > > [c5783ce0] [c0e1132c] kernel_init_freeable+0x36c/0x468
> > > [c5783db0] [c0010474] kernel_init+0x2c/0x148
> > > [c5783e20] [c000b794] ret_from_kernel_thread+0x5c/0x68
> > > 
> > > An opencapi device is using a device PE, so the current code breaks
> > > because pe->pbus is not defined.
> > > 
> > > More generally, there's no need to define an IOMMU group for opencapi,
> > > as the device sends real addresses directly (admittedly, the
> > > virtualization story is yet to be written). So let's fix it by
> > 
> > Current plan is to go for mediated VFIO. The real HW stays under the control
> > of the host ocxl driver, and we still don't need an IOMMU group.
> > 
> > > skipping the IOMMU group setup for opencapi PHBs.
> > > 
> > > Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
> > > Signed-off-by: Frederic Barrat 
> > > ---
> > 
> > Reviewed-by: Greg Kurz 
> > 
> > and
> > 
> > Cc: sta...@vger.kernel.org  # v4.20
> 
> Thanks for the review! But why did you add stable? that problem is only seen
> on 5.0-rc1, isn't it?

No, this is fixing a patch that got backported to stable.

Well, attempted to be backported, I dropped it because of the problem :)

thanks,

greg k-h


Re: [PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()

2019-01-09 Thread Frederic Barrat




On 09/01/2019 at 17:25, Greg Kurz wrote:

On Wed,  9 Jan 2019 16:13:42 +0100
Frederic Barrat  wrote:


With a recent change around IOMMU group, a system with an opencapi
adapter is no longer booting and we get a kernel oops:

BUG: Kernel NULL pointer dereference at 0x0028
Faulting instruction address: 0xc00aa38c
Oops: Kernel access of bad area, sig: 7 [#1]
LE SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in:
CPU: 5 PID: 1 Comm: swapper/4 Not tainted 5.0.0-rc1-fxb-1-g3bd6e94bec12
NIP:  c00aa38c LR: c00a6608 CTR: c0097480
REGS: c5783700 TRAP: 0300   Not tainted  (5.0.0-rc1-fxb-1-g3bd6
MSR:  92009033   CR: 28000228  XER: 20
CFAR: c00a6604 DAR: 0028 DSISR: 0008 IRQMASK: 0
GPR00: c00a6608 c5783990 c1036100 c007bf761860
GPR04:  c5783834  
GPR08: 69626d2c6e707500   92001003
GPR12:  c007bfff8300 c0010450 
GPR16: c0ced938 0100 c0ced948 000a
GPR20: 000bfffe c0ced9a8 0200 c0ced978
GPR24: 006080c0 c00716d09828 c0002e6fd000 
GPR28: c007bf4aff68 c007bf8d0080 c0f23938 c007bf761860
NIP [c00aa38c] pnv_try_setup_npu_table_group+0x1c/0x1a0
LR [c00a6608] pnv_pci_ioda_fixup+0x1f8/0x660
Call Trace:
[c5783990] [c00aa3d0] pnv_try_setup_npu_table_group+0x60/0x
[c57839d0] [c00a661c] pnv_pci_ioda_fixup+0x20c/0x660
[c5783ab0] [c0e1d4c0] pcibios_resource_survey+0x2c8/0x31c
[c5783b90] [c0e1caf4] pcibios_init+0xb0/0xe4
[c5783c10] [c0010054] do_one_initcall+0x64/0x264
[c5783ce0] [c0e1132c] kernel_init_freeable+0x36c/0x468
[c5783db0] [c0010474] kernel_init+0x2c/0x148
[c5783e20] [c000b794] ret_from_kernel_thread+0x5c/0x68

An opencapi device is using a device PE, so the current code breaks
because pe->pbus is not defined.

More generally, there's no need to define an IOMMU group for opencapi,
as the device sends real addresses directly (admittedly, the
virtualization story is yet to be written). So let's fix it by


Current plan is to go for mediated VFIO. The real HW stays under the control
of the host ocxl driver, and we still don't need an IOMMU group.


skipping the IOMMU group setup for opencapi PHBs.

Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
Signed-off-by: Frederic Barrat 
---


Reviewed-by: Greg Kurz 

and

Cc: sta...@vger.kernel.org  # v4.20


Thanks for the review! But why did you add stable? that problem is only 
seen on 5.0-rc1, isn't it?


  Fred



  arch/powerpc/platforms/powernv/pci-ioda.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 1d6406a051f1..7db3119f8a5b 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2681,7 +2681,8 @@ static void pnv_pci_ioda_setup_iommu_api(void)
 	list_for_each_entry(hose, &hose_list, list_node) {
 		phb = hose->private_data;
 
-		if (phb->type == PNV_PHB_NPU_NVLINK)
+		if (phb->type == PNV_PHB_NPU_NVLINK ||
+		    phb->type == PNV_PHB_NPU_OCAPI)
 			continue;
 
 		list_for_each_entry(pe, &phb->ioda.pe_list, list) {






Re: [PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()

2019-01-09 Thread Greg Kurz
On Wed,  9 Jan 2019 16:13:42 +0100
Frederic Barrat  wrote:

> With a recent change around IOMMU group, a system with an opencapi
> adapter is no longer booting and we get a kernel oops:
> 
> BUG: Kernel NULL pointer dereference at 0x0028
> Faulting instruction address: 0xc00aa38c
> Oops: Kernel access of bad area, sig: 7 [#1]
> LE SMP NR_CPUS=2048 NUMA PowerNV
> Modules linked in:
> CPU: 5 PID: 1 Comm: swapper/4 Not tainted 5.0.0-rc1-fxb-1-g3bd6e94bec12
> NIP:  c00aa38c LR: c00a6608 CTR: c0097480
> REGS: c5783700 TRAP: 0300   Not tainted  (5.0.0-rc1-fxb-1-g3bd6
> MSR:  92009033   CR: 28000228  XER: 20
> CFAR: c00a6604 DAR: 0028 DSISR: 0008 IRQMASK: 0
> GPR00: c00a6608 c5783990 c1036100 c007bf761860
> GPR04:  c5783834  
> GPR08: 69626d2c6e707500   92001003
> GPR12:  c007bfff8300 c0010450 
> GPR16: c0ced938 0100 c0ced948 000a
> GPR20: 000bfffe c0ced9a8 0200 c0ced978
> GPR24: 006080c0 c00716d09828 c0002e6fd000 
> GPR28: c007bf4aff68 c007bf8d0080 c0f23938 c007bf761860
> NIP [c00aa38c] pnv_try_setup_npu_table_group+0x1c/0x1a0
> LR [c00a6608] pnv_pci_ioda_fixup+0x1f8/0x660
> Call Trace:
> [c5783990] [c00aa3d0] pnv_try_setup_npu_table_group+0x60/0x
> [c57839d0] [c00a661c] pnv_pci_ioda_fixup+0x20c/0x660
> [c5783ab0] [c0e1d4c0] pcibios_resource_survey+0x2c8/0x31c
> [c5783b90] [c0e1caf4] pcibios_init+0xb0/0xe4
> [c5783c10] [c0010054] do_one_initcall+0x64/0x264
> [c5783ce0] [c0e1132c] kernel_init_freeable+0x36c/0x468
> [c5783db0] [c0010474] kernel_init+0x2c/0x148
> [c5783e20] [c000b794] ret_from_kernel_thread+0x5c/0x68
> 
> An opencapi device is using a device PE, so the current code breaks
> because pe->pbus is not defined.
> 
> More generally, there's no need to define an IOMMU group for opencapi,
> as the device sends real addresses directly (admittedly, the
> virtualization story is yet to be written). So let's fix it by

Current plan is to go for mediated VFIO. The real HW stays under the control
of the host ocxl driver, and we still don't need an IOMMU group.

> skipping the IOMMU group setup for opencapi PHBs.
> 
> Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
> Signed-off-by: Frederic Barrat 
> ---

Reviewed-by: Greg Kurz 

and

Cc: sta...@vger.kernel.org  # v4.20

>  arch/powerpc/platforms/powernv/pci-ioda.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
> b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 1d6406a051f1..7db3119f8a5b 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2681,7 +2681,8 @@ static void pnv_pci_ioda_setup_iommu_api(void)
> 	list_for_each_entry(hose, &hose_list, list_node) {
> 		phb = hose->private_data;
>  
> -		if (phb->type == PNV_PHB_NPU_NVLINK)
> +		if (phb->type == PNV_PHB_NPU_NVLINK ||
> +		    phb->type == PNV_PHB_NPU_OCAPI)
> 			continue;
>  
> 		list_for_each_entry(pe, &phb->ioda.pe_list, list) {



Re: [PATCH v2 13/34] dt-bindings: arm: amlogic: Move 'amlogic,meson-gx-ao-secure' binding to its own file

2019-01-09 Thread Rob Herring
On Tue, Dec 4, 2018 at 10:18 PM Rob Herring  wrote:
>
> On Tue, Dec 4, 2018 at 7:01 PM Kevin Hilman  wrote:
> >
> > Rob Herring  writes:
> >
> > > It is best practice to have 1 binding per file, so board level bindings
> > > should be separate for various misc SoC bindings.
> > >
> > > Cc: Mark Rutland 
> > > Cc: Carlo Caione 
> > > Cc: Kevin Hilman 
> > > Cc: devicet...@vger.kernel.org
> > > Cc: linux-arm-ker...@lists.infradead.org
> > > Cc: linux-amlo...@lists.infradead.org
> > > Signed-off-by: Rob Herring 
> > > ---
> > >  .../devicetree/bindings/arm/amlogic.txt   | 29 ---
> > >  .../amlogic/amlogic,meson-gx-ao-secure.txt| 28 ++
> > >  2 files changed, 28 insertions(+), 29 deletions(-)
> > >  create mode 100644 
> > > Documentation/devicetree/bindings/arm/amlogic/amlogic,meson-gx-ao-secure.txt
> >
> > Acked-by: Kevin Hilman 
> >
> > But this isn't really related to the schema series is it?  If you
> > prefer, I can just queue this one separately via my tree.
>
> Yes, you can take it.

Hey Kevin, doesn't look like this got applied.

Rob


Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]

2019-01-09 Thread Jason Gunthorpe
On Wed, Jan 09, 2019 at 04:09:02PM +1100, Benjamin Herrenschmidt wrote:

> > POWER 8 firmware is good? If the link does eventually come back, is
> > the POWER8's D3 resumption timeout long enough?
> > 
> > If this doesn't lead to an obvious conclusion you'll probably need to
> > connect to IBM's Mellanox support team to get more information from
> > the card side.
> 
> We are IBM :-) So far, it seems to be that the card is doing something
> not quite right, but we don't know what. We might need to engage
> Mellanox themselves.

Sorry, that was unclear; I meant the support team for IBM inside Mellanox.

There might be internal debugging available that can show if the card
is detecting the beacon, how far it gets in renegotiation, etc.

From all the mails it really has the feel of a PCI-E interop problem between
these two specific chips.

Jason


Re: [PATCH] lkdtm: Add a tests for NULL pointer dereference

2019-01-09 Thread Kees Cook
On Tue, Jan 8, 2019 at 10:31 PM Christophe Leroy
 wrote:
>
>
>
> On 09/01/2019 at 02:14, Kees Cook wrote:
> > On Fri, Dec 14, 2018 at 7:26 AM Christophe Leroy
> >  wrote:
> >>
> >> Introduce lkdtm tests for NULL pointer dereference: check
> >> access or exec at NULL address.
> >
> > Why is this not already covered by the existing tests? (Is there
> > something special about NULL that is being missed?) I'd expect SMAP
> > and SMEP to cover NULL as well.
>
> Most arches print a different message depending on whether the faulty
> address is above or below PAGE_SIZE. Below is an example from x86:
>
> pr_alert("BUG: unable to handle kernel %s at %px\n",
>  address < PAGE_SIZE ? "NULL pointer dereference" : "paging 
> request",
>  (void *)address);
>
>
> Until recently, the powerpc arch didn't do it. When I implemented it
> (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=49a502ea23bf9dec47f8f3c3960909ff409cd1bb),
> I needed a way to test it and couldn't find an existing one, hence this
> new LKDTM test.
>
> But maybe I missed something ?

Okay, gotcha. You're getting more complete reporting coverage. Sounds
good to me. Thanks!

Acked-by: Kees Cook 

-Kees

>
> Christophe
>
> >
> > -Kees
> >
> >>
> >> Signed-off-by: Christophe Leroy 
> >> ---
> >>   drivers/misc/lkdtm/core.c  |  2 ++
> >>   drivers/misc/lkdtm/lkdtm.h |  2 ++
> >>   drivers/misc/lkdtm/perms.c | 18 ++
> >>   3 files changed, 22 insertions(+)
> >>
> >> diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c
> >> index bc76756b7eda..36910e1d5c09 100644
> >> --- a/drivers/misc/lkdtm/core.c
> >> +++ b/drivers/misc/lkdtm/core.c
> >> @@ -157,7 +157,9 @@ static const struct crashtype crashtypes[] = {
> >>  CRASHTYPE(EXEC_VMALLOC),
> >>  CRASHTYPE(EXEC_RODATA),
> >>  CRASHTYPE(EXEC_USERSPACE),
> >> +   CRASHTYPE(EXEC_NULL),
> >>  CRASHTYPE(ACCESS_USERSPACE),
> >> +   CRASHTYPE(ACCESS_NULL),
> >>  CRASHTYPE(WRITE_RO),
> >>  CRASHTYPE(WRITE_RO_AFTER_INIT),
> >>  CRASHTYPE(WRITE_KERN),
> >> diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h
> >> index 3c6fd327e166..b69ee004a3f7 100644
> >> --- a/drivers/misc/lkdtm/lkdtm.h
> >> +++ b/drivers/misc/lkdtm/lkdtm.h
> >> @@ -45,7 +45,9 @@ void lkdtm_EXEC_KMALLOC(void);
> >>   void lkdtm_EXEC_VMALLOC(void);
> >>   void lkdtm_EXEC_RODATA(void);
> >>   void lkdtm_EXEC_USERSPACE(void);
> >> +void lkdtm_EXEC_NULL(void);
> >>   void lkdtm_ACCESS_USERSPACE(void);
> >> +void lkdtm_ACCESS_NULL(void);
> >>
> >>   /* lkdtm_refcount.c */
> >>   void lkdtm_REFCOUNT_INC_OVERFLOW(void);
> >> diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c
> >> index fa54add6375a..62f76d506f04 100644
> >> --- a/drivers/misc/lkdtm/perms.c
> >> +++ b/drivers/misc/lkdtm/perms.c
> >> @@ -164,6 +164,11 @@ void lkdtm_EXEC_USERSPACE(void)
> >>  vm_munmap(user_addr, PAGE_SIZE);
> >>   }
> >>
> >> +void lkdtm_EXEC_NULL(void)
> >> +{
> >> +   execute_location(NULL, CODE_AS_IS);
> >> +}
> >> +
> >>   void lkdtm_ACCESS_USERSPACE(void)
> >>   {
> >>  unsigned long user_addr, tmp = 0;
> >> @@ -195,6 +200,19 @@ void lkdtm_ACCESS_USERSPACE(void)
> >>  vm_munmap(user_addr, PAGE_SIZE);
> >>   }
> >>
> >> +void lkdtm_ACCESS_NULL(void)
> >> +{
> >> +   unsigned long tmp;
> >> +   unsigned long *ptr = (unsigned long *)NULL;
> >> +
> >> +   pr_info("attempting bad read at %px\n", ptr);
> >> +   tmp = *ptr;
> >> +   tmp += 0xc0dec0de;
> >> +
> >> +   pr_info("attempting bad write at %px\n", ptr);
> >> +   *ptr = tmp;
> >> +}
> >> +
> >>   void __init lkdtm_perms_init(void)
> >>   {
> >>  /* Make sure we can write to __ro_after_init values during __init */
> >> --
> >> 2.13.3
> >>
> >
> >



-- 
Kees Cook
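
The message selection that Christophe quotes from x86 earlier in this thread can be sketched in userspace as follows. This is only an illustration: the `demo_` names and the fixed 4 KiB `DEMO_PAGE_SIZE` are assumptions made for the example, not kernel identifiers, and the real page size depends on the architecture and configuration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for the example: 4 KiB pages. */
#define DEMO_PAGE_SIZE 4096UL

/* Hypothetical classification of a faulting address. */
enum demo_fault_kind {
	DEMO_FAULT_NULL_DEREF,
	DEMO_FAULT_PAGING_REQUEST,
	DEMO_FAULT_KINDS
};

/* Addresses inside the first page are reported as NULL pointer dereferences. */
static enum demo_fault_kind demo_classify_fault(uintptr_t address)
{
	return address < DEMO_PAGE_SIZE ? DEMO_FAULT_NULL_DEREF
					: DEMO_FAULT_PAGING_REQUEST;
}

/* Map the classification to the text that would appear in the report. */
static const char *demo_fault_name(enum demo_fault_kind kind)
{
	static const char *const names[DEMO_FAULT_KINDS] = {
		"NULL pointer dereference",
		"paging request",
	};

	assert(kind < DEMO_FAULT_KINDS);
	return names[kind];
}
```

Under these assumptions, a small offset from NULL (a structure member access through a NULL pointer, for instance) still lands in the first page and is classified as a NULL pointer dereference, which is the reporting path the new LKDTM tests are meant to exercise.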


[PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()

2019-01-09 Thread Frederic Barrat
With a recent change around IOMMU group, a system with an opencapi
adapter is no longer booting and we get a kernel oops:

BUG: Kernel NULL pointer dereference at 0x0028
Faulting instruction address: 0xc00aa38c
Oops: Kernel access of bad area, sig: 7 [#1]
LE SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in:
CPU: 5 PID: 1 Comm: swapper/4 Not tainted 5.0.0-rc1-fxb-1-g3bd6e94bec12
NIP:  c00aa38c LR: c00a6608 CTR: c0097480
REGS: c5783700 TRAP: 0300   Not tainted  (5.0.0-rc1-fxb-1-g3bd6
MSR:  92009033   CR: 28000228  XER: 20
CFAR: c00a6604 DAR: 0028 DSISR: 0008 IRQMASK: 0
GPR00: c00a6608 c5783990 c1036100 c007bf761860
GPR04:  c5783834  
GPR08: 69626d2c6e707500   92001003
GPR12:  c007bfff8300 c0010450 
GPR16: c0ced938 0100 c0ced948 000a
GPR20: 000bfffe c0ced9a8 0200 c0ced978
GPR24: 006080c0 c00716d09828 c0002e6fd000 
GPR28: c007bf4aff68 c007bf8d0080 c0f23938 c007bf761860
NIP [c00aa38c] pnv_try_setup_npu_table_group+0x1c/0x1a0
LR [c00a6608] pnv_pci_ioda_fixup+0x1f8/0x660
Call Trace:
[c5783990] [c00aa3d0] pnv_try_setup_npu_table_group+0x60/0x
[c57839d0] [c00a661c] pnv_pci_ioda_fixup+0x20c/0x660
[c5783ab0] [c0e1d4c0] pcibios_resource_survey+0x2c8/0x31c
[c5783b90] [c0e1caf4] pcibios_init+0xb0/0xe4
[c5783c10] [c0010054] do_one_initcall+0x64/0x264
[c5783ce0] [c0e1132c] kernel_init_freeable+0x36c/0x468
[c5783db0] [c0010474] kernel_init+0x2c/0x148
[c5783e20] [c000b794] ret_from_kernel_thread+0x5c/0x68

An opencapi device is using a device PE, so the current code breaks
because pe->pbus is not defined.

More generally, there's no need to define an IOMMU group for opencapi,
as the device sends real addresses directly (admittedly, the
virtualization story is yet to be written). So let's fix it by
skipping the IOMMU group setup for opencapi PHBs.

Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
Signed-off-by: Frederic Barrat 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 1d6406a051f1..7db3119f8a5b 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2681,7 +2681,8 @@ static void pnv_pci_ioda_setup_iommu_api(void)
list_for_each_entry(hose, &hose_list, list_node) {
phb = hose->private_data;
 
-   if (phb->type == PNV_PHB_NPU_NVLINK)
+   if (phb->type == PNV_PHB_NPU_NVLINK ||
+   phb->type == PNV_PHB_NPU_OCAPI)
continue;
 
list_for_each_entry(pe, &phb->ioda.pe_list, list) {
-- 
2.19.1
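
As a rough userspace illustration of the approach in the patch above (skip the PHB types that should not get an IOMMU group, and process the rest), consider the sketch below. The `demo_` types and enum values are hypothetical stand-ins, not the kernel's `struct pnv_phb` or `PNV_PHB_*` constants.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PHB types mentioned in the patch. */
enum demo_phb_type {
	DEMO_PHB_IODA2,
	DEMO_PHB_NPU_NVLINK,
	DEMO_PHB_NPU_OCAPI
};

struct demo_phb {
	enum demo_phb_type type;
};

/* Count the PHBs that would go through IOMMU group setup. */
static size_t demo_count_iommu_phbs(const struct demo_phb *phbs, size_t n)
{
	size_t count = 0;

	assert(phbs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* NVLink was already skipped; the fix adds OpenCAPI to the filter. */
		if (phbs[i].type == DEMO_PHB_NPU_NVLINK ||
		    phbs[i].type == DEMO_PHB_NPU_OCAPI)
			continue;
		count++;
	}
	return count;
}

/* Small fixed scenario: two regular PHBs, one NVLink, one OpenCAPI. */
static size_t demo_example(void)
{
	const struct demo_phb phbs[] = {
		{ DEMO_PHB_IODA2 },
		{ DEMO_PHB_NPU_NVLINK },
		{ DEMO_PHB_NPU_OCAPI },
		{ DEMO_PHB_IODA2 },
	};

	return demo_count_iommu_phbs(phbs, sizeof(phbs) / sizeof(phbs[0]));
}
```

Filtering at the top of the loop keeps the per-PE setup code from ever seeing an OpenCAPI PE, which matches the intent stated in the commit message.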



Re: [Bug 202149] New: NULL Pointer Dereference in __split_huge_pmd on PPC64LE

2019-01-09 Thread Matt Corallo
It's normal daily usage on a workstation (TALOS 2). I've seen it at least 
twice, both times in rustc, though I've run rustc more times than I can count. 
Note that the program that triggered it was running in lxc and it only happened 
after upgrading to 4.19.

> On Jan 9, 2019, at 06:50, Aneesh Kumar K.V  wrote:
> 
> Matt Corallo  writes:
> 
>> .config follows. I have not tested with 64K pages as, sadly, I have a 
>> large BTRFS volume that was formatted on x86, and am thus stuck with 4K 
>> pages. Note that this is roughly the Debian kernel, so it has whatever 
>> patches Debian defaults to applying, a list of which follows.
>> 
> 
> What is the test you are running? I tried a 4K page size config on P9. I
> am running ltp test suite there. Also tried few thp memremap tests.
> Nothing hit that.
> 
> root@:~/tests/ltp/testcases/kernel/mem/thp# getconf  PAGESIZE
> 4096
> root@ltc-boston123:~/tests/ltp/testcases/kernel/mem/thp# grep thp /proc/vmstat 
> thp_fault_alloc 641141
> thp_fault_fallback 0
> thp_collapse_alloc 90
> thp_collapse_alloc_failed 0
> thp_file_alloc 0
> thp_file_mapped 0
> thp_split_page 1
> thp_split_page_failed 0
> thp_deferred_split_page 641150
> thp_split_pmd 24
> thp_zero_page_alloc 1
> thp_zero_page_alloc_failed 0
> thp_swpout 0
> thp_swpout_fallback 0
> root@:~/tests/ltp/testcases/kernel/mem/thp# 
> 
> -aneesh
> 



Re: [PATCH 01/19] powerpc/xive: export flags for the XIVE native exploitation mode hcalls

2019-01-09 Thread Cédric Le Goater
On 1/9/19 2:08 PM, Michael Ellerman wrote:
> Cédric Le Goater  writes:
> 
>> These flags are shared between Linux/KVM implementing the hypervisor
>> calls for the XIVE native exploitation mode and the driver for the
>> sPAPR guests.
>>
>> Signed-off-by: Cédric Le Goater 
>> ---
>>  arch/powerpc/include/asm/xive.h  | 23 +++
>>  arch/powerpc/sysdev/xive/spapr.c | 28 
>>  2 files changed, 31 insertions(+), 20 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/xive.h b/arch/powerpc/include/asm/xive.h
>> index 3c704f5dd3ae..32f033bfbf42 100644
>> --- a/arch/powerpc/include/asm/xive.h
>> +++ b/arch/powerpc/include/asm/xive.h
>> @@ -93,6 +93,29 @@ extern void xive_flush_interrupt(void);
>>  /* xmon hook */
>>  extern void xmon_xive_do_dump(int cpu);
>>  
>> +/*
>> + * Hcall flags shared by the sPAPR backend and KVM
>> + */
>> +
>> +/* H_INT_GET_SOURCE_INFO */
>> +#define XIVE_SPAPR_SRC_H_INT_ESB PPC_BIT(60)
>> +#define XIVE_SPAPR_SRC_LSI   PPC_BIT(61)
>> +#define XIVE_SPAPR_SRC_TRIGGER   PPC_BIT(62)
>> +#define XIVE_SPAPR_SRC_STORE_EOI PPC_BIT(63)
> 
> I have an (irrational) hatred of PPC_BIT, because it obfuscates what's
> going on and makes PPC seem weirder than it needs to be. It could at
> least be called IBM_BIT().
> 
> I know it helps people compare the code vs the documentation, but
> basically no one has the documentation, and everyone has the code.
> 
> Anyway it's not a show stopper, just a pet-peeve of mine :)

Only the define matters, I can change that back to the non-PPC_BIT
version in v2. Not a problem. 

Cheers,

C. 
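
For readers without the documentation at hand, the IBM bit numbering behind PPC_BIT() can be illustrated with a small userspace sketch. It assumes a 64-bit doubleword, where bit 0 is the most significant bit; `demo_ppc_bit()` is a hypothetical helper written for this example, not the kernel macro itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * IBM (big-endian) bit numbering on a 64-bit doubleword:
 * bit 0 is the most significant bit, bit 63 the least significant.
 */
static uint64_t demo_ppc_bit(unsigned int bit)
{
	assert(bit < 64);
	return UINT64_C(1) << (63 - bit);
}
```

With this numbering, the four flags in the patch occupy the low nibble: bit 60 is 0x8, bit 61 is 0x4, bit 62 is 0x2 and bit 63 is 0x1, which is what a non-PPC_BIT version of the defines would spell out directly.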


[PATCH] powerpc/tm: Limit TM code inside PPC_TRANSACTIONAL_MEM

2019-01-09 Thread Breno Leitao
Commit e1c3743e1a20 ("powerpc/tm: Set MSR[TS] just prior to recheckpoint")
moved a code block around, and this block uses a 'msr' variable outside of
the CONFIG_PPC_TRANSACTIONAL_MEM block. However, the 'msr' variable is declared
inside a CONFIG_PPC_TRANSACTIONAL_MEM block, causing a possible error when
CONFIG_PPC_TRANSACTIONAL_MEM is not defined.

error: 'msr' undeclared (first use in this function)

This is not causing a compilation error in the mainline kernel, because
'msr' is being used as an argument of MSR_TM_ACTIVE(), which is defined as
the following when CONFIG_PPC_TRANSACTIONAL_MEM is *not* set:

#define MSR_TM_ACTIVE(x) 0

This patch fixes the issue by avoiding use of the 'msr' variable outside
the CONFIG_PPC_TRANSACTIONAL_MEM block, so the code no longer relies on the
MSR_TM_ACTIVE() definition.

Cc: sta...@vger.kernel.org
Reported-by: Christoph Biedl 
Fixes: e1c3743e1a20 ("powerpc/tm: Set MSR[TS] just prior to recheckpoint")
Signed-off-by: Breno Leitao 
---

NB: Since stable kernels didn't cherry-pick 5c784c8414fba ("powerpc/tm:
Remove msr_tm_active()"), MSR_TM_ACTIVE() is not defined as 0 for the
CONFIG_PPC_TRANSACTIONAL_MEM=n case, thus triggering the compilation error
above.

Tested against stable kernel 4.19.13-rc2, and the problem is now fixed when
CONFIG_PPC_TRANSACTIONAL_MEM=n.

 arch/powerpc/kernel/signal_64.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index daa28cb72272..8fe698162ab9 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -739,11 +739,12 @@ SYSCALL_DEFINE0(rt_sigreturn)
if (restore_tm_sigcontexts(current, &uc->uc_mcontext,
   &uc_transact->uc_mcontext))
goto badframe;
-   }
+   } else
 #endif
-   /* Fall through, for non-TM restore */
-   if (!MSR_TM_ACTIVE(msr)) {
+   {
/*
+* Fall through, for non-TM restore
+*
 * Unset MSR[TS] on the thread regs since MSR from user
 * context does not have MSR active, and recheckpoint was
 * not called since restore_tm_sigcontexts() was not called
-- 
2.19.0
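
The reason mainline compiles despite the undeclared variable can be reproduced in a few lines of userspace C. The macro definition below is the one quoted in the commit message; the `demo_` function is a hypothetical stand-in for the non-TM branch and does not mirror the real rt_sigreturn code.

```c
#include <assert.h>

/* As quoted in the commit message for CONFIG_PPC_TRANSACTIONAL_MEM=n. */
#define MSR_TM_ACTIVE(x) 0

/*
 * Hypothetical stand-in for the non-TM restore branch. Note that no
 * variable named 'msr' is declared anywhere in this file.
 */
static int demo_takes_non_tm_path(void)
{
	/* The preprocessor discards the argument, so 'msr' never reaches the compiler. */
	assert(MSR_TM_ACTIVE(msr) == 0);

	if (!MSR_TM_ACTIVE(msr))
		return 1;
	return 0;
}
```

If MSR_TM_ACTIVE() were instead a macro or function that actually evaluates its argument, as described for the stable kernels in the note above, the same source would fail with the "'msr' undeclared" error. That is why the patch stops depending on the macro's definition.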



Re: [PATCH 01/19] powerpc/xive: export flags for the XIVE native exploitation mode hcalls

2019-01-09 Thread Michael Ellerman
Cédric Le Goater  writes:

> These flags are shared between Linux/KVM implementing the hypervisor
> calls for the XIVE native exploitation mode and the driver for the
> sPAPR guests.
>
> Signed-off-by: Cédric Le Goater 
> ---
>  arch/powerpc/include/asm/xive.h  | 23 +++
>  arch/powerpc/sysdev/xive/spapr.c | 28 
>  2 files changed, 31 insertions(+), 20 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/xive.h b/arch/powerpc/include/asm/xive.h
> index 3c704f5dd3ae..32f033bfbf42 100644
> --- a/arch/powerpc/include/asm/xive.h
> +++ b/arch/powerpc/include/asm/xive.h
> @@ -93,6 +93,29 @@ extern void xive_flush_interrupt(void);
>  /* xmon hook */
>  extern void xmon_xive_do_dump(int cpu);
>  
> +/*
> + * Hcall flags shared by the sPAPR backend and KVM
> + */
> +
> +/* H_INT_GET_SOURCE_INFO */
> +#define XIVE_SPAPR_SRC_H_INT_ESB PPC_BIT(60)
> +#define XIVE_SPAPR_SRC_LSI   PPC_BIT(61)
> +#define XIVE_SPAPR_SRC_TRIGGER   PPC_BIT(62)
> +#define XIVE_SPAPR_SRC_STORE_EOI PPC_BIT(63)

I have an (irrational) hatred of PPC_BIT, because it obfuscates what's
going on and makes PPC seem weirder than it needs to be. It could at
least be called IBM_BIT().

I know it helps people compare the code vs the documentation, but
basically no one has the documentation, and everyone has the code.

Anyway it's not a show stopper, just a pet-peeve of mine :)

cheers


[PATCH 14/14] syscall_get_arch: add "struct task_struct *" argument

2019-01-09 Thread Dmitry V. Levin
This argument is required to extend the generic ptrace API with
PTRACE_GET_SYSCALL_INFO request: syscall_get_arch() is going
to be called from ptrace_request() along with syscall_get_nr(),
syscall_get_arguments(), syscall_get_error(), and
syscall_get_return_value() functions with a tracee as their argument.

The primary intent is that the triple (audit_arch, syscall_nr, arg1..arg6)
should describe what system call is being called and what its arguments
are.
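
As a hedged illustration of why the task argument matters, the sketch below models a tracer asking for the architecture of a tracee rather than of itself. All `demo_` names and the two-value enum are assumptions for this example; the kernel returns AUDIT_ARCH_* values and inspects real task state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical arch identifiers standing in for AUDIT_ARCH_* values. */
enum demo_arch {
	DEMO_ARCH_PPC,
	DEMO_ARCH_PPC64
};

/* Hypothetical, minimal model of a task: only its ABI width is tracked. */
struct demo_task {
	bool is_32bit;
};

/* The answer depends on the task passed in, not on the caller. */
static enum demo_arch demo_syscall_get_arch(const struct demo_task *task)
{
	assert(task != NULL);
	return task->is_32bit ? DEMO_ARCH_PPC : DEMO_ARCH_PPC64;
}

/* What a tracer would see when querying a tracee of the given width. */
static enum demo_arch demo_arch_of_tracee(bool tracee_is_32bit)
{
	struct demo_task tracee = { .is_32bit = tracee_is_32bit };

	return demo_syscall_get_arch(&tracee);
}
```

Without the argument, a 64-bit tracer could only ever learn its own architecture, which would make the (audit_arch, syscall_nr, arg1..arg6) triple wrong for a 32-bit tracee.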

Reverts: 5e937a9ae913 ("syscall_get_arch: remove useless function arguments")
Reverts: 1002d94d3076 ("syscall.h: fix doc text for syscall_get_arch()")
Reviewed-by: Andy Lutomirski  # for x86
Reviewed-by: Palmer Dabbelt 
Acked-by: Paul Moore 
Acked-by: Paul Burton  # MIPS parts
Acked-by: Michael Ellerman  (powerpc)
Acked-by: Kees Cook  # seccomp parts
Acked-by: Mark Salter  # for the c6x bit
Cc: Elvira Khabirova 
Cc: Eugene Syromyatnikov 
Cc: Oleg Nesterov 
Cc: x...@kernel.org
Cc: linux-al...@vger.kernel.org
Cc: linux-snps-...@lists.infradead.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-c6x-...@linux-c6x.org
Cc: uclinux-h8-de...@lists.sourceforge.jp
Cc: linux-hexa...@vger.kernel.org
Cc: linux-i...@vger.kernel.org
Cc: linux-m...@lists.linux-m68k.org
Cc: linux-m...@vger.kernel.org
Cc: nios2-...@lists.rocketboards.org
Cc: openr...@lists.librecores.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-ri...@lists.infradead.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linux...@lists.infradead.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux-a...@vger.kernel.org
Cc: linux-au...@redhat.com
Signed-off-by: Dmitry V. Levin 
---
 arch/alpha/include/asm/syscall.h  |  2 +-
 arch/arc/include/asm/syscall.h|  2 +-
 arch/arm/include/asm/syscall.h|  2 +-
 arch/arm64/include/asm/syscall.h  |  4 ++--
 arch/c6x/include/asm/syscall.h|  2 +-
 arch/csky/include/asm/syscall.h   |  2 +-
 arch/h8300/include/asm/syscall.h  |  2 +-
 arch/hexagon/include/asm/syscall.h|  2 +-
 arch/ia64/include/asm/syscall.h   |  2 +-
 arch/m68k/include/asm/syscall.h   |  2 +-
 arch/microblaze/include/asm/syscall.h |  2 +-
 arch/mips/include/asm/syscall.h   |  6 +++---
 arch/mips/kernel/ptrace.c |  2 +-
 arch/nds32/include/asm/syscall.h  |  2 +-
 arch/nios2/include/asm/syscall.h  |  2 +-
 arch/openrisc/include/asm/syscall.h   |  2 +-
 arch/parisc/include/asm/syscall.h |  4 ++--
 arch/powerpc/include/asm/syscall.h| 10 --
 arch/riscv/include/asm/syscall.h  |  2 +-
 arch/s390/include/asm/syscall.h   |  4 ++--
 arch/sh/include/asm/syscall_32.h  |  2 +-
 arch/sh/include/asm/syscall_64.h  |  2 +-
 arch/sparc/include/asm/syscall.h  |  5 +++--
 arch/unicore32/include/asm/syscall.h  |  2 +-
 arch/x86/include/asm/syscall.h|  8 +---
 arch/x86/um/asm/syscall.h |  2 +-
 arch/xtensa/include/asm/syscall.h |  2 +-
 include/asm-generic/syscall.h |  5 +++--
 kernel/auditsc.c  |  4 ++--
 kernel/seccomp.c  |  4 ++--
 30 files changed, 52 insertions(+), 42 deletions(-)

diff --git a/arch/alpha/include/asm/syscall.h b/arch/alpha/include/asm/syscall.h
index d73a6fcb519c..11c688c1d7ec 100644
--- a/arch/alpha/include/asm/syscall.h
+++ b/arch/alpha/include/asm/syscall.h
@@ -4,7 +4,7 @@
 
 #include 
 
-static inline int syscall_get_arch(void)
+static inline int syscall_get_arch(struct task_struct *task)
 {
return AUDIT_ARCH_ALPHA;
 }
diff --git a/arch/arc/include/asm/syscall.h b/arch/arc/include/asm/syscall.h
index c7fc4c0c3bcb..caf2697ef5b7 100644
--- a/arch/arc/include/asm/syscall.h
+++ b/arch/arc/include/asm/syscall.h
@@ -70,7 +70,7 @@ syscall_get_arguments(struct task_struct *task, struct pt_regs *regs,
 }
 
 static inline int
-syscall_get_arch(void)
+syscall_get_arch(struct task_struct *task)
 {
return IS_ENABLED(CONFIG_ISA_ARCOMPACT)
? (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN)
diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
index 06dea6bce293..3940ceac0bdc 100644
--- a/arch/arm/include/asm/syscall.h
+++ b/arch/arm/include/asm/syscall.h
@@ -104,7 +104,7 @@ static inline void syscall_set_arguments(struct task_struct *task,
memcpy(&regs->ARM_r0 + i, args, n * sizeof(args[0]));
 }
 
-static inline int syscall_get_arch(void)
+static inline int syscall_get_arch(struct task_struct *task)
 {
/* ARM tasks don't change audit architectures on the fly. */
return AUDIT_ARCH_ARM;
diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
index ad8be16a39c9..1870df03f774 100644
--- a/arch/arm64/include/asm/syscall.h
+++ b/arch/arm64/include/asm/syscall.h
@@ -117,9 +117,9 @@ static inline void syscall_set_arguments(struct task_struct *task,
  * We don't care about endianness (__AUDIT_ARCH_LE bit) here because
  * AArch64 has the same system calls both on little- and 

Re: Kconfig label updates

2019-01-09 Thread Michael Ellerman
Hi Bjorn,

Bjorn Helgaas  writes:
> Hi,
>
> I want to update the PCI Kconfig labels so they're more consistent and
> useful to users, something like the patch below.  IIUC, the items
> below are all IBM-related; please correct me if not.
>
> I'd also like to expand (or remove) "RPA" because Google doesn't find
> anything about "IBM RPA", except Robotic Process Automation, which I
> think must be something else.

Yeah I think just remove it, it's not a well known term and is unlikely
to help anyone these days.

It stands for "RISC Platform Architecture", which was some kind of
specification for Power machines back in the day, but from what I can
tell it was never used in marketing or manuals much (hence so few hits
on Google).

> Is there some text expansion of RPA that we could use that would be
> meaningful to a user, i.e., something he/she might find on a nameplate
> or in a user manual?

No I don't think so.

> Ideally the PCI Kconfig labels would match the terms used in
> arch/.../Kconfig, e.g.,
>
>   config PPC_POWERNV
> bool "IBM PowerNV (Non-Virtualized) platform support"
>
>   config PPC_PSERIES
> bool "IBM pSeries & new (POWER5-based) iSeries"

TBH these are pretty unhelpful too. PowerNV is not a marketing name and
so doesn't appear anywhere much in official manuals or brochures and
it's also used on non-IBM branded machines. And pSeries & iSeries were
marketing names but are no longer used.

We should probably update that text, but we can do that later, rather
than blocking this patch.

> diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
> index e9f78eb390d2..1c1d145bfd84 100644
> --- a/drivers/pci/hotplug/Kconfig
> +++ b/drivers/pci/hotplug/Kconfig
> @@ -112,7 +112,7 @@ config HOTPLUG_PCI_SHPC
> When in doubt, say N.
>  
>  config HOTPLUG_PCI_POWERNV
> - tristate "PowerPC PowerNV PCI Hotplug driver"
> + tristate "IBM PowerNV PCI Hotplug driver"

This is used in non-IBM machines as well.

So perhaps: ?

tristate "IBM/OpenPower PowerNV (bare metal) PCI Hotplug driver"

> @@ -125,10 +125,11 @@ config HOTPLUG_PCI_POWERNV
> When in doubt, say N.
>  
>  config HOTPLUG_PCI_RPA
> - tristate "RPA PCI Hotplug driver"
> + tristate "IBM Power Systems RPA PCI Hotplug driver"

I think just drop RPA here.

>   depends on PPC_PSERIES && EEH
>   help
> Say Y here if you have a RPA system that supports PCI Hotplug.

s/RPA/IBM Power Systems/

> +   This includes the earlier pSeries and iSeries.

To be complete:
  This includes the earlier System p, System i, pSeries and iSeries.

>  
> To compile this driver as a module, choose M here: the
> module will be called rpaphp.
> @@ -136,7 +137,7 @@ config HOTPLUG_PCI_RPA
> When in doubt, say N.
>  
>  config HOTPLUG_PCI_RPA_DLPAR
> - tristate "RPA Dynamic Logical Partitioning for I/O slots"
> + tristate "IBM RPA Dynamic Logical Partitioning for I/O slots"

Again just drop RPA.


cheers


[PATCH -next] powerpc/mm: Fix debugfs_simple_attr.cocci warnings

2019-01-09 Thread YueHaibing
Use DEFINE_DEBUGFS_ATTRIBUTE rather than DEFINE_SIMPLE_ATTRIBUTE
for debugfs files.

Semantic patch information:
Rationale: DEFINE_SIMPLE_ATTRIBUTE + debugfs_create_file()
imposes some significant overhead as compared to
DEFINE_DEBUGFS_ATTRIBUTE + debugfs_create_file_unsafe().

Generated by: scripts/coccinelle/api/debugfs/debugfs_simple_attr.cocci

Signed-off-by: YueHaibing 
---
 arch/powerpc/mm/hash_utils_64.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index bc6be44..22f14e1 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1889,12 +1889,13 @@ static int hpt_order_set(void *data, u64 val)
return mmu_hash_ops.resize_hpt(val);
 }
 
-DEFINE_SIMPLE_ATTRIBUTE(fops_hpt_order, hpt_order_get, hpt_order_set, "%llu\n");
+DEFINE_DEBUGFS_ATTRIBUTE(fops_hpt_order, hpt_order_get, hpt_order_set,
+"%llu\n");
 
 static int __init hash64_debugfs(void)
 {
-   if (!debugfs_create_file("hpt_order", 0600, powerpc_debugfs_root,
-NULL, &fops_hpt_order)) {
+   if (!debugfs_create_file_unsafe("hpt_order", 0600, powerpc_debugfs_root,
+   NULL, &fops_hpt_order)) {
pr_err("lpar: unable to create hpt_order debugsfs file\n");
}
 







Re: [PATCH] powerpc/powernv/npu: Allocate enough memory in pnv_try_setup_npu_table_group()

2019-01-09 Thread Michael Ellerman
Dan Carpenter  writes:
> There is a typo so we accidentally allocate enough memory for a pointer
> when we wanted to allocate enough for a struct.
>
> Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
> Signed-off-by: Dan Carpenter 
> ---
>  arch/powerpc/platforms/powernv/npu-dma.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Thanks, I've applied this to my fixes-test tree.

Alexey can you send me an ack?

cheers

> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
> index d7f742ed48ba..3f58c7dbd581 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -564,7 +564,7 @@ struct iommu_table_group *pnv_try_setup_npu_table_group(struct pnv_ioda_pe *pe)
>   }
>   } else {
>   /* Create a group for 1 GPU and attached NPUs for POWER8 */
> - pe->npucomp = kzalloc(sizeof(pe->npucomp), GFP_KERNEL);
> + pe->npucomp = kzalloc(sizeof(*pe->npucomp), GFP_KERNEL);
>   table_group = &pe->npucomp->table_group;
>   table_group->ops = &pnv_npu_peers_ops;
>   iommu_register_group(table_group, hose->global_number,
> -- 
> 2.17.1
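
The typo fixed by this patch is a common C pitfall, and a short userspace sketch shows the difference. The structure layout below is a hypothetical placeholder, not the kernel's definition, and `calloc()` stands in for `kzalloc()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical layout; only its size relative to a pointer matters here. */
struct demo_npu_comp {
	void *table_group[8];
	int pe_num;
};

/* What the buggy call asked for: the size of the pointer itself. */
static size_t demo_size_of_pointer(void)
{
	struct demo_npu_comp *comp = NULL;

	return sizeof(comp);
}

/* What the fixed call asks for: the size of the pointed-to structure. */
static size_t demo_size_of_struct(void)
{
	struct demo_npu_comp *comp = NULL;

	return sizeof(*comp);	/* sizeof does not evaluate or dereference comp */
}

/* Allocate with the correct size and touch the last member to show it fits. */
static int demo_alloc_ok(void)
{
	struct demo_npu_comp *comp = calloc(1, sizeof(*comp));
	int ok;

	if (!comp)
		return 0;
	assert(comp->pe_num == 0);	/* calloc zero-fills, like kzalloc */
	comp->pe_num = 1;
	ok = (comp->pe_num == 1);
	free(comp);
	return ok;
}
```

Writing `sizeof(*ptr)` rather than `sizeof(struct type)` also keeps the allocation correct if the pointer's type changes later, which is the idiom the fix uses.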


Re: [Bug 202149] New: NULL Pointer Dereference in __split_huge_pmd on PPC64LE

2019-01-09 Thread Aneesh Kumar K.V
Matt Corallo  writes:

> .config follows. I have not tested with 64K pages as, sadly, I have a 
> large BTRFS volume that was formatted on x86, and am thus stuck with 4K 
> pages. Note that this is roughly the Debian kernel, so it has whatever 
> patches Debian defaults to applying, a list of which follows.
>

What is the test you are running? I tried a 4K page size config on P9. I
am running ltp test suite there. Also tried few thp memremap tests.
Nothing hit that.

root@:~/tests/ltp/testcases/kernel/mem/thp# getconf  PAGESIZE
4096
root@ltc-boston123:~/tests/ltp/testcases/kernel/mem/thp# grep thp /proc/vmstat 
thp_fault_alloc 641141
thp_fault_fallback 0
thp_collapse_alloc 90
thp_collapse_alloc_failed 0
thp_file_alloc 0
thp_file_mapped 0
thp_split_page 1
thp_split_page_failed 0
thp_deferred_split_page 641150
thp_split_pmd 24
thp_zero_page_alloc 1
thp_zero_page_alloc_failed 0
thp_swpout 0
thp_swpout_fallback 0
root@:~/tests/ltp/testcases/kernel/mem/thp# 

-aneesh



[PATCH] powerpc/powernv/npu: Allocate enough memory in pnv_try_setup_npu_table_group()

2019-01-09 Thread Dan Carpenter
There is a typo so we accidentally allocate enough memory for a pointer
when we wanted to allocate enough for a struct.

Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
Signed-off-by: Dan Carpenter 
---
 arch/powerpc/platforms/powernv/npu-dma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index d7f742ed48ba..3f58c7dbd581 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -564,7 +564,7 @@ struct iommu_table_group *pnv_try_setup_npu_table_group(struct pnv_ioda_pe *pe)
}
} else {
/* Create a group for 1 GPU and attached NPUs for POWER8 */
-   pe->npucomp = kzalloc(sizeof(pe->npucomp), GFP_KERNEL);
+   pe->npucomp = kzalloc(sizeof(*pe->npucomp), GFP_KERNEL);
table_group = &pe->npucomp->table_group;
table_group->ops = &pnv_npu_peers_ops;
iommu_register_group(table_group, hose->global_number,
-- 
2.17.1



Re: use generic DMA mapping code in powerpc V4

2019-01-09 Thread Christian Zigotzky
Next step: a64e18ba191ba9102fb174f27d707485ffd9389c (powerpc/dma: remove 
dma_nommu_get_required_mask)


git clone git://git.infradead.org/users/hch/misc.git -b powerpc-dma.6 a

git checkout a64e18ba191ba9102fb174f27d707485ffd9389c

Link to the Git: 
http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/powerpc-dma.6


Results: PASEMI onboard ethernet works and the X5000 (P5020 board) 
boots. I also successfully tested sound, hardware 3D acceleration, 
Bluetooth, network, booting with a label etc. The uImages work also in a 
virtual e5500 quad-core QEMU machine.


-- Christian


On 05 January 2019 at 5:03PM, Christian Zigotzky wrote:
Next step: c446404b041130fbd9d1772d184f24715cf2362f (powerpc/dma: 
remove dma_nommu_mmap_coherent)


git clone git://git.infradead.org/users/hch/misc.git -b powerpc-dma.6 a

git checkout c446404b041130fbd9d1772d184f24715cf2362f

Output:

Note: checking out 'c446404b041130fbd9d1772d184f24715cf2362f'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. 
Example:


  git checkout -b <new-branch-name>

HEAD is now at c446404... powerpc/dma: remove dma_nommu_mmap_coherent

-

Link to the Git: 
http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/powerpc-dma.6


Result: PASEMI onboard ethernet works and the X5000 (P5020 board) boots.

-- Christian





Re: [PATCH V6 0/4] mm/kvm/vfio/ppc64: Migrate compound pages out of CMA region

2019-01-09 Thread Aneesh Kumar K.V
Andrew Morton  writes:

> On Tue,  8 Jan 2019 10:21:06 +0530 "Aneesh Kumar K.V" 
>  wrote:
>
> >> ppc64 uses the CMA area for the allocation of the guest page table (hash
> >> page table). We won't be able to start a guest if we fail to allocate the
> >> hash page table. We have observed hash table allocation failures because
> >> we failed to migrate pages out of the CMA region because they were pinned.
> >> This happens when we are using VFIO. VFIO on ppc64 pins the entire guest
> >> RAM. If the guest RAM pages get allocated out of the CMA region, we won't
> >> be able to migrate those pages. The pages are also pinned for the lifetime
> >> of the guest.
> >> 
> >> Currently we support migration of non-compound pages. With THP and with
> >> the addition of hugetlb migration we can end up allocating compound pages
> >> from the CMA region. This patch series adds support for migrating compound
> >> pages. The first patch adds the helper get_user_pages_cma_migrate(), which
> >> pins the pages, making sure we migrate them out of the CMA region before
> >> incrementing the reference count.
>
> Does this code do anything for architectures other than powerpc?  If
> not, should we be adding the ifdefs to avoid burdening other
> architectures with unused code?

Any architecture enabling CMA may need this. I will move most of this below
CONFIG_CMA.

-aneesh
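
The "migrate out of CMA before pinning" idea from the cover letter quoted above can be modelled very loosely in userspace. This is a toy simulation under strong assumptions: the `demo_` structures are invented for the example, migration is reduced to clearing a flag, and it can never fail here, unlike real page migration, which allocates a new page and may not succeed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a page: where it lives and how often it is pinned. */
struct demo_page {
	bool in_cma;
	int pin_count;
};

/* Stand-in for migration: pretend the contents moved to a non-CMA page. */
static void demo_migrate_out_of_cma(struct demo_page *page)
{
	page->in_cma = false;
}

/* Pin every page, moving CMA pages out first; returns how many were migrated. */
static size_t demo_pin_pages(struct demo_page *pages, size_t n)
{
	size_t migrated = 0;

	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (pages[i].in_cma) {
			demo_migrate_out_of_cma(&pages[i]);
			migrated++;
		}
		/* Only now take the long-lived reference. */
		pages[i].pin_count++;
	}
	return migrated;
}

/* Fixed scenario: after pinning, no pinned page should remain in CMA. */
static size_t demo_pinned_pages_left_in_cma(void)
{
	struct demo_page pages[] = {
		{ true, 0 }, { false, 0 }, { true, 0 },
	};
	size_t n = sizeof(pages) / sizeof(pages[0]);
	size_t left = 0;

	demo_pin_pages(pages, n);
	for (size_t i = 0; i < n; i++)
		if (pages[i].pin_count > 0 && pages[i].in_cma)
			left++;
	return left;
}
```

The ordering is the point of the sketch: once a page holds a long-lived pin it can no longer be migrated, so the move has to happen before the reference count is raised.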



Re: [PATCH V6 3/4] powerpc/mm/iommu: Allow migration of cma allocated pages during mm_iommu_get

2019-01-09 Thread Aneesh Kumar K.V
Andrea Arcangeli  writes:

> Hello,
>
> On Tue, Jan 08, 2019 at 10:21:09AM +0530, Aneesh Kumar K.V wrote:
>> @@ -187,41 +149,25 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, 
>> unsigned long ua,
>>  goto unlock_exit;
>>  }
>>  
>> +ret = get_user_pages_cma_migrate(ua, entries, 1, mem->hpages);
>
> In terms of gup APIs, I've been wondering if this shall become
> get_user_pages_longterm(FOLL_CMA_MIGRATE). So basically moving this
> CMA migrate logic inside get_user_pages_longterm.

Do we need the FOLL_CMA_MIGRATE flag? Wouldn't a long-term pin already
imply a CMA migrate? What is the benefit of the FOLL_CMA_MIGRATE flag?
We can do better by taking a list of pages for migration, and I guess it
is much simpler if we limit that migration logic to
get_user_pages_longterm()?

I ended up with something like below. Do you suggest we should add those
isolate_lru and other details via FOLL_CMA_MIGRATE flag and do that when
we take the page reference instead of doing this by iterating the page array in
get_user_pages_longterm as in the below diff?

diff --git a/mm/gup.c b/mm/gup.c
index 05acd7e2eb22..6e8152594e83 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -13,6 +13,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 
 #include 
 #include 
@@ -1126,7 +1129,167 @@ long get_user_pages(unsigned long start, unsigned long nr_pages,
 }
 EXPORT_SYMBOL(get_user_pages);
 
+#if defined(CONFIG_FS_DAX) || defined (CONFIG_CMA)
+
 #ifdef CONFIG_FS_DAX
+static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
+{
+   long i;
+   struct vm_area_struct *vma_prev = NULL;
+
+   for (i = 0; i < nr_pages; i++) {
+   struct vm_area_struct *vma = vmas[i];
+
+   if (vma == vma_prev)
+   continue;
+
+   vma_prev = vma;
+
+   if (vma_is_fsdax(vma))
+   return true;
+   }
+   return false;
+}
+#else
+static inline bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
+{
+   return false;
+}
+#endif
+
+#ifdef CONFIG_CMA
+static struct page *new_non_cma_page(struct page *page, unsigned long private)
+{
+   /*
+* We want to make sure we allocate the new page from the same node
+* as the source page.
+*/
+   int nid = page_to_nid(page);
+   /*
+* Trying to allocate a page for migration. Ignore allocation
+* failure warnings. We don't force __GFP_THISNODE here because
+* this node here is the node where we have CMA reservation and
+* in some case these nodes will have really less non movable
+* allocation memory.
+*/
+   gfp_t gfp_mask = GFP_USER | __GFP_NOWARN;
+
+   if (PageHighMem(page))
+   gfp_mask |= __GFP_HIGHMEM;
+
+#ifdef CONFIG_HUGETLB_PAGE
+   if (PageHuge(page)) {
+   struct hstate *h = page_hstate(page);
+   /*
+* We don't want to dequeue from the pool because pool pages will
+* mostly be from the CMA region.
+*/
+   return alloc_migrate_huge_page(h, gfp_mask, nid, NULL);
+   }
+#endif
+   if (PageTransHuge(page)) {
+   struct page *thp;
+   /*
+* ignore allocation failure warnings
+*/
+   gfp_t thp_gfpmask = GFP_TRANSHUGE | __GFP_NOWARN;
+
+   /*
+* Remove the movable mask so that we don't allocate from
+* CMA area again.
+*/
+   thp_gfpmask &= ~__GFP_MOVABLE;
+   thp = __alloc_pages_node(nid, thp_gfpmask, HPAGE_PMD_ORDER);
+   if (!thp)
+   return NULL;
+   prep_transhuge_page(thp);
+   return thp;
+   }
+
+   return __alloc_pages_node(nid, gfp_mask, 0);
+}
+
+static long check_and_migrate_cma_pages(unsigned long start, long nr_pages,
+   unsigned int gup_flags,
+   struct page **pages,
+   struct vm_area_struct **vmas)
+{
+   long i;
+   bool drain_allow = true;
+   bool migrate_allow = true;
+   LIST_HEAD(cma_page_list);
+
+check_again:
+   for (i = 0; i < nr_pages; i++) {
+   /*
+* If we get a page from the CMA zone, since we are going to
+* be pinning these entries, we might as well move them out
+* of the CMA zone if possible.
+*/
+   if (is_migrate_cma_page(pages[i])) {
+
+   struct page *head = compound_head(pages[i]);
+
+   if (PageHuge(head)) {
+   isolate_huge_page(head, &cma_page_list);
+   } else {
+   if (!PageLRU(head) && drain_allow) {
+   lru_add_drain_all();
+  

Re: Kconfig label updates

2019-01-09 Thread Martin Schwidefsky
On Tue, 8 Jan 2019 16:30:24 -0600
Bjorn Helgaas  wrote:

> Hi,
> 
> I want to update the PCI Kconfig labels so they're more consistent and
> useful to users, something like the patch below.  IIUC, the items
> below are all IBM-related; please correct me if not.
> 
> I'd also like to expand (or remove) "RPA" because Google doesn't find
> anything about "IBM RPA", except Robotic Process Automation, which I
> think must be something else.
> 
> Is there some text expansion of RPA that we could use that would be
> meaningful to a user, i.e., something he/she might find on a nameplate
> or in a user manual?
> 
> Ideally the PCI Kconfig labels would match the terms used in
> arch/.../Kconfig, e.g.,
> 
>   config PPC_POWERNV
> bool "IBM PowerNV (Non-Virtualized) platform support"
> 
>   config PPC_PSERIES
> bool "IBM pSeries & new (POWER5-based) iSeries"
> 
>   config MARCH_Z900
> bool "IBM zSeries model z800 and z900"
> 
>   config MARCH_Z9_109
> bool "IBM System z9"
> 
> Bjorn
> 
> 
> diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
> index e9f78eb390d2..1c1d145bfd84 100644
> --- a/drivers/pci/hotplug/Kconfig
> +++ b/drivers/pci/hotplug/Kconfig
> @@ -112,7 +112,7 @@ config HOTPLUG_PCI_SHPC
> When in doubt, say N.
> 
>  config HOTPLUG_PCI_POWERNV
> - tristate "PowerPC PowerNV PCI Hotplug driver"
> + tristate "IBM PowerNV PCI Hotplug driver"
>   depends on PPC_POWERNV && EEH
>   select OF_DYNAMIC
>   help
> @@ -125,10 +125,11 @@ config HOTPLUG_PCI_POWERNV
> When in doubt, say N.
> 
>  config HOTPLUG_PCI_RPA
> - tristate "RPA PCI Hotplug driver"
> + tristate "IBM Power Systems RPA PCI Hotplug driver"
>   depends on PPC_PSERIES && EEH
>   help
> Say Y here if you have a RPA system that supports PCI Hotplug.
> +   This includes the earlier pSeries and iSeries.
> 
> To compile this driver as a module, choose M here: the
> module will be called rpaphp.
> @@ -136,7 +137,7 @@ config HOTPLUG_PCI_RPA
> When in doubt, say N.
> 
>  config HOTPLUG_PCI_RPA_DLPAR
> - tristate "RPA Dynamic Logical Partitioning for I/O slots"
> + tristate "IBM RPA Dynamic Logical Partitioning for I/O slots"
>   depends on HOTPLUG_PCI_RPA
>   help
> Say Y here if your system supports Dynamic Logical Partitioning
> @@ -157,7 +158,7 @@ config HOTPLUG_PCI_SGI
> When in doubt, say N.
> 
>  config HOTPLUG_PCI_S390
> - bool "System z PCI Hotplug Support"
> + bool "IBM System z PCI Hotplug Support"
>   depends on S390 && 64BIT
>   help
> Say Y here if you want to use the System z PCI Hotplug
> 

The rewording of the HOTPLUG_PCI_S390 entry is fine with me.
Acked-by: Martin Schwidefsky 

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]

2019-01-09 Thread Alexey Kardashevskiy



On 09/01/2019 18:24, Benjamin Herrenschmidt wrote:
> On Wed, 2019-01-09 at 15:53 +1100, Alexey Kardashevskiy wrote:
>> "A PCI completion timeout occurred for an outstanding PCI-E transaction"
>> it is.
>>
>> This is how I bind the device to vfio:
>>
>> echo vfio-pci > '/sys/bus/pci/devices/:01:00.0/driver_override'
>> echo vfio-pci > '/sys/bus/pci/devices/:01:00.1/driver_override'
>> echo ':01:00.0' > '/sys/bus/pci/devices/:01:00.0/driver/unbind'
>> echo ':01:00.1' > '/sys/bus/pci/devices/:01:00.1/driver/unbind'
>> echo ':01:00.0' > /sys/bus/pci/drivers/vfio-pci/bind
>> echo ':01:00.1' > /sys/bus/pci/drivers/vfio-pci/bind
>>
>>
>> and I noticed that EEH only happens with the last command. The order
>> (.0,.1  or .1,.0) does not matter, it seems that putting one function to
>> D3 is fine but putting another one when the first one is already in D3 -
>> produces EEH. And I do not recall ever seeing this on the firestone
>> machine. Weird.
> 
> Putting all functions into D3 is what allows the device to actually go
> into D3.
> 
> Does it work with other devices ?

Works fine with these on the very same garrison:

0009:07:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719
Gigabit Ethernet PCIe (rev 01)
0009:07:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5719
Gigabit Ethernet PCIe (rev 01)

Bizarre.

> We do have that bug on early P9
> revisions where the attempt of bringing the link to L1 as part of the
> D3 process fails in horrible ways, I thought P8 would be ok but maybe
> not ...

> Otherwise, it might be that our timeouts are too low (you may want to
> talk to our PCIe guys internally)

This increases "Outbound non-posted transactions timeout configuration"
from 16ms to 1s and does not help anyway:


diff --git a/hw/phb3.c b/hw/phb3.c
index 38b8f46..cb14909 100644
--- a/hw/phb3.c
+++ b/hw/phb3.c
@@ -4065,7 +4065,7 @@ static void phb3_init_utl(struct phb3 *p)
/* Init_82: PCI Express port control
 * SW283991: Set Outbound Non-Posted request timeout to 16ms (RTOS).
 */
-   out_be64(p->regs + UTL_PCIE_PORT_CONTROL, 0x85880070);
+   out_be64(p->regs + UTL_PCIE_PORT_CONTROL, 0x858800d0);
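
To see which bits the change above touches, the two constants can be XORed. The sketch below uses the values exactly as they appear in this message, which may not be the complete 64-bit register values. The helper name changed_bits() is hypothetical, and how the differing bits encode the timeout is not stated here.

```c
#include <stdint.h>

/* Return the bits that differ between the old and new register values. */
uint64_t changed_bits(uint64_t before, uint64_t after)
{
	return before ^ after;
}
```

Calling changed_bits(0x85880070, 0x858800d0) yields 0xa0, so only two bits within a single byte differ between the old and new constants as printed.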

-- 
Alexey


Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]

2019-01-09 Thread Alexey Kardashevskiy



On 09/01/2019 18:25, Benjamin Herrenschmidt wrote:
> On Wed, 2019-01-09 at 17:32 +1100, Alexey Kardashevskiy wrote:
>> I have just moved the "Mellanox Technologies MT27700 Family
>> [ConnectX-4]" from garrison to firestone machine and there it does not
>> produce an EEH, with the same kernel and skiboot (both upstream + my
>> debug). Hm. I cannot really blame the card but I cannot see what could
>> cause the difference in skiboot either. I even tried disabling NPU so
>> garrison would look like firestone, still EEH'ing.
> 
> The systems have a different chip though, firestone is P8 and garrison
> is P8', which has a slightly different PHB revision. Worth checking if we
> have anything significantly different in our inits and poke at the HW
> guys.

Nope, we do not have anything different for these machines. Asking HW
guys never worked for me :-/

I think the easiest is just doing what we did for PHB4 and ignoring
these D3 requests on garrisons.


> BTW. Are the cards behind a switch in either case ?


No, directly connected to the root on both:

garrison:

:00:00.0 PCI bridge: IBM Device 03dc (rev ff)
:01:00.0 Ethernet controller: Mellanox Technologies MT27700 Family
[ConnectX-4] (rev ff)
:01:00.1 Ethernet controller: Mellanox Technologies MT27700 Family
[ConnectX-4] (rev ff)

firestone (phb #0 is taken by nvidia gpu):

0001:00:00.0 PCI bridge: IBM POWER8 Host Bridge (PHB3)
0001:01:00.0 Ethernet controller: Mellanox Technologies MT27700 Family
[ConnectX-4]
0001:01:00.1 Ethernet controller: Mellanox Technologies MT27700 Family
[ConnectX-4]


-- 
Alexey