date:20181018

Re: [PATCH 4/8] PCI: consolidate PCI config entry in drivers/pci

2018-10-18 Thread Max Filippov

Hi Christoph,

On Wed, Oct 17, 2018 at 1:03 AM Christoph Hellwig  wrote:
> diff --git a/arch/xtensa/configs/iss_defconfig 
> b/arch/xtensa/configs/iss_defconfig
> index 4bb5b76d9524..818849bb7736 100644
> --- a/arch/xtensa/configs/iss_defconfig
> +++ b/arch/xtensa/configs/iss_defconfig
> @@ -4,7 +4,7 @@ CONFIG_EXPERT=y
>  CONFIG_SYSCTL_SYSCALL=y
>  # CONFIG_IOSCHED_DEADLINE is not set
>  # CONFIG_IOSCHED_CFQ is not set
> -# CONFIG_PCI is not set
> +CONFIG_PCI=y

This change doesn't make sense as there's no PCI hardware in the Xtensa ISS.
Otherwise for the Xtensa part:
Acked-by: Max Filippov 

-- 
Thanks.
-- Max

Re: [PATCH 4/8] PCI: consolidate PCI config entry in drivers/pci

2018-10-18 Thread Masahiro Yamada

On Wed, Oct 17, 2018 at 5:04 PM Christoph Hellwig  wrote:
>
> There is no good reason to duplicate the PCI menu in every architecture.
> Instead provide a selectable HAS_PCI symbol that indicates availability

HAS_PCI -> HAVE_PCI

> of PCI support and the handle the rest in drivers/pci.
>
> Note that for powerpc we now select HAVE_PCI globally instead of the
> convoluted mess of conditional or or non-conditional support per board,
> similar to what we do e.g. on x86.  For alpha PCI is selected for the
> non-jensen configs as it was the default before, and a lot of code does
> not compile without PCI enabled.  On other architectures with limited
> PCI support that wasn't as complicated I've left the selection as-is.
>
> Signed-off-by: Christoph Hellwig 
> Acked-by: Thomas Gleixner 
> Acked-by: Bjorn Helgaas 

Just in case, could you double-check these?
PCI_ENDPOINT
PCI_ENDPOINT_CONFIGFS
PCI_EPF_TEST

Previously, architecture without "source drivers/pci/Kconfig"
could not enable PCI_ENDPOINT.

Now, any architecture can enable it
regardless of its actual PCI availability
because PCI_ENDPOINT is only guarded by HAS_DMA.

We could add 'depends on HAVE_PCI' or something
to guard it to avoid changing the logic.

config PCI_ENDPOINT
bool "PCI Endpoint Support"
depends on HAVE_PCI # Is this correct ??
depends on HAS_DMA

or better to have 'depends on PCI' ?

PCI ML is also CC'ed, so comments are appreciated.

-- 
Best Regards
Masahiro Yamada

Re: [PATCH v5 00/18] of: overlay: validation checks, subsequent fixes

2018-10-18 Thread Frank Rowand

On 10/18/18 15:46, frowand.l...@gmail.com wrote:
> From: Frank Rowand 
> 
> Add checks to (1) overlay apply process and (2) memory freeing
> triggered by overlay release.  The checks are intended to detect
> possible memory leaks and invalid overlays.
> 
> The checks revealed bugs in existing code.  Fixed the bugs.
> 
> While fixing bugs, noted other issues, which are fixed in
> separate patches.

git version of the series:

git://git.kernel.org/pub/scm/linux/kernel/git/frowand/linux.git

$ git checkout v4.19-rc1--kfree_validate--v5

$ git log --oneline v4.19-rc1..
62e8f28bb14b of: unittest: initialize args before calling of_*parse_*()
cc8b630f0c7f of: unittest: find overlays[] entry by name instead of index
b80a8e974e0f of: unittest: allow base devicetree to have symbol metadata
bbcd6ead8e36 of: overlay: set node fields from properties when add new overlay 
node
e02d06f99646 of: unittest: remove unused of_unittest_apply_overlay() argument
484ba7f7eb4a of: overlay: check prevents multiple fragments touching same 
property
4640b81a605b of: overlay: check prevents multiple fragments add or delete same 
node
698f942ee230 of: overlay: test case of two fragments adding same node
5fe758e00f1f of: overlay: make all pr_debug() and pr_err() messages unique
868c6f70eed5 of: overlay: validate overlay properties #address-cells and 
#size-cells
06bc44ce477f of: overlay: reorder fields in struct fragment
584f4537377c of: dynamic: change type of of_{at,de}tach_node() to void
54f30ea3bf65 of: overlay: do not duplicate properties from overlay for new nodes
ad4180c300fc of: overlay: use prop add changeset entry for property in new nodes
b1bdca739700 powerpc/pseries: add of_node_put() in dlpar_detach_node()
8e0290d5cb62 of: overlay: add missing of_node_get() in __of_attach_node_sysfs
93e221495a9f of: overlay: add missing of_node_put() after add new node to 
changeset
86043f08e539 of: overlay: add tests to validate kfrees from overlay removal

Re: [PATCH 6/8] rapidio: consolidate RAPIDIO config entry in drivers/rapidio

2018-10-18 Thread Masahiro Yamada

On Wed, Oct 17, 2018 at 5:03 PM Christoph Hellwig  wrote:
>
> There is no good reason to duplicate the RAPIDIO menu in various
> architectures.  Instead provide a selectable HAS_RAPIDIO symbol

Nit.

HAS_RAPIDIO -> HAVE_RAPIDIO.



> that indicates native availability of RAPIDIO support and the handle
> the rest in drivers/pci.

Copy-paste mistake?

drivers/pci -> drivers/rapidio



>  This also means we now provide support
> for PCI(e) to Rapidio bridges for every architecture instead of a
> limited subset.
>
> Signed-off-by: Christoph Hellwig 
> Acked-by: Thomas Gleixner 




-- 
Best Regards
Masahiro Yamada

Re: [PATCH 7/8] eisa: consolidate EISA Kconfig entry in drivers/eisa

2018-10-18 Thread Masahiro Yamada

On Wed, Oct 17, 2018 at 5:03 PM Christoph Hellwig  wrote:
>
> Let architectures opt into EISA support by selecting HAS_EISA and

Nit.

HAS_EISA -> HAVE_EISA
since you renamed it in this version.



> handle everything else in drivers/eisa.
>
> Signed-off-by: Christoph Hellwig 
> Acked-by: Thomas Gleixner 
> ---
-- 
Best Regards
Masahiro Yamada

Re: [PATCH 7/8] eisa: consolidate EISA Kconfig entry in drivers/eisa

2018-10-18 Thread Masahiro Yamada

On Wed, Oct 17, 2018 at 5:03 PM Christoph Hellwig  wrote:
>
> Let architectures opt into EISA support by selecting HAS_EISA and
> handle everything else in drivers/eisa.
>
> Signed-off-by: Christoph Hellwig 
> Acked-by: Thomas Gleixner 
> ---


> index 60e37b9a715d..c90a1a4d6079 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -59,6 +59,7 @@ config ARM
> select HAVE_ARCH_TRACEHOOK
> select HAVE_ARM_SMCCC if CPU_V7
> select HAVE_EBPF_JIT if !CPU_ENDIAN_BE32
> +   select HAVE_EISA

I doubt this.

arch/arm/Kconfig previously did not include
driver/eisa/Kconfig.

No ARM platform enabled CONFIG_EISA either.


> select HAVE_CONTEXT_TRACKING
> select HAVE_C_RECORDMCOUNT
> select HAVE_DEBUG_KMEMLEAK
> @@ -162,21 +163,6 @@ config HAVE_PROC_CPU
>  config NO_IOPORT_MAP
> bool
>
> -config EISA
> -   bool
> -   ---help---
> - The Extended Industry Standard Architecture (EISA) bus was
> - developed as an open alternative to the IBM MicroChannel bus.
> -
> - The EISA bus provided some of the features of the IBM MicroChannel
> - bus while maintaining backward compatibility with cards made for
> - the older ISA bus.  The EISA bus saw limited use between 1988 and
> - 1995 when it was made obsolete by the PCI bus.
> -
> - Say Y here if you are building a kernel for an EISA-based machine.
> -
> - Otherwise, say N.
> -
>  config SBUS
> bool
>


-- 
Best Regards
Masahiro Yamada

[PATCH] powerpc/time: Fix clockevent_decrementer initalisation for PR KVM

2018-10-18 Thread Michael Ellerman

In the recent commit 8b78fdb045de ("powerpc/time: Use
clockevents_register_device(), fixing an issue with large
decrementer") we changed the way we initialise the decrementer
clockevent(s).

We no longer initialise the mult & shift values of
decrementer_clockevent itself.

This has the effect of breaking PR KVM, because it uses those values
in kvmppc_emulate_dec(). The symptom is guest kernels spin forever
mid-way through boot.

For now fix it by assigning back to decrementer_clockevent the mult
and shift values.

Fixes: 8b78fdb045de ("powerpc/time: Use clockevents_register_device(), fixing 
an issue with large decrementer")
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/time.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 40868f3ee113..68e8f963d108 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -989,6 +989,10 @@ static void register_decrementer_clockevent(int cpu)
 
printk_once(KERN_DEBUG "clockevent: %s mult[%x] shift[%d] cpu[%d]\n",
dec->name, dec->mult, dec->shift, cpu);
+
+   /* Set values for KVM, see kvm_emulate_dec() */
+   decrementer_clockevent.mult = dec->mult;
+   decrementer_clockevent.shift = dec->shift;
 }
 
 static void enable_large_decrementer(void)
-- 
2.17.2

[PATCH 6/6] powerpc/mm/radix: Display if mappings are exec or not

2018-10-18 Thread Michael Ellerman

At boot we print the ranges we've mapped for the linear mapping and
what page size we've used. Also track whether the range is mapped
executable or not and display that as well.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/mm/pgtable-radix.c | 29 +
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 0e87733eed80..931156069a81 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -241,9 +241,8 @@ void radix__mark_initmem_nx(void)
 }
 #endif /* CONFIG_STRICT_KERNEL_RWX */
 
-static inline void __meminit print_mapping(unsigned long start,
-  unsigned long end,
-  unsigned long size)
+static inline void __meminit
+print_mapping(unsigned long start, unsigned long end, unsigned long size, bool 
exec)
 {
char buf[10];
 
@@ -252,7 +251,8 @@ static inline void __meminit print_mapping(unsigned long 
start,
 
string_get_size(size, 1, STRING_UNITS_2, buf, sizeof(buf));
 
-   pr_info("Mapped 0x%016lx-0x%016lx with %s pages\n", start, end, buf);
+   pr_info("Mapped 0x%016lx-0x%016lx with %s pages%s\n", start, end, buf,
+   exec ? " (exec)" : "");
 }
 
 static unsigned long next_boundary(unsigned long addr, unsigned long end)
@@ -269,6 +269,7 @@ static int __meminit create_physical_mapping(unsigned long 
start,
 int nid)
 {
unsigned long vaddr, addr, mapping_size = 0;
+   bool prev_exec, exec = false;
pgprot_t prot;
int psize;
 
@@ -279,6 +280,7 @@ static int __meminit create_physical_mapping(unsigned long 
start,
 
gap = next_boundary(addr, end) - addr;
previous_size = mapping_size;
+   prev_exec = exec;
 
if (IS_ALIGNED(addr, PUD_SIZE) && gap >= PUD_SIZE &&
mmu_psize_defs[MMU_PAGE_1G].shift) {
@@ -293,18 +295,21 @@ static int __meminit create_physical_mapping(unsigned 
long start,
psize = mmu_virtual_psize;
}
 
-   if (mapping_size != previous_size) {
-   print_mapping(start, addr, previous_size);
-   start = addr;
-   }
-
vaddr = (unsigned long)__va(addr);
 
if (overlaps_kernel_text(vaddr, vaddr + mapping_size) ||
-   overlaps_interrupt_vector_text(vaddr, vaddr + mapping_size))
+   overlaps_interrupt_vector_text(vaddr, vaddr + 
mapping_size)) {
prot = PAGE_KERNEL_X;
-   else
+   exec = true;
+   } else {
prot = PAGE_KERNEL;
+   exec = false;
+   }
+
+   if (mapping_size != previous_size || exec != prev_exec) {
+   print_mapping(start, addr, previous_size, prev_exec);
+   start = addr;
+   }
 
rc = __map_kernel_page(vaddr, addr, prot, mapping_size, nid, 
start, end);
if (rc)
@@ -313,7 +318,7 @@ static int __meminit create_physical_mapping(unsigned long 
start,
update_page_count(psize, 1);
}
 
-   print_mapping(start, addr, mapping_size);
+   print_mapping(start, addr, mapping_size, exec);
return 0;
 }
 
-- 
2.17.2

[PATCH 5/6] powerpc/mm/radix: Simplify split mapping logic

2018-10-18 Thread Michael Ellerman

If we look closely at the logic in create_physical_mapping(), when
we're doing STRICT_KERNEL_RWX, we do the following steps:
  - determine the gap from where we are to the end of the range
  - choose an appropriate mapping_size based on the gap
  - check if that mapping_size would overlap the __init_begin
boundary, and if not choose an appropriate mapping_size

We can simplify the logic by taking the __init_begin boundary into
account when we calculate the initial gap.

So add a next_boundary() function which tells us what the next
boundary is, either the __init_begin boundary or end. In future we can
add more boundaries.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/mm/pgtable-radix.c | 32 ++--
 1 file changed, 10 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 030543451229..0e87733eed80 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -255,17 +255,21 @@ static inline void __meminit print_mapping(unsigned long 
start,
pr_info("Mapped 0x%016lx-0x%016lx with %s pages\n", start, end, buf);
 }
 
+static unsigned long next_boundary(unsigned long addr, unsigned long end)
+{
+#ifdef CONFIG_STRICT_KERNEL_RWX
+   if (addr < __pa_symbol(__init_begin))
+   return __pa_symbol(__init_begin);
+#endif
+   return end;
+}
+
 static int __meminit create_physical_mapping(unsigned long start,
 unsigned long end,
 int nid)
 {
unsigned long vaddr, addr, mapping_size = 0;
pgprot_t prot;
-#ifdef CONFIG_STRICT_KERNEL_RWX
-   int split_text_mapping = 1;
-#else
-   int split_text_mapping = 0;
-#endif
int psize;
 
start = _ALIGN_UP(start, PAGE_SIZE);
@@ -273,7 +277,7 @@ static int __meminit create_physical_mapping(unsigned long 
start,
unsigned long gap, previous_size;
int rc;
 
-   gap = end - addr;
+   gap = next_boundary(addr, end) - addr;
previous_size = mapping_size;
 
if (IS_ALIGNED(addr, PUD_SIZE) && gap >= PUD_SIZE &&
@@ -289,22 +293,6 @@ static int __meminit create_physical_mapping(unsigned long 
start,
psize = mmu_virtual_psize;
}
 
-   if (split_text_mapping && (mapping_size == PUD_SIZE) &&
-   (addr < __pa_symbol(__init_begin)) &&
-   (addr + mapping_size) > __pa_symbol(__init_begin)) {
-   if (mmu_psize_defs[MMU_PAGE_2M].shift)
-   mapping_size = PMD_SIZE;
-   else
-   mapping_size = PAGE_SIZE;
-   }
-
-   if (split_text_mapping && (mapping_size == PMD_SIZE) &&
-   (addr < __pa_symbol(__init_begin)) &&
-   (addr + mapping_size) > __pa_symbol(__init_begin)) {
-   mapping_size = PAGE_SIZE;
-   psize = mmu_virtual_psize;
-   }
-
if (mapping_size != previous_size) {
print_mapping(start, addr, previous_size);
start = addr;
-- 
2.17.2

[PATCH 4/6] powerpc/mm/radix: Remove the retry in the split mapping logic

2018-10-18 Thread Michael Ellerman

When we have CONFIG_STRICT_KERNEL_RWX enabled, we want to split the
linear mapping at the text/data boundary so we can map the kernel
text read only.

The current logic uses a goto inside the for loop, which works, but is
hard to reason about.

When we hit the goto retry case we set max_mapping_size to PMD_SIZE
and go back to the start.

Setting max_mapping_size means we skip the PUD case and go to the PMD
case.

We know we will pass the alignment and gap checks because the only
reason we are there is we hit the goto retry, and that is guarded by
mapping_size == PUD_SIZE, which means addr is PUD aligned and gap is
greater or equal to PUD_SIZE.

So the only part of the check that can fail is the mmu_psize_defs
check for the 2M page size.

If we just duplicate that check we can avoid the goto, and we get the
same result.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/mm/pgtable-radix.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 7a44ec276290..030543451229 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -261,7 +261,6 @@ static int __meminit create_physical_mapping(unsigned long 
start,
 {
unsigned long vaddr, addr, mapping_size = 0;
pgprot_t prot;
-   unsigned long max_mapping_size;
 #ifdef CONFIG_STRICT_KERNEL_RWX
int split_text_mapping = 1;
 #else
@@ -276,12 +275,9 @@ static int __meminit create_physical_mapping(unsigned long 
start,
 
gap = end - addr;
previous_size = mapping_size;
-   max_mapping_size = PUD_SIZE;
 
-retry:
if (IS_ALIGNED(addr, PUD_SIZE) && gap >= PUD_SIZE &&
-   mmu_psize_defs[MMU_PAGE_1G].shift &&
-   PUD_SIZE <= max_mapping_size) {
+   mmu_psize_defs[MMU_PAGE_1G].shift) {
mapping_size = PUD_SIZE;
psize = MMU_PAGE_1G;
} else if (IS_ALIGNED(addr, PMD_SIZE) && gap >= PMD_SIZE &&
@@ -296,8 +292,10 @@ static int __meminit create_physical_mapping(unsigned long 
start,
if (split_text_mapping && (mapping_size == PUD_SIZE) &&
(addr < __pa_symbol(__init_begin)) &&
(addr + mapping_size) > __pa_symbol(__init_begin)) {
-   max_mapping_size = PMD_SIZE;
-   goto retry;
+   if (mmu_psize_defs[MMU_PAGE_2M].shift)
+   mapping_size = PMD_SIZE;
+   else
+   mapping_size = PAGE_SIZE;
}
 
if (split_text_mapping && (mapping_size == PMD_SIZE) &&
-- 
2.17.2

[PATCH 3/6] powerpc/mm/radix: Fix small page at boundary when splitting

2018-10-18 Thread Michael Ellerman

When we have CONFIG_STRICT_KERNEL_RWX enabled, we want to split the
linear mapping at the text/data boundary so we can map the kernel
text read only.

Currently we always use a small page at the text/data boundary, even
when that's not necessary:

  Mapped 0x-0x00e0 with 2.00 MiB pages
  Mapped 0x00e0-0x0100 with 64.0 KiB pages
  Mapped 0x0100-0x4000 with 2.00 MiB pages

This is because the check that the mapping crosses the __init_begin
boundary is too strict, it also returns true when we map exactly up to
the boundary.

So fix it to check that the mapping would actually map past
__init_begin, and with that we see:

  Mapped 0x-0x4000 with 2.00 MiB pages
  Mapped 0x4000-0x0001 with 1.00 GiB pages

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/mm/pgtable-radix.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index bb85c58b96c8..7a44ec276290 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -295,14 +295,14 @@ static int __meminit create_physical_mapping(unsigned 
long start,
 
if (split_text_mapping && (mapping_size == PUD_SIZE) &&
(addr < __pa_symbol(__init_begin)) &&
-   (addr + mapping_size) >= __pa_symbol(__init_begin)) {
+   (addr + mapping_size) > __pa_symbol(__init_begin)) {
max_mapping_size = PMD_SIZE;
goto retry;
}
 
if (split_text_mapping && (mapping_size == PMD_SIZE) &&
(addr < __pa_symbol(__init_begin)) &&
-   (addr + mapping_size) >= __pa_symbol(__init_begin)) {
+   (addr + mapping_size) > __pa_symbol(__init_begin)) {
mapping_size = PAGE_SIZE;
psize = mmu_virtual_psize;
}
-- 
2.17.2

[PATCH 2/6] powerpc/mm/radix: Fix overuse of small pages in splitting logic

2018-10-18 Thread Michael Ellerman

When we have CONFIG_STRICT_KERNEL_RWX enabled, we want to split the
linear mapping at the text/data boundary so we can map the kernel text
read only.

But the current logic uses small pages for the entire text section,
regardless of whether a larger page size would fit. eg. with the
boundary at 16M we could use 2M pages, but instead we use 64K pages up
to the 16M boundary:

  Mapped 0x-0x0100 with 64.0 KiB pages
  Mapped 0x0100-0x4000 with 2.00 MiB pages
  Mapped 0x4000-0x0001 with 1.00 GiB pages

This is because the test is checking if addr is < __init_begin
and addr + mapping_size is >= _stext. But that is true for all pages
between _stext and __init_begin.

Instead what we want to check is if we are crossing the text/data
boundary, which is at __init_begin. With that fixed we see:

  Mapped 0x-0x00e0 with 2.00 MiB pages
  Mapped 0x00e0-0x0100 with 64.0 KiB pages
  Mapped 0x0100-0x4000 with 2.00 MiB pages
  Mapped 0x4000-0x0001 with 1.00 GiB pages

ie. we're correctly using 2MB pages below __init_begin, but we still
drop down to 64K pages unnecessarily at the boundary.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/mm/pgtable-radix.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index d88d76231754..bb85c58b96c8 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -295,14 +295,14 @@ static int __meminit create_physical_mapping(unsigned 
long start,
 
if (split_text_mapping && (mapping_size == PUD_SIZE) &&
(addr < __pa_symbol(__init_begin)) &&
-   (addr + mapping_size) >= __pa_symbol(_stext)) {
+   (addr + mapping_size) >= __pa_symbol(__init_begin)) {
max_mapping_size = PMD_SIZE;
goto retry;
}
 
if (split_text_mapping && (mapping_size == PMD_SIZE) &&
(addr < __pa_symbol(__init_begin)) &&
-   (addr + mapping_size) >= __pa_symbol(_stext)) {
+   (addr + mapping_size) >= __pa_symbol(__init_begin)) {
mapping_size = PAGE_SIZE;
psize = mmu_virtual_psize;
}
-- 
2.17.2

[PATCH 1/6] powerpc/mm/radix: Fix off-by-one in split mapping logic

2018-10-18 Thread Michael Ellerman

When we have CONFIG_STRICT_KERNEL_RWX enabled, we try to split the
kernel linear (1:1) mapping so that the kernel text is in a separate
page to kernel data, so we can mark the former read-only.

We could achieve that just by always using 64K pages for the linear
mapping, but we try to be smarter. Instead we use huge pages when
possible, and only switch to smaller pages when necessary.

However we have an off-by-one bug in that logic, which causes us to
calculate the wrong boundary between text and data.

For example with the end of the kernel text at 16M we see:

  radix-mmu: Mapped 0x-0x0120 with 64.0 KiB pages
  radix-mmu: Mapped 0x0120-0x4000 with 2.00 MiB pages
  radix-mmu: Mapped 0x4000-0x0001 with 1.00 GiB pages

ie. we mapped from 0 to 18M with 64K pages, even though the boundary
between text and data is at 16M.

With the fix we see we're correctly hitting the 16M boundary:

  radix-mmu: Mapped 0x-0x0100 with 64.0 KiB pages
  radix-mmu: Mapped 0x0100-0x4000 with 2.00 MiB pages
  radix-mmu: Mapped 0x4000-0x0001 with 1.00 GiB pages

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/mm/pgtable-radix.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index c879979faa73..d88d76231754 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -294,14 +294,14 @@ static int __meminit create_physical_mapping(unsigned 
long start,
}
 
if (split_text_mapping && (mapping_size == PUD_SIZE) &&
-   (addr <= __pa_symbol(__init_begin)) &&
+   (addr < __pa_symbol(__init_begin)) &&
(addr + mapping_size) >= __pa_symbol(_stext)) {
max_mapping_size = PMD_SIZE;
goto retry;
}
 
if (split_text_mapping && (mapping_size == PMD_SIZE) &&
-   (addr <= __pa_symbol(__init_begin)) &&
+   (addr < __pa_symbol(__init_begin)) &&
(addr + mapping_size) >= __pa_symbol(_stext)) {
mapping_size = PAGE_SIZE;
psize = mmu_virtual_psize;
-- 
2.17.2

[PATCH] powerpc/mm: Fix page table dump to work on Radix

2018-10-18 Thread Michael Ellerman

When we're running on Book3S with the Radix MMU enabled the page table
dump currently prints the wrong addresses because it uses the wrong
start address.

Fix it to use PAGE_OFFSET rather than KERN_VIRT_START.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/mm/dump_linuxpagetables.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/dump_linuxpagetables.c 
b/arch/powerpc/mm/dump_linuxpagetables.c
index e60aa6d7456d..2b74f8adf4d0 100644
--- a/arch/powerpc/mm/dump_linuxpagetables.c
+++ b/arch/powerpc/mm/dump_linuxpagetables.c
@@ -267,12 +267,13 @@ static void walk_pagetables(struct pg_state *st)
unsigned int i;
unsigned long addr;
 
+   addr = st->start_address;
+
/*
 * Traverse the linux pagetable structure and dump pages that are in
 * the hash pagetable.
 */
-   for (i = 0; i < PTRS_PER_PGD; i++, pgd++) {
-   addr = KERN_VIRT_START + i * PGDIR_SIZE;
+   for (i = 0; i < PTRS_PER_PGD; i++, pgd++, addr += PGDIR_SIZE) {
if (!pgd_none(*pgd) && !pgd_huge(*pgd))
/* pgd exists */
walk_pud(st, pgd, addr);
@@ -321,9 +322,14 @@ static int ptdump_show(struct seq_file *m, void *v)
 {
struct pg_state st = {
.seq = m,
-   .start_address = KERN_VIRT_START,
.marker = address_markers,
};
+
+   if (radix_enabled())
+   st.start_address = PAGE_OFFSET;
+   else
+   st.start_address = KERN_VIRT_START;
+
/* Traverse kernel page tables */
walk_pagetables();
note_page(, 0, 0, 0);
-- 
2.17.2

[PATCH] selftests/powerpc: Fix out-of-tree build errors

2018-10-18 Thread Michael Ellerman

Some of our Makefiles don't do the right thing when building the
selftests with O=, fix them up.

Signed-off-by: Michael Ellerman 
---
 tools/testing/selftests/powerpc/cache_shape/Makefile   | 2 --
 tools/testing/selftests/powerpc/ptrace/Makefile| 2 --
 tools/testing/selftests/powerpc/signal/Makefile| 2 --
 tools/testing/selftests/powerpc/switch_endian/Makefile | 1 +
 4 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/tools/testing/selftests/powerpc/cache_shape/Makefile 
b/tools/testing/selftests/powerpc/cache_shape/Makefile
index ede4d3dae750..62e947ca9921 100644
--- a/tools/testing/selftests/powerpc/cache_shape/Makefile
+++ b/tools/testing/selftests/powerpc/cache_shape/Makefile
@@ -1,8 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 TEST_PROGS := cache_shape
 
-all: $(TEST_PROGS)
-
 $(TEST_PROGS): ../harness.c ../utils.c
 
 top_srcdir = ../../../../..
diff --git a/tools/testing/selftests/powerpc/ptrace/Makefile 
b/tools/testing/selftests/powerpc/ptrace/Makefile
index 9b35ca8e8f13..6ac71b629276 100644
--- a/tools/testing/selftests/powerpc/ptrace/Makefile
+++ b/tools/testing/selftests/powerpc/ptrace/Makefile
@@ -7,8 +7,6 @@ TEST_PROGS := ptrace-gpr ptrace-tm-gpr ptrace-tm-spd-gpr \
 top_srcdir = ../../../../..
 include ../../lib.mk
 
-all: $(TEST_PROGS)
-
 CFLAGS += -m64 -I../../../../../usr/include -I../tm -mhtm -fno-pie
 
 ptrace-pkey core-pkey: child.h
diff --git a/tools/testing/selftests/powerpc/signal/Makefile 
b/tools/testing/selftests/powerpc/signal/Makefile
index 1fca25c6ace0..d34a7c7710db 100644
--- a/tools/testing/selftests/powerpc/signal/Makefile
+++ b/tools/testing/selftests/powerpc/signal/Makefile
@@ -1,8 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 TEST_PROGS := signal signal_tm
 
-all: $(TEST_PROGS)
-
 $(TEST_PROGS): ../harness.c ../utils.c signal.S
 
 CFLAGS += -maltivec
diff --git a/tools/testing/selftests/powerpc/switch_endian/Makefile 
b/tools/testing/selftests/powerpc/switch_endian/Makefile
index fcd2dcb8972b..bdc081afedb0 100644
--- a/tools/testing/selftests/powerpc/switch_endian/Makefile
+++ b/tools/testing/selftests/powerpc/switch_endian/Makefile
@@ -8,6 +8,7 @@ EXTRA_CLEAN = $(OUTPUT)/*.o $(OUTPUT)/check-reversed.S
 top_srcdir = ../../../../..
 include ../../lib.mk
 
+$(OUTPUT)/switch_endian_test: ASFLAGS += -I $(OUTPUT)
 $(OUTPUT)/switch_endian_test: $(OUTPUT)/check-reversed.S
 
 $(OUTPUT)/check-reversed.o: $(OUTPUT)/check.o
-- 
2.17.2

Re: powerpc/mm: Make pte_pgprot return all pte bits

2018-10-18 Thread Michael Ellerman

On Thu, 2018-10-18 at 13:33:16 UTC, Michael Ellerman wrote:
> From: "Aneesh Kumar K.V" 
> 
> Other archs do the same and instead of adding required pte bits (which
> got masked out) in __ioremap_at(), make sure we filter only pfn bits
> out.
> 
> Fixes: 26973fa5ac0e ("powerpc/mm: use pte helpers in generic code")
> Reviewed-by: Christophe Leroy 
> Signed-off-by: Aneesh Kumar K.V 
> Signed-off-by: Michael Ellerman 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/b9fb4480a3af85552d88561a2fea9c

cheers

Re: powerpc/book3e: redefine pte_mkprivileged() and pte_mkuser()

2018-10-18 Thread Michael Ellerman

On Wed, 2018-10-17 at 13:03:22 UTC, Christophe Leroy wrote:
> Book3e defines both _PAGE_USER and _PAGE_PRIVILEGED, so the nohash
> default pte_mkprivileged() and pte_mkuser() are not usable.
> 
> This patch redefines them for book3e.
> 
> In theorie, only pte_mkprivileged() needs to be redefined because
> _PAGE_USER includes _PAGE_PRIVILEGED, but it is less confusing
> to redefine both.
> 
> Fixes: a0da4bc166f2 ("powerpc/mm: Allow platforms to redefine some helpers")
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/bde1a1335c5031758d3917d83dd5b8

cheers

Re: powerpc/io: remove old GCC version implementation

2018-10-18 Thread Michael Ellerman

On Tue, 2018-10-16 at 12:33:40 UTC, Christophe Leroy wrote:
> GCC 4.6 is the minimum supported now.
> 
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/a0e102914aa3f619a5bc68a0d33e17

cheers

Re: powerpc/traps: remove redundant in_interrupt panic in die()

2018-10-18 Thread Michael Ellerman

On Mon, 2018-10-15 at 07:38:10 UTC, Christophe Leroy wrote:
> do_exit() already includes a test to panic() is in_interrupt()
> 
> This patch removes powerpc one which is redundant.
> 
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/bd03fd84a53ac9ddaeb0a0fc4c4c98

cheers

Re: powerpc/traps: fix machine check handlers to use pr_cont()

2018-10-18 Thread Michael Ellerman

On Mon, 2018-10-15 at 07:20:45 UTC, Christophe Leroy wrote:
> When printing the machine check cause, the cause appears on the
> following line due to bad use of printk without \n:
> 
> [   33.663993] Machine check in kernel mode.
> [   33.664011] Caused by (from SRR1=9032):
> [   33.664036] Data access error at address c90c8000
> 
> This patch fixes it by using pr_cont() for the second part:
> 
> [  133.258131] Machine check in kernel mode.
> [  133.258146] Caused by (from SRR1=9032): Data access error at address 
> c90c8000
> 
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/422123ccb9a13dcea2e008194ae6c2

cheers

Re: [12/12] powerpc/prom_init: Generate "phandle" instead of "linux, phandle"

2018-10-18 Thread Michael Ellerman

On Mon, 2018-10-15 at 02:50:00 UTC, Benjamin Herrenschmidt wrote:
> When creating the boot-time FDT from an actual Open Firmware live
> tree, let's generate "phandle" properties for the phandles instead
> of the old deprecated "linux,phandle".
> 
> Signed-off-by: Benjamin Herrenschmidt 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/f1f208e54d08ccf00121c700a9bb1f

cheers

Re: [11/12] powerpc: Check prom_init for disallowed sections

2018-10-18 Thread Michael Ellerman

On Mon, 2018-10-15 at 02:49:59 UTC, Benjamin Herrenschmidt wrote:
> prom_init.c must not modify the kernel image outside
> of the .bss.prominit section. Thus make sure that
> prom_init.o doesn't have anything in any of these:
> 
>   .data
>   .bss
>   .init.data
> 
> Signed-off-by: Benjamin Herrenschmidt 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/2c51d97ee88da897db8405f659d173

cheers

Re: [10/12] powerpc/prom_init: Move __prombss to it's own section and store it in .bss

2018-10-18 Thread Michael Ellerman

On Mon, 2018-10-15 at 02:49:58 UTC, Benjamin Herrenschmidt wrote:
> This makes __prombss its own section, and for now store
> it in .bss.
> 
> This will give us the ability later to store it elsewhere
> and/or free it after boot (it's about 8KB).
> 
> Signed-off-by: Benjamin Herrenschmidt 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/5f69e38885c3483a1838dd946aaf01

cheers

Re: [09/12] powerpc/prom_init: Move a few remaining statics to appropriate sections

2018-10-18 Thread Michael Ellerman

On Mon, 2018-10-15 at 02:49:57 UTC, Benjamin Herrenschmidt wrote:
> Signed-off-by: Benjamin Herrenschmidt 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/8ca2d5151e7f5cbef42eda780eac56

cheers

Re: [08/12] powerpc/prom_init: Move const structures to __initconst

2018-10-18 Thread Michael Ellerman

On Mon, 2018-10-15 at 02:49:56 UTC, Benjamin Herrenschmidt wrote:
> As they are no longer used past the end of prom_init
> 
> Signed-off-by: Benjamin Herrenschmidt 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/d00e34b92cd7d8f1c10c2f0a8c1036

cheers

Re: [07/12] powerpc/prom_init: Move ibm_arch_vec to __prombss

2018-10-18 Thread Michael Ellerman

On Mon, 2018-10-15 at 02:49:55 UTC, Benjamin Herrenschmidt wrote:
> Make the existing initialized definition constant and copy
> it to a __prombss copy
> 
> Signed-off-by: Benjamin Herrenschmidt 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/a614f52e75bd69b513707b4adc6721

cheers

Re: [06/12] powerpc/prom_init: Move prom_radix_disable to __prombss

2018-10-18 Thread Michael Ellerman

On Mon, 2018-10-15 at 02:49:54 UTC, Benjamin Herrenschmidt wrote:
> Initialize it dynamically instead of statically
> 
> Signed-off-by: Benjamin Herrenschmidt 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/c886087caee759790db47f345f8382

cheers

Re: [05/12] powerpc/prom_init: Remove support for OPAL v2

2018-10-18 Thread Michael Ellerman

On Mon, 2018-10-15 at 02:49:53 UTC, Benjamin Herrenschmidt wrote:
> We removed support for running under any OPAL version
> earlier than v3 in 2015 (they never saw the light of day
> anyway), but we kept some leftovers of this support in
> prom_init.c, so let's take it out.
> 
> Signed-off-by: Benjamin Herrenschmidt 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/11fdb309341ca1ba2e3d03fd1c9c0c

cheers

Re: [04/12] powerpc/prom_init: Replace initdata with prombss when applicable

2018-10-18 Thread Michael Ellerman

On Mon, 2018-10-15 at 02:49:52 UTC, Benjamin Herrenschmidt wrote:
> This replaces all occurrences of __initdata for uninitialized
> data with a new __prombss
> 
> Currently __promdata is defined to be __initdata but we'll
> eventually change that.
> 
> Signed-off-by: Benjamin Herrenschmidt 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/e63334e556d9286fc30bec10503608

cheers

Re: macintosh/windfarm_smu_sat: Fix debug output

2018-10-18 Thread Michael Ellerman

On Mon, 2018-10-15 at 00:18:49 UTC, Benjamin Herrenschmidt wrote:
> There's some antiquated debug output that's trying
> to do a hand-made hexdump and turning into horrible
> 1-byte-per-line output these days.
> 
> Use print_hex_dump() instead
> 
> Signed-off-by: Benjamin Herrenschmidt 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/fc0c8b36d379a046525eacb9c3323c

cheers

Re: [v3,1/2] powerpc/pseries: PAPR persistent memory support

2018-10-18 Thread Michael Ellerman

On Sun, 2018-10-14 at 23:18:27 UTC, Oliver O'Halloran wrote:
> This patch implements support for discovering storage class memory
> devices at boot and for handling hotplug of new regions via RTAS
> hotplug events.
> 
> Signed-off-by: Oliver O'Halloran 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/4c5d87db497832c493ed296157bd17

cheers

Re: [2/4] powerpc: Add -Werror at arch/powerpc level

2018-10-18 Thread Michael Ellerman

On Wed, 2018-10-10 at 05:13:06 UTC, Michael Ellerman wrote:
> Back when I added -Werror in commit ba55bd74360e ("powerpc: Add
> configurable -Werror for arch/powerpc") I did it by adding it to most
> of the arch Makefiles.
> 
> At the time we excluded math-emu, because apparently it didn't build
> cleanly. But that seems to have been fixed somewhere in the interim.
> 
> So move the -Werror addition to the top-level of the arch, this saves
> us from repeating it in every Makefile and means we won't forget to
> add it to any new sub-dirs.
> 
> Signed-off-by: Michael Ellerman 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/23ad1a2700725d46ee7760920974c6

cheers

Re: [1/4] powerpc: Move core kernel logic into arch/powerpc/Kbuild

2018-10-18 Thread Michael Ellerman

On Wed, 2018-10-10 at 05:13:05 UTC, Michael Ellerman wrote:
> This is a nice cleanup, arch/powerpc/Makefile is long and messy so
> moving this out helps a little.
> 
> It also allows us to do:
> 
>   $ make arch/powerpc
> 
> Which can be helpful if you just want to compile test some changes to
> arch code and not link everything.
> 
> Finally it also gives us a single place to do subdir-cc-flags
> assignments which affect the whole of arch/powerpc, which we will do
> in a future patch.
> 
> Signed-off-by: Michael Ellerman 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/c47ca98d32a22a412ddbc69916cf62

cheers

Re: [PATCH kernel v2] powerpc/ioda/npu: Call skiboot's hot reset hook when disabling NPU2

2018-10-18 Thread Alistair Popple

> >>> wouldn't you also need to do that somewhere? Unless the driver
> >>> does it at startup?
> >> 
> >> VFIO performs GPU reset so I'd expect the GPUs to flush its caches
> >> without any software interactions. Am I hoping for too much here?
> > 
> > Sadly you are. It's not the GPU caches that need flushing, it's the CPU
> > caches. This needs to happen as part of the reset sequence, so I guess
> > you would need to add it to the VFIO driver.
> 
> Well, ok. Caches need flushing, will look into this but this fencing is
> still needed, is not it?

Yes. Although without the flushing I think you may get HMI's on any subsequent 
driver loads.

So from the point of view of what happens on the Skiboot/HW side this looks ok 
so long as all links on an NPU are assigned to the same guest (as this call 
resets every link on a given NPU).

Acked-by: Alistair Popple 
 
> > - Alistair
> > 
> >>> - Alistair
> >>> 
> > - Alistair
> > 
> >>> - Alistair
> >>> 
> > - Alistair
> > 
> > On Monday, 15 October 2018 6:17:51 PM AEDT Alexey Kardashevskiy
> > 
> > wrote:
> >> Ping?
> >> 
> >> On 02/10/2018 13:20, Alexey Kardashevskiy wrote:
> >>> The skiboot firmware has a hot reset handler which fences the
> >>> NVIDIA V100
> >>> GPU RAM on Witherspoons and makes accesses no-op instead of
> >>> throwing HMIs:
> >>> https://github.com/open-power/skiboot/commit/fca2b2b839a67
> >>> 
> >>> Now we are going to pass V100 via VFIO which most certainly
> >>> involves
> >>> KVM guests which are often terminated without getting a chance
> >>> to
> >>> offline
> >>> GPU RAM so we end up with a running machine with misconfigured
> >>> memory.
> >>> Accessing this memory produces hardware management interrupts
> >>> (HMI)
> >>> which bring the host down.
> >>> 
> >>> To suppress HMIs, this wires up this hot reset hook to
> >>> vfio_pci_disable()
> >>> via pci_disable_device() which switches NPU2 to a safe mode and
> >>> prevents
> >>> HMIs.
> >>> 
> >>> Signed-off-by: Alexey Kardashevskiy 
> >>> ---
> >>> Changes:
> >>> v2:
> >>> * updated the commit log
> >>> ---
> >>> 
> >>>  arch/powerpc/platforms/powernv/pci-ioda.c | 10 ++
> >>>  1 file changed, 10 insertions(+)
> >>> 
> >>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c
> >>> b/arch/powerpc/platforms/powernv/pci-ioda.c index
> >>> cde7102..e37b9cc 100644
> >>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> >>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>> @@ -3688,6 +3688,15 @@ static void pnv_pci_release_device(struct
> >>> pci_dev *pdev)>
> >>> 
> >>>   pnv_ioda_release_pe(pe);
> >>>  
> >>>  }
> >>> 
> >>> +static void pnv_npu_disable_device(struct pci_dev *pdev)
> >>> +{
> >>> + struct eeh_dev *edev = pci_dev_to_eeh_dev(pdev);
> >>> + struct eeh_pe *eehpe = edev ? edev->pe : NULL;
> >>> +
> >>> + if (eehpe && eeh_ops && eeh_ops->reset)
> >>> + eeh_ops->reset(eehpe, EEH_RESET_HOT);
> >>> +}
> >>> +
> >>> 
> >>>  static void pnv_pci_ioda_shutdown(struct pci_controller *hose)
> >>>  {
> >>>  
> >>>   struct pnv_phb *phb = hose->private_data;
> >>> 
> >>> @@ -3732,6 +3741,7 @@ static const struct pci_controller_ops
> >>> pnv_npu_ioda_controller_ops = {>
> >>> 
> >>>   .reset_secondary_bus= pnv_pci_reset_secondary_bus,
> >>>   .dma_set_mask   = pnv_npu_dma_set_mask,
> >>>   .shutdown   = pnv_pci_ioda_shutdown,
> >>> 
> >>> + .disable_device = pnv_npu_disable_device,
> >>> 
> >>>  };
> >>>  
> >>>  static const struct pci_controller_ops
> >>>  pnv_npu_ocapi_ioda_controller_ops = {

Re: [PATCH kernel v2] powerpc/ioda/npu: Call skiboot's hot reset hook when disabling NPU2

2018-10-18 Thread Alexey Kardashevskiy




On 18/10/2018 12:05, Alistair Popple wrote:
> Hi Alexey,
> 
>>> wouldn't you also need to do that somewhere? Unless the driver
>>> does it at startup?
>>
>> VFIO performs GPU reset so I'd expect the GPUs to flush its caches
>> without any software interactions. Am I hoping for too much here?
> 
> Sadly you are. It's not the GPU caches that need flushing, it's the CPU 
> caches. 
> This needs to happen as part of the reset sequence, so I guess you would need 
> to add it to the VFIO driver.

Well, ok. Caches need flushing, will look into this but this fencing is
still needed, is not it?



> 
> - Alistair
> 
>>
>>> - Alistair
>>>
> - Alistair
>
>>> - Alistair
>>>
> - Alistair
>
> On Monday, 15 October 2018 6:17:51 PM AEDT Alexey Kardashevskiy 
> wrote:
>> Ping?
>>
>> On 02/10/2018 13:20, Alexey Kardashevskiy wrote:
>>> The skiboot firmware has a hot reset handler which fences the
>>> NVIDIA V100
>>> GPU RAM on Witherspoons and makes accesses no-op instead of
>>> throwing HMIs:
>>> https://github.com/open-power/skiboot/commit/fca2b2b839a67
>>>
>>> Now we are going to pass V100 via VFIO which most certainly
>>> involves
>>> KVM guests which are often terminated without getting a chance to
>>> offline
>>> GPU RAM so we end up with a running machine with misconfigured
>>> memory.
>>> Accessing this memory produces hardware management interrupts
>>> (HMI)
>>> which bring the host down.
>>>
>>> To suppress HMIs, this wires up this hot reset hook to
>>> vfio_pci_disable()
>>> via pci_disable_device() which switches NPU2 to a safe mode and
>>> prevents
>>> HMIs.
>>>
>>> Signed-off-by: Alexey Kardashevskiy 
>>> ---
>>> Changes:
>>> v2:
>>> * updated the commit log
>>> ---
>>>
>>>  arch/powerpc/platforms/powernv/pci-ioda.c | 10 ++
>>>  1 file changed, 10 insertions(+)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c
>>> b/arch/powerpc/platforms/powernv/pci-ioda.c index
>>> cde7102..e37b9cc 100644
>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> @@ -3688,6 +3688,15 @@ static void pnv_pci_release_device(struct
>>> pci_dev *pdev)> 
>>> pnv_ioda_release_pe(pe);
>>>  
>>>  }
>>>
>>> +static void pnv_npu_disable_device(struct pci_dev *pdev)
>>> +{
>>> +   struct eeh_dev *edev = pci_dev_to_eeh_dev(pdev);
>>> +   struct eeh_pe *eehpe = edev ? edev->pe : NULL;
>>> +
>>> +   if (eehpe && eeh_ops && eeh_ops->reset)
>>> +   eeh_ops->reset(eehpe, EEH_RESET_HOT);
>>> +}
>>> +
>>>
>>>  static void pnv_pci_ioda_shutdown(struct pci_controller *hose)
>>>  {
>>>  
>>> struct pnv_phb *phb = hose->private_data;
>>>
>>> @@ -3732,6 +3741,7 @@ static const struct pci_controller_ops
>>> pnv_npu_ioda_controller_ops = {> 
>>> .reset_secondary_bus= pnv_pci_reset_secondary_bus,
>>> .dma_set_mask   = pnv_npu_dma_set_mask,
>>> .shutdown   = pnv_pci_ioda_shutdown,
>>>
>>> +   .disable_device = pnv_npu_disable_device,
>>>
>>>  };
>>>  
>>>  static const struct pci_controller_ops
>>>  pnv_npu_ocapi_ioda_controller_ops = {
> 
> 

-- 
Alexey

Re: [PATCH kernel 3/3] vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] [10de:1db1] subdriver

2018-10-18 Thread Alexey Kardashevskiy




On 19/10/2018 05:05, Alex Williamson wrote:
> On Thu, 18 Oct 2018 10:37:46 -0700
> Piotr Jaroszynski  wrote:
> 
>> On 10/18/18 9:55 AM, Alex Williamson wrote:
>>> On Thu, 18 Oct 2018 11:31:33 +1100
>>> Alexey Kardashevskiy  wrote:
>>>   
 On 18/10/2018 08:52, Alex Williamson wrote:  
> On Wed, 17 Oct 2018 12:19:20 +1100
> Alexey Kardashevskiy  wrote:
>  
>> On 17/10/2018 06:08, Alex Williamson wrote:  
>>> On Mon, 15 Oct 2018 20:42:33 +1100
>>> Alexey Kardashevskiy  wrote:
 +
 +  if (pdev->vendor == PCI_VENDOR_ID_IBM &&
 +  pdev->device == 0x04ea) {
 +  ret = vfio_pci_ibm_npu2_init(vdev);
 +  if (ret) {
 +  dev_warn(>pdev->dev,
 +  "Failed to setup NVIDIA NV2 
 ATSD region\n");
 +  goto disable_exit;
}  
>>>
>>> So the NPU is also actually owned by vfio-pci and assigned to the VM?  
>>
>> Yes. On a running system it looks like:
>>
>> 0007:00:00.0 Bridge: IBM Device 04ea (rev 01)
>> 0007:00:00.1 Bridge: IBM Device 04ea (rev 01)
>> 0007:00:01.0 Bridge: IBM Device 04ea (rev 01)
>> 0007:00:01.1 Bridge: IBM Device 04ea (rev 01)
>> 0007:00:02.0 Bridge: IBM Device 04ea (rev 01)
>> 0007:00:02.1 Bridge: IBM Device 04ea (rev 01)
>> 0035:00:00.0 PCI bridge: IBM Device 04c1
>> 0035:01:00.0 PCI bridge: PLX Technology, Inc. Device 8725 (rev ca)
>> 0035:02:04.0 PCI bridge: PLX Technology, Inc. Device 8725 (rev ca)
>> 0035:02:05.0 PCI bridge: PLX Technology, Inc. Device 8725 (rev ca)
>> 0035:02:0d.0 PCI bridge: PLX Technology, Inc. Device 8725 (rev ca)
>> 0035:03:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2]
>> (rev a1
>> 0035:04:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2]
>> (rev a1)
>> 0035:05:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2]
>> (rev a1)
>>
>> One "IBM Device" bridge represents one NVLink2, i.e. a piece of NPU.
>> They all and 3 GPUs go to the same IOMMU group and get passed through to
>> a guest.
>>
>> The entire NPU does not have representation via sysfs as a whole though. 
>>  
>
> So the NPU is a bridge, but it uses a normal header type so vfio-pci
> will bind to it?  

 An NPU is a NVLink bridge, it is not PCI in any sense. We (the host
 powerpc firmware known as "skiboot" or "opal") have chosen to emulate a
 virtual bridge per 1 NVLink on the firmware level. So for each physical
 NPU there are 6 virtual bridges. So the NVIDIA driver does not need to
 know much about NPUs.
  
> And the ATSD register that we need on it is not
> accessible through these PCI representations of the sub-pieces of the
> NPU?  Thanks,  

 No, only via the device tree. The skiboot puts the ATSD register address
 to the PHB's DT property called 'ibm,mmio-atsd' of these virtual bridges.  
>>>
>>> Ok, so the NPU is essential a virtual device already, mostly just a
>>> stub.  But it seems that each NPU is associated to a specific GPU, how
>>> is that association done?  In the use case here it seems like it's just
>>> a vehicle to provide this ibm,mmio-atsd property to guest DT and the tgt
>>> routing information to the GPU.  So if both of those were attached to
>>> the GPU, there'd be no purpose in assigning the NPU other than it's in
>>> the same IOMMU group with a type 0 header, so something needs to be
>>> done with it.  If it's a virtual device, perhaps it could have a type 1
>>> header so vfio wouldn't care about it, then we would only assign the
>>> GPU with these extra properties, which seems easier for management
>>> tools and users.  If the guest driver needs a visible NPU device, QEMU
>>> could possibly emulate one to make the GPU association work
>>> automatically.  Maybe this isn't really a problem, but I wonder if
>>> you've looked up the management stack to see what tools need to know to
>>> assign these NPU devices and whether specific configurations are
>>> required to make the NPU to GPU association work.  Thanks,  
>>
>> I'm not that familiar with how this was originally set up, but note that 
>> Alexey is just making it work exactly like baremetal does. The baremetal 
>> GPU driver works as-is in the VM and expects the same properties in the 
>> device-tree. Obviously it doesn't have to be that way, but there is 
>> value in keeping it identical.
>>
>> Another probably bigger point is that the NPU device also implements the 
>> nvlink HW interface and is required for actually training and 
>> maintaining the link up. The driver in the guest trains the links by 
>> programming both the GPU end and the NPU end of each link so the NPU 
>> device needs to be exposed to the guest.
> 
> Ok, so there is functionality in assigning

Re: [PATCH] powerpc/uaccess: fix warning/error with access_ok()

2018-10-18 Thread kbuild test robot

Hi Christophe,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.19-rc8 next-20181018]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Christophe-Leroy/powerpc-uaccess-fix-warning-error-with-access_ok/20181019-031908
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-socrates_defconfig (attached as .config)
compiler: powerpc-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=powerpc 

All error/warnings (new ones prefixed by >>):

   In file included from include/linux/uaccess.h:14:0,
from arch/powerpc/include/asm/sections.h:7,
from include/linux/kallsyms.h:15,
from include/linux/ftrace.h:11,
from include/linux/init_task.h:9,
from arch/powerpc/kernel/process.c:32:
   arch/powerpc/kernel/process.c: In function 'show_user_instructions':
>> arch/powerpc/include/asm/uaccess.h:21:25: error: incompatible type for 
>> argument 3 of '__access_ok'
#define MAKE_MM_SEG(s)  ((mm_segment_t) { (s) })
^
>> arch/powerpc/include/asm/uaccess.h:28:18: note: in expansion of macro 
>> 'MAKE_MM_SEG'
#define USER_DS  MAKE_MM_SEG(TASK_SIZE - 1)
 ^~~
>> arch/powerpc/kernel/process.c:1313:55: note: in expansion of macro 'USER_DS'
 if (!__access_ok(pc, NR_INSN_TO_PRINT * sizeof(int), USER_DS)) {
  ^~~
   arch/powerpc/include/asm/uaccess.h:58:19: note: expected 'long unsigned int' 
but argument is of type 'mm_segment_t {aka struct }'
static inline int __access_ok(int type, unsigned long addr, unsigned long 
size,
  ^~~
>> arch/powerpc/kernel/process.c:1313:7: error: too few arguments to function 
>> '__access_ok'
 if (!__access_ok(pc, NR_INSN_TO_PRINT * sizeof(int), USER_DS)) {
  ^~~
   In file included from include/linux/uaccess.h:14:0,
from arch/powerpc/include/asm/sections.h:7,
from include/linux/kallsyms.h:15,
from include/linux/ftrace.h:11,
from include/linux/init_task.h:9,
from arch/powerpc/kernel/process.c:32:
   arch/powerpc/include/asm/uaccess.h:58:19: note: declared here
static inline int __access_ok(int type, unsigned long addr, unsigned long 
size,
  ^~~
--
   In file included from include/linux/uaccess.h:14:0,
from arch/powerpc/include/asm/sections.h:7,
from include/linux/kallsyms.h:15,
from include/linux/ftrace.h:11,
from include/linux/kprobes.h:42,
from arch/powerpc/lib/sstep.c:12:
   arch/powerpc/lib/sstep.c: In function 'address_ok':
>> arch/powerpc/include/asm/uaccess.h:21:25: error: incompatible type for 
>> argument 3 of '__access_ok'
#define MAKE_MM_SEG(s)  ((mm_segment_t) { (s) })
^
>> arch/powerpc/include/asm/uaccess.h:28:18: note: in expansion of macro 
>> 'MAKE_MM_SEG'
#define USER_DS  MAKE_MM_SEG(TASK_SIZE - 1)
 ^~~
>> arch/powerpc/lib/sstep.c:113:26: note: in expansion of macro 'USER_DS'
 if (__access_ok(ea, nb, USER_DS))
 ^~~
   arch/powerpc/include/asm/uaccess.h:58:19: note: expected 'long unsigned int' 
but argument is of type 'mm_segment_t {aka struct }'
static inline int __access_ok(int type, unsigned long addr, unsigned long 
size,
  ^~~
>> arch/powerpc/lib/sstep.c:113:6: error: too few arguments to function 
>> '__access_ok'
 if (__access_ok(ea, nb, USER_DS))
 ^~~
   In file included from include/linux/uaccess.h:14:0,
from arch/powerpc/include/asm/sections.h:7,
from include/linux/kallsyms.h:15,
from include/linux/ftrace.h:11,
from include/linux/kprobes.h:42,
from arch/powerpc/lib/sstep.c:12:
   arch/powerpc/include/asm/uaccess.h:58:19: note: declared here
static inline int __access_ok(int type, unsigned long addr, unsigned long 
size,
  ^~~
>> arch/powerpc/include/asm/uaccess.h:21:25: error: incompatible type for 
>> argument 3 of '__access_ok'
#define MAKE_MM_SEG(s)  ((mm_segment_t) { (s) })
^
>> arch/powerpc/include/asm/u

[PATCH v5 18/18] of: unittest: initialize args before calling of_parse_()

2018-10-18 Thread frowand . list

From: Frank Rowand 

Callers of of_irq_parse_one() blindly use the pointer args.np
without checking whether of_irq_parse_one() had an error and
thus did not set the value of args.np.  Initialize args to
zero so that using the format "%pOF" to show the value of
args.np will show "(null)" when of_irq_parse_one() has an
error.  This prevents the dereference of a random value.

Make the same fix for callers of of_parse_phandle_with_args()
and of_parse_phandle_with_args_map().

Reported-by: Guenter Roeck 
Signed-off-by: Frank Rowand 
---
 drivers/of/unittest.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
index 785985bdbfa6..5f4db23e4752 100644
--- a/drivers/of/unittest.c
+++ b/drivers/of/unittest.c
@@ -375,6 +375,7 @@ static void __init of_unittest_parse_phandle_with_args(void)
for (i = 0; i < 8; i++) {
bool passed = true;
 
+   memset(, 0, sizeof(args));
rc = of_parse_phandle_with_args(np, "phandle-list",
"#phandle-cells", i, );
 
@@ -428,6 +429,7 @@ static void __init of_unittest_parse_phandle_with_args(void)
}
 
/* Check for missing list property */
+   memset(, 0, sizeof(args));
rc = of_parse_phandle_with_args(np, "phandle-list-missing",
"#phandle-cells", 0, );
unittest(rc == -ENOENT, "expected:%i got:%i\n", -ENOENT, rc);
@@ -436,6 +438,7 @@ static void __init of_unittest_parse_phandle_with_args(void)
unittest(rc == -ENOENT, "expected:%i got:%i\n", -ENOENT, rc);
 
/* Check for missing cells property */
+   memset(, 0, sizeof(args));
rc = of_parse_phandle_with_args(np, "phandle-list",
"#phandle-cells-missing", 0, );
unittest(rc == -EINVAL, "expected:%i got:%i\n", -EINVAL, rc);
@@ -444,6 +447,7 @@ static void __init of_unittest_parse_phandle_with_args(void)
unittest(rc == -EINVAL, "expected:%i got:%i\n", -EINVAL, rc);
 
/* Check for bad phandle in list */
+   memset(, 0, sizeof(args));
rc = of_parse_phandle_with_args(np, "phandle-list-bad-phandle",
"#phandle-cells", 0, );
unittest(rc == -EINVAL, "expected:%i got:%i\n", -EINVAL, rc);
@@ -452,6 +456,7 @@ static void __init of_unittest_parse_phandle_with_args(void)
unittest(rc == -EINVAL, "expected:%i got:%i\n", -EINVAL, rc);
 
/* Check for incorrectly formed argument list */
+   memset(, 0, sizeof(args));
rc = of_parse_phandle_with_args(np, "phandle-list-bad-args",
"#phandle-cells", 1, );
unittest(rc == -EINVAL, "expected:%i got:%i\n", -EINVAL, rc);
@@ -502,6 +507,7 @@ static void __init 
of_unittest_parse_phandle_with_args_map(void)
for (i = 0; i < 8; i++) {
bool passed = true;
 
+   memset(, 0, sizeof(args));
rc = of_parse_phandle_with_args_map(np, "phandle-list",
"phandle", i, );
 
@@ -559,21 +565,25 @@ static void __init 
of_unittest_parse_phandle_with_args_map(void)
}
 
/* Check for missing list property */
+   memset(, 0, sizeof(args));
rc = of_parse_phandle_with_args_map(np, "phandle-list-missing",
"phandle", 0, );
unittest(rc == -ENOENT, "expected:%i got:%i\n", -ENOENT, rc);
 
/* Check for missing cells,map,mask property */
+   memset(, 0, sizeof(args));
rc = of_parse_phandle_with_args_map(np, "phandle-list",
"phandle-missing", 0, );
unittest(rc == -EINVAL, "expected:%i got:%i\n", -EINVAL, rc);
 
/* Check for bad phandle in list */
+   memset(, 0, sizeof(args));
rc = of_parse_phandle_with_args_map(np, "phandle-list-bad-phandle",
"phandle", 0, );
unittest(rc == -EINVAL, "expected:%i got:%i\n", -EINVAL, rc);
 
/* Check for incorrectly formed argument list */
+   memset(, 0, sizeof(args));
rc = of_parse_phandle_with_args_map(np, "phandle-list-bad-args",
"phandle", 1, );
unittest(rc == -EINVAL, "expected:%i got:%i\n", -EINVAL, rc);
@@ -780,7 +790,7 @@ static void __init of_unittest_parse_interrupts(void)
for (i = 0; i < 4; i++) {
bool passed = true;
 
-   args.args_count = 0;
+   memset(, 0, sizeof(args));
rc = of_irq_parse_one(np, i, );
 
passed &= !rc;
@@ -801,7 +811,7 @@ static void __init of_unittest_parse_interrupts(void)
for (i = 0; i < 4; i++) {
bool passed = true;
 
-   args.args_count = 0;
+   memset(, 0, sizeof(args));
rc =

[PATCH v5 17/18] of: unittest: find overlays[] entry by name instead of index

2018-10-18 Thread frowand . list

From: Frank Rowand 

One accessor of overlays[] was using a hard coded index value to
find the correct array entry instead of searching for the entry
containing the correct name.

Signed-off-by: Frank Rowand 
---
 drivers/of/unittest.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
index 1c2bd8503095..785985bdbfa6 100644
--- a/drivers/of/unittest.c
+++ b/drivers/of/unittest.c
@@ -2178,7 +2178,7 @@ struct overlay_info {
 OVERLAY_INFO_EXTERN(overlay_bad_phandle);
 OVERLAY_INFO_EXTERN(overlay_bad_symbol);
 
-/* order of entries is hard-coded into users of overlays[] */
+/* entries found by name */
 static struct overlay_info overlays[] = {
OVERLAY_INFO(overlay_base, -),
OVERLAY_INFO(overlay, 0),
@@ -2201,7 +2201,8 @@ struct overlay_info {
OVERLAY_INFO(overlay_bad_add_dup_prop, -EINVAL),
OVERLAY_INFO(overlay_bad_phandle, -EINVAL),
OVERLAY_INFO(overlay_bad_symbol, -EINVAL),
-   {}
+   /* end marker */
+   {.dtb_begin = NULL, .dtb_end = NULL, .expected_result = 0, .name = NULL}
 };
 
 static struct device_node *overlay_base_root;
@@ -2231,6 +2232,19 @@ void __init unittest_unflatten_overlay_base(void)
u32 data_size;
void *new_fdt;
u32 size;
+   int found = 0;
+   const char *overlay_name = "overlay_base";
+
+   for (info = overlays; info && info->name; info++) {
+   if (!strcmp(overlay_name, info->name)) {
+   found = 1;
+   break;
+   }
+   }
+   if (!found) {
+   pr_err("no overlay data for %s\n", overlay_name);
+   return;
+   }
 
info = [0];
 
@@ -2278,11 +2292,10 @@ static int __init overlay_data_apply(const char 
*overlay_name, int *overlay_id)
 {
struct overlay_info *info;
int found = 0;
-   int k;
int ret;
u32 size;
 
-   for (k = 0, info = overlays; info && info->name; info++, k++) {
+   for (info = overlays; info && info->name; info++) {
if (!strcmp(overlay_name, info->name)) {
found = 1;
break;
-- 
Frank Rowand

[PATCH v5 16/18] of: unittest: allow base devicetree to have symbol metadata

2018-10-18 Thread frowand . list

From: Frank Rowand 

The overlay metadata nodes in the FDT created from testcases.dts
are not handled properly.

The __fixups__ and __local_fixups__ node were added to the live
devicetree, but should not be.

Only the first property in the /__symbols__ node was added to the
live devicetree if the live devicetree already contained a
/__symbols node.  All of the node's properties must be added.

Signed-off-by: Frank Rowand 
---
 drivers/of/unittest.c | 43 +++
 1 file changed, 35 insertions(+), 8 deletions(-)

diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
index 6d80f474c8f2..1c2bd8503095 100644
--- a/drivers/of/unittest.c
+++ b/drivers/of/unittest.c
@@ -1057,20 +1057,44 @@ static void __init of_unittest_platform_populate(void)
  * of np into dup node (present in live tree) and
  * updates parent of children of np to dup.
  *
- * @np:node already present in live tree
+ * @np:node whose properties are being added to the live tree
  * @dup:   node present in live tree to be updated
  */
 static void update_node_properties(struct device_node *np,
struct device_node *dup)
 {
struct property *prop;
+   struct property *save_next;
struct device_node *child;
-
-   for_each_property_of_node(np, prop)
-   of_add_property(dup, prop);
+   int ret;
 
for_each_child_of_node(np, child)
child->parent = dup;
+
+   /*
+* "unittest internal error: unable to add testdata property"
+*
+*If this message reports a property in node '/__symbols__' then
+*the respective unittest overlay contains a label that has the
+*same name as a label in the live devicetree.  The label will
+*be in the live devicetree only if the devicetree source was
+*compiled with the '-@' option.  If you encounter this error,
+*please consider renaming __all__ of the labels in the unittest
+*overlay dts files with an odd prefix that is unlikely to be
+*used in a real devicetree.
+*/
+
+   /*
+* open code for_each_property_of_node() because of_add_property()
+* sets prop->next to NULL
+*/
+   for (prop = np->properties; prop != NULL; prop = save_next) {
+   save_next = prop->next;
+   ret = of_add_property(dup, prop);
+   if (ret)
+   pr_err("unittest internal error: unable to add testdata 
property %pOF/%s",
+  np, prop->name);
+   }
 }
 
 /**
@@ -1079,18 +1103,23 @@ static void update_node_properties(struct device_node 
*np,
  *
  * @np:Node to attach to live tree
  */
-static int attach_node_and_children(struct device_node *np)
+static void attach_node_and_children(struct device_node *np)
 {
struct device_node *next, *dup, *child;
unsigned long flags;
const char *full_name;
 
full_name = kasprintf(GFP_KERNEL, "%pOF", np);
+
+   if (!strcmp(full_name, "/__local_fixups__") ||
+   !strcmp(full_name, "/__fixups__"))
+   return;
+
dup = of_find_node_by_path(full_name);
kfree(full_name);
if (dup) {
update_node_properties(np, dup);
-   return 0;
+   return;
}
 
child = np->child;
@@ -,8 +1140,6 @@ static int attach_node_and_children(struct device_node 
*np)
attach_node_and_children(child);
child = next;
}
-
-   return 0;
 }
 
 /**
-- 
Frank Rowand

[PATCH v5 15/18] of: overlay: set node fields from properties when add new overlay node

2018-10-18 Thread frowand . list

From: Frank Rowand 

Overlay nodes added by add_changeset_node() do not have the node
fields name, phandle, and type set.

The node passed to __of_attach_node() when the add node changeset
entry is processed does not contain any properties.  The node's
properties are located in add property changeset entries that will
be processed after the add node changeset is applied.

Set the node's fields in the node contained in the add node
changeset entry and do not set them to incorrect values in
add_changeset_node().

A visible symptom that is fixed by this patch is the names of nodes
added by overlays that have an entry in /sys/bus/platform/drivers/*/
will contain the unit-address but the node-name will be ,  for
example, "fc4ab000.".  After applying the patch the name, in
this example, for node restart@fc4ab000 is "fc4ab000.restart".

Signed-off-by: Frank Rowand 
---
 drivers/of/dynamic.c | 27 ++-
 drivers/of/overlay.c | 29 -
 2 files changed, 42 insertions(+), 14 deletions(-)

diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
index beea792debb6..43e1ccb9c387 100644
--- a/drivers/of/dynamic.c
+++ b/drivers/of/dynamic.c
@@ -205,15 +205,24 @@ static void __of_attach_node(struct device_node *np)
const __be32 *phandle;
int sz;
 
-   np->name = __of_get_property(np, "name", NULL) ? : "";
-   np->type = __of_get_property(np, "device_type", NULL) ? : "";
-
-   phandle = __of_get_property(np, "phandle", );
-   if (!phandle)
-   phandle = __of_get_property(np, "linux,phandle", );
-   if (IS_ENABLED(CONFIG_PPC_PSERIES) && !phandle)
-   phandle = __of_get_property(np, "ibm,phandle", );
-   np->phandle = (phandle && (sz >= 4)) ? be32_to_cpup(phandle) : 0;
+   if (!of_node_check_flag(np, OF_OVERLAY)) {
+   np->name = __of_get_property(np, "name", NULL);
+   np->type = __of_get_property(np, "device_type", NULL);
+   if (!np->name)
+   np->name = "";
+   if (!np->type)
+   np->type = "";
+
+   phandle = __of_get_property(np, "phandle", );
+   if (!phandle)
+   phandle = __of_get_property(np, "linux,phandle", );
+   if (IS_ENABLED(CONFIG_PPC_PSERIES) && !phandle)
+   phandle = __of_get_property(np, "ibm,phandle", );
+   if (phandle && (sz >= 4))
+   np->phandle = be32_to_cpup(phandle);
+   else
+   np->phandle = 0;
+   }
 
np->child = NULL;
np->sibling = np->parent->child;
diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
index 9e4af83c8c62..eaedb3a63077 100644
--- a/drivers/of/overlay.c
+++ b/drivers/of/overlay.c
@@ -307,10 +307,11 @@ static int add_changeset_property(struct 
overlay_changeset *ovcs,
int ret = 0;
bool check_for_non_overlay_node = false;
 
-   if (!of_prop_cmp(overlay_prop->name, "name") ||
-   !of_prop_cmp(overlay_prop->name, "phandle") ||
-   !of_prop_cmp(overlay_prop->name, "linux,phandle"))
-   return 0;
+   if (target->in_livetree)
+   if (!of_prop_cmp(overlay_prop->name, "name") ||
+   !of_prop_cmp(overlay_prop->name, "phandle") ||
+   !of_prop_cmp(overlay_prop->name, "linux,phandle"))
+   return 0;
 
if (target->in_livetree)
prop = of_find_property(target->np, overlay_prop->name, NULL);
@@ -330,6 +331,10 @@ static int add_changeset_property(struct overlay_changeset 
*ovcs,
 
if (!prop) {
check_for_non_overlay_node = true;
+   if (!target->in_livetree) {
+   new_prop->next = target->np->deadprops;
+   target->np->deadprops = new_prop;
+   }
ret = of_changeset_add_property(>cset, target->np,
new_prop);
} else if (!of_prop_cmp(prop->name, "#address-cells")) {
@@ -400,9 +405,10 @@ static int add_changeset_node(struct overlay_changeset 
*ovcs,
struct target *target, struct device_node *node)
 {
const char *node_kbasename;
+   const __be32 *phandle;
struct device_node *tchild;
struct target target_child;
-   int ret = 0;
+   int ret = 0, size;
 
node_kbasename = kbasename(node->full_name);
 
@@ -416,6 +422,19 @@ static int add_changeset_node(struct overlay_changeset 
*ovcs,
return -ENOMEM;
 
tchild->parent = target->np;
+   tchild->name = __of_get_property(node, "name", NULL);
+   tchild->type = __of_get_property(node, "device_type", NULL);
+
+   if (!tchild->name)
+   tchild->name = "";
+   if (!tchild->type)
+   tchild->type = "";
+
+   /*

[PATCH v5 14/18] of: unittest: remove unused of_unittest_apply_overlay() argument

2018-10-18 Thread frowand . list

From: Frank Rowand 

Argument unittest_nr is not used in of_unittest_apply_overlay(),
remove it.

Signed-off-by: Frank Rowand 
---
 drivers/of/unittest.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
index efd9c947f192..6d80f474c8f2 100644
--- a/drivers/of/unittest.c
+++ b/drivers/of/unittest.c
@@ -1419,8 +1419,7 @@ static void of_unittest_destroy_tracked_overlays(void)
} while (defers > 0);
 }
 
-static int __init of_unittest_apply_overlay(int overlay_nr, int unittest_nr,
-   int *overlay_id)
+static int __init of_unittest_apply_overlay(int overlay_nr, int *overlay_id)
 {
const char *overlay_name;
 
@@ -1453,7 +1452,7 @@ static int __init of_unittest_apply_overlay_check(int 
overlay_nr,
}
 
ovcs_id = 0;
-   ret = of_unittest_apply_overlay(overlay_nr, unittest_nr, _id);
+   ret = of_unittest_apply_overlay(overlay_nr, _id);
if (ret != 0) {
/* of_unittest_apply_overlay already called unittest() */
return ret;
@@ -1489,7 +1488,7 @@ static int __init 
of_unittest_apply_revert_overlay_check(int overlay_nr,
 
/* apply the overlay */
ovcs_id = 0;
-   ret = of_unittest_apply_overlay(overlay_nr, unittest_nr, _id);
+   ret = of_unittest_apply_overlay(overlay_nr, _id);
if (ret != 0) {
/* of_unittest_apply_overlay already called unittest() */
return ret;
-- 
Frank Rowand

[PATCH v5 13/18] of: overlay: check prevents multiple fragments touching same property

2018-10-18 Thread frowand . list

From: Frank Rowand 

Add test case of two fragments updating the same property.  After
adding the test case, the system hangs at end of boot, after
after slub stack dumps from kfree() in crypto modprobe code.

Multiple overlay fragments adding, modifying, or deleting the same
property is not supported.  Add check to detect the attempt and fail
the overlay apply.

Before this patch, the first fragment error would terminate
processing.  Allow fragment checking to proceed and report all
of the fragment errors before terminating the overlay apply. This
is not a hot path, thus not a performance issue (the error is not
transient and requires fixing the overlay before attempting to
apply it again).

After applying this patch, the devicetree unittest messages will
include:

   OF: overlay: ERROR: multiple fragments add, update, and/or delete property 
/testcase-data-2/substation@100/motor-1/rpm_avail

   ...

   ### dt-test ### end of unittest - 212 passed, 0 failed

The check to detect two fragments updating the same property is
folded into the patch that created the test case to maintain
bisectability.

Signed-off-by: Frank Rowand 
---
 drivers/of/overlay.c   | 118 ++---
 drivers/of/unittest-data/Makefile  |   1 +
 .../of/unittest-data/overlay_bad_add_dup_prop.dts  |  24 +
 drivers/of/unittest-data/overlay_base.dts  |   1 +
 drivers/of/unittest.c  |   5 +
 5 files changed, 112 insertions(+), 37 deletions(-)
 create mode 100644 drivers/of/unittest-data/overlay_bad_add_dup_prop.dts

diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
index e12f2c28a36a..9e4af83c8c62 100644
--- a/drivers/of/overlay.c
+++ b/drivers/of/overlay.c
@@ -508,52 +508,96 @@ static int build_changeset_symbols_node(struct 
overlay_changeset *ovcs,
return 0;
 }
 
+static int find_dup_cset_node_entry(struct overlay_changeset *ovcs,
+   struct of_changeset_entry *ce_1)
+{
+   struct of_changeset_entry *ce_2;
+   char *fn_1, *fn_2;
+   int node_path_match;
+
+   if (ce_1->action != OF_RECONFIG_ATTACH_NODE &&
+   ce_1->action != OF_RECONFIG_DETACH_NODE)
+   return 0;
+
+   ce_2 = ce_1;
+   list_for_each_entry_continue(ce_2, >cset.entries, node) {
+   if ((ce_2->action != OF_RECONFIG_ATTACH_NODE &&
+ce_2->action != OF_RECONFIG_DETACH_NODE) ||
+   of_node_cmp(ce_1->np->full_name, ce_2->np->full_name))
+   continue;
+
+   fn_1 = kasprintf(GFP_KERNEL, "%pOF", ce_1->np);
+   fn_2 = kasprintf(GFP_KERNEL, "%pOF", ce_2->np);
+   node_path_match = !strcmp(fn_1, fn_2);
+   kfree(fn_1);
+   kfree(fn_2);
+   if (node_path_match) {
+   pr_err("ERROR: multiple fragments add and/or delete 
node %pOF\n",
+  ce_1->np);
+   return -EINVAL;
+   }
+   }
+
+   return 0;
+}
+
+static int find_dup_cset_prop(struct overlay_changeset *ovcs,
+   struct of_changeset_entry *ce_1)
+{
+   struct of_changeset_entry *ce_2;
+   char *fn_1, *fn_2;
+   int node_path_match;
+
+   if (ce_1->action != OF_RECONFIG_ADD_PROPERTY &&
+   ce_1->action != OF_RECONFIG_REMOVE_PROPERTY &&
+   ce_1->action != OF_RECONFIG_UPDATE_PROPERTY)
+   return 0;
+
+   ce_2 = ce_1;
+   list_for_each_entry_continue(ce_2, >cset.entries, node) {
+   if ((ce_2->action != OF_RECONFIG_ADD_PROPERTY &&
+ce_2->action != OF_RECONFIG_REMOVE_PROPERTY &&
+ce_2->action != OF_RECONFIG_UPDATE_PROPERTY) ||
+   of_node_cmp(ce_1->np->full_name, ce_2->np->full_name))
+   continue;
+
+   fn_1 = kasprintf(GFP_KERNEL, "%pOF", ce_1->np);
+   fn_2 = kasprintf(GFP_KERNEL, "%pOF", ce_2->np);
+   node_path_match = !strcmp(fn_1, fn_2);
+   kfree(fn_1);
+   kfree(fn_2);
+   if (node_path_match &&
+   !of_prop_cmp(ce_1->prop->name, ce_2->prop->name)) {
+   pr_err("ERROR: multiple fragments add, update, and/or 
delete property %pOF/%s\n",
+  ce_1->np, ce_1->prop->name);
+   return -EINVAL;
+   }
+   }
+
+   return 0;
+}
+
 /**
- * check_changeset_dup_add_node() - changeset validation: duplicate add node
+ * changeset_dup_entry_check() - check for duplicate entries
  * @ovcs:  Overlay changeset
  *
- * Check changeset @ovcs->cset for multiple add node entries for the same
- * node.
+ * Check changeset @ovcs->cset for multiple {add or delete} node entries for
+ * the same node or duplicate {add, delete, or update} properties entries
+ * for the same property.
  *
- * Returns 0 on success, -ENOMEM if memory allocation

[PATCH v5 05/18] of: overlay: use prop add changeset entry for property in new nodes

2018-10-18 Thread frowand . list

From: Frank Rowand 

The changeset entry 'update property' was used for new properties in
an overlay instead of 'add property'.

The decision of whether to use 'update property' was based on whether
the property already exists in the subtree where the node is being
spliced into.  At the top level of creating a changeset describing the
overlay, the target node is in the live devicetree, so checking whether
the property exists in the target node returns the correct result.
As soon as the changeset creation algorithm recurses into a new node,
the target is no longer in the live devicetree, but is instead in the
detached overlay tree, thus all properties are incorrectly found to
already exist in the target.

This fix will expose another devicetree bug that will be fixed
in the following patch in the series.

When this patch is applied the errors reported by the devictree
unittest will change, and the unittest results will change from:

   ### dt-test ### end of unittest - 210 passed, 0 failed

to

   ### dt-test ### end of unittest - 203 passed, 7 failed

Signed-off-by: Frank Rowand 
---
 drivers/of/overlay.c | 112 ++-
 1 file changed, 74 insertions(+), 38 deletions(-)

diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
index 32cfee68f2e3..94740f4ee34c 100644
--- a/drivers/of/overlay.c
+++ b/drivers/of/overlay.c
@@ -24,6 +24,26 @@
 #include "of_private.h"
 
 /**
+ * struct target - info about current target node as recursing through overlay
+ * @np:node where current level of overlay will be 
applied
+ * @in_livetree:   @np is a node in the live devicetree
+ *
+ * Used in the algorithm to create the portion of a changeset that describes
+ * an overlay fragment, which is a devicetree subtree.  Initially @np is a node
+ * in the live devicetree where the overlay subtree is targeted to be grafted
+ * into.  When recursing to the next level of the overlay subtree, the target
+ * also recurses to the next level of the live devicetree, as long as overlay
+ * subtree node also exists in the live devicetree.  When a node in the overlay
+ * subtree does not exist at the same level in the live devicetree, target->np
+ * points to a newly allocated node, and all subsequent targets in the subtree
+ * will be newly allocated nodes.
+ */
+struct target {
+   struct device_node *np;
+   bool in_livetree;
+};
+
+/**
  * struct fragment - info about fragment nodes in overlay expanded device tree
  * @target:target of the overlay operation
  * @overlay:   pointer to the __overlay__ node
@@ -72,8 +92,7 @@ static int devicetree_corrupt(void)
 }
 
 static int build_changeset_next_level(struct overlay_changeset *ovcs,
-   struct device_node *target_node,
-   const struct device_node *overlay_node);
+   struct target *target, const struct device_node *overlay_node);
 
 /*
  * of_resolve_phandles() finds the largest phandle in the live tree.
@@ -257,14 +276,17 @@ static struct property *dup_and_fixup_symbol_prop(
 /**
  * add_changeset_property() - add @overlay_prop to overlay changeset
  * @ovcs:  overlay changeset
- * @target_node:   where to place @overlay_prop in live tree
+ * @target:where @overlay_prop will be placed
  * @overlay_prop:  property to add or update, from overlay tree
  * @is_symbols_prop:   1 if @overlay_prop is from node "/__symbols__"
  *
- * If @overlay_prop does not already exist in @target_node, add changeset entry
- * to add @overlay_prop in @target_node, else add changeset entry to update
+ * If @overlay_prop does not already exist in live devicetree, add changeset
+ * entry to add @overlay_prop in @target, else add changeset entry to update
  * value of @overlay_prop.
  *
+ * @target may be either in the live devicetree or in a new subtree that
+ * is contained in the changeset.
+ *
  * Some special properties are not updated (no error returned).
  *
  * Update of property in symbols node is not allowed.
@@ -273,20 +295,22 @@ static struct property *dup_and_fixup_symbol_prop(
  * invalid @overlay.
  */
 static int add_changeset_property(struct overlay_changeset *ovcs,
-   struct device_node *target_node,
-   struct property *overlay_prop,
+   struct target *target, struct property *overlay_prop,
bool is_symbols_prop)
 {
struct property *new_prop = NULL, *prop;
int ret = 0;
 
-   prop = of_find_property(target_node, overlay_prop->name, NULL);
-
if (!of_prop_cmp(overlay_prop->name, "name") ||
!of_prop_cmp(overlay_prop->name, "phandle") ||
!of_prop_cmp(overlay_prop->name, "linux,phandle"))
return 0;
 
+   if (target->in_livetree)
+   prop = of_find_property(target->np, overlay_prop->name, NULL);
+   else
+   prop = NULL;
+
if (is_symbols_prop) {
if (prop)

[PATCH v5 12/18] of: overlay: check prevents multiple fragments add or delete same node

2018-10-18 Thread frowand . list

From: Frank Rowand 

Multiple overlay fragments adding or deleting the same node is not
supported.  Replace code comment of such, with check to detect the
attempt and fail the overlay apply.

Devicetree unittest where multiple fragments added the same node was
added in the previous patch in the series.  After applying this patch
the unittest messages will no longer include:

   Duplicate name in motor-1, renamed to "controller#1"
   OF: overlay: of_overlay_apply() err=0
   ### dt-test ### of_overlay_fdt_apply() expected -22, ret=0, 
overlay_bad_add_dup_node
   ### dt-test ### FAIL of_unittest_overlay_high_level():2419 Adding overlay 
'overlay_bad_add_dup_node' failed

   ...

   ### dt-test ### end of unittest - 210 passed, 1 failed

but will instead include:

   OF: overlay: ERROR: multiple overlay fragments add and/or delete node 
/testcase-data-2/substation@100/motor-1/controller

   ...

   ### dt-test ### end of unittest - 211 passed, 0 failed

Signed-off-by: Frank Rowand 
---

checkpatch errors "line over 80 characters" and "Too many leading tabs"
are ok, they will be fixed later in this series

 drivers/of/overlay.c | 58 
 1 file changed, 49 insertions(+), 9 deletions(-)

diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
index 88e346234ae7..e12f2c28a36a 100644
--- a/drivers/of/overlay.c
+++ b/drivers/of/overlay.c
@@ -392,14 +392,6 @@ static int add_changeset_property(struct overlay_changeset 
*ovcs,
  *   a live devicetree created from Open Firmware.
  *
  * NOTE_2: Multiple mods of created nodes not supported.
- *   If more than one fragment contains a node that does not already exist
- *   in the live tree, then for each fragment of_changeset_attach_node()
- *   will add a changeset entry to add the node.  When the changeset is
- *   applied, __of_attach_node() will attach the node twice (once for
- *   each fragment).  At this point the device tree will be corrupted.
- *
- *   TODO: add integrity check to ensure that multiple fragments do not
- * create the same node.
  *
  * Returns 0 on success, -ENOMEM if memory allocation failure, or -EINVAL if
  * invalid @overlay.
@@ -517,6 +509,54 @@ static int build_changeset_symbols_node(struct 
overlay_changeset *ovcs,
 }
 
 /**
+ * check_changeset_dup_add_node() - changeset validation: duplicate add node
+ * @ovcs:  Overlay changeset
+ *
+ * Check changeset @ovcs->cset for multiple add node entries for the same
+ * node.
+ *
+ * Returns 0 on success, -ENOMEM if memory allocation failure, or -EINVAL if
+ * invalid overlay in @ovcs->fragments[].
+ */
+static int check_changeset_dup_add_node(struct overlay_changeset *ovcs)
+{
+   struct of_changeset_entry *ce_1, *ce_2;
+   char *fn_1, *fn_2;
+   int name_match;
+
+   list_for_each_entry(ce_1, >cset.entries, node) {
+
+   if (ce_1->action == OF_RECONFIG_ATTACH_NODE ||
+   ce_1->action == OF_RECONFIG_DETACH_NODE) {
+
+   ce_2 = ce_1;
+   list_for_each_entry_continue(ce_2, >cset.entries, 
node) {
+   if (ce_2->action == OF_RECONFIG_ATTACH_NODE ||
+   ce_2->action == OF_RECONFIG_DETACH_NODE) {
+   /* inexpensive name compare */
+   if (!of_node_cmp(ce_1->np->full_name,
+   ce_2->np->full_name)) {
+   /* expensive full path name 
compare */
+   fn_1 = kasprintf(GFP_KERNEL, 
"%pOF", ce_1->np);
+   fn_2 = kasprintf(GFP_KERNEL, 
"%pOF", ce_2->np);
+   name_match = !strcmp(fn_1, 
fn_2);
+   kfree(fn_1);
+   kfree(fn_2);
+   if (name_match) {
+   pr_err("ERROR: multiple 
overlay fragments add and/or delete node %pOF\n",
+  ce_1->np);
+   return -EINVAL;
+   }
+   }
+   }
+   }
+   }
+   }
+
+   return 0;
+}
+
+/**
  * build_changeset() - populate overlay changeset in @ovcs from 
@ovcs->fragments
  * @ovcs:  Overlay changeset
  *
@@ -571,7 +611,7 @@ static int build_changeset(struct overlay_changeset *ovcs)
}
}
 
-   return 0;
+   return check_changeset_dup_add_node(ovcs);
 }
 
 /*
-- 
Frank Rowand

[PATCH v5 11/18] of: overlay: test case of two fragments adding same node

2018-10-18 Thread frowand . list

From: Frank Rowand 

Multiple overlay fragments adding or deleting the same node is not
supported.  An attempt to do so results in an incorrect devicetree.
The node name will be munged for the second add.

After adding this patch, the unittest messages will show:

   Duplicate name in motor-1, renamed to "controller#1"
   OF: overlay: of_overlay_apply() err=0
   ### dt-test ### of_overlay_fdt_apply() expected -22, ret=0, 
overlay_bad_add_dup_node
   ### dt-test ### FAIL of_unittest_overlay_high_level():2419 Adding overlay 
'overlay_bad_add_dup_node' failed

   ...

   ### dt-test ### end of unittest - 210 passed, 1 failed

The incorrect (munged) node name "controller#1" can be seen in the
/proc filesystem:

   $ pwd
   /proc/device-tree/testcase-data-2/substation@100/motor-1
   $ ls
   compatiblecontrollercontroller#1  name  phandle   spin
   $ ls controller
   power_bus
   $ ls controller#1
   power_bus_emergency

Signed-off-by: Frank Rowand 
---
 drivers/of/unittest-data/Makefile  |  1 +
 .../of/unittest-data/overlay_bad_add_dup_node.dts  | 28 ++
 drivers/of/unittest.c  |  5 
 3 files changed, 34 insertions(+)
 create mode 100644 drivers/of/unittest-data/overlay_bad_add_dup_node.dts

diff --git a/drivers/of/unittest-data/Makefile 
b/drivers/of/unittest-data/Makefile
index 013d85e694c6..166dbdbfd1c5 100644
--- a/drivers/of/unittest-data/Makefile
+++ b/drivers/of/unittest-data/Makefile
@@ -17,6 +17,7 @@ obj-$(CONFIG_OF_OVERLAY) += overlay.dtb.o \
overlay_12.dtb.o \
overlay_13.dtb.o \
overlay_15.dtb.o \
+   overlay_bad_add_dup_node.dtb.o \
overlay_bad_phandle.dtb.o \
overlay_bad_symbol.dtb.o \
overlay_base.dtb.o
diff --git a/drivers/of/unittest-data/overlay_bad_add_dup_node.dts 
b/drivers/of/unittest-data/overlay_bad_add_dup_node.dts
new file mode 100644
index ..145dfc3b1024
--- /dev/null
+++ b/drivers/of/unittest-data/overlay_bad_add_dup_node.dts
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0
+/dts-v1/;
+/plugin/;
+
+/*
+ * _1/motor-1 and _ctrl_1 are the same node:
+ *   /testcase-data-2/substation@100/motor-1
+ *
+ * Thus the new node "controller" in each fragment will
+ * result in an attempt to add the same node twice.
+ * This will result in an error and the overlay apply
+ * will fail.
+ */
+
+_1 {
+
+   motor-1 {
+   controller {
+   power_bus = < 0x1 0x2 >;
+   };
+   };
+};
+
+_ctrl_1 {
+   controller {
+   power_bus_emergency = < 0x101 0x102 >;
+   };
+};
diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
index 722537e14848..471b8eb6e842 100644
--- a/drivers/of/unittest.c
+++ b/drivers/of/unittest.c
@@ -2147,6 +2147,7 @@ struct overlay_info {
 OVERLAY_INFO_EXTERN(overlay_12);
 OVERLAY_INFO_EXTERN(overlay_13);
 OVERLAY_INFO_EXTERN(overlay_15);
+OVERLAY_INFO_EXTERN(overlay_bad_add_dup_node);
 OVERLAY_INFO_EXTERN(overlay_bad_phandle);
 OVERLAY_INFO_EXTERN(overlay_bad_symbol);
 
@@ -2169,6 +2170,7 @@ struct overlay_info {
OVERLAY_INFO(overlay_12, 0),
OVERLAY_INFO(overlay_13, 0),
OVERLAY_INFO(overlay_15, 0),
+   OVERLAY_INFO(overlay_bad_add_dup_node, -EINVAL),
OVERLAY_INFO(overlay_bad_phandle, -EINVAL),
OVERLAY_INFO(overlay_bad_symbol, -EINVAL),
{}
@@ -2413,6 +2415,9 @@ static __init void of_unittest_overlay_high_level(void)
unittest(overlay_data_apply("overlay", NULL),
 "Adding overlay 'overlay' failed\n");
 
+   unittest(overlay_data_apply("overlay_bad_add_dup_node", NULL),
+"Adding overlay 'overlay_bad_add_dup_node' failed\n");
+
unittest(overlay_data_apply("overlay_bad_phandle", NULL),
 "Adding overlay 'overlay_bad_phandle' failed\n");
 
-- 
Frank Rowand

[PATCH v5 10/18] of: overlay: make all pr_debug() and pr_err() messages unique

2018-10-18 Thread frowand . list

From: Frank Rowand 

Make overlay.c debug and error messages unique so that they can be
unambiguously found by grep.

Signed-off-by: Frank Rowand 
---
 drivers/of/overlay.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
index e20d8923f475..88e346234ae7 100644
--- a/drivers/of/overlay.c
+++ b/drivers/of/overlay.c
@@ -507,7 +507,7 @@ static int build_changeset_symbols_node(struct 
overlay_changeset *ovcs,
for_each_property_of_node(overlay_symbols_node, prop) {
ret = add_changeset_property(ovcs, target, prop, 1);
if (ret) {
-   pr_debug("Failed to apply prop @%pOF/%s, err=%d\n",
+   pr_debug("Failed to apply symbols prop @%pOF/%s, 
err=%d\n",
 target->np, prop->name, ret);
return ret;
}
@@ -551,7 +551,8 @@ static int build_changeset(struct overlay_changeset *ovcs)
ret = build_changeset_next_level(ovcs, ,
 fragment->overlay);
if (ret) {
-   pr_debug("apply failed '%pOF'\n", fragment->target);
+   pr_debug("fragment apply failed '%pOF'\n",
+fragment->target);
return ret;
}
}
@@ -564,7 +565,8 @@ static int build_changeset(struct overlay_changeset *ovcs)
ret = build_changeset_symbols_node(ovcs, ,
   fragment->overlay);
if (ret) {
-   pr_debug("apply failed '%pOF'\n", fragment->target);
+   pr_debug("symbols fragment apply failed '%pOF'\n",
+fragment->target);
return ret;
}
}
@@ -873,7 +875,7 @@ static int of_overlay_apply(const void *fdt, struct 
device_node *tree,
 
ret = __of_changeset_apply_notify(>cset);
if (ret)
-   pr_err("overlay changeset entry notify error %d\n", ret);
+   pr_err("overlay apply changeset entry notify error %d\n", ret);
/* notify failure is not fatal, continue */
 
list_add_tail(>ovcs_list, _list);
@@ -1132,7 +1134,7 @@ int of_overlay_remove(int *ovcs_id)
 
ret = __of_changeset_revert_notify(>cset);
if (ret)
-   pr_err("overlay changeset entry notify error %d\n", ret);
+   pr_err("overlay remove changeset entry notify error %d\n", ret);
/* notify failure is not fatal, continue */
 
*ovcs_id = 0;
-- 
Frank Rowand

[PATCH v5 09/18] of: overlay: validate overlay properties #address-cells and #size-cells

2018-10-18 Thread frowand . list

From: Frank Rowand 

If overlay properties #address-cells or #size-cells are already in
the live devicetree for any given node, then the values in the
overlay must match the values in the live tree.

If the properties are already in the live tree then there is no
need to create a changeset entry to add them since they must
have the same value.  This reduces the memory used by the
changeset and eliminates a possible memory leak.

Signed-off-by: Frank Rowand 
---

Changes since v4:
  - create of_prop_val_eq() and change open code to use it
  - remove extra blank lines

 drivers/of/overlay.c | 32 +---
 include/linux/of.h   |  6 ++
 2 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
index 272a0d1a5e18..e20d8923f475 100644
--- a/drivers/of/overlay.c
+++ b/drivers/of/overlay.c
@@ -287,7 +287,12 @@ static struct property *dup_and_fixup_symbol_prop(
  * @target may be either in the live devicetree or in a new subtree that
  * is contained in the changeset.
  *
- * Some special properties are not updated (no error returned).
+ * Some special properties are not added or updated (no error returned):
+ * "name", "phandle", "linux,phandle".
+ *
+ * Properties "#address-cells" and "#size-cells" are not updated if they
+ * are already in the live tree, but if present in the live tree, the values
+ * in the overlay must match the values in the live tree.
  *
  * Update of property in symbols node is not allowed.
  *
@@ -300,6 +305,7 @@ static int add_changeset_property(struct overlay_changeset 
*ovcs,
 {
struct property *new_prop = NULL, *prop;
int ret = 0;
+   bool check_for_non_overlay_node = false;
 
if (!of_prop_cmp(overlay_prop->name, "name") ||
!of_prop_cmp(overlay_prop->name, "phandle") ||
@@ -322,12 +328,32 @@ static int add_changeset_property(struct 
overlay_changeset *ovcs,
if (!new_prop)
return -ENOMEM;
 
-   if (!prop)
+   if (!prop) {
+   check_for_non_overlay_node = true;
ret = of_changeset_add_property(>cset, target->np,
new_prop);
-   else
+   } else if (!of_prop_cmp(prop->name, "#address-cells")) {
+   if (!of_prop_val_eq(prop, new_prop)) {
+   pr_err("ERROR: changing value of #address-cells is not 
allowed in %pOF\n",
+  target->np);
+   ret = -EINVAL;
+   }
+   } else if (!of_prop_cmp(prop->name, "#size-cells")) {
+   if (!of_prop_val_eq(prop, new_prop)) {
+   pr_err("ERROR: changing value of #size-cells is not 
allowed in %pOF\n",
+  target->np);
+   ret = -EINVAL;
+   }
+   } else {
+   check_for_non_overlay_node = true;
ret = of_changeset_update_property(>cset, target->np,
   new_prop);
+   }
+
+   if (check_for_non_overlay_node &&
+   !of_node_check_flag(target->np, OF_OVERLAY))
+   pr_err("WARNING: memory leak will occur if overlay removed, 
property: %pOF/%s\n",
+  target->np, new_prop->name);
 
if (ret) {
kfree(new_prop->name);
diff --git a/include/linux/of.h b/include/linux/of.h
index 72c593455019..1bb14a1f7227 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -947,6 +947,12 @@ static inline int of_cpu_node_to_id(struct device_node *np)
 #define of_node_cmp(s1, s2)strcasecmp((s1), (s2))
 #endif
 
+static inline int of_prop_val_eq(struct property *p1, struct property *p2)
+{
+   return p1->length == p2->length &&
+  !memcmp(p1->value, p2->value, (size_t)p1->length);
+}
+
 #if defined(CONFIG_OF) && defined(CONFIG_NUMA)
 extern int of_node_to_nid(struct device_node *np);
 #else
-- 
Frank Rowand

[PATCH v5 08/18] of: overlay: reorder fields in struct fragment

2018-10-18 Thread frowand . list

From: Frank Rowand 

Order the fields of struct fragment in the same order as
struct of_overlay_notify_data.  The order in struct fragment is
not significant.  If both structs are ordered the same then when
examining the data in a debugger or dump the human involved does
not have to remember which context they are examining.

Signed-off-by: Frank Rowand 
---
 drivers/of/overlay.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
index 7fcf4a812d06..272a0d1a5e18 100644
--- a/drivers/of/overlay.c
+++ b/drivers/of/overlay.c
@@ -49,8 +49,8 @@ struct target {
  * @overlay:   pointer to the __overlay__ node
  */
 struct fragment {
-   struct device_node *target;
struct device_node *overlay;
+   struct device_node *target;
 };
 
 /**
-- 
Frank Rowand

[PATCH v5 07/18] of: dynamic: change type of of_{at, de}tach_node() to void

2018-10-18 Thread frowand . list

From: Frank Rowand 

of_attach_node() and of_detach_node() always return zero, so
their return value is meaningless.  Change their type to void
and fix all callers to ignore return value.

Signed-off-by: Frank Rowand 
---
 arch/powerpc/platforms/pseries/dlpar.c| 13 ++---
 arch/powerpc/platforms/pseries/reconfig.c |  6 +-
 drivers/of/dynamic.c  |  9 ++---
 include/linux/of.h|  4 ++--
 4 files changed, 7 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/dlpar.c 
b/arch/powerpc/platforms/pseries/dlpar.c
index e3010b14aea5..0027eea94a8b 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -244,15 +244,9 @@ struct device_node *dlpar_configure_connector(__be32 
drc_index,
 
 int dlpar_attach_node(struct device_node *dn, struct device_node *parent)
 {
-   int rc;
-
dn->parent = parent;
 
-   rc = of_attach_node(dn);
-   if (rc) {
-   printk(KERN_ERR "Failed to add device node %pOF\n", dn);
-   return rc;
-   }
+   of_attach_node(dn);
 
return 0;
 }
@@ -260,7 +254,6 @@ int dlpar_attach_node(struct device_node *dn, struct 
device_node *parent)
 int dlpar_detach_node(struct device_node *dn)
 {
struct device_node *child;
-   int rc;
 
child = of_get_next_child(dn, NULL);
while (child) {
@@ -268,9 +261,7 @@ int dlpar_detach_node(struct device_node *dn)
child = of_get_next_child(dn, child);
}
 
-   rc = of_detach_node(dn);
-   if (rc)
-   return rc;
+   of_detach_node(dn);
 
of_node_put(dn);
 
diff --git a/arch/powerpc/platforms/pseries/reconfig.c 
b/arch/powerpc/platforms/pseries/reconfig.c
index 0e0208117e77..0b72098da454 100644
--- a/arch/powerpc/platforms/pseries/reconfig.c
+++ b/arch/powerpc/platforms/pseries/reconfig.c
@@ -47,11 +47,7 @@ static int pSeries_reconfig_add_node(const char *path, 
struct property *proplist
goto out_err;
}
 
-   err = of_attach_node(np);
-   if (err) {
-   printk(KERN_ERR "Failed to add device node %s\n", path);
-   goto out_err;
-   }
+   of_attach_node(np);
 
of_node_put(np->parent);
 
diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
index 146681540487..beea792debb6 100644
--- a/drivers/of/dynamic.c
+++ b/drivers/of/dynamic.c
@@ -224,7 +224,7 @@ static void __of_attach_node(struct device_node *np)
 /**
  * of_attach_node() - Plug a device node into the tree and global list.
  */
-int of_attach_node(struct device_node *np)
+void of_attach_node(struct device_node *np)
 {
struct of_reconfig_data rd;
unsigned long flags;
@@ -241,8 +241,6 @@ int of_attach_node(struct device_node *np)
mutex_unlock(_mutex);
 
of_reconfig_notify(OF_RECONFIG_ATTACH_NODE, );
-
-   return 0;
 }
 
 void __of_detach_node(struct device_node *np)
@@ -273,11 +271,10 @@ void __of_detach_node(struct device_node *np)
 /**
  * of_detach_node() - "Unplug" a node from the device tree.
  */
-int of_detach_node(struct device_node *np)
+void of_detach_node(struct device_node *np)
 {
struct of_reconfig_data rd;
unsigned long flags;
-   int rc = 0;
 
memset(, 0, sizeof(rd));
rd.dn = np;
@@ -291,8 +288,6 @@ int of_detach_node(struct device_node *np)
mutex_unlock(_mutex);
 
of_reconfig_notify(OF_RECONFIG_DETACH_NODE, );
-
-   return rc;
 }
 EXPORT_SYMBOL_GPL(of_detach_node);
 
diff --git a/include/linux/of.h b/include/linux/of.h
index aa1dafaec6ae..72c593455019 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -406,8 +406,8 @@ extern int of_phandle_iterator_args(struct 
of_phandle_iterator *it,
 #define OF_RECONFIG_REMOVE_PROPERTY0x0004
 #define OF_RECONFIG_UPDATE_PROPERTY0x0005
 
-extern int of_attach_node(struct device_node *);
-extern int of_detach_node(struct device_node *);
+extern void of_attach_node(struct device_node *np);
+extern void of_detach_node(struct device_node *np);
 
 #define of_match_ptr(_ptr) (_ptr)
 
-- 
Frank Rowand

[PATCH v5 06/18] of: overlay: do not duplicate properties from overlay for new nodes

2018-10-18 Thread frowand . list

From: Frank Rowand 

When allocating a new node, add_changeset_node() was duplicating the
properties from the respective node in the overlay instead of
allocating a node with no properties.

When this patch is applied the errors reported by the devictree
unittest from patch "of: overlay: add tests to validate kfrees from
overlay removal" will no longer occur.  These error messages are of
the form:

   "OF: ERROR: ..."

and the unittest results will change from:

   ### dt-test ### end of unittest - 203 passed, 7 failed

to

   ### dt-test ### end of unittest - 210 passed, 0 failed

Signed-off-by: Frank Rowand 
---
 drivers/of/overlay.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
index 94740f4ee34c..7fcf4a812d06 100644
--- a/drivers/of/overlay.c
+++ b/drivers/of/overlay.c
@@ -393,7 +393,7 @@ static int add_changeset_node(struct overlay_changeset 
*ovcs,
break;
 
if (!tchild) {
-   tchild = __of_node_dup(node, node_kbasename);
+   tchild = __of_node_dup(NULL, node_kbasename);
if (!tchild)
return -ENOMEM;
 
-- 
Frank Rowand

[PATCH v5 04/18] powerpc/pseries: add of_node_put() in dlpar_detach_node()

2018-10-18 Thread frowand . list

From: Frank Rowand 

"of: overlay: add missing of_node_get() in __of_attach_node_sysfs"
added a missing of_node_get() to __of_attach_node_sysfs().  This
results in a refcount imbalance for nodes attached with
dlpar_attach_node().  The calling sequence from dlpar_attach_node()
to __of_attach_node_sysfs() is:

   dlpar_attach_node()
  of_attach_node()
 __of_attach_node_sysfs()

Signed-off-by: Frank Rowand 
---

* UNTESTED.  I need people with the affected PowerPC systems
*(systems that dynamically allocate and deallocate
*devicetree nodes) to test this patch.

 arch/powerpc/platforms/pseries/dlpar.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/dlpar.c 
b/arch/powerpc/platforms/pseries/dlpar.c
index a0b20c03f078..e3010b14aea5 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -272,6 +272,8 @@ int dlpar_detach_node(struct device_node *dn)
if (rc)
return rc;
 
+   of_node_put(dn);
+
return 0;
 }
 
-- 
Frank Rowand

[PATCH v5 03/18] of: overlay: add missing of_node_get() in __of_attach_node_sysfs

2018-10-18 Thread frowand . list

From: Frank Rowand 

There is a matching of_node_put() in __of_detach_node_sysfs()

Remove misleading comment from function header comment for
of_detach_node().

This patch may result in memory leaks from code that directly calls
the dynamic node add and delete functions directly instead of
using changesets.

Signed-off-by: Frank Rowand 
---

This patch should result in powerpc systems that dynamically
allocate a node, then later deallocate the node to have a
memory leak when the node is deallocated.

The next patch in the series will fix the leak.

 drivers/of/dynamic.c | 3 ---
 drivers/of/kobj.c| 4 +++-
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
index 12c3f9a15e94..146681540487 100644
--- a/drivers/of/dynamic.c
+++ b/drivers/of/dynamic.c
@@ -272,9 +272,6 @@ void __of_detach_node(struct device_node *np)
 
 /**
  * of_detach_node() - "Unplug" a node from the device tree.
- *
- * The caller must hold a reference to the node.  The memory associated with
- * the node is not freed until its refcount goes to zero.
  */
 int of_detach_node(struct device_node *np)
 {
diff --git a/drivers/of/kobj.c b/drivers/of/kobj.c
index 7a0a18980b98..c72eef988041 100644
--- a/drivers/of/kobj.c
+++ b/drivers/of/kobj.c
@@ -133,6 +133,9 @@ int __of_attach_node_sysfs(struct device_node *np)
}
if (!name)
return -ENOMEM;
+
+   of_node_get(np);
+
rc = kobject_add(>kobj, parent, "%s", name);
kfree(name);
if (rc)
@@ -159,6 +162,5 @@ void __of_detach_node_sysfs(struct device_node *np)
kobject_del(>kobj);
}
 
-   /* finally remove the kobj_init ref */
of_node_put(np);
 }
-- 
Frank Rowand

[PATCH v5 02/18] of: overlay: add missing of_node_put() after add new node to changeset

2018-10-18 Thread frowand . list

From: Frank Rowand 

The refcount of a newly added overlay node decrements to one
(instead of zero) when the overlay changeset is destroyed.  This
change will cause the final decrement be to zero.

After applying this patch, new validation warnings will be
reported from the devicetree unittest during boot due to
a pre-existing devicetree bug.  The warnings will be similar to:

  OF: ERROR: memory leak before free overlay changeset,  
/testcase-data/overlay-node/test-bus/test-unittest4

This pre-existing devicetree bug will also trigger a WARN_ONCE() from
refcount_sub_and_test_checked() when an overlay changeset is
destroyed without having first been applied.  This scenario occurs
when an error in the overlay is detected during the overlay changeset
creation:

  WARNING: CPU: 0 PID: 1 at lib/refcount.c:187 
refcount_sub_and_test_checked+0xa8/0xbc
  refcount_t: underflow; use-after-free.

  (unwind_backtrace) from (show_stack+0x10/0x14)
  (show_stack) from (dump_stack+0x6c/0x8c)
  (dump_stack) from (__warn+0xdc/0x104)
  (__warn) from (warn_slowpath_fmt+0x44/0x6c)
  (warn_slowpath_fmt) from (refcount_sub_and_test_checked+0xa8/0xbc)
  (refcount_sub_and_test_checked) from (kobject_put+0x24/0x208)
  (kobject_put) from (of_changeset_destroy+0x2c/0xb4)
  (of_changeset_destroy) from (free_overlay_changeset+0x1c/0x9c)
  (free_overlay_changeset) from (of_overlay_remove+0x284/0x2cc)
  (of_overlay_remove) from 
(of_unittest_apply_revert_overlay_check.constprop.4+0xf8/0x1e8)
  (of_unittest_apply_revert_overlay_check.constprop.4) from 
(of_unittest_overlay+0x960/0xed8)
  (of_unittest_overlay) from (of_unittest+0x1cc4/0x2138)
  (of_unittest) from (do_one_initcall+0x4c/0x28c)
  (do_one_initcall) from (kernel_init_freeable+0x29c/0x378)
  (kernel_init_freeable) from (kernel_init+0x8/0x110)
  (kernel_init) from (ret_from_fork+0x14/0x2c)

Signed-off-by: Frank Rowand 
---
 drivers/of/overlay.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
index 1176cb4b6e4e..32cfee68f2e3 100644
--- a/drivers/of/overlay.c
+++ b/drivers/of/overlay.c
@@ -379,7 +379,9 @@ static int add_changeset_node(struct overlay_changeset 
*ovcs,
if (ret)
return ret;
 
-   return build_changeset_next_level(ovcs, tchild, node);
+   ret = build_changeset_next_level(ovcs, tchild, node);
+   of_node_put(tchild);
+   return ret;
}
 
if (node->phandle && tchild->phandle)
-- 
Frank Rowand

[PATCH v5 00/18] of: overlay: validation checks, subsequent fixes

2018-10-18 Thread frowand . list

From: Frank Rowand 

Add checks to (1) overlay apply process and (2) memory freeing
triggered by overlay release.  The checks are intended to detect
possible memory leaks and invalid overlays.

The checks revealed bugs in existing code.  Fixed the bugs.

While fixing bugs, noted other issues, which are fixed in
separate patches.

*  Powerpc folks: I was not able to test the patches that
*  directly impact Powerpc systems that use dynamic
*  devicetree.  Please review that code carefully and
*  test.  The specific patches are: 03/16, 04/16, 07/16

FPGA folks:

  I made the validation checks that should result in an
  invalid live devicetree report "ERROR" and cause the overlay apply
  to fail.

  I made the memory leak validation tests report "WARNING" and allow
  the overlay apply to complete successfully.  Please let me know
  if you encounter the warnings.  There are at least two paths
  forward to deal with the cases that trigger the warning: (1) change
  the warning to an error and fail the overlay apply, or (2) find a
  way to detect the potential memory leaks and free the memory
  appropriately.

ALL people:

  The validations do _not_ address another major concern I have with
  releasing overlays, which is use after free errors.

Changes since v4:
  - 01/18: make error message format consistent, error first, path last
  - 09/18: create of_prop_val_eq() and change open code to use it
  - 09/18: remove extra blank lines

Changes since v3:
  - 01/18: Add expected value of refcount for destroy cset entry error.  Also
explain the cause of the error.

  - 09/18: for errors of an overlay changing the value of #size-cells or
#address-cells, return -EINVAL so that overlay apply will fail
  - 09/18: for errors of an overlay changing the value of #size-cells or
#address-cells, make the message more direct.
Old message:
  OF: overlay: ERROR: overlay and/or live tree #size-cells invalid in node 
/soc/base_fpga_region
New message:
  OF: overlay: ERROR: changing value of /soc/base_fpga_region/#size-cells 
not allowed

  - 13/18: Update patch comment header to state that this patch modifies the
previous patch to not return immediately on fragment error and
explain this is not a performance issue.
  - 13/18: remove redundant "overlay" from two error messages.  "OF: overlay:"
is already present in pr_fmt()

Changes since v2:

  - 13/18: Use continue to reduce indentation in find_dup_cset_node_entry()
and find_dup_cset_prop()

Changes since v1:

  - move patch 16/16 to 17/18
  - move patch 15/16 to 18/18
  - new patch 15/18
  - new patch 16/18

  - 05/18: add_changeset_node() header comment: incorrect comment for @target

  - 18/18: add same fix for of_parse_phandle_with_args()
  - 18/18: add same fix for of_parse_phandle_with_args_map()


*** BLURB HERE ***

Frank Rowand (18):
  of: overlay: add tests to validate kfrees from overlay removal
  of: overlay: add missing of_node_put() after add new node to changeset
  of: overlay: add missing of_node_get() in __of_attach_node_sysfs
  powerpc/pseries: add of_node_put() in dlpar_detach_node()
  of: overlay: use prop add changeset entry for property in new nodes
  of: overlay: do not duplicate properties from overlay for new nodes
  of: dynamic: change type of of_{at,de}tach_node() to void
  of: overlay: reorder fields in struct fragment
  of: overlay: validate overlay properties #address-cells and
#size-cells
  of: overlay: make all pr_debug() and pr_err() messages unique
  of: overlay: test case of two fragments adding same node
  of: overlay: check prevents multiple fragments add or delete same node
  of: overlay: check prevents multiple fragments touching same property
  of: unittest: remove unused of_unittest_apply_overlay() argument
  of: overlay: set node fields from properties when add new overlay node
  of: unittest: allow base devicetree to have symbol metadata
  of: unittest: find overlays[] entry by name instead of index
  of: unittest: initialize args before calling of_*parse_*()

 arch/powerpc/platforms/pseries/dlpar.c |  15 +-
 arch/powerpc/platforms/pseries/reconfig.c  |   6 +-
 drivers/of/dynamic.c   |  68 +++--
 drivers/of/kobj.c  |   4 +-
 drivers/of/overlay.c   | 292 -
 drivers/of/unittest-data/Makefile  |   2 +
 .../of/unittest-data/overlay_bad_add_dup_node.dts  |  28 ++
 .../of/unittest-data/overlay_bad_add_dup_prop.dts  |  24 ++
 drivers/of/unittest-data/overlay_base.dts  |   1 +
 drivers/of/unittest.c  |  96 +--
 include/linux/of.h |  25 +-
 11 files changed, 439 insertions(+), 122 deletions(-)
 create mode 100644 drivers/of/unittest-data/overlay_bad_add_dup_node.dts
 create mode 100644 drivers/of/unittest-data/overlay_bad_add_dup_prop.dts

-- 
Frank Rowand

[PATCH v5 01/18] of: overlay: add tests to validate kfrees from overlay removal

2018-10-18 Thread frowand . list

From: Frank Rowand 

Add checks:
  - attempted kfree due to refcount reaching zero before overlay
is removed
  - properties linked to an overlay node when the node is removed
  - node refcount > one during node removal in a changeset destroy,
if the node was created by the changeset

After applying this patch, several validation warnings will be
reported from the devicetree unittest during boot due to
pre-existing devicetree bugs. The warnings will be similar to:

  OF: ERROR: of_node_release(), unexpected properties in 
/testcase-data/overlay-node/test-bus/test-unittest11
  OF: ERROR: memory leak, expected refcount 1 instead of 2, 
of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay 
node /testcase-data-2/substation@100/
  hvac-medium-2

Signed-off-by: Frank Rowand 
---

Changes since v4:
  - make error message format consistent, error first, path last

 drivers/of/dynamic.c | 29 +
 drivers/of/overlay.c |  1 +
 include/linux/of.h   | 15 ++-
 3 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
index f4f8ed9b5454..12c3f9a15e94 100644
--- a/drivers/of/dynamic.c
+++ b/drivers/of/dynamic.c
@@ -330,6 +330,25 @@ void of_node_release(struct kobject *kobj)
if (!of_node_check_flag(node, OF_DYNAMIC))
return;
 
+   if (of_node_check_flag(node, OF_OVERLAY)) {
+
+   if (!of_node_check_flag(node, OF_OVERLAY_FREE_CSET)) {
+   /* premature refcount of zero, do not free memory */
+   pr_err("ERROR: memory leak before free overlay 
changeset,  %pOF\n",
+  node);
+   return;
+   }
+
+   /*
+* If node->properties non-empty then properties were added
+* to this node either by different overlay that has not
+* yet been removed, or by a non-overlay mechanism.
+*/
+   if (node->properties)
+   pr_err("ERROR: %s(), unexpected properties in %pOF\n",
+  __func__, node);
+   }
+
property_list_free(node->properties);
property_list_free(node->deadprops);
 
@@ -434,6 +453,16 @@ struct device_node *__of_node_dup(const struct device_node 
*np,
 
 static void __of_changeset_entry_destroy(struct of_changeset_entry *ce)
 {
+   if (ce->action == OF_RECONFIG_ATTACH_NODE &&
+   of_node_check_flag(ce->np, OF_OVERLAY)) {
+   if (kref_read(>np->kobj.kref) > 1) {
+   pr_err("ERROR: memory leak, expected refcount 1 instead 
of %d, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach 
overlay node %pOF\n",
+  kref_read(>np->kobj.kref), ce->np);
+   } else {
+   of_node_set_flag(ce->np, OF_OVERLAY_FREE_CSET);
+   }
+   }
+
of_node_put(ce->np);
list_del(>node);
kfree(ce);
diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
index eda57ef12fd0..1176cb4b6e4e 100644
--- a/drivers/of/overlay.c
+++ b/drivers/of/overlay.c
@@ -373,6 +373,7 @@ static int add_changeset_node(struct overlay_changeset 
*ovcs,
return -ENOMEM;
 
tchild->parent = target_node;
+   of_node_set_flag(tchild, OF_OVERLAY);
 
ret = of_changeset_attach_node(>cset, tchild);
if (ret)
diff --git a/include/linux/of.h b/include/linux/of.h
index 4d25e4f952d9..aa1dafaec6ae 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -138,11 +138,16 @@ static inline void of_node_put(struct device_node *node) 
{ }
 extern struct device_node *of_stdout;
 extern raw_spinlock_t devtree_lock;
 
-/* flag descriptions (need to be visible even when !CONFIG_OF) */
-#define OF_DYNAMIC 1 /* node and properties were allocated via kmalloc */
-#define OF_DETACHED2 /* node has been detached from the device tree */
-#define OF_POPULATED   3 /* device already created for the node */
-#define OF_POPULATED_BUS   4 /* of_platform_populate recursed to children 
of this node */
+/*
+ * struct device_node flag descriptions
+ * (need to be visible even when !CONFIG_OF)
+ */
+#define OF_DYNAMIC 1 /* (and properties) allocated via kmalloc */
+#define OF_DETACHED2 /* detached from the device tree */
+#define OF_POPULATED   3 /* device already created */
+#define OF_POPULATED_BUS   4 /* platform bus created for children */
+#define OF_OVERLAY 5 /* allocated for an overlay */
+#define OF_OVERLAY_FREE_CSET   6 /* in overlay cset being freed */
 
 #define OF_BAD_ADDR((u64)-1)
 
-- 
Frank Rowand

Re: [PATCH v4 01/18] of: overlay: add tests to validate kfrees from overlay removal

2018-10-18 Thread Alan Tull

On Wed, Oct 17, 2018 at 4:30 PM Alan Tull  wrote:
>
> On Mon, Oct 15, 2018 at 9:39 PM  wrote:
>
> Hi Frank,
>
> >
> > From: Frank Rowand 
> >
> > Add checks:
> >   - attempted kfree due to refcount reaching zero before overlay
> > is removed
> >   - properties linked to an overlay node when the node is removed
> >   - node refcount > one during node removal in a changeset destroy,
> > if the node was created by the changeset
> >
> > After applying this patch, several validation warnings will be
> > reported from the devicetree unittest during boot due to
> > pre-existing devicetree bugs. The warnings will be similar to:
> >
> >   OF: ERROR: of_node_release() overlay node 
> > /testcase-data/overlay-node/test-bus/test-unittest11/test-unittest111 
> > contains unexpected properties
> >   OF: ERROR: memory leak - destroy cset entry: attach overlay node 
> > /testcase-data-2/substation@100/hvac-medium-2 expected refcount 1 instead 
> > of 2.  of_node_get() / of_node_put() are unbalanced for this node.
> >
> > Signed-off-by: Frank Rowand 
> > ---
> > Changes since v3:
> >   - Add expected value of refcount for destroy cset entry error.  Also
> > explain the cause of the error.
> >
> >  drivers/of/dynamic.c | 29 +
> >  drivers/of/overlay.c |  1 +
> >  include/linux/of.h   | 15 ++-
> >  3 files changed, 40 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> > index f4f8ed9b5454..24c97b7a050f 100644
> > --- a/drivers/of/dynamic.c
> > +++ b/drivers/of/dynamic.c
> > @@ -330,6 +330,25 @@ void of_node_release(struct kobject *kobj)
> > if (!of_node_check_flag(node, OF_DYNAMIC))
> > return;
> >
> > +   if (of_node_check_flag(node, OF_OVERLAY)) {
> > +
> > +   if (!of_node_check_flag(node, OF_OVERLAY_FREE_CSET)) {
> > +   /* premature refcount of zero, do not free memory */
> > +   pr_err("ERROR: memory leak %s() overlay node %pOF 
> > before free overlay changeset\n",
> > +  __func__, node);
> > +   return;
> > +   }
> > +
> > +   /*
> > +* If node->properties non-empty then properties were added
> > +* to this node either by different overlay that has not
> > +* yet been removed, or by a non-overlay mechanism.
> > +*/
> > +   if (node->properties)
> > +   pr_err("ERROR: %s() overlay node %pOF contains 
> > unexpected properties\n",
> > +  __func__, node);
> > +   }
> > +
> > property_list_free(node->properties);
> > property_list_free(node->deadprops);
> >
> > @@ -434,6 +453,16 @@ struct device_node *__of_node_dup(const struct 
> > device_node *np,
> >
> >  static void __of_changeset_entry_destroy(struct of_changeset_entry *ce)
> >  {
> > +   if (ce->action == OF_RECONFIG_ATTACH_NODE &&
> > +   of_node_check_flag(ce->np, OF_OVERLAY)) {
> > +   if (kref_read(>np->kobj.kref) > 1) {
> > +   pr_err("ERROR: memory leak - destroy cset entry: 
> > attach overlay node %pOF expected refcount 1 instead of %d.  of_node_get() 
> > / of_node_put() are unbalanced for this node.\n",
> > +  ce->np, kref_read(>np->kobj.kref));
>
> Still testing as much as I have time to do.
>
> I'm hitting this error message once when removing an overlay that adds
> several child nodes.  The only node I get the message for was a node
> that added a fixed-clock (the other nodes didn't trigger the error).
> Then even if I edited all the rest of the overlay DTS and removed all
> other child nodes and all references to the clock from other nodes, I
> still got the error.
>
> Removing dtbo: 1-socfpga_arria10_socdk_sdmmc_ghrd_ovl_ext_cfg.dtb
> [   72.032270] OF: ERROR: memory leak - destroy cset entry: attach
> overlay node /soc/base_fpga_region/clk_0 expected refcount 1 instead
> of 2.  of_node_get() / of_node_put() are unbalanced for this node.

Update: with some helpful offline debug patches from Frank, I was able
to find the source of the of_node_get/put unbalance.  The fixed-rate
clock driver calls of_clk_add_provider() when probed but never calls
of_clk_del_provider()

This patchset quite likely will uncover other of_node_get/put
unbalances around the kernel.

Alan

>
> Here's the very stripped down overlay:
>
> /dts-v1/;
> /plugin/;
> / {
> fragment@0 {
> target-path = "/soc/base_fpga_region";
> #address-cells = <1>;
> #size-cells = <1>;
>
> __overlay__ {
> external-fpga-config;
>
> #address-cells = <1>;
> #size-cells = <1>;
>
> clk_0: clk_0 {
> compatible = "fixed-clock";
>

Regression: compile error on mpc83xx

2018-10-18 Thread Radu Rendec

Hi everyone,

I'm getting the following compile error for mpc83xx in v4.9.115:

arch/powerpc/sysdev/built-in.o: In function `fsl_of_msi_probe':
fsl_msi.c:(.text+0x1548): undefined reference to `fsl_mpic_primary_get_version'

This seems to have been fixed in v3.12-rc1 by commit df1024ad8728. Then,
somewhere between v4.0 and v4.1 the following two commits changed the
arch/powerpc/include/asm/mpic.h file and that function again:
5e86bfde9cd9
807d38b73b63

After those two commits, it seems we're back to square one. I haven't
tested, but it looks like there haven't been any other changes to that
file up to the latest master in Linus' tree, so I believe the problem is
still there in the latest kernel.

Should commit df1024ad8728 be reapplied?

Thanks,
Radu Rendec

Re: [PATCH v4 09/18] of: overlay: validate overlay properties #address-cells and #size-cells

2018-10-18 Thread Frank Rowand

On 10/18/18 11:13, Rob Herring wrote:
> On Mon, Oct 15, 2018 at 07:37:29PM -0700, frowand.l...@gmail.com wrote:
>> From: Frank Rowand 
>>
>> If overlay properties #address-cells or #size-cells are already in
>> the live devicetree for any given node, then the values in the
>> overlay must match the values in the live tree.
>>
>> If the properties are already in the live tree then there is no
>> need to create a changeset entry to add them since they must
>> have the same value.  This reduces the memory used by the
>> changeset and eliminates a possible memory leak.  This is
>> verified by 12 fewer warnings during the devicetree unittest,
>> as the possible memory leak warnings about #address-cells and
> 
> Still missing the rest...
> 
> And what about my other comments too?

Apologies, paper bag.  Will fix all.

-Frank

> 
>>
>> Signed-off-by: Frank Rowand 
>> ---
>> Changes since v3:
>>   - for errors of an overlay changing the value of #size-cells or
>> #address-cells, return -EINVAL so that overlay apply will fail
>>   - for errors of an overlay changing the value of #size-cells or
>> #address-cells, make the message more direct.
>> Old message:
>>   OF: overlay: ERROR: overlay and/or live tree #size-cells invalid in 
>> node /soc/base_fpga_region
>> New message:
>>   OF: overlay: ERROR: changing value of 
>> /soc/base_fpga_region/#size-cells not allowed
>>
>>  drivers/of/overlay.c | 42 +++---
>>  1 file changed, 39 insertions(+), 3 deletions(-)
>

Re: [PATCH v4 04/18] powerpc/pseries: add of_node_put() in dlpar_detach_node()

2018-10-18 Thread Frank Rowand

On 10/18/18 10:09, Rob Herring wrote:
> On Mon, Oct 15, 2018 at 07:37:24PM -0700, frowand.l...@gmail.com wrote:
>> From: Frank Rowand 
>>
>> "of: overlay: add missing of_node_get() in __of_attach_node_sysfs"
>> added a missing of_node_get() to __of_attach_node_sysfs().  This
>> results in a refcount imbalance for nodes attached with
>> dlpar_attach_node().  The calling sequence from dlpar_attach_node()
>> to __of_attach_node_sysfs() is:
>>
>>dlpar_attach_node()
>>   of_attach_node()
>>  __of_attach_node_sysfs()
> 
> IIRC, there's a long standing item in the todo (Grant's) to convert the 
> open coded dlpar code. Maybe you want to do that first?

I'd like to avoid extra delays to getting the current (with necesary
fixes) series accepted because the series is rather intrusive and
could have conflicts with other patches.

I'm also worried that I don't have access to any of the systems that
use the dynamic overlay code, and I don't have any way to test the
changes.

Can we encourage the users of this code to convert the open coded
dlpar code?


>>
>> Signed-off-by: Frank Rowand 
>> ---
>>
>> * UNTESTED.  I need people with the affected PowerPC systems
>> *(systems that dynamically allocate and deallocate
>> *devicetree nodes) to test this patch.
> 
> Can't we write a test case that does the same thing?

I could write a simplistic test case, but I'm concerned that
simplistic is not anywhere near sufficient.  And my test case
would reflect the same assumptions I already have when I
wrote this patch.


>>
>>  arch/powerpc/platforms/pseries/dlpar.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/powerpc/platforms/pseries/dlpar.c 
>> b/arch/powerpc/platforms/pseries/dlpar.c
>> index a0b20c03f078..e3010b14aea5 100644
>> --- a/arch/powerpc/platforms/pseries/dlpar.c
>> +++ b/arch/powerpc/platforms/pseries/dlpar.c
>> @@ -272,6 +272,8 @@ int dlpar_detach_node(struct device_node *dn)
>>  if (rc)
>>  return rc;
>>  
>> +of_node_put(dn);
>> +
>>  return 0;
>>  }
>>  
>> -- 
>> Frank Rowand 
>>
>

Re: [PATCH v4 02/18] of: overlay: add missing of_node_put() after add new node to changeset

2018-10-18 Thread Frank Rowand

On 10/18/18 10:05, Rob Herring wrote:
> On Mon, Oct 15, 2018 at 07:37:22PM -0700, frowand.l...@gmail.com wrote:
>> From: Frank Rowand 
>>
>> The refcount of a newly added overlay node decrements to one
>> (instead of zero) when the overlay changeset is destroyed.  This
>> change will cause the final decrement be to zero.
>>
>> After applying this patch, new validation warnings will be
>> reported from the devicetree unittest during boot due to
>> a pre-existing devicetree bug.  The warnings will be similar to:
>>
>>   OF: ERROR: memory leak of_node_release() overlay node 
>> /testcase-data/overlay-node/test-bus/test-unittest4 before free overlay 
>> changeset
> 
> Same comment on formatting.

OK.


>>
>> This pre-existing devicetree bug will also trigger a WARN_ONCE() from
>> refcount_sub_and_test_checked() when an overlay changeset is
>> destroyed without having first been applied.  This scenario occurs
>> when an error in the overlay is detected during the overlay changeset
>> creation:
>>
>>   WARNING: CPU: 0 PID: 1 at lib/refcount.c:187 
>> refcount_sub_and_test_checked+0xa8/0xbc
>>   refcount_t: underflow; use-after-free.
>>
>>   (unwind_backtrace) from (show_stack+0x10/0x14)
>>   (show_stack) from (dump_stack+0x6c/0x8c)
>>   (dump_stack) from (__warn+0xdc/0x104)
>>   (__warn) from (warn_slowpath_fmt+0x44/0x6c)
>>   (warn_slowpath_fmt) from (refcount_sub_and_test_checked+0xa8/0xbc)
>>   (refcount_sub_and_test_checked) from (kobject_put+0x24/0x208)
>>   (kobject_put) from (of_changeset_destroy+0x2c/0xb4)
>>   (of_changeset_destroy) from (free_overlay_changeset+0x1c/0x9c)
>>   (free_overlay_changeset) from (of_overlay_remove+0x284/0x2cc)
>>   (of_overlay_remove) from 
>> (of_unittest_apply_revert_overlay_check.constprop.4+0xf8/0x1e8)
>>   (of_unittest_apply_revert_overlay_check.constprop.4) from 
>> (of_unittest_overlay+0x960/0xed8)
>>   (of_unittest_overlay) from (of_unittest+0x1cc4/0x2138)
>>   (of_unittest) from (do_one_initcall+0x4c/0x28c)
>>   (do_one_initcall) from (kernel_init_freeable+0x29c/0x378)
>>   (kernel_init_freeable) from (kernel_init+0x8/0x110)
>>   (kernel_init) from (ret_from_fork+0x14/0x2c)
>>
>> Signed-off-by: Frank Rowand 
>> ---
>>  drivers/of/overlay.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
>> index 1176cb4b6e4e..32cfee68f2e3 100644
>> --- a/drivers/of/overlay.c
>> +++ b/drivers/of/overlay.c
>> @@ -379,7 +379,9 @@ static int add_changeset_node(struct overlay_changeset 
>> *ovcs,
>>  if (ret)
>>  return ret;
>>  
>> -return build_changeset_next_level(ovcs, tchild, node);
>> +ret = build_changeset_next_level(ovcs, tchild, node);
>> +of_node_put(tchild);
>> +return ret;
>>  }
>>  
>>  if (node->phandle && tchild->phandle)
>> -- 
>> Frank Rowand 
>>
>

Re: [PATCH v4 01/18] of: overlay: add tests to validate kfrees from overlay removal

2018-10-18 Thread Frank Rowand

On 10/18/18 10:03, Rob Herring wrote:
> On Mon, Oct 15, 2018 at 07:37:21PM -0700, frowand.l...@gmail.com wrote:
>> From: Frank Rowand 
>>
>> Add checks:
>>   - attempted kfree due to refcount reaching zero before overlay
>> is removed
>>   - properties linked to an overlay node when the node is removed
>>   - node refcount > one during node removal in a changeset destroy,
>> if the node was created by the changeset
>>
>> After applying this patch, several validation warnings will be
>> reported from the devicetree unittest during boot due to
>> pre-existing devicetree bugs. The warnings will be similar to:
>>
>>   OF: ERROR: of_node_release() overlay node 
>> /testcase-data/overlay-node/test-bus/test-unittest11/test-unittest111 
>> contains unexpected properties
>>   OF: ERROR: memory leak - destroy cset entry: attach overlay node 
>> /testcase-data-2/substation@100/hvac-medium-2 expected refcount 1 instead of 
>> 2.  of_node_get() / of_node_put() are unbalanced for this node.
> 
> These messages could be formatted more consistently. Put the path either 
> at the beginning (after any prefix) or end. Beginning is more like a 
> compiler error. End puts what the problem is before it's off the edge of 
> the screen. 

The inconsistency makes the word flow more natural, but I agree that
consistency is more important.  I think I can make all the messages
say the problem first, then provide the path at the end.


>> Signed-off-by: Frank Rowand 
>> ---
>> Changes since v3:
>>   - Add expected value of refcount for destroy cset entry error.  Also
>> explain the cause of the error.
>>
>>  drivers/of/dynamic.c | 29 +
>>  drivers/of/overlay.c |  1 +
>>  include/linux/of.h   | 15 ++-
>>  3 files changed, 40 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
>> index f4f8ed9b5454..24c97b7a050f 100644
>> --- a/drivers/of/dynamic.c
>> +++ b/drivers/of/dynamic.c
>> @@ -330,6 +330,25 @@ void of_node_release(struct kobject *kobj)
>>  if (!of_node_check_flag(node, OF_DYNAMIC))
>>  return;
>>  
>> +if (of_node_check_flag(node, OF_OVERLAY)) {
>> +
>> +if (!of_node_check_flag(node, OF_OVERLAY_FREE_CSET)) {
> 
> I worry the flags are getting unwieldy.

I considered that.  I think we are still ok, and I don't have a better
solution than adding flag values.  (I did have some Rube Goldberg
variations.)


> 
>> +/* premature refcount of zero, do not free memory */
>> +pr_err("ERROR: memory leak %s() overlay node %pOF 
>> before free overlay changeset\n",
>> +   __func__, node);
>> +return;
>> +}
>> +
>> +/*
>> + * If node->properties non-empty then properties were added
>> + * to this node either by different overlay that has not
>> + * yet been removed, or by a non-overlay mechanism.
>> + */
>> +if (node->properties)
>> +pr_err("ERROR: %s() overlay node %pOF contains 
>> unexpected properties\n",
>> +   __func__, node);
>> +}
>> +
>>  property_list_free(node->properties);
>>  property_list_free(node->deadprops);
>>  
>> @@ -434,6 +453,16 @@ struct device_node *__of_node_dup(const struct 
>> device_node *np,
>>  
>>  static void __of_changeset_entry_destroy(struct of_changeset_entry *ce)
>>  {
>> +if (ce->action == OF_RECONFIG_ATTACH_NODE &&
>> +of_node_check_flag(ce->np, OF_OVERLAY)) {
>> +if (kref_read(>np->kobj.kref) > 1) {
>> +pr_err("ERROR: memory leak - destroy cset entry: attach 
>> overlay node %pOF expected refcount 1 instead of %d.  of_node_get() / 
>> of_node_put() are unbalanced for this node.\n",
>> +   ce->np, kref_read(>np->kobj.kref));
>> +} else {
>> +of_node_set_flag(ce->np, OF_OVERLAY_FREE_CSET);
>> +}
>> +}
>> +
>>  of_node_put(ce->np);
>>  list_del(>node);
>>  kfree(ce);
>> diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
>> index eda57ef12fd0..1176cb4b6e4e 100644
>> --- a/drivers/of/overlay.c
>> +++ b/drivers/of/overlay.c
>> @@ -373,6 +373,7 @@ static int add_changeset_node(struct overlay_changeset 
>> *ovcs,
>>  return -ENOMEM;
>>  
>>  tchild->parent = target_node;
>> +of_node_set_flag(tchild, OF_OVERLAY);
>>  
>>  ret = of_changeset_attach_node(>cset, tchild);
>>  if (ret)
>> diff --git a/include/linux/of.h b/include/linux/of.h
>> index 4d25e4f952d9..aa1dafaec6ae 100644
>> --- a/include/linux/of.h
>> +++ b/include/linux/of.h
>> @@ -138,11 +138,16 @@ static inline void of_node_put(struct device_node 
>> *node) { }
>>  extern struct device_node *of_stdout;
>>  extern raw_spinlock_t devtree_lock;
>>  
>> -/* flag descriptions (need to be visible even when !CONFIG_OF)

Re: [PATCH kernel 3/3] vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] [10de:1db1] subdriver

2018-10-18 Thread Piotr Jaroszynski


On 10/18/18 11:05 AM, Alex Williamson wrote:

On Thu, 18 Oct 2018 10:37:46 -0700
Piotr Jaroszynski  wrote:


On 10/18/18 9:55 AM, Alex Williamson wrote:

On Thu, 18 Oct 2018 11:31:33 +1100
Alexey Kardashevskiy  wrote:
   

On 18/10/2018 08:52, Alex Williamson wrote:

On Wed, 17 Oct 2018 12:19:20 +1100
Alexey Kardashevskiy  wrote:
  

On 17/10/2018 06:08, Alex Williamson wrote:

On Mon, 15 Oct 2018 20:42:33 +1100
Alexey Kardashevskiy  wrote:

+
+   if (pdev->vendor == PCI_VENDOR_ID_IBM &&
+   pdev->device == 0x04ea) {
+   ret = vfio_pci_ibm_npu2_init(vdev);
+   if (ret) {
+   dev_warn(>pdev->dev,
+   "Failed to setup NVIDIA NV2 ATSD 
region\n");
+   goto disable_exit;
}


So the NPU is also actually owned by vfio-pci and assigned to the VM?


Yes. On a running system it looks like:

0007:00:00.0 Bridge: IBM Device 04ea (rev 01)
0007:00:00.1 Bridge: IBM Device 04ea (rev 01)
0007:00:01.0 Bridge: IBM Device 04ea (rev 01)
0007:00:01.1 Bridge: IBM Device 04ea (rev 01)
0007:00:02.0 Bridge: IBM Device 04ea (rev 01)
0007:00:02.1 Bridge: IBM Device 04ea (rev 01)
0035:00:00.0 PCI bridge: IBM Device 04c1
0035:01:00.0 PCI bridge: PLX Technology, Inc. Device 8725 (rev ca)
0035:02:04.0 PCI bridge: PLX Technology, Inc. Device 8725 (rev ca)
0035:02:05.0 PCI bridge: PLX Technology, Inc. Device 8725 (rev ca)
0035:02:0d.0 PCI bridge: PLX Technology, Inc. Device 8725 (rev ca)
0035:03:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2]
(rev a1
0035:04:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2]
(rev a1)
0035:05:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2]
(rev a1)

One "IBM Device" bridge represents one NVLink2, i.e. a piece of NPU.
They all and 3 GPUs go to the same IOMMU group and get passed through to
a guest.

The entire NPU does not have representation via sysfs as a whole though.


So the NPU is a bridge, but it uses a normal header type so vfio-pci
will bind to it?


An NPU is a NVLink bridge, it is not PCI in any sense. We (the host
powerpc firmware known as "skiboot" or "opal") have chosen to emulate a
virtual bridge per 1 NVLink on the firmware level. So for each physical
NPU there are 6 virtual bridges. So the NVIDIA driver does not need to
know much about NPUs.
  

And the ATSD register that we need on it is not
accessible through these PCI representations of the sub-pieces of the
NPU?  Thanks,


No, only via the device tree. The skiboot puts the ATSD register address
to the PHB's DT property called 'ibm,mmio-atsd' of these virtual bridges.


Ok, so the NPU is essential a virtual device already, mostly just a
stub.  But it seems that each NPU is associated to a specific GPU, how
is that association done?  In the use case here it seems like it's just
a vehicle to provide this ibm,mmio-atsd property to guest DT and the tgt
routing information to the GPU.  So if both of those were attached to
the GPU, there'd be no purpose in assigning the NPU other than it's in
the same IOMMU group with a type 0 header, so something needs to be
done with it.  If it's a virtual device, perhaps it could have a type 1
header so vfio wouldn't care about it, then we would only assign the
GPU with these extra properties, which seems easier for management
tools and users.  If the guest driver needs a visible NPU device, QEMU
could possibly emulate one to make the GPU association work
automatically.  Maybe this isn't really a problem, but I wonder if
you've looked up the management stack to see what tools need to know to
assign these NPU devices and whether specific configurations are
required to make the NPU to GPU association work.  Thanks,


I'm not that familiar with how this was originally set up, but note that
Alexey is just making it work exactly like baremetal does. The baremetal
GPU driver works as-is in the VM and expects the same properties in the
device-tree. Obviously it doesn't have to be that way, but there is
value in keeping it identical.

Another probably bigger point is that the NPU device also implements the
nvlink HW interface and is required for actually training and
maintaining the link up. The driver in the guest trains the links by
programming both the GPU end and the NPU end of each link so the NPU
device needs to be exposed to the guest.


Ok, so there is functionality in assigning the NPU device itself, it's
not just an attachment point for meta data.  But it still seems there
must be some association of NPU to GPU, the tgt address seems to pair
the NPU with a specific GPU, they're not simply a fungible set of NPUs
and GPUs.  Is that association explicit anywhere or is it related to
the topology or device numbering that needs to match between the host
and guest?  Thanks,


GPUs are linked to NPU devices through device tree properties, I think. 
Linux has a helper to look up linked NPU devices for a PCI device

Re: [PATCH v4 09/18] of: overlay: validate overlay properties #address-cells and #size-cells

2018-10-18 Thread Rob Herring

On Mon, Oct 15, 2018 at 07:37:29PM -0700, frowand.l...@gmail.com wrote:
> From: Frank Rowand 
> 
> If overlay properties #address-cells or #size-cells are already in
> the live devicetree for any given node, then the values in the
> overlay must match the values in the live tree.
> 
> If the properties are already in the live tree then there is no
> need to create a changeset entry to add them since they must
> have the same value.  This reduces the memory used by the
> changeset and eliminates a possible memory leak.  This is
> verified by 12 fewer warnings during the devicetree unittest,
> as the possible memory leak warnings about #address-cells and

Still missing the rest...

And what about my other comments too?

> 
> Signed-off-by: Frank Rowand 
> ---
> Changes since v3:
>   - for errors of an overlay changing the value of #size-cells or
> #address-cells, return -EINVAL so that overlay apply will fail
>   - for errors of an overlay changing the value of #size-cells or
> #address-cells, make the message more direct.
> Old message:
>   OF: overlay: ERROR: overlay and/or live tree #size-cells invalid in 
> node /soc/base_fpga_region
> New message:
>   OF: overlay: ERROR: changing value of /soc/base_fpga_region/#size-cells 
> not allowed
> 
>  drivers/of/overlay.c | 42 +++---
>  1 file changed, 39 insertions(+), 3 deletions(-)

Re: [PATCH kernel 3/3] vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] [10de:1db1] subdriver

2018-10-18 Thread Alex Williamson

On Thu, 18 Oct 2018 10:37:46 -0700
Piotr Jaroszynski  wrote:

> On 10/18/18 9:55 AM, Alex Williamson wrote:
> > On Thu, 18 Oct 2018 11:31:33 +1100
> > Alexey Kardashevskiy  wrote:
> >   
> >> On 18/10/2018 08:52, Alex Williamson wrote:  
> >>> On Wed, 17 Oct 2018 12:19:20 +1100
> >>> Alexey Kardashevskiy  wrote:
> >>>  
>  On 17/10/2018 06:08, Alex Williamson wrote:  
> > On Mon, 15 Oct 2018 20:42:33 +1100
> > Alexey Kardashevskiy  wrote:
> >> +
> >> +  if (pdev->vendor == PCI_VENDOR_ID_IBM &&
> >> +  pdev->device == 0x04ea) {
> >> +  ret = vfio_pci_ibm_npu2_init(vdev);
> >> +  if (ret) {
> >> +  dev_warn(>pdev->dev,
> >> +  "Failed to setup NVIDIA NV2 
> >> ATSD region\n");
> >> +  goto disable_exit;
> >>}  
> >
> > So the NPU is also actually owned by vfio-pci and assigned to the VM?  
> 
>  Yes. On a running system it looks like:
> 
>  0007:00:00.0 Bridge: IBM Device 04ea (rev 01)
>  0007:00:00.1 Bridge: IBM Device 04ea (rev 01)
>  0007:00:01.0 Bridge: IBM Device 04ea (rev 01)
>  0007:00:01.1 Bridge: IBM Device 04ea (rev 01)
>  0007:00:02.0 Bridge: IBM Device 04ea (rev 01)
>  0007:00:02.1 Bridge: IBM Device 04ea (rev 01)
>  0035:00:00.0 PCI bridge: IBM Device 04c1
>  0035:01:00.0 PCI bridge: PLX Technology, Inc. Device 8725 (rev ca)
>  0035:02:04.0 PCI bridge: PLX Technology, Inc. Device 8725 (rev ca)
>  0035:02:05.0 PCI bridge: PLX Technology, Inc. Device 8725 (rev ca)
>  0035:02:0d.0 PCI bridge: PLX Technology, Inc. Device 8725 (rev ca)
>  0035:03:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2]
>  (rev a1
>  0035:04:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2]
>  (rev a1)
>  0035:05:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2]
>  (rev a1)
> 
>  One "IBM Device" bridge represents one NVLink2, i.e. a piece of NPU.
>  They all and 3 GPUs go to the same IOMMU group and get passed through to
>  a guest.
> 
>  The entire NPU does not have representation via sysfs as a whole though. 
>   
> >>>
> >>> So the NPU is a bridge, but it uses a normal header type so vfio-pci
> >>> will bind to it?  
> >>
> >> An NPU is a NVLink bridge, it is not PCI in any sense. We (the host
> >> powerpc firmware known as "skiboot" or "opal") have chosen to emulate a
> >> virtual bridge per 1 NVLink on the firmware level. So for each physical
> >> NPU there are 6 virtual bridges. So the NVIDIA driver does not need to
> >> know much about NPUs.
> >>  
> >>> And the ATSD register that we need on it is not
> >>> accessible through these PCI representations of the sub-pieces of the
> >>> NPU?  Thanks,  
> >>
> >> No, only via the device tree. The skiboot puts the ATSD register address
> >> to the PHB's DT property called 'ibm,mmio-atsd' of these virtual bridges.  
> > 
> > Ok, so the NPU is essential a virtual device already, mostly just a
> > stub.  But it seems that each NPU is associated to a specific GPU, how
> > is that association done?  In the use case here it seems like it's just
> > a vehicle to provide this ibm,mmio-atsd property to guest DT and the tgt
> > routing information to the GPU.  So if both of those were attached to
> > the GPU, there'd be no purpose in assigning the NPU other than it's in
> > the same IOMMU group with a type 0 header, so something needs to be
> > done with it.  If it's a virtual device, perhaps it could have a type 1
> > header so vfio wouldn't care about it, then we would only assign the
> > GPU with these extra properties, which seems easier for management
> > tools and users.  If the guest driver needs a visible NPU device, QEMU
> > could possibly emulate one to make the GPU association work
> > automatically.  Maybe this isn't really a problem, but I wonder if
> > you've looked up the management stack to see what tools need to know to
> > assign these NPU devices and whether specific configurations are
> > required to make the NPU to GPU association work.  Thanks,  
> 
> I'm not that familiar with how this was originally set up, but note that 
> Alexey is just making it work exactly like baremetal does. The baremetal 
> GPU driver works as-is in the VM and expects the same properties in the 
> device-tree. Obviously it doesn't have to be that way, but there is 
> value in keeping it identical.
> 
> Another probably bigger point is that the NPU device also implements the 
> nvlink HW interface and is required for actually training and 
> maintaining the link up. The driver in the guest trains the links by 
> programming both the GPU end and the NPU end of each link so the NPU 
> device needs to be exposed to the guest.

Ok, so there is functionality in assigning the NPU device itself, it's
not just an attachment point for meta

Re: [PATCH kernel 3/3] vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] [10de:1db1] subdriver

2018-10-18 Thread Piotr Jaroszynski


On 10/18/18 9:55 AM, Alex Williamson wrote:

On Thu, 18 Oct 2018 11:31:33 +1100
Alexey Kardashevskiy  wrote:


On 18/10/2018 08:52, Alex Williamson wrote:

On Wed, 17 Oct 2018 12:19:20 +1100
Alexey Kardashevskiy  wrote:
   

On 17/10/2018 06:08, Alex Williamson wrote:

On Mon, 15 Oct 2018 20:42:33 +1100
Alexey Kardashevskiy  wrote:
 

POWER9 Witherspoon machines come with 4 or 6 V100 GPUs which are not
pluggable PCIe devices but implement PCIe links for config space and MMIO.
In addition to that the GPUs are interconnected to each other and also
have direct links to the P9 CPU. The links are NVLink2 and provide direct
access to the system RAM for GPUs via NPU (an NVLink2 "proxy" on P9 chip).
These systems also support ATS (address translation services) which is
a part of the NVLink2 prototol. Such GPUs also share on-board RAM
(16GB in tested config) to the system via the same NVLink2 so a CPU has
cache-coherent access to a GPU RAM.

This exports GPU RAM to the userspace as a new PCI region. This
preregisters the new memory as device memory as it might be used for DMA.
This inserts pfns from the fault handler as the GPU memory is not onlined
until the NVIDIA driver is loaded and trained the links so doing this
earlier produces low level errors which we fence in the firmware so
it does not hurt the host system but still better to avoid.

This exports ATSD (Address Translation Shootdown) register of NPU which
allows the guest to invalidate TLB. The register conviniently occupies
a single 64k page. Since NPU maps the GPU memory, it has a "tgt" property
(which is an abbreviated host system bus address). This exports the "tgt"
as a capability so the guest can program it into the GPU so the GPU can
know how to route DMA trafic.


I'm not really following what "tgt" is and why it's needed.  Is the GPU
memory here different than the GPU RAM region above?  Why does the user
need the host system bus address of this "tgt" thing?  Are we not able
to relocate it in guest physical address space, does this shootdown
only work in the host physical address space and therefore we need this
offset?  Please explain, I'm confused.



This "tgt" is made of:
- "memory select" (bits 45, 46)
- "group select" (bits 43, 44)
- "chip select" (bit 42)
- chip internal address (bits 0..41)

These are internal to GPU and this is where GPU RAM is mapped into the
GPU's real space, this fits 46 bits.

On POWER9 CPU the bits are different and higher so the same memory is
mapped higher on P9 CPU. Just because we can map it higher, I guess.

So it is not exactly the address but this provides the exact physical
location of the memory.

We have a group of 3 interconnected GPUs, they got their own
memory/group/chip numbers. The GPUs use ATS service to translate
userspace to physical (host or guest) addresses. Now a GPU needs to know
which specific link to use for a specific physical address, in other
words what this physical address belongs to - a CPU or one of GPUs. This
is when "tgt" is used by the GPU hardware.


Clear as mud ;)


/me is sad. I hope Piotr explained it better...


It's starting to be a bit more clear, and Piotr anticipated the
security questions I was mulling over.


Great.




So tgt, provided by the npu2 capability of the ATSD
region of the NPU tells the GPU (a completely separate device) how to
route it its own RAM via its NVLink interface?  How can one tgt
indicate the routing for multiple interfaces?


This NVLink DMA is using direct host physical addresses (no IOMMU, no
filtering) which come from ATS. So unless we tell the GPU its own
address range on the host CPU, it will route trafic via CPU. And the
driver can also discover the NVLink topology and tell each GPU physical
addresses of peer GPUs.


I think this is a key point too, the tgt seems to only identify the
routing for the GPU local RAM, but the guest driver is able to
conglomerate this information so each GPU knows how to get directly to
the other GPUs.


Yes.




A GPU could run all the DMA trafic via the system bus indeed, just not
as fast.

I am also struggling here and adding an Nvidia person in cc: (I should
have done that when I posted the patches, my bad) to correct when/if I
am wrong.


  
  

For ATS to work, the nest MMU (an NVIDIA block in a P9 CPU) needs to
know LPID (a logical partition ID or a KVM guest hardware ID in other
words) and PID (a memory context ID of an userspace process, not to be
confused with a linux pid). This assigns a GPU to LPID in the NPU and
this is why this adds a listener for KVM on an IOMMU group. A PID comes
via NVLink from a GPU and NPU uses a PID wildcard to pass it through.

This requires coherent memory and ATSD to be available on the host as
the GPU vendor only supports configurations with both features enabled
and other configurations are known not to work. Because of this and
because of the ways the features are advertised to the host system
(which is a device tree with very platform specific properties),
this

Re: [PATCH v4 04/18] powerpc/pseries: add of_node_put() in dlpar_detach_node()

2018-10-18 Thread Rob Herring

On Mon, Oct 15, 2018 at 07:37:24PM -0700, frowand.l...@gmail.com wrote:
> From: Frank Rowand 
> 
> "of: overlay: add missing of_node_get() in __of_attach_node_sysfs"
> added a missing of_node_get() to __of_attach_node_sysfs().  This
> results in a refcount imbalance for nodes attached with
> dlpar_attach_node().  The calling sequence from dlpar_attach_node()
> to __of_attach_node_sysfs() is:
> 
>dlpar_attach_node()
>   of_attach_node()
>  __of_attach_node_sysfs()

IIRC, there's a long standing item in the todo (Grant's) to convert the 
open coded dlpar code. Maybe you want to do that first?

> 
> Signed-off-by: Frank Rowand 
> ---
> 
> * UNTESTED.  I need people with the affected PowerPC systems
> *(systems that dynamically allocate and deallocate
> *devicetree nodes) to test this patch.

Can't we write a test case that does the same thing?

> 
>  arch/powerpc/platforms/pseries/dlpar.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/powerpc/platforms/pseries/dlpar.c 
> b/arch/powerpc/platforms/pseries/dlpar.c
> index a0b20c03f078..e3010b14aea5 100644
> --- a/arch/powerpc/platforms/pseries/dlpar.c
> +++ b/arch/powerpc/platforms/pseries/dlpar.c
> @@ -272,6 +272,8 @@ int dlpar_detach_node(struct device_node *dn)
>   if (rc)
>   return rc;
>  
> + of_node_put(dn);
> +
>   return 0;
>  }
>  
> -- 
> Frank Rowand 
>

Re: [PATCH v4 02/18] of: overlay: add missing of_node_put() after add new node to changeset

2018-10-18 Thread Rob Herring

On Mon, Oct 15, 2018 at 07:37:22PM -0700, frowand.l...@gmail.com wrote:
> From: Frank Rowand 
> 
> The refcount of a newly added overlay node decrements to one
> (instead of zero) when the overlay changeset is destroyed.  This
> change will cause the final decrement be to zero.
> 
> After applying this patch, new validation warnings will be
> reported from the devicetree unittest during boot due to
> a pre-existing devicetree bug.  The warnings will be similar to:
> 
>   OF: ERROR: memory leak of_node_release() overlay node 
> /testcase-data/overlay-node/test-bus/test-unittest4 before free overlay 
> changeset

Same comment on formatting.

> 
> This pre-existing devicetree bug will also trigger a WARN_ONCE() from
> refcount_sub_and_test_checked() when an overlay changeset is
> destroyed without having first been applied.  This scenario occurs
> when an error in the overlay is detected during the overlay changeset
> creation:
> 
>   WARNING: CPU: 0 PID: 1 at lib/refcount.c:187 
> refcount_sub_and_test_checked+0xa8/0xbc
>   refcount_t: underflow; use-after-free.
> 
>   (unwind_backtrace) from (show_stack+0x10/0x14)
>   (show_stack) from (dump_stack+0x6c/0x8c)
>   (dump_stack) from (__warn+0xdc/0x104)
>   (__warn) from (warn_slowpath_fmt+0x44/0x6c)
>   (warn_slowpath_fmt) from (refcount_sub_and_test_checked+0xa8/0xbc)
>   (refcount_sub_and_test_checked) from (kobject_put+0x24/0x208)
>   (kobject_put) from (of_changeset_destroy+0x2c/0xb4)
>   (of_changeset_destroy) from (free_overlay_changeset+0x1c/0x9c)
>   (free_overlay_changeset) from (of_overlay_remove+0x284/0x2cc)
>   (of_overlay_remove) from 
> (of_unittest_apply_revert_overlay_check.constprop.4+0xf8/0x1e8)
>   (of_unittest_apply_revert_overlay_check.constprop.4) from 
> (of_unittest_overlay+0x960/0xed8)
>   (of_unittest_overlay) from (of_unittest+0x1cc4/0x2138)
>   (of_unittest) from (do_one_initcall+0x4c/0x28c)
>   (do_one_initcall) from (kernel_init_freeable+0x29c/0x378)
>   (kernel_init_freeable) from (kernel_init+0x8/0x110)
>   (kernel_init) from (ret_from_fork+0x14/0x2c)
> 
> Signed-off-by: Frank Rowand 
> ---
>  drivers/of/overlay.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
> index 1176cb4b6e4e..32cfee68f2e3 100644
> --- a/drivers/of/overlay.c
> +++ b/drivers/of/overlay.c
> @@ -379,7 +379,9 @@ static int add_changeset_node(struct overlay_changeset 
> *ovcs,
>   if (ret)
>   return ret;
>  
> - return build_changeset_next_level(ovcs, tchild, node);
> + ret = build_changeset_next_level(ovcs, tchild, node);
> + of_node_put(tchild);
> + return ret;
>   }
>  
>   if (node->phandle && tchild->phandle)
> -- 
> Frank Rowand 
>

Re: [PATCH v4 01/18] of: overlay: add tests to validate kfrees from overlay removal

2018-10-18 Thread Rob Herring

On Mon, Oct 15, 2018 at 07:37:21PM -0700, frowand.l...@gmail.com wrote:
> From: Frank Rowand 
> 
> Add checks:
>   - attempted kfree due to refcount reaching zero before overlay
> is removed
>   - properties linked to an overlay node when the node is removed
>   - node refcount > one during node removal in a changeset destroy,
> if the node was created by the changeset
> 
> After applying this patch, several validation warnings will be
> reported from the devicetree unittest during boot due to
> pre-existing devicetree bugs. The warnings will be similar to:
> 
>   OF: ERROR: of_node_release() overlay node 
> /testcase-data/overlay-node/test-bus/test-unittest11/test-unittest111 
> contains unexpected properties
>   OF: ERROR: memory leak - destroy cset entry: attach overlay node 
> /testcase-data-2/substation@100/hvac-medium-2 expected refcount 1 instead of 
> 2.  of_node_get() / of_node_put() are unbalanced for this node.

These messages could be formatted more consistently. Put the path either 
at the beginning (after any prefix) or end. Beginning is more like a 
compiler error. End puts what the problem is before it's off the edge of 
the screen. 

> Signed-off-by: Frank Rowand 
> ---
> Changes since v3:
>   - Add expected value of refcount for destroy cset entry error.  Also
> explain the cause of the error.
> 
>  drivers/of/dynamic.c | 29 +
>  drivers/of/overlay.c |  1 +
>  include/linux/of.h   | 15 ++-
>  3 files changed, 40 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> index f4f8ed9b5454..24c97b7a050f 100644
> --- a/drivers/of/dynamic.c
> +++ b/drivers/of/dynamic.c
> @@ -330,6 +330,25 @@ void of_node_release(struct kobject *kobj)
>   if (!of_node_check_flag(node, OF_DYNAMIC))
>   return;
>  
> + if (of_node_check_flag(node, OF_OVERLAY)) {
> +
> + if (!of_node_check_flag(node, OF_OVERLAY_FREE_CSET)) {

I worry the flags are getting unwieldy.

> + /* premature refcount of zero, do not free memory */
> + pr_err("ERROR: memory leak %s() overlay node %pOF 
> before free overlay changeset\n",
> +__func__, node);
> + return;
> + }
> +
> + /*
> +  * If node->properties non-empty then properties were added
> +  * to this node either by different overlay that has not
> +  * yet been removed, or by a non-overlay mechanism.
> +  */
> + if (node->properties)
> + pr_err("ERROR: %s() overlay node %pOF contains 
> unexpected properties\n",
> +__func__, node);
> + }
> +
>   property_list_free(node->properties);
>   property_list_free(node->deadprops);
>  
> @@ -434,6 +453,16 @@ struct device_node *__of_node_dup(const struct 
> device_node *np,
>  
>  static void __of_changeset_entry_destroy(struct of_changeset_entry *ce)
>  {
> + if (ce->action == OF_RECONFIG_ATTACH_NODE &&
> + of_node_check_flag(ce->np, OF_OVERLAY)) {
> + if (kref_read(>np->kobj.kref) > 1) {
> + pr_err("ERROR: memory leak - destroy cset entry: attach 
> overlay node %pOF expected refcount 1 instead of %d.  of_node_get() / 
> of_node_put() are unbalanced for this node.\n",
> +ce->np, kref_read(>np->kobj.kref));
> + } else {
> + of_node_set_flag(ce->np, OF_OVERLAY_FREE_CSET);
> + }
> + }
> +
>   of_node_put(ce->np);
>   list_del(>node);
>   kfree(ce);
> diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
> index eda57ef12fd0..1176cb4b6e4e 100644
> --- a/drivers/of/overlay.c
> +++ b/drivers/of/overlay.c
> @@ -373,6 +373,7 @@ static int add_changeset_node(struct overlay_changeset 
> *ovcs,
>   return -ENOMEM;
>  
>   tchild->parent = target_node;
> + of_node_set_flag(tchild, OF_OVERLAY);
>  
>   ret = of_changeset_attach_node(>cset, tchild);
>   if (ret)
> diff --git a/include/linux/of.h b/include/linux/of.h
> index 4d25e4f952d9..aa1dafaec6ae 100644
> --- a/include/linux/of.h
> +++ b/include/linux/of.h
> @@ -138,11 +138,16 @@ static inline void of_node_put(struct device_node 
> *node) { }
>  extern struct device_node *of_stdout;
>  extern raw_spinlock_t devtree_lock;
>  
> -/* flag descriptions (need to be visible even when !CONFIG_OF) */
> -#define OF_DYNAMIC   1 /* node and properties were allocated via kmalloc */
> -#define OF_DETACHED  2 /* node has been detached from the device tree */
> -#define OF_POPULATED 3 /* device already created for the node */
> -#define OF_POPULATED_BUS 4 /* of_platform_populate recursed to children 
> of this node */
> +/*
> + * struct device_node flag descriptions
> + * (need to be visible even when !CONFIG_OF)
> + */
> +#define OF_DYNAMIC   1 /*

Re: [PATCH kernel 3/3] vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] [10de:1db1] subdriver

2018-10-18 Thread Alex Williamson

On Thu, 18 Oct 2018 11:31:33 +1100
Alexey Kardashevskiy  wrote:

> On 18/10/2018 08:52, Alex Williamson wrote:
> > On Wed, 17 Oct 2018 12:19:20 +1100
> > Alexey Kardashevskiy  wrote:
> >   
> >> On 17/10/2018 06:08, Alex Williamson wrote:  
> >>> On Mon, 15 Oct 2018 20:42:33 +1100
> >>> Alexey Kardashevskiy  wrote:
> >>> 
>  POWER9 Witherspoon machines come with 4 or 6 V100 GPUs which are not
>  pluggable PCIe devices but implement PCIe links for config space and 
>  MMIO.
>  In addition to that the GPUs are interconnected to each other and also
>  have direct links to the P9 CPU. The links are NVLink2 and provide direct
>  access to the system RAM for GPUs via NPU (an NVLink2 "proxy" on P9 
>  chip).
>  These systems also support ATS (address translation services) which is
>  a part of the NVLink2 prototol. Such GPUs also share on-board RAM
>  (16GB in tested config) to the system via the same NVLink2 so a CPU has
>  cache-coherent access to a GPU RAM.
> 
>  This exports GPU RAM to the userspace as a new PCI region. This
>  preregisters the new memory as device memory as it might be used for DMA.
>  This inserts pfns from the fault handler as the GPU memory is not onlined
>  until the NVIDIA driver is loaded and trained the links so doing this
>  earlier produces low level errors which we fence in the firmware so
>  it does not hurt the host system but still better to avoid.
> 
>  This exports ATSD (Address Translation Shootdown) register of NPU which
>  allows the guest to invalidate TLB. The register conviniently occupies
>  a single 64k page. Since NPU maps the GPU memory, it has a "tgt" property
>  (which is an abbreviated host system bus address). This exports the "tgt"
>  as a capability so the guest can program it into the GPU so the GPU can
>  know how to route DMA trafic.
> >>>
> >>> I'm not really following what "tgt" is and why it's needed.  Is the GPU
> >>> memory here different than the GPU RAM region above?  Why does the user
> >>> need the host system bus address of this "tgt" thing?  Are we not able
> >>> to relocate it in guest physical address space, does this shootdown
> >>> only work in the host physical address space and therefore we need this
> >>> offset?  Please explain, I'm confused.
> >>
> >>
> >> This "tgt" is made of:
> >> - "memory select" (bits 45, 46)
> >> - "group select" (bits 43, 44)
> >> - "chip select" (bit 42)
> >> - chip internal address (bits 0..41)
> >>
> >> These are internal to GPU and this is where GPU RAM is mapped into the
> >> GPU's real space, this fits 46 bits.
> >>
> >> On POWER9 CPU the bits are different and higher so the same memory is
> >> mapped higher on P9 CPU. Just because we can map it higher, I guess.
> >>
> >> So it is not exactly the address but this provides the exact physical
> >> location of the memory.
> >>
> >> We have a group of 3 interconnected GPUs, they got their own
> >> memory/group/chip numbers. The GPUs use ATS service to translate
> >> userspace to physical (host or guest) addresses. Now a GPU needs to know
> >> which specific link to use for a specific physical address, in other
> >> words what this physical address belongs to - a CPU or one of GPUs. This
> >> is when "tgt" is used by the GPU hardware.  
> > 
> > Clear as mud ;)   
> 
> /me is sad. I hope Piotr explained it better...

It's starting to be a bit more clear, and Piotr anticipated the
security questions I was mulling over.

> > So tgt, provided by the npu2 capability of the ATSD
> > region of the NPU tells the GPU (a completely separate device) how to
> > route it its own RAM via its NVLink interface?  How can one tgt
> > indicate the routing for multiple interfaces?  
> 
> This NVLink DMA is using direct host physical addresses (no IOMMU, no
> filtering) which come from ATS. So unless we tell the GPU its own
> address range on the host CPU, it will route trafic via CPU. And the
> driver can also discover the NVLink topology and tell each GPU physical
> addresses of peer GPUs.

I think this is a key point too, the tgt seems to only identify the
routing for the GPU local RAM, but the guest driver is able to
conglomerate this information so each GPU knows how to get directly to
the other GPUs.

> >> A GPU could run all the DMA trafic via the system bus indeed, just not
> >> as fast.
> >>
> >> I am also struggling here and adding an Nvidia person in cc: (I should
> >> have done that when I posted the patches, my bad) to correct when/if I
> >> am wrong.
> >>
> >>
> >>  
> >>>  
>  For ATS to work, the nest MMU (an NVIDIA block in a P9 CPU) needs to
>  know LPID (a logical partition ID or a KVM guest hardware ID in other
>  words) and PID (a memory context ID of an userspace process, not to be
>  confused with a linux pid). This assigns a GPU to LPID in the NPU and
>  this is why this adds a listener for KVM on an IOMMU group. A PID comes

[PATCH] powerpc/8xx: Add DT node for using the SEC engine of the MPC885

2018-10-18 Thread Christophe Leroy

The MPC885 has SEC engine version 1.2 with the following details:
- Number of Crypto channels: 1
- Exec Units: DEU, MDEU and AESU
- Available descriptors: 00010, 00100, 00110, 01000, 11000, 11010

It is also supposed to have descriptor 0, but it doesn't work
properly so we keep it out for the moment.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/boot/dts/mpc885ads.dts | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/boot/dts/mpc885ads.dts 
b/arch/powerpc/boot/dts/mpc885ads.dts
index 5b037f51741d..3aa300afbbca 100644
--- a/arch/powerpc/boot/dts/mpc885ads.dts
+++ b/arch/powerpc/boot/dts/mpc885ads.dts
@@ -72,7 +72,7 @@
#address-cells = <1>;
#size-cells = <1>;
device_type = "soc";
-   ranges = <0x0 0xff00 0x4000>;
+   ranges = <0x0 0xff00 0x28000>;
bus-frequency = <0>;
 
// Temporary -- will go away once kernel uses ranges for 
get_immrbase().
@@ -224,6 +224,17 @@
#size-cells = <0>;
};
};
+
+   crypto@2 {
+   compatible = "fsl,sec1.2", "fsl,sec1.0";
+   reg = <0x2 0x8000>;
+   interrupts = <1 1>;
+   interrupt-parent = <>;
+   fsl,num-channels = <1>;
+   fsl,channel-fifo-len = <24>;
+   fsl,exec-units-mask = <0x4c>;
+   fsl,descriptor-types-mask = <0x05000154>;
+   };
};
 
chosen {
-- 
2.13.3

[PATCH] powerpc/mm: Make pte_pgprot return all pte bits

2018-10-18 Thread Michael Ellerman

From: "Aneesh Kumar K.V" 

Other archs do the same and instead of adding required pte bits (which
got masked out) in __ioremap_at(), make sure we filter only pfn bits
out.

Fixes: 26973fa5ac0e ("powerpc/mm: use pte helpers in generic code")
Reviewed-by: Christophe Leroy 
Signed-off-by: Aneesh Kumar K.V 
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/book3s/32/pgtable.h   |  6 --
 arch/powerpc/include/asm/book3s/64/pgtable.h   |  8 
 arch/powerpc/include/asm/nohash/32/pte-40x.h   |  5 -
 arch/powerpc/include/asm/nohash/32/pte-44x.h   |  5 -
 arch/powerpc/include/asm/nohash/32/pte-8xx.h   |  5 -
 arch/powerpc/include/asm/nohash/32/pte-fsl-booke.h |  5 -
 arch/powerpc/include/asm/nohash/pgtable.h  |  1 -
 arch/powerpc/include/asm/nohash/pte-book3e.h   |  5 -
 arch/powerpc/include/asm/pgtable.h | 10 ++
 9 files changed, 10 insertions(+), 40 deletions(-)

Resend so patchwork picks it up.

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 0fbd4c642b51..e61dd3ae5bc0 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -48,11 +48,6 @@ static inline bool pte_user(pte_t pte)
 #define _PAGE_CHG_MASK (PTE_RPN_MASK | _PAGE_HASHPTE | _PAGE_DIRTY | \
 _PAGE_ACCESSED | _PAGE_SPECIAL)
 
-/* Mask of bits returned by pte_pgprot() */
-#define PAGE_PROT_BITS (_PAGE_GUARDED | _PAGE_COHERENT | _PAGE_NO_CACHE | \
-_PAGE_WRITETHRU | _PAGE_USER | _PAGE_ACCESSED | \
-_PAGE_RW | _PAGE_DIRTY)
-
 /*
  * We define 2 sets of base prot bits, one for basic pages (ie,
  * cacheable kernel and user pages) and one for non cacheable
@@ -396,7 +391,6 @@ static inline int pte_young(pte_t pte)  { 
return !!(pte_val(pte) & _PAGE_ACCESSE
 static inline int pte_special(pte_t pte)   { return !!(pte_val(pte) & 
_PAGE_SPECIAL); }
 static inline int pte_none(pte_t pte)  { return (pte_val(pte) & 
~_PTE_NONE_MASK) == 0; }
 static inline bool pte_exec(pte_t pte) { return true; }
-static inline pgprot_t pte_pgprot(pte_t pte)   { return __pgprot(pte_val(pte) 
& PAGE_PROT_BITS); }
 
 static inline int pte_present(pte_t pte)
 {
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 9db2b8eba61d..d33421648d39 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -128,13 +128,6 @@
 
 #define H_PTE_PKEY  (H_PTE_PKEY_BIT0 | H_PTE_PKEY_BIT1 | H_PTE_PKEY_BIT2 | \
 H_PTE_PKEY_BIT3 | H_PTE_PKEY_BIT4)
-/*
- * Mask of bits returned by pte_pgprot()
- */
-#define PAGE_PROT_BITS  (_PAGE_SAO | _PAGE_NON_IDEMPOTENT | _PAGE_TOLERANT | \
-H_PAGE_4K_PFN | _PAGE_PRIVILEGED | _PAGE_ACCESSED | \
-_PAGE_READ | _PAGE_WRITE |  _PAGE_DIRTY | _PAGE_EXEC | 
\
-_PAGE_SOFT_DIRTY | H_PTE_PKEY)
 /*
  * We define 2 sets of base prot bits, one for basic pages (ie,
  * cacheable kernel and user pages) and one for non cacheable
@@ -496,7 +489,6 @@ static inline bool pte_exec(pte_t pte)
return !!(pte_raw(pte) & cpu_to_be64(_PAGE_EXEC));
 }
 
-static inline pgprot_t pte_pgprot(pte_t pte)   { return __pgprot(pte_val(pte) 
& PAGE_PROT_BITS); }
 
 #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
 static inline bool pte_soft_dirty(pte_t pte)
diff --git a/arch/powerpc/include/asm/nohash/32/pte-40x.h 
b/arch/powerpc/include/asm/nohash/32/pte-40x.h
index 7a8b3c94592f..661f4599f2fc 100644
--- a/arch/powerpc/include/asm/nohash/32/pte-40x.h
+++ b/arch/powerpc/include/asm/nohash/32/pte-40x.h
@@ -73,11 +73,6 @@
 /* Until my rework is finished, 40x still needs atomic PTE updates */
 #define PTE_ATOMIC_UPDATES 1
 
-/* Mask of bits returned by pte_pgprot() */
-#define PAGE_PROT_BITS (_PAGE_GUARDED | _PAGE_NO_CACHE | \
-_PAGE_WRITETHRU | _PAGE_USER | _PAGE_ACCESSED | \
-_PAGE_RW | _PAGE_HWWRITE | _PAGE_DIRTY | _PAGE_EXEC)
-
 #define _PAGE_BASE_NC  (_PAGE_PRESENT | _PAGE_ACCESSED)
 #define _PAGE_BASE (_PAGE_BASE_NC)
 
diff --git a/arch/powerpc/include/asm/nohash/32/pte-44x.h 
b/arch/powerpc/include/asm/nohash/32/pte-44x.h
index 8d6b268a986f..78bc304f750e 100644
--- a/arch/powerpc/include/asm/nohash/32/pte-44x.h
+++ b/arch/powerpc/include/asm/nohash/32/pte-44x.h
@@ -93,11 +93,6 @@
 #define _PAGE_KERNEL_RW(_PAGE_DIRTY | _PAGE_RW)
 #define _PAGE_KERNEL_RWX   (_PAGE_DIRTY | _PAGE_RW | _PAGE_EXEC)
 
-/* Mask of bits returned by pte_pgprot() */
-#define PAGE_PROT_BITS (_PAGE_GUARDED | _PAGE_COHERENT | _PAGE_NO_CACHE | \
-_PAGE_WRITETHRU | _PAGE_USER | _PAGE_ACCESSED | \
-_PAGE_RW | _PAGE_DIRTY | _PAGE_EXEC)
-
 /* TODO: Add large page lowmem mapping support */
 #define

Re: [PATCH] powerpc: Don't print kernel instructions in show_user_instructions()

2018-10-18 Thread Michael Ellerman

Christophe LEROY  writes:
> Le 05/10/2018 à 15:21, Michael Ellerman a écrit :
>> Recently we implemented show_user_instructions() which dumps the code
>> around the NIP when a user space process dies with an unhandled
>> signal. This was modelled on the x86 code, and we even went so far as
>> to implement the exact same bug, namely that if the user process
>> crashed with its NIP pointing into the kernel we will dump kernel text
>> to dmesg. eg:
>> 
>>bad-bctr[2996]: segfault (11) at c001 nip c001 lr 
>> 12d0b0894 code 1
>>bad-bctr[2996]: code: fbe10068 7cbe2b78 7c7f1b78 fb610048 38a10028 
>> 38810020 fb810050 7f8802a6
>>bad-bctr[2996]: code: 3860001c f8010080 48242371 6000 <7c7b1b79> 
>> 4082002c e8010080 eb610048
>> 
>> This was discovered on x86 by Jann Horn and fixed in commit
>> 342db04ae712 ("x86/dumpstack: Don't dump kernel memory based on usermode 
>> RIP").
>> 
>> Fix it by checking the adjusted NIP value (pc) and number of
>> instructions against USER_DS, and bail if we fail the check, eg:
>> 
>>bad-bctr[2969]: segfault (11) at c001 nip c001 lr 
>> 107930894 code 1
>>bad-bctr[2969]: Bad NIP, not dumping instructions.
>> 
>> Fixes: 88b0fe175735 ("powerpc: Add show_user_instructions()")
>> Signed-off-by: Michael Ellerman 
>> ---
>>   arch/powerpc/kernel/process.c | 10 ++
>>   1 file changed, 10 insertions(+)
>> 
>> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
>> index 913c5725cdb2..bb6ac471a784 100644
>> --- a/arch/powerpc/kernel/process.c
>> +++ b/arch/powerpc/kernel/process.c
>> @@ -1306,6 +1306,16 @@ void show_user_instructions(struct pt_regs *regs)
>>   
>>  pc = regs->nip - (instructions_to_print * 3 / 4 * sizeof(int));
>>   
>> +/*
>> + * Make sure the NIP points at userspace, not kernel text/data or
>> + * elsewhere.
>> + */
>> +if (!__access_ok(pc, instructions_to_print * sizeof(int), USER_DS)) {
>> +pr_info("%s[%d]: Bad NIP, not dumping instructions.\n",
>> +current->comm, current->pid);
>> +return;
>> +}
>> +
>
> Is there any reason for not using access_ok() here ?

I wanted to check against USER_DS explicitly. But maybe that was
over-paranoid of me.

cheers

Re: [PATCH 05/36] dt-bindings: arm: renesas: Move 'renesas,prr' binding to its own doc

2018-10-18 Thread Simon Horman

On Mon, Oct 08, 2018 at 09:59:19AM -0500, Rob Herring wrote:
> On Mon, Oct 8, 2018 at 2:05 AM Geert Uytterhoeven  
> wrote:
> >
> > Hi Rob,
> >
> > On Fri, Oct 5, 2018 at 6:58 PM Rob Herring  wrote:
> > > In preparation to convert board-level bindings to json-schema, move
> > > various misc SoC bindings out to their own file.
> > >
> > > Cc: Mark Rutland 
> > > Cc: Simon Horman 
> > > Cc: Magnus Damm 
> > > Cc: devicet...@vger.kernel.org
> > > Cc: linux-renesas-...@vger.kernel.org
> > > Signed-off-by: Rob Herring 
> >
> > Looks good to me, but needs a rebase, as the PRR section has been extended
> > in -next.
> 
> Is this something you all can apply still for 4.20?

Sorry, missing this until now. It was too late.

Shall I take it for v4.21?

Re: [PATCH 29/36] dt-bindings: arm: Convert Renesas board/soc bindings to json-schema

2018-10-18 Thread Simon Horman

On Mon, Oct 08, 2018 at 09:05:58AM -0500, Rob Herring wrote:
> On Mon, Oct 8, 2018 at 3:02 AM Simon Horman  wrote:
> >
> > On Fri, Oct 05, 2018 at 11:58:41AM -0500, Rob Herring wrote:
> > > Convert Renesas SoC bindings to DT schema format using json-schema.
> > >
> > > Cc: Simon Horman 
> > > Cc: Magnus Damm 
> > > Cc: Mark Rutland 
> > > Cc: linux-renesas-...@vger.kernel.org
> > > Cc: devicet...@vger.kernel.org
> > > Signed-off-by: Rob Herring 
> >
> > This seems fine to me other than that it does not seem
> > to apply cleanly to next.
> >
> > shmobile.txt sees a couple of updates per release cycle so from my point of
> > view it would ideal if this change could hit -rc1 to allow patches for
> > v4.21 to be accepted smoothly (already one from Sergei will need rebasing).
> 
> When we get to the point of merging (which isn't going to be 4.20),
> you and other maintainers can probably take all these patches. Other
> than the few restructuring patches, the only dependency is the build
> support which isn't a dependency to apply it, but build it. I plan to
> build any patches as part of reviewing at least early on. OTOH, the
> build support is small enough and self contained that maybe it can
> just be applied for 4.20.

Thanks, understood.

My preference would be to, as you suggest, take changes like
this through the renesas tree.

Re: [PATCH] powerpc: Don't print kernel instructions in show_user_instructions()

2018-10-18 Thread Christophe LEROY





Le 18/10/2018 à 13:12, Jann Horn a écrit :

On Thu, Oct 18, 2018 at 11:28 AM Christophe LEROY
 wrote:

Le 05/10/2018 à 15:21, Michael Ellerman a écrit :

Recently we implemented show_user_instructions() which dumps the code
around the NIP when a user space process dies with an unhandled
signal. This was modelled on the x86 code, and we even went so far as
to implement the exact same bug, namely that if the user process
crashed with its NIP pointing into the kernel we will dump kernel text
to dmesg. eg:

bad-bctr[2996]: segfault (11) at c001 nip c001 lr 
12d0b0894 code 1
bad-bctr[2996]: code: fbe10068 7cbe2b78 7c7f1b78 fb610048 38a10028 38810020 
fb810050 7f8802a6
bad-bctr[2996]: code: 3860001c f8010080 48242371 6000 <7c7b1b79> 
4082002c e8010080 eb610048

This was discovered on x86 by Jann Horn and fixed in commit
342db04ae712 ("x86/dumpstack: Don't dump kernel memory based on usermode RIP").

Fix it by checking the adjusted NIP value (pc) and number of
instructions against USER_DS, and bail if we fail the check, eg:

bad-bctr[2969]: segfault (11) at c001 nip c001 lr 
107930894 code 1
bad-bctr[2969]: Bad NIP, not dumping instructions.

Fixes: 88b0fe175735 ("powerpc: Add show_user_instructions()")
Signed-off-by: Michael Ellerman 
---
   arch/powerpc/kernel/process.c | 10 ++
   1 file changed, 10 insertions(+)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 913c5725cdb2..bb6ac471a784 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1306,6 +1306,16 @@ void show_user_instructions(struct pt_regs *regs)

   pc = regs->nip - (instructions_to_print * 3 / 4 * sizeof(int));

+ /*
+  * Make sure the NIP points at userspace, not kernel text/data or
+  * elsewhere.
+  */
+ if (!__access_ok(pc, instructions_to_print * sizeof(int), USER_DS)) {
+ pr_info("%s[%d]: Bad NIP, not dumping instructions.\n",
+ current->comm, current->pid);
+ return;
+ }
+


Is there any reason for not using access_ok() here ?


It's probably more robust this way, in case someone decides to call
into this from kernel exception context at some point, or something
like that?



But access_ok() uses current->thread.addr_limit, while USER_DS may 
provide a larger segment:


#ifdef __powerpc64__
/* We use TASK_SIZE_USER64 as TASK_SIZE is not constant */
#define USER_DS MAKE_MM_SEG(TASK_SIZE_USER64 - 1)
#else
#define USER_DS MAKE_MM_SEG(TASK_SIZE - 1)
#endif

Christophe

[PATCH v2] powerpc/uaccess: fix warning/error with access_ok()

2018-10-18 Thread Christophe Leroy

With the following peace of code, the following compilation warning
is encountered:

if (_IOC_DIR(ioc) != _IOC_NONE) {
int verify = _IOC_DIR(ioc) & _IOC_READ ? VERIFY_WRITE : 
VERIFY_READ;

if (!access_ok(verify, ioarg, _IOC_SIZE(ioc))) {

drivers/platform/test/dev.c: In function ‘my_ioctl’:
drivers/platform/test/dev.c:219:7: warning: unused variable ‘verify’ 
[-Wunused-variable]
   int verify = _IOC_DIR(ioc) & _IOC_READ ? VERIFY_WRITE : VERIFY_READ;

This patch fixes it by handing the type to __access_ok(), changing it
to an inline function for PPC64 as already done for PPC32

Signed-off-by: Christophe Leroy 
---
 v2: fixed the three direct users of __access_ok()

 arch/powerpc/include/asm/uaccess.h | 13 -
 arch/powerpc/kernel/process.c  |  2 +-
 arch/powerpc/lib/sstep.c   |  4 ++--
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/uaccess.h 
b/arch/powerpc/include/asm/uaccess.h
index 15bea9a0f260..97faf0353919 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -47,13 +47,16 @@ static inline void set_fs(mm_segment_t fs)
  * This check is sufficient because there is a large enough
  * gap between user addresses and the kernel addresses
  */
-#define __access_ok(addr, size, segment)   \
-   (((addr) <= (segment).seg) && ((size) <= (segment).seg))
+static inline int __access_ok(int type, unsigned long addr, unsigned long size,
+ mm_segment_t seg)
+{
+   return addr <= seg.seg && size <= seg.seg;
+}
 
 #else
 
-static inline int __access_ok(unsigned long addr, unsigned long size,
-   mm_segment_t seg)
+static inline int __access_ok(int type, unsigned long addr, unsigned long size,
+ mm_segment_t seg)
 {
if (addr > seg.seg)
return 0;
@@ -64,7 +67,7 @@ static inline int __access_ok(unsigned long addr, unsigned 
long size,
 
 #define access_ok(type, addr, size)\
(__chk_user_ptr(addr),  \
-__access_ok((__force unsigned long)(addr), (size), get_fs()))
+__access_ok((type), (__force unsigned long)(addr), (size), get_fs()))
 
 /*
  * These are the main single-value transfer routines.  They automatically
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 7ad304a3cc7d..4cc84fe13f9c 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1310,7 +1310,7 @@ void show_user_instructions(struct pt_regs *regs)
 * Make sure the NIP points at userspace, not kernel text/data or
 * elsewhere.
 */
-   if (!__access_ok(pc, NR_INSN_TO_PRINT * sizeof(int), USER_DS)) {
+   if (!__access_ok(VERIFY_READ, pc, NR_INSN_TO_PRINT * sizeof(int), 
USER_DS)) {
pr_info("%s[%d]: Bad NIP, not dumping instructions.\n",
current->comm, current->pid);
return;
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index d81568f783e5..ff117418257c 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -110,9 +110,9 @@ static nokprobe_inline long address_ok(struct pt_regs *regs,
 {
if (!user_mode(regs))
return 1;
-   if (__access_ok(ea, nb, USER_DS))
+   if (__access_ok(VERIFY_WRITE, ea, nb, USER_DS))
return 1;
-   if (__access_ok(ea, 1, USER_DS))
+   if (__access_ok(VERIFY_WRITE, ea, 1, USER_DS))
/* Access overlaps the end of the user region */
regs->dar = USER_DS.seg;
else
-- 
2.13.3

Re: SV: MPC8321 boot failure

2018-10-18 Thread gre...@linuxfoundation.org

On Thu, Oct 18, 2018 at 08:51:46AM +, David Gounaris wrote:
> Hi, I can also confirm that it works after cherry-picking the proposed commit.
> 
> Reported-and-tested-by: David Gounaris 
> mailto:david.gouna...@infinera.com>>
> 

Now queued up, thanks.

greg k-h

[bug report] powerpc/perf: Add nest IMC PMU support

2018-10-18 Thread Dan Carpenter

Hello Anju T Sudhakar,

The patch 885dcd709ba9: "powerpc/perf: Add nest IMC PMU support" from
Jul 19, 2017, leads to the following static checker warning:

arch/powerpc/perf/imc-pmu.c:506 nest_imc_event_init()
warn: 'pcni' can't be NULL.

arch/powerpc/perf/imc-pmu.c
   485  if (event->cpu < 0)
   486  return -EINVAL;
   487  
   488  pmu = imc_event_to_pmu(event);
   489  
   490  /* Sanity check for config (event offset) */
   491  if ((config & IMC_EVENT_OFFSET_MASK) > pmu->counter_mem_size)
   492  return -EINVAL;
   493  
   494  /*
   495   * Nest HW counter memory resides in a per-chip reserve-memory 
(HOMER).
   496   * Get the base memory addresss for this cpu.
   497   */
   498  chip_id = cpu_to_chip_id(event->cpu);
   499  pcni = pmu->mem_info;

   500  do {
   501  if (pcni->id == chip_id) {
   502  flag = true;
   503  break;
   504  }
   505  pcni++;
^^
   506  } while (pcni);
 
This will loop until we crash.  I'm not sure what was intended.

   507  
   508  if (!flag)
   509  return -ENODEV;
   510  

regards,
dan carpenter

Re: [PATCH] powerpc: Don't print kernel instructions in show_user_instructions()

2018-10-18 Thread Christophe LEROY





Le 05/10/2018 à 15:21, Michael Ellerman a écrit :

Recently we implemented show_user_instructions() which dumps the code
around the NIP when a user space process dies with an unhandled
signal. This was modelled on the x86 code, and we even went so far as
to implement the exact same bug, namely that if the user process
crashed with its NIP pointing into the kernel we will dump kernel text
to dmesg. eg:

   bad-bctr[2996]: segfault (11) at c001 nip c001 lr 
12d0b0894 code 1
   bad-bctr[2996]: code: fbe10068 7cbe2b78 7c7f1b78 fb610048 38a10028 38810020 
fb810050 7f8802a6
   bad-bctr[2996]: code: 3860001c f8010080 48242371 6000 <7c7b1b79> 
4082002c e8010080 eb610048

This was discovered on x86 by Jann Horn and fixed in commit
342db04ae712 ("x86/dumpstack: Don't dump kernel memory based on usermode RIP").

Fix it by checking the adjusted NIP value (pc) and number of
instructions against USER_DS, and bail if we fail the check, eg:

   bad-bctr[2969]: segfault (11) at c001 nip c001 lr 
107930894 code 1
   bad-bctr[2969]: Bad NIP, not dumping instructions.

Fixes: 88b0fe175735 ("powerpc: Add show_user_instructions()")
Signed-off-by: Michael Ellerman 
---
  arch/powerpc/kernel/process.c | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 913c5725cdb2..bb6ac471a784 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1306,6 +1306,16 @@ void show_user_instructions(struct pt_regs *regs)
  
  	pc = regs->nip - (instructions_to_print * 3 / 4 * sizeof(int));
  
+	/*

+* Make sure the NIP points at userspace, not kernel text/data or
+* elsewhere.
+*/
+   if (!__access_ok(pc, instructions_to_print * sizeof(int), USER_DS)) {
+   pr_info("%s[%d]: Bad NIP, not dumping instructions.\n",
+   current->comm, current->pid);
+   return;
+   }
+


Is there any reason for not using access_ok() here ?

Christophe


pr_info("%s[%d]: code: ", current->comm, current->pid);
  
  	for (i = 0; i < instructions_to_print; i++) {

Re: [PATCH] powerpc/uaccess: fix warning/error with access_ok()

2018-10-18 Thread Christophe LEROY





Le 18/10/2018 à 10:48, Christophe Leroy a écrit :

With the following peace of code, the following compilation warning
is encountered:

if (_IOC_DIR(ioc) != _IOC_NONE) {
int verify = _IOC_DIR(ioc) & _IOC_READ ? VERIFY_WRITE : 
VERIFY_READ;

if (!access_ok(verify, ioarg, _IOC_SIZE(ioc))) {

drivers/platform/test/dev.c: In function ‘my_ioctl’:
drivers/platform/test/dev.c:219:7: warning: unused variable ‘verify’ 
[-Wunused-variable]
int verify = _IOC_DIR(ioc) & _IOC_READ ? VERIFY_WRITE : VERIFY_READ;

This patch fixes it by handing the type to __access_ok(), changing it
to an inline function for PPC64 as already done for PPC32


Oops, not that easy, there are places using __access_ok() directly, 
those need to be modified as well.


Christophe



Signed-off-by: Christophe Leroy 
---
  arch/powerpc/include/asm/uaccess.h | 13 -
  1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/uaccess.h 
b/arch/powerpc/include/asm/uaccess.h
index 15bea9a0f260..97faf0353919 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -47,13 +47,16 @@ static inline void set_fs(mm_segment_t fs)
   * This check is sufficient because there is a large enough
   * gap between user addresses and the kernel addresses
   */
-#define __access_ok(addr, size, segment)   \
-   (((addr) <= (segment).seg) && ((size) <= (segment).seg))
+static inline int __access_ok(int type, unsigned long addr, unsigned long size,
+ mm_segment_t seg)
+{
+   return addr <= seg.seg && size <= seg.seg;
+}
  
  #else
  
-static inline int __access_ok(unsigned long addr, unsigned long size,

-   mm_segment_t seg)
+static inline int __access_ok(int type, unsigned long addr, unsigned long size,
+ mm_segment_t seg)
  {
if (addr > seg.seg)
return 0;
@@ -64,7 +67,7 @@ static inline int __access_ok(unsigned long addr, unsigned 
long size,
  
  #define access_ok(type, addr, size)		\

(__chk_user_ptr(addr),  \
-__access_ok((__force unsigned long)(addr), (size), get_fs()))
+__access_ok((type), (__force unsigned long)(addr), (size), get_fs()))
  
  /*

   * These are the main single-value transfer routines.  They automatically

[PATCH] powerpc/uaccess: fix warning/error with access_ok()

2018-10-18 Thread Christophe Leroy

With the following peace of code, the following compilation warning
is encountered:

if (_IOC_DIR(ioc) != _IOC_NONE) {
int verify = _IOC_DIR(ioc) & _IOC_READ ? VERIFY_WRITE : 
VERIFY_READ;

if (!access_ok(verify, ioarg, _IOC_SIZE(ioc))) {

drivers/platform/test/dev.c: In function ‘my_ioctl’:
drivers/platform/test/dev.c:219:7: warning: unused variable ‘verify’ 
[-Wunused-variable]
   int verify = _IOC_DIR(ioc) & _IOC_READ ? VERIFY_WRITE : VERIFY_READ;

This patch fixes it by handing the type to __access_ok(), changing it
to an inline function for PPC64 as already done for PPC32

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/uaccess.h | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/uaccess.h 
b/arch/powerpc/include/asm/uaccess.h
index 15bea9a0f260..97faf0353919 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -47,13 +47,16 @@ static inline void set_fs(mm_segment_t fs)
  * This check is sufficient because there is a large enough
  * gap between user addresses and the kernel addresses
  */
-#define __access_ok(addr, size, segment)   \
-   (((addr) <= (segment).seg) && ((size) <= (segment).seg))
+static inline int __access_ok(int type, unsigned long addr, unsigned long size,
+ mm_segment_t seg)
+{
+   return addr <= seg.seg && size <= seg.seg;
+}
 
 #else
 
-static inline int __access_ok(unsigned long addr, unsigned long size,
-   mm_segment_t seg)
+static inline int __access_ok(int type, unsigned long addr, unsigned long size,
+ mm_segment_t seg)
 {
if (addr > seg.seg)
return 0;
@@ -64,7 +67,7 @@ static inline int __access_ok(unsigned long addr, unsigned 
long size,
 
 #define access_ok(type, addr, size)\
(__chk_user_ptr(addr),  \
-__access_ok((__force unsigned long)(addr), (size), get_fs()))
+__access_ok((type), (__force unsigned long)(addr), (size), get_fs()))
 
 /*
  * These are the main single-value transfer routines.  They automatically
-- 
2.13.3

[PATCH kernel 2/2] powerpc/powernv/pseries: Rework device adding to IOMMU groups

2018-10-18 Thread Alexey Kardashevskiy

The powernv platform registers IOMMU groups and adds devices to them
from the pci_controller_ops::setup_bridge() hook except one case when
virtual functions (SRIOV VFs) are added from a bus notifier.

The pseries platform registers IOMMU groups from
the pci_controller_ops::dma_bus_setup() hook and adds devices from
the pci_controller_ops::dma_dev_setup() hook. The very same bus notifier
used for powernv does not add devices for pseries though as
__of_scan_bus() adds devices first, then it does the bus/dev DMA setup.

Both platforms use iommu_add_device() which takes a device and expects
it to have a valid IOMMU table struct with an iommu_table_group pointer
which in turn points the iommu_group struct (which represents
an IOMMU group). Although the helper seems easy to use, it relies on
some pre-existing device configuration and associated data structures
which it does not really need.

This simplifies iommu_add_device() to take the table_group pointer
directly. Pseries already has a table_group pointer handy and the bus
notified is not used anyway. For powernv, this copies the existing bus
notifier, makes it work for powernv only which means an easy way of
getting to the table_group pointer. This was tested on VFs but should
also support physical PCI hotplug.

Since iommu_add_device() receives the table_group pointer directly,
pseries does not do TCE cache invalidation (the hypervisor does) nor
allow multiple groups per a VFIO container (in other words sharing
an IOMMU table between partitionable endpoints), this removes
iommu_table_group_link from pseries.

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/include/asm/iommu.h  | 12 +++
 arch/powerpc/kernel/iommu.c   | 58 ++-
 arch/powerpc/platforms/powernv/pci-ioda.c | 10 +-
 arch/powerpc/platforms/powernv/pci.c  | 43 ++-
 arch/powerpc/platforms/pseries/iommu.c| 46 
 5 files changed, 74 insertions(+), 95 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 726f07b..39bee10 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -220,9 +220,9 @@ struct iommu_table_group {
 
 extern void iommu_register_group(struct iommu_table_group *table_group,
 int pci_domain_number, unsigned long pe_num);
-extern int iommu_add_device(struct device *dev);
+extern int iommu_add_device(struct iommu_table_group *table_group,
+   struct device *dev);
 extern void iommu_del_device(struct device *dev);
-extern int __init tce_iommu_bus_notifier_init(void);
 extern long iommu_tce_xchg(struct mm_struct *mm, struct iommu_table *tbl,
unsigned long entry, unsigned long *hpa,
enum dma_data_direction *direction);
@@ -233,7 +233,8 @@ static inline void iommu_register_group(struct 
iommu_table_group *table_group,
 {
 }
 
-static inline int iommu_add_device(struct device *dev)
+static inline int iommu_add_device(struct iommu_table_group *table_group,
+   struct device *dev)
 {
return 0;
 }
@@ -241,11 +242,6 @@ static inline int iommu_add_device(struct device *dev)
 static inline void iommu_del_device(struct device *dev)
 {
 }
-
-static inline int __init tce_iommu_bus_notifier_init(void)
-{
-return 0;
-}
 #endif /* !CONFIG_IOMMU_API */
 
 int dma_iommu_mapping_error(struct device *dev, dma_addr_t dma_addr);
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 47d75c5..fb14fbf 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1094,11 +1094,8 @@ void iommu_release_ownership(struct iommu_table *tbl)
 }
 EXPORT_SYMBOL_GPL(iommu_release_ownership);
 
-int iommu_add_device(struct device *dev)
+int iommu_add_device(struct iommu_table_group *table_group, struct device *dev)
 {
-   struct iommu_table *tbl;
-   struct iommu_table_group_link *tgl;
-
/*
 * The sysfs entries should be populated before
 * binding IOMMU group. If sysfs entries isn't
@@ -1114,32 +,10 @@ int iommu_add_device(struct device *dev)
return -EBUSY;
}
 
-   tbl = get_iommu_table_base(dev);
-   if (!tbl) {
-   pr_debug("%s: Skipping device %s with no tbl\n",
-__func__, dev_name(dev));
-   return 0;
-   }
-
-   tgl = list_first_entry_or_null(>it_group_list,
-   struct iommu_table_group_link, next);
-   if (!tgl) {
-   pr_debug("%s: Skipping device %s with no group\n",
-__func__, dev_name(dev));
-   return 0;
-   }
pr_debug("%s: Adding %s to iommu group %d\n",
-__func__, dev_name(dev),
-iommu_group_id(tgl->table_group->group));
+__func__, dev_name(dev),  iommu_group_id(table_group->group));
 
-   if (PAGE_SIZE <

[PATCH kernel 1/2] powerpc/pseries: Remove IOMMU API support for non-LPAR systems

2018-10-18 Thread Alexey Kardashevskiy

The pci_dma_bus_setup_pSeries and pci_dma_dev_setup_pSeries hooks are
registered for the pseries platform which does not have FW_FEATURE_LPAR;
these would be pre-powernv platforms which we never supported PCI pass
through for anyway so remove it.

Signed-off-by: Alexey Kardashevskiy 
---

Propably should remove all pseries-but-not-lpar code.
---
 arch/powerpc/platforms/pseries/iommu.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c 
b/arch/powerpc/platforms/pseries/iommu.c
index cf90582..eae2578 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -648,7 +648,6 @@ static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
iommu_table_setparms(pci->phb, dn, tbl);
tbl->it_ops = _table_pseries_ops;
iommu_init_table(tbl, pci->phb->node);
-   iommu_register_group(pci->table_group, pci_domain_nr(bus), 0);
 
/* Divide the rest (1.75GB) among the children */
pci->phb->dma_window_size = 0x8000ul;
@@ -759,10 +758,7 @@ static void pci_dma_dev_setup_pSeries(struct pci_dev *dev)
iommu_table_setparms(phb, dn, tbl);
tbl->it_ops = _table_pseries_ops;
iommu_init_table(tbl, phb->node);
-   iommu_register_group(PCI_DN(dn)->table_group,
-   pci_domain_nr(phb->bus), 0);
set_iommu_table_base(>dev, tbl);
-   iommu_add_device(>dev);
return;
}
 
@@ -773,11 +769,10 @@ static void pci_dma_dev_setup_pSeries(struct pci_dev *dev)
while (dn && PCI_DN(dn) && PCI_DN(dn)->table_group == NULL)
dn = dn->parent;
 
-   if (dn && PCI_DN(dn)) {
+   if (dn && PCI_DN(dn))
set_iommu_table_base(>dev,
PCI_DN(dn)->table_group->tables[0]);
-   iommu_add_device(>dev);
-   } else
+   else
printk(KERN_WARNING "iommu: Device %s has no iommu table\n",
   pci_name(dev));
 }
-- 
2.11.0

[PATCH kernel 0/2] powerpc/iommu: Redo iommu groups

2018-10-18 Thread Alexey Kardashevskiy

The aim is to:

1. simplify the code

2. get rid of iommu_table_group_link.
Now: iommu_table points to a list of iommu_table_group_link which
point to iommu_table_group (powerpc specific part of an IOMMU group);
one iommu_table_group_link per iommu_table_group per IODA PE.

Plan: add support of multiple IODA PEs per one generic IOMMU group,
I am just not sure how yet. We already have a case of GPU+NPU groups,
and a worse one is coming, with 3xGPU + 6xNVLink2.


Please comment. Thanks.



Alexey Kardashevskiy (2):
  powerpc/pseries: Remove IOMMU API support for non-LPAR systems
  powerpc/powernv/pseries: Rework device adding to IOMMU groups

 arch/powerpc/include/asm/iommu.h  | 12 +++
 arch/powerpc/kernel/iommu.c   | 58 ++-
 arch/powerpc/platforms/powernv/pci-ioda.c | 10 +-
 arch/powerpc/platforms/powernv/pci.c  | 43 ++-
 arch/powerpc/platforms/pseries/iommu.c| 55 ++---
 5 files changed, 76 insertions(+), 102 deletions(-)

-- 
2.11.0

Re: [PATCH] powerpc: Add missing include

2018-10-18 Thread Mathieu Malaterre

On Thu, Oct 18, 2018 at 6:37 AM Christophe LEROY
 wrote:
>
>
>
> Le 17/10/2018 à 21:25, Mathieu Malaterre a écrit :
> > In commit 88b0fe175735 ("powerpc: Add show_user_instructions()") the
> > function show_user_instructions was added.
> >
> > This commit adds an include of header file  to provide
> > the missing function prototype. Silence the following gcc warning
> > (treated as error with W=1):
> >
> >arch/powerpc/kernel/process.c:1302:6: error: no previous prototype for 
> > ‘show_user_instructions’ [-Werror=missing-prototypes]
> >
> > Signed-off-by: Mathieu Malaterre 
>
> This is already fixed, see
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=c9386bfd37d37f29588de9ea9add455510049c33

Excellent !

Sorry for the noise :(

> Christophe
>
> > ---
> >   arch/powerpc/kernel/process.c | 1 +
> >   1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> > index bb6ac471a784..1c64491e9702 100644
> > --- a/arch/powerpc/kernel/process.c
> > +++ b/arch/powerpc/kernel/process.c
> > @@ -65,6 +65,7 @@
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> >
> >   #include 
> >   #include 
> >

86 matches

Mail list logo