Re: [PATCHv2 1/3] mm/numa: change the topo of build_zonelist_xx()

2018-12-20 Thread kbuild test robot
Hi Pingfan, Thank you for the patch! Yet something to improve: [auto build test ERROR on linus/master] [also build test ERROR on v4.20-rc7 next-20181220] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux

Re: [RFC PATCH v2 04/11] powerpc/mm: Add a framework for Kernel Userspace Access Protection

2018-12-20 Thread Christophe Leroy
On 21/12/2018 at 06:07, Michael Ellerman wrote: Christophe Leroy writes: This patch implements a framework for Kernel Userspace Access Protection. Then subarches will have the possibility to provide their own implementation by providing setup_kuap() and lock/unlock_user_access() Some
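The split described here is a generic framework, with each subarch providing setup_kuap() and lock/unlock_user_access(). It can be pictured with the minimal userspace sketch below. The sketch makes several assumptions, all of them hypothetical illustrations rather than the patch's code:

- a compile-time switch named `SUBARCH_HAS_KUAP`;
- a `user_access_open` flag standing in for hardware state;
- `(to, from, size)` parameters on the lock/unlock hooks;
- a `copy_to_user_sketch()` caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical switch: a subarch that implements KUAP defines this. */
#define SUBARCH_HAS_KUAP 1

#ifdef SUBARCH_HAS_KUAP
/* Stand-in for hardware state: true while user memory is accessible. */
static bool user_access_open;

static void setup_kuap(bool disabled)
{
	/* A real subarch would program its protection registers here. */
	user_access_open = disabled;
}

/* The area is passed in because some platforms need to know it. */
static void unlock_user_access(void *to, const void *from, size_t size)
{
	(void)to; (void)from; (void)size;
	user_access_open = true;
}

static void lock_user_access(void *to, const void *from, size_t size)
{
	(void)to; (void)from; (void)size;
	user_access_open = false;
}
#else
/* Generic fallback: subarches without KUAP get no-op stubs. */
static inline void setup_kuap(bool disabled) { (void)disabled; }
static inline void unlock_user_access(void *to, const void *from, size_t size)
{ (void)to; (void)from; (void)size; }
static inline void lock_user_access(void *to, const void *from, size_t size)
{ (void)to; (void)from; (void)size; }
#endif

/* Hypothetical accessor: open the window only around the copy itself.
 * Returns the number of bytes NOT copied, as copy_to_user() does. */
static size_t copy_to_user_sketch(void *to, const void *from, size_t n)
{
	unlock_user_access(to, from, n);
	memcpy(to, from, n);
	lock_user_access(to, from, n);
	return 0;
}
```

Callers never touch the protection state directly. A subarch without KUAP support compiles the same callers against the empty stubs.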

Re: [PATCHv2 2/3] mm/numa: build zonelist when alloc for device on offline node

2018-12-20 Thread Pingfan Liu
On Thu, Dec 20, 2018 at 8:44 PM Michal Hocko wrote: > > On Thu 20-12-18 20:26:28, Pingfan Liu wrote: > > On Thu, Dec 20, 2018 at 7:35 PM Michal Hocko wrote: > > > > > > On Thu 20-12-18 17:50:38, Pingfan Liu wrote: > > > [...] > > > > @@ -453,7 +456,12 @@ static inline int gfp_zonelist(gfp_t

linux-next: manual merge of the kvm tree with the powerpc tree

2018-12-20 Thread Stephen Rothwell
Hi all, Today's linux-next merge of the kvm tree got a conflict in: arch/powerpc/mm/fault.c between commit: 49a502ea23bf ("powerpc/mm: Make NULL pointer deferences explicit on bad page faults.") from the powerpc tree and commit: d7b456152230 ("KVM: PPC: Book3S HV: Implement functions

Re: [RFC PATCH v2 04/11] powerpc/mm: Add a framework for Kernel Userspace Access Protection

2018-12-20 Thread Michael Ellerman
Christophe Leroy writes: > This patch implements a framework for Kernel Userspace Access > Protection. > > Then subarches will have the possibility to provide their own > implementation by providing setup_kuap() and lock/unlock_user_access() > > Some platforms will need to know the area accessed

Re: [PATCH] serial: 8250: Default SERIAL_OF_PLATFORM to SERIAL_8250

2018-12-20 Thread Florian Fainelli
;>> defined where it was not previously. Example mpc85xx_defconfig. This in >>> turn results in boot failures for those configurations, with an error >>> message of >>> >>> of_serial: probe of e0004500.serial failed with error -22 >>> >>>

[PATCH] Revert "serial: 8250: Default SERIAL_OF_PLATFORM to SERIAL_8250"

2018-12-20 Thread Florian Fainelli
This reverts commit 6d11023c345e369bcb9d5a68b271764e362c1f6e ("serial: 8250: Default SERIAL_OF_PLATFORM to SERIAL_8250") since that breaks at least mpc8544ds (PowerPC) using arch/powerpc/kernel/legacy_serial.c. See https://lkml.org/lkml/2018/12/5/1491 for discussion and analysis Fixes:

Re: [PATCH kernel v7 20/20] vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver

2018-12-20 Thread Alex Williamson
On Fri, 21 Dec 2018 12:50:00 +1100 Alexey Kardashevskiy wrote: > On 21/12/2018 12:37, Alex Williamson wrote: > > On Fri, 21 Dec 2018 12:23:16 +1100 > > Alexey Kardashevskiy wrote: > > > >> On 21/12/2018 03:46, Alex Williamson wrote: > >>> On Thu, 20 Dec 2018 19:23:50 +1100 > >>> Alexey

Re: trace_hardirqs_on/off vs. extra stack frames

2018-12-20 Thread Steven Rostedt
On Fri, 21 Dec 2018 12:11:35 +1100 Benjamin Herrenschmidt wrote: > Hi Steven ! > > I'm trying to untangle something, and I need your help :-) > > In commit 3cb5f1a3e58c0bd70d47d9907cc5c65192281dee, you added a dummy > stack frame around the assembly calls to trace_hardirqs_on/off on the >

Re: [PATCH kernel v7 20/20] vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver

2018-12-20 Thread Alexey Kardashevskiy
On 21/12/2018 12:37, Alex Williamson wrote: > On Fri, 21 Dec 2018 12:23:16 +1100 > Alexey Kardashevskiy wrote: > >> On 21/12/2018 03:46, Alex Williamson wrote: >>> On Thu, 20 Dec 2018 19:23:50 +1100 >>> Alexey Kardashevskiy wrote: >>> POWER9 Witherspoon machines come with 4 or 6 V100

Re: [PATCH kernel v7 20/20] vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver

2018-12-20 Thread Alex Williamson
On Fri, 21 Dec 2018 12:23:16 +1100 Alexey Kardashevskiy wrote: > On 21/12/2018 03:46, Alex Williamson wrote: > > On Thu, 20 Dec 2018 19:23:50 +1100 > > Alexey Kardashevskiy wrote: > > > >> POWER9 Witherspoon machines come with 4 or 6 V100 GPUs which are not > >> pluggable PCIe devices but

Re: [PATCH kernel v7 20/20] vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver

2018-12-20 Thread Alexey Kardashevskiy
On 21/12/2018 03:46, Alex Williamson wrote: > On Thu, 20 Dec 2018 19:23:50 +1100 > Alexey Kardashevskiy wrote: > >> POWER9 Witherspoon machines come with 4 or 6 V100 GPUs which are not >> pluggable PCIe devices but still have PCIe links which are used >> for config space and MMIO. In addition

trace_hardirqs_on/off vs. extra stack frames

2018-12-20 Thread Benjamin Herrenschmidt
Hi Steven ! I'm trying to untangle something, and I need your help :-) In commit 3cb5f1a3e58c0bd70d47d9907cc5c65192281dee, you added a dummy stack frame around the assembly calls to trace_hardirqs_on/off on the grounds that when using the latency tracer (irqsoff), you might poke at CALLER_ADDR1

Re: [PATCH v2] powerpc/pkeys: copy pkey-tracking-information at fork()

2018-12-20 Thread Michael Ellerman
Ram Pai writes: > Pkey tracking information is not copied over to the mm_struct of the > child during fork(). This can cause the child to erroneously allocate > keys that were already allocated. Any allocated execute-only key is lost > as well. > > Add code, called by dup_mmap(), to copy the

Re: [PATCH kernel v7 20/20] vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver

2018-12-20 Thread Michael Ellerman
Murilo Opsfelder Araujo writes: > On Thu, Dec 20, 2018 at 07:23:50PM +1100, Alexey Kardashevskiy wrote: ... >> diff --git a/drivers/vfio/pci/trace.h b/drivers/vfio/pci/trace.h >> new file mode 100644 >> index 000..b80d2d3 >> --- /dev/null >> +++ b/drivers/vfio/pci/trace.h ... >>

Re: [RFC/WIP] powerpc: Fix 32-bit handling of MSR_EE on exceptions

2018-12-20 Thread Benjamin Herrenschmidt
> > /* > > * MSR_KERNEL is > 0x1 on 4xx/Book-E since it include MSR_CE. > > @@ -205,20 +208,46 @@ transfer_to_handler_cont: > > mflr r9 > > lwz r11,0(r9) /* virtual address of handler */ > > lwz r9,4(r9) /* where to go when done */ > >

Re: [PATCH 1/2] PCI/IOV: provide flag to skip VF scanning

2018-12-20 Thread Bjorn Helgaas
Hi Sebastian, On Tue, Dec 18, 2018 at 11:16:49AM +0100, Sebastian Ott wrote: > Provide a flag to skip scanning for new VFs after SRIOV enablement. > This can be set by implementations for which the VFs are already > reported by other means. > > Signed-off-by: Sebastian Ott > --- >

[PATCH v2] powerpc/pkeys: copy pkey-tracking-information at fork()

2018-12-20 Thread Ram Pai
Pkey tracking information is not copied over to the mm_struct of the child during fork(). This can cause the child to erroneously allocate keys that were already allocated. Any allocated execute-only key is lost as well. Add code, called by dup_mmap(), to copy the pkey state from parent to
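A minimal sketch shows why the copy matters. It assumes the tracked state is a 32-key allocation bitmap plus an execute-only key number. `struct pkey_state`, `pkey_alloc_sketch()` and `dup_pkeys_sketch()` are hypothetical stand-ins, not the powerpc code.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-mm pkey tracking state. */
struct pkey_state {
	uint32_t allocation_map;   /* bit n set => key n is allocated */
	int execute_only_pkey;     /* -1 if none has been allocated */
};

/* Allocate the lowest free key, or return -1 if all 32 are in use. */
static int pkey_alloc_sketch(struct pkey_state *s)
{
	for (int key = 0; key < 32; key++) {
		if (!(s->allocation_map & (UINT32_C(1) << key))) {
			s->allocation_map |= UINT32_C(1) << key;
			return key;
		}
	}
	return -1;
}

/* What fork() needs: the child inherits the parent's tracking state,
 * so it cannot hand out a key the parent already allocated. */
static void dup_pkeys_sketch(const struct pkey_state *parent,
			     struct pkey_state *child)
{
	child->allocation_map = parent->allocation_map;
	child->execute_only_pkey = parent->execute_only_pkey;
}
```

Without the copy, the child's bitmap would start out empty. Its next allocation would then return a key that is already in use, and the execute-only key would be forgotten.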

Re: [PATCH] powerpc/pkeys: copy pkey-tracking-information at fork()

2018-12-20 Thread Ram Pai
On Fri, Dec 21, 2018 at 12:19:13AM +1100, Michael Ellerman wrote: > Hi Ram, > > Thanks for fixing this. > > Ram Pai writes: > > diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c > > index b271b28..5d65c47 100644 > > --- a/arch/powerpc/mm/pkeys.c > > +++ b/arch/powerpc/mm/pkeys.c >

[PATCH 9/9] powerpc/fadump: Update documentation about OPAL platform support

2018-12-20 Thread Hari Bathini
With FADump support now available on both pseries and OPAL platforms, update the FADump documentation with these details. Also, document the backup area and why it is used. Signed-off-by: Hari Bathini --- Documentation/powerpc/firmware-assisted-dump.txt | 102 ++ 1 file

[PATCH 8/9] powerpc/fadump: use FADump instead of fadump for how it is pronounced

2018-12-20 Thread Hari Bathini
Signed-off-by: Hari Bathini --- Documentation/powerpc/firmware-assisted-dump.txt | 56 +++--- 1 file changed, 28 insertions(+), 28 deletions(-) diff --git a/Documentation/powerpc/firmware-assisted-dump.txt b/Documentation/powerpc/firmware-assisted-dump.txt index

[PATCH 7/9] powerpc/fadump: add support to preserve crash data on FADUMP disabled kernel

2018-12-20 Thread Hari Bathini
Add a new kernel config option, CONFIG_PRESERVE_FA_DUMP, that ensures that crash data from a previously crashed kernel is preserved. This helps in cases where FADUMP is not enabled but the subsequent memory-preserving kernel boot is likely to process this crash data. One typical use case for this

[PATCH 6/9] powerpc/powernv: export /proc/opalcore for analysing opal crashes

2018-12-20 Thread Hari Bathini
From: Hari Bathini Export /proc/opalcore file to analyze opal crashes Signed-off-by: Hari Bathini --- arch/powerpc/platforms/powernv/Makefile |2 arch/powerpc/platforms/powernv/opal-core.c | 385 ++ arch/powerpc/platforms/powernv/opal-core.h | 35 ++

[PATCH 5/9] powerpc/fadump: process architected register state data provided by firmware

2018-12-20 Thread Hari Bathini
From: Hari Bathini Firmware provides architected register state data at the time of crash. This data contains the PIR value. The logical CPUs' PIR values need to be stored so that the data provided by firmware can be matched with the corresponding logical CPU. Signed-off-by: Hari Bathini Signed-off-by: Vasant Hegde ---
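The matching step described here can be sketched as a simple lookup table. It is filled as CPUs are brought up and searched with the PIR found in each firmware register-state entry. `NR_CPUS_SKETCH`, `record_cpu_pir()` and `pir_to_logical_cpu()` are hypothetical names, and a real implementation would size and populate the table differently.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 8   /* hypothetical CPU count for the sketch */

/* Hypothetical table: PIR of each logical CPU, recorded at boot. */
static uint32_t cpu_pir[NR_CPUS_SKETCH];
static bool cpu_pir_valid[NR_CPUS_SKETCH];

/* Remember the PIR of a logical CPU (e.g. as each CPU comes up). */
static int record_cpu_pir(int cpu, uint32_t pir)
{
	if (cpu < 0 || cpu >= NR_CPUS_SKETCH)
		return -1;
	cpu_pir[cpu] = pir;
	cpu_pir_valid[cpu] = true;
	return 0;
}

/* Map the PIR carried in a firmware register-state entry back to the
 * logical CPU it belongs to; -1 if no recorded CPU has that PIR. */
static int pir_to_logical_cpu(uint32_t pir)
{
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		if (cpu_pir_valid[cpu] && cpu_pir[cpu] == pir)
			return cpu;
	return -1;
}
```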

[PATCH 4/9] powerpc/fadump: enable fadump support on OPAL based POWER platform

2018-12-20 Thread Hari Bathini
From: Hari Bathini Firmware-assisted dump support is enabled for OPAL-based POWER platforms in P9 firmware. Make the corresponding updates in the kernel to enable fadump support for such platforms. Signed-off-by: Hari Bathini --- arch/powerpc/Kconfig|5

[PATCH 3/9] pseries/fadump: move out platform specific support from generic code

2018-12-20 Thread Hari Bathini
Introduce callbacks for platform specific operations like register, unregister, invalidate & such, and move pseries specific code into platform code. Signed-off-by: Hari Bathini --- arch/powerpc/include/asm/fadump.h | 71 --- arch/powerpc/kernel/fadump.c|
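One common way to express such callbacks in C is an ops table of function pointers. The generic code dispatches through the table, and each platform fills it in. The sketch below assumes that shape; `struct fadump_ops_sketch`, its member names and the `pseries_*` functions are hypothetical and do not reproduce the patch's definitions.

```c
#include <stddef.h>

/* Hypothetical ops table: generic code calls through these pointers and
 * each platform (pseries, OPAL) supplies its own implementation. */
struct fadump_ops_sketch {
	int (*register_dump)(void);
	int (*unregister_dump)(void);
	int (*invalidate_dump)(void);
};

/* Stand-in for firmware state on the pseries side. */
static int pseries_registered;

static int pseries_register(void)   { pseries_registered = 1; return 0; }
static int pseries_unregister(void) { pseries_registered = 0; return 0; }
static int pseries_invalidate(void) { pseries_registered = 0; return 0; }

static const struct fadump_ops_sketch pseries_ops = {
	.register_dump   = pseries_register,
	.unregister_dump = pseries_unregister,
	.invalidate_dump = pseries_invalidate,
};

/* Chosen once at boot by platform code; NULL means fadump unsupported. */
static const struct fadump_ops_sketch *fadump_ops;

/* Generic entry point: no platform knowledge, only dispatch. */
static int fadump_register_sketch(void)
{
	if (!fadump_ops || !fadump_ops->register_dump)
		return -1;
	return fadump_ops->register_dump();
}
```

Adding OPAL support then means supplying a second table rather than adding platform checks to the generic code.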

[PATCH 2/9] powerpc/fadump: Improve fadump documentation

2018-12-20 Thread Hari Bathini
The figures depicting FADump's (Firmware-Assisted Dump) memory layout are missing some finer details like different memory regions and what they represent. Improve the documentation by updating those details. Signed-off-by: Hari Bathini --- Documentation/powerpc/firmware-assisted-dump.txt |

[PATCH 1/9] powerpc/fadump: move internal fadump code to a new file

2018-12-20 Thread Hari Bathini
Refactoring fadump code means internal fadump code is referenced from different places. For ease, move internal code to a new file. Signed-off-by: Hari Bathini --- arch/powerpc/include/asm/fadump.h | 112 --- arch/powerpc/kernel/Makefile |2

[PATCH 0/9] Add FADump support on PowerNV platform

2018-12-20 Thread Hari Bathini
Firmware-Assisted Dump (FADump) is currently supported only on pseries platform. This patch series adds support for powernv platform too. The first and third patches refactor the FADump code to make use of common code across multiple platforms. The fourth patch adds basic FADump support to

Re: [PATCH] ocxl: Clarify error path in setup_xsl_irq()

2018-12-20 Thread Greg Kurz
On Tue, 11 Dec 2018 11:19:55 +1100 Andrew Donnellan wrote: > On 11/12/18 2:18 am, Greg Kurz wrote: > > Implementing rollback with goto and labels is a common practice that > > leads to prettier and more maintainable code. FWIW, this design pattern > > is already being used in alloc_link() a few
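The goto-and-labels rollback pattern mentioned in this thread is sketched below. Each acquired resource gets a label, and a failure jumps to the label that releases everything acquired so far, in reverse order. The sketch assumes three allocation steps. `struct link_sketch`, `step_alloc()` and the `fail_at` test hook are hypothetical and not taken from setup_xsl_irq().

```c
#include <stdlib.h>

/* Hypothetical three-step setup: each step that succeeds must be
 * undone if a later step fails. */
struct link_sketch {
	int *irq;
	char *name;
	void *handler_data;
};

static int fail_at;   /* test hook: which step should fail (0 = none) */

static void *step_alloc(int step, size_t size)
{
	return step == fail_at ? NULL : malloc(size);
}

static int setup_irq_sketch(struct link_sketch *link)
{
	link->irq = step_alloc(1, sizeof(*link->irq));
	if (!link->irq)
		return -1;

	link->name = step_alloc(2, 16);
	if (!link->name)
		goto err_irq;

	link->handler_data = step_alloc(3, 64);
	if (!link->handler_data)
		goto err_name;

	return 0;

	/* Unwind in reverse order of acquisition. */
err_name:
	free(link->name);
	link->name = NULL;
err_irq:
	free(link->irq);
	link->irq = NULL;
	return -1;
}
```

Each failure point needs only a single goto, and the cleanup code exists once, which is the maintainability argument made in the message. The kernel would return -ENOMEM or similar instead of -1.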

Re: [PATCH] ocxl/afu_irq: Don't include

2018-12-20 Thread Greg Kurz
On Tue, 11 Dec 2018 11:09:39 +1100 Andrew Donnellan wrote: > Acked-by: Andrew Donnellan > Friendly ping before Xmas break :) > On 11/12/18 2:13 am, Greg Kurz wrote: > > The AFU irq code doesn't need to reach out to the platform. > > > > Signed-off-by: Greg Kurz > > --- > >

Re: [PATCH kernel v7 20/20] vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver

2018-12-20 Thread Alex Williamson
On Thu, 20 Dec 2018 19:23:50 +1100 Alexey Kardashevskiy wrote: > POWER9 Witherspoon machines come with 4 or 6 V100 GPUs which are not > pluggable PCIe devices but still have PCIe links which are used > for config space and MMIO. In addition to that the GPUs have 6 NVLinks > which are connected

Re: [RFC/WIP] powerpc: Fix 32-bit handling of MSR_EE on exceptions

2018-12-20 Thread Christophe Leroy
On 12/20/2018 05:40 AM, Benjamin Herrenschmidt wrote: Hi folks ! While trying to figure out why we occasionally had lockdep barf about interrupt state on ppc32 (440 in my case but I could reproduce on e500 as well using qemu), I realized that we are still doing something rather gothic and

Re: [PATCH kernel v7 20/20] vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver

2018-12-20 Thread Murilo Opsfelder Araujo
On Thu, Dec 20, 2018 at 07:23:50PM +1100, Alexey Kardashevskiy wrote: > POWER9 Witherspoon machines come with 4 or 6 V100 GPUs which are not > pluggable PCIe devices but still have PCIe links which are used > for config space and MMIO. In addition to that the GPUs have 6 NVLinks > which are

Re: [PATCH v2] ocxl: Fix endiannes bug in read_afu_name()

2018-12-20 Thread Greg Kurz
On Wed, 12 Dec 2018 13:26:10 +1100 Andrew Donnellan wrote: > On 12/12/18 4:58 am, Greg Kurz wrote: > > The AFU Descriptor Template in the PCI config space has a Name Space > > field which is a 24 Byte ASCII character string of descriptive name > > space for the AFU. The OCXL driver read the
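The byte-order issue can be sketched in userspace. The sketch assumes that a 32-bit config read yields the four config-space bytes interpreted as a little-endian integer. Storing each value back in little-endian byte order then keeps the characters in order on any host. That has the same effect as storing `cpu_to_le32(val)` in kernel code, whereas copying the raw integer only happens to work on little-endian hosts. `config_read_dword()` and `read_afu_name_sketch()` are hypothetical helpers, not the ocxl driver's functions.

```c
#include <stddef.h>
#include <stdint.h>

#define AFU_NAME_LEN 24   /* 24-byte ASCII field, per the thread */

/* Hypothetical stand-in for a 32-bit config read: config space is
 * little-endian, so the value is assembled LSB-first from raw bytes. */
static uint32_t config_read_dword(const uint8_t *cfg, size_t off)
{
	return (uint32_t)cfg[off] | (uint32_t)cfg[off + 1] << 8 |
	       (uint32_t)cfg[off + 2] << 16 | (uint32_t)cfg[off + 3] << 24;
}

/* Copy the name out one dword at a time, emitting each value's bytes
 * LSB-first so the string comes out in order on LE and BE hosts. */
static void read_afu_name_sketch(const uint8_t *cfg, size_t name_off,
				 char name[AFU_NAME_LEN + 1])
{
	for (size_t i = 0; i < AFU_NAME_LEN; i += 4) {
		uint32_t val = config_read_dword(cfg, name_off + i);
		for (size_t b = 0; b < 4; b++)
			name[i + b] = (char)((val >> (8 * b)) & 0xff);
	}
	name[AFU_NAME_LEN] = '\0';
}
```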

Re: [PATCH] ocxl: Fix endiannes bug in ocxl_link_update_pe()

2018-12-20 Thread Greg Kurz
On Mon, 17 Dec 2018 11:38:51 +1100 "Alastair D'Silva" wrote: > On Sun, 2018-12-16 at 22:28 +0100, Greg Kurz wrote: > > All fields in the PE are big-endian. Use cpu_to_be32() like > > everywhere > > else something is written to the PE. Otherwise a wrong TID will be > > used > > by the NPU. If
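This case runs in the opposite direction: every PE field must be stored big-endian, so a host-order TID written as-is arrives byte-reversed on a little-endian host. A minimal sketch models the field as raw bytes and writes the most significant byte first, which is what `cpu_to_be32()` achieves in kernel code. `struct pe_sketch`, `pe_set_tid()` and `pe_get_tid()` are hypothetical and do not reflect the real PE layout.

```c
#include <stdint.h>

/* Hypothetical slice of a process element (PE): the NPU reads every
 * field as big-endian, whatever the host byte order is. */
struct pe_sketch {
	uint8_t tid_be[4];
};

/* Portable equivalent of storing cpu_to_be32(tid): most significant
 * byte first. Writing the host u32 directly would reverse the bytes on
 * a little-endian host, and the NPU would see a wrong TID. */
static void pe_set_tid(struct pe_sketch *pe, uint32_t tid)
{
	pe->tid_be[0] = (uint8_t)(tid >> 24);
	pe->tid_be[1] = (uint8_t)(tid >> 16);
	pe->tid_be[2] = (uint8_t)(tid >> 8);
	pe->tid_be[3] = (uint8_t)tid;
}

/* Reading it back corresponds to the be32_to_cpu() direction. */
static uint32_t pe_get_tid(const struct pe_sketch *pe)
{
	return (uint32_t)pe->tid_be[0] << 24 | (uint32_t)pe->tid_be[1] << 16 |
	       (uint32_t)pe->tid_be[2] << 8 | (uint32_t)pe->tid_be[3];
}
```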

Re: [PATCH RFCv2 0/4] mm/memory_hotplug: Introduce memory block types

2018-12-20 Thread David Hildenbrand
On 20.12.18 14:08, Michal Hocko wrote: > On Thu 20-12-18 13:58:16, David Hildenbrand wrote: >> On 30.11.18 18:59, David Hildenbrand wrote: >>> This is the second approach, introducing more meaningful memory block >>> types and not changing online behavior in the kernel. It is based on >>> latest

Re: [PATCH RFCv2 0/4] mm/memory_hotplug: Introduce memory block types

2018-12-20 Thread Michal Hocko
On Thu 20-12-18 13:58:16, David Hildenbrand wrote: > On 30.11.18 18:59, David Hildenbrand wrote: > > This is the second approach, introducing more meaningful memory block > > types and not changing online behavior in the kernel. It is based on > > latest linux-next. > > > > As we found out during

Re: [PATCH RFCv2 0/4] mm/memory_hotplug: Introduce memory block types

2018-12-20 Thread David Hildenbrand
On 30.11.18 18:59, David Hildenbrand wrote: > This is the second approach, introducing more meaningful memory block > types and not changing online behavior in the kernel. It is based on > latest linux-next. > > As we found out during discussion, user space should always handle onlining > of

Re: [PATCH] powerpc/pkeys: copy pkey-tracking-information at fork()

2018-12-20 Thread Michael Ellerman
Hi Ram, Thanks for fixing this. Ram Pai writes: > diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c > index b271b28..5d65c47 100644 > --- a/arch/powerpc/mm/pkeys.c > +++ b/arch/powerpc/mm/pkeys.c > @@ -414,3 +414,10 @@ bool arch_vma_access_permitted(struct vm_area_struct > *vma,

Re: [PATCH] selftests/powerpc: New TM signal self test

2018-12-20 Thread Michael Ellerman
Breno Leitao writes: > A new self test that forces MSR[TS] to be set without calling any TM > instruction. This test also tries to cause a page fault at a signal > handler, exactly between MSR[TS] set and tm_recheckpoint(), forcing > thread->texasr to be rewritten with TEXASR[FS] = 0, which will

Re: [PATCHv2 2/3] mm/numa: build zonelist when alloc for device on offline node

2018-12-20 Thread Michal Hocko
On Thu 20-12-18 20:26:28, Pingfan Liu wrote: > On Thu, Dec 20, 2018 at 7:35 PM Michal Hocko wrote: > > > > On Thu 20-12-18 17:50:38, Pingfan Liu wrote: > > [...] > > > @@ -453,7 +456,12 @@ static inline int gfp_zonelist(gfp_t flags) > > > */ > > > static inline struct zonelist

Re: [PATCHv2 2/3] mm/numa: build zonelist when alloc for device on offline node

2018-12-20 Thread Pingfan Liu
On Thu, Dec 20, 2018 at 7:35 PM Michal Hocko wrote: > > On Thu 20-12-18 17:50:38, Pingfan Liu wrote: > [...] > > @@ -453,7 +456,12 @@ static inline int gfp_zonelist(gfp_t flags) > > */ > > static inline struct zonelist *node_zonelist(int nid, gfp_t flags) > > { > > - return

Re: [PATCHv2 2/3] mm/numa: build zonelist when alloc for device on offline node

2018-12-20 Thread Michal Hocko
On Thu 20-12-18 17:50:38, Pingfan Liu wrote: [...] > @@ -453,7 +456,12 @@ static inline int gfp_zonelist(gfp_t flags) > */ > static inline struct zonelist *node_zonelist(int nid, gfp_t flags) > { > - return NODE_DATA(nid)->node_zonelists + gfp_zonelist(flags); > + if

Re: [PATCH kernel v7 00/20] powerpc/powernv/npu, vfio: NVIDIA V100 + P9 passthrough

2018-12-20 Thread Alexey Kardashevskiy
On 20/12/2018 20:38, Michael Ellerman wrote: > Alexey Kardashevskiy writes: > >> My bad, I was not cc-ing everyone but now with v7 I am, sorry about that. > > I've already applied v6, I'll assume this is unchanged from that unless > you tell me otherwise. 14/20 has fixed warning about

Re: [PATCH v3 2/3] powerpc: Discard dynsym section for !PPC32

2018-12-20 Thread Michael Ellerman
Joel Stanley writes: > Alan Modra explains: > > > Likely you could discard .interp > and .dynstr too, and .dynsym when > > !CONFIG_PPC32. > > Discarding of interp and dynstr happened in a previous patch. The dynsym > cleanup was a bit less straightforward, so it gets its own patch. > >

Re: [PATCH v3 1/3] powerpc: Discard more sections in linker script

2018-12-20 Thread Michael Ellerman
Joel Stanley writes: > Building the ppc64 kernel with a modern binutils results in this > warning: > > powerpc64le-linux-gnu-ld: warning: orphan section `.gnu.hash' from > `linker stubs' being placed in section `.gnu.hash' > > Alan Modra explains: > > > .gnu.hash, like .hash, is used by

[PATCHv2 3/3] powerpc/numa: make all possible node be instanced against NULL reference in node_zonelist()

2018-12-20 Thread Pingfan Liu
This patch tries to resolve a bug rooted in mm when using nr_cpus. It was reported at [1]. The root cause is: device->numa_node info is used as the preferred_nid param for __alloc_pages_nodemask(), which causes a NULL pointer dereference when ac->zonelist = node_zonelist(preferred_nid, gfp_mask), due to the

[PATCHv2 2/3] mm/numa: build zonelist when alloc for device on offline node

2018-12-20 Thread Pingfan Liu
I hit a bug on an AMD machine with the kexec -l nr_cpus=4 option. It is due to some pgdat not being instantiated when nr_cpus is specified, e.g., on x86, not initialized by init_cpu_to_node()->init_memory_less_node(). But device->numa_node info is used as the preferred_nid param for __alloc_pages_nodemask(),
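The failure mode can be made concrete with a minimal sketch in which uninstantiated nodes are NULL entries in a node-data table. A lookup that indexes that table with a device's node id then dereferences NULL. The guard shown uses a simple fallback to the first instantiated node, which is a swapped-in technique for illustration only; the patch itself builds a zonelist for the offline node instead. All names are hypothetical.

```c
#include <stddef.h>

#define MAX_NODES_SKETCH 4   /* hypothetical number of possible nodes */

struct zonelist_sketch { int first_node; };
struct pgdat_sketch { struct zonelist_sketch zonelist; };

/* NULL entries model nodes whose pgdat was never instantiated, e.g. a
 * memoryless node skipped because nr_cpus limited CPU bring-up. */
static struct pgdat_sketch *node_data[MAX_NODES_SKETCH];

/* An unguarded &node_data[nid]->zonelist would dereference NULL for
 * such a node. This guarded variant falls back to the first node that
 * does exist, and returns NULL only if none does. */
static struct zonelist_sketch *node_zonelist_sketch(int nid)
{
	if (nid >= 0 && nid < MAX_NODES_SKETCH && node_data[nid])
		return &node_data[nid]->zonelist;
	for (int n = 0; n < MAX_NODES_SKETCH; n++)
		if (node_data[n])
			return &node_data[n]->zonelist;
	return NULL;
}
```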

[PATCHv2 1/3] mm/numa: change the topo of build_zonelist_xx()

2018-12-20 Thread Pingfan Liu
The current build_zonelist_xx functions rely on a pgdat instance to build the zonelist; if a NUMA node is offline, there will be no pgdat instance for it. But in some cases a zonelist is still required for an offline node, especially with the nr_cpus option. This patch changes the topology of these functions to ease the

[PATCHv2 0/3] mm: bugfix for NULL reference in mm on all archs

2018-12-20 Thread Pingfan Liu
This bug was originally reported at https://lore.kernel.org/patchwork/patch/1020838/ In short, this bug should affect all archs: on a machine with a NUMA node (nodeA) that has no memory, if nr_cpus prevents nodeA from being instantiated and a device on nodeA tries to allocate memory with

Re: [PATCH kernel v7 00/20] powerpc/powernv/npu, vfio: NVIDIA V100 + P9 passthrough

2018-12-20 Thread Michael Ellerman
Alexey Kardashevskiy writes: > My bad, I was not cc-ing everyone but now with v7 I am, sorry about that. I've already applied v6, I'll assume this is unchanged from that unless you tell me otherwise. cheers > This is for passing through NVIDIA V100 GPUs on POWER9 systems. > 20/20 has the

Re: [PATCH] powerpc/8xx: Map a second 8M text page at startup when needed.

2018-12-20 Thread Christophe Leroy
On 20/12/2018 at 09:24, Christoph Hellwig wrote: On Thu, Dec 20, 2018 at 05:48:25AM +, Christophe Leroy wrote: Some debug setups like CONFIG_KASAN generate huge kernels with text size over the 8M limit. This patch maps a second 8M page when _einittext is over 8M. Do we also need a

[PATCH kernel v7 19/20] vfio_pci: Allow regions to add own capabilities

2018-12-20 Thread Alexey Kardashevskiy
VFIO regions already support region capabilities with a limited set of fields. However the subdriver might have to report to the userspace additional bits. This adds an add_capability() hook to vfio_pci_regops. Signed-off-by: Alexey Kardashevskiy Acked-by: Alex Williamson --- Changes: v3: *

[PATCH kernel v7 20/20] vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver

2018-12-20 Thread Alexey Kardashevskiy
POWER9 Witherspoon machines come with 4 or 6 V100 GPUs which are not pluggable PCIe devices but still have PCIe links which are used for config space and MMIO. In addition to that the GPUs have 6 NVLinks which are connected to other GPUs and the POWER9 CPU. POWER9 chips have a special unit on a

[PATCH kernel v7 18/20] vfio_pci: Allow mapping extra regions

2018-12-20 Thread Alexey Kardashevskiy
So far we only allowed mapping of MMIO BARs to the userspace. However there are GPUs with on-board coherent RAM accessible via side channels which we also want to map to the userspace. The first client for this is NVIDIA V100 GPU with NVLink2 direct links to a POWER9 NPU-enabled CPU; such GPUs

[PATCH kernel v7 09/20] powerpc/powernv/pseries: Rework device adding to IOMMU groups

2018-12-20 Thread Alexey Kardashevskiy
The powernv platform registers IOMMU groups and adds devices to them from the pci_controller_ops::setup_bridge() hook except one case when virtual functions (SRIOV VFs) are added from a bus notifier. The pseries platform registers IOMMU groups from the pci_controller_ops::dma_bus_setup() hook and

[PATCH kernel v7 17/20] powerpc/powernv/npu: Fault user page into the hypervisor's pagetable

2018-12-20 Thread Alexey Kardashevskiy
When a page fault happens in a GPU, the GPU signals the OS and the GPU driver calls the fault handler, which populates a page table; this allows the GPU to complete an ATS request. On bare metal, get_user_pages() is enough as it adds a pte to the kernel page table, but under KVM the partition

[PATCH kernel v7 16/20] powerpc/powernv/npu: Check mmio_atsd array bounds when populating

2018-12-20 Thread Alexey Kardashevskiy
A broken device tree might contain more than 8 values and introduce a hard-to-debug memory corruption bug. This adds a bounds check. Signed-off-by: Alexey Kardashevskiy --- arch/powerpc/platforms/powernv/npu-dma.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git
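The bounds check amounts to capping the copy loop at the array's capacity, regardless of how many values the device tree supplies. A minimal sketch, assuming an 8-entry array as stated in the message; `MMIO_ATSD_MAX_SKETCH` and `populate_mmio_atsd()` are hypothetical names, and in kernel code the cap would typically come from `ARRAY_SIZE()` on the destination array.

```c
#include <stddef.h>
#include <stdint.h>

#define MMIO_ATSD_MAX_SKETCH 8   /* capacity, per the "8 values" above */

/* Copy register addresses from a (possibly broken) device-tree property
 * into a fixed-size array, ignoring anything beyond its capacity instead
 * of writing past the end. Returns the number of entries stored. */
static size_t populate_mmio_atsd(uint64_t regs[MMIO_ATSD_MAX_SKETCH],
				 const uint64_t *prop, size_t nvals)
{
	size_t i;

	for (i = 0; i < nvals && i < MMIO_ATSD_MAX_SKETCH; i++)
		regs[i] = prop[i];
	return i;
}
```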

[PATCH kernel v7 15/20] powerpc/powernv/npu: Add release_ownership hook

2018-12-20 Thread Alexey Kardashevskiy
In order to make ATS work and translate addresses for arbitrary LPID and PID, we need to program an NPU with LPID and allow PID wildcard matching with a specific MSR mask. This implements a helper to assign a GPU to LPAR and program the NPU with a wildcard for PID and a helper to do clean-up. The

[PATCH kernel v7 14/20] powerpc/powernv/npu: Add compound IOMMU groups

2018-12-20 Thread Alexey Kardashevskiy
At the moment the powernv platform registers an IOMMU group for each PE. There is an exception though: an NVLink bridge which is attached to the corresponding GPU's IOMMU group making it a master. Now we have POWER9 systems with GPUs connected to each other directly bypassing PCI. At the moment

[PATCH kernel v7 05/20] powerpc/powernv/npu: Move OPAL calls away from context manipulation

2018-12-20 Thread Alexey Kardashevskiy
When introduced, the NPU context init/destroy helpers called OPAL, which enabled/disabled PID (a userspace memory context ID) filtering in an NPU per GPU; this was a requirement for P9 DD1.0. However, newer chip revisions added PID wildcard support, so there is no longer any need to call OPAL every time

[PATCH kernel v7 13/20] powerpc/powernv/npu: Convert NPU IOMMU helpers to iommu_table_group_ops

2018-12-20 Thread Alexey Kardashevskiy
At the moment the NPU IOMMU is manipulated directly from the IODA2 PCI PE code; the PCI PE acts as a master to the NPU PE. Soon we will have compound IOMMU groups with several PEs from several different PHBs (such as interconnected GPUs and NPUs), so there will be no single master but one big IOMMU group.

[PATCH kernel v7 04/20] powerpc/powernv: Move npu struct from pnv_phb to pci_controller

2018-12-20 Thread Alexey Kardashevskiy
The powernv PCI code stores NPU data in the pnv_phb struct. The latter is referenced by pci_controller::private_data. We are going to have NPU2 support in the pseries platform as well, but it does not store any private_data in the pci_controller struct; and even if it did, it would be a

[PATCH kernel v7 12/20] powerpc/powernv/npu: Move single TVE handling to NPU PE

2018-12-20 Thread Alexey Kardashevskiy
Normal PCI PEs have 2 TVEs, one per DMA window; however, an NPU PE has only one, which points to one of the two tables of the corresponding PCI PE. So whenever a new DMA window is programmed to PEs, the NPU PE needs to release the old table in order to use the new one. Commit d41ce7b1bcc3e

[PATCH kernel v7 02/20] powerpc/mm/iommu/vfio_spapr_tce: Change mm_iommu_get to reference a region

2018-12-20 Thread Alexey Kardashevskiy
Normally mm_iommu_get() should add a reference and mm_iommu_put() should remove it. However historically mm_iommu_find() does the referencing and mm_iommu_get() is doing allocation and referencing. We are going to add another helper to preregister device memory so instead of having mm_iommu_new()
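The conventional split described here can be sketched in a few lines:

- new() allocates and returns the object holding one reference;
- get() only takes an extra reference on an existing object;
- put() drops a reference and frees the object on the last one.

This is a single-threaded userspace sketch. The kernel would need atomic counters or a lock, and `struct mem_region_sketch`, `region_new()`, `region_get()` and `region_put()` are hypothetical, not the mm_iommu_* implementation.

```c
#include <stdlib.h>

/* Hypothetical preregistered-memory record (single-threaded sketch). */
struct mem_region_sketch {
	unsigned long ua;        /* userspace address */
	unsigned long entries;   /* number of pages */
	long refcount;
};

/* new(): allocate and return holding one reference for the caller. */
static struct mem_region_sketch *region_new(unsigned long ua,
					    unsigned long entries)
{
	struct mem_region_sketch *r = malloc(sizeof(*r));

	if (!r)
		return NULL;
	r->ua = ua;
	r->entries = entries;
	r->refcount = 1;
	return r;
}

/* get(): only take an extra reference on an existing region. */
static void region_get(struct mem_region_sketch *r)
{
	r->refcount++;
}

/* put(): drop a reference; the last put frees. Returns 1 if freed. */
static int region_put(struct mem_region_sketch *r)
{
	if (--r->refcount > 0)
		return 0;
	free(r);
	return 1;
}
```

Keeping allocation out of get() lets a second allocator, such as the device-memory preregistration mentioned here, share the same get/put pair.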

[PATCH kernel v7 11/20] powerpc/powernv: Reference iommu_table while it is linked to a group

2018-12-20 Thread Alexey Kardashevskiy
The iommu_table pointer stored in iommu_table_group may get stale by accident, this adds referencing and removes a redundant comment about this. Signed-off-by: Alexey Kardashevskiy Reviewed-by: David Gibson --- arch/powerpc/platforms/powernv/pci-ioda-tce.c | 3 ++-

[PATCH kernel v7 10/20] powerpc/iommu_api: Move IOMMU groups setup to a single place

2018-12-20 Thread Alexey Kardashevskiy
Registering new IOMMU groups and adding devices to them are separated in code, and the latter is buried in the DMA setup code, where it does not really belong. This moves IOMMU group setup to a separate helper which registers a group and adds devices as before. This does not make a difference as

[PATCH kernel v7 01/20] powerpc/ioda/npu: Call skiboot's hot reset hook when disabling NPU2

2018-12-20 Thread Alexey Kardashevskiy
The skiboot firmware has a hot reset handler which fences the NVIDIA V100 GPU RAM on Witherspoons and makes accesses no-op instead of throwing HMIs: https://github.com/open-power/skiboot/commit/fca2b2b839a67 Now we are going to pass V100 via VFIO which most certainly involves KVM guests which are

[PATCH kernel v7 00/20] powerpc/powernv/npu, vfio: NVIDIA V100 + P9 passthrough

2018-12-20 Thread Alexey Kardashevskiy
My bad, I was not cc-ing everyone but now with v7 I am, sorry about that. This is for passing through NVIDIA V100 GPUs on POWER9 systems. 20/20 has the details of hardware setup. This implements support for NVIDIA V100 GPU with coherent memory and NPU/ATS support available in the POWER9 CPU.

[PATCH kernel v7 08/20] powerpc/pseries: Remove IOMMU API support for non-LPAR systems

2018-12-20 Thread Alexey Kardashevskiy
The pci_dma_bus_setup_pSeries and pci_dma_dev_setup_pSeries hooks are registered for the pseries platform which does not have FW_FEATURE_LPAR; these would be pre-powernv platforms which we never supported PCI pass through for anyway so remove it. Signed-off-by: Alexey Kardashevskiy Reviewed-by:

[PATCH kernel v7 07/20] powerpc/pseries/npu: Enable platform support

2018-12-20 Thread Alexey Kardashevskiy
We already changed the NPU API for GPUs not to call OPAL, and the remaining bit is initializing NPU structures. This searches for POWER9 NVLinks attached to any device on a PHB and initializes an NPU structure if any are found. Signed-off-by: Alexey Kardashevskiy --- Changes: v5: * added WARN_ON_ONCE

Re: [PATCH] powerpc/8xx: Map a second 8M text page at startup when needed.

2018-12-20 Thread Christoph Hellwig
On Thu, Dec 20, 2018 at 05:48:25AM +, Christophe Leroy wrote: > Some debug setup like CONFIG_KASAN generate huge > kernels with text size over the 8M limit. > > This patch maps a second 8M page when _einittext is over 8M. Do we also need a check to generate a useful warning if we ever
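The arithmetic behind "a second 8M page" is a round-up division of the text size by 8M. The check Christoph suggests compares that count with what is mapped at startup. A minimal sketch, assuming at most two 8M pages are mapped once this patch is applied; `SZ_8M_SKETCH`, `STARTUP_TEXT_PAGES_SKETCH`, `text_pages_8m()` and `text_fits_startup_mapping()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_8M_SKETCH (UINT32_C(8) << 20)
/* Hypothetical limit: 8M text pages mapped at startup with this patch. */
#define STARTUP_TEXT_PAGES_SKETCH 2u

/* Number of 8M pages needed to cover the text from its start address
 * up to _einittext, rounding up. */
static uint32_t text_pages_8m(uint32_t text_start, uint32_t einittext)
{
	uint32_t size = einittext - text_start;

	return (size + SZ_8M_SKETCH - 1) / SZ_8M_SKETCH;
}

/* The kind of check that could feed a build- or boot-time warning:
 * does the text still fit in what is mapped at startup? */
static bool text_fits_startup_mapping(uint32_t text_start, uint32_t einittext)
{
	return text_pages_8m(text_start, einittext) <= STARTUP_TEXT_PAGES_SKETCH;
}
```

For example, a 9M text needs two pages and still fits, while a 17M text would need three and should trigger the warning.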

[PATCH kernel v7 06/20] powerpc/pseries/iommu: Use memory@ nodes in max RAM address calculation

2018-12-20 Thread Alexey Kardashevskiy
We might have memory@ nodes with "linux,usable-memory" set to zero (for example, to replicate powernv's behaviour for GPU coherent memory) which means that the memory needs an extra initialization but since it can be used afterwards, the pseries platform will try mapping it for DMA so the DMA

[PATCH kernel v7 03/20] powerpc/vfio/iommu/kvm: Do not pin device memory

2018-12-20 Thread Alexey Kardashevskiy
This new memory does not have page structs as it is not plugged into the host, so gup() will fail anyway. This adds 2 helpers: - mm_iommu_newdev() to preregister the "memory device" memory so the rest of the API can still be used; - mm_iommu_is_devmem() to know if the physical address is one of these
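A helper like the second one boils down to a range check over the preregistered device-memory regions. The sketch below assumes each region is recorded as a host physical base address plus size, and that the question is whether a whole page falls inside one region. `struct devmem_sketch` and `is_devmem_sketch()` are hypothetical; the real helper's signature and data structures are not shown in this message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of preregistered device memory (memory with no
 * struct pages), identified by its host physical range. */
struct devmem_sketch {
	uint64_t hpa;    /* host physical base address */
	uint64_t size;   /* bytes */
};

/* True if [hpa, hpa + pagesize) lies entirely inside a registered
 * region; such memory must not go through get_user_pages() pinning.
 * The comparisons are arranged so the arithmetic cannot overflow. */
static bool is_devmem_sketch(const struct devmem_sketch *regs, size_t n,
			     uint64_t hpa, uint64_t pagesize)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t off;

		if (hpa < regs[i].hpa)
			continue;
		off = hpa - regs[i].hpa;
		if (off <= regs[i].size && pagesize <= regs[i].size - off)
			return true;
	}
	return false;
}
```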

Re: [PATCH kernel v6 18/20] vfio_pci: Allow mapping extra regions

2018-12-20 Thread Christoph Hellwig
On Wed, Dec 19, 2018 at 09:43:58AM -0700, Alex Williamson wrote: > [cc +kvm, +lkml] > > Sorry, just noticed these are only visible on ppc lists or for those > directly cc'd. vfio's official development list is the kvm list. I'll > let spapr specific changes get away without copying this list,