[PATCH v5 10/12] fs, dax: kill IS_DAX()

2018-03-01 Thread Dan Williams
In preparation for fixing the broken definition of S_DAX in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, convert all the remaining IS_DAX() usages to use explicit tests for FSDAX. Cc: Matthew Wilcox Cc: Ross Zwisler Cc: Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Review

[PATCH v5 11/12] dax: fix S_DAX definition

2018-03-01 Thread Dan Williams
Make sure S_DAX is defined in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case. Otherwise vma_is_dax() may incorrectly return false in the Device-DAX case. Cc: Alexander Viro Cc: linux-fsde...@vger.kernel.org Cc: Christoph Hellwig Cc: Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mma
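
For orientation, a minimal sketch of the state this patch is fixing, based on the mainline definitions of the time (the exact flag value and guard used by the patch may differ):

  /* include/linux/fs.h -- illustrative sketch of the pre-fix state */
  #ifdef CONFIG_FS_DAX
  #define S_DAX           8192    /* Direct Access, avoiding the page cache */
  #else
  #define S_DAX           0       /* Make all the DAX code disappear */
  #endif

  static inline bool vma_is_dax(struct vm_area_struct *vma)
  {
          return vma->vm_file && IS_DAX(file_inode(vma->vm_file));
  }

With S_DAX defined to 0, IS_DAX() is compile-time false whenever CONFIG_FS_DAX=n, so vma_is_dax() wrongly reports false for Device-DAX mappings in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y configuration; the fix is to also define the real S_DAX value when CONFIG_DEV_DAX=y.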

[PATCH v5 09/12] mm, dax: replace IS_DAX() with IS_DEVDAX() or IS_FSDAX()

2018-03-01 Thread Dan Williams
In preparation for fixing the broken definition of S_DAX in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, convert all IS_DAX() usages to use explicit tests for the DEVDAX and FSDAX sub-cases of DAX functionality. Cc: Matthew Wilcox Cc: Ross Zwisler Cc: Fixes: dee410792419 ("/dev/dax, core: file

[PATCH v5 12/12] vfio: disable filesystem-dax page pinning

2018-03-01 Thread Dan Williams
Filesystem-DAX is incompatible with 'longterm' page pinning. Without page cache indirection a DAX mapping maps filesystem blocks directly. This means that the filesystem must not modify a file's block map while any page in a mapping is pinned. In order to prevent the situation of userspace holding
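
The enforcement lives in the GUP path rather than in vfio itself; a simplified sketch of the idea using the helpers that existed in that era's kernel (error handling and the surrounding vfio context are elided):

  /* sketch: long-term pinning path refusing filesystem-DAX pages */
  down_read(&current->mm->mmap_sem);
  ret = get_user_pages_longterm(vaddr, 1, gup_flags, pages, NULL);
  up_read(&current->mm->mmap_sem);
  /*
   * get_user_pages_longterm() inspects the VMAs it pins from and bails
   * out (e.g. with -EOPNOTSUPP) when one is vma_is_fsdax(), because the
   * filesystem could otherwise reallocate the file's blocks while the
   * page remains pinned indefinitely.
   */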

[PATCH v5 07/12] ext4, dax: replace IS_DAX() with IS_FSDAX()

2018-03-01 Thread Dan Williams
In preparation for fixing the broken definition of S_DAX in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, convert all IS_DAX() usages to use explicit tests for FSDAX since DAX is ambiguous. Cc: "Theodore Ts'o" Cc: Andreas Dilger Cc: Alexander Viro Cc: Matthew Wilcox Cc: Ross Zwisler Cc: Fixes

[PATCH v5 08/12] xfs, dax: replace IS_DAX() with IS_FSDAX()

2018-03-01 Thread Dan Williams
In preparation for fixing the broken definition of S_DAX in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, convert all IS_DAX() usages to use explicit tests for FSDAX since DAX is ambiguous. Cc: "Darrick J. Wong" Cc: linux-...@vger.kernel.org Cc: Matthew Wilcox Cc: Ross Zwisler Cc: Fixes: dee410

[PATCH v5 05/12] ext4, dax: define ext4_dax_*() infrastructure in all cases

2018-03-01 Thread Dan Williams
In preparation for fixing S_DAX to be defined in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, move the definition of these routines outside of the "#ifdef CONFIG_FS_DAX" guard. This is also a coding-style fix to move all ifdef handling to header files rather than in the source. The compiler will st

[PATCH v5 04/12] ext2, dax: define ext2_dax_*() infrastructure in all cases

2018-03-01 Thread Dan Williams
In preparation for fixing S_DAX to be defined in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, move the definition of these routines outside of the "#ifdef CONFIG_FS_DAX" guard. This is also a coding-style fix to move all ifdef handling to header files rather than in the source. The compiler will st
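
Both this entry and the ext4 one above describe the same coding-style pattern: the *_dax_*() routines are compiled unconditionally and a compile-time-false predicate lets the compiler drop them. A hedged sketch (function and helper names are illustrative, not the literal diff):

  /* fs/ext2/file.c -- sketch */
  static ssize_t ext2_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
  {
          /*
           * When CONFIG_FS_DAX=n the predicate below evaluates to a
           * compile-time constant false, so the DAX branch and every
           * symbol only it references are dead-code-eliminated -- no
           * #ifdef is needed here in the .c file.
           */
          if (IS_FSDAX(file_inode(iocb->ki_filp)))
                  return ext2_dax_read_iter(iocb, to);
          return generic_file_read_iter(iocb, to);
  }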

[PATCH v5 06/12] ext2, dax: replace IS_DAX() with IS_FSDAX()

2018-03-01 Thread Dan Williams
In preparation for fixing the broken definition of S_DAX in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, convert all IS_DAX() usages to use explicit tests for FSDAX since DAX is ambiguous. Cc: Matthew Wilcox Cc: Ross Zwisler Cc: Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap

[PATCH v5 03/12] ext2, dax: finish implementing dax_sem helpers

2018-03-01 Thread Dan Williams
dax_sem_{up,down}_write_sem() allow the ext2 dax semaphore to be compiled out in the CONFIG_FS_DAX=n case. However there are still some open coded uses of the semaphore. Add dax_sem_{up_read,down_read}() and dax_sem_assert_held() helpers. Use them to convert all open-coded usages of the semaphore t
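
A sketch of what the existing write-side helpers look like in fs/ext2/ext2.h plus the read/assert additions the changelog describes (the compiled-out variants are the point of the exercise; exact names follow the patch and may not match what was merged):

  #ifdef CONFIG_FS_DAX
  #define dax_sem_down_write(ext2_inode)  down_write(&(ext2_inode)->dax_sem)
  #define dax_sem_up_write(ext2_inode)    up_write(&(ext2_inode)->dax_sem)
  #define dax_sem_down_read(ext2_inode)   down_read(&(ext2_inode)->dax_sem)
  #define dax_sem_up_read(ext2_inode)     up_read(&(ext2_inode)->dax_sem)
  #define dax_sem_assert_held(ext2_inode) \
          lockdep_assert_held(&(ext2_inode)->dax_sem)
  #else
  /* all dax_sem usage compiles away when filesystem-DAX is disabled */
  #define dax_sem_down_write(ext2_inode)  do { } while (0)
  #define dax_sem_up_write(ext2_inode)    do { } while (0)
  #define dax_sem_down_read(ext2_inode)   do { } while (0)
  #define dax_sem_up_read(ext2_inode)     do { } while (0)
  #define dax_sem_assert_held(ext2_inode) do { } while (0)
  #endif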

[PATCH v5 00/12] vfio, dax: prevent long term filesystem-dax pins and other fixes

2018-03-01 Thread Dan Williams
Changes since v4 [1]: * Fix the changelog of "dax: introduce IS_DEVDAX() and IS_FSDAX()" to better clarify the need for new helpers (Jan) * Replace dax_sem_is_locked() with dax_sem_assert_held() (Jan) * Use file_inode() in vma_is_dax() (Jan) * Resend the full series to linux-xfs@ (Dave) * Collect

[PATCH v5 01/12] dax: fix vma_is_fsdax() helper

2018-03-01 Thread Dan Williams
Gerd reports that ->i_mode may contain other bits besides S_IFCHR. Use S_ISCHR() instead. Otherwise, get_user_pages_longterm() may fail on device-dax instances when those are meant to be explicitly allowed. Fixes: 2bb6d2837083 ("mm: introduce get_user_pages_longterm") Cc: Reported-by: Gerd Rausch
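
The fix is a one-liner; a sketch of the helper with the change marked (the body approximates include/linux/fs.h of that era):

  static inline bool vma_is_fsdax(struct vm_area_struct *vma)
  {
          struct inode *inode;

          if (!vma->vm_file)
                  return false;
          if (!vma_is_dax(vma))
                  return false;
          inode = file_inode(vma->vm_file);
          if (S_ISCHR(inode->i_mode))   /* was: inode->i_mode == S_IFCHR */
                  return false;         /* device-dax: long-term pins allowed */
          return true;
  }

Because ->i_mode also carries permission bits, the old equality test never matched, device-dax mappings were misclassified as filesystem-DAX, and get_user_pages_longterm() rejected them.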

[PATCH v5 02/12] dax: introduce IS_DEVDAX() and IS_FSDAX()

2018-03-01 Thread Dan Williams
The current IS_DAX() helper that checks if a file is in DAX mode serves two purposes. It is a control flow branch condition for DAX vs non-DAX paths and it is a mechanism to perform dead code elimination. The dead code elimination is required in the CONFIG_FS_DAX=n case since there are symbols in f
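
A sketch of the proposed split (this is the shape described by the changelog, not necessarily what landed upstream):

  /* include/linux/dax.h -- sketch of the proposal */
  static inline bool IS_FSDAX(struct inode *inode)
  {
          return IS_ENABLED(CONFIG_FS_DAX) && (inode->i_flags & S_DAX);
  }

  static inline bool IS_DEVDAX(struct inode *inode)
  {
          return IS_ENABLED(CONFIG_DEV_DAX) && (inode->i_flags & S_DAX) &&
                  S_ISCHR(inode->i_mode);
  }

The IS_ENABLED() tests preserve the dead-code-elimination property that IS_DAX() previously got from defining S_DAX to 0, while making the FSDAX and DEVDAX cases distinguishable.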

[PATCH v3 1/3] mm, powerpc: use vma_kernel_pagesize() in vma_mmu_pagesize()

2018-03-01 Thread Dan Williams
The current powerpc definition of vma_mmu_pagesize() open codes looking up the page size via hstate. It is identical to the generic vma_kernel_pagesize() implementation. Now, vma_kernel_pagesize() is growing support for determining the page size of Device-DAX vmas in addition to the existing Huget
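
A sketch of the duplication being removed (the "before" body approximates the powerpc code of that era):

  /* before: arch/powerpc open-codes the hstate lookup */
  unsigned long vma_mmu_pagesize(struct vm_area_struct *vma)
  {
          if (!is_vm_hugetlb_page(vma))
                  return PAGE_SIZE;
          return huge_page_size(hstate_vma(vma));
  }

  /* after: defer to the generic helper, which patch 2/3 teaches about
   * ->pagesize() so Device-DAX vmas report correctly as well */
  unsigned long vma_mmu_pagesize(struct vm_area_struct *vma)
  {
          return vma_kernel_pagesize(vma);
  }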

[PATCH v3 0/3] mm, smaps: MMUPageSize for device-dax

2018-03-01 Thread Dan Williams
Changes since v2: * Split the fix of the definition vma_mmu_pagesize() on powerpc to its own patch. [1]: https://lists.01.org/pipermail/linux-nvdimm/2018-February/014101.html --- Andrew, Similar to commit 31383c6865a5 "mm, hugetlbfs: introduce ->split() to vm_operations_struct" here is anothe

[PATCH v3 2/3] mm, hugetlbfs: introduce ->pagesize() to vm_operations_struct

2018-03-01 Thread Dan Williams
When device-dax is operating in huge-page mode we want it to behave like hugetlbfs and report the MMU page mapping size that is being enforced by the vma. Similar to commit 31383c6865a5 "mm, hugetlbfs: introduce ->split() to vm_operations_struct" it would be messy to teach vma_mmu_pagesize() about
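
The shape of the hook, as a sketch consistent with the changelog (exact placement within the struct is immaterial):

  /* include/linux/mm.h */
  struct vm_operations_struct {
          ...
          /* consulted by vma_kernel_pagesize() to report the mapping size */
          unsigned long (*pagesize)(struct vm_area_struct *area);
          ...
  };

  /* mm/hugetlb.c */
  unsigned long vma_kernel_pagesize(struct vm_area_struct *vma)
  {
          if (vma->vm_ops && vma->vm_ops->pagesize)
                  return vma->vm_ops->pagesize(vma);
          return PAGE_SIZE;
  }

hugetlbfs then supplies a ->pagesize() that returns huge_page_size(hstate_vma(vma)), and other drivers can opt in without vma_kernel_pagesize() having to know about them individually.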

[PATCH v3 3/3] device-dax: implement ->pagesize() for smaps to report MMUPageSize

2018-03-01 Thread Dan Williams
Given that device-dax is making similar page mapping size guarantees as hugetlbfs, emit the size in smaps and any other kernel path that requests the mapping size of a vma. Reported-by: Jane Chu Signed-off-by: Dan Williams --- drivers/dax/device.c | 10 ++ 1 file changed, 10 insertion
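
The 10-line diff plausibly amounts to the following; field names are taken from drivers/dax/device.c of that era and should be read as an approximation:

  static unsigned long dev_dax_pagesize(struct vm_area_struct *vma)
  {
          struct file *filp = vma->vm_file;
          struct dev_dax *dev_dax = filp->private_data;

          /* device-dax already enforces faults at the region alignment */
          return dev_dax->region->align;
  }

  static const struct vm_operations_struct dax_vm_ops = {
          .fault      = dev_dax_fault,
          .huge_fault = dev_dax_huge_fault,
          .pagesize   = dev_dax_pagesize,
          ...
  };

With this, smaps reports MMUPageSize as 2M or 1G for device-dax mappings instead of the base page size.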

Re: [PATCH v2 02/10] PCI/P2PDMA: Add sysfs group to display p2pmem stats

2018-03-01 Thread Logan Gunthorpe
On 01/03/18 05:36 PM, Dan Williams wrote: On Thu, Mar 1, 2018 at 4:15 PM, Logan Gunthorpe wrote: On 01/03/18 10:44 AM, Bjorn Helgaas wrote: I think these two statements are out of order, since the attributes dereference pdev->p2pdma. And it looks like you set "error" unnecessarily, since

Re: [PATCH v2 02/10] PCI/P2PDMA: Add sysfs group to display p2pmem stats

2018-03-01 Thread Dan Williams
On Thu, Mar 1, 2018 at 4:15 PM, Logan Gunthorpe wrote: > > > On 01/03/18 10:44 AM, Bjorn Helgaas wrote: >> >> I think these two statements are out of order, since the attributes >> dereference pdev->p2pdma. And it looks like you set "error" >> unnecessarily, since you return immediately looking a

Re: [PATCH v2 02/10] PCI/P2PDMA: Add sysfs group to display p2pmem stats

2018-03-01 Thread Logan Gunthorpe
On 01/03/18 10:44 AM, Bjorn Helgaas wrote: I think these two statements are out of order, since the attributes dereference pdev->p2pdma. And it looks like you set "error" unnecessarily, since you return immediately looking at it. Per the previous series, sysfs_create_group is must_check for

Re: [PATCH v2 10/10] nvmet: Optionally use PCI P2P memory

2018-03-01 Thread Logan Gunthorpe
On 01/03/18 04:57 PM, Stephen Bates wrote: We don't want to lump these all together without knowing which region you're allocating from, right? In all seriousness I do agree with you on these Keith in the long term. We would consider adding property flags for the memory as it is added to t

Re: [PATCH v2 04/10] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-03-01 Thread Logan Gunthorpe
On 01/03/18 04:15 PM, Bjorn Helgaas wrote: The question is what the relevant switch is. We call pci_enable_acs() on every PCI device, including Root Ports. It looks like this relies on get_upstream_bridge_port() to filter out some things. I don't think get_upstream_bridge_port() is doing the

Re: [PATCH v2 10/10] nvmet: Optionally use PCI P2P memory

2018-03-01 Thread Stephen Bates
> We don't want to lump these all together without knowing which region you're > allocating from, right? In all seriousness I do agree with you on these Keith in the long term. We would consider adding property flags for the memory as it is added to the p2p core and then the allocator could evo

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Logan Gunthorpe
On 01/03/18 04:26 PM, Benjamin Herrenschmidt wrote: The big problem is not the vmemmap, it's the linear mapping. Ah, yes, ok. Logan

Re: [PATCH v2 10/10] nvmet: Optionally use PCI P2P memory

2018-03-01 Thread Stephen Bates
> There's a meaningful difference between writing to an NVMe CMB vs PMR When the PMR spec becomes public we can discuss how best to integrate it into the P2P framework (if at all) ;-). Stephen

Re: [PATCH v2 10/10] nvmet: Optionally use PCI P2P memory

2018-03-01 Thread Logan Gunthorpe
On 01/03/18 04:49 PM, Keith Busch wrote: On Thu, Mar 01, 2018 at 11:00:51PM +, Stephen Bates wrote: P2P is about offloading the memory and PCI subsystem of the host CPU and this is achieved no matter which p2p_dev is used. Even within a device, memory attributes for its various regions

Re: [PATCH v2 10/10] nvmet: Optionally use PCI P2P memory

2018-03-01 Thread Keith Busch
On Thu, Mar 01, 2018 at 11:00:51PM +, Stephen Bates wrote: > > P2P is about offloading the memory and PCI subsystem of the host CPU > and this is achieved no matter which p2p_dev is used. Even within a device, memory attributes for its various regions may not be the same. There's a meaningfu

Re: [PATCH v2 01/10] PCI/P2PDMA: Support peer to peer memory

2018-03-01 Thread Bjorn Helgaas
On Thu, Mar 01, 2018 at 11:14:46PM +, Stephen Bates wrote: > > I'm pretty sure the spec disallows routing-to-self so doing a P2P > > transaction in that sense isn't going to work unless the device > > specifically supports it and intercepts the traffic before it gets to > > the port. > > T

Re: [PATCH v2 10/10] nvmet: Optionally use PCI P2P memory

2018-03-01 Thread Stephen Bates
> No, locality matters. If you have a bunch of NICs and bunch of drives > and the allocator chooses to put all P2P memory on a single drive your > performance will suck horribly even if all the traffic is offloaded. Sagi brought this up earlier in his comments about the _find_ function.

Re: [PATCH v2 10/10] nvmet: Optionally use PCI P2P memory

2018-03-01 Thread Logan Gunthorpe
On 01/03/18 04:20 PM, Jason Gunthorpe wrote: On Thu, Mar 01, 2018 at 11:00:51PM +, Stephen Bates wrote: No, locality matters. If you have a bunch of NICs and bunch of drives and the allocator chooses to put all P2P memory on a single drive your performance will suck horribly even if all

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Benjamin Herrenschmidt
On Thu, 2018-03-01 at 16:19 -0700, Logan Gunthorpe wrote: (Switching back to my non-IBM address ...) > On 01/03/18 04:00 PM, Benjamin Herrenschmidt wrote: > > We use only 52 in practice but yes. > > > > > That's 64PB. If you need > > > a sparse vmemmap for the entire space it will take 16T

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Benjamin Herrenschmidt
On Thu, 2018-03-01 at 16:19 -0700, Logan Gunthorpe wrote: > > On 01/03/18 04:00 PM, Benjamin Herrenschmidt wrote: > > We use only 52 in practice but yes. > > > > > That's 64PB. If you need > > > a sparse vmemmap for the entire space it will take 16TB which leaves you > > > with 63.98PB of a

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Logan Gunthorpe
On 01/03/18 04:00 PM, Benjamin Herrenschmidt wrote: We use only 52 in practice but yes. That's 64PB. If you need a sparse vmemmap for the entire space it will take 16TB which leaves you with 63.98PB of address space left. (Similar calculations for other numbers of address bits.) We on

Re: [PATCH v2 04/10] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-03-01 Thread Bjorn Helgaas
On Thu, Mar 01, 2018 at 06:54:01PM +, Stephen Bates wrote: > Thanks for the detailed review Bjorn! > > >> +Enabling this option will also disable ACS on all ports behind > >> +any PCIe switch. This effectively puts all devices behind any > >> +switch into the same IOMMU group. >

Re: [PATCH v2 01/10] PCI/P2PDMA: Support peer to peer memory

2018-03-01 Thread Stephen Bates
> I'm pretty sure the spec disallows routing-to-self so doing a P2P > transaction in that sense isn't going to work unless the device > specifically supports it and intercepts the traffic before it gets to > the port. This is correct. Unless the device intercepts the TLP before it hits the roo

Re: [PATCH v2 01/10] PCI/P2PDMA: Support peer to peer memory

2018-03-01 Thread Logan Gunthorpe
I don't think this is correct. A Root Port defines a hierarchy domain (I'm looking at PCIe r4.0, sec 1.3.1). The capability to route peer-to-peer transactions *between* hierarchy domains is optional. I think this means a Root Complex is not required to route transactions from one Root Port t

Re: [PATCH v2 01/10] PCI/P2PDMA: Support peer to peer memory

2018-03-01 Thread Bjorn Helgaas
On Thu, Mar 01, 2018 at 11:55:51AM -0700, Logan Gunthorpe wrote: > Hi Bjorn, > > Thanks for the review. I'll correct all the nits for the next version. > > On 01/03/18 10:37 AM, Bjorn Helgaas wrote: > > On Wed, Feb 28, 2018 at 04:39:57PM -0700, Logan Gunthorpe wrote: > > > Some PCI devices may ha

Re: [PATCH v2 10/10] nvmet: Optionally use PCI P2P memory

2018-03-01 Thread Stephen Bates
>> We'd prefer to have a generic way to get p2pmem instead of restricting >> ourselves to only using CMBs. We did work in the past where the P2P memory >> was part of an IB adapter and not the NVMe card. So this won't work if it's >> an NVMe only interface. > It just seems like it it makin

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Benjamin Herrenschmidt
On Thu, 2018-03-01 at 14:57 -0700, Logan Gunthorpe wrote: > > On 01/03/18 02:45 PM, Logan Gunthorpe wrote: > > It handles it fine for many situations. But when you try to map > > something that is at the end of the physical address space then the > > sparse-vmemmap needs virtual address space th

Re: [PATCH v2 10/10] nvmet: Optionally use PCI P2P memory

2018-03-01 Thread Logan Gunthorpe
On 01/03/18 03:45 PM, Jason Gunthorpe wrote: I can appreciate you might have some special use case for that, but it absolutely should require special configuration and not just magically happen. Well if driver doesn't want someone doing p2p transfers with the memory it shouldn't publish it t

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Benjamin Herrenschmidt
On Thu, 2018-03-01 at 14:31 -0800, Linus Torvalds wrote: > On Thu, Mar 1, 2018 at 2:06 PM, Benjamin Herrenschmidt > wrote: > > > > Could be that x86 has the smarts to do the right thing, still trying to > > untangle the code :-) > > Afaik, x86 will not cache PCI unless the system is misconfigur

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Linus Torvalds
On Thu, Mar 1, 2018 at 2:06 PM, Benjamin Herrenschmidt wrote: > > Could be that x86 has the smarts to do the right thing, still trying to > untangle the code :-) Afaik, x86 will not cache PCI unless the system is misconfigured, and even then it's more likely to just raise a machine check exceptio

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Benjamin Herrenschmidt
On Thu, 2018-03-01 at 13:53 -0700, Jason Gunthorpe wrote: > On Fri, Mar 02, 2018 at 07:40:15AM +1100, Benjamin Herrenschmidt wrote: > > Also we need to be able to hard block MEMREMAP_WB mappings of non-RAM > > on ppc64 (maybe via an arch hook as it might depend on the processor > > family). Server

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Logan Gunthorpe
On 01/03/18 02:45 PM, Logan Gunthorpe wrote: It handles it fine for many situations. But when you try to map something that is at the end of the physical address space then the sparse-vmemmap needs virtual address space that's the size of the physical address space divided by PAGE_SIZE which

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Logan Gunthorpe
On 01/03/18 02:37 PM, Dan Williams wrote: Ah ok, I'd need to look at the details. I had been assuming that sparse-vmemmap could handle such a situation, but that could indeed be a broken assumption. It handles it fine for many situations. But when you try to map something that is at the end

Re: [PATCH v2 04/10] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-03-01 Thread Logan Gunthorpe
On 01/03/18 02:35 PM, Jerome Glisse wrote: Note that there are use cases for P2P where IOMMU isolation matters and the traffic through the root complex isn't seen as an issue. Well, we can worry about that once we have a solution to the problem of knowing whether a root complex supports P2P at all.

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Stephen Bates
> The intention of HMM is to be useful for all device memory that wish > to have struct page for various reasons. Hi Jerome and thanks for your input! Understood. We have looked at HMM in the past and long term I definitely would like to consider how we can add P2P functionality to HMM for both

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Dan Williams
On Thu, Mar 1, 2018 at 12:34 PM, Benjamin Herrenschmidt wrote: > On Thu, 2018-03-01 at 11:21 -0800, Dan Williams wrote: >> On Wed, Feb 28, 2018 at 7:56 PM, Benjamin Herrenschmidt >> wrote: >> > On Thu, 2018-03-01 at 14:54 +1100, Benjamin Herrenschmidt wrote: >> > > On Wed, 2018-02-28 at 16:39 -07

Re: [PATCH v2 04/10] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-03-01 Thread Jerome Glisse
On Thu, Mar 01, 2018 at 09:32:20PM +, Stephen Bates wrote: > > your kernel provider needs to decide whether they favor device assignment > > or p2p > > Thanks Alex! The hardware requirements for P2P (switch, high performance EPs) > are such that we really only expect CONFIG_P2P_DMA to be en

Re: [PATCH v2 04/10] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-03-01 Thread Stephen Bates
> your kernel provider needs to decide whether they favor device assignment or > p2p Thanks Alex! The hardware requirements for P2P (switch, high performance EPs) are such that we really only expect CONFIG_P2P_DMA to be enabled in specific instances and in those instances the users have made a

Re: [PATCH v2 04/10] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-03-01 Thread Logan Gunthorpe
On 01/03/18 02:21 PM, Alex Williamson wrote: This is still a pretty terrible solution though, your kernel provider needs to decide whether they favor device assignment or p2p, because we can't do both, unless there's a patch I haven't seen yet that allows boot time rather than compile time conf

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Jerome Glisse
On Thu, Mar 01, 2018 at 02:15:01PM -0700, Logan Gunthorpe wrote: > > > On 01/03/18 02:10 PM, Jerome Glisse wrote: > > It seems people misunderstand HMM :( you do not have to use all of > > its features. If all you care about is having struct page then just > > use that for instance in your case

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Logan Gunthorpe
On 01/03/18 02:18 PM, Jerome Glisse wrote: This is pretty easy to do with HMM: unsigned long hmm_page_to_phys_pfn(struct page *page) This is not useful unless you want to go through all the kernel paths we are using and replace page_to_phys() and friends with something else that calls an HMM

Re: [PATCH v2 04/10] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-03-01 Thread Alex Williamson
On Thu, 1 Mar 2018 18:54:01 + "Stephen Bates" wrote: > Thanks for the detailed review Bjorn! > > >> > >> +Enabling this option will also disable ACS on all ports behind > >> +any PCIe switch. This effectively puts all devices behind any > >> +switch into the same IOMMU group.

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Jerome Glisse
On Thu, Mar 01, 2018 at 02:11:34PM -0700, Logan Gunthorpe wrote: > > > On 01/03/18 02:03 PM, Benjamin Herrenschmidt wrote: > > However, what happens if anything calls page_address() on them ? Some > > DMA ops do that for example, or some devices might ... > > Although we could probably work arou

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Logan Gunthorpe
On 01/03/18 02:10 PM, Jerome Glisse wrote: It seems people misunderstand HMM :( you do not have to use all of its features. If all you care about is having struct page then just use that for instance in your case only use those following 3 functions: hmm_devmem_add() or hmm_devmem_add_resour

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Logan Gunthorpe
On 01/03/18 02:03 PM, Benjamin Herrenschmidt wrote: However, what happens if anything calls page_address() on them ? Some DMA ops do that for example, or some devices might ... Although we could probably work around it with some pain, we rely on page_address() and virt_to_phys(), etc to work

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Jerome Glisse
On Thu, Mar 01, 2018 at 02:03:26PM -0700, Logan Gunthorpe wrote: > > > On 01/03/18 01:55 PM, Jerome Glisse wrote: > > Well this is again a new user of struct page for device memory just for > > one usecase. I wanted HMM to be more versatile so that it could be used > > for this kind of thing too. I g

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Benjamin Herrenschmidt
On Thu, 2018-03-01 at 11:21 -0800, Dan Williams wrote: > > > The devm_memremap_pages() infrastructure allows placing the memmap in > "System-RAM" even if the hotplugged range is in PCI space. So, even if > it is an issue on some configurations, it's just a simple adjustment > to where the memmap

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Logan Gunthorpe
On 01/03/18 01:55 PM, Jerome Glisse wrote: Well this is again a new user of struct page for device memory just for one usecase. I wanted HMM to be more versatile so that it could be used for this kind of thing too. I guess the message didn't go through. I will take some cycles tomorrow to look into

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Logan Gunthorpe
On 01/03/18 01:53 PM, Jason Gunthorpe wrote: On Fri, Mar 02, 2018 at 07:40:15AM +1100, Benjamin Herrenschmidt wrote: Also we need to be able to hard block MEMREMAP_WB mappings of non-RAM on ppc64 (maybe via an arch hook as it might depend on the processor family). Server powerpc cannot do cach

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Jerome Glisse
On Fri, Mar 02, 2018 at 07:29:55AM +1100, Benjamin Herrenschmidt wrote: > On Thu, 2018-03-01 at 11:04 -0700, Logan Gunthorpe wrote: > > > > On 28/02/18 08:56 PM, Benjamin Herrenschmidt wrote: > > > On Thu, 2018-03-01 at 14:54 +1100, Benjamin Herrenschmidt wrote: > > > > The problem is that acccord

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Logan Gunthorpe
On 01/03/18 01:29 PM, Benjamin Herrenschmidt wrote: Oliver can you look into this ? You said the memory was effectively hotplug'ed into the system when creating the struct pages. That would mean to me that it's a) mapped (which for us is cachable, maybe x86 has tricks to avoid that) and b) pote

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Benjamin Herrenschmidt
On Fri, 2018-03-02 at 07:34 +1100, Benjamin Herrenschmidt wrote: > > But what happens with that PCI memory ? Is it effectively turned into > normal memory (ie, usable for normal allocations, potentially used to > populate user pages etc...) or is it kept aside ? (What I mean is is it added to the

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Benjamin Herrenschmidt
On Thu, 2018-03-01 at 11:21 -0800, Dan Williams wrote: > On Wed, Feb 28, 2018 at 7:56 PM, Benjamin Herrenschmidt > wrote: > > On Thu, 2018-03-01 at 14:54 +1100, Benjamin Herrenschmidt wrote: > > > On Wed, 2018-02-28 at 16:39 -0700, Logan Gunthorpe wrote: > > > > Hi Everyone, > > > > > > > > > So

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Benjamin Herrenschmidt
On Thu, 2018-03-01 at 18:09 +, Stephen Bates wrote: > > > So Oliver (CC) was having issues getting any of that to work for us. > > > > > > The problem is that according to him (I didn't double check the latest > > > patches) you effectively hotplug the PCIe memory into the system when > > >

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Benjamin Herrenschmidt
On Thu, 2018-03-01 at 11:04 -0700, Logan Gunthorpe wrote: > > On 28/02/18 08:56 PM, Benjamin Herrenschmidt wrote: > > On Thu, 2018-03-01 at 14:54 +1100, Benjamin Herrenschmidt wrote: > > > The problem is that according to him (I didn't double check the latest > > > patches) you effectively hotplu

Re: [PATCH v2 03/10] PCI/P2PDMA: Add PCI p2pmem dma mappings to adjust the bus offset

2018-03-01 Thread Logan Gunthorpe
On 01/03/18 10:49 AM, Bjorn Helgaas wrote: +int pci_p2pdma_map_sg(struct device *dev, struct scatterlist *sg, int nents, + enum dma_data_direction dir) Same question as before about why the mixture of "pci_*" interfaces that take "struct device *" parameters. In this cas

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Logan Gunthorpe
On 01/03/18 03:31 AM, Sagi Grimberg wrote: * We also reject using devices that employ 'dma_virt_ops' which should fairly simply handle Jason's concerns that this work might break with the HFI, QIB and rxe drivers that use the virtual ops to implement their own special DMA operations.

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Logan Gunthorpe
On 01/03/18 12:21 PM, Dan Williams wrote: Note: I think the above means it won't work behind a switch on x86 either, will it ? The devm_memremap_pages() infrastructure allows placing the memmap in "System-RAM" even if the hotplugged range is in PCI space. So, even if it is an issue on some co

Re: [PATCH v2 10/10] nvmet: Optionally use PCI P2P memory

2018-03-01 Thread Logan Gunthorpe
On 01/03/18 11:42 AM, Jason Gunthorpe wrote: On Thu, Mar 01, 2018 at 08:35:55PM +0200, Sagi Grimberg wrote: This is also why I don't entirely understand why this series has a generic allocator for p2p mem, it makes little sense to me. Why wouldn't the nmve driver just claim the entire CMB of

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Dan Williams
On Wed, Feb 28, 2018 at 7:56 PM, Benjamin Herrenschmidt wrote: > On Thu, 2018-03-01 at 14:54 +1100, Benjamin Herrenschmidt wrote: >> On Wed, 2018-02-28 at 16:39 -0700, Logan Gunthorpe wrote: >> > Hi Everyone, >> >> >> So Oliver (CC) was having issues getting any of that to work for us. >> >> The p

Re: [PATCH v2 04/10] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-03-01 Thread Logan Gunthorpe
On 01/03/18 11:02 AM, Bjorn Helgaas wrote: void pci_enable_acs(struct pci_dev *dev) { + if (pci_p2pdma_disable_acs(dev)) + return; This doesn't read naturally to me. I do see that when CONFIG_PCI_P2PDMA is not set, pci_p2pdma_disable_acs() does nothing and returns 0,

Re: [PATCH v2 10/10] nvmet: Optionally use PCI P2P memory

2018-03-01 Thread Logan Gunthorpe
Wouldn't it all be simpler if the p2p_dev resolution would be private to the namespace? So this means all the namespaces in a subsystem must comply to using p2p? Seems a little bit harsh if it's not absolutely needed. Would be nice to export a subsystem between two ports (on two HCAs, acros

Re: [PATCH v2 10/10] nvmet: Optionally use PCI P2P memory

2018-03-01 Thread Stephen Bates
> I agree, I don't think this series should target anything other than > using p2p memory located in one of the devices expected to participate > in the p2p transaction for a first pass.. I disagree. There is definitely interest in using an NVMe CMB as a bounce buffer and in deploying systems

Re: [PATCH v2 01/10] PCI/P2PDMA: Support peer to peer memory

2018-03-01 Thread Logan Gunthorpe
Hi Bjorn, Thanks for the review. I'll correct all the nits for the next version. On 01/03/18 10:37 AM, Bjorn Helgaas wrote: On Wed, Feb 28, 2018 at 04:39:57PM -0700, Logan Gunthorpe wrote: Some PCI devices may have memory mapped in a BAR space that's intended for use in Peer-to-Peer transactio

Re: [PATCH v2 04/10] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-03-01 Thread Stephen Bates
Thanks for the detailed review Bjorn! >> >> + Enabling this option will also disable ACS on all ports behind >> + any PCIe switch. This effectively puts all devices behind any >> + switch into the same IOMMU group. > > Does this really mean "all devices behind the same Root Port

Re: [PATCH v2 10/10] nvmet: Optionally use PCI P2P memory

2018-03-01 Thread Sagi Grimberg
On 01/03/18 04:03 AM, Sagi Grimberg wrote: Can you describe what would be the plan to have it when these devices do come along? I'd say that p2p_dev needs to become a nvmet_ns reference and not from nvmet_ctrl. Then, when cmb capable devices come along, the ns can prefer to use its own cmb inst

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Stephen Bates
>> So Oliver (CC) was having issues getting any of that to work for us. >> >> The problem is that according to him (I didn't double check the latest >> patches) you effectively hotplug the PCIe memory into the system when >> creating struct pages. >> >> This cannot possibly work for us. First we

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Logan Gunthorpe
On 28/02/18 08:56 PM, Benjamin Herrenschmidt wrote: On Thu, 2018-03-01 at 14:54 +1100, Benjamin Herrenschmidt wrote: The problem is that according to him (I didn't double check the latest patches) you effectively hotplug the PCIe memory into the system when creating struct pages. This cannot

Re: [PATCH v2 04/10] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-03-01 Thread Bjorn Helgaas
On Wed, Feb 28, 2018 at 04:40:00PM -0700, Logan Gunthorpe wrote: > For peer-to-peer transactions to work the downstream ports in each > switch must not have the ACS flags set. At this time there is no way > to dynamically change the flags and update the corresponding IOMMU > groups so this is done
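
For reference, disabling the ACS peer-to-peer redirect machinery on a port is a small config-space update; a hedged sketch using the standard ACS capability registers (exactly which control bits the patch clears is an assumption here):

  static void p2pdma_disable_acs_redirect(struct pci_dev *pdev)
  {
          int pos;
          u16 ctrl;

          pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS);
          if (!pos)
                  return;

          pci_read_config_word(pdev, pos + PCI_ACS_CTRL, &ctrl);
          /* let requests/completions route directly between peers */
          ctrl &= ~(PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC);
          pci_write_config_word(pdev, pos + PCI_ACS_CTRL, ctrl);
  }

The cost discussed elsewhere in the thread is that every device behind the switch then lands in one IOMMU group, which conflicts with fine-grained device assignment.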

Re: [PATCH v2 03/10] PCI/P2PDMA: Add PCI p2pmem dma mappings to adjust the bus offset

2018-03-01 Thread Bjorn Helgaas
On Wed, Feb 28, 2018 at 04:39:59PM -0700, Logan Gunthorpe wrote: > The DMA address used when mapping PCI P2P memory must be the PCI bus > address. Thus, introduce pci_p2pmem_[un]map_sg() to map the correct > addresses when using P2P memory. > > For this, we assume that an SGL passed to these funct
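
The essence of the mapping helper, sketched from this description (the v2 function name is used; p2pdma_page_bus_addr() is a hypothetical stand-in for "look up the PCI bus address recorded when this BAR was registered as p2pmem"):

  int pci_p2pdma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
                        enum dma_data_direction dir)
  {
          struct scatterlist *s;
          int i;

          for_each_sg(sg, s, nents, i) {
                  /*
                   * The address programmed into the peer must be the PCI
                   * bus address of the BAR, not the CPU physical address,
                   * so translate directly instead of calling dma_map_sg().
                   */
                  s->dma_address = p2pdma_page_bus_addr(sg_page(s)) + s->offset;
                  sg_dma_len(s) = s->length;
          }

          return nents;
  }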

Re: [PATCH v2 02/10] PCI/P2PDMA: Add sysfs group to display p2pmem stats

2018-03-01 Thread Bjorn Helgaas
On Wed, Feb 28, 2018 at 04:39:58PM -0700, Logan Gunthorpe wrote: > Attributes display the total amount of P2P memory, the amount available > and whether it is published or not. Can you add enough text here to make the body of the changelog complete in itself? That might mean just repeating the su
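
Such stats are ordinarily exposed as a read-only attribute group on the PCI device; a generic sketch of the shape (attribute names come from the patch subject, while the pci_p2pmem_size() accessor is hypothetical):

  static ssize_t size_show(struct device *dev, struct device_attribute *attr,
                           char *buf)
  {
          struct pci_dev *pdev = to_pci_dev(dev);

          return snprintf(buf, PAGE_SIZE, "%zu\n", pci_p2pmem_size(pdev));
  }
  static DEVICE_ATTR_RO(size);

  static struct attribute *p2pmem_attrs[] = {
          &dev_attr_size.attr,
          /* "available" and "published" follow the same pattern */
          NULL,
  };

  static const struct attribute_group p2pmem_group = {
          .attrs = p2pmem_attrs,
          .name  = "p2pmem",
  };

Bjorn's ordering comment in the same thread is about registering this group only after pdev->p2pdma has been set up, since the show() methods dereference it.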

Re: [PATCH v2 10/10] nvmet: Optionally use PCI P2P memory

2018-03-01 Thread Logan Gunthorpe
On 01/03/18 04:03 AM, Sagi Grimberg wrote: Can you describe what would be the plan to have it when these devices do come along? I'd say that p2p_dev needs to become a nvmet_ns reference and not from nvmet_ctrl. Then, when cmb capable devices come along, the ns can prefer to use its own cmb inst

Re: [PATCH v2 01/10] PCI/P2PDMA: Support peer to peer memory

2018-03-01 Thread Bjorn Helgaas
s/peer to peer/peer-to-peer/ to match text below and in spec. On Wed, Feb 28, 2018 at 04:39:57PM -0700, Logan Gunthorpe wrote: > Some PCI devices may have memory mapped in a BAR space that's > intended for use in Peer-to-Peer transactions. In order to enable > such transactions the memory must be
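
For orientation, the driver-facing API this patch introduces looks roughly like the following (names follow the series; argument shapes are a sketch of the v2 proposal and changed before the feature was eventually merged):

  /* provider side: a device exposing part of a BAR as p2pmem */
  error = pci_p2pdma_add_resource(pdev, bar, size, offset);
  pci_p2pmem_publish(pdev, true);         /* advertise to other users */

  /* consumer side: find a provider topologically close to the clients,
   * then carve allocations out of its genalloc-backed pool */
  p2p_dev = pci_p2pmem_find(&client_list);
  buf = pci_alloc_p2pmem(p2p_dev, length);
  ...
  pci_free_p2pmem(p2p_dev, buf, length);

The backing pages are ZONE_DEVICE struct pages created via devm_memremap_pages(), which is what the vmemmap and linear-mapping discussion elsewhere in this thread is about.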

Re: [PATCH v2 06/10] IB/core: Add optional PCI P2P flag to rdma_rw_ctx_[init|destroy]()

2018-03-01 Thread Logan Gunthorpe
Hey Sagi, Thanks for the review! On 01/03/18 03:32 AM, Sagi Grimberg wrote: int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num, struct scatterlist *sg, u32 sg_cnt, u32 sg_offset, - u64 remote_addr, u32 rkey, enum dma_data_direction dir) + u64

Re: [PATCH v2 10/10] nvmet: Optionally use PCI P2P memory

2018-03-01 Thread Stephen Bates
> > Ideally, we'd want to use an NVME CMB buffer as p2p memory. This would > > save an extra PCI transfer as the NVME card could just take the data > > out of it's own memory. However, at this time, cards with CMB buffers > > don't seem to be available. > Can you describe what would be the plan to

Re: [PATCH v2 08/10] nvme-pci: Add support for P2P memory in requests

2018-03-01 Thread Stephen Bates
> Any plans adding the capability to nvme-rdma? Should be > straight-forward... In theory, the use-case would be rdma backend > fabric behind. Shouldn't be hard to test either... Nice idea Sagi. Yes we have been starting to look at that. Though again we would probably want to impose the "attached

Re: [PATCH v2 05/10] block: Introduce PCI P2P flags for request and request queue

2018-03-01 Thread Sagi Grimberg
Looks fine, Reviewed-by: Sagi Grimberg

Re: [PATCH v2 08/10] nvme-pci: Add support for P2P memory in requests

2018-03-01 Thread Sagi Grimberg
For P2P requests we must use the pci_p2pmem_[un]map_sg() functions instead of the dma_map_sg functions. With that, we can then indicate PCI_P2P support in the request queue. For this, we create an NVME_F_PCI_P2P flag which tells the core to set QUEUE_FLAG_PCI_P2P in the request queue. This lo
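
The dispatch described above reduces to choosing the mapping routine per request; a sketch (the v2 helper name pci_p2pmem_map_sg() is used, and blk_rq_is_p2p() is a hypothetical stand-in for however the request is recognised as targeting P2P memory):

  if (blk_rq_is_p2p(req))
          nr_mapped = pci_p2pmem_map_sg(dev->dev, iod->sg, iod->nents, dma_dir);
  else
          nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents, dma_dir,
                                       DMA_ATTR_NO_WARN);

The NVME_F_PCI_P2P flag mentioned in the changelog is how the driver tells the core it can handle such requests, so that QUEUE_FLAG_PCI_P2P gets set on the request queue.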

Re: [PATCH v2 09/10] nvme-pci: Add a quirk for a pseudo CMB

2018-03-01 Thread Sagi Grimberg
Looks fine, Reviewed-by: Sagi Grimberg

Re: [PATCH v2 10/10] nvmet: Optionally use PCI P2P memory

2018-03-01 Thread Sagi Grimberg
We create a configfs attribute in each nvme-fabrics target port to enable p2p memory use. When enabled, the port will only then use the p2p memory if a p2p memory device can be found which is behind the same switch as the RDMA port and all the block devices in use. If the user enabled it and no d

Re: [PATCH v2 06/10] IB/core: Add optional PCI P2P flag to rdma_rw_ctx_[init|destroy]()

2018-03-01 Thread Sagi Grimberg
On 03/01/2018 01:40 AM, Logan Gunthorpe wrote: In order to use PCI P2P memory pci_p2pmem_[un]map_sg() functions must be called to map the correct DMA address. To do this, we add a flags variable and the RDMA_RW_CTX_FLAG_PCI_P2P flag. When the flag is specified use the appropriate map function.
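
From the caller's point of view the change is an extra flags argument; a sketch of an nvmet-rdma style call site (the flag name comes from the patch, the p2p condition is abstracted):

  unsigned int flags = 0;

  if (sg_points_at_p2pmem)        /* e.g. the port allocated a p2p buffer */
          flags |= RDMA_RW_CTX_FLAG_PCI_P2P;

  ret = rdma_rw_ctx_init(&rsp->rw, qp, port_num, sgl, sg_cnt, 0,
                         remote_addr, rkey, DMA_FROM_DEVICE, flags);

Internally the flag selects pci_p2pmem_map_sg() instead of the regular DMA mapping of the scatterlist.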

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Sagi Grimberg
Hi Everyone, Hi Logan, Here's v2 of our series to introduce P2P based copy offload to NVMe fabrics. This version has been rebased onto v4.16-rc3 which already includes Christoph's devpagemap work the previous version was based off as well as a couple of the cleanup patches that were in v1.