RE: [PATCH v4 0/8] Support for Open-Channel SSDs

2015-06-08 Thread Stephen Bates
). Tested-by: Stephen Bates stephen.ba...@pmcs.com Cheers Stephen -Original Message- From: Matias Bjørling [mailto:m...@bjorling.me] Sent: Friday, June 5, 2015 6:54 AM To: h...@infradead.org; ax...@fb.com; linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-n

Re: [PATCH 0/3] iopmem : A block device for PCIe memory

2016-11-06 Thread Stephen Bates
On Tue, October 25, 2016 3:19 pm, Dave Chinner wrote: > On Tue, Oct 25, 2016 at 05:50:43AM -0600, Stephen Bates wrote: >> >> Dave are you saying that even for local mappings of files on a DAX >> capable system it is possible for the mappings to move on you unless the &

Re: [PATCHSET v2] block: IO polling improvements

2016-11-07 Thread Stephen Bates
> Fixed a few bugs in this, and addressed some review comments. Patches > are against my 4.10 block branch, for-4.10/block. Jens Thanks for proposing this. Looks very cool. I will try and get you a review and some testing this week... Cheers Stephen

Re: [PATCH 0/3] iopmem : A block device for PCIe memory

2016-10-19 Thread Stephen Bates
> >> > >> If you're only using the block-device as a entry-point to create > >> dax-mappings then a device-dax (drivers/dax/) character-device might > >> be a better fit. > >> > > > > We chose a block device because we felt it was intuitive for users to > > carve up a memory region but putting a

Re: [PATCH 1/3] memremap.c : Add support for ZONE_DEVICE IO memory with struct pages.

2016-10-25 Thread Stephen Bates
On Wed, Oct 19, 2016 at 01:01:06PM -0700, Dan Williams wrote: > >> > >> In the cover letter, "[PATCH 0/3] iopmem : A block device for PCIe > >> memory", it mentions that the lack of I/O coherency is a known issue > >> and users of this functionality need to be cognizant of the pitfalls. > >> If

Re: [PATCH 0/3] iopmem : A block device for PCIe memory

2016-10-25 Thread Stephen Bates
Hi Dave and Christoph On Fri, Oct 21, 2016 at 10:12:53PM +1100, Dave Chinner wrote: > On Fri, Oct 21, 2016 at 02:57:14AM -0700, Christoph Hellwig wrote: > > On Fri, Oct 21, 2016 at 10:22:39AM +1100, Dave Chinner wrote: > > > You do realise that local filesystems can silently change the > > >

Re: [PATCH 0/3] iopmem : A block device for PCIe memory

2016-10-19 Thread Stephen Bates
On Tue, Oct 18, 2016 at 08:51:15PM -0700, Dan Williams wrote: > [ adding Ashok and David for potential iommu comments ] > Hi Dan Thanks for adding Ashok and David! > > I agree with the motivation and the need for a solution, but I have > some questions about this implementation. > > > > >

Re: [PATCH 1/3] memremap.c : Add support for ZONE_DEVICE IO memory with struct pages.

2016-10-19 Thread Stephen Bates
On Wed, Oct 19, 2016 at 10:50:25AM -0700, Dan Williams wrote: > On Tue, Oct 18, 2016 at 2:42 PM, Stephen Bates <sba...@raithlin.com> wrote: > > From: Logan Gunthorpe <log...@deltatee.com> > > > > We build on recent work that adds memory regions owned by a d

[PATCH 3/3] iopmem : Add documentation for iopmem driver

2016-10-18 Thread Stephen Bates
Add documentation for the iopmem PCIe device driver. Signed-off-by: Stephen Bates <sba...@raithlin.com> Signed-off-by: Logan Gunthorpe <log...@deltatee.com> --- Documentation/blockdev/00-INDEX | 2 ++ Documentation/blockdev/iopmem.txt | 62

[PATCH 1/3] memremap.c : Add support for ZONE_DEVICE IO memory with struct pages.

2016-10-18 Thread Stephen Bates
https://lists.01.org/pipermail/linux-nvdimm/2015-August/001810.html [2] https://lists.01.org/pipermail/linux-nvdimm/2015-October/002387.html Signed-off-by: Stephen Bates <sba...@raithlin.com> Signed-off-by: Logan Gunthorpe <log...@deltatee.com> --- drivers/dax/pmem.c| 4

[PATCH 0/3] iopmem : A block device for PCIe memory

2016-10-18 Thread Stephen Bates
IO memory with struct pages. Stephen Bates (2): iopmem : Add a block device driver for PCIe attached IO memory. iopmem : Add documentation for iopmem driver Documentation/blockdev/00-INDEX | 2 + Documentation/blockdev/iopmem.txt | 62 +++ MAINTAINERS | 7

[PATCH 2/3] iopmem : Add a block device driver for PCIe attached IO memory.

2016-10-18 Thread Stephen Bates
Add a new block device driver that binds to PCIe devices and turns PCIe BARs into DAX capable block devices. Signed-off-by: Stephen Bates <sba...@raithlin.com> Signed-off-by: Logan Gunthorpe <log...@deltatee.com> --- MAINTAINERS| 7 ++ drivers/block/Kconfig | 27 +

[PATCH 0/2] nvme: Improvements in sysfs entry for NVMe CMBs

2016-12-16 Thread Stephen Bates
Hi This series adds some more verbosity to the NVMe CMB sysfs entry. Jens I based this off v4.9 because for some reason your for-4.10/block is missing my original CMB commit (202021c1a63c6)? Stephen Stephen Bates (2): nvme : Use correct scnprintf in cmb show nvme: improve cmb sysfs

[PATCH 1/2] nvme : Use correct scnprintf in cmb show

2016-12-16 Thread Stephen Bates
Make sure we are using the correct scnprintf in the sysfs show function for the CMB. Signed-off-by: Stephen Bates <sba...@raithlin.com> --- drivers/nvme/host/pci.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
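For context, a minimal sketch of the pattern the patch title refers to, under the assumption of a generic sysfs attribute (the function and field names below are illustrative, not the actual nvme-pci code): a sysfs show callback must return the number of bytes actually written into the buffer, which is what scnprintf() provides and snprintf() does not.

    #include <linux/device.h>
    #include <linux/kernel.h>

    /* Illustrative sketch only -- not the real nvme-pci cmb show function. */
    static ssize_t cmb_show(struct device *dev, struct device_attribute *attr,
                            char *buf)
    {
            u32 cmbloc = 0, cmbsz = 0;      /* placeholders for register values */

            /*
             * scnprintf() returns the number of bytes written into buf
             * (excluding the trailing NUL), which is the value a sysfs show
             * function must return; snprintf() instead returns the length
             * that *would* have been written, which can exceed PAGE_SIZE.
             */
            return scnprintf(buf, PAGE_SIZE, "cmbloc : 0x%08x\ncmbsz  : 0x%08x\n",
                             cmbloc, cmbsz);
    }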

Re: [PATCH 0/2] nvme: Improvements in sysfs entry for NVMe CMBs

2016-12-16 Thread Stephen Bates
>> Jens I based this off v4.9 because for some reason your for-4.10/block >> is missing my original CMB commit (202021c1a63c6)? > > for-4.10/block was forked off v4.9-rc1, and that patch didn't make it in > until v4.9-rc2. Since for-4.10/block has been merged, any patches for this > series or next

[PATCH 2/2] nvme: improve cmb sysfs reporting

2016-12-16 Thread Stephen Bates
Add more information to the NVMe CMB sysfs entry. This includes information about the CMB size, location and capabilities. Signed-off-by: Stephen Bates <sba...@raithlin.com> --- drivers/nvme/host/pci.c | 31 +-- include/linux/nvme.h| 8 2 files c

[LSF/MM TOPIC][LSF/MM ATTEND] Enabling Peer-to-Peer DMAs between PCIe devices

2016-12-12 Thread Stephen Bates
Hi I'd like to discuss the topic of how best to enable DMAs between PCIe devices in the Linux kernel. There have been many attempts to add to the kernel the ability to DMA between two PCIe devices. However, to date, none of these have been accepted. Yet as PCIe devices like NICs, NVMe SSDs

Re: Enabling peer to peer device transactions for PCIe devices

2017-01-11 Thread Stephen Bates
On Fri, January 6, 2017 4:10 pm, Logan Gunthorpe wrote: > > > On 06/01/17 11:26 AM, Jason Gunthorpe wrote: > > >> Make a generic API for all of this and you'd have my vote.. >> >> >> IMHO, you must support basic pinning semantics - that is necessary to >> support generic short lived DMA (eg

Re: [PATCH 0/2] nvme: Improvements in sysfs entry for NVMe CMBs

2017-01-09 Thread Stephen Bates
> > I have added 1/2, since that one is a no-brainer. For 2/2, not so sure. > Generally we try to avoid having sysfs file that aren't single value > output. That isn't a super hard rule, but it is preferable. > > -- > Jens Axboe > Thanks Jens and sorry for the delay (extended vacation). Thanks

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-04 Thread Stephen Bates
>> >> The NVMe fabrics stuff could probably make use of this. It's an >> in-kernel system to allow remote access to an NVMe device over RDMA. So >> they ought to be able to optimize their transfers by DMAing directly to >> the NVMe's CMB -- no userspace interface would be required but there >>

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-04 Thread Stephen Bates
Hi All This has been a great thread (thanks to Alex for kicking it off) and I wanted to jump in and maybe try and put some summary around the discussion. I also wanted to propose we include this as a topic for LSF/MM because I think we need more discussion on the best way to add this

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-06 Thread Stephen Bates
>>> I've already recommended that iopmem not be a block device and >>> instead be a device-dax instance. I also don't think it should claim >>> the PCI ID, rather the driver that wants to map one of its bars this >>> way can register the memory region with the device-dax core. >>> >>> I'm not sure

Re: [PATCH 2/2] nvme: improve cmb sysfs reporting

2017-01-09 Thread Stephen Bates
> Minor nit below > > >> + >> +for (i = NVME_CMB_CAP_SQS; i <= NVME_CMB_CAP_WDS; i++) >> > I'd prefer seeing (i = 0; i < ARRAY_SIZE(..); i++) because it provides > automatic bounds checking against future code. > Thanks Jon, I will take a look at doing this in a V1. Stephen
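A short sketch of the loop style Jon is suggesting, using a hypothetical capability-name table (the table contents are illustrative, not the patch's): bounding the loop with ARRAY_SIZE() ties it to the table definition, so later edits to the table cannot walk past the end of the array.

    #include <linux/kernel.h>
    #include <linux/printk.h>

    /* Hypothetical table for illustration only. */
    static const char * const cmb_cap_name[] = {
            "sqs", "cqs", "lists", "rds", "wds",
    };

    static void print_cmb_caps(void)
    {
            int i;

            /* ARRAY_SIZE() keeps the loop bound in sync with the table. */
            for (i = 0; i < ARRAY_SIZE(cmb_cap_name); i++)
                    pr_info("CMB capability: %s\n", cmb_cap_name[i]);
    }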

Re: [RFC 6/8] nvmet: Be careful about using iomem accesses when dealing with p2pmem

2017-04-07 Thread Stephen Bates
On 2017-04-06, 6:33 AM, "Sagi Grimberg" wrote: > Say it's connected via 2 legs, the bar is accessed from leg A and the > data from the disk comes via leg B. In this case, the data is heading > towards the p2p device via leg B (might be congested), the completion > goes directly

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-20 Thread Stephen Bates
> Yes, this makes sense I think we really just want to distinguish host > memory or not in terms of the dev_pagemap type. I would like to see mutually exclusive flags for host memory (or not) and persistence (or not). Stephen

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-20 Thread Stephen Bates
>> Yes, this makes sense I think we really just want to distinguish host >> memory or not in terms of the dev_pagemap type. > >> I would like to see mutually exclusive flags for host memory (or not) and >> persistence (or not). >> > > Why persistence? It has zero meaning to the mm. I like the

Kernel Oops: BUG: unable to handle kernel NULL pointer dereference at 0000000000000050; IP is at blk_mq_poll+0xa0/0x2e0

2017-04-16 Thread Stephen Bates
Hi All As part of my testing of IO polling [1] I am seeing a NULL pointer dereference oops that seems to have been introduced in the preparation for 4.11. The kernel oops output is below and this seems to be due to blk_mq_tag_to_rq returning NULL in blk_mq_poll in blk-mq.c. I have not had a

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-25 Thread Stephen Bates
> My first reflex when reading this thread was to think that this whole domain > lends it self excellently to testing via Qemu. Could it be that doing this in > the opposite direction might be a safer approach in the long run even though > (significant) more work up-front? While the idea of

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-25 Thread Stephen Bates
>> Yes, that's why I used 'significant'. One good thing is that given resources >> it can easily be done in parallel with other development, and will give >> additional >> insight of some form. > >Yup, well if someone wants to start working on an emulated RDMA device >that actually simulates

Re: [PATCH 4/7] alpha: provide ioread64 and iowrite64 implementations

2017-06-22 Thread Stephen Bates
> +#define iowrite64be(v,p) iowrite32(cpu_to_be64(v), (p)) Logan, thanks for taking this cleanup on. I think this should be iowrite64 not iowrite32? Stephen
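The point being raised is that the 64-bit big-endian helper must forward to the 64-bit accessor; calling iowrite32() silently drops the upper 32 bits of the value. A sketch of the corrected fallback definition, assuming it mirrors the style of the existing 32-bit helpers:

    /* Corrected form being suggested (sketch): use the 64-bit accessor. */
    #define iowrite64be(v, p)   iowrite64(cpu_to_be64(v), (p))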

Re: [PATCH] blk-mq: Improvements to the hybrid polling sleep time calculation

2017-08-29 Thread Stephen Bates
>> From: Stephen Bates <sba...@raithlin.com> >> >> Hybrid polling currently uses half the average completion time as an >> estimate of how long to poll for. We can improve upon this by noting >> that polling before the minimum completion time makes no sense. A
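To make the idea concrete, a hedged sketch of the reasoning only (the actual patch's calculation may differ): if the block layer tracks both the mean and the minimum observed completion time, the hybrid-polling sleep should never be shorter than that minimum, since no completion can arrive before it.

    #include <linux/kernel.h>

    /* Illustrative only -- not the patch's exact formula. */
    static u64 hybrid_poll_sleep_ns(u64 mean_ns, u64 min_ns)
    {
            /* Sleep for at least the shortest completion seen so far,
             * otherwise fall back to half the mean completion time. */
            return max(min_ns, mean_ns / 2);
    }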

Re: [PATCH] genalloc: Make the avail variable an atomic64_t

2017-10-25 Thread Stephen Bates
> I found that genalloc is very slow for large chunk sizes because > bitmap_find_next_zero_area has to grind through that entire bitmap. > Hence, I recommend avoiding genalloc for large chunk sizes. Thanks for the feedback Daniel! We have been doing 16GiB without any noticeable issues. > I'm

Re: [PATCH v2] genalloc: Make the avail variable an atomic_long_t

2017-10-29 Thread Stephen Bates
> Do we still need #include ? For me, it compiles without it. Yes we do. Kbuild reported a failure when I tried omitting it (arm-multi_v7_defconfig). > Reviewed-by: Daniel Mentz danielme...@google.com Thanks for the review. Andrew, can you look at picking this up or do you want me to respin

Re: [PATCH] pci: Add a acs_disable option for pci kernel parameter

2017-10-29 Thread Stephen Bates
>> This patch adds a new boot option to the pci kernel parameter called >> "acs_disable" that will disable ACS. This is useful for PCI peer to >> peer communication but can cause problems when IOVA isolation is >> required and an IOMMU is enabled. Use with care. > Eww. Thanks for the feedback

Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-05-08 Thread Stephen Bates
Hi Don >Well, p2p DMA is a function of a cooperating 'agent' somewhere above the two >devices. >That agent should 'request' to the kernel that ACS be removed/circumvented > (p2p enabled) btwn two endpoints. >I recommend doing so via a sysfs method. Yes we looked at something like this

Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-05-08 Thread Stephen Bates
Hi Jerome >I think there is confusion here, Alex properly explained the scheme > PCIE-device do a ATS request to the IOMMU which returns a valid >translation for a virtual address. Device can then use that address >directly without going through IOMMU for translation. This makes

Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-05-08 Thread Stephen Bates
>Yeah, so based on the discussion I'm leaning toward just having a >command line option that takes a list of BDFs and disables ACS for them. >(Essentially as Dan has suggested.) This avoids the shotgun. I concur that this seems to be where the conversation is taking us. @Alex -

Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-05-08 Thread Stephen Bates
Hi Alex >But it would be a much easier proposal to disable ACS when the IOMMU is >not enabled, ACS has no real purpose in that case. I guess one issue I have with this is that it disables IOMMU groups for all Root Ports and not just the one(s) we wish to do p2pdma on. >The

Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-05-09 Thread Stephen Bates
Christian >Interesting point, give me a moment to check that. That finally makes >all the hardware I have standing around here valuable :) Yes. At the very least it provides an initial standards based path for P2P DMAs across RPs which is something we have discussed on this list in

Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-05-09 Thread Stephen Bates
Hi Don >RDMA VFs lend themselves to NVMEoF w/device-assignment need a way to >put NVME 'resources' into an assignable/manageable object for > 'IOMMU-grouping', >which is really a 'DMA security domain' and less an 'IOMMU grouping > domain'. Ha, I like your term "DMA Security

Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-05-09 Thread Stephen Bates
Hi Logan >Yeah, I'm having a hard time coming up with an easy enough solution for >the user. I agree with Dan though, the bus renumbering risk would be >fairly low in the custom hardware seeing the switches are likely going >to be directly soldered to the same board with the CPU.

Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-05-09 Thread Stephen Bates
Jerome and Christian > I think there is confusion here, Alex properly explained the scheme > PCIE-device do a ATS request to the IOMMU which returns a valid > translation for a virtual address. Device can then use that address > directly without going through IOMMU for translation. So I went

Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-05-09 Thread Stephen Bates
Hi Alex and Don >Correct, the VM has no concept of the host's IOMMU groups, only the > hypervisor knows about the groups, But as I understand it these groups are usually passed through to VMs on a per-group basis by the hypervisor? So IOMMU group 1 might be passed to VM A and IOMMU

Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-05-09 Thread Stephen Bates
Hi Jerome > Now inside that page table you can point GPU virtual address > to use GPU memory or use system memory. Those system memory entry can > also be mark as ATS against a given PASID. Thanks. This all makes sense. But do you have examples of this in a kernel driver (if so can you

Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-05-10 Thread Stephen Bates
Hi Christian > Why would a switch not identify that as a peer address? We use the PASID >together with ATS to identify the address space which a transaction >should use. I think you are conflating two types of TLPs here. If the device supports ATS then it will issue a TR TLP to obtain

Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-05-10 Thread Stephen Bates
Hi Jerome > As it is tie to PASID this is done using IOMMU so looks for caller > of amd_iommu_bind_pasid() or intel_svm_bind_mm() in GPU the existing > user is the AMD GPU driver see: Ah thanks. This cleared things up for me. A quick search shows there are still no users of

Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-05-10 Thread Stephen Bates
> Not to me. In the p2pdma code we specifically program DMA engines with > the PCI bus address. Ah yes of course. Brain fart on my part. We are not programming the P2PDMA initiator with an IOVA but with the PCI bus address... > So regardless of whether we are using the IOMMU or > not, the

Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-05-10 Thread Stephen Bates
Hi Jerome >Note on GPU we do would not rely on ATS for peer to peer. Some part >of the GPU (DMA engines) do not necessarily support ATS. Yet those >are the part likely to be use in peer to peer. OK this is good to know. I agree the DMA engine is probably one of the GPU components

Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-05-10 Thread Stephen Bates
Hi Jerome >Hopes this helps understanding the big picture. I over simplify thing and >devils is in the details. This was a great primer thanks for putting it together. An LWN.net article perhaps ;-)?? Stephen

Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-05-11 Thread Stephen Bates
>I find this hard to believe. There's always the possibility that some >part of the system doesn't support ACS so if the PCI bus addresses and >IOVA overlap there's a good chance that P2P and ATS won't work at all on >some hardware. I tend to agree but this comes down to how

Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-05-11 Thread Stephen Bates
All > Alex (or anyone else) can you point to where IOVA addresses are generated? A case of RTFM perhaps (though a pointer to the code would still be appreciated). https://www.kernel.org/doc/Documentation/Intel-IOMMU.txt Some exceptions to IOVA --- Interrupt ranges are not

Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-05-08 Thread Stephen Bates
Hi Christian > AMD APUs mandatory need the ACS flag set for the GPU integrated in the > CPU when IOMMU is enabled or otherwise you will break SVM. OK but in this case aren't you losing (many of) the benefits of P2P since all DMAs will now get routed up to the IOMMU before being passed

Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-05-08 Thread Stephen Bates
Hi Dan >It seems unwieldy that this is a compile time option and not a runtime >option. Can't we have a kernel command line option to opt-in to this >behavior rather than require a wholly separate kernel image? I think because of the security implications associated with p2pdma and

Re: [PATCH] genalloc: Make the avail variable an atomic64_t

2017-10-26 Thread Stephen Bates
> We have atomic_long_t for that. Please use it instead. It will be > 64-bit on 64-bit archs, and 32-bit on 32-bit archs, which seems to > fit your purpose here. Thanks you Mathieu! Yes atomic_long_t looks perfect for this and addresses Daniel’s concerns for 32 bit systems. I’ll prepare a v2
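A minimal sketch of the change under discussion (the struct and helper names are illustrative, not the real lib/genalloc.c layout): tracking the available byte count in an atomic_long_t keeps the counter 64-bit on 64-bit architectures while staying natively atomic on 32-bit ones.

    #include <linux/atomic.h>
    #include <linux/types.h>

    /* Illustrative stand-in for the per-chunk bookkeeping. */
    struct example_chunk {
            atomic_long_t avail;            /* bytes still free in this chunk */
    };

    static inline void example_chunk_take(struct example_chunk *c, size_t nbytes)
    {
            atomic_long_sub(nbytes, &c->avail);
    }

    static inline long example_chunk_avail(struct example_chunk *c)
    {
            return atomic_long_read(&c->avail);
    }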

Re: [PATCH] nvme-pci: Fix incorrect use of CMB size to calculate q_depth

2018-02-06 Thread Stephen Bates
> On Feb 6, 2018, at 8:02 AM, Keith Busch wrote: > >> On Mon, Feb 05, 2018 at 03:32:23PM -0700, sba...@raithlin.com wrote: >> >> -if (dev->cmb && (dev->cmbsz & NVME_CMBSZ_SQS)) { >> +if (dev->cmb && use_cmb_sqes && (dev->cmbsz & NVME_CMBSZ_SQS)) { > > Is this a

Re: [PATCH v2 10/10] nvmet: Optionally use PCI P2P memory

2018-03-01 Thread Stephen Bates
> > Ideally, we'd want to use an NVME CMB buffer as p2p memory. This would > > save an extra PCI transfer as the NVME card could just take the data > > out of it's own memory. However, at this time, cards with CMB buffers > > don't seem to be available. > Can you describe what would be the plan

Re: [PATCH v2 08/10] nvme-pci: Add support for P2P memory in requests

2018-03-01 Thread Stephen Bates
> Any plans adding the capability to nvme-rdma? Should be > straight-forward... In theory, the use-case would be rdma backend > fabric behind. Shouldn't be hard to test either... Nice idea Sagi. Yes we have been starting to look at that. Though again we would probably want to impose the

Re: [PATCH v3 01/11] PCI/P2PDMA: Support peer-to-peer memory

2018-03-13 Thread Stephen Bates
Hi Sinan >If hardware doesn't support it, blacklisting should have been the right >path and I still think that you should remove all switch business from the > code. >I did not hear enough justification for having a switch requirement >for P2P. We disagree. As does the

Re: [PATCH v3 01/11] PCI/P2PDMA: Support peer-to-peer memory

2018-03-13 Thread Stephen Bates
>> It sounds like you have very tight hardware expectations for this to work >> at this moment. You also don't want to generalize this code for others and >> address the shortcomings. > No, that's the way the community has pushed this work Hi Sinan Thanks for all the input. As Logan has pointed

Re: [PATCH v3 01/11] PCI/P2PDMA: Support peer-to-peer memory

2018-03-14 Thread Stephen Bates
>I assume you want to exclude Root Ports because of multi-function > devices and the "route to self" error. I was hoping for a reference > to that so I could learn more about it. Apologies Bjorn. This slipped through my net. I will try and get you a reference for RTS in the next couple of

Re: [PATCH v3 01/11] PCI/P2PDMA: Support peer-to-peer memory

2018-03-14 Thread Stephen Bates
> P2P over PCI/PCI-X is quite common in devices like raid controllers. Hi Dan Do you mean between PCIe devices below the RAID controller? Isn't it pretty novel to be able to support PCIe EPs below a RAID controller (as opposed to SCSI based devices)? > It would be useful if those

Re: [PATCH v3 01/11] PCI/P2PDMA: Support peer-to-peer memory

2018-04-13 Thread Stephen Bates
> I'll see if I can get our PCI SIG people to follow this through Hi Jonathan Can you let me know if this moves forward within PCI-SIG? I would like to track it. I can see this being doable between Root Ports that reside in the same Root Complex but might become more challenging to

Re: [PATCH v3 01/11] PCI/P2PDMA: Support peer-to-peer memory

2018-03-24 Thread Stephen Bates
> That would be very nice but many devices do not support the internal > route. But Logan in the NVMe case we are discussing movement within a single function (i.e. from a NVMe namespace to a NVMe CMB on the same function). Bjorn is discussing movement between two functions (PFs or VFs) in the

Re: [PATCH v3 01/11] PCI/P2PDMA: Support peer-to-peer memory

2018-03-22 Thread Stephen Bates
> I've seen the response that peers directly below a Root Port could not > DMA to each other through the Root Port because of the "route to self" > issue, and I'm not disputing that. Bjorn You asked me for a reference to RTS in the PCIe specification. As luck would have it I ended up in an

Re: [PATCH v2 10/10] nvmet: Optionally use PCI P2P memory

2018-03-01 Thread Stephen Bates
>> We'd prefer to have a generic way to get p2pmem instead of restricting >> ourselves to only using CMBs. We did work in the past where the P2P memory >> was part of an IB adapter and not the NVMe card. So this won't work if it's >> an NVMe only interface. > It just seems like it it

Re: [PATCH v2 04/10] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-03-01 Thread Stephen Bates
> your kernel provider needs to decide whether they favor device assignment or > p2p Thanks Alex! The hardware requirements for P2P (switch, high performance EPs) are such that we really only expect CONFIG_P2P_DMA to be enabled in specific instances and in those instances the users have made a

Re: [PATCH v2 01/10] PCI/P2PDMA: Support peer to peer memory

2018-03-01 Thread Stephen Bates
> I'm pretty sure the spec disallows routing-to-self so doing a P2P > transaction in that sense isn't going to work unless the device > specifically supports it and intercepts the traffic before it gets to > the port. This is correct. Unless the device intercepts the TLP before it hits the

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Stephen Bates
> The intention of HMM is to be useful for all device memory that wish > to have struct page for various reasons. Hi Jerome and thanks for your input! Understood. We have looked at HMM in the past and long term I definitely would like to consider how we can add P2P functionality to HMM for

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-05 Thread Stephen Bates
>Yes i need to document that some more in hmm.txt... Hi Jerome, thanks for the explanation. Can I suggest you update hmm.txt with what you sent out? > I am about to send RFC for nouveau, i am still working out some bugs. Great. I will keep an eye out for it. An example user of hmm will

Re: [PATCH v2 10/10] nvmet: Optionally use PCI P2P memory

2018-03-01 Thread Stephen Bates
> I agree, I don't think this series should target anything other than > using p2p memory located in one of the devices expected to participate > in the p2p transaction for a first pass.. I disagree. There is definitely interest in using a NVMe CMB as a bounce buffer and in deploying

Re: [PATCH v2 04/10] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-03-01 Thread Stephen Bates
Thanks for the detailed review Bjorn! >> >> + Enabling this option will also disable ACS on all ports behind >> + any PCIe switch. This effectively puts all devices behind any >> + switch into the same IOMMU group. > > Does this really mean "all devices behind the same Root

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-01 Thread Stephen Bates
>> So Oliver (CC) was having issues getting any of that to work for us. >> >> The problem is that according to him (I didn't double check the latest >> patches) you effectively hotplug the PCIe memory into the system when >> creating struct pages. >> >> This cannot possibly work for us. First

Re: [PATCH v2 10/10] nvmet: Optionally use PCI P2P memory

2018-03-01 Thread Stephen Bates
> We don't want to lump these all together without knowing which region you're > allocating from, right? In all seriousness I do agree with you on these Keith in the long term. We would consider adding property flags for the memory as it is added to the p2p core and then the allocator could

Re: [PATCH v2 10/10] nvmet: Optionally use PCI P2P memory

2018-03-01 Thread Stephen Bates
> There's a meaningful difference between writing to an NVMe CMB vs PMR When the PMR spec becomes public we can discuss how best to integrate it into the P2P framework (if at all) ;-). Stephen

Re: [PATCH v2 10/10] nvmet: Optionally use PCI P2P memory

2018-03-01 Thread Stephen Bates
> No, locality matters. If you have a bunch of NICs and bunch of drives > and the allocator chooses to put all P2P memory on a single drive your > performance will suck horribly even if all the traffic is offloaded. Sagi brought this up earlier in his comments about the _find_ function.

Re: [PATCH v2 10/10] nvmet: Optionally use PCI P2P memory

2018-03-02 Thread Stephen Bates
>http://nvmexpress.org/wp-content/uploads/NVM-Express-1.3-Ratified-TPs.zip @Keith - my apologies. @Christoph - thanks for the link. So my understanding of when the technical content surrounding new NVMe Technical Proposals (TPs) can be discussed was wrong. I thought the TP content could only be discussed

Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

2018-03-02 Thread Stephen Bates
> It seems people miss-understand HMM :( Hi Jerome Your unhappy face emoticon made me sad so I went off to (re)read up on HMM. Along the way I came up with a couple of things. While hmm.txt is really nice to read it makes no mention of DEVICE_PRIVATE and DEVICE_PUBLIC. It also gives no

Re: lib/genalloc

2018-11-01 Thread Stephen Bates
>I use gen_pool_first_fit_align() as pool allocation algorithm allocating >buffers with requested alignment. But if a chunk base address is not >aligned to the requested alignment(from some reason), the returned >address is not aligned too. Alexey Can you try using
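For reference, a hypothetical setup sketch (not code from the thread) showing how gen_pool_first_fit_align() is typically wired up. As Alexey observes, the algorithm aligns offsets relative to the chunk base, so the base address handed to gen_pool_add() must itself be aligned for the returned address to be aligned.

    #include <linux/genalloc.h>
    #include <linux/mm.h>
    #include <linux/numa.h>
    #include <linux/sizes.h>

    /* Kept static because gen_pool_set_algo() stores a pointer to it. */
    static struct genpool_data_align align_data = { .align = SZ_4K };

    /* Illustrative only: build a pool that hands out 4 KiB-aligned buffers. */
    static unsigned long example_alloc(unsigned long base, size_t pool_size)
    {
            struct gen_pool *pool;

            pool = gen_pool_create(PAGE_SHIFT, NUMA_NO_NODE);
            if (!pool)
                    return 0;

            /* Select the aligned first-fit allocation algorithm. */
            gen_pool_set_algo(pool, gen_pool_first_fit_align, &align_data);

            /* base must be SZ_4K aligned for the guarantee to hold. */
            if (gen_pool_add(pool, base, pool_size, NUMA_NO_NODE)) {
                    gen_pool_destroy(pool);
                    return 0;
            }

            return gen_pool_alloc(pool, SZ_64K);    /* aligned to SZ_4K */
    }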

Re: Linux RDMA mini-conf at Plumbers 2018

2018-10-01 Thread Stephen Bates
Hi Jason and Leon > This year we expect to have close to a day set aside for RDMA related > topics. Including up to half a day for the thorny general kernel issues > related to get_user_pages(), particularly as exasperated by RDMA. Looks like a great set of topics. > RDMA and PCI peer

Re: [PATCH 5/5] RISC-V: Implement sparsemem

2018-10-11 Thread Stephen Bates
Palmer > I don't really know anything about this, but you're welcome to add a > >Reviewed-by: Palmer Dabbelt Thanks. I think it would be good to get someone who's familiar with linux/mm to take a look. > if you think it'll help. I'm assuming you're targeting a different tree for
