Re: Enabling peer to peer device transactions for PCIe devices

2017-10-26 Thread Petrosyan, Ludwig
;, "Paul Blinzer" <paul.blin...@amd.com>, > "Christian Koenig" <christian.koe...@amd.com>, > "Suravee Suthikulpanit" <suravee.suthikulpa...@amd.com>, "Ben Sander" > <ben.san...@amd.com> > Sent: Tuesday, 24 October, 2017 16:58:24 > Subject: RE: Enabling peer to peer device transactions for PCIe devices > Please don't top post, write shorter lines, and add the odd blank line. > Big blocks of text are hard to read quickly. > OK this time I am very short. peer2peer works Ludwig

RE: Enabling peer to peer device transactions for PCIe devices

2017-10-24 Thread David Laight
Please don't top post, write shorter lines, and add the odd blank line. Big blocks of text are hard to read quickly. > From: Petrosyan, Ludwig [mailto:ludwig.petros...@desy.de] > Yes, I agree it has to be started with the write transaction; according to the > PCIe standard, all write > transactions are

Re: Enabling peer to peer device transactions for PCIe devices

2017-10-23 Thread Petrosyan, Ludwig

Re: Enabling peer to peer device transactions for PCIe devices

2017-10-23 Thread Logan Gunthorpe
On 23/10/17 10:08 AM, David Laight wrote: It is also worth checking that the hardware actually supports p2p transfers. Writes are more likely to be supported than reads. ISTR that some Intel CPUs support some p2p writes, but there could easily be errata against them. Ludwig mentioned a PCIe

RE: Enabling peer to peer device transactions for PCIe devices

2017-10-23 Thread David Laight
From: Petrosyan Ludwig > Sent: 22 October 2017 07:14 > Could be what I have done is stupid... > But at first sight it has to be simple: > The PCIe write transactions are address routed, so if in the packet header > the other endpoint's address > is written, the TLP has to be routed (by the PCIe > switch to the

Re: Enabling peer to peer device transactions for PCIe devices

2017-10-22 Thread Logan Gunthorpe
On 22/10/17 12:13 AM, Petrosyan, Ludwig wrote: > But at first sight it has to be simple: > The PCIe write transactions are address routed, so if in the packet header > the other endpoint's address is written, the TLP has to be routed (by the PCIe > switch) to the endpoint; the DMA reading from the end
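
To make the routing argument concrete: once a driver knows the bus address of the peer endpoint's BAR, a posted write is just an ordinary DMA aimed at that address. A minimal sketch, assuming a hypothetical my_dma_write() engine helper (the pci_* accessors are real kernel APIs); with an IOMMU enabled the address would first need a mapping rather than being used raw:

    #include <linux/pci.h>

    /* Hypothetical, engine-specific submit helper - not a kernel API. */
    extern int my_dma_write(struct pci_dev *dma_dev, pci_bus_addr_t dst,
                            size_t len);

    static int p2p_write_to_peer_bar(struct pci_dev *dma_dev,
                                     struct pci_dev *peer, int bar,
                                     size_t len)
    {
            /* Address of the peer BAR as seen on the PCIe fabric;
               this is what ends up in the TLP header and lets the
               switch route the write to the other endpoint. */
            pci_bus_addr_t target = pci_bus_address(peer, bar);

            if (!target || len > pci_resource_len(peer, bar))
                    return -EINVAL;

            return my_dma_write(dma_dev, target, len);
    }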

Re: Enabling peer to peer device transactions for PCIe devices

2017-10-22 Thread Petrosyan, Ludwig
itch, Serguei" <serguei.sagalovi...@amd.com>, "Blinzer, Paul" <paul.blin...@amd.com>, "Koenig, Christian" <christian.koe...@amd.com>, "Suthikulpanit, Suravee" <suravee.suthikulpa...@amd.com>, "Sander, Ben" <ben.san...@amd.

Re: Enabling peer to peer device transactions for PCIe devices

2017-10-20 Thread Logan Gunthorpe
Hi Ludwig, P2P transactions are still *very* experimental at the moment and take a lot of expertise to get working in a general setup. It will definitely require changes to the kernel, including the drivers of all the devices you are trying to make talk to each other. If you're up for it you

Re: Enabling peer to peer device transactions for PCIe devices

2017-10-20 Thread Ludwig Petrosyan
Dear Linux kernel group, my name is Ludwig Petrosyan. I work at DESY (Germany); we are responsible for the control system of all accelerators at DESY. For 7-8 years we have been using MTCA.4 systems, with PCIe as the central bus. I am mostly responsible for the Linux drivers of the

Re: Enabling peer to peer device transactions for PCIe devices

2017-01-13 Thread Christian König
On 12.01.2017 at 16:11, Jerome Glisse wrote: On Wed, Jan 11, 2017 at 10:54:39PM -0600, Stephen Bates wrote: On Fri, January 6, 2017 4:10 pm, Logan Gunthorpe wrote: On 06/01/17 11:26 AM, Jason Gunthorpe wrote: Make a generic API for all of this and you'd have my vote.. IMHO, you must

Re: Enabling peer to peer device transactions for PCIe devices

2017-01-12 Thread Logan Gunthorpe
On 11/01/17 09:54 PM, Stephen Bates wrote: > The iopmem patchset addressed all the use cases above, and while it is not > an in-kernel API it could have been modified to be one reasonably easily. > As Logan states, the driver can then choose to pass the VMAs to user-space > in a manner that makes

Re: Enabling peer to peer device transactions for PCIe devices

2017-01-12 Thread Jason Gunthorpe
On Thu, Jan 12, 2017 at 10:11:29AM -0500, Jerome Glisse wrote: > On Wed, Jan 11, 2017 at 10:54:39PM -0600, Stephen Bates wrote: > > > What we want is for RDMA, O_DIRECT, etc to just work with special VMAs > > > (ie. at least those backed with ZONE_DEVICE memory). Then > > > GPU/NVME/DAX/whatever

Re: Enabling peer to peer device transactions for PCIe devices

2017-01-12 Thread Jerome Glisse
On Wed, Jan 11, 2017 at 10:54:39PM -0600, Stephen Bates wrote: > On Fri, January 6, 2017 4:10 pm, Logan Gunthorpe wrote: > > > > > > On 06/01/17 11:26 AM, Jason Gunthorpe wrote: > > > > > >> Make a generic API for all of this and you'd have my vote.. > >> > >> > >> IMHO, you must support basic

Re: Enabling peer to peer device transactions for PCIe devices

2017-01-11 Thread Stephen Bates
On Fri, January 6, 2017 4:10 pm, Logan Gunthorpe wrote: > > > On 06/01/17 11:26 AM, Jason Gunthorpe wrote: > > >> Make a generic API for all of this and you'd have my vote.. >> >> >> IMHO, you must support basic pinning semantics - that is necessary to >> support generic short lived DMA (eg

Re: Enabling peer to peer device transactions for PCIe devices

2017-01-06 Thread Logan Gunthorpe
On 06/01/17 11:26 AM, Jason Gunthorpe wrote: > Make a generic API for all of this and you'd have my vote.. > > IMHO, you must support basic pinning semantics - that is necessary to > support generic short lived DMA (eg filesystem, etc). That hardware > can clearly do that if it can support

RE: Enabling peer to peer device transactions for PCIe devices

2017-01-06 Thread Deucher, Alexander
Subject: Re: Enabling peer to peer device transactions for PCIe devices > > On Fri, Jan 06, 2017 at 12:37:22PM -0500, Jerome Glisse wrote: > > On Fri, Jan 06, 2017 at 11:56:30AM -0500, Serguei Sagalovitch wrote: > > > On 2017-01-05 08:58 PM, Jerome Glisse wrote:

Re: Enabling peer to peer device transactions for PCIe devices

2017-01-06 Thread Jason Gunthorpe
On Fri, Jan 06, 2017 at 12:37:22PM -0500, Jerome Glisse wrote: > On Fri, Jan 06, 2017 at 11:56:30AM -0500, Serguei Sagalovitch wrote: > > On 2017-01-05 08:58 PM, Jerome Glisse wrote: > > > On Thu, Jan 05, 2017 at 05:30:34PM -0700, Jason Gunthorpe wrote: > > > > On Thu, Jan 05, 2017 at 06:23:52PM

Re: Enabling peer to peer device transactions for PCIe devices

2017-01-06 Thread Jerome Glisse
On Fri, Jan 06, 2017 at 11:56:30AM -0500, Serguei Sagalovitch wrote: > On 2017-01-05 08:58 PM, Jerome Glisse wrote: > > On Thu, Jan 05, 2017 at 05:30:34PM -0700, Jason Gunthorpe wrote: > > > On Thu, Jan 05, 2017 at 06:23:52PM -0500, Jerome Glisse wrote: > > > > > > > > I still don't understand

Re: Enabling peer to peer device transactions for PCIe devices

2017-01-06 Thread Serguei Sagalovitch
On 2017-01-05 08:58 PM, Jerome Glisse wrote: On Thu, Jan 05, 2017 at 05:30:34PM -0700, Jason Gunthorpe wrote: On Thu, Jan 05, 2017 at 06:23:52PM -0500, Jerome Glisse wrote: I still don't understand what you're driving at - you've said in both cases a user VMA exists. In the former case no,

Re: Enabling peer to peer device transactions for PCIe devices

2017-01-06 Thread Henrique Almeida
Hello, I've been watching this thread not as a kernel developer, but as a user interested in doing peer-to-peer access between a network card and a GPU. I believe that merging raw direct access with VMAs overcomplicates things for our use case. We'll have a very large camera streaming data at high

Re: Enabling peer to peer device transactions for PCIe devices

2017-01-05 Thread Jerome Glisse
On Thu, Jan 05, 2017 at 05:30:34PM -0700, Jason Gunthorpe wrote: > On Thu, Jan 05, 2017 at 06:23:52PM -0500, Jerome Glisse wrote: > > > > I still don't understand what you're driving at - you've said in both > > > cases a user VMA exists. > > > > In the former case no, there is no VMA directly but

Re: Enabling peer to peer device transactions for PCIe devices

2017-01-05 Thread Serguei Sagalovitch
On 2017-01-05 07:30 PM, Jason Gunthorpe wrote: but I am opposed to the idea we need two API paths that the *driver* has to figure out. That is fundamentally not what I want as a driver developer. Give me a common API to convert '__user *' to a scatter list and pin the pages.
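
For reference, the shape of the "common API" being asked for - pin a user buffer and hand back a scatterlist - might look roughly as follows. The primitives (get_user_pages_fast, sg_alloc_table_from_pages) are real, but this wrapper itself is a hypothetical sketch, and gup's signature has varied across kernel versions:

    #include <linux/mm.h>
    #include <linux/scatterlist.h>
    #include <linux/slab.h>

    static int pin_user_buffer_to_sg(unsigned long uaddr, size_t len,
                                     bool write, struct sg_table *sgt,
                                     struct page ***pages_out)
    {
            unsigned long npages = (PAGE_ALIGN(uaddr + len) -
                                    (uaddr & PAGE_MASK)) >> PAGE_SHIFT;
            struct page **pages;
            long pinned = 0;
            int ret;

            pages = kcalloc(npages, sizeof(*pages), GFP_KERNEL);
            if (!pages)
                    return -ENOMEM;

            /* Pin the user pages so the DMA target cannot move. */
            pinned = get_user_pages_fast(uaddr, npages,
                                         write ? FOLL_WRITE : 0, pages);
            if (pinned != npages) {
                    ret = pinned < 0 ? pinned : -EFAULT;
                    goto err_put;
            }

            /* Build a scatterlist covering the pinned range. */
            ret = sg_alloc_table_from_pages(sgt, pages, npages,
                                            offset_in_page(uaddr), len,
                                            GFP_KERNEL);
            if (ret)
                    goto err_put;

            *pages_out = pages;     /* caller put_page()s when done */
            return 0;

    err_put:
            while (pinned > 0)
                    put_page(pages[--pinned]);
            kfree(pages);
            return ret;
    }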

Re: Enabling peer to peer device transactions for PCIe devices

2017-01-05 Thread Jason Gunthorpe
On Thu, Jan 05, 2017 at 06:23:52PM -0500, Jerome Glisse wrote: > > I still don't understand what you're driving at - you've said in both > > cases a user VMA exists. > > In the former case no, there is no VMA directly, but if you want one then > a device can provide one. But such a VMA is useless as

Re: Enabling peer to peer device transactions for PCIe devices

2017-01-05 Thread Jerome Glisse
On Thu, Jan 05, 2017 at 03:42:15PM -0700, Jason Gunthorpe wrote: > On Thu, Jan 05, 2017 at 03:19:36PM -0500, Jerome Glisse wrote: > > > > Always having a VMA changes the discussion - the question is how to > > > create a VMA that represents IO device memory, and how do DMA > > > consumers

Re: Enabling peer to peer device transactions for PCIe devices

2017-01-05 Thread Jason Gunthorpe
On Thu, Jan 05, 2017 at 03:19:36PM -0500, Jerome Glisse wrote: > > Always having a VMA changes the discussion - the question is how to > > create a VMA that represents IO device memory, and how do DMA > > consumers extract the correct information from that VMA to pass to the > > kernel DMA API

Re: Enabling peer to peer device transactions for PCIe devices

2017-01-05 Thread Jerome Glisse
On Thu, Jan 05, 2017 at 01:07:19PM -0700, Jason Gunthorpe wrote: > On Thu, Jan 05, 2017 at 02:54:24PM -0500, Jerome Glisse wrote: > > > Mellanox and NVidia support peer to peer with what they market as > > GPUDirect. It only works without an IOMMU. It is probably not upstream: > > > >

Re: Enabling peer to peer device transactions for PCIe devices

2017-01-05 Thread Jason Gunthorpe
On Thu, Jan 05, 2017 at 02:54:24PM -0500, Jerome Glisse wrote: > Mellanox and NVidia support peer to peer with what they market as > GPUDirect. It only works without an IOMMU. It is probably not upstream: > > https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg21402.html > > I thought it

Re: Enabling peer to peer device transactions for PCIe devices

2017-01-05 Thread Jerome Glisse
On Thu, Jan 05, 2017 at 12:01:13PM -0700, Jason Gunthorpe wrote: > On Thu, Jan 05, 2017 at 01:39:29PM -0500, Jerome Glisse wrote: > > > 1) peer-to-peer because of a userspace-specific API like NVidia GPU > > Direct (AMD is pushing its own similar API, I just can't remember the > > marketing

Re: Enabling peer to peer device transactions for PCIe devices

2017-01-05 Thread Jason Gunthorpe
On Thu, Jan 05, 2017 at 01:39:29PM -0500, Jerome Glisse wrote: > 1) peer-to-peer because of a userspace-specific API like NVidia GPU > Direct (AMD is pushing its own similar API, I just can't remember the > marketing name). This does not happen through a VMA; this happens > through

Re: Enabling peer to peer device transactions for PCIe devices

2017-01-05 Thread Jerome Glisse
Sorry to revive this thread, but it fell through my filters and I missed it. I have been going through it, and I think the discussion has been hindered by the fact that distinct problems were merged when they should be addressed separately. First, for peer-to-peer we need to be clear on how this happens.

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-06 Thread Dan Williams
On Tue, Dec 6, 2016 at 1:47 PM, Logan Gunthorpe wrote: > Hey, > >> Okay, so clearly this needs a kernel-side NVMe-specific allocator >> and locking so users don't step on each other.. > > Yup, ideally. That's why device dax isn't ideal for this application: it > doesn't

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-06 Thread Logan Gunthorpe
Hey, > Okay, so clearly this needs a kernel-side NVMe-specific allocator > and locking so users don't step on each other.. Yup, ideally. That's why device dax isn't ideal for this application: it doesn't provide any way to prevent users from stepping on each other. > Or as Christoph says some
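
A kernel-side allocator for a region like the CMB doesn't have to be elaborate; a genalloc pool over the BAR range would cover the "users stepping on each other" problem. Sketch only - the gen_pool_* helpers are real kernel APIs, the wrapper is hypothetical:

    #include <linux/genalloc.h>

    static struct gen_pool *cmb_pool_create(struct device *dev,
                                            unsigned long cmb_addr,
                                            size_t cmb_size)
    {
            /* Page-granular pool over the controller memory buffer. */
            struct gen_pool *pool =
                    gen_pool_create(PAGE_SHIFT, dev_to_node(dev));

            if (!pool)
                    return NULL;
            if (gen_pool_add(pool, cmb_addr, cmb_size, -1)) {
                    gen_pool_destroy(pool);
                    return NULL;
            }
            /* Users allocate with gen_pool_alloc() and return chunks
               with gen_pool_free(), so they cannot collide. */
            return pool;
    }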

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-06 Thread Jason Gunthorpe
On Tue, Dec 06, 2016 at 09:51:15AM -0700, Logan Gunthorpe wrote: > Hey, > > On 06/12/16 09:38 AM, Jason Gunthorpe wrote: > >>> I'm not opposed to mapping /dev/nvmeX. However, the lookup is trivial > >>> to accomplish in sysfs through /sys/dev/char to find the sysfs path of the > >>> device-dax

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-06 Thread Christoph Hellwig
On Tue, Dec 06, 2016 at 09:38:50AM -0700, Jason Gunthorpe wrote: > > > I'm not opposed to mapping /dev/nvmeX. However, the lookup is trivial > > > to accomplish in sysfs through /sys/dev/char to find the sysfs path of the > > > device-dax instance under the nvme device, or if you already have the

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-06 Thread Logan Gunthorpe
Hey, On 06/12/16 09:38 AM, Jason Gunthorpe wrote: >>> I'm not opposed to mapping /dev/nvmeX. However, the lookup is trivial >>> to accomplish in sysfs through /sys/dev/char to find the sysfs path of the >>> device-dax instance under the nvme device, or if you already have the nvme >>> sysfs path

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-06 Thread Jason Gunthorpe
> > I'm not opposed to mapping /dev/nvmeX. However, the lookup is trivial > > to accomplish in sysfs through /sys/dev/char to find the sysfs path of the > > device-dax instance under the nvme device, or if you already have the nvme > > sysfs path the dax instance(s) will appear under the "dax"
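
The /sys/dev/char lookup mentioned here is a one-liner from userspace: stat the char device, then read the /sys/dev/char/<major>:<minor> symlink. A small illustrative program (the device argument, e.g. /dev/nvme0, is an example):

    #include <limits.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/sysmacros.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
            struct stat st;
            char link[PATH_MAX], target[PATH_MAX];
            ssize_t n;

            if (argc != 2 || stat(argv[1], &st) != 0)
                    return 1;

            snprintf(link, sizeof(link), "/sys/dev/char/%u:%u",
                     major(st.st_rdev), minor(st.st_rdev));
            n = readlink(link, target, sizeof(target) - 1);
            if (n < 0)
                    return 1;
            target[n] = '\0';
            /* Prints a path relative to /sys/dev/char, leading into
               the device's sysfs hierarchy. */
            printf("%s\n", target);
            return 0;
    }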

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-06 Thread Stephen Bates
>>> I've already recommended that iopmem not be a block device and >>> instead be a device-dax instance. I also don't think it should claim >>> the PCI ID, rather the driver that wants to map one of its bars this >>> way can register the memory region with the device-dax core. >>> >>> I'm not sure

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-05 Thread Christoph Hellwig
On Mon, Dec 05, 2016 at 12:46:14PM -0700, Jason Gunthorpe wrote: > In any event the allocator still needs to track which regions are in > use and be able to hook 'free' from userspace. That does suggest it > should be integrated into the nvme driver and not a bolt-on driver.. Two totally

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-05 Thread Logan Gunthorpe
On 05/12/16 12:46 PM, Jason Gunthorpe wrote: NVMe might have to deal with pci-e hot-unplug, which is a similar problem-class to the GPU case.. Sure, but if the NVMe device gets hot-unplugged it means that all the CMB mappings are useless and need to be torn down. This probably means

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-05 Thread Jason Gunthorpe
On Mon, Dec 05, 2016 at 12:27:20PM -0700, Logan Gunthorpe wrote: > > > On 05/12/16 12:14 PM, Jason Gunthorpe wrote: > >But CMB sounds much more like the GPU case where there is a > >specialized allocator handing out the BAR to consumers, so I'm not > >sure a general purpose chardev makes a lot

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-05 Thread Logan Gunthorpe
On 05/12/16 12:14 PM, Jason Gunthorpe wrote: But CMB sounds much more like the GPU case where there is a specialized allocator handing out the BAR to consumers, so I'm not sure a general purpose chardev makes a lot of sense? I don't think it will ever need to be as complicated as the GPU

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-05 Thread Jason Gunthorpe
On Mon, Dec 05, 2016 at 10:48:58AM -0800, Dan Williams wrote: > On Mon, Dec 5, 2016 at 10:39 AM, Logan Gunthorpe wrote: > > On 05/12/16 11:08 AM, Dan Williams wrote: > >> > >> I've already recommended that iopmem not be a block device and instead > >> be a device-dax

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-05 Thread Dan Williams
On Mon, Dec 5, 2016 at 10:39 AM, Logan Gunthorpe wrote: > On 05/12/16 11:08 AM, Dan Williams wrote: >> >> I've already recommended that iopmem not be a block device and instead >> be a device-dax instance. I also don't think it should claim the PCI >> ID, rather the driver

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-05 Thread Logan Gunthorpe
On 05/12/16 11:08 AM, Dan Williams wrote: I've already recommended that iopmem not be a block device and instead be a device-dax instance. I also don't think it should claim the PCI ID, rather the driver that wants to map one of its bars this way can register the memory region with the

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-05 Thread Dan Williams
On Mon, Dec 5, 2016 at 10:02 AM, Jason Gunthorpe wrote: > On Mon, Dec 05, 2016 at 09:40:38AM -0800, Dan Williams wrote: > >> > If it is kernel only with physical addresess we don't need a uAPI for >> > it, so I'm not sure #1 is at all related to iopmem. >> > >> >

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-05 Thread Jason Gunthorpe
On Mon, Dec 05, 2016 at 09:40:38AM -0800, Dan Williams wrote: > > If it is kernel only with physical addresess we don't need a uAPI for > > it, so I'm not sure #1 is at all related to iopmem. > > > > Most people who want #1 probably can just mmap > > /sys/../pci/../resourceX to get a user handle
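
That "just mmap resourceX" path looks like the following from userspace. The BDF in the path is an example; mapping resourceX generally requires root and a mappable BAR:

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
            /* Example device; substitute the real bus/device/function. */
            const char *path = "/sys/bus/pci/devices/0000:01:00.0/resource0";
            size_t len = 4096;
            volatile uint32_t *bar;
            int fd = open(path, O_RDWR | O_SYNC);

            if (fd < 0) {
                    perror("open");
                    return 1;
            }
            bar = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
            if (bar == MAP_FAILED) {
                    perror("mmap");
                    close(fd);
                    return 1;
            }
            printf("first dword of BAR0: 0x%08x\n", bar[0]);
            munmap((void *)bar, len);
            close(fd);
            return 0;
    }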

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-05 Thread Dan Williams
On Mon, Dec 5, 2016 at 9:18 AM, Jason Gunthorpe wrote: > On Sun, Dec 04, 2016 at 07:23:00AM -0600, Stephen Bates wrote: >> Hi All >> >> This has been a great thread (thanks to Alex for kicking it off) and I >> wanted to jump in and maybe try and put some summary

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-05 Thread Jason Gunthorpe
On Sun, Dec 04, 2016 at 07:23:00AM -0600, Stephen Bates wrote: > Hi All > > This has been a great thread (thanks to Alex for kicking it off) and I > wanted to jump in and maybe try and put some summary around the > discussion. I also wanted to propose we include this as a topic for LFS/MM >

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-04 Thread Stephen Bates
Hi All This has been a great thread (thanks to Alex for kicking it off) and I wanted to jump in and maybe try and put some summary around the discussion. I also wanted to propose we include this as a topic for LFS/MM because I think we need more discussion on the best way to add this

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-04 Thread Haggai Eran
On 11/30/2016 6:23 PM, Jason Gunthorpe wrote: >> and O_DIRECT operations that access GPU memory. > This goes through user space so there is still a VMA.. > >> Also, HMM's migration between two GPUs could use peer to peer in the >> kernel, although that is intended to be handled by the GPU driver

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-04 Thread Stephen Bates
>> >> The NVMe fabrics stuff could probably make use of this. It's an >> in-kernel system to allow remote access to an NVMe device over RDMA. So >> they ought to be able to optimize their transfers by DMAing directly to >> the NVMe's CMB -- no userspace interface would be required but there >>

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-04 Thread Haggai Eran
On 11/30/2016 8:01 PM, Logan Gunthorpe wrote: > > > On 30/11/16 09:23 AM, Jason Gunthorpe wrote: >>> Two cases I can think of are RDMA access to an NVMe device's controller >>> memory buffer, >> >> I'm not sure on the use model there.. > > The NVMe fabrics stuff could probably make use of this.

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-04 Thread Haggai Eran
On 11/30/2016 7:28 PM, Serguei Sagalovitch wrote: > On 2016-11-30 11:23 AM, Jason Gunthorpe wrote: >>> Yes, that sounds fine. Can we simply kill the process from the GPU driver? >>> Or do we need to extend the OOM killer to manage GPU pages? >> I don't know.. > We could use send_sig_info to send

RE: Enabling peer to peer device transactions for PCIe devices

2016-11-30 Thread Deucher, Alexander

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-30 Thread Logan Gunthorpe
On 30/11/16 09:23 AM, Jason Gunthorpe wrote: >> Two cases I can think of are RDMA access to an NVMe device's controller >> memory buffer, > > I'm not sure on the use model there.. The NVMe fabrics stuff could probably make use of this. It's an in-kernel system to allow remote access to an NVMe

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-30 Thread Serguei Sagalovitch
On 2016-11-30 11:23 AM, Jason Gunthorpe wrote: Yes, that sounds fine. Can we simply kill the process from the GPU driver? Or do we need to extend the OOM killer to manage GPU pages? I don't know.. We could use send_sig_info to send a signal from the kernel to user space. So theoretically the GPU
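
send_sig_info() is indeed the usual way to do that from a driver; a sketch of what the GPU driver could call when it decides to shoot down a process. The helper is hypothetical, and the siginfo type and init pattern have changed names across kernel versions:

    #include <linux/sched/signal.h>
    #include <linux/signal.h>

    static void gpu_kill_offender(struct task_struct *task)
    {
            struct kernel_siginfo info;

            clear_siginfo(&info);
            info.si_signo = SIGKILL;
            info.si_code = SI_KERNEL;

            /* Deliver SIGKILL to the task whose GPU pages we
               can no longer honor. */
            send_sig_info(SIGKILL, &info, task);
    }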

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-30 Thread Jason Gunthorpe
On Wed, Nov 30, 2016 at 12:45:58PM +0200, Haggai Eran wrote: > > That just forces applications to handle horrible unexpected > > failures. If this sort of thing is needed for correctness then OOM > > kill the offending process, don't corrupt its operation. > Yes, that sounds fine. Can we simply

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-30 Thread Haggai Eran
On 11/28/2016 9:02 PM, Jason Gunthorpe wrote: > On Mon, Nov 28, 2016 at 06:19:40PM +0000, Haggai Eran wrote: GPU memory. We create a non-ODP MR pointing to VRAM but rely on user-space and the GPU not to migrate it. If they do, the MR gets destroyed immediately. >>> That sounds

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-28 Thread Jason Gunthorpe
On Mon, Nov 28, 2016 at 04:55:23PM -0500, Serguei Sagalovitch wrote: > >We haven't touched this in a long time and perhaps it changed, but there > >definitely was a callback in the PeerDirect API to allow the GPU to > >invalidate the mapping. That's what we don't want. > I assume that you are

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-28 Thread Serguei Sagalovitch
On 2016-11-28 04:36 PM, Logan Gunthorpe wrote: On 28/11/16 12:35 PM, Serguei Sagalovitch wrote: As soon as the PeerDirect mapping is called, the GPU must not "move" such memory. That is by PeerDirect design. It is similar to how it works with system memory and an RDMA MR: when "get_user_pages" is

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-28 Thread Logan Gunthorpe
On 28/11/16 12:35 PM, Serguei Sagalovitch wrote: > As soon as the PeerDirect mapping is called, the GPU must not "move" > such memory. That is by PeerDirect design. It is similar to how it works > with system memory and an RDMA MR: when "get_user_pages" is called the > memory is pinned. We

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-28 Thread Haggai Eran
On Mon, 2016-11-28 at 09:57 -0700, Jason Gunthorpe wrote: > On Sun, Nov 27, 2016 at 04:02:16PM +0200, Haggai Eran wrote: > > I think blocking mmu notifiers against something that is basically > > controlled by user-space can be problematic. This can block things > > like > > memory reclaim. If you

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-28 Thread Serguei Sagalovitch
On 2016-11-28 01:20 PM, Logan Gunthorpe wrote: On 28/11/16 09:57 AM, Jason Gunthorpe wrote: On PeerDirect, we have some kind of a middle-ground solution for pinning GPU memory. We create a non-ODP MR pointing to VRAM but rely on user-space and the GPU not to migrate it. If they do, the MR gets

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-28 Thread Jason Gunthorpe
On Mon, Nov 28, 2016 at 06:19:40PM +0000, Haggai Eran wrote: > > > GPU memory. We create a non-ODP MR pointing to VRAM but rely on > > > user-space and the GPU not to migrate it. If they do, the MR gets > > > destroyed immediately. > > That sounds horrible. How can that possibly work? What if the

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-28 Thread Haggai Eran
On Mon, 2016-11-28 at 09:48 -0500, Serguei Sagalovitch wrote: > On 2016-11-27 09:02 AM, Haggai Eran wrote: > > > > On PeerDirect, we have some kind of a middle-ground solution for > > pinning > > GPU memory. We create a non-ODP MR pointing to VRAM but rely on > > user-space and the GPU not to

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-28 Thread Logan Gunthorpe
On 28/11/16 09:57 AM, Jason Gunthorpe wrote: >> On PeerDirect, we have some kind of a middle-ground solution for pinning >> GPU memory. We create a non-ODP MR pointing to VRAM but rely on >> user-space and the GPU not to migrate it. If they do, the MR gets >> destroyed immediately. > > That

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-28 Thread Jason Gunthorpe
On Sun, Nov 27, 2016 at 04:02:16PM +0200, Haggai Eran wrote: > > Like in ODP, MMU notifiers/HMM are used to monitor for translation > > changes. If a change comes in the GPU driver checks if an executing > > command is touching those pages and blocks the MMU notifier until the > > command

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-28 Thread Serguei Sagalovitch
On 2016-11-27 09:02 AM, Haggai Eran wrote: On PeerDirect, we have some kind of a middle-ground solution for pinning GPU memory. We create a non-ODP MR pointing to VRAM but rely on user-space and the GPU not to migrate it. If they do, the MR gets destroyed immediately. This should work on legacy

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-27 Thread zhoucm1
+Qiang, who is working on it. On 27 November 2016 at 22:07, Christian König wrote: On 27.11.2016 at 15:02, Haggai Eran wrote: On 11/25/2016 9:32 PM, Jason Gunthorpe wrote: On Fri, Nov 25, 2016 at 02:22:17PM +0100, Christian König wrote: Like you say below we have to handle short lived in the usual

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-27 Thread Haggai Eran
On 11/25/2016 9:32 PM, Jason Gunthorpe wrote: > On Fri, Nov 25, 2016 at 02:22:17PM +0100, Christian König wrote: > >>> Like you say below we have to handle short lived in the usual way, and >>> that covers basically every device except IB MRs, including the >>> command queue on an NVMe drive. >>

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-27 Thread Christian König
On 27.11.2016 at 15:02, Haggai Eran wrote: On 11/25/2016 9:32 PM, Jason Gunthorpe wrote: On Fri, Nov 25, 2016 at 02:22:17PM +0100, Christian König wrote: Like you say below we have to handle short lived in the usual way, and that covers basically every device except IB MRs, including the

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-25 Thread Alex Deucher
On Fri, Nov 25, 2016 at 2:34 PM, Jason Gunthorpe wrote: > On Fri, Nov 25, 2016 at 12:16:30PM -0500, Serguei Sagalovitch wrote: > >> b) Allocation may not have a CPU address at all - only a GPU one. > > But you don't expect RDMA to work in that case, right? > > GPU

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-25 Thread Jason Gunthorpe
On Fri, Nov 25, 2016 at 09:40:10PM +0100, Christian König wrote: > We call this "userptr" and it's just a combination of get_user_pages() on > command submission and making sure the returned list of pages stays valid > using an MMU notifier. Doesn't that still pin the page? > The "big" problem
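
The "userptr" combination described here is essentially an mmu_notifier whose invalidate callback quiesces the GPU before the pinned pages go away. A skeleton under that assumption - the ops/struct names are the kernel's, the my_ pieces are hypothetical, and the callback signature changed in newer kernels, which pass a mmu_notifier_range instead:

    #include <linux/mmu_notifier.h>

    static void my_invalidate_range_start(struct mmu_notifier *mn,
                                          struct mm_struct *mm,
                                          unsigned long start,
                                          unsigned long end)
    {
            /* Wait for in-flight GPU commands that touch
               [start, end), then unpin those pages so the core
               mm can migrate or reclaim them. */
    }

    static const struct mmu_notifier_ops my_userptr_mn_ops = {
            .invalidate_range_start = my_invalidate_range_start,
    };

    static int my_userptr_track(struct mmu_notifier *mn,
                                struct mm_struct *mm)
    {
            mn->ops = &my_userptr_mn_ops;
            /* Pages were pinned with get_user_pages() at command
               submission; the notifier keeps that list honest. */
            return mmu_notifier_register(mn, mm);
    }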

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-25 Thread Felix Kuehling
On 16-11-25 12:20 PM, Serguei Sagalovitch wrote: > >> A white list may end up being rather complicated if it has to cover >> different CPU generations and system architectures. I feel this is a >> decision user space could easily make. >> >> Logan > I agreed that it is better to leave it up to user

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-25 Thread Felix Kuehling
On 16-11-25 03:40 PM, Christian König wrote: > On 25.11.2016 at 20:32, Jason Gunthorpe wrote: >> This assumes the commands are fairly short lived of course, the >> expectation of the mmu notifiers is that a flush is reasonably prompt > > Correct, this is another problem. GFX command submissions

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-25 Thread Serguei Sagalovitch
On 2016-11-25 03:26 PM, Felix Kuehling wrote: On 16-11-25 12:20 PM, Serguei Sagalovitch wrote: A white list may end up being rather complicated if it has to cover different CPU generations and system architectures. I feel this is a decision user space could easily make. Logan I agreed that

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-25 Thread Christian König
On 25.11.2016 at 20:32, Jason Gunthorpe wrote: On Fri, Nov 25, 2016 at 02:22:17PM +0100, Christian König wrote: Like you say below we have to handle short lived in the usual way, and that covers basically every device except IB MRs, including the command queue on an NVMe drive. Well a problem

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-25 Thread Jason Gunthorpe
On Fri, Nov 25, 2016 at 02:49:50PM -0500, Serguei Sagalovitch wrote: > The GPU can perfectly access all VRAM. It is only an issue for p2p without > a special interconnect and CPU access. Strictly speaking, as long as we > have a "bus address" we could have RDMA, but I agreed that for > RDMA we

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-25 Thread Serguei Sagalovitch
On 2016-11-25 02:34 PM, Jason Gunthorpe wrote: On Fri, Nov 25, 2016 at 12:16:30PM -0500, Serguei Sagalovitch wrote: b) Allocation may not have a CPU address at all - only a GPU one. But you don't expect RDMA to work in that case, right? GPU people need to stop doing this windowed memory stuff

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-25 Thread Jason Gunthorpe
On Thu, Nov 24, 2016 at 11:58:17PM -0800, Christoph Hellwig wrote: > On Thu, Nov 24, 2016 at 11:11:34AM -0700, Logan Gunthorpe wrote: > > * Regular DAX in the FS doesn't work at this time because the FS can > > move the file you think you're transferring to out from under you. Though I > > understand

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-25 Thread Jason Gunthorpe
On Fri, Nov 25, 2016 at 12:16:30PM -0500, Serguei Sagalovitch wrote: > b) Allocation may not have a CPU address at all - only a GPU one. But you don't expect RDMA to work in that case, right? GPU people need to stop doing this windowed memory stuff :) Jason

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-25 Thread Jason Gunthorpe
On Fri, Nov 25, 2016 at 02:22:17PM +0100, Christian König wrote: > >Like you say below we have to handle short lived in the usual way, and > >that covers basically every device except IB MRs, including the > >command queue on an NVMe drive. > > Well a problem which wasn't mentioned so far is that

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-25 Thread Serguei Sagalovitch
On 2016-11-25 08:22 AM, Christian König wrote: Serguei, what is your plan in GPU land for migration? I.e. if I have a CPU-mapped page and the GPU moves it to VRAM, it becomes non-cacheable - do you still allow the CPU to access it? Or do you swap it back to cacheable memory if the CPU touches it?

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-25 Thread Serguei Sagalovitch
Well, I guess there's some consensus building to do. The existing options are: * Device DAX: which could work, but the problem I see with it is that it only allows one application to do these transfers. Or there would have to be some user-space coordination to figure out which application gets what

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-25 Thread Serguei Sagalovitch
A white list may end up being rather complicated if it has to cover different CPU generations and system architectures. I feel this is a decision user space could easily make. Logan I agreed that it is better to leave it up to user space to check what is working and what is not. I found that

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-25 Thread Logan Gunthorpe
On 25/11/16 06:06 AM, Christian König wrote: > Well, Serguei sent me a couple of documents about QPI when we started to > discuss this internally as well, and that's exactly one of the cases I > had in mind when writing this. > > If I understood it correctly, for such systems P2P is technical

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-25 Thread Christian König
On 24.11.2016 at 18:55, Logan Gunthorpe wrote: Hey, On 24/11/16 02:45 AM, Christian König wrote: E.g. it can happen that PCI device A exports its BAR using ZONE_DEVICE. Now PCI device B (a SATA device) can directly read/write to it because it is on the same bus segment, but PCI device C (a

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-25 Thread Christian König
On 24.11.2016 at 17:42, Jason Gunthorpe wrote: On Wed, Nov 23, 2016 at 06:25:21PM -0700, Logan Gunthorpe wrote: On 23/11/16 02:55 PM, Jason Gunthorpe wrote: Only ODP hardware allows changing the DMA address on the fly, and it works at the page table level. We do not need special handling for

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-25 Thread Christoph Hellwig
On Thu, Nov 24, 2016 at 11:11:34AM -0700, Logan Gunthorpe wrote: > * Regular DAX in the FS doesn't work at this time because the FS can > move the file you think you're transferring to out from under you. Though I > understand there's been some work with XFS to solve that issue. The file system will

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-24 Thread Logan Gunthorpe
On 24/11/16 09:42 AM, Jason Gunthorpe wrote: > There are three cases to worry about: > - Coherent long lived page table mirroring (RDMA ODP MR) > - Non-coherent long lived page table mirroring (RDMA MR) > - Short lived DMA mapping (everything else) > > Like you say below we have to handle
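
The third, short-lived case is the ordinary streaming-DMA pattern, shown here for contrast. dma_map_sg/dma_unmap_sg are the standard kernel API; my_hw_run is a hypothetical synchronous submit helper:

    #include <linux/dma-mapping.h>

    /* Hypothetical: submits the list and waits for completion. */
    extern int my_hw_run(struct device *dev, struct scatterlist *sg,
                         int nents);

    static int short_lived_dma(struct device *dev, struct sg_table *sgt)
    {
            int nents = dma_map_sg(dev, sgt->sgl, sgt->nents,
                                   DMA_BIDIRECTIONAL);
            int ret;

            if (!nents)
                    return -EIO;

            ret = my_hw_run(dev, sgt->sgl, nents);

            /* The mapping lives only as long as the transfer. */
            dma_unmap_sg(dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL);
            return ret;
    }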

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-24 Thread Logan Gunthorpe
Hey, On 24/11/16 02:45 AM, Christian König wrote: > E.g. it can happen that PCI device A exports its BAR using ZONE_DEVICE. > Now PCI device B (a SATA device) can directly read/write to it because > it is on the same bus segment, but PCI device C (a network card for > example) can't because it

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-24 Thread Serguei Sagalovitch
On 2016-11-24 11:26 AM, Jason Gunthorpe wrote: On Thu, Nov 24, 2016 at 10:45:18AM +0100, Christian König wrote: On 24.11.2016 at 00:25, Jason Gunthorpe wrote: There is certainly nothing about the hardware that cares about ZONE_DEVICE vs. system memory. Well that is clearly not so simple.

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-24 Thread Jason Gunthorpe
On Wed, Nov 23, 2016 at 06:25:21PM -0700, Logan Gunthorpe wrote: > > > On 23/11/16 02:55 PM, Jason Gunthorpe wrote: > >>> Only ODP hardware allows changing the DMA address on the fly, and it > >>> works at the page table level. We do not need special handling for > >>> RDMA. > >> > >> I am aware

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-24 Thread Jason Gunthorpe
On Thu, Nov 24, 2016 at 10:45:18AM +0100, Christian König wrote: > On 24.11.2016 at 00:25, Jason Gunthorpe wrote: > >There is certainly nothing about the hardware that cares > >about ZONE_DEVICE vs. system memory. > Well that is clearly not so simple. When your ZONE_DEVICE pages describe a > PCI

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-24 Thread Jason Gunthorpe
On Thu, Nov 24, 2016 at 12:40:37AM +0000, Sagalovitch, Serguei wrote: > On Wed, Nov 23, 2016 at 02:11:29PM -0700, Logan Gunthorpe wrote: > > > Perhaps I am not following what Serguei is asking for, but I > > understood the desire was for a complex GPU allocator that could > > migrate pages

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-24 Thread Christian König
On 24.11.2016 at 00:25, Jason Gunthorpe wrote: There is certainly nothing about the hardware that cares about ZONE_DEVICE vs. system memory. Well, that is clearly not so simple. When your ZONE_DEVICE pages describe a PCI BAR and another PCI device initiates a DMA to this address, the DMA
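
For reference, this is roughly how ZONE_DEVICE pages come to describe a PCI BAR in current kernels: the driver hands the BAR range to devm_memremap_pages(), which creates struct pages for it. Sketch only - older kernels passed a resource instead of a dev_pagemap, and the P2PDMA pagemap type came later:

    #include <linux/memremap.h>
    #include <linux/pci.h>

    static void *map_bar_as_zone_device(struct pci_dev *pdev, int bar,
                                        struct dev_pagemap *pgmap)
    {
            pgmap->range.start = pci_resource_start(pdev, bar);
            pgmap->range.end = pci_resource_end(pdev, bar);
            pgmap->nr_range = 1;
            pgmap->type = MEMORY_DEVICE_PCI_P2PDMA;

            /* After this, pfn_to_page() works for the BAR, and a DMA
               initiated by another device can be described by these
               pages - subject to the routing caveats above. */
            return devm_memremap_pages(&pdev->dev, pgmap);
    }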

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-23 Thread Bart Van Assche
On 11/23/2016 09:13 AM, Logan Gunthorpe wrote: IMO any memory that has been registered for a P2P transaction should be locked from being evicted. So if there's a get_user_pages call it needs to be pinned until the put_page. The main issue being with the RDMA case: handling an eviction when a

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-23 Thread Logan Gunthorpe
On 23/11/16 02:55 PM, Jason Gunthorpe wrote: >>> Only ODP hardware allows changing the DMA address on the fly, and it >>> works at the page table level. We do not need special handling for >>> RDMA. >> >> I am aware of ODP but, as noted by others, it doesn't provide a general >> solution to the

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-23 Thread Sagalovitch, Serguei
On Wed, Nov 23, 2016 at 02:11:29PM -0700, Logan Gunthorpe wrote: > Perhaps I am not following what Serguei is asking for, but I > understood the desire was for a complex GPU allocator that could > migrate pages between GPU and CPU memory under control of the GPU > driver, among other things. The
