Re: [PATCH v2 15/21] xen-blkfront: Make use of the new sg_map helper function

2017-04-27 Thread Jason Gunthorpe
On Thu, Apr 27, 2017 at 05:03:45PM -0600, Logan Gunthorpe wrote: > > > On 27/04/17 04:11 PM, Jason Gunthorpe wrote: > > On Thu, Apr 27, 2017 at 03:53:37PM -0600, Logan Gunthorpe wrote: > > Well, that is in the current form, with more users it would make sense > > to o

Re: [PATCH v2 15/21] xen-blkfront: Make use of the new sg_map helper function

2017-04-27 Thread Jason Gunthorpe
On Thu, Apr 27, 2017 at 03:53:37PM -0600, Logan Gunthorpe wrote: > On 27/04/17 02:53 PM, Jason Gunthorpe wrote: > > blkfront is one of the drivers I looked at, and it appears to only be > > memcpying with the bvec_data pointer, so I wonder why it does not use > > sg_

Re: [PATCH v2 15/21] xen-blkfront: Make use of the new sg_map helper function

2017-04-27 Thread Jason Gunthorpe
On Thu, Apr 27, 2017 at 02:19:24PM -0600, Logan Gunthorpe wrote: > > > On 26/04/17 01:37 AM, Roger Pau Monné wrote: > > On Tue, Apr 25, 2017 at 12:21:02PM -0600, Logan Gunthorpe wrote: > >> Straightforward conversion to the new helper, except due to the lack > >> of an error path, we have to use

Re: [PATCH v2 01/21] scatterlist: Introduce sg_map helper functions

2017-04-27 Thread Jason Gunthorpe
On Thu, Apr 27, 2017 at 08:53:38AM +0200, Christoph Hellwig wrote: > > The main difficulty we > > have now is that neither of those functions are expected to fail and we > > need them to be able to in cases where the page doesn't map to system > > RAM. This patch series is trying to address it
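(For context, a minimal sketch of how a mappable-or-fail helper of this kind would be used from a driver. The exact sg_map() signature in the series may differ; the SG_KMAP flag and the ERR_PTR-style return used below are assumptions, not the confirmed API.)

    #include <linux/err.h>
    #include <linux/scatterlist.h>
    #include <linux/string.h>

    /* Hypothetical caller: the mapping can now fail when the page does
     * not map to system RAM, so an error path replaces the old
     * assumption that kmap(sg_page(sg)) always succeeds. */
    static int my_copy_from_sg(struct scatterlist *sg, void *dst, size_t len)
    {
            void *vaddr = sg_map(sg, SG_KMAP);      /* assumed signature */

            if (IS_ERR(vaddr))
                    return PTR_ERR(vaddr);          /* the new error path */

            memcpy(dst, vaddr, len);
            sg_unmap(sg, vaddr, SG_KMAP);           /* assumed signature */
            return 0;
    }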

Re: Enabling peer to peer device transactions for PCIe devices

2017-01-12 Thread Jason Gunthorpe
On Thu, Jan 12, 2017 at 10:11:29AM -0500, Jerome Glisse wrote: > On Wed, Jan 11, 2017 at 10:54:39PM -0600, Stephen Bates wrote: > > > What we want is for RDMA, O_DIRECT, etc to just work with special VMAs > > > (ie. at least those backed with ZONE_DEVICE memory). Then > > > GPU/NVME/DAX/whatever

Re: Enabling peer to peer device transactions for PCIe devices

2017-01-06 Thread Jason Gunthorpe
On Fri, Jan 06, 2017 at 12:37:22PM -0500, Jerome Glisse wrote: > On Fri, Jan 06, 2017 at 11:56:30AM -0500, Serguei Sagalovitch wrote: > > On 2017-01-05 08:58 PM, Jerome Glisse wrote: > > > On Thu, Jan 05, 2017 at 05:30:34PM -0700, Jason Gunthorpe wrote: > > > > On T

Re: Enabling peer to peer device transactions for PCIe devices

2017-01-05 Thread Jason Gunthorpe
On Thu, Jan 05, 2017 at 06:23:52PM -0500, Jerome Glisse wrote: > > I still don't understand what you're driving at - you've said in both > > cases a user VMA exists. > > In the former case no, there is no VMA directly but if you want one then > a device can provide one. But such VMA is useless as

Re: Enabling peer to peer device transactions for PCIe devices

2017-01-05 Thread Jason Gunthorpe
On Thu, Jan 05, 2017 at 03:19:36PM -0500, Jerome Glisse wrote: > > Always having a VMA changes the discussion - the question is how to > > create a VMA that represents IO device memory, and how do DMA > > consumers extract the correct information from that VMA to pass to the > > kernel DMA API

Re: Enabling peer to peer device transactions for PCIe devices

2017-01-05 Thread Jason Gunthorpe
On Thu, Jan 05, 2017 at 02:54:24PM -0500, Jerome Glisse wrote: > Mellanox and NVidia support peer to peer with what they market as > GPUDirect. It only works without IOMMU. It is probably not upstream: > > https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg21402.html > > I thought it

Re: Enabling peer to peer device transactions for PCIe devices

2017-01-05 Thread Jason Gunthorpe
On Thu, Jan 05, 2017 at 01:39:29PM -0500, Jerome Glisse wrote: > 1) peer-to-peer because of userspace specific API like NVidia GPU > direct (AMD is pushing its own similar API, I just can't remember the > marketing name). This does not happen through a vma, this happens > through

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-06 Thread Jason Gunthorpe
On Tue, Dec 06, 2016 at 09:51:15AM -0700, Logan Gunthorpe wrote: > Hey, > > On 06/12/16 09:38 AM, Jason Gunthorpe wrote: > >>> I'm not opposed to mapping /dev/nvmeX. However, the lookup is trivial > >>> to accomplish in sysfs through /sys/dev/char to find the s

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-06 Thread Jason Gunthorpe
> > I'm not opposed to mapping /dev/nvmeX. However, the lookup is trivial > > to accomplish in sysfs through /sys/dev/char to find the sysfs path of the > > device-dax instance under the nvme device, or if you already have the nvme > > sysfs path the dax instance(s) will appear under the "dax"
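(The /sys/dev/char lookup described here boils down to resolving a major:minor symlink. A small user-space sketch, with /dev/nvme0 as a stand-in device node:)

    #include <limits.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/stat.h>
    #include <sys/sysmacros.h>

    int main(void)
    {
            struct stat st;
            char link[64], path[PATH_MAX];

            /* The char device's major:minor indexes /sys/dev/char. */
            if (stat("/dev/nvme0", &st) < 0)
                    return 1;
            snprintf(link, sizeof(link), "/sys/dev/char/%u:%u",
                     major(st.st_rdev), minor(st.st_rdev));

            /* Resolve the symlink to the device's sysfs path; any
             * device-dax instances would then appear beneath it. */
            if (!realpath(link, path))
                    return 1;
            printf("%s\n", path);
            return 0;
    }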

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-05 Thread Jason Gunthorpe
On Mon, Dec 05, 2016 at 12:27:20PM -0700, Logan Gunthorpe wrote: > > > On 05/12/16 12:14 PM, Jason Gunthorpe wrote: > > But CMB sounds much more like the GPU case where there is a > > specialized allocator handing out the BAR to consumers, so I'm not > > sure a general

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-05 Thread Jason Gunthorpe
On Mon, Dec 05, 2016 at 10:48:58AM -0800, Dan Williams wrote: > On Mon, Dec 5, 2016 at 10:39 AM, Logan Gunthorpe wrote: > > On 05/12/16 11:08 AM, Dan Williams wrote: > >> > >> I've already recommended that iopmem not be a block device and instead > >> be a device-dax

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-05 Thread Jason Gunthorpe
On Mon, Dec 05, 2016 at 09:40:38AM -0800, Dan Williams wrote: > > If it is kernel only with physical addresses we don't need a uAPI for > > it, so I'm not sure #1 is at all related to iopmem. > > > > Most people who want #1 probably can just mmap > > /sys/../pci/../resourceX to get a user handle
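(The mmap route mentioned above looks roughly like this from user space. The PCI address 0000:01:00.0 and the 4 KiB length are placeholders, and touching a live device BAR this way is only safe if you know the hardware:)

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
            /* resource0 corresponds to the device's BAR 0. */
            const char *res = "/sys/bus/pci/devices/0000:01:00.0/resource0";
            int fd = open(res, O_RDWR | O_SYNC);
            void *bar;

            if (fd < 0)
                    return 1;

            /* Map the first 4 KiB of the BAR into this process. */
            bar = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
            if (bar == MAP_FAILED) {
                    close(fd);
                    return 1;
            }

            /* ... hand the mapping to whatever needs a user handle ... */

            munmap(bar, 4096);
            close(fd);
            return 0;
    }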

Re: Enabling peer to peer device transactions for PCIe devices

2016-12-05 Thread Jason Gunthorpe
On Sun, Dec 04, 2016 at 07:23:00AM -0600, Stephen Bates wrote: > Hi All > > This has been a great thread (thanks to Alex for kicking it off) and I > wanted to jump in and maybe try and put some summary around the > discussion. I also wanted to propose we include this as a topic for LFS/MM >

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-30 Thread Jason Gunthorpe
On Wed, Nov 30, 2016 at 12:45:58PM +0200, Haggai Eran wrote: > > That just forces applications to handle horrible unexpected > > failures. If this sort of thing is needed for correctness then OOM > > kill the offending process, don't corrupt its operation. > Yes, that sounds fine. Can we simply

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-28 Thread Jason Gunthorpe
On Mon, Nov 28, 2016 at 04:55:23PM -0500, Serguei Sagalovitch wrote: > > We haven't touched this in a long time and perhaps it changed, but there > > definitely was a callback in the PeerDirect API to allow the GPU to > > invalidate the mapping. That's what we don't want. > I assume that you are

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-28 Thread Jason Gunthorpe
On Mon, Nov 28, 2016 at 06:19:40PM +, Haggai Eran wrote: > > > GPU memory. We create a non-ODP MR pointing to VRAM but rely on > > > user-space and the GPU not to migrate it. If they do, the MR gets > > > destroyed immediately. > > That sounds horrible. How can that possibly work? What if the

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-28 Thread Jason Gunthorpe
On Sun, Nov 27, 2016 at 04:02:16PM +0200, Haggai Eran wrote: > > Like in ODP, MMU notifiers/HMM are used to monitor for translation > > changes. If a change comes in the GPU driver checks if an executing > > command is touching those pages and blocks the MMU notifier until the > > command

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-25 Thread Jason Gunthorpe
On Fri, Nov 25, 2016 at 09:40:10PM +0100, Christian König wrote: > We call this "userptr" and it's just a combination of get_user_pages() on > command submission and making sure the returned list of pages stays valid > using an MMU notifier. Doesn't that still pin the page? > The "big" problem
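(The "userptr" pattern being discussed, sketched against a roughly 4.9-era kernel API: pin at command submission, and let an MMU notifier callback fence the hardware and unpin when the mapping changes. The my_* names are hypothetical, and get_user_pages() signatures vary across kernel versions.)

    #include <linux/mm.h>
    #include <linux/sched.h>

    /* Pin the user pages backing a command buffer at submission time. */
    static long my_userptr_pin(unsigned long start, unsigned long npages,
                               struct page **pages)
    {
            long pinned;

            down_read(&current->mm->mmap_sem);
            /* This is the pin Jason is pointing at: each page holds a
             * reference until the MMU notifier fires and we drop it. */
            pinned = get_user_pages(start, npages, FOLL_WRITE, pages, NULL);
            up_read(&current->mm->mmap_sem);

            return pinned;
    }

    /* Called from the driver's MMU notifier path after fencing the GPU. */
    static void my_userptr_unpin(struct page **pages, long npages)
    {
            long i;

            for (i = 0; i < npages; i++) {
                    set_page_dirty_lock(pages[i]);
                    put_page(pages[i]);
            }
    }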

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-25 Thread Jason Gunthorpe
On Fri, Nov 25, 2016 at 02:49:50PM -0500, Serguei Sagalovitch wrote: > GPU could perfectly access all VRAM. It is only an issue for p2p without > special interconnect and CPU access. Strictly speaking as long as we > have "bus address" we could have RDMA but I agreed that for > RDMA we

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-25 Thread Jason Gunthorpe
On Thu, Nov 24, 2016 at 11:58:17PM -0800, Christoph Hellwig wrote: > On Thu, Nov 24, 2016 at 11:11:34AM -0700, Logan Gunthorpe wrote: > > * Regular DAX in the FS doesn't work at this time because the FS can > > move the file you think your transfer to out from under you. Though I > > understand

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-25 Thread Jason Gunthorpe
On Fri, Nov 25, 2016 at 12:16:30PM -0500, Serguei Sagalovitch wrote: > b) Allocation may not have CPU address at all - only GPU one. But you don't expect RDMA to work in the case, right? GPU people need to stop doing this windowed memory stuff :) Jason

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-25 Thread Jason Gunthorpe
On Fri, Nov 25, 2016 at 02:22:17PM +0100, Christian König wrote: > > Like you say below we have to handle short lived in the usual way, and > > that covers basically every device except IB MRs, including the > > command queue on a NVMe drive. > > Well a problem which wasn't mentioned so far is that

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-24 Thread Jason Gunthorpe
On Wed, Nov 23, 2016 at 06:25:21PM -0700, Logan Gunthorpe wrote: > > > On 23/11/16 02:55 PM, Jason Gunthorpe wrote: > >>> Only ODP hardware allows changing the DMA address on the fly, and it > >>> works at the page table level. We do not need special handling for

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-24 Thread Jason Gunthorpe
On Thu, Nov 24, 2016 at 10:45:18AM +0100, Christian König wrote: > Am 24.11.2016 um 00:25 schrieb Jason Gunthorpe: > > There is certainly nothing about the hardware that cares > > about ZONE_DEVICE vs System memory. > Well that is clearly not so simple. When your ZONE_DEVICE page

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-24 Thread Jason Gunthorpe
On Thu, Nov 24, 2016 at 12:40:37AM +, Sagalovitch, Serguei wrote: > On Wed, Nov 23, 2016 at 02:11:29PM -0700, Logan Gunthorpe wrote: > > > Perhaps I am not following what Serguei is asking for, but I > > understood the desire was for a complex GPU allocator that could > > migrate pages

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-23 Thread Jason Gunthorpe
On Wed, Nov 23, 2016 at 02:42:12PM -0800, Dan Williams wrote: > > The crucial part for this discussion is the ability to fence and block > > DMA for a specific range. This is the hardware capability that lets > > page migration happen: fence DMA, migrate page, update page > > table in HCA, unblock
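(The fence/migrate/update/unblock sequence described here, written out as hypothetical pseudocode; none of the my_* names correspond to a real HCA driver API:)

    /* Page migration against a DMA-capable HCA, step by step. */
    static void my_migrate_range(struct my_hca *hca, struct my_range *r)
    {
            my_hca_fence_dma(hca, r);          /* 1. block new DMA to the range    */
            my_hca_drain_dma(hca, r);          /*    ...and wait out in-flight DMA */
            my_migrate_backing_pages(r);       /* 2. migrate the page(s)           */
            my_hca_update_translation(hca, r); /* 3. update the page table in HCA  */
            my_hca_unblock_dma(hca, r);        /* 4. unblock DMA                   */
    }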

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-23 Thread Jason Gunthorpe
On Wed, Nov 23, 2016 at 02:11:29PM -0700, Logan Gunthorpe wrote: > > As I said, there is no possible special handling. Standard IB hardware > > does not support changing the DMA address once a MR is created. Forget > > about doing that. > > Yeah, that's essentially the point I was trying to make.

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-23 Thread Jason Gunthorpe
On Wed, Nov 23, 2016 at 02:58:38PM -0500, Serguei Sagalovitch wrote: > We do not want to have "highly" dynamic translation due to > performance cost. We need to support "overcommit" but would > like to minimize impact. To support RDMA MRs for GPU/VRAM/PCIe > device memory (which is

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-23 Thread Jason Gunthorpe
On Wed, Nov 23, 2016 at 02:14:40PM -0500, Serguei Sagalovitch wrote: > > On 2016-11-23 02:05 PM, Jason Gunthorpe wrote: > > As Bart says, it would be best to be combined with something like > > Mellanox's ODP MRs, which allows a page to be evicted and then trigger > >

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-23 Thread Jason Gunthorpe
On Wed, Nov 23, 2016 at 10:40:47AM -0800, Dan Williams wrote: > I don't think that was designed for the case where the backing memory > is a special/static physical address range rather than anonymous > "System RAM", right? The hardware doesn't care where the memory is. ODP is just a generic

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-23 Thread Jason Gunthorpe
On Wed, Nov 23, 2016 at 10:13:03AM -0700, Logan Gunthorpe wrote: > an MR would be very tricky. The MR may be relied upon by another host > and the kernel would have to inform user-space the MR was invalid then > user-space would have to tell the remote application. As Bart says, it would be best

Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

2015-04-22 Thread Jason Gunthorpe
On Wed, Apr 22, 2015 at 05:23:28PM +0200, Luis R. Rodriguez wrote: On Tue, Apr 21, 2015 at 11:39:39PM -0600, Jason Gunthorpe wrote: On Wed, Apr 22, 2015 at 01:39:07AM +0200, Luis R. Rodriguez wrote: Mike, do you think the time is right to just remove the iPath driver? With PAT now

Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

2015-04-22 Thread Jason Gunthorpe
On Wed, Apr 22, 2015 at 02:53:11PM -0400, Doug Ledford wrote: To be precise, the split is that ipath powers the old HTX bus cards that only work in AMD systems, qib is all PCI-e cards. I still have a few HTX cards, but I no longer have any systems with HTX slots, so we haven't even used this

Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

2015-04-21 Thread Jason Gunthorpe
On Wed, Apr 22, 2015 at 01:39:07AM +0200, Luis R. Rodriguez wrote: Mike, do you think the time is right to just remove the iPath driver? With PAT now being default the driver effectively won't work with write-combining on modern kernels. Even if systems are old they likely had PAT support,

Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

2015-04-21 Thread Jason Gunthorpe
On Wed, Apr 22, 2015 at 12:46:01AM +0200, Luis R. Rodriguez wrote: are talking about annotating the qib driver as known to be broken without PAT and since the ipath driver needs considerable work to be ported to use PAT (the This only seems to be true for one of the chips that driver