Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-25 Thread Stephen Bates
>> Yes, that's why I used 'significant'. One good thing is that given resources >> it can easily be done in parallel with other development, and will give >> additional >> insight of some form. > >Yup, well if someone wants to start working on an emulated RDMA device >that actually simulates

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-25 Thread Stephen Bates
> My first reflex when reading this thread was to think that this whole domain > lends itself excellently to testing via Qemu. Could it be that doing this in > the opposite direction might be a safer approach in the long run even though > (significant) more work up-front? While the idea of

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-25 Thread Logan Gunthorpe
On 25/04/17 12:30 AM, Knut Omang wrote: > Yes, that's why I used 'significant'. One good thing is that given resources > it can easily be done in parallel with other development, and will give > additional > insight of some form. Yup, well if someone wants to start working on an emulated RDMA

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-25 Thread Knut Omang
On Mon, 2017-04-24 at 10:14 -0600, Logan Gunthorpe wrote: > > On 24/04/17 01:36 AM, Knut Omang wrote: > > My first reflex when reading this thread was to think that this whole domain > > lends itself excellently to testing via Qemu. Could it be that doing this > > in  > > the opposite direction

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-24 Thread Logan Gunthorpe
On 24/04/17 01:36 AM, Knut Omang wrote: > My first reflex when reading this thread was to think that this whole domain > lends itself excellently to testing via Qemu. Could it be that doing this in > the opposite direction might be a safer approach in the long run even though > (significant)

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-24 Thread Knut Omang
On Mon, 2017-04-17 at 08:31 +1000, Benjamin Herrenschmidt wrote: > On Sun, 2017-04-16 at 10:34 -0600, Logan Gunthorpe wrote: > >  > > On 16/04/17 09:53 AM, Dan Williams wrote: > > > ZONE_DEVICE allows you to redirect via get_dev_pagemap() to retrieve > > > context about the physical address in

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-20 Thread Dan Williams
On Thu, Apr 20, 2017 at 4:07 PM, Stephen Bates wrote: >>> Yes, this makes sense I think we really just want to distinguish host >>> memory or not in terms of the dev_pagemap type. >> >>> I would like to see mutually exclusive flags for host memory (or not) and >>>

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-20 Thread Stephen Bates
>> Yes, this makes sense I think we really just want to distinguish host >> memory or not in terms of the dev_pagemap type. > >> I would like to see mutually exclusive flags for host memory (or not) and >> persistence (or not). >> > > Why persistence? It has zero meaning to the mm. I like the

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-20 Thread Dan Williams
On Thu, Apr 20, 2017 at 1:43 PM, Stephen Bates wrote: > >> Yes, this makes sense I think we really just want to distinguish host >> memory or not in terms of the dev_pagemap type. > > I would like to see mutually exclusive flags for host memory (or not) and > persistence

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-20 Thread Stephen Bates
> Yes, this makes sense I think we really just want to distinguish host > memory or not in terms of the dev_pagemap type. I would like to see mutually exclusive flags for host memory (or not) and persistence (or not). Stephen

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Dan Williams
On Wed, Apr 19, 2017 at 3:55 PM, Logan Gunthorpe wrote: > > > On 19/04/17 02:48 PM, Jason Gunthorpe wrote: >> On Wed, Apr 19, 2017 at 01:41:49PM -0600, Logan Gunthorpe wrote: >> But.. it could point to a GPU and the GPU struct device could have a proxy dma_ops like

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Logan Gunthorpe
On 19/04/17 02:48 PM, Jason Gunthorpe wrote: > On Wed, Apr 19, 2017 at 01:41:49PM -0600, Logan Gunthorpe wrote: > >>> But.. it could point to a GPU and the GPU struct device could have a >>> proxy dma_ops like Dan pointed out. >> >> Seems a bit awkward to me that in order for the intended use

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Jason Gunthorpe
On Wed, Apr 19, 2017 at 01:41:49PM -0600, Logan Gunthorpe wrote: > > But.. it could point to a GPU and the GPU struct device could have a > > proxy dma_ops like Dan pointed out. > > Seems a bit awkward to me that in order for the intended use case, you > have to proxy the dma_ops. I'd probably

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Logan Gunthorpe
On 19/04/17 01:31 PM, Jason Gunthorpe wrote: > Try it with VT-D turned on. It shouldn't work or there is a notable > security hole in your platform.. Ah, ok. >>> const struct dma_map_ops *comp_ops = get_dma_ops(completer); >>> const struct dma_map_ops *init_ops =

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Jason Gunthorpe
On Wed, Apr 19, 2017 at 01:02:49PM -0600, Logan Gunthorpe wrote: > > > On 19/04/17 12:32 PM, Jason Gunthorpe wrote: > > On Wed, Apr 19, 2017 at 12:01:39PM -0600, Logan Gunthorpe wrote: > > Not entirely, it would have to call through the whole process > > including the arch_p2p_cross_segment()..

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Logan Gunthorpe
On 19/04/17 12:32 PM, Jason Gunthorpe wrote: > On Wed, Apr 19, 2017 at 12:01:39PM -0600, Logan Gunthorpe wrote: > Not entirely, it would have to call through the whole process > including the arch_p2p_cross_segment().. Hmm, yes. Though it's still not clear what, if anything,

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Dan Williams
On Wed, Apr 19, 2017 at 11:41 AM, Logan Gunthorpe wrote: > > > On 19/04/17 12:30 PM, Dan Williams wrote: >> Letting other users do the container_of() arrangement means that >> struct page_map needs to become public and move into struct >> dev_pagemap directly. > > Ah, yes, I

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Logan Gunthorpe
On 19/04/17 12:30 PM, Dan Williams wrote: > Letting other users do the container_of() arrangement means that > struct page_map needs to become public and move into struct > dev_pagemap directly. Ah, yes, I got a bit turned around by that and failed to notice that page_map and dev_pagemap are

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Jason Gunthorpe
On Wed, Apr 19, 2017 at 12:01:39PM -0600, Logan Gunthorpe wrote: > I'm just spitballing here but if HMM wanted to use unaddressable memory > as a DMA target, it could set that function to create a window in gpu > memory, then call the pci_p2p_same_segment and return the result as the > dma

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Dan Williams
On Wed, Apr 19, 2017 at 11:19 AM, Logan Gunthorpe wrote: > > > On 19/04/17 12:11 PM, Logan Gunthorpe wrote: >> >> >> On 19/04/17 11:41 AM, Dan Williams wrote: >>> No, not quite ;-). I still don't think we should require the non-HMM >>> to pass NULL for all the HMM arguments.

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Logan Gunthorpe
On 19/04/17 12:11 PM, Logan Gunthorpe wrote: > > > On 19/04/17 11:41 AM, Dan Williams wrote: >> No, not quite ;-). I still don't think we should require the non-HMM >> to pass NULL for all the HMM arguments. What I like about Logan's >> proposal is to have a separate create and register steps

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Logan Gunthorpe
On 19/04/17 11:41 AM, Dan Williams wrote: > No, not quite ;-). I still don't think we should require the non-HMM > to pass NULL for all the HMM arguments. What I like about Logan's > proposal is to have a separate create and register steps dev_pagemap. > That way call paths that don't care about

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Logan Gunthorpe
On 19/04/17 11:14 AM, Jason Gunthorpe wrote: > I don't see a use for the dma_map function pointer at this point.. Yes, it is kind of like designing for the future. I just find it a little odd calling the pci functions in the iommu. > It doesn't make a lot of sense for the completer of the DMA

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Dan Williams
On Wed, Apr 19, 2017 at 10:32 AM, Jerome Glisse wrote: > On Wed, Apr 19, 2017 at 10:01:23AM -0700, Dan Williams wrote: >> On Wed, Apr 19, 2017 at 9:48 AM, Logan Gunthorpe wrote: >> > >> > >> > On 19/04/17 09:55 AM, Jason Gunthorpe wrote: >> >> I was

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Jerome Glisse
On Wed, Apr 19, 2017 at 10:01:23AM -0700, Dan Williams wrote: > On Wed, Apr 19, 2017 at 9:48 AM, Logan Gunthorpe wrote: > > > > > > On 19/04/17 09:55 AM, Jason Gunthorpe wrote: > >> I was thinking only this one would be supported with a core code > >> helper.. > > > >

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Jason Gunthorpe
On Wed, Apr 19, 2017 at 10:48:51AM -0600, Logan Gunthorpe wrote: > The pci_enable_p2p_bar function would then just need to call > devm_memremap_pages with the dma_map callback set to a function that > does the segment check and the offset calculation. I don't see a use for the dma_map function

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Dan Williams
On Wed, Apr 19, 2017 at 9:48 AM, Logan Gunthorpe wrote: > > > On 19/04/17 09:55 AM, Jason Gunthorpe wrote: >> I was thinking only this one would be supported with a core code >> helper.. > > Pivoting slightly: I was looking at how HMM uses ZONE_DEVICE. They add a > type flag

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Logan Gunthorpe
On 19/04/17 09:55 AM, Jason Gunthorpe wrote: > I was thinking only this one would be supported with a core code > helper.. Pivoting slightly: I was looking at how HMM uses ZONE_DEVICE. They add a type flag to the dev_pagemap structure which would be very useful to us. We could add another

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Jason Gunthorpe
On Wed, Apr 19, 2017 at 11:20:06AM +1000, Benjamin Herrenschmidt wrote: > That helper wouldn't perform the actual iommu mapping. It would simply > return something along the lines of: > > - "use that alternate bus address and don't map in the iommu" I was thinking only this one would be

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Benjamin Herrenschmidt
On Tue, 2017-04-18 at 16:24 -0600, Jason Gunthorpe wrote: > Basically, all this list processing is a huge overhead compared to > just putting a helper call in the existing sg iteration loop of the > actual op.  Particularly if the actual op is a no-op like no-mmu x86 > would use. Yes, I'm leaning

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Benjamin Herrenschmidt
On Tue, 2017-04-18 at 17:21 -0600, Jason Gunthorpe wrote: > Splitting the sgl is different from iommu batching. > > As an example, an O_DIRECT write of 1 MB with a single 4K P2P page in > the middle. > > The optimum behavior is to allocate a 1MB-4K iommu range and fill it > with the CPU memory.
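The arithmetic in Jason's O_DIRECT example can be checked with a small user-space sketch. This is not kernel code: the `sketch_` struct and helpers below are hypothetical stand-ins for a scatterlist, and the only thing taken from the thread is the scenario itself (a 1 MB transfer of 4K pages with one P2P page in the middle, where only the CPU memory needs an IOMMU range).

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical user-space stand-in for one scatterlist entry. */
struct sketch_sg_entry {
    uint64_t addr;   /* CPU physical address (host) or BAR address (P2P) */
    uint32_t len;    /* length in bytes */
    int is_p2p;      /* nonzero if backed by a peer device's BAR */
};

/* Bytes of IOMMU range needed: only host-memory entries are counted,
 * because the P2P entry would be passed through as a bus address. */
static uint64_t sketch_iommu_bytes(const struct sketch_sg_entry *sg, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++)
        if (!sg[i].is_p2p)
            total += sg[i].len;
    return total;
}

/* Fill sg with `pages` 4K entries and mark the one at p2p_index as P2P. */
static void sketch_fill(struct sketch_sg_entry *sg, size_t pages,
                        size_t p2p_index)
{
    for (size_t i = 0; i < pages; i++) {
        sg[i].addr = (uint64_t)i * SKETCH_PAGE_SIZE;
        sg[i].len = SKETCH_PAGE_SIZE;
        sg[i].is_p2p = (i == p2p_index);
    }
}
```

With 256 entries and the P2P page at index 128, `sketch_iommu_bytes()` comes to 1 MB minus 4K, which is the range size the quoted message calls optimal.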

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Benjamin Herrenschmidt
On Tue, 2017-04-18 at 15:22 -0600, Jason Gunthorpe wrote: > On Tue, Apr 18, 2017 at 02:11:33PM -0700, Dan Williams wrote: > > > I think this opens an even bigger can of worms.. > > > > No, I don't think it does. You'd only shim when the target page is > > backed by a device, not host memory, and

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Benjamin Herrenschmidt
On Tue, 2017-04-18 at 15:03 -0600, Jason Gunthorpe wrote: > I don't follow, when does get_dma_ops() return a p2p aware provider? > It has no way to know if the DMA is going to involve p2p, get_dma_ops > is called with the device initiating the DMA. > > So you'd always return the P2P shim on a

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Benjamin Herrenschmidt
On Tue, 2017-04-18 at 14:48 -0600, Logan Gunthorpe wrote: > > ...and that dma_map goes through get_dma_ops(), so I don't see the conflict? > > The main conflict is in dma_map_sg which only does get_dma_ops once but > the sg may contain memory of different types. We can handle that in our

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Benjamin Herrenschmidt
On Tue, 2017-04-18 at 12:00 -0600, Jason Gunthorpe wrote: > - All platforms can succeed if the PCI devices are under the same >   'segment', but where segments begin is somewhat platform specific >   knowledge. (this is 'same switch' idea Logan has talked about) We also need to be careful whether

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Benjamin Herrenschmidt
On Tue, 2017-04-18 at 10:27 -0700, Dan Williams wrote: > > FWIW, RDMA probably wouldn't want to use a p2mem device either, we > > already have APIs that map BAR memory to user space, and would like to > > keep using them. A 'enable P2P for bar' helper function sounds better > > to me. > > ...and

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Jason Gunthorpe
On Tue, Apr 18, 2017 at 03:51:27PM -0700, Dan Williams wrote: > > This really seems like much less trouble than trying to wrapper all > > the arch's dma ops, and doesn't have the wonky restrictions. > > I don't think the root bus iommu drivers have any business knowing or > caring about dma

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Logan Gunthorpe
On 18/04/17 04:24 PM, Jason Gunthorpe wrote: > Try and write a stacked map_sg function like you describe and you will > see how horrible it quickly becomes. Yes, unfortunately, I have to agree with this statement completely. > Since dma mapping is a performance path we must be careful not to >

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Dan Williams
On Tue, Apr 18, 2017 at 3:56 PM, Logan Gunthorpe wrote: > > > On 18/04/17 04:50 PM, Dan Williams wrote: >> On Tue, Apr 18, 2017 at 3:48 PM, Logan Gunthorpe wrote: >>> >>> >>> On 18/04/17 04:28 PM, Dan Williams wrote: Unlike the pci bus address

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Logan Gunthorpe
On 18/04/17 04:50 PM, Dan Williams wrote: > On Tue, Apr 18, 2017 at 3:48 PM, Logan Gunthorpe wrote: >> >> >> On 18/04/17 04:28 PM, Dan Williams wrote: >>> Unlike the pci bus address offset case which I think is fundamental to >>> support since shipping archs do this today,

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Dan Williams
On Tue, Apr 18, 2017 at 3:46 PM, Benjamin Herrenschmidt wrote: > On Tue, 2017-04-18 at 10:27 -0700, Dan Williams wrote: >> > FWIW, RDMA probably wouldn't want to use a p2mem device either, we >> > already have APIs that map BAR memory to user space, and would like to >>

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Dan Williams
On Tue, Apr 18, 2017 at 3:48 PM, Logan Gunthorpe wrote: > > > On 18/04/17 04:28 PM, Dan Williams wrote: >> Unlike the pci bus address offset case which I think is fundamental to >> support since shipping archs do this today, I think it is ok to say >> p2p is restricted to a

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Logan Gunthorpe
On 18/04/17 04:28 PM, Dan Williams wrote: > Unlike the pci bus address offset case which I think is fundamental to > support since shipping archs do this today, I think it is ok to say > p2p is restricted to a single sgl that gets to talk to host memory or > a single device. That said, what's

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Jason Gunthorpe
On Tue, Apr 18, 2017 at 03:28:17PM -0700, Dan Williams wrote: > Unlike the pci bus address offset case which I think is fundamental to > support since shipping archs do this today But we can support this by modifying those arch's unique dma_ops directly. Eg as I explained, my

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Dan Williams
On Tue, Apr 18, 2017 at 3:15 PM, Logan Gunthorpe wrote: > > > On 18/04/17 03:36 PM, Dan Williams wrote: >> On Tue, Apr 18, 2017 at 2:22 PM, Jason Gunthorpe >> wrote: >>> On Tue, Apr 18, 2017 at 02:11:33PM -0700, Dan Williams wrote: > I

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Jason Gunthorpe
On Tue, Apr 18, 2017 at 03:31:58PM -0600, Logan Gunthorpe wrote: > 1) It means that sg_has_p2p has to walk the entire sg and check every > page. Then map_sg_p2p/map_sg has to walk it again and repeat the check > then do some operation per page. If anyone is concerned about the > dma_map

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Logan Gunthorpe
On 18/04/17 03:36 PM, Dan Williams wrote: > On Tue, Apr 18, 2017 at 2:22 PM, Jason Gunthorpe > wrote: >> On Tue, Apr 18, 2017 at 02:11:33PM -0700, Dan Williams wrote: I think this opens an even bigger can of worms.. >>> >>> No, I don't think it does. You'd

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Dan Williams
On Tue, Apr 18, 2017 at 2:22 PM, Jason Gunthorpe wrote: > On Tue, Apr 18, 2017 at 02:11:33PM -0700, Dan Williams wrote: >> > I think this opens an even bigger can of worms.. >> >> No, I don't think it does. You'd only shim when the target page is >> backed by a

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Logan Gunthorpe
On 18/04/17 03:03 PM, Jason Gunthorpe wrote: > What about something more incremental like this instead: > - dma_ops will set map_sg_p2p == map_sg when they are updated to > support p2p, otherwise DMA on P2P pages will fail for those ops. > - When all ops support p2p we remove the if and
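The incremental scheme described in the bullets above can be sketched in user-space C. The code quoted in the original message is elided here, so this is only an assumed reading of it: `map_sg_p2p` and `sg_has_p2p` are names that appear in the thread, while the `mock_` types and the exact dispatch are hypothetical. A return of 0 means failure, following the convention of the kernel's `dma_map_sg()`.

```c
#include <stddef.h>

/* Hypothetical stand-in for a scatterlist entry. */
struct mock_sg {
    int is_p2p;   /* nonzero if the page is backed by a peer device's BAR */
};

/* Hypothetical stand-in for a dma_ops table. */
struct mock_dma_ops {
    int (*map_sg)(struct mock_sg *sg, int nents);
    /* NULL until these ops are updated to support P2P pages. */
    int (*map_sg_p2p)(struct mock_sg *sg, int nents);
};

/* Walk the whole list looking for any P2P entry. */
static int mock_sg_has_p2p(const struct mock_sg *sg, int nents)
{
    for (int i = 0; i < nents; i++)
        if (sg[i].is_p2p)
            return 1;
    return 0;
}

/* Returns the number of mapped entries, or 0 on failure. */
static int mock_dma_map_sg(const struct mock_dma_ops *ops,
                           struct mock_sg *sg, int nents)
{
    if (mock_sg_has_p2p(sg, nents)) {
        if (!ops->map_sg_p2p)
            return 0;   /* ops not updated yet: P2P DMA fails */
        return ops->map_sg_p2p(sg, nents);
    }
    return ops->map_sg(sg, nents);
}

/* Trivial mapper used for illustration: "maps" every entry. */
static int mock_map_all(struct mock_sg *sg, int nents)
{
    (void)sg;
    return nents;
}
```

An ops table that has been updated would point both members at the same function, so the check could be dropped once every table does so. Note that the extra walk in `mock_sg_has_p2p()` is the per-page overhead raised elsewhere in the thread.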

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Jason Gunthorpe
On Tue, Apr 18, 2017 at 02:11:33PM -0700, Dan Williams wrote: > > I think this opens an even bigger can of worms.. > > No, I don't think it does. You'd only shim when the target page is > backed by a device, not host memory, and you can figure this out by a > is_zone_device_page()-style lookup.

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Dan Williams
On Tue, Apr 18, 2017 at 2:03 PM, Jason Gunthorpe wrote: > On Tue, Apr 18, 2017 at 12:48:35PM -0700, Dan Williams wrote: > >> > Yes, I noticed this problem too and that makes sense. It just means >> > every dma_ops will probably need to be modified to either

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Jason Gunthorpe
On Tue, Apr 18, 2017 at 12:48:35PM -0700, Dan Williams wrote: > > Yes, I noticed this problem too and that makes sense. It just means > > every dma_ops will probably need to be modified to either support p2p > > pages or fail on them. Though, the only real difficulty there is that it > > will be

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Logan Gunthorpe
On 18/04/17 02:31 PM, Dan Williams wrote: > On Tue, Apr 18, 2017 at 1:29 PM, Jerome Glisse wrote: >>> On Tue, Apr 18, 2017 at 12:35 PM, Logan Gunthorpe >>> wrote: On 18/04/17 01:01 PM, Jason Gunthorpe wrote: > Ultimately every dma_ops

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Dan Williams
On Tue, Apr 18, 2017 at 1:29 PM, Jerome Glisse wrote: >> On Tue, Apr 18, 2017 at 12:35 PM, Logan Gunthorpe >> wrote: >> > >> > >> > On 18/04/17 01:01 PM, Jason Gunthorpe wrote: >> >> Ultimately every dma_ops will need special code to support P2P with >>

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Jerome Glisse
> On Tue, Apr 18, 2017 at 12:35 PM, Logan Gunthorpe > wrote: > > > > > > On 18/04/17 01:01 PM, Jason Gunthorpe wrote: > >> Ultimately every dma_ops will need special code to support P2P with > >> the special hardware that ops is controlling, so it makes some sense > >> to

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Logan Gunthorpe
On 18/04/17 01:48 PM, Jason Gunthorpe wrote: > I think this is why progress on this keeps getting stuck - every > solution is a lot of work. Yup! There's also a ton of work just to get the iomem safety issues addressed. Let alone the dma mapping issues. > You could try to do a dummy mapping /

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Jason Gunthorpe
On Tue, Apr 18, 2017 at 01:35:32PM -0600, Logan Gunthorpe wrote: > > Ultimately every dma_ops will need special code to support P2P with > > the special hardware that ops is controlling, so it makes some sense > > to start by pushing the check down there in the first place. This > > advice is

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Dan Williams
On Tue, Apr 18, 2017 at 12:35 PM, Logan Gunthorpe wrote: > > > On 18/04/17 01:01 PM, Jason Gunthorpe wrote: >> Ultimately every dma_ops will need special code to support P2P with >> the special hardware that ops is controlling, so it makes some sense >> to start by pushing

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Logan Gunthorpe
On 18/04/17 01:01 PM, Jason Gunthorpe wrote: > Ultimately every dma_ops will need special code to support P2P with > the special hardware that ops is controlling, so it makes some sense > to start by pushing the check down there in the first place. This > advice is partially motivated by how

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Jason Gunthorpe
On Tue, Apr 18, 2017 at 12:30:59PM -0600, Logan Gunthorpe wrote: > > - The dma ops provider must be able to tell if source memory is bar > >mapped and recover the pci device backing the mapping. > > Do you mean to say that every dma-ops provider needs to be taught about > p2p backed pages?

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Dan Williams
On Tue, Apr 18, 2017 at 11:00 AM, Jason Gunthorpe wrote: > On Tue, Apr 18, 2017 at 10:27:47AM -0700, Dan Williams wrote: >> > FWIW, RDMA probably wouldn't want to use a p2mem device either, we >> > already have APIs that map BAR memory to user space, and would

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Logan Gunthorpe
On 18/04/17 10:45 AM, Jason Gunthorpe wrote: > From Ben's comments, I would think that the 'first class' support that > is needed here is simply a function to return the 'struct device' > backing a CPU address range. Yes, and Dan's get_dev_pagemap suggestion gets us 90% of the way there. It's

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Jason Gunthorpe
On Tue, Apr 18, 2017 at 10:27:47AM -0700, Dan Williams wrote: > > FWIW, RDMA probably wouldn't want to use a p2mem device either, we > > already have APIs that map BAR memory to user space, and would like to > > keep using them. A 'enable P2P for bar' helper function sounds better > > to me. > >

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Dan Williams
On Tue, Apr 18, 2017 at 9:45 AM, Jason Gunthorpe wrote: > On Mon, Apr 17, 2017 at 08:23:16AM +1000, Benjamin Herrenschmidt wrote: > >> Thanks :-) There's a reason why I'm insisting on this. We have constant >> requests for this today. We have hacks in the GPU

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Jason Gunthorpe
On Mon, Apr 17, 2017 at 08:23:16AM +1000, Benjamin Herrenschmidt wrote: > Thanks :-) There's a reason why I'm insisting on this. We have constant > requests for this today. We have hacks in the GPU drivers to do it for > GPUs behind a switch, but those are just that, ad-hoc hacks in the >

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Benjamin Herrenschmidt
On Mon, 2017-04-17 at 23:43 -0600, Logan Gunthorpe wrote: > > On 17/04/17 03:11 PM, Benjamin Herrenschmidt wrote: > > Is it ? Again, you create a "concept" the user may have no idea about, > > "p2pmem memory". So now any kind of memory buffer on a device could > > be used for p2p but also

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Logan Gunthorpe
On 17/04/17 12:04 PM, Jerome Glisse wrote: > I disagree here. I would rather see Peer-to-Peer mapping as a form > of helper so that device driver can opt-in for multiple mechanisms > concurrently. Like HMM and p2p. I'm not against moving some of the common stuff into a library. It sounds like

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-17 Thread Logan Gunthorpe
On 17/04/17 03:11 PM, Benjamin Herrenschmidt wrote: > Is it ? Again, you create a "concept" the user may have no idea about, > "p2pmem memory". So now any kind of memory buffer on a device could > be used for p2p but also potentially a bunch of other things becomes > special and called

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-17 Thread Logan Gunthorpe
On 17/04/17 11:04 AM, Dan Williams wrote: >> Yes, in this scheme, it needs an additional p2pmem child. Why is that an >> issue? It certainly makes it a lot easier for the user to understand the >> p2pmem memory in the system (through the sysfs tree) and reason about >> the topology and when to

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-17 Thread Benjamin Herrenschmidt
On Mon, 2017-04-17 at 10:52 -0600, Logan Gunthorpe wrote: > > On 17/04/17 01:20 AM, Benjamin Herrenschmidt wrote: > > But is it ? For example take a GPU, does it, in your scheme, need an > > additional "p2pmem" child ? Why can't the GPU driver just use some > > helper to instantiate the necessary

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-17 Thread Jerome Glisse
On Mon, Apr 17, 2017 at 10:52:29AM -0600, Logan Gunthorpe wrote: > > > On 17/04/17 01:20 AM, Benjamin Herrenschmidt wrote: > > But is it ? For example take a GPU, does it, in your scheme, need an > > additional "p2pmem" child ? Why can't the GPU driver just use some > > helper to instantiate the

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-17 Thread Dan Williams
On Mon, Apr 17, 2017 at 9:52 AM, Logan Gunthorpe wrote: > > > On 17/04/17 01:20 AM, Benjamin Herrenschmidt wrote: >> But is it ? For example take a GPU, does it, in your scheme, need an >> additional "p2pmem" child ? Why can't the GPU driver just use some >> helper to

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-17 Thread Logan Gunthorpe
On 17/04/17 01:20 AM, Benjamin Herrenschmidt wrote: > But is it ? For example take a GPU, does it, in your scheme, need an > additional "p2pmem" child ? Why can't the GPU driver just use some > helper to instantiate the necessary struct pages ? What does having an > actual "struct device" child

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-17 Thread Benjamin Herrenschmidt
On Sun, 2017-04-16 at 23:13 -0600, Logan Gunthorpe wrote: > > > > > I'm still not 100% why do you need a "p2mem device" mind you ... > > Well, you don't "need" it but it is a design choice that I think makes a > lot of sense for the following reasons: > > 1) p2pmem is in fact a device on the

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-16 Thread Logan Gunthorpe
On 16/04/17 04:32 PM, Benjamin Herrenschmidt wrote: >> I'll consider this. Given the fact I can use your existing >> get_dev_pagemap infrastructure to look up the p2pmem device this >> probably isn't as hard as I thought it would be anyway (we probably >> don't even need a page flag). We'd just

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-16 Thread Benjamin Herrenschmidt
On Sun, 2017-04-16 at 10:34 -0600, Logan Gunthorpe wrote: > > On 16/04/17 09:53 AM, Dan Williams wrote: > > ZONE_DEVICE allows you to redirect via get_dev_pagemap() to retrieve > > context about the physical address in question. I'm thinking you can > > hang bus address translation data off of

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-16 Thread Benjamin Herrenschmidt
On Sun, 2017-04-16 at 10:47 -0600, Logan Gunthorpe wrote: > > I think you need to give other archs a chance to support this with a > > design that considers the offset case as a first class citizen rather > > than an afterthought. > > I'll consider this. Given the fact I can use your existing >

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-16 Thread Benjamin Herrenschmidt
On Sun, 2017-04-16 at 08:53 -0700, Dan Williams wrote: > > Just thinking out loud ... I don't have a firm idea or a design. But > > peer to peer is definitely a problem we need to tackle generically, the > > demand for it keeps coming up. > > ZONE_DEVICE allows you to redirect via

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-16 Thread Benjamin Herrenschmidt
On Sun, 2017-04-16 at 08:44 -0700, Dan Williams wrote: > The difference is that there was nothing fundamental in the core > design of pmem + DAX that prevented other archs from growing pmem > support. Indeed. In fact we have work in progress support for pmem on power using experimental HW. > THP

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-16 Thread Logan Gunthorpe
On 16/04/17 09:44 AM, Dan Williams wrote: > I think we very much want the dma mapping layer to be in the way. > It's the only sane semantic we have to communicate this translation. Yes, I wasn't proposing bypassing that layer, per se. I just meant that the layer would, in the end, have to

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-16 Thread Logan Gunthorpe
On 16/04/17 09:53 AM, Dan Williams wrote: > ZONE_DEVICE allows you to redirect via get_dev_pagemap() to retrieve > context about the physical address in question. I'm thinking you can > hang bus address translation data off of that structure. This seems > vaguely similar to what HMM is doing.

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-16 Thread Dan Williams
On Sat, Apr 15, 2017 at 8:01 PM, Benjamin Herrenschmidt wrote: > On Sat, 2017-04-15 at 15:09 -0700, Dan Williams wrote: >> I'm wondering, since this is limited to support behind a single >> switch, if you could have a software-iommu hanging off that switch >> device

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-16 Thread Dan Williams
On Sat, Apr 15, 2017 at 10:36 PM, Logan Gunthorpe wrote: > > > On 15/04/17 04:17 PM, Benjamin Herrenschmidt wrote: >> You can't. If the iommu is on, everything is remapped. Or do you mean >> to have dma_map_* not do a remapping ? > > Well, yes, you'd have to change the code

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-15 Thread Logan Gunthorpe
On 15/04/17 04:17 PM, Benjamin Herrenschmidt wrote: > You can't. If the iommu is on, everything is remapped. Or do you mean > to have dma_map_* not do a remapping ? Well, yes, you'd have to change the code so that iomem pages do not get remapped and the raw BAR address is passed to the DMA

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-15 Thread Logan Gunthorpe
On 15/04/17 09:01 PM, Benjamin Herrenschmidt wrote: > Are ZONE_DEVICE pages identifiable based on the struct page alone ? (a > flag ?) Well you can't use ZONE_DEVICE as an indicator. They may be regular RAM, (eg. pmem). It would need a separate flag indicating it is backed by iomem. Logan
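Logan's point can be made concrete with a tiny user-space sketch. The `mock_page` struct and its two fields are hypothetical (real `struct page` carries no such members); the sketch only shows why a ZONE_DEVICE test alone is not enough and a separate "backed by iomem" indicator is needed.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-page state under discussion. */
struct mock_page {
    bool zone_device;    /* page lives in ZONE_DEVICE */
    bool iomem_backed;   /* assumed extra flag: page is backed by iomem */
};

/* A page is a P2P candidate only if both conditions hold. ZONE_DEVICE
 * alone also matches pmem, which behaves like regular RAM. */
static bool mock_is_p2p_page(const struct mock_page *page)
{
    return page->zone_device && page->iomem_backed;
}
```

A pmem page would be `{ true, false }` and is rejected, while a BAR-backed page would be `{ true, true }` and is accepted.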

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-15 Thread Benjamin Herrenschmidt
On Sat, 2017-04-15 at 15:09 -0700, Dan Williams wrote: > I'm wondering, since this is limited to support behind a single > switch, if you could have a software-iommu hanging off that switch > device object that knows how to catch and translate the non-zero > offset bus address case. We have

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-15 Thread Benjamin Herrenschmidt
On Sat, 2017-04-15 at 11:41 -0600, Logan Gunthorpe wrote: > Thanks, Benjamin, for the summary of some of the issues. > > On 14/04/17 04:07 PM, Benjamin Herrenschmidt wrote > > So I assume the p2p code provides a way to address that too via special > > dma_ops ? Or wrappers ? > > Not at this

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-15 Thread Dan Williams
On Sat, Apr 15, 2017 at 10:41 AM, Logan Gunthorpe wrote: > Thanks, Benjamin, for the summary of some of the issues. > > On 14/04/17 04:07 PM, Benjamin Herrenschmidt wrote >> So I assume the p2p code provides a way to address that too via special >> dma_ops ? Or wrappers ? > >

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-15 Thread Logan Gunthorpe
Thanks, Benjamin, for the summary of some of the issues. On 14/04/17 04:07 PM, Benjamin Herrenschmidt wrote > So I assume the p2p code provides a way to address that too via special > dma_ops ? Or wrappers ? Not at this time. We will probably need a way to ensure the iommus do not attempt to

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-14 Thread Benjamin Herrenschmidt
On Fri, 2017-04-14 at 14:04 -0500, Bjorn Helgaas wrote: > I'm a little hesitant about excluding offset support, so I'd like to > hear more about this. > > Is the issue related to PCI BARs that are not completely addressable > by the CPU?  If so, that sounds like a first-class issue that should >

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-14 Thread Bjorn Helgaas
On Fri, Apr 14, 2017 at 11:30:14AM -0600, Logan Gunthorpe wrote: > On 14/04/17 05:37 AM, Benjamin Herrenschmidt wrote: > > I object to designing a subsystem that by design cannot work on whole > > categories of architectures out there. > > Hardly. That's extreme. We'd design a subsystem that

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-14 Thread Logan Gunthorpe
On 14/04/17 05:37 AM, Benjamin Herrenschmidt wrote: > I object to designing a subsystem that by design cannot work on whole > categories of architectures out there. Hardly. That's extreme. We'd design a subsystem that works for the easy cases and needs more work to support the offset cases. It

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-14 Thread Benjamin Herrenschmidt
On Thu, 2017-04-13 at 22:40 -0600, Logan Gunthorpe wrote: > > On 13/04/17 10:16 PM, Jason Gunthorpe wrote: > > I'd suggest just detecting if there is any translation in bus > > addresses anywhere and just hard disabling P2P on such systems. > > That's a fantastic suggestion. It simplifies things

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-14 Thread Benjamin Herrenschmidt
On Fri, 2017-04-14 at 21:37 +1000, Benjamin Herrenschmidt wrote: > On Thu, 2017-04-13 at 22:40 -0600, Logan Gunthorpe wrote: > > > > On 13/04/17 10:16 PM, Jason Gunthorpe wrote: > > > I'd suggest just detecting if there is any translation in bus > > > addresses anywhere and just hard disabling

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-14 Thread Benjamin Herrenschmidt
On Thu, 2017-04-13 at 22:16 -0600, Jason Gunthorpe wrote: > > Any caller of pci_add_resource_offset() uses CPU addresses different from > > the PCI bus addresses (unless the offset is zero, of course).  All ACPI > > platforms also support this translation (see "translation_offset"), though > > in

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-13 Thread Logan Gunthorpe
On 13/04/17 10:16 PM, Jason Gunthorpe wrote: > I'd suggest just detecting if there is any translation in bus > addresses anywhere and just hard disabling P2P on such systems. That's a fantastic suggestion. It simplifies things significantly. Unless there are any significant objections I think I

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-13 Thread Jason Gunthorpe
On Thu, Apr 13, 2017 at 06:26:31PM -0500, Bjorn Helgaas wrote: > > Ah, thanks for the tip! On my system, this translation returns the same > > address so it was not necessary. And, yes, that means this would have to > > find its way into the dma mapping routine somehow. This means we'll > >

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-13 Thread Bjorn Helgaas
On Thu, Apr 13, 2017 at 03:22:06PM -0600, Logan Gunthorpe wrote: > > > On 12/04/17 03:55 PM, Benjamin Herrenschmidt wrote: > > Look at pcibios_resource_to_bus() and pcibios_bus_to_resource(). They > > will perform the conversion between the struct resource content (CPU > > physical address) and

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-13 Thread Benjamin Herrenschmidt
On Thu, 2017-04-13 at 15:22 -0600, Logan Gunthorpe wrote: > > On 12/04/17 03:55 PM, Benjamin Herrenschmidt wrote: > > Look at pcibios_resource_to_bus() and pcibios_bus_to_resource(). They > > will perform the conversion between the struct resource content (CPU > > physical address) and the actual

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-13 Thread Logan Gunthorpe
On 12/04/17 03:55 PM, Benjamin Herrenschmidt wrote: > Look at pcibios_resource_to_bus() and pcibios_bus_to_resource(). They > will perform the conversion between the struct resource content (CPU > physical address) and the actual PCI bus side address. Ah, thanks for the tip! On my system, this
