Re: [PATCH 4/8] dma-buf: add peer2peer flag
On Fri, Apr 20, 2018 at 12:44:01PM +0200, Christian König wrote:
> > > What we need is an sg_alloc_table_from_resources(dev, resources,
> > > num_resources) which does the handling common to all drivers.
> >
> > A structure that contains
> >
> > {page,offset,len} + {dma_addr+dma_len}
> >
> > is not a good container for storing
> >
> > {virt addr, dma_addr, len}
> >
> > no matter what interface you build around it.
>
> Why not? I mean at least for my use case we actually don't need the
> virtual address.

If you don't need the virtual address you need scatterlist even less.

> What we need is {dma_addr+dma_len} in a consistent interface which can
> come from both {page,offset,len} as well as {resource, len}.

Ok.

> What I actually don't need is separate handling for system memory and
> resources, but that is exactly what we get when we don't use sg_table.

At the very lowest level they will need to be handled differently for
many architectures, the question is at what point we'll do the
branching out.

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 4/8] dma-buf: add peer2peer flag
Am 20.04.2018 um 12:17 schrieb Christoph Hellwig:
> On Fri, Apr 20, 2018 at 10:58:50AM +0200, Christian König wrote:
> > > Yes there's a bit of a layering violation insofar that drivers
> > > really shouldn't each have their own copy of "how do I convert a
> > > piece of dma memory into dma-buf", but that doesn't render the
> > > interface a bad idea.
> >
> > Completely agree on that.
> >
> > What we need is an sg_alloc_table_from_resources(dev, resources,
> > num_resources) which does the handling common to all drivers.
>
> A structure that contains
>
> {page,offset,len} + {dma_addr+dma_len}
>
> is not a good container for storing
>
> {virt addr, dma_addr, len}
>
> no matter what interface you build around it.

Why not? I mean at least for my use case we actually don't need the
virtual address.

What we need is {dma_addr+dma_len} in a consistent interface which can
come from both {page,offset,len} as well as {resource, len}.

What I actually don't need is separate handling for system memory and
resources, but that is exactly what we get when we don't use sg_table.

Christian.

> And that is discounting all the problems around mapping coherent
> allocations for other devices, or the iommu merging problem we are
> having another thread on.
>
> So let's come up with a better high level interface first, and then
> worry about how to implement it in the low-level dma-mapping interface
> second. Especially given that my consolidation of the dma_map_ops
> implementation is in full swing and there shouldn't be all that many
> to bother with.
>
> So first question: Do you actually care about having multiple pairs of
> the above, or instead of all chunks just deal with a single one of the
> above? In that case we really should not need that many new interfaces
> as dma_map_resource will be all you need anyway.
>
> > Christian.
> >
> > > -Daniel
> ---end quoted text---
Re: [PATCH 4/8] dma-buf: add peer2peer flag
On Fri, Apr 20, 2018 at 10:58:50AM +0200, Christian König wrote:
> > Yes there's a bit of a layering violation insofar that drivers
> > really shouldn't each have their own copy of "how do I convert a
> > piece of dma memory into dma-buf", but that doesn't render the
> > interface a bad idea.
>
> Completely agree on that.
>
> What we need is an sg_alloc_table_from_resources(dev, resources,
> num_resources) which does the handling common to all drivers.

A structure that contains

{page,offset,len} + {dma_addr+dma_len}

is not a good container for storing

{virt addr, dma_addr, len}

no matter what interface you build around it.

And that is discounting all the problems around mapping coherent
allocations for other devices, or the iommu merging problem we are
having another thread on.

So let's come up with a better high level interface first, and then
worry about how to implement it in the low-level dma-mapping interface
second. Especially given that my consolidation of the dma_map_ops
implementation is in full swing and there shouldn't be all that many to
bother with.

So first question: Do you actually care about having multiple pairs of
the above, or instead of all chunks just deal with a single one of the
above? In that case we really should not need that many new interfaces
as dma_map_resource will be all you need anyway.

> Christian.
>
> > -Daniel
> ---end quoted text---
Re: [PATCH 4/8] dma-buf: add peer2peer flag
Am 20.04.2018 um 09:13 schrieb Daniel Vetter:
> On Thu, Apr 19, 2018 at 01:16:57AM -0700, Christoph Hellwig wrote:
> > On Mon, Apr 16, 2018 at 03:38:56PM +0200, Daniel Vetter wrote:
> > > We've broken that assumption in i915 years ago. Not struct page
> > > backed gpu memory is very real.
> > >
> > > Of course we'll never feed such a strange sg table to a driver
> > > which doesn't understand it, but allowing sg_page == NULL works
> > > perfectly fine. At least for gpu drivers.
> >
> > For GPU drivers on x86 with no dma coherency problems, sure. But not
> > all the world is x86. We already have problems due to dma-buf's use
> > of the awkward get_sgtable interface (see the comment on
> > arm_dma_get_sgtable that I fully agree with), and doing this for
> > memory that doesn't have a struct page at all will make things even
> > worse.
>
> x86 dma isn't coherent either, if you're a GPU :-) Flushing gpu caches
> tends to be too expensive, so there's pci-e support and chipset
> support to forgo it. Plus drivers flushing caches themselves.
>
> The dma_get_sgtable thing is indeed fun, the right solution would
> probably be to push the dma-buf export down into the dma layer. The
> comment for arm_dma_get_sgtable is also not a real concern, because
> dma-buf also abstracts away the flushing (or well is supposed to), so
> there really shouldn't be anyone calling the streaming apis on the
> returned sg table. That's why dma-buf gives you an sg table that's
> mapped already.
>
> > > If that's not acceptable then I guess we could go over the entire
> > > tree and frob all the gpu related code to switch over to a new
> > > struct sg_table_might_not_be_struct_page_backed, including all the
> > > other functions we added over the past few years to iterate over
> > > sg tables. But seems slightly silly, given that sg tables seem to
> > > do exactly what we need.
> >
> > It isn't silly. We will have to do some surgery like that anyway
> > because the current APIs don't work. So relax, sit back and come up
> > with an API that solves the existing issues and serves us well in
> > the future.
>
> So we should just implement a copy of sg table for dma-buf, since I
> still think it does exactly what we need for gpus?
>
> Yes there's a bit of a layering violation insofar that drivers really
> shouldn't each have their own copy of "how do I convert a piece of dma
> memory into dma-buf", but that doesn't render the interface a bad
> idea.

Completely agree on that.

What we need is an sg_alloc_table_from_resources(dev, resources,
num_resources) which does the handling common to all drivers.

Christian.

> -Daniel
Re: [PATCH 4/8] dma-buf: add peer2peer flag
On Thu, Apr 19, 2018 at 01:16:57AM -0700, Christoph Hellwig wrote:
> On Mon, Apr 16, 2018 at 03:38:56PM +0200, Daniel Vetter wrote:
> > We've broken that assumption in i915 years ago. Not struct page
> > backed gpu memory is very real.
> >
> > Of course we'll never feed such a strange sg table to a driver which
> > doesn't understand it, but allowing sg_page == NULL works perfectly
> > fine. At least for gpu drivers.
>
> For GPU drivers on x86 with no dma coherency problems, sure. But not
> all the world is x86. We already have problems due to dma-buf's use of
> the awkward get_sgtable interface (see the comment on
> arm_dma_get_sgtable that I fully agree with), and doing this for
> memory that doesn't have a struct page at all will make things even
> worse.

x86 dma isn't coherent either, if you're a GPU :-) Flushing gpu caches
tends to be too expensive, so there's pci-e support and chipset support
to forgo it. Plus drivers flushing caches themselves.

The dma_get_sgtable thing is indeed fun, the right solution would
probably be to push the dma-buf export down into the dma layer. The
comment for arm_dma_get_sgtable is also not a real concern, because
dma-buf also abstracts away the flushing (or well is supposed to), so
there really shouldn't be anyone calling the streaming apis on the
returned sg table. That's why dma-buf gives you an sg table that's
mapped already.

> > If that's not acceptable then I guess we could go over the entire
> > tree and frob all the gpu related code to switch over to a new
> > struct sg_table_might_not_be_struct_page_backed, including all the
> > other functions we added over the past few years to iterate over sg
> > tables. But seems slightly silly, given that sg tables seem to do
> > exactly what we need.
>
> It isn't silly. We will have to do some surgery like that anyway
> because the current APIs don't work. So relax, sit back and come up
> with an API that solves the existing issues and serves us well in the
> future.

So we should just implement a copy of sg table for dma-buf, since I
still think it does exactly what we need for gpus?

Yes there's a bit of a layering violation insofar that drivers really
shouldn't each have their own copy of "how do I convert a piece of dma
memory into dma-buf", but that doesn't render the interface a bad idea.

-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Re: [PATCH 4/8] dma-buf: add peer2peer flag
On Mon, Apr 16, 2018 at 03:38:56PM +0200, Daniel Vetter wrote:
> We've broken that assumption in i915 years ago. Not struct page backed
> gpu memory is very real.
>
> Of course we'll never feed such a strange sg table to a driver which
> doesn't understand it, but allowing sg_page == NULL works perfectly
> fine. At least for gpu drivers.

For GPU drivers on x86 with no dma coherency problems, sure. But not
all the world is x86. We already have problems due to dma-buf's use of
the awkward get_sgtable interface (see the comment on
arm_dma_get_sgtable that I fully agree with), and doing this for memory
that doesn't have a struct page at all will make things even worse.

> If that's not acceptable then I guess we could go over the entire tree
> and frob all the gpu related code to switch over to a new struct
> sg_table_might_not_be_struct_page_backed, including all the other
> functions we added over the past few years to iterate over sg tables.
> But seems slightly silly, given that sg tables seem to do exactly what
> we need.

It isn't silly. We will have to do some surgery like that anyway
because the current APIs don't work. So relax, sit back and come up
with an API that solves the existing issues and serves us well in the
future.
Re: [PATCH 4/8] dma-buf: add peer2peer flag
On Mon, Apr 16, 2018 at 2:39 PM, Christoph Hellwig wrote:
> On Tue, Apr 03, 2018 at 08:08:32PM +0200, Daniel Vetter wrote:
> > I did not mean you should dma_map_sg/page. I just meant that using
> > dma_map_resource to fill only the dma address part of the sg table
> > seems perfectly sufficient.
>
> But that is not how the interface works, especially facing sg_dma_len.
>
> > Assuming you get an sg table that's been mapped by calling
> > dma_map_sg was always a bit of a case of bending the abstraction to
> > avoid typing code. The only thing an importer ever should have done
> > is look at the dma addresses in that sg table, nothing else.
>
> The scatterlist is not a very good abstraction unfortunately, but it
> is spread all over the kernel. And we do expect that anyone who gets
> passed a scatterlist can use sg_page() or sg_virt() (which calls
> sg_page()) on it. Your changes would break that, and will cause major
> trouble because of that.
>
> If you want to expose p2p memory returned from dma_map_resource in
> dmabuf, do not use scatterlists for this please, but a new interface
> that explicitly passes a virtual address, a dma address and a length,
> and make it very clear that virt_to_page will not work on the virtual
> address.

We've broken that assumption in i915 years ago. Not struct page backed
gpu memory is very real.

Of course we'll never feed such a strange sg table to a driver which
doesn't understand it, but allowing sg_page == NULL works perfectly
fine. At least for gpu drivers.

If that's not acceptable then I guess we could go over the entire tree
and frob all the gpu related code to switch over to a new struct
sg_table_might_not_be_struct_page_backed, including all the other
functions we added over the past few years to iterate over sg tables.
But seems slightly silly, given that sg tables seem to do exactly what
we need.

-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
Re: [PATCH 4/8] dma-buf: add peer2peer flag
On Tue, Apr 03, 2018 at 08:08:32PM +0200, Daniel Vetter wrote:
> I did not mean you should dma_map_sg/page. I just meant that using
> dma_map_resource to fill only the dma address part of the sg table
> seems perfectly sufficient.

But that is not how the interface works, especially facing sg_dma_len.

> Assuming you get an sg table that's been mapped by calling dma_map_sg
> was always a bit of a case of bending the abstraction to avoid typing
> code. The only thing an importer ever should have done is look at the
> dma addresses in that sg table, nothing else.

The scatterlist is not a very good abstraction unfortunately, but it is
spread all over the kernel. And we do expect that anyone who gets
passed a scatterlist can use sg_page() or sg_virt() (which calls
sg_page()) on it. Your changes would break that, and will cause major
trouble because of that.

If you want to expose p2p memory returned from dma_map_resource in
dmabuf, do not use scatterlists for this please, but a new interface
that explicitly passes a virtual address, a dma address and a length,
and make it very clear that virt_to_page will not work on the virtual
address.
Re: [PATCH 4/8] dma-buf: add peer2peer flag
On Tue, Apr 03, 2018 at 01:06:45PM -0400, Jerome Glisse wrote:
> On Tue, Apr 03, 2018 at 11:09:09AM +0200, Daniel Vetter wrote:
> > On Thu, Mar 29, 2018 at 01:34:24PM +0200, Christian König wrote:
> > > Am 29.03.2018 um 08:57 schrieb Daniel Vetter:
> > > > On Sun, Mar 25, 2018 at 12:59:56PM +0200, Christian König wrote:
> > > > > Add a peer2peer flag noting that the importer can deal with
> > > > > device resources which are not backed by pages.
> > > > >
> > > > > Signed-off-by: Christian König
> > > >
> > > > Um strictly speaking they all should, but ttm never bothered to
> > > > use the real interfaces but just hacked around the provided sg
> > > > list, grabbing the underlying struct pages, then
> > > > rebuilding & remapping the sg list again.
> > >
> > > Actually that isn't correct. TTM converts them to a dma address
> > > array because drivers need it like this (at least nouveau, radeon
> > > and amdgpu).
> > >
> > > I've fixed radeon and amdgpu to be able to deal without it and
> > > mailed with Ben about nouveau, but the outcome is they don't
> > > really know.
> > >
> > > TTM itself doesn't have any need for the pages on imported BOs
> > > (you can't mmap them anyway); the real underlying problem is that
> > > sg tables don't provide what drivers need.
> > >
> > > I think we could rather easily fix sg tables, but that is a
> > > totally separate task.
> >
> > Looking at patch 8, the sg table seems perfectly sufficient to
> > convey the right dma addresses to the importer. Of course the
> > exporter has to set up the right kind of iommu mappings to make
> > this work.
> >
> > > > The entire point of using sg lists was exactly to allow this use
> > > > case of peer2peer dma (or well in general have special exporters
> > > > which manage memory/IO ranges not backed by struct page). So
> > > > essentially you're having an "I'm totally not broken" flag here.
> > >
> > > No, independent of needed struct page pointers we need to note if
> > > the exporter can handle peer2peer stuff from the hardware side in
> > > general.
> > >
> > > So what I did is just to set peer2peer allowed on the importer
> > > because of the driver needs, and clear it in the exporter if the
> > > hardware can't handle that.
> >
> > The only thing the importer seems to do is call
> > pci_peer_traffic_supported, which the exporter could call too. What
> > am I missing (since the struct_page stuff sounds like it's fixed
> > already by you)?
> > -Daniel
>
> AFAIK Logan's patchset requires registering and initializing struct
> page for the device memory you want to map (export from the
> exporter's point of view).
>
> With GPU this isn't something we want; struct page is >~= 2^6 bytes,
> so for
> 4GB GPU = 2^6*2^32/2^12 = 2^26 = 64MB of RAM
> 8GB GPU = 2^6*2^33/2^12 = 2^27 = 128MB of RAM
> 16GB GPU = 2^6*2^34/2^12 = 2^28 = 256MB of RAM
> 32GB GPU = 2^6*2^35/2^12 = 2^29 = 512MB of RAM
>
> All this is mostly wasted as only a small sub-set (that can not be
> constrained to a specific range) will ever be exported at any point in
> time. For GPU workloads this is hardly justifiable; even for HMM I do
> not plan to register all those pages.
>
> Hence why I argue that dma_map_resource() as used by Christian is good
> enough for us. People that care about SG can fix that, but I'd rather
> not have to depend on that and waste system memory.

I did not mean you should dma_map_sg/page. I just meant that using
dma_map_resource to fill only the dma address part of the sg table
seems perfectly sufficient. And that's exactly why the importer gets an
already mapped sg table, so that it doesn't have to call dma_map_sg on
something that dma_map_sg can't handle.

Assuming you get an sg table that's been mapped by calling dma_map_sg
was always a bit of a case of bending the abstraction to avoid typing
code. The only thing an importer ever should have done is look at the
dma addresses in that sg table, nothing else. And p2p seems to
perfectly fit into this (surprise, it was meant to).

That's why I suggested we annotate the broken importers who assume the
sg table is mapped using dma_map_sg or has a struct_page backing the
memory (but there don't seem to be any left), instead of annotating the
ones that aren't broken with a flag that's confusing - you could also
have a dma-buf sgt that points at some other memory that doesn't have
struct pages backing it.

Aside: At least internally in i915 we've been using this forever for
our own private/stolen memory. Unfortunately no other device can access
that range of memory, which is why we don't allow it to be imported to
anything but i915 itself. But if that hw restriction doesn't exist, it
would work.

-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Re: [PATCH 4/8] dma-buf: add peer2peer flag
On Tue, Apr 03, 2018 at 11:09:09AM +0200, Daniel Vetter wrote:
> On Thu, Mar 29, 2018 at 01:34:24PM +0200, Christian König wrote:
> > Am 29.03.2018 um 08:57 schrieb Daniel Vetter:
> > > On Sun, Mar 25, 2018 at 12:59:56PM +0200, Christian König wrote:
> > > > Add a peer2peer flag noting that the importer can deal with
> > > > device resources which are not backed by pages.
> > > >
> > > > Signed-off-by: Christian König
> > >
> > > Um strictly speaking they all should, but ttm never bothered to
> > > use the real interfaces but just hacked around the provided sg
> > > list, grabbing the underlying struct pages, then
> > > rebuilding & remapping the sg list again.
> >
> > Actually that isn't correct. TTM converts them to a dma address
> > array because drivers need it like this (at least nouveau, radeon
> > and amdgpu).
> >
> > I've fixed radeon and amdgpu to be able to deal without it and
> > mailed with Ben about nouveau, but the outcome is they don't really
> > know.
> >
> > TTM itself doesn't have any need for the pages on imported BOs (you
> > can't mmap them anyway); the real underlying problem is that sg
> > tables don't provide what drivers need.
> >
> > I think we could rather easily fix sg tables, but that is a totally
> > separate task.
>
> Looking at patch 8, the sg table seems perfectly sufficient to convey
> the right dma addresses to the importer. Of course the exporter has to
> set up the right kind of iommu mappings to make this work.
>
> > > The entire point of using sg lists was exactly to allow this use
> > > case of peer2peer dma (or well in general have special exporters
> > > which manage memory/IO ranges not backed by struct page). So
> > > essentially you're having an "I'm totally not broken" flag here.
> >
> > No, independent of needed struct page pointers we need to note if
> > the exporter can handle peer2peer stuff from the hardware side in
> > general.
> >
> > So what I did is just to set peer2peer allowed on the importer
> > because of the driver needs, and clear it in the exporter if the
> > hardware can't handle that.
>
> The only thing the importer seems to do is call
> pci_peer_traffic_supported, which the exporter could call too. What am
> I missing (since the struct_page stuff sounds like it's fixed already
> by you)?
> -Daniel

AFAIK Logan's patchset requires registering and initializing struct
page for the device memory you want to map (export from the exporter's
point of view).

With GPU this isn't something we want; struct page is >~= 2^6 bytes, so
for

4GB GPU = 2^6*2^32/2^12 = 2^26 = 64MB of RAM
8GB GPU = 2^6*2^33/2^12 = 2^27 = 128MB of RAM
16GB GPU = 2^6*2^34/2^12 = 2^28 = 256MB of RAM
32GB GPU = 2^6*2^35/2^12 = 2^29 = 512MB of RAM

All this is mostly wasted as only a small sub-set (that can not be
constrained to a specific range) will ever be exported at any point in
time. For GPU workloads this is hardly justifiable; even for HMM I do
not plan to register all those pages.

Hence why I argue that dma_map_resource() as used by Christian is good
enough for us. People that care about SG can fix that, but I'd rather
not have to depend on that and waste system memory.

Cheers,
Jérôme
Re: [PATCH 4/8] dma-buf: add peer2peer flag
On Thu, Mar 29, 2018 at 01:34:24PM +0200, Christian König wrote:
> Am 29.03.2018 um 08:57 schrieb Daniel Vetter:
> > On Sun, Mar 25, 2018 at 12:59:56PM +0200, Christian König wrote:
> > > Add a peer2peer flag noting that the importer can deal with device
> > > resources which are not backed by pages.
> > >
> > > Signed-off-by: Christian König
> >
> > Um strictly speaking they all should, but ttm never bothered to use
> > the real interfaces but just hacked around the provided sg list,
> > grabbing the underlying struct pages, then rebuilding & remapping
> > the sg list again.
>
> Actually that isn't correct. TTM converts them to a dma address array
> because drivers need it like this (at least nouveau, radeon and
> amdgpu).
>
> I've fixed radeon and amdgpu to be able to deal without it and mailed
> with Ben about nouveau, but the outcome is they don't really know.
>
> TTM itself doesn't have any need for the pages on imported BOs (you
> can't mmap them anyway); the real underlying problem is that sg tables
> don't provide what drivers need.
>
> I think we could rather easily fix sg tables, but that is a totally
> separate task.

Looking at patch 8, the sg table seems perfectly sufficient to convey
the right dma addresses to the importer. Of course the exporter has to
set up the right kind of iommu mappings to make this work.

> > The entire point of using sg lists was exactly to allow this use
> > case of peer2peer dma (or well in general have special exporters
> > which manage memory/IO ranges not backed by struct page). So
> > essentially you're having an "I'm totally not broken" flag here.
>
> No, independent of needed struct page pointers we need to note if the
> exporter can handle peer2peer stuff from the hardware side in general.
>
> So what I did is just to set peer2peer allowed on the importer because
> of the driver needs, and clear it in the exporter if the hardware
> can't handle that.

The only thing the importer seems to do is call
pci_peer_traffic_supported, which the exporter could call too. What am
I missing (since the struct_page stuff sounds like it's fixed already
by you)?
-Daniel

> > I think a better approach would be if we add a requires_struct_page
> > or so, and annotate the current importers accordingly. Or we just
> > fix them up (it is all in shared ttm code after all, I think
> > everyone else got this right).
>
> I would rather not bet on that.
>
> Christian.
>
> > -Daniel
> >
> > > ---
> > >  drivers/dma-buf/dma-buf.c | 1 +
> > >  include/linux/dma-buf.h   | 4 ++++
> > >  2 files changed, 5 insertions(+)
> > >
> > > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> > > index ffaa2f9a9c2c..f420225f93c6 100644
> > > --- a/drivers/dma-buf/dma-buf.c
> > > +++ b/drivers/dma-buf/dma-buf.c
> > > @@ -565,6 +565,7 @@ struct dma_buf_attachment *dma_buf_attach(const struct dma_buf_attach_info *info
> > >  	attach->dev = info->dev;
> > >  	attach->dmabuf = dmabuf;
> > > +	attach->peer2peer = info->peer2peer;
> > >  	attach->priv = info->priv;
> > >  	attach->invalidate = info->invalidate;
> > >
> > > diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> > > index 15dd8598bff1..1ef50bd9bc5b 100644
> > > --- a/include/linux/dma-buf.h
> > > +++ b/include/linux/dma-buf.h
> > > @@ -313,6 +313,7 @@ struct dma_buf {
> > >   * @dmabuf: buffer for this attachment.
> > >   * @dev: device attached to the buffer.
> > >   * @node: list of dma_buf_attachment.
> > > + * @peer2peer: true if the importer can handle peer resources without pages.
> > >   * @priv: exporter specific attachment data.
> > >   *
> > >   * This structure holds the attachment information between the dma_buf buffer
> > > @@ -328,6 +329,7 @@ struct dma_buf_attachment {
> > >  	struct dma_buf *dmabuf;
> > >  	struct device *dev;
> > >  	struct list_head node;
> > > +	bool peer2peer;
> > >  	void *priv;
> > >
> > >  	/**
> > > @@ -392,6 +394,7 @@ struct dma_buf_export_info {
> > >   * @dmabuf: the exported dma_buf
> > >   * @dev: the device which wants to import the attachment
> > >   * @priv: private data of importer to this attachment
> > > + * @peer2peer: true if the importer can handle peer resources without pages
> > >   * @invalidate: callback to use for invalidating mappings
> > >   *
> > >   * This structure holds the information required to attach to a buffer. Used
> > > @@ -401,6 +404,7 @@ struct dma_buf_attach_info {
> > >  	struct dma_buf *dmabuf;
> > >  	struct device *dev;
> > >  	void *priv;
> > > +	bool peer2peer;
> > >  	void (*invalidate)(struct dma_buf_attachment *attach);
> > >  };
> > > --
> > > 2.14.1
Re: [PATCH 4/8] dma-buf: add peer2peer flag
Am 29.03.2018 um 08:57 schrieb Daniel Vetter:
> On Sun, Mar 25, 2018 at 12:59:56PM +0200, Christian König wrote:
> > Add a peer2peer flag noting that the importer can deal with device
> > resources which are not backed by pages.
> >
> > Signed-off-by: Christian König
>
> Um strictly speaking they all should, but ttm never bothered to use
> the real interfaces but just hacked around the provided sg list,
> grabbing the underlying struct pages, then rebuilding & remapping the
> sg list again.

Actually that isn't correct. TTM converts them to a dma address array
because drivers need it like this (at least nouveau, radeon and
amdgpu).

I've fixed radeon and amdgpu to be able to deal without it and mailed
with Ben about nouveau, but the outcome is they don't really know.

TTM itself doesn't have any need for the pages on imported BOs (you
can't mmap them anyway); the real underlying problem is that sg tables
don't provide what drivers need.

I think we could rather easily fix sg tables, but that is a totally
separate task.

> The entire point of using sg lists was exactly to allow this use case
> of peer2peer dma (or well in general have special exporters which
> manage memory/IO ranges not backed by struct page). So essentially
> you're having an "I'm totally not broken" flag here.

No, independent of needed struct page pointers we need to note if the
exporter can handle peer2peer stuff from the hardware side in general.

So what I did is just to set peer2peer allowed on the importer because
of the driver needs, and clear it in the exporter if the hardware can't
handle that.

> I think a better approach would be if we add a requires_struct_page or
> so, and annotate the current importers accordingly. Or we just fix
> them up (it is all in shared ttm code after all, I think everyone else
> got this right).

I would rather not bet on that.

Christian.

> -Daniel
>
> > ---
> >  drivers/dma-buf/dma-buf.c | 1 +
> >  include/linux/dma-buf.h   | 4 ++++
> >  2 files changed, 5 insertions(+)
> >
> > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> > index ffaa2f9a9c2c..f420225f93c6 100644
> > --- a/drivers/dma-buf/dma-buf.c
> > +++ b/drivers/dma-buf/dma-buf.c
> > @@ -565,6 +565,7 @@ struct dma_buf_attachment *dma_buf_attach(const struct dma_buf_attach_info *info
> >  	attach->dev = info->dev;
> >  	attach->dmabuf = dmabuf;
> > +	attach->peer2peer = info->peer2peer;
> >  	attach->priv = info->priv;
> >  	attach->invalidate = info->invalidate;
> >
> > diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> > index 15dd8598bff1..1ef50bd9bc5b 100644
> > --- a/include/linux/dma-buf.h
> > +++ b/include/linux/dma-buf.h
> > @@ -313,6 +313,7 @@ struct dma_buf {
> >   * @dmabuf: buffer for this attachment.
> >   * @dev: device attached to the buffer.
> >   * @node: list of dma_buf_attachment.
> > + * @peer2peer: true if the importer can handle peer resources without pages.
> >   * @priv: exporter specific attachment data.
> >   *
> >   * This structure holds the attachment information between the dma_buf buffer
> > @@ -328,6 +329,7 @@ struct dma_buf_attachment {
> >  	struct dma_buf *dmabuf;
> >  	struct device *dev;
> >  	struct list_head node;
> > +	bool peer2peer;
> >  	void *priv;
> >
> >  	/**
> > @@ -392,6 +394,7 @@ struct dma_buf_export_info {
> >   * @dmabuf: the exported dma_buf
> >   * @dev: the device which wants to import the attachment
> >   * @priv: private data of importer to this attachment
> > + * @peer2peer: true if the importer can handle peer resources without pages
> >   * @invalidate: callback to use for invalidating mappings
> >   *
> >   * This structure holds the information required to attach to a buffer. Used
> > @@ -401,6 +404,7 @@ struct dma_buf_attach_info {
> >  	struct dma_buf *dmabuf;
> >  	struct device *dev;
> >  	void *priv;
> > +	bool peer2peer;
> >  	void (*invalidate)(struct dma_buf_attachment *attach);
> >  };
> > --
> > 2.14.1
Re: [PATCH 4/8] dma-buf: add peer2peer flag
On Sun, Mar 25, 2018 at 12:59:56PM +0200, Christian König wrote:
> Add a peer2peer flag noting that the importer can deal with device
> resources which are not backed by pages.
>
> Signed-off-by: Christian König

Um strictly speaking they all should, but ttm never bothered to use the
real interfaces but just hacked around the provided sg list, grabbing
the underlying struct pages, then rebuilding & remapping the sg list
again.

The entire point of using sg lists was exactly to allow this use case
of peer2peer dma (or well in general have special exporters which
manage memory/IO ranges not backed by struct page). So essentially
you're having an "I'm totally not broken" flag here.

I think a better approach would be if we add a requires_struct_page or
so, and annotate the current importers accordingly. Or we just fix them
up (it is all in shared ttm code after all, I think everyone else got
this right).
-Daniel

> ---
>  drivers/dma-buf/dma-buf.c | 1 +
>  include/linux/dma-buf.h   | 4 ++++
>  2 files changed, 5 insertions(+)
>
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index ffaa2f9a9c2c..f420225f93c6 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -565,6 +565,7 @@ struct dma_buf_attachment *dma_buf_attach(const struct dma_buf_attach_info *info
>  	attach->dev = info->dev;
>  	attach->dmabuf = dmabuf;
> +	attach->peer2peer = info->peer2peer;
>  	attach->priv = info->priv;
>  	attach->invalidate = info->invalidate;
>
> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> index 15dd8598bff1..1ef50bd9bc5b 100644
> --- a/include/linux/dma-buf.h
> +++ b/include/linux/dma-buf.h
> @@ -313,6 +313,7 @@ struct dma_buf {
>   * @dmabuf: buffer for this attachment.
>   * @dev: device attached to the buffer.
>   * @node: list of dma_buf_attachment.
> + * @peer2peer: true if the importer can handle peer resources without pages.
>   * @priv: exporter specific attachment data.
>   *
>   * This structure holds the attachment information between the dma_buf buffer
> @@ -328,6 +329,7 @@ struct dma_buf_attachment {
>  	struct dma_buf *dmabuf;
>  	struct device *dev;
>  	struct list_head node;
> +	bool peer2peer;
>  	void *priv;
>
>  	/**
> @@ -392,6 +394,7 @@ struct dma_buf_export_info {
>   * @dmabuf: the exported dma_buf
>   * @dev: the device which wants to import the attachment
>   * @priv: private data of importer to this attachment
> + * @peer2peer: true if the importer can handle peer resources without pages
>   * @invalidate: callback to use for invalidating mappings
>   *
>   * This structure holds the information required to attach to a buffer. Used
> @@ -401,6 +404,7 @@ struct dma_buf_attach_info {
>  	struct dma_buf *dmabuf;
>  	struct device *dev;
>  	void *priv;
> +	bool peer2peer;
>  	void (*invalidate)(struct dma_buf_attachment *attach);
>  };
> --
> 2.14.1

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch