Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer
On Thu, Dec 20, 2018 at 12:23:46PM +0900, Tomasz Figa wrote: > I haven't been following the problems with virtually tagged cases, > would you mind sharing some background, so that we can consider it > when adding non-consistent allocations to VB2? The problem exists at least partially with the current consistent allocator, and we need to fix it. My non-coherent series does not have it, but we would add it if we allowed virtual remapping. The problem with get_sgtable is that it creates aliasing of kernel virtual addresses used to access memory and thus the cache. We have the mapping return from dma_alloc_*, which in case of a remap contains a vmap/ioremap style address that is different from the kernel direct mapping address you get from using page_address/kmap on the pages backing that mapping. (assuming you even have pages - in a few corner cases we don't and the whole interface concept breaks down). This creates various problems as the the scatterlist returned from get_stable now gives a second way to access this memory through direct mapping addresses in the pages contained in it, but as soon as we do that we: a) don't use the nocache mapping used by the coherent allocator if that is on a per-mapping basis (which it is for many architectures), so you do get data in the cache even when that might not be assumed b) if the data returned from dma_alloc_coherent was not actually a remap but a special pool of non-cached address the cache flushing instructions might be invalid and caused problems c) any cache flushing now operates on just those direct mappings, which in case of the non-coherent allocator and access through the remapped address does the wrong thing for virtually tagged caches
Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer
On Wed, Dec 19, 2018 at 11:51 PM Christoph Hellwig wrote: > > On Wed, Dec 19, 2018 at 05:18:35PM +0900, Tomasz Figa wrote: > > The existing code that deals with dma_alloc_attrs() without > > DMA_ATTR_NON_CONSISTENT would just call dma_get_sgtable_attrs() like > > here: > > I know. And dma_get_sgtable_attrs is fundamentally flawed and we > need to kill this interface as it just can't worked with virtually > tagged cases. It is a prime example for an interface that looks > nice and simple but is plain wrong. Got it, thanks. I haven't been following the problems with virtually tagged cases, would you mind sharing some background, so that we can consider it when adding non-consistent allocations to VB2? Best regards, Tomasz
Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer
On Wed, Dec 19, 2018 at 05:18:35PM +0900, Tomasz Figa wrote: > The existing code that deals with dma_alloc_attrs() without > DMA_ATTR_NON_CONSISTENT would just call dma_get_sgtable_attrs() like > here: I know. And dma_get_sgtable_attrs is fundamentally flawed and we need to kill this interface as it just can't worked with virtually tagged cases. It is a prime example for an interface that looks nice and simple but is plain wrong.
Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer
On Wed, Dec 19, 2018 at 4:51 PM Christoph Hellwig wrote: > > On Tue, Dec 18, 2018 at 06:48:03PM +0900, Tomasz Figa wrote: > > > So as I said you can call dma_alloc_attrs with DMA_ATTR_NON_CONSISTENT > > > in a loop with a suitably small chunk size, then stuff the results into > > > a scatterlist and map that again for the device share with if you don't > > > want a single contigous region. You just have to either deal with > > > non-contigous access from the kernel or use vmap and the right vmap > > > cache flushing helpers. > > > > The point is that you didn't have to do this small chunk loop without > > DMA_ATTR_NON_CONSISTENT, so it's at least inconsistent now and not > > sure why it could be better than just a loop of alloc_page(). > > You have to do it if you want to map the addresses for a second device. > The existing code that deals with dma_alloc_attrs() without DMA_ATTR_NON_CONSISTENT would just call dma_get_sgtable_attrs() like here: https://elixir.bootlin.com/linux/v4.20-rc7/source/drivers/media/common/videobuf2/videobuf2-dma-contig.c#L366 and then dma_map_sg() for the other device like here; https://elixir.bootlin.com/linux/v4.20-rc7/source/drivers/media/common/videobuf2/videobuf2-dma-contig.c#L283 > > > I would advice against new non-consistent users until this series > > > goes through, mostly because dma_cache_sync is such an amazing bad > > > API. Otherwise things will just work at the allocation side, you'll > > > just need to be careful to transfer ownership between the cpu and > > > the device(s) carefully using the dma_sync_* APIs. > > > > Just to clarify, the actual code isn't very likely to surface any time > > soon. so I assume it would be after this series lands. > > > > We will however need an API that can transparently handle both cases > > of contiguous (without IOMMU) and page-by-page allocations (with > > IOMMU) behind the scenes, like the current dma_alloc_attrs() without > > DMA_ATTR_NON_CONSISTENT. > > Is the use case to then share the memory between multiples devices > or just for a single device? The latter case is generally easy, the > former is rather more painful. The former, but the convention has been to assume that the userspace will choose the right (the most constrained typically) device to allocate from or otherwise handle the import failure (e.g. by falling back to copying into a buffer allocated from the importer). Best regards, Tomasz
Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer
On Tue, Dec 18, 2018 at 06:48:03PM +0900, Tomasz Figa wrote: > > So as I said you can call dma_alloc_attrs with DMA_ATTR_NON_CONSISTENT > > in a loop with a suitably small chunk size, then stuff the results into > > a scatterlist and map that again for the device share with if you don't > > want a single contigous region. You just have to either deal with > > non-contigous access from the kernel or use vmap and the right vmap > > cache flushing helpers. > > The point is that you didn't have to do this small chunk loop without > DMA_ATTR_NON_CONSISTENT, so it's at least inconsistent now and not > sure why it could be better than just a loop of alloc_page(). You have to do it if you want to map the addresses for a second device. > > I would advice against new non-consistent users until this series > > goes through, mostly because dma_cache_sync is such an amazing bad > > API. Otherwise things will just work at the allocation side, you'll > > just need to be careful to transfer ownership between the cpu and > > the device(s) carefully using the dma_sync_* APIs. > > Just to clarify, the actual code isn't very likely to surface any time > soon. so I assume it would be after this series lands. > > We will however need an API that can transparently handle both cases > of contiguous (without IOMMU) and page-by-page allocations (with > IOMMU) behind the scenes, like the current dma_alloc_attrs() without > DMA_ATTR_NON_CONSISTENT. Is the use case to then share the memory between multiples devices or just for a single device? The latter case is generally easy, the former is rather more painful.
Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer
On Tue, Dec 18, 2018 at 4:38 PM Christoph Hellwig wrote: > > On Tue, Dec 18, 2018 at 04:22:43PM +0900, Tomasz Figa wrote: > > It kind of limits the usability of this API, since it enforces > > contiguous allocations even for big sizes even for devices behind > > IOMMU (contrary to the case when DMA_ATTR_NON_CONSISTENT is not set), > > but given that it's just a temporary solution for devices like these > > USB cameras, I guess that's fine. > > The problem is that you can't have flexibility and simplicity at the > same time. Once you use kernel virtual address remapping you need to > be prepared to have multiple segments. > > So as I said you can call dma_alloc_attrs with DMA_ATTR_NON_CONSISTENT > in a loop with a suitably small chunk size, then stuff the results into > a scatterlist and map that again for the device share with if you don't > want a single contigous region. You just have to either deal with > non-contigous access from the kernel or use vmap and the right vmap > cache flushing helpers. The point is that you didn't have to do this small chunk loop without DMA_ATTR_NON_CONSISTENT, so it's at least inconsistent now and not sure why it could be better than just a loop of alloc_page(). > > > Note that in V4L2 we use the DMA API extensively, so that we don't > > need to embed any device-specific or integration-specific knowledge in > > the framework. Right now we're using dma_alloc_attrs() with > > driver-provided attrs [1], but current driver never request > > non-consistent memory. We're however thinking about making it possible > > to allocate non-consistent memory. What would you suggest for this? > > > > [1] > > https://elixir.bootlin.com/linux/v4.20-rc7/source/drivers/media/common/videobuf2/videobuf2-dma-contig.c#L139 > > I would advice against new non-consistent users until this series > goes through, mostly because dma_cache_sync is such an amazing bad > API. Otherwise things will just work at the allocation side, you'll > just need to be careful to transfer ownership between the cpu and > the device(s) carefully using the dma_sync_* APIs. Just to clarify, the actual code isn't very likely to surface any time soon. so I assume it would be after this series lands. We will however need an API that can transparently handle both cases of contiguous (without IOMMU) and page-by-page allocations (with IOMMU) behind the scenes, like the current dma_alloc_attrs() without DMA_ATTR_NON_CONSISTENT. Best regards, Tomasz
Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer
On Tue, Dec 18, 2018 at 04:22:43PM +0900, Tomasz Figa wrote: > It kind of limits the usability of this API, since it enforces > contiguous allocations even for big sizes even for devices behind > IOMMU (contrary to the case when DMA_ATTR_NON_CONSISTENT is not set), > but given that it's just a temporary solution for devices like these > USB cameras, I guess that's fine. The problem is that you can't have flexibility and simplicity at the same time. Once you use kernel virtual address remapping you need to be prepared to have multiple segments. So as I said you can call dma_alloc_attrs with DMA_ATTR_NON_CONSISTENT in a loop with a suitably small chunk size, then stuff the results into a scatterlist and map that again for the device share with if you don't want a single contigous region. You just have to either deal with non-contigous access from the kernel or use vmap and the right vmap cache flushing helpers. > Note that in V4L2 we use the DMA API extensively, so that we don't > need to embed any device-specific or integration-specific knowledge in > the framework. Right now we're using dma_alloc_attrs() with > driver-provided attrs [1], but current driver never request > non-consistent memory. We're however thinking about making it possible > to allocate non-consistent memory. What would you suggest for this? > > [1] > https://elixir.bootlin.com/linux/v4.20-rc7/source/drivers/media/common/videobuf2/videobuf2-dma-contig.c#L139 I would advice against new non-consistent users until this series goes through, mostly because dma_cache_sync is such an amazing bad API. Otherwise things will just work at the allocation side, you'll just need to be careful to transfer ownership between the cpu and the device(s) carefully using the dma_sync_* APIs.
Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer
On Fri, Dec 14, 2018 at 9:36 PM Christoph Hellwig wrote: > > On Fri, Dec 14, 2018 at 12:12:38PM +0900, Tomasz Figa wrote: > > > If the buffer always is physically contiguous, as it is in the currently > > > posted series, we can always map it with a single dma_map_single call > > > (if the hardware can handle that in a single segment is a different > > > question, but out of scope here). > > > > Are you sure the buffer is always physically contiguous? At least the > > ARM IOMMU dma_ops [1] and the DMA-IOMMU dma_ops [2] will simply > > allocate pages without any continuity guarantees and remap the pages > > into a contiguous kernel VA (unless DMA_ATTR_NO_KERNEL_MAPPING is > > given, which makes them return an opaque cookie instead of the kernel > > VA). > > > > [1] > > http://git.infradead.org/users/hch/misc.git/blob/2dbb028e4a3017e1b71a6ae3828a3548545eba24:/arch/arm/mm/dma-mapping.c#l1291 > > [2] > > http://git.infradead.org/users/hch/misc.git/blob/2dbb028e4a3017e1b71a6ae3828a3548545eba24:/drivers/iommu/dma-iommu.c#l450 > > We never end up in this allocator for the new DMA_ATTR_NON_CONSISTENT > case, and that is intentional. It kind of limits the usability of this API, since it enforces contiguous allocations even for big sizes even for devices behind IOMMU (contrary to the case when DMA_ATTR_NON_CONSISTENT is not set), but given that it's just a temporary solution for devices like these USB cameras, I guess that's fine. Note that in V4L2 we use the DMA API extensively, so that we don't need to embed any device-specific or integration-specific knowledge in the framework. Right now we're using dma_alloc_attrs() with driver-provided attrs [1], but current driver never request non-consistent memory. We're however thinking about making it possible to allocate non-consistent memory. What would you suggest for this? [1] https://elixir.bootlin.com/linux/v4.20-rc7/source/drivers/media/common/videobuf2/videobuf2-dma-contig.c#L139 Best regards, Tomasz
Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer
On Fri, Dec 14, 2018 at 12:12:38PM +0900, Tomasz Figa wrote: > > If the buffer always is physically contiguous, as it is in the currently > > posted series, we can always map it with a single dma_map_single call > > (if the hardware can handle that in a single segment is a different > > question, but out of scope here). > > Are you sure the buffer is always physically contiguous? At least the > ARM IOMMU dma_ops [1] and the DMA-IOMMU dma_ops [2] will simply > allocate pages without any continuity guarantees and remap the pages > into a contiguous kernel VA (unless DMA_ATTR_NO_KERNEL_MAPPING is > given, which makes them return an opaque cookie instead of the kernel > VA). > > [1] > http://git.infradead.org/users/hch/misc.git/blob/2dbb028e4a3017e1b71a6ae3828a3548545eba24:/arch/arm/mm/dma-mapping.c#l1291 > [2] > http://git.infradead.org/users/hch/misc.git/blob/2dbb028e4a3017e1b71a6ae3828a3548545eba24:/drivers/iommu/dma-iommu.c#l450 We never end up in this allocator for the new DMA_ATTR_NON_CONSISTENT case, and that is intentional.
Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer
On Thu, Dec 13, 2018 at 11:03 PM Christoph Hellwig wrote: > > On Thu, Dec 13, 2018 at 12:13:29PM +0900, Tomasz Figa wrote: > > Putting aside the problem of memory without struct page, one thing to > > note here that what is a contiguous DMA range for device X, may not be > > mappable contiguously for device Y and it would still need something > > like a scatter list to fully describe the buffer. > > I think we need to define contiguous here. > > If the buffer always is physically contiguous, as it is in the currently > posted series, we can always map it with a single dma_map_single call > (if the hardware can handle that in a single segment is a different > question, but out of scope here). Are you sure the buffer is always physically contiguous? At least the ARM IOMMU dma_ops [1] and the DMA-IOMMU dma_ops [2] will simply allocate pages without any continuity guarantees and remap the pages into a contiguous kernel VA (unless DMA_ATTR_NO_KERNEL_MAPPING is given, which makes them return an opaque cookie instead of the kernel VA). [1] http://git.infradead.org/users/hch/misc.git/blob/2dbb028e4a3017e1b71a6ae3828a3548545eba24:/arch/arm/mm/dma-mapping.c#l1291 [2] http://git.infradead.org/users/hch/misc.git/blob/2dbb028e4a3017e1b71a6ae3828a3548545eba24:/drivers/iommu/dma-iommu.c#l450 > > If on the other hand we have multiple discontiguous physical address > range that are mapped using the iommu and vmap this interface doesn't > work anyway. > > But in that case you should just do multiple allocations and then use > dma_map_sg coalescing on the hardware side, and vmap [1] if really > needed. I guess for this we want to gurantee that dma_alloc_attrs > with the DMA_ATTR_NON_CONSISTENT allows virt_to_page to be used on > the return value, which the currently posted implementation does, > although I'm a it reluctant about the API guarantee. > > > > Do we already have a structure that would work for this purposes? I'd > > assume that we need something like the existing scatterlist but with > > page links replaced with something that doesn't require the memory to > > have struct page, possibly just PFN? > > The problem is that just the PFN / physical address isn't enough for > most use cases as you also need a kernel virtual address. But moving > struct scatterlist to store a pfn instead of a struct page would be > pretty nice for various reasons anyway. > > > > > > > > > It would also be great to use that opportunity to get rid of all the > > > code duplication of almost the same dmabug provides backed by the > > > DMA API. > > > > Could you sched some more light on this? I'm curious what is the code > > duplication you're referring to. > > It seems like a lot of the dmabuf ops are just slight various of > dma_alloc + dma_get_sttable + dma_map_sg / dma_unmap_sg. There must be > a void to not duplicate that over and over. Device/kernel/userspace maps/unmaps shouldn't really be exporter-specific indeed, as long as one provides a uniform way of describing a buffer and then have dma_map_*() work on that. Possibly a part that manages the CPU cache maintenance either. There is still some space for some special device caches (or other synchronization), though. > > [1] and use invalidate_kernel_vmap_range and flush_kernel_vmap_range > to manually take care of cache flushing.
Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer
On Thu, Dec 13, 2018 at 12:13:29PM +0900, Tomasz Figa wrote: > Putting aside the problem of memory without struct page, one thing to > note here that what is a contiguous DMA range for device X, may not be > mappable contiguously for device Y and it would still need something > like a scatter list to fully describe the buffer. I think we need to define contiguous here. If the buffer always is physically contiguous, as it is in the currently posted series, we can always map it with a single dma_map_single call (if the hardware can handle that in a single segment is a different question, but out of scope here). If on the other hand we have multiple discontiguous physical address range that are mapped using the iommu and vmap this interface doesn't work anyway. But in that case you should just do multiple allocations and then use dma_map_sg coalescing on the hardware side, and vmap [1] if really needed. I guess for this we want to gurantee that dma_alloc_attrs with the DMA_ATTR_NON_CONSISTENT allows virt_to_page to be used on the return value, which the currently posted implementation does, although I'm a it reluctant about the API guarantee. > Do we already have a structure that would work for this purposes? I'd > assume that we need something like the existing scatterlist but with > page links replaced with something that doesn't require the memory to > have struct page, possibly just PFN? The problem is that just the PFN / physical address isn't enough for most use cases as you also need a kernel virtual address. But moving struct scatterlist to store a pfn instead of a struct page would be pretty nice for various reasons anyway. > > > > > It would also be great to use that opportunity to get rid of all the > > code duplication of almost the same dmabug provides backed by the > > DMA API. > > Could you sched some more light on this? I'm curious what is the code > duplication you're referring to. It seems like a lot of the dmabuf ops are just slight various of dma_alloc + dma_get_sttable + dma_map_sg / dma_unmap_sg. There must be a void to not duplicate that over and over. [1] and use invalidate_kernel_vmap_range and flush_kernel_vmap_range to manually take care of cache flushing.
Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer
On Wed, Dec 12, 2018 at 10:54 PM Christoph Hellwig wrote: > > On Wed, Dec 12, 2018 at 06:34:25PM +0900, Tomasz Figa wrote: > > The typical DMA-buf import/export flow is as follows: > > 1) Driver X allocates buffer A using this API for device x and gets a > > DMA address inside x's DMA (or IOVA) address space. > > 2) Driver X creates a dma_buf D(A), backed by buffer A and gives the > > user space process a file descriptor FD(A) referring to it. > > 3) Driver Y gets FD(A) from the user space and needs to map it into > > the DMA/IOVA address space of device y. It doe it by calling > > dma_buf_map_attachment() which returns an sg_table describing the > > mapping. > > And just as I said last time I think we need to fix the dmabuf code > to not rely on struct scatterlist. struct scatterlist is an interface > that is fundamentally page based, while the dma coherent allocator > only gives your a kernel virtual and dma address (and the option to > map the buffer into userspace). So we need to fix to get the interface > right as we already have DMAable memory withour a struct page and we > are bound to get more of those. Nevermind all the caching implications > even if we have a struct page. Putting aside the problem of memory without struct page, one thing to note here that what is a contiguous DMA range for device X, may not be mappable contiguously for device Y and it would still need something like a scatter list to fully describe the buffer. Do we already have a structure that would work for this purposes? I'd assume that we need something like the existing scatterlist but with page links replaced with something that doesn't require the memory to have struct page, possibly just PFN? > > It would also be great to use that opportunity to get rid of all the > code duplication of almost the same dmabug provides backed by the > DMA API. Could you sched some more light on this? I'm curious what is the code duplication you're referring to. Best regards, Tomasz
Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer
On Wed, Dec 12, 2018 at 06:34:25PM +0900, Tomasz Figa wrote: > The typical DMA-buf import/export flow is as follows: > 1) Driver X allocates buffer A using this API for device x and gets a > DMA address inside x's DMA (or IOVA) address space. > 2) Driver X creates a dma_buf D(A), backed by buffer A and gives the > user space process a file descriptor FD(A) referring to it. > 3) Driver Y gets FD(A) from the user space and needs to map it into > the DMA/IOVA address space of device y. It doe it by calling > dma_buf_map_attachment() which returns an sg_table describing the > mapping. And just as I said last time I think we need to fix the dmabuf code to not rely on struct scatterlist. struct scatterlist is an interface that is fundamentally page based, while the dma coherent allocator only gives your a kernel virtual and dma address (and the option to map the buffer into userspace). So we need to fix to get the interface right as we already have DMAable memory withour a struct page and we are bound to get more of those. Nevermind all the caching implications even if we have a struct page. It would also be great to use that opportunity to get rid of all the code duplication of almost the same dmabug provides backed by the DMA API.
Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer
On Wed, Dec 12, 2018 at 6:09 PM Christoph Hellwig wrote: > > On Wed, Dec 12, 2018 at 05:57:02PM +0900, Tomasz Figa wrote: > > How about dma_sync_sg_*()? I'd expect some drivers to export/import > > such memory via sg, since that's the typical way of describing memory > > in DMA-buf. > > The way it is implemented dma_sync_sg_* would just work, however there > really should be no need to have sglists for buffers created by this > API. The typical DMA-buf import/export flow is as follows: 1) Driver X allocates buffer A using this API for device x and gets a DMA address inside x's DMA (or IOVA) address space. 2) Driver X creates a dma_buf D(A), backed by buffer A and gives the user space process a file descriptor FD(A) referring to it. 3) Driver Y gets FD(A) from the user space and needs to map it into the DMA/IOVA address space of device y. It doe it by calling dma_buf_map_attachment() which returns an sg_table describing the mapping. And then I realized that actually there is no need for the importer (driver Y) to call dma_sync_*() on its own, since the exporter (driver X) is expected to map and sync in its .map_dma_buf() dma_buf_ops callback. But there is still a need to have an sg_table created for the buffer, because it's what dma_buf_map_attachment() returns. > > > Sounds good to me. Thanks for working on this. I'd be happy to be on > > CC and help with review when you post the patches later. > > The patches were already posted here: > > https://lists.linuxfoundation.org/pipermail/iommu/2018-December/031982.html Okay, thanks. I can see it in my inbox. Best regards, Tomasz
Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer
On Wed, Dec 12, 2018 at 05:57:02PM +0900, Tomasz Figa wrote: > How about dma_sync_sg_*()? I'd expect some drivers to export/import > such memory via sg, since that's the typical way of describing memory > in DMA-buf. The way it is implemented dma_sync_sg_* would just work, however there really should be no need to have sglists for buffers created by this API. > Sounds good to me. Thanks for working on this. I'd be happy to be on > CC and help with review when you post the patches later. The patches were already posted here: https://lists.linuxfoundation.org/pipermail/iommu/2018-December/031982.html
Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer
Hi Christoph, On Sat, Dec 8, 2018 at 12:25 AM Christoph Hellwig wrote: > > Folks, can you take a look at this tree and see if this is useful > for USB: > > http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-noncoherent-allocator > > The idea is that you use dma_alloc_attrs with the DMA_ATTR_NON_CONSISTENT > now that I've made sure it is avaiable everywhere [1], and we can use > dma_sync_single_* on it. How about dma_sync_sg_*()? I'd expect some drivers to export/import such memory via sg, since that's the typical way of describing memory in DMA-buf. > > The only special case USB will need are the HCD_LOCAL_MEM devices, for > which we must use dma_alloc_coherent (or dma_alloc_attrs without > DMA_ATTR_NON_CONSISTENT) and must skip the dma_sync_single_* calls, > so we'll probably need USB subsystem wrappers for those calls. > > [1] except powerpc in this tree - I have another series to make powerpc > use the generic dma noncoherent code, which would cover it. Sounds good to me. Thanks for working on this. I'd be happy to be on CC and help with review when you post the patches later. Best regards, Tomasz
Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer
Folks, can you take a look at this tree and see if this is useful for USB: http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-noncoherent-allocator The idea is that you use dma_alloc_attrs with the DMA_ATTR_NON_CONSISTENT now that I've made sure it is avaiable everywhere [1], and we can use dma_sync_single_* on it. The only special case USB will need are the HCD_LOCAL_MEM devices, for which we must use dma_alloc_coherent (or dma_alloc_attrs without DMA_ATTR_NON_CONSISTENT) and must skip the dma_sync_single_* calls, so we'll probably need USB subsystem wrappers for those calls. [1] except powerpc in this tree - I have another series to make powerpc use the generic dma noncoherent code, which would cover it.
Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer
On Tue, 2018-09-11 at 21:58 +0300, Matwey V. Kornilov wrote: > вт, 28 авг. 2018 г. в 10:17, Matwey V. Kornilov : > > > > вт, 21 авг. 2018 г. в 20:06, Matwey V. Kornilov : > > > > > > DMA cocherency slows the transfer down on systems without hardware > > > coherent DMA. > > > Instead we use noncocherent DMA memory and explicit sync at data receive > > > handler. > > > > > > Based on previous commit the following performance benchmarks have been > > > carried out. Average memcpy() data transfer rate (rate) and handler > > > completion time (time) have been measured when running video stream at > > > 640x480 resolution at 10fps. > > > > > > x86_64 based system (Intel Core i5-3470). This platform has hardware > > > coherent DMA support and proposed change doesn't make big difference here. > > > > > > * kmalloc:rate = (2.0 +- 0.4) GBps > > >time = (5.0 +- 3.0) usec > > > * usb_alloc_coherent: rate = (3.4 +- 1.2) GBps > > >time = (3.5 +- 3.0) usec > > > > > > We see that the measurements agree within error ranges in this case. > > > So theoretically predicted performance downgrade cannot be reliably > > > measured here. > > > > > > armv7l based system (TI AM335x BeagleBone Black @ 300MHz). This platform > > > has no hardware coherent DMA support. DMA coherence is implemented via > > > disabled page caching that slows down memcpy() due to memory controller > > > behaviour. > > > > > > * kmalloc:rate = (114 +- 5) MBps > > >time = (84 +- 4) usec > > > * usb_alloc_coherent: rate = (28.1 +- 0.1) MBps > > >time = (341 +- 2) usec > > > > > > Note, that quantative difference leads (this commit leads to 4 times > > > acceleration) to qualitative behavior change in this case. As it was > > > stated before, the video stream cannot be successfully received at AM335x > > > platforms with MUSB based USB host controller due to performance issues > > > [1]. > > > > > > [1] https://www.spinics.net/lists/linux-usb/msg165735.html > > > > > > Signed-off-by: Matwey V. Kornilov > > > > Ping > > We are already using slab buffers in em28xx, so I think this change makes sense, until we have proper a non-coherent allocation API. Acked-by: Ezequiel Garcia Thanks a lot Matwey for your work.
Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer
вт, 21 авг. 2018 г. в 20:06, Matwey V. Kornilov : > > DMA cocherency slows the transfer down on systems without hardware > coherent DMA. > Instead we use noncocherent DMA memory and explicit sync at data receive > handler. > > Based on previous commit the following performance benchmarks have been > carried out. Average memcpy() data transfer rate (rate) and handler > completion time (time) have been measured when running video stream at > 640x480 resolution at 10fps. > > x86_64 based system (Intel Core i5-3470). This platform has hardware > coherent DMA support and proposed change doesn't make big difference here. > > * kmalloc:rate = (2.0 +- 0.4) GBps >time = (5.0 +- 3.0) usec > * usb_alloc_coherent: rate = (3.4 +- 1.2) GBps >time = (3.5 +- 3.0) usec > > We see that the measurements agree within error ranges in this case. > So theoretically predicted performance downgrade cannot be reliably > measured here. > > armv7l based system (TI AM335x BeagleBone Black @ 300MHz). This platform > has no hardware coherent DMA support. DMA coherence is implemented via > disabled page caching that slows down memcpy() due to memory controller > behaviour. > > * kmalloc:rate = (114 +- 5) MBps >time = (84 +- 4) usec > * usb_alloc_coherent: rate = (28.1 +- 0.1) MBps >time = (341 +- 2) usec > > Note, that quantative difference leads (this commit leads to 4 times > acceleration) to qualitative behavior change in this case. As it was > stated before, the video stream cannot be successfully received at AM335x > platforms with MUSB based USB host controller due to performance issues > [1]. > > [1] https://www.spinics.net/lists/linux-usb/msg165735.html > > Signed-off-by: Matwey V. Kornilov Ping > --- > drivers/media/usb/pwc/pwc-if.c | 57 > -- > 1 file changed, 44 insertions(+), 13 deletions(-) > > diff --git a/drivers/media/usb/pwc/pwc-if.c b/drivers/media/usb/pwc/pwc-if.c > index 72d2897a4b9f..1360722ab423 100644 > --- a/drivers/media/usb/pwc/pwc-if.c > +++ b/drivers/media/usb/pwc/pwc-if.c > @@ -159,6 +159,32 @@ static const struct video_device pwc_template = { > /***/ > /* Private functions */ > > +static void *pwc_alloc_urb_buffer(struct device *dev, > + size_t size, dma_addr_t *dma_handle) > +{ > + void *buffer = kmalloc(size, GFP_KERNEL); > + > + if (!buffer) > + return NULL; > + > + *dma_handle = dma_map_single(dev, buffer, size, DMA_FROM_DEVICE); > + if (dma_mapping_error(dev, *dma_handle)) { > + kfree(buffer); > + return NULL; > + } > + > + return buffer; > +} > + > +static void pwc_free_urb_buffer(struct device *dev, > + size_t size, > + void *buffer, > + dma_addr_t dma_handle) > +{ > + dma_unmap_single(dev, dma_handle, size, DMA_FROM_DEVICE); > + kfree(buffer); > +} > + > static struct pwc_frame_buf *pwc_get_next_fill_buf(struct pwc_device *pdev) > { > unsigned long flags = 0; > @@ -306,6 +332,11 @@ static void pwc_isoc_handler(struct urb *urb) > /* Reset ISOC error counter. We did get here, after all. */ > pdev->visoc_errors = 0; > > + dma_sync_single_for_cpu(&urb->dev->dev, > + urb->transfer_dma, > + urb->transfer_buffer_length, > + DMA_FROM_DEVICE); > + > /* vsync: 0 = don't copy data > 1 = sync-hunt > 2 = synched > @@ -428,16 +459,15 @@ static int pwc_isoc_init(struct pwc_device *pdev) > urb->dev = udev; > urb->pipe = usb_rcvisocpipe(udev, pdev->vendpoint); > urb->transfer_flags = URB_ISO_ASAP | URB_NO_TRANSFER_DMA_MAP; > - urb->transfer_buffer = usb_alloc_coherent(udev, > - ISO_BUFFER_SIZE, > - GFP_KERNEL, > - &urb->transfer_dma); > + urb->transfer_buffer_length = ISO_BUFFER_SIZE; > + urb->transfer_buffer = pwc_alloc_urb_buffer(&udev->dev, > + > urb->transfer_buffer_length, > + > &urb->transfer_dma); > if (urb->transfer_buffer == NULL) { > PWC_ERROR("Failed to allocate urb buffer %d\n", i); > pwc_isoc_cleanup(pdev); > return -ENOMEM; > } > - urb->transfer_buffer_length = ISO_BUFFER_SIZE; > urb->compl