Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer

2018-12-21 Thread Christoph Hellwig
On Thu, Dec 20, 2018 at 12:23:46PM +0900, Tomasz Figa wrote:
> I haven't been following the problems with virtually tagged cases,
> would you mind sharing some background, so that we can consider it
> when adding non-consistent allocations to VB2?

The problem exists at least partially with the current consistent
allocator, and we need to fix it.  My non-coherent series does not have
it, but we would add it if we allowed virtual remapping.

The problem with get_sgtable is that it creates aliasing of kernel
virtual addresses used to access memory and thus the cache.  We have
the mapping return from dma_alloc_*, which in case of a remap contains
a vmap/ioremap style address that is different from the kernel direct
mapping address you get from using page_address/kmap on the pages
backing that mapping.  (assuming you even have pages - in a few corner
cases we don't and the whole interface concept breaks down).

This creates various problems as the the scatterlist returned from
get_stable now gives a second way to access this memory through direct
mapping addresses in the pages contained in it, but as soon as we do
that we:

 a) don't use the nocache mapping used by the coherent allocator if that
is on a per-mapping basis (which it is for many architectures), so
you do get data in the cache even when that might not be assumed
 b) if the data returned from dma_alloc_coherent was not actually a
remap but a special pool of non-cached address the cache flushing
instructions might be invalid and caused problems
 c) any cache flushing now operates on just those direct mappings, which
in case of the non-coherent allocator and access through the
remapped address does the wrong thing for virtually tagged caches


Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer

2018-12-19 Thread Tomasz Figa
On Wed, Dec 19, 2018 at 11:51 PM Christoph Hellwig  wrote:
>
> On Wed, Dec 19, 2018 at 05:18:35PM +0900, Tomasz Figa wrote:
> > The existing code that deals with dma_alloc_attrs() without
> > DMA_ATTR_NON_CONSISTENT would just call dma_get_sgtable_attrs() like
> > here:
>
> I know.  And dma_get_sgtable_attrs is fundamentally flawed and we
> need to kill this interface as it just can't worked with virtually
> tagged cases.  It is a prime example for an interface that looks
> nice and simple but is plain wrong.

Got it, thanks.

I haven't been following the problems with virtually tagged cases,
would you mind sharing some background, so that we can consider it
when adding non-consistent allocations to VB2?

Best regards,
Tomasz


Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer

2018-12-19 Thread Christoph Hellwig
On Wed, Dec 19, 2018 at 05:18:35PM +0900, Tomasz Figa wrote:
> The existing code that deals with dma_alloc_attrs() without
> DMA_ATTR_NON_CONSISTENT would just call dma_get_sgtable_attrs() like
> here:

I know.  And dma_get_sgtable_attrs is fundamentally flawed and we
need to kill this interface as it just can't worked with virtually
tagged cases.  It is a prime example for an interface that looks
nice and simple but is plain wrong.


Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer

2018-12-19 Thread Tomasz Figa
On Wed, Dec 19, 2018 at 4:51 PM Christoph Hellwig  wrote:
>
> On Tue, Dec 18, 2018 at 06:48:03PM +0900, Tomasz Figa wrote:
> > > So as I said you can call dma_alloc_attrs with DMA_ATTR_NON_CONSISTENT
> > > in a loop with a suitably small chunk size, then stuff the results into
> > > a scatterlist and map that again for the device share with if you don't
> > > want a single contigous region.  You just have to either deal with
> > > non-contigous access from the kernel or use vmap and the right vmap
> > > cache flushing helpers.
> >
> > The point is that you didn't have to do this small chunk loop without
> > DMA_ATTR_NON_CONSISTENT, so it's at least inconsistent now and not
> > sure why it could be better than just a loop of alloc_page().
>
> You have to do it if you want to map the addresses for a second device.
>

The existing code that deals with dma_alloc_attrs() without
DMA_ATTR_NON_CONSISTENT would just call dma_get_sgtable_attrs() like
here:

https://elixir.bootlin.com/linux/v4.20-rc7/source/drivers/media/common/videobuf2/videobuf2-dma-contig.c#L366

and then dma_map_sg() for the other device like here;

https://elixir.bootlin.com/linux/v4.20-rc7/source/drivers/media/common/videobuf2/videobuf2-dma-contig.c#L283

> > > I would advice against new non-consistent users until this series
> > > goes through, mostly because dma_cache_sync is such an amazing bad
> > > API.  Otherwise things will just work at the allocation side, you'll
> > > just need to be careful to transfer ownership between the cpu and
> > > the device(s) carefully using the dma_sync_* APIs.
> >
> > Just to clarify, the actual code isn't very likely to surface any time
> > soon. so I assume it would be after this series lands.
> >
> > We will however need an API that can transparently handle both cases
> > of contiguous (without IOMMU) and page-by-page allocations (with
> > IOMMU) behind the scenes, like the current dma_alloc_attrs() without
> > DMA_ATTR_NON_CONSISTENT.
>
> Is the use case to then share the memory between multiples devices
> or just for a single device?  The latter case is generally easy, the
> former is rather more painful.

The former, but the convention has been to assume that the userspace
will choose the right (the most constrained typically) device to
allocate from or otherwise handle the import failure (e.g. by falling
back to copying into a buffer allocated from the importer).

Best regards,
Tomasz


Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer

2018-12-18 Thread Christoph Hellwig
On Tue, Dec 18, 2018 at 06:48:03PM +0900, Tomasz Figa wrote:
> > So as I said you can call dma_alloc_attrs with DMA_ATTR_NON_CONSISTENT
> > in a loop with a suitably small chunk size, then stuff the results into
> > a scatterlist and map that again for the device share with if you don't
> > want a single contigous region.  You just have to either deal with
> > non-contigous access from the kernel or use vmap and the right vmap
> > cache flushing helpers.
> 
> The point is that you didn't have to do this small chunk loop without
> DMA_ATTR_NON_CONSISTENT, so it's at least inconsistent now and not
> sure why it could be better than just a loop of alloc_page().

You have to do it if you want to map the addresses for a second device.

> > I would advice against new non-consistent users until this series
> > goes through, mostly because dma_cache_sync is such an amazing bad
> > API.  Otherwise things will just work at the allocation side, you'll
> > just need to be careful to transfer ownership between the cpu and
> > the device(s) carefully using the dma_sync_* APIs.
> 
> Just to clarify, the actual code isn't very likely to surface any time
> soon. so I assume it would be after this series lands.
> 
> We will however need an API that can transparently handle both cases
> of contiguous (without IOMMU) and page-by-page allocations (with
> IOMMU) behind the scenes, like the current dma_alloc_attrs() without
> DMA_ATTR_NON_CONSISTENT.

Is the use case to then share the memory between multiples devices
or just for a single device?  The latter case is generally easy, the
former is rather more painful.


Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer

2018-12-18 Thread Tomasz Figa
On Tue, Dec 18, 2018 at 4:38 PM Christoph Hellwig  wrote:
>
> On Tue, Dec 18, 2018 at 04:22:43PM +0900, Tomasz Figa wrote:
> > It kind of limits the usability of this API, since it enforces
> > contiguous allocations even for big sizes even for devices behind
> > IOMMU (contrary to the case when DMA_ATTR_NON_CONSISTENT is not set),
> > but given that it's just a temporary solution for devices like these
> > USB cameras, I guess that's fine.
>
> The problem is that you can't have flexibility and simplicity at the
> same time.  Once you use kernel virtual address remapping you need to
> be prepared to have multiple segments.
>
> So as I said you can call dma_alloc_attrs with DMA_ATTR_NON_CONSISTENT
> in a loop with a suitably small chunk size, then stuff the results into
> a scatterlist and map that again for the device share with if you don't
> want a single contigous region.  You just have to either deal with
> non-contigous access from the kernel or use vmap and the right vmap
> cache flushing helpers.

The point is that you didn't have to do this small chunk loop without
DMA_ATTR_NON_CONSISTENT, so it's at least inconsistent now and not
sure why it could be better than just a loop of alloc_page().

>
> > Note that in V4L2 we use the DMA API extensively, so that we don't
> > need to embed any device-specific or integration-specific knowledge in
> > the framework. Right now we're using dma_alloc_attrs() with
> > driver-provided attrs [1], but current driver never request
> > non-consistent memory. We're however thinking about making it possible
> > to allocate non-consistent memory. What would you suggest for this?
> >
> > [1] 
> > https://elixir.bootlin.com/linux/v4.20-rc7/source/drivers/media/common/videobuf2/videobuf2-dma-contig.c#L139
>
> I would advice against new non-consistent users until this series
> goes through, mostly because dma_cache_sync is such an amazing bad
> API.  Otherwise things will just work at the allocation side, you'll
> just need to be careful to transfer ownership between the cpu and
> the device(s) carefully using the dma_sync_* APIs.

Just to clarify, the actual code isn't very likely to surface any time
soon. so I assume it would be after this series lands.

We will however need an API that can transparently handle both cases
of contiguous (without IOMMU) and page-by-page allocations (with
IOMMU) behind the scenes, like the current dma_alloc_attrs() without
DMA_ATTR_NON_CONSISTENT.

Best regards,
Tomasz


Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer

2018-12-17 Thread Christoph Hellwig
On Tue, Dec 18, 2018 at 04:22:43PM +0900, Tomasz Figa wrote:
> It kind of limits the usability of this API, since it enforces
> contiguous allocations even for big sizes even for devices behind
> IOMMU (contrary to the case when DMA_ATTR_NON_CONSISTENT is not set),
> but given that it's just a temporary solution for devices like these
> USB cameras, I guess that's fine.

The problem is that you can't have flexibility and simplicity at the
same time.  Once you use kernel virtual address remapping you need to
be prepared to have multiple segments.

So as I said you can call dma_alloc_attrs with DMA_ATTR_NON_CONSISTENT
in a loop with a suitably small chunk size, then stuff the results into
a scatterlist and map that again for the device share with if you don't
want a single contigous region.  You just have to either deal with
non-contigous access from the kernel or use vmap and the right vmap
cache flushing helpers.

> Note that in V4L2 we use the DMA API extensively, so that we don't
> need to embed any device-specific or integration-specific knowledge in
> the framework. Right now we're using dma_alloc_attrs() with
> driver-provided attrs [1], but current driver never request
> non-consistent memory. We're however thinking about making it possible
> to allocate non-consistent memory. What would you suggest for this?
> 
> [1] 
> https://elixir.bootlin.com/linux/v4.20-rc7/source/drivers/media/common/videobuf2/videobuf2-dma-contig.c#L139

I would advice against new non-consistent users until this series
goes through, mostly because dma_cache_sync is such an amazing bad
API.  Otherwise things will just work at the allocation side, you'll
just need to be careful to transfer ownership between the cpu and
the device(s) carefully using the dma_sync_* APIs.


Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer

2018-12-17 Thread Tomasz Figa
On Fri, Dec 14, 2018 at 9:36 PM Christoph Hellwig  wrote:
>
> On Fri, Dec 14, 2018 at 12:12:38PM +0900, Tomasz Figa wrote:
> > > If the buffer always is physically contiguous, as it is in the currently
> > > posted series, we can always map it with a single dma_map_single call
> > > (if the hardware can handle that in a single segment is a different
> > > question, but out of scope here).
> >
> > Are you sure the buffer is always physically contiguous? At least the
> > ARM IOMMU dma_ops [1] and the DMA-IOMMU dma_ops [2] will simply
> > allocate pages without any continuity guarantees and remap the pages
> > into a contiguous kernel VA (unless DMA_ATTR_NO_KERNEL_MAPPING is
> > given, which makes them return an opaque cookie instead of the kernel
> > VA).
> >
> > [1] 
> > http://git.infradead.org/users/hch/misc.git/blob/2dbb028e4a3017e1b71a6ae3828a3548545eba24:/arch/arm/mm/dma-mapping.c#l1291
> > [2] 
> > http://git.infradead.org/users/hch/misc.git/blob/2dbb028e4a3017e1b71a6ae3828a3548545eba24:/drivers/iommu/dma-iommu.c#l450
>
> We never end up in this allocator for the new DMA_ATTR_NON_CONSISTENT
> case, and that is intentional.

It kind of limits the usability of this API, since it enforces
contiguous allocations even for big sizes even for devices behind
IOMMU (contrary to the case when DMA_ATTR_NON_CONSISTENT is not set),
but given that it's just a temporary solution for devices like these
USB cameras, I guess that's fine.

Note that in V4L2 we use the DMA API extensively, so that we don't
need to embed any device-specific or integration-specific knowledge in
the framework. Right now we're using dma_alloc_attrs() with
driver-provided attrs [1], but current driver never request
non-consistent memory. We're however thinking about making it possible
to allocate non-consistent memory. What would you suggest for this?

[1] 
https://elixir.bootlin.com/linux/v4.20-rc7/source/drivers/media/common/videobuf2/videobuf2-dma-contig.c#L139

Best regards,
Tomasz


Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer

2018-12-14 Thread Christoph Hellwig
On Fri, Dec 14, 2018 at 12:12:38PM +0900, Tomasz Figa wrote:
> > If the buffer always is physically contiguous, as it is in the currently
> > posted series, we can always map it with a single dma_map_single call
> > (if the hardware can handle that in a single segment is a different
> > question, but out of scope here).
> 
> Are you sure the buffer is always physically contiguous? At least the
> ARM IOMMU dma_ops [1] and the DMA-IOMMU dma_ops [2] will simply
> allocate pages without any continuity guarantees and remap the pages
> into a contiguous kernel VA (unless DMA_ATTR_NO_KERNEL_MAPPING is
> given, which makes them return an opaque cookie instead of the kernel
> VA).
> 
> [1] 
> http://git.infradead.org/users/hch/misc.git/blob/2dbb028e4a3017e1b71a6ae3828a3548545eba24:/arch/arm/mm/dma-mapping.c#l1291
> [2] 
> http://git.infradead.org/users/hch/misc.git/blob/2dbb028e4a3017e1b71a6ae3828a3548545eba24:/drivers/iommu/dma-iommu.c#l450

We never end up in this allocator for the new DMA_ATTR_NON_CONSISTENT
case, and that is intentional.


Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer

2018-12-13 Thread Tomasz Figa
On Thu, Dec 13, 2018 at 11:03 PM Christoph Hellwig  wrote:
>
> On Thu, Dec 13, 2018 at 12:13:29PM +0900, Tomasz Figa wrote:
> > Putting aside the problem of memory without struct page, one thing to
> > note here that what is a contiguous DMA range for device X, may not be
> > mappable contiguously for device Y and it would still need something
> > like a scatter list to fully describe the buffer.
>
> I think we need to define contiguous here.
>
> If the buffer always is physically contiguous, as it is in the currently
> posted series, we can always map it with a single dma_map_single call
> (if the hardware can handle that in a single segment is a different
> question, but out of scope here).

Are you sure the buffer is always physically contiguous? At least the
ARM IOMMU dma_ops [1] and the DMA-IOMMU dma_ops [2] will simply
allocate pages without any continuity guarantees and remap the pages
into a contiguous kernel VA (unless DMA_ATTR_NO_KERNEL_MAPPING is
given, which makes them return an opaque cookie instead of the kernel
VA).

[1] 
http://git.infradead.org/users/hch/misc.git/blob/2dbb028e4a3017e1b71a6ae3828a3548545eba24:/arch/arm/mm/dma-mapping.c#l1291
[2] 
http://git.infradead.org/users/hch/misc.git/blob/2dbb028e4a3017e1b71a6ae3828a3548545eba24:/drivers/iommu/dma-iommu.c#l450

>
> If on the other hand we have multiple discontiguous physical address
> range that are mapped using the iommu and vmap this interface doesn't
> work anyway.
>
> But in that case you should just do multiple allocations and then use
> dma_map_sg coalescing on the hardware side, and vmap [1] if really
> needed.  I guess for this we want to gurantee that dma_alloc_attrs
> with the DMA_ATTR_NON_CONSISTENT allows virt_to_page to be used on
> the return value, which the currently posted implementation does,
> although I'm a it reluctant about the API guarantee.
>
>
> > Do we already have a structure that would work for this purposes? I'd
> > assume that we need something like the existing scatterlist but with
> > page links replaced with something that doesn't require the memory to
> > have struct page, possibly just PFN?
>
> The problem is that just the PFN / physical address isn't enough for
> most use cases as you also need a kernel virtual address.  But moving
> struct scatterlist to store a pfn instead of a struct page would be
> pretty nice for various reasons anyway.
>
> >
> > >
> > > It would also be great to use that opportunity to get rid of all the
> > > code duplication of almost the same dmabug provides backed by the
> > > DMA API.
> >
> > Could you sched some more light on this? I'm curious what is the code
> > duplication you're referring to.
>
> It seems like a lot of the dmabuf ops are just slight various of
> dma_alloc + dma_get_sttable + dma_map_sg / dma_unmap_sg.  There must be
> a void to not duplicate that over and over.

Device/kernel/userspace maps/unmaps shouldn't really be
exporter-specific indeed, as long as one provides a uniform way of
describing a buffer and then have dma_map_*() work on that. Possibly a
part that manages the CPU cache maintenance either. There is still
some space for some special device caches (or other synchronization),
though.

>
> [1] and use invalidate_kernel_vmap_range and flush_kernel_vmap_range
> to manually take care of cache flushing.


Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer

2018-12-13 Thread Christoph Hellwig
On Thu, Dec 13, 2018 at 12:13:29PM +0900, Tomasz Figa wrote:
> Putting aside the problem of memory without struct page, one thing to
> note here that what is a contiguous DMA range for device X, may not be
> mappable contiguously for device Y and it would still need something
> like a scatter list to fully describe the buffer.

I think we need to define contiguous here.

If the buffer always is physically contiguous, as it is in the currently
posted series, we can always map it with a single dma_map_single call
(if the hardware can handle that in a single segment is a different
question, but out of scope here).

If on the other hand we have multiple discontiguous physical address
range that are mapped using the iommu and vmap this interface doesn't
work anyway.

But in that case you should just do multiple allocations and then use
dma_map_sg coalescing on the hardware side, and vmap [1] if really
needed.  I guess for this we want to gurantee that dma_alloc_attrs
with the DMA_ATTR_NON_CONSISTENT allows virt_to_page to be used on
the return value, which the currently posted implementation does,
although I'm a it reluctant about the API guarantee.


> Do we already have a structure that would work for this purposes? I'd
> assume that we need something like the existing scatterlist but with
> page links replaced with something that doesn't require the memory to
> have struct page, possibly just PFN?

The problem is that just the PFN / physical address isn't enough for
most use cases as you also need a kernel virtual address.  But moving
struct scatterlist to store a pfn instead of a struct page would be
pretty nice for various reasons anyway.

> 
> >
> > It would also be great to use that opportunity to get rid of all the
> > code duplication of almost the same dmabug provides backed by the
> > DMA API.
> 
> Could you sched some more light on this? I'm curious what is the code
> duplication you're referring to.

It seems like a lot of the dmabuf ops are just slight various of
dma_alloc + dma_get_sttable + dma_map_sg / dma_unmap_sg.  There must be
a void to not duplicate that over and over.

[1] and use invalidate_kernel_vmap_range and flush_kernel_vmap_range
to manually take care of cache flushing.


Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer

2018-12-12 Thread Tomasz Figa
On Wed, Dec 12, 2018 at 10:54 PM Christoph Hellwig  wrote:
>
> On Wed, Dec 12, 2018 at 06:34:25PM +0900, Tomasz Figa wrote:
> > The typical DMA-buf import/export flow is as follows:
> > 1) Driver X allocates buffer A using this API for device x and gets a
> > DMA address inside x's DMA (or IOVA) address space.
> > 2) Driver X creates a dma_buf D(A), backed by buffer A and gives the
> > user space process a file descriptor FD(A) referring to it.
> > 3) Driver Y gets FD(A) from the user space and needs to map it into
> > the DMA/IOVA address space of device y. It doe it by calling
> > dma_buf_map_attachment() which returns an sg_table describing the
> > mapping.
>
> And just as I said last time I think we need to fix the dmabuf code
> to not rely on struct scatterlist.  struct scatterlist is an interface
> that is fundamentally page based, while the dma coherent allocator
> only gives your a kernel virtual and dma address (and the option to
> map the buffer into userspace).  So we need to fix to get the interface
> right as we already have DMAable memory withour a struct page and we
> are bound to get more of those.  Nevermind all the caching implications
> even if we have a struct page.

Putting aside the problem of memory without struct page, one thing to
note here that what is a contiguous DMA range for device X, may not be
mappable contiguously for device Y and it would still need something
like a scatter list to fully describe the buffer.

Do we already have a structure that would work for this purposes? I'd
assume that we need something like the existing scatterlist but with
page links replaced with something that doesn't require the memory to
have struct page, possibly just PFN?

>
> It would also be great to use that opportunity to get rid of all the
> code duplication of almost the same dmabug provides backed by the
> DMA API.

Could you sched some more light on this? I'm curious what is the code
duplication you're referring to.

Best regards,
Tomasz


Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer

2018-12-12 Thread Christoph Hellwig
On Wed, Dec 12, 2018 at 06:34:25PM +0900, Tomasz Figa wrote:
> The typical DMA-buf import/export flow is as follows:
> 1) Driver X allocates buffer A using this API for device x and gets a
> DMA address inside x's DMA (or IOVA) address space.
> 2) Driver X creates a dma_buf D(A), backed by buffer A and gives the
> user space process a file descriptor FD(A) referring to it.
> 3) Driver Y gets FD(A) from the user space and needs to map it into
> the DMA/IOVA address space of device y. It doe it by calling
> dma_buf_map_attachment() which returns an sg_table describing the
> mapping.

And just as I said last time I think we need to fix the dmabuf code
to not rely on struct scatterlist.  struct scatterlist is an interface
that is fundamentally page based, while the dma coherent allocator
only gives your a kernel virtual and dma address (and the option to
map the buffer into userspace).  So we need to fix to get the interface
right as we already have DMAable memory withour a struct page and we
are bound to get more of those.  Nevermind all the caching implications
even if we have a struct page.

It would also be great to use that opportunity to get rid of all the
code duplication of almost the same dmabug provides backed by the
DMA API.


Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer

2018-12-12 Thread Tomasz Figa
On Wed, Dec 12, 2018 at 6:09 PM Christoph Hellwig  wrote:
>
> On Wed, Dec 12, 2018 at 05:57:02PM +0900, Tomasz Figa wrote:
> > How about dma_sync_sg_*()? I'd expect some drivers to export/import
> > such memory via sg, since that's the typical way of describing memory
> > in DMA-buf.
>
> The way it is implemented dma_sync_sg_* would just work, however there
> really should be no need to have sglists for buffers created by this
> API.

The typical DMA-buf import/export flow is as follows:
1) Driver X allocates buffer A using this API for device x and gets a
DMA address inside x's DMA (or IOVA) address space.
2) Driver X creates a dma_buf D(A), backed by buffer A and gives the
user space process a file descriptor FD(A) referring to it.
3) Driver Y gets FD(A) from the user space and needs to map it into
the DMA/IOVA address space of device y. It doe it by calling
dma_buf_map_attachment() which returns an sg_table describing the
mapping.

And then I realized that actually there is no need for the importer
(driver Y) to call dma_sync_*() on its own, since the exporter (driver
X) is expected to map and sync in its .map_dma_buf() dma_buf_ops
callback. But there is still a need to have an sg_table created for
the buffer, because it's what dma_buf_map_attachment() returns.

>
> > Sounds good to me. Thanks for working on this. I'd be happy to be on
> > CC and help with review when you post the patches later.
>
> The patches were already posted here:
>
> https://lists.linuxfoundation.org/pipermail/iommu/2018-December/031982.html

Okay, thanks. I can see it in my inbox.

Best regards,
Tomasz


Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer

2018-12-12 Thread Christoph Hellwig
On Wed, Dec 12, 2018 at 05:57:02PM +0900, Tomasz Figa wrote:
> How about dma_sync_sg_*()? I'd expect some drivers to export/import
> such memory via sg, since that's the typical way of describing memory
> in DMA-buf.

The way it is implemented dma_sync_sg_* would just work, however there
really should be no need to have sglists for buffers created by this
API.

> Sounds good to me. Thanks for working on this. I'd be happy to be on
> CC and help with review when you post the patches later.

The patches were already posted here:

https://lists.linuxfoundation.org/pipermail/iommu/2018-December/031982.html


Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer

2018-12-12 Thread Tomasz Figa
Hi Christoph,

On Sat, Dec 8, 2018 at 12:25 AM Christoph Hellwig  wrote:
>
> Folks, can you take a look at this tree and see if this is useful
> for USB:
>
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-noncoherent-allocator
>
> The idea is that you use dma_alloc_attrs with the DMA_ATTR_NON_CONSISTENT
> now that I've made sure it is avaiable everywhere [1], and we can use
> dma_sync_single_* on it.

How about dma_sync_sg_*()? I'd expect some drivers to export/import
such memory via sg, since that's the typical way of describing memory
in DMA-buf.

>
> The only special case USB will need are the HCD_LOCAL_MEM devices, for
> which we must use dma_alloc_coherent (or dma_alloc_attrs without
> DMA_ATTR_NON_CONSISTENT) and must skip the dma_sync_single_* calls,
> so we'll probably need USB subsystem wrappers for those calls.
>
> [1] except powerpc in this tree - I have another series to make powerpc
> use the generic dma noncoherent code, which would cover it.

Sounds good to me. Thanks for working on this. I'd be happy to be on
CC and help with review when you post the patches later.

Best regards,
Tomasz


Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer

2018-12-07 Thread Christoph Hellwig
Folks, can you take a look at this tree and see if this is useful
for USB:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-noncoherent-allocator

The idea is that you use dma_alloc_attrs with the DMA_ATTR_NON_CONSISTENT
now that I've made sure it is avaiable everywhere [1], and we can use
dma_sync_single_* on it.

The only special case USB will need are the HCD_LOCAL_MEM devices, for
which we must use dma_alloc_coherent (or dma_alloc_attrs without
DMA_ATTR_NON_CONSISTENT) and must skip the dma_sync_single_* calls,
so we'll probably need USB subsystem wrappers for those calls.

[1] except powerpc in this tree - I have another series to make powerpc
use the generic dma noncoherent code, which would cover it.


Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer

2018-09-19 Thread Ezequiel Garcia
On Tue, 2018-09-11 at 21:58 +0300, Matwey V. Kornilov wrote:
> вт, 28 авг. 2018 г. в 10:17, Matwey V. Kornilov :
> > 
> > вт, 21 авг. 2018 г. в 20:06, Matwey V. Kornilov :
> > > 
> > > DMA cocherency slows the transfer down on systems without hardware
> > > coherent DMA.
> > > Instead we use noncocherent DMA memory and explicit sync at data receive
> > > handler.
> > > 
> > > Based on previous commit the following performance benchmarks have been
> > > carried out. Average memcpy() data transfer rate (rate) and handler
> > > completion time (time) have been measured when running video stream at
> > > 640x480 resolution at 10fps.
> > > 
> > > x86_64 based system (Intel Core i5-3470). This platform has hardware
> > > coherent DMA support and proposed change doesn't make big difference here.
> > > 
> > >  * kmalloc:rate = (2.0 +- 0.4) GBps
> > >time = (5.0 +- 3.0) usec
> > >  * usb_alloc_coherent: rate = (3.4 +- 1.2) GBps
> > >time = (3.5 +- 3.0) usec
> > > 
> > > We see that the measurements agree within error ranges in this case.
> > > So theoretically predicted performance downgrade cannot be reliably
> > > measured here.
> > > 
> > > armv7l based system (TI AM335x BeagleBone Black @ 300MHz). This platform
> > > has no hardware coherent DMA support. DMA coherence is implemented via
> > > disabled page caching that slows down memcpy() due to memory controller
> > > behaviour.
> > > 
> > >  * kmalloc:rate =  (114 +- 5) MBps
> > >time =   (84 +- 4) usec
> > >  * usb_alloc_coherent: rate = (28.1 +- 0.1) MBps
> > >time =  (341 +- 2) usec
> > > 
> > > Note, that quantative difference leads (this commit leads to 4 times
> > > acceleration) to qualitative behavior change in this case. As it was
> > > stated before, the video stream cannot be successfully received at AM335x
> > > platforms with MUSB based USB host controller due to performance issues
> > > [1].
> > > 
> > > [1] https://www.spinics.net/lists/linux-usb/msg165735.html
> > > 
> > > Signed-off-by: Matwey V. Kornilov 
> > 
> > Ping
> > 

We are already using slab buffers in em28xx, so I think this change
makes sense, until we have proper a non-coherent allocation API.

Acked-by: Ezequiel Garcia 

Thanks a lot Matwey for your work.


Re: [PATCH v5 2/2] media: usb: pwc: Don't use coherent DMA buffers for ISO transfer

2018-08-28 Thread Matwey V. Kornilov
вт, 21 авг. 2018 г. в 20:06, Matwey V. Kornilov :
>
> DMA cocherency slows the transfer down on systems without hardware
> coherent DMA.
> Instead we use noncocherent DMA memory and explicit sync at data receive
> handler.
>
> Based on previous commit the following performance benchmarks have been
> carried out. Average memcpy() data transfer rate (rate) and handler
> completion time (time) have been measured when running video stream at
> 640x480 resolution at 10fps.
>
> x86_64 based system (Intel Core i5-3470). This platform has hardware
> coherent DMA support and proposed change doesn't make big difference here.
>
>  * kmalloc:rate = (2.0 +- 0.4) GBps
>time = (5.0 +- 3.0) usec
>  * usb_alloc_coherent: rate = (3.4 +- 1.2) GBps
>time = (3.5 +- 3.0) usec
>
> We see that the measurements agree within error ranges in this case.
> So theoretically predicted performance downgrade cannot be reliably
> measured here.
>
> armv7l based system (TI AM335x BeagleBone Black @ 300MHz). This platform
> has no hardware coherent DMA support. DMA coherence is implemented via
> disabled page caching that slows down memcpy() due to memory controller
> behaviour.
>
>  * kmalloc:rate =  (114 +- 5) MBps
>time =   (84 +- 4) usec
>  * usb_alloc_coherent: rate = (28.1 +- 0.1) MBps
>time =  (341 +- 2) usec
>
> Note, that quantative difference leads (this commit leads to 4 times
> acceleration) to qualitative behavior change in this case. As it was
> stated before, the video stream cannot be successfully received at AM335x
> platforms with MUSB based USB host controller due to performance issues
> [1].
>
> [1] https://www.spinics.net/lists/linux-usb/msg165735.html
>
> Signed-off-by: Matwey V. Kornilov 

Ping

> ---
>  drivers/media/usb/pwc/pwc-if.c | 57 
> --
>  1 file changed, 44 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/media/usb/pwc/pwc-if.c b/drivers/media/usb/pwc/pwc-if.c
> index 72d2897a4b9f..1360722ab423 100644
> --- a/drivers/media/usb/pwc/pwc-if.c
> +++ b/drivers/media/usb/pwc/pwc-if.c
> @@ -159,6 +159,32 @@ static const struct video_device pwc_template = {
>  /***/
>  /* Private functions */
>
> +static void *pwc_alloc_urb_buffer(struct device *dev,
> + size_t size, dma_addr_t *dma_handle)
> +{
> +   void *buffer = kmalloc(size, GFP_KERNEL);
> +
> +   if (!buffer)
> +   return NULL;
> +
> +   *dma_handle = dma_map_single(dev, buffer, size, DMA_FROM_DEVICE);
> +   if (dma_mapping_error(dev, *dma_handle)) {
> +   kfree(buffer);
> +   return NULL;
> +   }
> +
> +   return buffer;
> +}
> +
> +static void pwc_free_urb_buffer(struct device *dev,
> +   size_t size,
> +   void *buffer,
> +   dma_addr_t dma_handle)
> +{
> +   dma_unmap_single(dev, dma_handle, size, DMA_FROM_DEVICE);
> +   kfree(buffer);
> +}
> +
>  static struct pwc_frame_buf *pwc_get_next_fill_buf(struct pwc_device *pdev)
>  {
> unsigned long flags = 0;
> @@ -306,6 +332,11 @@ static void pwc_isoc_handler(struct urb *urb)
> /* Reset ISOC error counter. We did get here, after all. */
> pdev->visoc_errors = 0;
>
> +   dma_sync_single_for_cpu(&urb->dev->dev,
> +   urb->transfer_dma,
> +   urb->transfer_buffer_length,
> +   DMA_FROM_DEVICE);
> +
> /* vsync: 0 = don't copy data
>   1 = sync-hunt
>   2 = synched
> @@ -428,16 +459,15 @@ static int pwc_isoc_init(struct pwc_device *pdev)
> urb->dev = udev;
> urb->pipe = usb_rcvisocpipe(udev, pdev->vendpoint);
> urb->transfer_flags = URB_ISO_ASAP | URB_NO_TRANSFER_DMA_MAP;
> -   urb->transfer_buffer = usb_alloc_coherent(udev,
> - ISO_BUFFER_SIZE,
> - GFP_KERNEL,
> - &urb->transfer_dma);
> +   urb->transfer_buffer_length = ISO_BUFFER_SIZE;
> +   urb->transfer_buffer = pwc_alloc_urb_buffer(&udev->dev,
> +   
> urb->transfer_buffer_length,
> +   
> &urb->transfer_dma);
> if (urb->transfer_buffer == NULL) {
> PWC_ERROR("Failed to allocate urb buffer %d\n", i);
> pwc_isoc_cleanup(pdev);
> return -ENOMEM;
> }
> -   urb->transfer_buffer_length = ISO_BUFFER_SIZE;
> urb->compl