Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-02-19 Thread Laura Abbott

On 1/24/19 8:44 AM, Brian Starkey wrote:

On Thu, Jan 24, 2019 at 10:04:46AM -0600, Andrew F. Davis wrote:

On 1/23/19 11:11 AM, Brian Starkey wrote:


[snip]



I'm very new to all this, so any pointers to history in this area are
appreciated.



[snip]




In case you didn't come across it already, the effort which seems to
have gained the most "air-time" recently is
https://github.com/cubanismo/allocator, which is still a userspace
module (perhaps some concepts from there could go into the kernel?),
but makes some attempts at generic constraint solving. It's also not
really moving anywhere at the moment.



Very interesting, I'm going to have to stare at this for a bit.


In which case, some reading material that might be of interest :-)

https://www.x.org/wiki/Events/XDC2016/Program/Unix_Device_Memory_Allocation.pdf
https://www.x.org/wiki/Events/XDC2017/jones_allocator.pdf
https://lists.freedesktop.org/archives/mesa-dev/2017-November/177632.html

-Brian



In some respects this is more a question of "what is the purpose
of Ion". Once upon a time, Ion was going to do constraint solving
but that never really happened and I figured Ion would get deprecated.
People keep coming out of the woodwork with new uses for Ion so
its stuck around. This is why I've primarily focused on Ion as a
framework for exposing available memory types to userspace and leave
the constraint solving to someone else, since that's what most
users seem to want out of Ion ("I know I want memory type X please
give it to me"). That's not to say that this was a perfect or
even the correct approach, just what made the most sense based
on users.

Thanks,
Laura
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-24 Thread Brian Starkey
On Thu, Jan 24, 2019 at 10:04:46AM -0600, Andrew F. Davis wrote:
> On 1/23/19 11:11 AM, Brian Starkey wrote:

[snip]

> 
> I'm very new to all this, so any pointers to history in this area are
> appreciated.
> 

[snip]

> 
> > In case you didn't come across it already, the effort which seems to
> > have gained the most "air-time" recently is
> > https://github.com/cubanismo/allocator, which is still a userspace
> > module (perhaps some concepts from there could go into the kernel?),
> > but makes some attempts at generic constraint solving. It's also not
> > really moving anywhere at the moment.
> > 
> 
> Very interesting, I'm going to have to stare at this for a bit.

In which case, some reading material that might be of interest :-)

https://www.x.org/wiki/Events/XDC2016/Program/Unix_Device_Memory_Allocation.pdf
https://www.x.org/wiki/Events/XDC2017/jones_allocator.pdf
https://lists.freedesktop.org/archives/mesa-dev/2017-November/177632.html

-Brian

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-24 Thread Andrew F. Davis
On 1/23/19 11:11 AM, Brian Starkey wrote:
> Hi Andrew,
> 
> On Wed, Jan 23, 2019 at 10:51:24AM -0600, Andrew F. Davis wrote:
>> On 1/22/19 11:33 AM, Sumit Semwal wrote:
>>> Hello everyone,
>>>
>>> Sincere apologies for chiming in a bit late here, but was off due to
>>> some health issues.
>>>
>>
>> Hope you are feeling better friend :)
>>
>> Looks like this email was a bit broken and you replied again, the
>> responses are a little different in each email, so I'd like to respond
>> to bits of both, I'll fix up the formatting.
>>
>>> Also, adding Daniel Vetter to the mix, since he has been one of the
>>> core guys who shaped up dma-buf as it is today.
>>>
>>> On Tue, 22 Jan 2019 at 02:51, Andrew F. Davis  wrote:

 On 1/21/19 5:22 AM, Brian Starkey wrote:
>>
>> [snip]
>>
>
> Actually I meant in the kernel, in exporters. I haven't seen anyone
> using the API as it was intended (defer allocation until first map,
> migrate between different attachments, etc.). Mostly, backing storage
> seems to get allocated at the point of export, and device mappings are
> often held persistently (e.g. the DRM prime code maps the buffer at
> import time, and keeps it mapped: drm_gem_prime_import_dev).
>

>>>
>>> So I suppose some clarification on the 'intended use' part of dma-buf
>>> about deferred allocation is due, so here it is: (Daniel, please feel
>>> free to chime in with your opinion here)
>>>
>>>  - dma-buf was of course designed as a framework to help intelligent
>>> exporters to defer allocation until first map, and be able to migrate
>>> backing storage if required etc. At the same time, it is not a
>>> _requirement_ from any exporter, so exporters so far have just used it
>>> as a convenient mechanism for zero-copy.
>>> - ION is one of the few dma-buf exporters in kernel, which satisfies a
>>> certain set of expectations from its users.
>>>
>>
>> The issue here is that Ion is blocking the ability to late allocate, it
>> expects its heaps to have the memory ready at allocation time. My point
>> being if the DMA-BUFs intended design was to allow this then Ion should
>> respect that and also allow the same from its heap exporters.
>>
 I haven't either, which is a shame as it allows for some really useful
 management strategies for shared memory resources. I'm working on one
 such case right now, maybe I'll get to be the first to upstream one :)

>>> That will be a really good thing! Though perhaps we ought to think if
>>> for what you're trying to do, is ION the right place, or should you
>>> have a device-specific exporter, available to users via dma-buf apis?
>>>
>>
>> I'm starting to question if Ion is the right place myself..
>>
>> At a conceptual level I don't believe userspace should be picking the
>> backing memory type. This is because the right type of backing memory
>> for a task will change from system to system. The kernel should abstract
>> away these hardware differences from userspace as much as it can to
>> allow portable code.
>>
>> For instance a device may need a contiguous buffer on one system but the
>> same device on another may have some IOMMU. So which type of memory do
>> we allocate? Same issue for cacheability and other properties.
>>
>> What we need is a central allocator with full system knowledge to do the
>> choosing for us. It seems many agree with the above and I take
>> inspiration from your cenalloc patchset. The thing I'm not sure about is
>> letting the device drivers set their constraints, because they also
>> don't have the full system integration details. For cases where devices
>> are behind an IOMMU it is easy enough for the device to know, but what
>> about when we have external MMUs out on the bus for anyone to use (I'm
>> guessing you remember TILER..).
>>
>> I would propose the central allocator keep per-system knowledge (or
>> fetch it from DT, or if this is considered policy then userspace) which
>> it can use to directly check the attached devices and pick the right memory.
>>
>> Anyway the central system allocator could handle 90% of cases I can
>> think of, and this is where Ion comes back in, the other cases would
>> still require the program to manually pick the right memory (maybe for
>> performance reasons, etc.).
>>
>> So my vision is to have Ion as the the main front-end for DMA-BUF
>> allocations, and expose the central allocator through it (maybe as a
>> default heap type that can be populated on a per-system basis), but also
>> have other individual heap types exported for the edge cases where
>> manual selection is needed like we do now.
>>
>> Hence why Ion should allow direct control of the dma_buf_ops from the
>> heaps, so we can build central allocators as Ion heaps.
>>
>> If I'm off into the weeds here and you have some other ideas I'm all ears.
>>
> 
> This is a topic I've gone around a few times. The crux of it is, as
> you know, a central allocator is Really Hard. I don't know what you've
> 

Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-23 Thread Brian Starkey
Hi Andrew,

On Wed, Jan 23, 2019 at 10:51:24AM -0600, Andrew F. Davis wrote:
> On 1/22/19 11:33 AM, Sumit Semwal wrote:
> > Hello everyone,
> > 
> > Sincere apologies for chiming in a bit late here, but was off due to
> > some health issues.
> > 
> 
> Hope you are feeling better friend :)
> 
> Looks like this email was a bit broken and you replied again, the
> responses are a little different in each email, so I'd like to respond
> to bits of both, I'll fix up the formatting.
> 
> > Also, adding Daniel Vetter to the mix, since he has been one of the
> > core guys who shaped up dma-buf as it is today.
> > 
> > On Tue, 22 Jan 2019 at 02:51, Andrew F. Davis  wrote:
> >>
> >> On 1/21/19 5:22 AM, Brian Starkey wrote:
> 
> [snip]
> 
> >>>
> >>> Actually I meant in the kernel, in exporters. I haven't seen anyone
> >>> using the API as it was intended (defer allocation until first map,
> >>> migrate between different attachments, etc.). Mostly, backing storage
> >>> seems to get allocated at the point of export, and device mappings are
> >>> often held persistently (e.g. the DRM prime code maps the buffer at
> >>> import time, and keeps it mapped: drm_gem_prime_import_dev).
> >>>
> >>
> > 
> > So I suppose some clarification on the 'intended use' part of dma-buf
> > about deferred allocation is due, so here it is: (Daniel, please feel
> > free to chime in with your opinion here)
> > 
> >  - dma-buf was of course designed as a framework to help intelligent
> > exporters to defer allocation until first map, and be able to migrate
> > backing storage if required etc. At the same time, it is not a
> > _requirement_ from any exporter, so exporters so far have just used it
> > as a convenient mechanism for zero-copy.
> > - ION is one of the few dma-buf exporters in kernel, which satisfies a
> > certain set of expectations from its users.
> > 
> 
> The issue here is that Ion is blocking the ability to late allocate, it
> expects its heaps to have the memory ready at allocation time. My point
> being if the DMA-BUFs intended design was to allow this then Ion should
> respect that and also allow the same from its heap exporters.
> 
> >> I haven't either, which is a shame as it allows for some really useful
> >> management strategies for shared memory resources. I'm working on one
> >> such case right now, maybe I'll get to be the first to upstream one :)
> >>
> > That will be a really good thing! Though perhaps we ought to think if
> > for what you're trying to do, is ION the right place, or should you
> > have a device-specific exporter, available to users via dma-buf apis?
> > 
> 
> I'm starting to question if Ion is the right place myself..
> 
> At a conceptual level I don't believe userspace should be picking the
> backing memory type. This is because the right type of backing memory
> for a task will change from system to system. The kernel should abstract
> away these hardware differences from userspace as much as it can to
> allow portable code.
> 
> For instance a device may need a contiguous buffer on one system but the
> same device on another may have some IOMMU. So which type of memory do
> we allocate? Same issue for cacheability and other properties.
> 
> What we need is a central allocator with full system knowledge to do the
> choosing for us. It seems many agree with the above and I take
> inspiration from your cenalloc patchset. The thing I'm not sure about is
> letting the device drivers set their constraints, because they also
> don't have the full system integration details. For cases where devices
> are behind an IOMMU it is easy enough for the device to know, but what
> about when we have external MMUs out on the bus for anyone to use (I'm
> guessing you remember TILER..).
> 
> I would propose the central allocator keep per-system knowledge (or
> fetch it from DT, or if this is considered policy then userspace) which
> it can use to directly check the attached devices and pick the right memory.
> 
> Anyway the central system allocator could handle 90% of cases I can
> think of, and this is where Ion comes back in, the other cases would
> still require the program to manually pick the right memory (maybe for
> performance reasons, etc.).
> 
> So my vision is to have Ion as the the main front-end for DMA-BUF
> allocations, and expose the central allocator through it (maybe as a
> default heap type that can be populated on a per-system basis), but also
> have other individual heap types exported for the edge cases where
> manual selection is needed like we do now.
> 
> Hence why Ion should allow direct control of the dma_buf_ops from the
> heaps, so we can build central allocators as Ion heaps.
> 
> If I'm off into the weeds here and you have some other ideas I'm all ears.
> 

This is a topic I've gone around a few times. The crux of it is, as
you know, a central allocator is Really Hard. I don't know what you've
seen/done so far in this area, so please forgive me if this is old hat
to you.

Android's 

Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-23 Thread Andrew F. Davis
On 1/22/19 9:23 PM, Sumit Semwal wrote:
> Hello everyone,
> 
> (Thanks to Dan for letting me know my last email got corrupted :/ -
> resending it here)
> 

Hmm, this one seems a bit messed up also (Thunderbird doesn't seem to
like it at least).

[snip]

> - from dma-buf PoV, ION is an exporter of dma-buf buffers, for its users
> that have specific requirements.
> 

This is what I'm hoping to change up a little bit, Ion shouldn't be the
exporter, its heaps should be the exporters (manage the dma_buf_ops),
Ion would only do advertising of available heaps and allow allocating
DMA-BUFs from them.

IMO that would clear up the other discussions going on right now about
how Ion should handle different dma-buf syncing tasks, it simply
wouldn't :). Plus Ion core gets slimmed down, maybe even enough for
destaging..

>> I haven't either, which is a shame as it allows for some really useful
>> management strategies for shared memory resources. I'm working on one
>> such case right now, maybe I'll get to be the first to upstream one :)
>>
> Yes, it would, and great that you're looking to be the first one to do it :)
> 
>> > I wasn't aware that CPU access before first device access was
>> > considered an abuse of the API - it seems like a valid thing to want
>> > to do.
>> >
>>
>> That's just it, I don't know if it is an abuse of API, I'm trying to get
>> some clarity on that. If we do want to allow early CPU access then that
>> seems to be in contrast to the idea of deferred allocation until first
>> device map, what is supposed to backing the buffer if no devices have
>> attached or mapped yet? Just some system memory followed by migration on
>> the first attach to the proper backing? Seems too time wasteful to be
>> have a valid use.
>>
>> Maybe it should be up to the exporter if early CPU access is allowed?
>>
>> I'm hoping someone with authority over the DMA-BUF framework can clarify
>> original intentions here.
>>
> 
> I suppose dma-buf as a framework can't know or decide what the exporter
> wants or can do - whether the exporter wants to use it for 'only
> zero-copy', or do some intelligent things behind the scene, I think
> should be best left to the exporter.
> 
> Hope this helps,

Yes, these inputs are very helpful, thanks,
Andrew

> Sumit.
> 
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-23 Thread Andrew F. Davis
On 1/22/19 11:33 AM, Sumit Semwal wrote:
> Hello everyone,
> 
> Sincere apologies for chiming in a bit late here, but was off due to
> some health issues.
> 

Hope you are feeling better friend :)

Looks like this email was a bit broken and you replied again, the
responses are a little different in each email, so I'd like to respond
to bits of both, I'll fix up the formatting.

> Also, adding Daniel Vetter to the mix, since he has been one of the
> core guys who shaped up dma-buf as it is today.
> 
> On Tue, 22 Jan 2019 at 02:51, Andrew F. Davis  wrote:
>>
>> On 1/21/19 5:22 AM, Brian Starkey wrote:

[snip]

>>>
>>> Actually I meant in the kernel, in exporters. I haven't seen anyone
>>> using the API as it was intended (defer allocation until first map,
>>> migrate between different attachments, etc.). Mostly, backing storage
>>> seems to get allocated at the point of export, and device mappings are
>>> often held persistently (e.g. the DRM prime code maps the buffer at
>>> import time, and keeps it mapped: drm_gem_prime_import_dev).
>>>
>>
> 
> So I suppose some clarification on the 'intended use' part of dma-buf
> about deferred allocation is due, so here it is: (Daniel, please feel
> free to chime in with your opinion here)
> 
>  - dma-buf was of course designed as a framework to help intelligent
> exporters to defer allocation until first map, and be able to migrate
> backing storage if required etc. At the same time, it is not a
> _requirement_ from any exporter, so exporters so far have just used it
> as a convenient mechanism for zero-copy.
> - ION is one of the few dma-buf exporters in kernel, which satisfies a
> certain set of expectations from its users.
> 

The issue here is that Ion is blocking the ability to late allocate, it
expects its heaps to have the memory ready at allocation time. My point
being if the DMA-BUFs intended design was to allow this then Ion should
respect that and also allow the same from its heap exporters.

>> I haven't either, which is a shame as it allows for some really useful
>> management strategies for shared memory resources. I'm working on one
>> such case right now, maybe I'll get to be the first to upstream one :)
>>
> That will be a really good thing! Though perhaps we ought to think if
> for what you're trying to do, is ION the right place, or should you
> have a device-specific exporter, available to users via dma-buf apis?
> 

I'm starting to question if Ion is the right place myself..

At a conceptual level I don't believe userspace should be picking the
backing memory type. This is because the right type of backing memory
for a task will change from system to system. The kernel should abstract
away these hardware differences from userspace as much as it can to
allow portable code.

For instance a device may need a contiguous buffer on one system but the
same device on another may have some IOMMU. So which type of memory do
we allocate? Same issue for cacheability and other properties.

What we need is a central allocator with full system knowledge to do the
choosing for us. It seems many agree with the above and I take
inspiration from your cenalloc patchset. The thing I'm not sure about is
letting the device drivers set their constraints, because they also
don't have the full system integration details. For cases where devices
are behind an IOMMU it is easy enough for the device to know, but what
about when we have external MMUs out on the bus for anyone to use (I'm
guessing you remember TILER..).

I would propose the central allocator keep per-system knowledge (or
fetch it from DT, or if this is considered policy then userspace) which
it can use to directly check the attached devices and pick the right memory.

Anyway the central system allocator could handle 90% of cases I can
think of, and this is where Ion comes back in, the other cases would
still require the program to manually pick the right memory (maybe for
performance reasons, etc.).

So my vision is to have Ion as the the main front-end for DMA-BUF
allocations, and expose the central allocator through it (maybe as a
default heap type that can be populated on a per-system basis), but also
have other individual heap types exported for the edge cases where
manual selection is needed like we do now.

Hence why Ion should allow direct control of the dma_buf_ops from the
heaps, so we can build central allocators as Ion heaps.

If I'm off into the weeds here and you have some other ideas I'm all ears.

Andrew

>>> I wasn't aware that CPU access before first device access was
>>> considered an abuse of the API - it seems like a valid thing to want
>>> to do.
>>>
>>
>> That's just it, I don't know if it is an abuse of API, I'm trying to get
>> some clarity on that. If we do want to allow early CPU access then that
>> seems to be in contrast to the idea of deferred allocation until first
>> device map, what is supposed to backing the buffer if no devices have
>> attached or mapped yet? Just some system 

Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-22 Thread Liam Mark
On Mon, 21 Jan 2019, Brian Starkey wrote:

> Hi,
> 
> Sorry for being a bit sporadic on this. I was out travelling last week
> with little time for email.
> 
> On Fri, Jan 18, 2019 at 11:16:31AM -0600, Andrew F. Davis wrote:
> > On 1/17/19 7:11 PM, Liam Mark wrote:
> > > On Thu, 17 Jan 2019, Andrew F. Davis wrote:
> > > 
> > >> On 1/16/19 4:54 PM, Liam Mark wrote:
> > >>> On Wed, 16 Jan 2019, Andrew F. Davis wrote:
> > >>>
> >  On 1/16/19 9:19 AM, Brian Starkey wrote:
> > > Hi :-)
> > >
> > > On Tue, Jan 15, 2019 at 12:40:16PM -0600, Andrew F. Davis wrote:
> > >> On 1/15/19 12:38 PM, Andrew F. Davis wrote:
> > >>> On 1/15/19 11:45 AM, Liam Mark wrote:
> >  On Tue, 15 Jan 2019, Andrew F. Davis wrote:
> > 
> > > On 1/14/19 11:13 AM, Liam Mark wrote:
> > >> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
> > >>
> > >>> Buffers may not be mapped from the CPU so skip cache 
> > >>> maintenance here.
> > >>> Accesses from the CPU to a cached heap should be bracketed with
> > >>> {begin,end}_cpu_access calls so maintenance should not be 
> > >>> needed anyway.
> > >>>
> > >>> Signed-off-by: Andrew F. Davis 
> > >>> ---
> > >>>  drivers/staging/android/ion/ion.c | 7 ---
> > >>>  1 file changed, 4 insertions(+), 3 deletions(-)
> > >>>
> > >>> diff --git a/drivers/staging/android/ion/ion.c 
> > >>> b/drivers/staging/android/ion/ion.c
> > >>> index 14e48f6eb734..09cb5a8e2b09 100644
> > >>> --- a/drivers/staging/android/ion/ion.c
> > >>> +++ b/drivers/staging/android/ion/ion.c
> > >>> @@ -261,8 +261,8 @@ static struct sg_table 
> > >>> *ion_map_dma_buf(struct dma_buf_attachment *attachment,
> > >>>  
> > >>> table = a->table;
> > >>>  
> > >>> -   if (!dma_map_sg(attachment->dev, table->sgl, 
> > >>> table->nents,
> > >>> -   direction))
> > >>> +   if (!dma_map_sg_attrs(attachment->dev, table->sgl, 
> > >>> table->nents,
> > >>> + direction, 
> > >>> DMA_ATTR_SKIP_CPU_SYNC))
> > >>
> > >> Unfortunately I don't think you can do this for a couple reasons.
> > >> You can't rely on {begin,end}_cpu_access calls to do cache 
> > >> maintenance.
> > >> If the calls to {begin,end}_cpu_access were made before the call 
> > >> to 
> > >> dma_buf_attach then there won't have been a device attached so 
> > >> the calls 
> > >> to {begin,end}_cpu_access won't have done any cache maintenance.
> > >>
> > >
> > > That should be okay though, if you have no attachments (or all
> > > attachments are IO-coherent) then there is no need for cache
> > > maintenance. Unless you mean a sequence where a non-io-coherent 
> > > device
> > > is attached later after data has already been written. Does that
> > > sequence need supporting? 
> > 
> >  Yes, but also I think there are cases where CPU access can happen 
> >  before 
> >  in Android, but I will focus on later for now.
> > 
> > > DMA-BUF doesn't have to allocate the backing
> > > memory until map_dma_buf() time, and that should only happen 
> > > after all
> > > the devices have attached so it can know where to put the buffer. 
> > > So we
> > > shouldn't expect any CPU access to buffers before all the devices 
> > > are
> > > attached and mapped, right?
> > >
> > 
> >  Here is an example where CPU access can happen later in Android.
> > 
> >  Camera device records video -> software post processing -> video 
> >  device 
> >  (who does compression of raw data) and writes to a file
> > 
> >  In this example assume the buffer is cached and the devices are 
> >  not 
> >  IO-coherent (quite common).
> > 
> > >>>
> > >>> This is the start of the problem, having cached mappings of memory 
> > >>> that
> > >>> is also being accessed non-coherently is going to cause issues one 
> > >>> way
> > >>> or another. On top of the speculative cache fills that have to be
> > >>> constantly fought back against with CMOs like below; some coherent
> > >>> interconnects behave badly when you mix coherent and non-coherent 
> > >>> access
> > >>> (snoop filters get messed up).
> > >>>
> > >>> The solution is to either always have the addresses marked 
> > >>> non-coherent
> > >>> (like device memory, no-map carveouts), or if you really want to use
> > >>> regular system memory allocated at runtime, then all cached 
> > >>> mappings of
> > >>> it need to be dropped, 

Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-22 Thread Sumit Semwal
Hello everyone,

Sincere apologies for chiming in a bit late here, but was off due to
some health issues.

Also, adding Daniel Vetter to the mix, since he has been one of the
core guys who shaped up dma-buf as it is today.

On Tue, 22 Jan 2019 at 02:51, Andrew F. Davis  wrote:

 On 1/21/19 5:22 AM, Brian Starkey wrote:
  Hi,
 
  Sorry for being a bit sporadic on this. I was out travelling last week
  with little time for email.
 
  On Fri, Jan 18, 2019 at 11:16:31AM -0600, Andrew F. Davis wrote:
  On 1/17/19 7:11 PM, Liam Mark wrote:
  On Thu, 17 Jan 2019, Andrew F. Davis wrote:
 
  On 1/16/19 4:54 PM, Liam Mark wrote:
  On Wed, 16 Jan 2019, Andrew F. Davis wrote:
 
  On 1/16/19 9:19 AM, Brian Starkey wrote:
  Hi :-)
 
  On Tue, Jan 15, 2019 at 12:40:16PM
-0600, Andrew F. Davis wrote:
  On 1/15/19 12:38 PM, Andrew F.
Davis wrote:
  On 1/15/19 11:45 AM, Liam Mark wrote:
  On Tue, 15 Jan 2019,
Andrew F. Davis wrote:
 
  On 1/14/19 11:13 AM,
Liam Mark wrote:
  On Fri, 11 Jan
2019, Andrew F. Davis wrote:
 
  Buffers may
not be mapped from the CPU so skip cache maintenance here.
  Accesses
from the CPU to a cached heap should be bracketed with
 
{begin,end}_cpu_access calls so maintenance should not be needed
anyway.
 
 
Signed-off-by: Andrew F. Davis 
  ---
 
drivers/staging/android/ion/ion.c | 7 ---
   1 file
changed, 4 insertions(+), 3 deletions(-)
 
  diff --git
a/drivers/staging/android/ion/ion.c
b/drivers/staging/android/ion/ion.c
  index
14e48f6eb734..09cb5a8e2b09 100644
  ---
a/drivers/staging/android/ion/ion.c
  +++
b/drivers/staging/android/ion/ion.c
  @@ -261,8
+261,8 @@ static struct sg_table *ion_map_dma_buf(struct
dma_buf_attachment *attachment,
 
table = a-table;
 
  - if
(!dma_map_sg(attachment-dev, table-sgl, table-nents,
  -
 direction))
  + if
(!dma_map_sg_attrs(attachment-dev, table-sgl, table-nents,
  +
   direction, DMA_ATTR_SKIP_CPU_SYNC))
 
  Unfortunately I
don't think you can do this for a couple reasons.
  You can't rely
on {begin,end}_cpu_access calls to do cache maintenance.
  If the calls to
{begin,end}_cpu_access were made before the call to
  dma_buf_attach
then there won't have been a device attached so the calls
  to
{begin,end}_cpu_access won't have done any cache maintenance.
 
 
  That should be okay
though, if you have no attachments (or all
  attachments are
IO-coherent) then there is no need for cache
  maintenance. Unless
you mean a sequence where a non-io-coherent device
  is attached later
after data has already been written. Does that
  sequence need supporting?
 
  Yes, but also I think
there are cases where CPU access can happen before
  in Android, but I will
focus on later for now.
 
  DMA-BUF doesn't have
to allocate the backing
  memory until
map_dma_buf() time, and that should only happen after all
  the devices have
attached so it can know where to put the buffer. So we
  shouldn't expect any
CPU access to buffers before all the devices are
  attached and mapped, right?
 
 
  Here is an example where
CPU access can happen later in Android.
 
  Camera device records
video - software post processing - video device
  (who does compression of
raw data) and writes to a file
 
  In this example assume
the buffer is cached and the devices are not
  IO-coherent (quite common).
 
 
  This is the start of the
problem, having cached mappings of memory that
  is also being accessed
non-coherently is going to cause issues one way
  or another. On top of the
speculative cache fills that have to be
  constantly fought back
against with CMOs like below; some coherent
  interconnects behave badly
when you mix coherent and non-coherent access
  (snoop filters get messed up).
 
  The solution is to either
always have the addresses marked non-coherent
  (like device memory, no-map
carveouts), or if you really want to use
  regular system memory
allocated at runtime, then all cached mappings of
  it need to be dropped, even
the kernel logical address (area as painful
  as that would be).
 
  Ouch :-( I wasn't aware about these
potential interconnect issues. How
  "real" is that? It seems that we
aren't really hitting that today on
  real devices.
 
 
  Sadly there is at least one real device
like this now (TI AM654). We
  spent some time working with the ARM
interconnect spec designers to see
  if this was allowed behavior, final
conclusion was mixing coherent and
  non-coherent accesses is never a good
idea.. So we have been working to
  try to minimize any cases of mixed
attributes [0], if a region is
  coherent then everyone in the system
needs to treat it as such and
  vice-versa, even clever CMO that work on
other systems wont save you
  here. :(
 
  [0]
https://github.com/ARM-software/arm-trusted-firmware/pull/1553
 
 
  "Never a good idea" - but I think it should still be well defined by
  the ARMv8 ARM (Section B2.8). Does this apply to your system?
 
  "If the mismatched attributes for a memory location all assign the
  same shareability attribute to 

Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-21 Thread Andrew F. Davis
On 1/21/19 5:22 AM, Brian Starkey wrote:
> Hi,
> 
> Sorry for being a bit sporadic on this. I was out travelling last week
> with little time for email.
> 
> On Fri, Jan 18, 2019 at 11:16:31AM -0600, Andrew F. Davis wrote:
>> On 1/17/19 7:11 PM, Liam Mark wrote:
>>> On Thu, 17 Jan 2019, Andrew F. Davis wrote:
>>>
 On 1/16/19 4:54 PM, Liam Mark wrote:
> On Wed, 16 Jan 2019, Andrew F. Davis wrote:
>
>> On 1/16/19 9:19 AM, Brian Starkey wrote:
>>> Hi :-)
>>>
>>> On Tue, Jan 15, 2019 at 12:40:16PM -0600, Andrew F. Davis wrote:
 On 1/15/19 12:38 PM, Andrew F. Davis wrote:
> On 1/15/19 11:45 AM, Liam Mark wrote:
>> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
>>
>>> On 1/14/19 11:13 AM, Liam Mark wrote:
 On Fri, 11 Jan 2019, Andrew F. Davis wrote:

> Buffers may not be mapped from the CPU so skip cache maintenance 
> here.
> Accesses from the CPU to a cached heap should be bracketed with
> {begin,end}_cpu_access calls so maintenance should not be needed 
> anyway.
>
> Signed-off-by: Andrew F. Davis 
> ---
>  drivers/staging/android/ion/ion.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/staging/android/ion/ion.c 
> b/drivers/staging/android/ion/ion.c
> index 14e48f6eb734..09cb5a8e2b09 100644
> --- a/drivers/staging/android/ion/ion.c
> +++ b/drivers/staging/android/ion/ion.c
> @@ -261,8 +261,8 @@ static struct sg_table 
> *ion_map_dma_buf(struct dma_buf_attachment *attachment,
>  
>   table = a->table;
>  
> - if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> - direction))
> + if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
> +   direction, DMA_ATTR_SKIP_CPU_SYNC))

 Unfortunately I don't think you can do this for a couple reasons.
 You can't rely on {begin,end}_cpu_access calls to do cache 
 maintenance.
 If the calls to {begin,end}_cpu_access were made before the call 
 to 
 dma_buf_attach then there won't have been a device attached so the 
 calls 
 to {begin,end}_cpu_access won't have done any cache maintenance.

>>>
>>> That should be okay though, if you have no attachments (or all
>>> attachments are IO-coherent) then there is no need for cache
>>> maintenance. Unless you mean a sequence where a non-io-coherent 
>>> device
>>> is attached later after data has already been written. Does that
>>> sequence need supporting? 
>>
>> Yes, but also I think there are cases where CPU access can happen 
>> before 
>> in Android, but I will focus on later for now.
>>
>>> DMA-BUF doesn't have to allocate the backing
>>> memory until map_dma_buf() time, and that should only happen after 
>>> all
>>> the devices have attached so it can know where to put the buffer. 
>>> So we
>>> shouldn't expect any CPU access to buffers before all the devices 
>>> are
>>> attached and mapped, right?
>>>
>>
>> Here is an example where CPU access can happen later in Android.
>>
>> Camera device records video -> software post processing -> video 
>> device 
>> (who does compression of raw data) and writes to a file
>>
>> In this example assume the buffer is cached and the devices are not 
>> IO-coherent (quite common).
>>
>
> This is the start of the problem, having cached mappings of memory 
> that
> is also being accessed non-coherently is going to cause issues one way
> or another. On top of the speculative cache fills that have to be
> constantly fought back against with CMOs like below; some coherent
> interconnects behave badly when you mix coherent and non-coherent 
> access
> (snoop filters get messed up).
>
> The solution is to either always have the addresses marked 
> non-coherent
> (like device memory, no-map carveouts), or if you really want to use
> regular system memory allocated at runtime, then all cached mappings 
> of
> it need to be dropped, even the kernel logical address (area as 
> painful
> as that would be).
>>>
>>> Ouch :-( I wasn't aware about these potential interconnect issues. How
>>> "real" is that? It seems that we aren't really hitting that today on
>>> real devices.
>>>
>>
>> Sadly there 

Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-21 Thread Liam Mark
On Fri, 18 Jan 2019, Andrew F. Davis wrote:

> On 1/17/19 7:11 PM, Liam Mark wrote:
> > On Thu, 17 Jan 2019, Andrew F. Davis wrote:
> > 
> >> On 1/16/19 4:54 PM, Liam Mark wrote:
> >>> On Wed, 16 Jan 2019, Andrew F. Davis wrote:
> >>>
>  On 1/16/19 9:19 AM, Brian Starkey wrote:
> > Hi :-)
> >
> > On Tue, Jan 15, 2019 at 12:40:16PM -0600, Andrew F. Davis wrote:
> >> On 1/15/19 12:38 PM, Andrew F. Davis wrote:
> >>> On 1/15/19 11:45 AM, Liam Mark wrote:
>  On Tue, 15 Jan 2019, Andrew F. Davis wrote:
> 
> > On 1/14/19 11:13 AM, Liam Mark wrote:
> >> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
> >>
> >>> Buffers may not be mapped from the CPU so skip cache maintenance 
> >>> here.
> >>> Accesses from the CPU to a cached heap should be bracketed with
> >>> {begin,end}_cpu_access calls so maintenance should not be needed 
> >>> anyway.
> >>>
> >>> Signed-off-by: Andrew F. Davis 
> >>> ---
> >>>  drivers/staging/android/ion/ion.c | 7 ---
> >>>  1 file changed, 4 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/drivers/staging/android/ion/ion.c 
> >>> b/drivers/staging/android/ion/ion.c
> >>> index 14e48f6eb734..09cb5a8e2b09 100644
> >>> --- a/drivers/staging/android/ion/ion.c
> >>> +++ b/drivers/staging/android/ion/ion.c
> >>> @@ -261,8 +261,8 @@ static struct sg_table 
> >>> *ion_map_dma_buf(struct dma_buf_attachment *attachment,
> >>>  
> >>>   table = a->table;
> >>>  
> >>> - if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> >>> - direction))
> >>> + if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
> >>> +   direction, DMA_ATTR_SKIP_CPU_SYNC))
> >>
> >> Unfortunately I don't think you can do this for a couple reasons.
> >> You can't rely on {begin,end}_cpu_access calls to do cache 
> >> maintenance.
> >> If the calls to {begin,end}_cpu_access were made before the call 
> >> to 
> >> dma_buf_attach then there won't have been a device attached so the 
> >> calls 
> >> to {begin,end}_cpu_access won't have done any cache maintenance.
> >>
> >
> > That should be okay though, if you have no attachments (or all
> > attachments are IO-coherent) then there is no need for cache
> > maintenance. Unless you mean a sequence where a non-io-coherent 
> > device
> > is attached later after data has already been written. Does that
> > sequence need supporting? 
> 
>  Yes, but also I think there are cases where CPU access can happen 
>  before 
>  in Android, but I will focus on later for now.
> 
> > DMA-BUF doesn't have to allocate the backing
> > memory until map_dma_buf() time, and that should only happen after 
> > all
> > the devices have attached so it can know where to put the buffer. 
> > So we
> > shouldn't expect any CPU access to buffers before all the devices 
> > are
> > attached and mapped, right?
> >
> 
>  Here is an example where CPU access can happen later in Android.
> 
>  Camera device records video -> software post processing -> video 
>  device 
>  (who does compression of raw data) and writes to a file
> 
>  In this example assume the buffer is cached and the devices are not 
>  IO-coherent (quite common).
> 
> >>>
> >>> This is the start of the problem, having cached mappings of memory 
> >>> that
> >>> is also being accessed non-coherently is going to cause issues one way
> >>> or another. On top of the speculative cache fills that have to be
> >>> constantly fought back against with CMOs like below; some coherent
> >>> interconnects behave badly when you mix coherent and non-coherent 
> >>> access
> >>> (snoop filters get messed up).
> >>>
> >>> The solution is to either always have the addresses marked 
> >>> non-coherent
> >>> (like device memory, no-map carveouts), or if you really want to use
> >>> regular system memory allocated at runtime, then all cached mappings 
> >>> of
> >>> it need to be dropped, even the kernel logical address (area as 
> >>> painful
> >>> as that would be).
> >
> > Ouch :-( I wasn't aware about these potential interconnect issues. How
> > "real" is that? It seems that we aren't really hitting that today on
> > real devices.
> >
> 
>  Sadly there is at least one real device like this now (TI AM654). We
>  spent some time working with the ARM interconnect spec designers to see
>  if this was allowed behavior, final 

Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-21 Thread Liam Mark
On Mon, 21 Jan 2019, Andrew F. Davis wrote:

> On 1/18/19 3:43 PM, Liam Mark wrote:
> > On Fri, 18 Jan 2019, Andrew F. Davis wrote:
> > 
> >> On 1/17/19 7:04 PM, Liam Mark wrote:
> >>> On Thu, 17 Jan 2019, Andrew F. Davis wrote:
> >>>
>  On 1/16/19 4:48 PM, Liam Mark wrote:
> > On Wed, 16 Jan 2019, Andrew F. Davis wrote:
> >
> >> On 1/15/19 1:05 PM, Laura Abbott wrote:
> >>> On 1/15/19 10:38 AM, Andrew F. Davis wrote:
>  On 1/15/19 11:45 AM, Liam Mark wrote:
> > On Tue, 15 Jan 2019, Andrew F. Davis wrote:
> >
> >> On 1/14/19 11:13 AM, Liam Mark wrote:
> >>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
> >>>
>  Buffers may not be mapped from the CPU so skip cache maintenance
>  here.
>  Accesses from the CPU to a cached heap should be bracketed with
>  {begin,end}_cpu_access calls so maintenance should not be needed
>  anyway.
> 
>  Signed-off-by: Andrew F. Davis 
>  ---
>    drivers/staging/android/ion/ion.c | 7 ---
>    1 file changed, 4 insertions(+), 3 deletions(-)
> 
>  diff --git a/drivers/staging/android/ion/ion.c
>  b/drivers/staging/android/ion/ion.c
>  index 14e48f6eb734..09cb5a8e2b09 100644
>  --- a/drivers/staging/android/ion/ion.c
>  +++ b/drivers/staging/android/ion/ion.c
>  @@ -261,8 +261,8 @@ static struct sg_table 
>  *ion_map_dma_buf(struct
>  dma_buf_attachment *attachment,
>      table = a->table;
>    -    if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
>  -    direction))
>  +    if (!dma_map_sg_attrs(attachment->dev, table->sgl, 
>  table->nents,
>  +  direction, DMA_ATTR_SKIP_CPU_SYNC))
> >>>
> >>> Unfortunately I don't think you can do this for a couple reasons.
> >>> You can't rely on {begin,end}_cpu_access calls to do cache
> >>> maintenance.
> >>> If the calls to {begin,end}_cpu_access were made before the call 
> >>> to
> >>> dma_buf_attach then there won't have been a device attached so the
> >>> calls
> >>> to {begin,end}_cpu_access won't have done any cache maintenance.
> >>>
> >>
> >> That should be okay though, if you have no attachments (or all
> >> attachments are IO-coherent) then there is no need for cache
> >> maintenance. Unless you mean a sequence where a non-io-coherent 
> >> device
> >> is attached later after data has already been written. Does that
> >> sequence need supporting?
> >
> > Yes, but also I think there are cases where CPU access can happen 
> > before
> > in Android, but I will focus on later for now.
> >
> >> DMA-BUF doesn't have to allocate the backing
> >> memory until map_dma_buf() time, and that should only happen after 
> >> all
> >> the devices have attached so it can know where to put the buffer. 
> >> So we
> >> shouldn't expect any CPU access to buffers before all the devices 
> >> are
> >> attached and mapped, right?
> >>
> >
> > Here is an example where CPU access can happen later in Android.
> >
> > Camera device records video -> software post processing -> video 
> > device
> > (who does compression of raw data) and writes to a file
> >
> > In this example assume the buffer is cached and the devices are not
> > IO-coherent (quite common).
> >
> 
>  This is the start of the problem, having cached mappings of memory 
>  that
>  is also being accessed non-coherently is going to cause issues one 
>  way
>  or another. On top of the speculative cache fills that have to be
>  constantly fought back against with CMOs like below; some coherent
>  interconnects behave badly when you mix coherent and non-coherent 
>  access
>  (snoop filters get messed up).
> 
>  The solution is to either always have the addresses marked 
>  non-coherent
>  (like device memory, no-map carveouts), or if you really want to use
>  regular system memory allocated at runtime, then all cached mappings 
>  of
>  it need to be dropped, even the kernel logical address (area as 
>  painful
>  as that would be).
> 
> >>>
> >>> I agree it's broken, hence my desire to remove it :)
> >>>
> >>> The other problem is that uncached buffers are being used for
> >>> performance reason so anything that would involve getting
> >>> rid of the logical address would probably negate 

Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-21 Thread Andrew F. Davis
On 1/18/19 3:43 PM, Liam Mark wrote:
> On Fri, 18 Jan 2019, Andrew F. Davis wrote:
> 
>> On 1/17/19 7:04 PM, Liam Mark wrote:
>>> On Thu, 17 Jan 2019, Andrew F. Davis wrote:
>>>
 On 1/16/19 4:48 PM, Liam Mark wrote:
> On Wed, 16 Jan 2019, Andrew F. Davis wrote:
>
>> On 1/15/19 1:05 PM, Laura Abbott wrote:
>>> On 1/15/19 10:38 AM, Andrew F. Davis wrote:
 On 1/15/19 11:45 AM, Liam Mark wrote:
> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
>
>> On 1/14/19 11:13 AM, Liam Mark wrote:
>>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
>>>
 Buffers may not be mapped from the CPU so skip cache maintenance
 here.
 Accesses from the CPU to a cached heap should be bracketed with
 {begin,end}_cpu_access calls so maintenance should not be needed
 anyway.

 Signed-off-by: Andrew F. Davis 
 ---
   drivers/staging/android/ion/ion.c | 7 ---
   1 file changed, 4 insertions(+), 3 deletions(-)

 diff --git a/drivers/staging/android/ion/ion.c
 b/drivers/staging/android/ion/ion.c
 index 14e48f6eb734..09cb5a8e2b09 100644
 --- a/drivers/staging/android/ion/ion.c
 +++ b/drivers/staging/android/ion/ion.c
 @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct
 dma_buf_attachment *attachment,
     table = a->table;
   -    if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
 -    direction))
 +    if (!dma_map_sg_attrs(attachment->dev, table->sgl, 
 table->nents,
 +  direction, DMA_ATTR_SKIP_CPU_SYNC))
>>>
>>> Unfortunately I don't think you can do this for a couple reasons.
>>> You can't rely on {begin,end}_cpu_access calls to do cache
>>> maintenance.
>>> If the calls to {begin,end}_cpu_access were made before the call to
>>> dma_buf_attach then there won't have been a device attached so the
>>> calls
>>> to {begin,end}_cpu_access won't have done any cache maintenance.
>>>
>>
>> That should be okay though, if you have no attachments (or all
>> attachments are IO-coherent) then there is no need for cache
>> maintenance. Unless you mean a sequence where a non-io-coherent 
>> device
>> is attached later after data has already been written. Does that
>> sequence need supporting?
>
> Yes, but also I think there are cases where CPU access can happen 
> before
> in Android, but I will focus on later for now.
>
>> DMA-BUF doesn't have to allocate the backing
>> memory until map_dma_buf() time, and that should only happen after 
>> all
>> the devices have attached so it can know where to put the buffer. So 
>> we
>> shouldn't expect any CPU access to buffers before all the devices are
>> attached and mapped, right?
>>
>
> Here is an example where CPU access can happen later in Android.
>
> Camera device records video -> software post processing -> video 
> device
> (who does compression of raw data) and writes to a file
>
> In this example assume the buffer is cached and the devices are not
> IO-coherent (quite common).
>

 This is the start of the problem, having cached mappings of memory that
 is also being accessed non-coherently is going to cause issues one way
 or another. On top of the speculative cache fills that have to be
 constantly fought back against with CMOs like below; some coherent
 interconnects behave badly when you mix coherent and non-coherent 
 access
 (snoop filters get messed up).

 The solution is to either always have the addresses marked non-coherent
 (like device memory, no-map carveouts), or if you really want to use
 regular system memory allocated at runtime, then all cached mappings of
 it need to be dropped, even the kernel logical address (area as painful
 as that would be).

>>>
>>> I agree it's broken, hence my desire to remove it :)
>>>
>>> The other problem is that uncached buffers are being used for
>>> performance reason so anything that would involve getting
>>> rid of the logical address would probably negate any performance
>>> benefit.
>>>
>>
>> I wouldn't go as far as to remove them just yet.. Liam seems pretty
>> adamant that they have valid uses. I'm just not sure performance is one
>> of them, maybe in the case of software locks between devices or
>> something where there needs to be a lot of back and forth interleaved

Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-21 Thread Brian Starkey
Hi,

Sorry for being a bit sporadic on this. I was out travelling last week
with little time for email.

On Fri, Jan 18, 2019 at 11:16:31AM -0600, Andrew F. Davis wrote:
> On 1/17/19 7:11 PM, Liam Mark wrote:
> > On Thu, 17 Jan 2019, Andrew F. Davis wrote:
> > 
> >> On 1/16/19 4:54 PM, Liam Mark wrote:
> >>> On Wed, 16 Jan 2019, Andrew F. Davis wrote:
> >>>
>  On 1/16/19 9:19 AM, Brian Starkey wrote:
> > Hi :-)
> >
> > On Tue, Jan 15, 2019 at 12:40:16PM -0600, Andrew F. Davis wrote:
> >> On 1/15/19 12:38 PM, Andrew F. Davis wrote:
> >>> On 1/15/19 11:45 AM, Liam Mark wrote:
>  On Tue, 15 Jan 2019, Andrew F. Davis wrote:
> 
> > On 1/14/19 11:13 AM, Liam Mark wrote:
> >> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
> >>
> >>> Buffers may not be mapped from the CPU so skip cache maintenance 
> >>> here.
> >>> Accesses from the CPU to a cached heap should be bracketed with
> >>> {begin,end}_cpu_access calls so maintenance should not be needed 
> >>> anyway.
> >>>
> >>> Signed-off-by: Andrew F. Davis 
> >>> ---
> >>>  drivers/staging/android/ion/ion.c | 7 ---
> >>>  1 file changed, 4 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/drivers/staging/android/ion/ion.c 
> >>> b/drivers/staging/android/ion/ion.c
> >>> index 14e48f6eb734..09cb5a8e2b09 100644
> >>> --- a/drivers/staging/android/ion/ion.c
> >>> +++ b/drivers/staging/android/ion/ion.c
> >>> @@ -261,8 +261,8 @@ static struct sg_table 
> >>> *ion_map_dma_buf(struct dma_buf_attachment *attachment,
> >>>  
> >>>   table = a->table;
> >>>  
> >>> - if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> >>> - direction))
> >>> + if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
> >>> +   direction, DMA_ATTR_SKIP_CPU_SYNC))
> >>
> >> Unfortunately I don't think you can do this for a couple reasons.
> >> You can't rely on {begin,end}_cpu_access calls to do cache 
> >> maintenance.
> >> If the calls to {begin,end}_cpu_access were made before the call 
> >> to 
> >> dma_buf_attach then there won't have been a device attached so the 
> >> calls 
> >> to {begin,end}_cpu_access won't have done any cache maintenance.
> >>
> >
> > That should be okay though, if you have no attachments (or all
> > attachments are IO-coherent) then there is no need for cache
> > maintenance. Unless you mean a sequence where a non-io-coherent 
> > device
> > is attached later after data has already been written. Does that
> > sequence need supporting? 
> 
>  Yes, but also I think there are cases where CPU access can happen 
>  before 
>  in Android, but I will focus on later for now.
> 
> > DMA-BUF doesn't have to allocate the backing
> > memory until map_dma_buf() time, and that should only happen after 
> > all
> > the devices have attached so it can know where to put the buffer. 
> > So we
> > shouldn't expect any CPU access to buffers before all the devices 
> > are
> > attached and mapped, right?
> >
> 
>  Here is an example where CPU access can happen later in Android.
> 
>  Camera device records video -> software post processing -> video 
>  device 
>  (who does compression of raw data) and writes to a file
> 
>  In this example assume the buffer is cached and the devices are not 
>  IO-coherent (quite common).
> 
> >>>
> >>> This is the start of the problem, having cached mappings of memory 
> >>> that
> >>> is also being accessed non-coherently is going to cause issues one way
> >>> or another. On top of the speculative cache fills that have to be
> >>> constantly fought back against with CMOs like below; some coherent
> >>> interconnects behave badly when you mix coherent and non-coherent 
> >>> access
> >>> (snoop filters get messed up).
> >>>
> >>> The solution is to either always have the addresses marked 
> >>> non-coherent
> >>> (like device memory, no-map carveouts), or if you really want to use
> >>> regular system memory allocated at runtime, then all cached mappings 
> >>> of
> >>> it need to be dropped, even the kernel logical address (area as 
> >>> painful
> >>> as that would be).
> >
> > Ouch :-( I wasn't aware about these potential interconnect issues. How
> > "real" is that? It seems that we aren't really hitting that today on
> > real devices.
> >
> 
>  Sadly there is at least one real device like this now (TI AM654). 

Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-19 Thread Christoph Hellwig
On Fri, Jan 18, 2019 at 12:31:34PM -0800, Laura Abbott wrote:
> I thought about doing that, the problem is it becomes an ABI break for
> existing users which I really didn't want to do again. If it
> ends up being the last thing we do before moving out of staging,
> I'd consider doing it.

This is staging code, so any existing users by defintion do not matter.
Let's not drag Linux development down over this.
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-18 Thread Liam Mark
On Fri, 18 Jan 2019, Andrew F. Davis wrote:

> On 1/17/19 7:04 PM, Liam Mark wrote:
> > On Thu, 17 Jan 2019, Andrew F. Davis wrote:
> > 
> >> On 1/16/19 4:48 PM, Liam Mark wrote:
> >>> On Wed, 16 Jan 2019, Andrew F. Davis wrote:
> >>>
>  On 1/15/19 1:05 PM, Laura Abbott wrote:
> > On 1/15/19 10:38 AM, Andrew F. Davis wrote:
> >> On 1/15/19 11:45 AM, Liam Mark wrote:
> >>> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
> >>>
>  On 1/14/19 11:13 AM, Liam Mark wrote:
> > On Fri, 11 Jan 2019, Andrew F. Davis wrote:
> >
> >> Buffers may not be mapped from the CPU so skip cache maintenance
> >> here.
> >> Accesses from the CPU to a cached heap should be bracketed with
> >> {begin,end}_cpu_access calls so maintenance should not be needed
> >> anyway.
> >>
> >> Signed-off-by: Andrew F. Davis 
> >> ---
> >>   drivers/staging/android/ion/ion.c | 7 ---
> >>   1 file changed, 4 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/staging/android/ion/ion.c
> >> b/drivers/staging/android/ion/ion.c
> >> index 14e48f6eb734..09cb5a8e2b09 100644
> >> --- a/drivers/staging/android/ion/ion.c
> >> +++ b/drivers/staging/android/ion/ion.c
> >> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct
> >> dma_buf_attachment *attachment,
> >>     table = a->table;
> >>   -    if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> >> -    direction))
> >> +    if (!dma_map_sg_attrs(attachment->dev, table->sgl, 
> >> table->nents,
> >> +  direction, DMA_ATTR_SKIP_CPU_SYNC))
> >
> > Unfortunately I don't think you can do this for a couple reasons.
> > You can't rely on {begin,end}_cpu_access calls to do cache
> > maintenance.
> > If the calls to {begin,end}_cpu_access were made before the call to
> > dma_buf_attach then there won't have been a device attached so the
> > calls
> > to {begin,end}_cpu_access won't have done any cache maintenance.
> >
> 
>  That should be okay though, if you have no attachments (or all
>  attachments are IO-coherent) then there is no need for cache
>  maintenance. Unless you mean a sequence where a non-io-coherent 
>  device
>  is attached later after data has already been written. Does that
>  sequence need supporting?
> >>>
> >>> Yes, but also I think there are cases where CPU access can happen 
> >>> before
> >>> in Android, but I will focus on later for now.
> >>>
>  DMA-BUF doesn't have to allocate the backing
>  memory until map_dma_buf() time, and that should only happen after 
>  all
>  the devices have attached so it can know where to put the buffer. So 
>  we
>  shouldn't expect any CPU access to buffers before all the devices are
>  attached and mapped, right?
> 
> >>>
> >>> Here is an example where CPU access can happen later in Android.
> >>>
> >>> Camera device records video -> software post processing -> video 
> >>> device
> >>> (who does compression of raw data) and writes to a file
> >>>
> >>> In this example assume the buffer is cached and the devices are not
> >>> IO-coherent (quite common).
> >>>
> >>
> >> This is the start of the problem, having cached mappings of memory that
> >> is also being accessed non-coherently is going to cause issues one way
> >> or another. On top of the speculative cache fills that have to be
> >> constantly fought back against with CMOs like below; some coherent
> >> interconnects behave badly when you mix coherent and non-coherent 
> >> access
> >> (snoop filters get messed up).
> >>
> >> The solution is to either always have the addresses marked non-coherent
> >> (like device memory, no-map carveouts), or if you really want to use
> >> regular system memory allocated at runtime, then all cached mappings of
> >> it need to be dropped, even the kernel logical address (area as painful
> >> as that would be).
> >>
> >
> > I agree it's broken, hence my desire to remove it :)
> >
> > The other problem is that uncached buffers are being used for
> > performance reason so anything that would involve getting
> > rid of the logical address would probably negate any performance
> > benefit.
> >
> 
>  I wouldn't go as far as to remove them just yet.. Liam seems pretty
>  adamant that they have valid uses. I'm just not sure performance is one
>  of them, maybe in the case of software locks between devices or
>  something where there needs to be a lot of back and forth interleaved
>  access on small amounts of data?
> 

Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-18 Thread Laura Abbott

On 1/18/19 12:43 PM, Andrew F. Davis wrote:

On 1/18/19 2:31 PM, Laura Abbott wrote:

On 1/17/19 8:13 AM, Andrew F. Davis wrote:

On 1/16/19 4:48 PM, Liam Mark wrote:

On Wed, 16 Jan 2019, Andrew F. Davis wrote:


On 1/15/19 1:05 PM, Laura Abbott wrote:

On 1/15/19 10:38 AM, Andrew F. Davis wrote:

On 1/15/19 11:45 AM, Liam Mark wrote:

On Tue, 15 Jan 2019, Andrew F. Davis wrote:


On 1/14/19 11:13 AM, Liam Mark wrote:

On Fri, 11 Jan 2019, Andrew F. Davis wrote:


Buffers may not be mapped from the CPU so skip cache maintenance
here.
Accesses from the CPU to a cached heap should be bracketed with
{begin,end}_cpu_access calls so maintenance should not be needed
anyway.

Signed-off-by: Andrew F. Davis 
---
    drivers/staging/android/ion/ion.c | 7 ---
    1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/android/ion/ion.c
b/drivers/staging/android/ion/ion.c
index 14e48f6eb734..09cb5a8e2b09 100644
--- a/drivers/staging/android/ion/ion.c
+++ b/drivers/staging/android/ion/ion.c
@@ -261,8 +261,8 @@ static struct sg_table
*ion_map_dma_buf(struct
dma_buf_attachment *attachment,
      table = a->table;
    -    if (!dma_map_sg(attachment->dev, table->sgl,
table->nents,
-    direction))
+    if (!dma_map_sg_attrs(attachment->dev, table->sgl,
table->nents,
+  direction, DMA_ATTR_SKIP_CPU_SYNC))


Unfortunately I don't think you can do this for a couple reasons.
You can't rely on {begin,end}_cpu_access calls to do cache
maintenance.
If the calls to {begin,end}_cpu_access were made before the
call to
dma_buf_attach then there won't have been a device attached so the
calls
to {begin,end}_cpu_access won't have done any cache maintenance.



That should be okay though, if you have no attachments (or all
attachments are IO-coherent) then there is no need for cache
maintenance. Unless you mean a sequence where a non-io-coherent
device
is attached later after data has already been written. Does that
sequence need supporting?


Yes, but also I think there are cases where CPU access can happen
before
in Android, but I will focus on later for now.


DMA-BUF doesn't have to allocate the backing
memory until map_dma_buf() time, and that should only happen
after all
the devices have attached so it can know where to put the
buffer. So we
shouldn't expect any CPU access to buffers before all the
devices are
attached and mapped, right?



Here is an example where CPU access can happen later in Android.

Camera device records video -> software post processing -> video
device
(who does compression of raw data) and writes to a file

In this example assume the buffer is cached and the devices are not
IO-coherent (quite common).



This is the start of the problem, having cached mappings of memory
that
is also being accessed non-coherently is going to cause issues one
way
or another. On top of the speculative cache fills that have to be
constantly fought back against with CMOs like below; some coherent
interconnects behave badly when you mix coherent and non-coherent
access
(snoop filters get messed up).

The solution is to either always have the addresses marked
non-coherent
(like device memory, no-map carveouts), or if you really want to use
regular system memory allocated at runtime, then all cached
mappings of
it need to be dropped, even the kernel logical address (area as
painful
as that would be).



I agree it's broken, hence my desire to remove it :)

The other problem is that uncached buffers are being used for
performance reason so anything that would involve getting
rid of the logical address would probably negate any performance
benefit.



I wouldn't go as far as to remove them just yet.. Liam seems pretty
adamant that they have valid uses. I'm just not sure performance is one
of them, maybe in the case of software locks between devices or
something where there needs to be a lot of back and forth interleaved
access on small amounts of data?



I wasn't aware that ARM considered this not supported, I thought it was
supported but they advised against it because of the potential
performance
impact.



Not sure what you mean by "this" being not supported, do you mean mixed
attribute mappings? If so, it will certainly cause problems, and the
problems will change from platform to platform, avoid at all costs is my
understanding of ARM's position.


This is after all supported in the DMA APIs and up until now devices
have
been successfully commercializing with this configurations, and I think
they will continue to commercialize with these configurations for
quite a
while.



Use of uncached memory mappings are almost always wrong in my experience
and are used to work around some bug or because the user doesn't want to
implement proper CMOs. Counter examples welcome.


It would be really unfortunate if support was removed as I think that
would drive clients away from using upstream ION.



I'm not petitioning to remove support, but at very least lets reverse
the 

Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-18 Thread Andrew F. Davis
On 1/18/19 2:31 PM, Laura Abbott wrote:
> On 1/17/19 8:13 AM, Andrew F. Davis wrote:
>> On 1/16/19 4:48 PM, Liam Mark wrote:
>>> On Wed, 16 Jan 2019, Andrew F. Davis wrote:
>>>
 On 1/15/19 1:05 PM, Laura Abbott wrote:
> On 1/15/19 10:38 AM, Andrew F. Davis wrote:
>> On 1/15/19 11:45 AM, Liam Mark wrote:
>>> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
>>>
 On 1/14/19 11:13 AM, Liam Mark wrote:
> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
>
>> Buffers may not be mapped from the CPU so skip cache maintenance
>> here.
>> Accesses from the CPU to a cached heap should be bracketed with
>> {begin,end}_cpu_access calls so maintenance should not be needed
>> anyway.
>>
>> Signed-off-by: Andrew F. Davis 
>> ---
>>    drivers/staging/android/ion/ion.c | 7 ---
>>    1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/staging/android/ion/ion.c
>> b/drivers/staging/android/ion/ion.c
>> index 14e48f6eb734..09cb5a8e2b09 100644
>> --- a/drivers/staging/android/ion/ion.c
>> +++ b/drivers/staging/android/ion/ion.c
>> @@ -261,8 +261,8 @@ static struct sg_table
>> *ion_map_dma_buf(struct
>> dma_buf_attachment *attachment,
>>      table = a->table;
>>    -    if (!dma_map_sg(attachment->dev, table->sgl,
>> table->nents,
>> -    direction))
>> +    if (!dma_map_sg_attrs(attachment->dev, table->sgl,
>> table->nents,
>> +  direction, DMA_ATTR_SKIP_CPU_SYNC))
>
> Unfortunately I don't think you can do this for a couple reasons.
> You can't rely on {begin,end}_cpu_access calls to do cache
> maintenance.
> If the calls to {begin,end}_cpu_access were made before the
> call to
> dma_buf_attach then there won't have been a device attached so the
> calls
> to {begin,end}_cpu_access won't have done any cache maintenance.
>

 That should be okay though, if you have no attachments (or all
 attachments are IO-coherent) then there is no need for cache
 maintenance. Unless you mean a sequence where a non-io-coherent
 device
 is attached later after data has already been written. Does that
 sequence need supporting?
>>>
>>> Yes, but also I think there are cases where CPU access can happen
>>> before
>>> in Android, but I will focus on later for now.
>>>
 DMA-BUF doesn't have to allocate the backing
 memory until map_dma_buf() time, and that should only happen
 after all
 the devices have attached so it can know where to put the
 buffer. So we
 shouldn't expect any CPU access to buffers before all the
 devices are
 attached and mapped, right?

>>>
>>> Here is an example where CPU access can happen later in Android.
>>>
>>> Camera device records video -> software post processing -> video
>>> device
>>> (who does compression of raw data) and writes to a file
>>>
>>> In this example assume the buffer is cached and the devices are not
>>> IO-coherent (quite common).
>>>
>>
>> This is the start of the problem, having cached mappings of memory
>> that
>> is also being accessed non-coherently is going to cause issues one
>> way
>> or another. On top of the speculative cache fills that have to be
>> constantly fought back against with CMOs like below; some coherent
>> interconnects behave badly when you mix coherent and non-coherent
>> access
>> (snoop filters get messed up).
>>
>> The solution is to either always have the addresses marked
>> non-coherent
>> (like device memory, no-map carveouts), or if you really want to use
>> regular system memory allocated at runtime, then all cached
>> mappings of
>> it need to be dropped, even the kernel logical address (area as
>> painful
>> as that would be).
>>
>
> I agree it's broken, hence my desire to remove it :)
>
> The other problem is that uncached buffers are being used for
> performance reason so anything that would involve getting
> rid of the logical address would probably negate any performance
> benefit.
>

 I wouldn't go as far as to remove them just yet.. Liam seems pretty
 adamant that they have valid uses. I'm just not sure performance is one
 of them, maybe in the case of software locks between devices or
 something where there needs to be a lot of back and forth interleaved
 access on small amounts of data?

>>>
>>> I wasn't aware that ARM considered this not supported, I thought it was
>>> supported but they advised against it because of the potential
>>> performance
>>> impact.
>>>
>>
>> 

Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-18 Thread Laura Abbott

On 1/17/19 8:13 AM, Andrew F. Davis wrote:

On 1/16/19 4:48 PM, Liam Mark wrote:

On Wed, 16 Jan 2019, Andrew F. Davis wrote:


On 1/15/19 1:05 PM, Laura Abbott wrote:

On 1/15/19 10:38 AM, Andrew F. Davis wrote:

On 1/15/19 11:45 AM, Liam Mark wrote:

On Tue, 15 Jan 2019, Andrew F. Davis wrote:


On 1/14/19 11:13 AM, Liam Mark wrote:

On Fri, 11 Jan 2019, Andrew F. Davis wrote:


Buffers may not be mapped from the CPU so skip cache maintenance
here.
Accesses from the CPU to a cached heap should be bracketed with
{begin,end}_cpu_access calls so maintenance should not be needed
anyway.

Signed-off-by: Andrew F. Davis 
---
   drivers/staging/android/ion/ion.c | 7 ---
   1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/android/ion/ion.c
b/drivers/staging/android/ion/ion.c
index 14e48f6eb734..09cb5a8e2b09 100644
--- a/drivers/staging/android/ion/ion.c
+++ b/drivers/staging/android/ion/ion.c
@@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct
dma_buf_attachment *attachment,
     table = a->table;
   -    if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
-    direction))
+    if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
+  direction, DMA_ATTR_SKIP_CPU_SYNC))


Unfortunately I don't think you can do this for a couple reasons.
You can't rely on {begin,end}_cpu_access calls to do cache
maintenance.
If the calls to {begin,end}_cpu_access were made before the call to
dma_buf_attach then there won't have been a device attached so the
calls
to {begin,end}_cpu_access won't have done any cache maintenance.



That should be okay though, if you have no attachments (or all
attachments are IO-coherent) then there is no need for cache
maintenance. Unless you mean a sequence where a non-io-coherent device
is attached later after data has already been written. Does that
sequence need supporting?


Yes, but also I think there are cases where CPU access can happen before
in Android, but I will focus on later for now.


DMA-BUF doesn't have to allocate the backing
memory until map_dma_buf() time, and that should only happen after all
the devices have attached so it can know where to put the buffer. So we
shouldn't expect any CPU access to buffers before all the devices are
attached and mapped, right?



Here is an example where CPU access can happen later in Android.

Camera device records video -> software post processing -> video device
(who does compression of raw data) and writes to a file

In this example assume the buffer is cached and the devices are not
IO-coherent (quite common).



This is the start of the problem, having cached mappings of memory that
is also being accessed non-coherently is going to cause issues one way
or another. On top of the speculative cache fills that have to be
constantly fought back against with CMOs like below; some coherent
interconnects behave badly when you mix coherent and non-coherent access
(snoop filters get messed up).

The solution is to either always have the addresses marked non-coherent
(like device memory, no-map carveouts), or if you really want to use
regular system memory allocated at runtime, then all cached mappings of
it need to be dropped, even the kernel logical address (area as painful
as that would be).



I agree it's broken, hence my desire to remove it :)

The other problem is that uncached buffers are being used for
performance reason so anything that would involve getting
rid of the logical address would probably negate any performance
benefit.



I wouldn't go as far as to remove them just yet.. Liam seems pretty
adamant that they have valid uses. I'm just not sure performance is one
of them, maybe in the case of software locks between devices or
something where there needs to be a lot of back and forth interleaved
access on small amounts of data?



I wasn't aware that ARM considered this not supported, I thought it was
supported but they advised against it because of the potential performance
impact.



Not sure what you mean by "this" being not supported, do you mean mixed
attribute mappings? If so, it will certainly cause problems, and the
problems will change from platform to platform, avoid at all costs is my
understanding of ARM's position.


This is after all supported in the DMA APIs and up until now devices have
been successfully commercializing with this configurations, and I think
they will continue to commercialize with these configurations for quite a
while.



Use of uncached memory mappings are almost always wrong in my experience
and are used to work around some bug or because the user doesn't want to
implement proper CMOs. Counter examples welcome.


It would be really unfortunate if support was removed as I think that
would drive clients away from using upstream ION.



I'm not petitioning to remove support, but at very least lets reverse
the ION_FLAG_CACHED flag. Ion should hand out cached normal memory by
default, to get uncached you 

Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-18 Thread Andrew F. Davis
On 1/17/19 7:11 PM, Liam Mark wrote:
> On Thu, 17 Jan 2019, Andrew F. Davis wrote:
> 
>> On 1/16/19 4:54 PM, Liam Mark wrote:
>>> On Wed, 16 Jan 2019, Andrew F. Davis wrote:
>>>
 On 1/16/19 9:19 AM, Brian Starkey wrote:
> Hi :-)
>
> On Tue, Jan 15, 2019 at 12:40:16PM -0600, Andrew F. Davis wrote:
>> On 1/15/19 12:38 PM, Andrew F. Davis wrote:
>>> On 1/15/19 11:45 AM, Liam Mark wrote:
 On Tue, 15 Jan 2019, Andrew F. Davis wrote:

> On 1/14/19 11:13 AM, Liam Mark wrote:
>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
>>
>>> Buffers may not be mapped from the CPU so skip cache maintenance 
>>> here.
>>> Accesses from the CPU to a cached heap should be bracketed with
>>> {begin,end}_cpu_access calls so maintenance should not be needed 
>>> anyway.
>>>
>>> Signed-off-by: Andrew F. Davis 
>>> ---
>>>  drivers/staging/android/ion/ion.c | 7 ---
>>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/staging/android/ion/ion.c 
>>> b/drivers/staging/android/ion/ion.c
>>> index 14e48f6eb734..09cb5a8e2b09 100644
>>> --- a/drivers/staging/android/ion/ion.c
>>> +++ b/drivers/staging/android/ion/ion.c
>>> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct 
>>> dma_buf_attachment *attachment,
>>>  
>>> table = a->table;
>>>  
>>> -   if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
>>> -   direction))
>>> +   if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
>>> + direction, DMA_ATTR_SKIP_CPU_SYNC))
>>
>> Unfortunately I don't think you can do this for a couple reasons.
>> You can't rely on {begin,end}_cpu_access calls to do cache 
>> maintenance.
>> If the calls to {begin,end}_cpu_access were made before the call to 
>> dma_buf_attach then there won't have been a device attached so the 
>> calls 
>> to {begin,end}_cpu_access won't have done any cache maintenance.
>>
>
> That should be okay though, if you have no attachments (or all
> attachments are IO-coherent) then there is no need for cache
> maintenance. Unless you mean a sequence where a non-io-coherent device
> is attached later after data has already been written. Does that
> sequence need supporting? 

 Yes, but also I think there are cases where CPU access can happen 
 before 
 in Android, but I will focus on later for now.

> DMA-BUF doesn't have to allocate the backing
> memory until map_dma_buf() time, and that should only happen after all
> the devices have attached so it can know where to put the buffer. So 
> we
> shouldn't expect any CPU access to buffers before all the devices are
> attached and mapped, right?
>

 Here is an example where CPU access can happen later in Android.

 Camera device records video -> software post processing -> video 
 device 
 (who does compression of raw data) and writes to a file

 In this example assume the buffer is cached and the devices are not 
 IO-coherent (quite common).

>>>
>>> This is the start of the problem, having cached mappings of memory that
>>> is also being accessed non-coherently is going to cause issues one way
>>> or another. On top of the speculative cache fills that have to be
>>> constantly fought back against with CMOs like below; some coherent
>>> interconnects behave badly when you mix coherent and non-coherent access
>>> (snoop filters get messed up).
>>>
>>> The solution is to either always have the addresses marked non-coherent
>>> (like device memory, no-map carveouts), or if you really want to use
>>> regular system memory allocated at runtime, then all cached mappings of
>>> it need to be dropped, even the kernel logical address (area as painful
>>> as that would be).
>
> Ouch :-( I wasn't aware about these potential interconnect issues. How
> "real" is that? It seems that we aren't really hitting that today on
> real devices.
>

 Sadly there is at least one real device like this now (TI AM654). We
 spent some time working with the ARM interconnect spec designers to see
 if this was allowed behavior, final conclusion was mixing coherent and
 non-coherent accesses is never a good idea.. So we have been working to
 try to minimize any cases of mixed attributes [0], if a region is
 coherent then everyone in the system needs to treat it as such and
 vice-versa, even clever CMO that work on other systems wont save you
 here. :(

Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-18 Thread Andrew F. Davis
On 1/17/19 7:04 PM, Liam Mark wrote:
> On Thu, 17 Jan 2019, Andrew F. Davis wrote:
> 
>> On 1/16/19 4:48 PM, Liam Mark wrote:
>>> On Wed, 16 Jan 2019, Andrew F. Davis wrote:
>>>
 On 1/15/19 1:05 PM, Laura Abbott wrote:
> On 1/15/19 10:38 AM, Andrew F. Davis wrote:
>> On 1/15/19 11:45 AM, Liam Mark wrote:
>>> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
>>>
 On 1/14/19 11:13 AM, Liam Mark wrote:
> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
>
>> Buffers may not be mapped from the CPU so skip cache maintenance
>> here.
>> Accesses from the CPU to a cached heap should be bracketed with
>> {begin,end}_cpu_access calls so maintenance should not be needed
>> anyway.
>>
>> Signed-off-by: Andrew F. Davis 
>> ---
>>   drivers/staging/android/ion/ion.c | 7 ---
>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/staging/android/ion/ion.c
>> b/drivers/staging/android/ion/ion.c
>> index 14e48f6eb734..09cb5a8e2b09 100644
>> --- a/drivers/staging/android/ion/ion.c
>> +++ b/drivers/staging/android/ion/ion.c
>> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct
>> dma_buf_attachment *attachment,
>>     table = a->table;
>>   -    if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
>> -    direction))
>> +    if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
>> +  direction, DMA_ATTR_SKIP_CPU_SYNC))
>
> Unfortunately I don't think you can do this for a couple reasons.
> You can't rely on {begin,end}_cpu_access calls to do cache
> maintenance.
> If the calls to {begin,end}_cpu_access were made before the call to
> dma_buf_attach then there won't have been a device attached so the
> calls
> to {begin,end}_cpu_access won't have done any cache maintenance.
>

 That should be okay though, if you have no attachments (or all
 attachments are IO-coherent) then there is no need for cache
 maintenance. Unless you mean a sequence where a non-io-coherent device
 is attached later after data has already been written. Does that
 sequence need supporting?
>>>
>>> Yes, but also I think there are cases where CPU access can happen before
>>> in Android, but I will focus on later for now.
>>>
 DMA-BUF doesn't have to allocate the backing
 memory until map_dma_buf() time, and that should only happen after all
 the devices have attached so it can know where to put the buffer. So we
 shouldn't expect any CPU access to buffers before all the devices are
 attached and mapped, right?

>>>
>>> Here is an example where CPU access can happen later in Android.
>>>
>>> Camera device records video -> software post processing -> video device
>>> (who does compression of raw data) and writes to a file
>>>
>>> In this example assume the buffer is cached and the devices are not
>>> IO-coherent (quite common).
>>>
>>
>> This is the start of the problem, having cached mappings of memory that
>> is also being accessed non-coherently is going to cause issues one way
>> or another. On top of the speculative cache fills that have to be
>> constantly fought back against with CMOs like below; some coherent
>> interconnects behave badly when you mix coherent and non-coherent access
>> (snoop filters get messed up).
>>
>> The solution is to either always have the addresses marked non-coherent
>> (like device memory, no-map carveouts), or if you really want to use
>> regular system memory allocated at runtime, then all cached mappings of
>> it need to be dropped, even the kernel logical address (area as painful
>> as that would be).
>>
>
> I agree it's broken, hence my desire to remove it :)
>
> The other problem is that uncached buffers are being used for
> performance reason so anything that would involve getting
> rid of the logical address would probably negate any performance
> benefit.
>

 I wouldn't go as far as to remove them just yet.. Liam seems pretty
 adamant that they have valid uses. I'm just not sure performance is one
 of them, maybe in the case of software locks between devices or
 something where there needs to be a lot of back and forth interleaved
 access on small amounts of data?

>>>
>>> I wasn't aware that ARM considered this not supported, I thought it was 
>>> supported but they advised against it because of the potential performance 
>>> impact.
>>>
>>
>> Not sure what you mean by "this" being not supported, do you mean mixed
>> attribute mappings? If so, it will certainly cause problems, and 

Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-17 Thread Liam Mark
On Thu, 17 Jan 2019, Andrew F. Davis wrote:

> On 1/16/19 4:54 PM, Liam Mark wrote:
> > On Wed, 16 Jan 2019, Andrew F. Davis wrote:
> > 
> >> On 1/16/19 9:19 AM, Brian Starkey wrote:
> >>> Hi :-)
> >>>
> >>> On Tue, Jan 15, 2019 at 12:40:16PM -0600, Andrew F. Davis wrote:
>  On 1/15/19 12:38 PM, Andrew F. Davis wrote:
> > On 1/15/19 11:45 AM, Liam Mark wrote:
> >> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
> >>
> >>> On 1/14/19 11:13 AM, Liam Mark wrote:
>  On Fri, 11 Jan 2019, Andrew F. Davis wrote:
> 
> > Buffers may not be mapped from the CPU so skip cache maintenance 
> > here.
> > Accesses from the CPU to a cached heap should be bracketed with
> > {begin,end}_cpu_access calls so maintenance should not be needed 
> > anyway.
> >
> > Signed-off-by: Andrew F. Davis 
> > ---
> >  drivers/staging/android/ion/ion.c | 7 ---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/staging/android/ion/ion.c 
> > b/drivers/staging/android/ion/ion.c
> > index 14e48f6eb734..09cb5a8e2b09 100644
> > --- a/drivers/staging/android/ion/ion.c
> > +++ b/drivers/staging/android/ion/ion.c
> > @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct 
> > dma_buf_attachment *attachment,
> >  
> > table = a->table;
> >  
> > -   if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> > -   direction))
> > +   if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
> > + direction, DMA_ATTR_SKIP_CPU_SYNC))
> 
>  Unfortunately I don't think you can do this for a couple reasons.
>  You can't rely on {begin,end}_cpu_access calls to do cache 
>  maintenance.
>  If the calls to {begin,end}_cpu_access were made before the call to 
>  dma_buf_attach then there won't have been a device attached so the 
>  calls 
>  to {begin,end}_cpu_access won't have done any cache maintenance.
> 
> >>>
> >>> That should be okay though, if you have no attachments (or all
> >>> attachments are IO-coherent) then there is no need for cache
> >>> maintenance. Unless you mean a sequence where a non-io-coherent device
> >>> is attached later after data has already been written. Does that
> >>> sequence need supporting? 
> >>
> >> Yes, but also I think there are cases where CPU access can happen 
> >> before 
> >> in Android, but I will focus on later for now.
> >>
> >>> DMA-BUF doesn't have to allocate the backing
> >>> memory until map_dma_buf() time, and that should only happen after all
> >>> the devices have attached so it can know where to put the buffer. So 
> >>> we
> >>> shouldn't expect any CPU access to buffers before all the devices are
> >>> attached and mapped, right?
> >>>
> >>
> >> Here is an example where CPU access can happen later in Android.
> >>
> >> Camera device records video -> software post processing -> video 
> >> device 
> >> (who does compression of raw data) and writes to a file
> >>
> >> In this example assume the buffer is cached and the devices are not 
> >> IO-coherent (quite common).
> >>
> >
> > This is the start of the problem, having cached mappings of memory that
> > is also being accessed non-coherently is going to cause issues one way
> > or another. On top of the speculative cache fills that have to be
> > constantly fought back against with CMOs like below; some coherent
> > interconnects behave badly when you mix coherent and non-coherent access
> > (snoop filters get messed up).
> >
> > The solution is to either always have the addresses marked non-coherent
> > (like device memory, no-map carveouts), or if you really want to use
> > regular system memory allocated at runtime, then all cached mappings of
> > it need to be dropped, even the kernel logical address (area as painful
> > as that would be).
> >>>
> >>> Ouch :-( I wasn't aware about these potential interconnect issues. How
> >>> "real" is that? It seems that we aren't really hitting that today on
> >>> real devices.
> >>>
> >>
> >> Sadly there is at least one real device like this now (TI AM654). We
> >> spent some time working with the ARM interconnect spec designers to see
> >> if this was allowed behavior, final conclusion was mixing coherent and
> >> non-coherent accesses is never a good idea.. So we have been working to
> >> try to minimize any cases of mixed attributes [0], if a region is
> >> coherent then everyone in the system needs to treat it as such and
> >> vice-versa, even clever CMO that work on other systems wont save you
> >> here. :(
> >>
> >> [0] 

Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-17 Thread Liam Mark
On Thu, 17 Jan 2019, Andrew F. Davis wrote:

> On 1/16/19 4:48 PM, Liam Mark wrote:
> > On Wed, 16 Jan 2019, Andrew F. Davis wrote:
> > 
> >> On 1/15/19 1:05 PM, Laura Abbott wrote:
> >>> On 1/15/19 10:38 AM, Andrew F. Davis wrote:
>  On 1/15/19 11:45 AM, Liam Mark wrote:
> > On Tue, 15 Jan 2019, Andrew F. Davis wrote:
> >
> >> On 1/14/19 11:13 AM, Liam Mark wrote:
> >>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
> >>>
>  Buffers may not be mapped from the CPU so skip cache maintenance
>  here.
>  Accesses from the CPU to a cached heap should be bracketed with
>  {begin,end}_cpu_access calls so maintenance should not be needed
>  anyway.
> 
>  Signed-off-by: Andrew F. Davis 
>  ---
>    drivers/staging/android/ion/ion.c | 7 ---
>    1 file changed, 4 insertions(+), 3 deletions(-)
> 
>  diff --git a/drivers/staging/android/ion/ion.c
>  b/drivers/staging/android/ion/ion.c
>  index 14e48f6eb734..09cb5a8e2b09 100644
>  --- a/drivers/staging/android/ion/ion.c
>  +++ b/drivers/staging/android/ion/ion.c
>  @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct
>  dma_buf_attachment *attachment,
>      table = a->table;
>    -    if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
>  -    direction))
>  +    if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
>  +  direction, DMA_ATTR_SKIP_CPU_SYNC))
> >>>
> >>> Unfortunately I don't think you can do this for a couple reasons.
> >>> You can't rely on {begin,end}_cpu_access calls to do cache
> >>> maintenance.
> >>> If the calls to {begin,end}_cpu_access were made before the call to
> >>> dma_buf_attach then there won't have been a device attached so the
> >>> calls
> >>> to {begin,end}_cpu_access won't have done any cache maintenance.
> >>>
> >>
> >> That should be okay though, if you have no attachments (or all
> >> attachments are IO-coherent) then there is no need for cache
> >> maintenance. Unless you mean a sequence where a non-io-coherent device
> >> is attached later after data has already been written. Does that
> >> sequence need supporting?
> >
> > Yes, but also I think there are cases where CPU access can happen before
> > in Android, but I will focus on later for now.
> >
> >> DMA-BUF doesn't have to allocate the backing
> >> memory until map_dma_buf() time, and that should only happen after all
> >> the devices have attached so it can know where to put the buffer. So we
> >> shouldn't expect any CPU access to buffers before all the devices are
> >> attached and mapped, right?
> >>
> >
> > Here is an example where CPU access can happen later in Android.
> >
> > Camera device records video -> software post processing -> video device
> > (who does compression of raw data) and writes to a file
> >
> > In this example assume the buffer is cached and the devices are not
> > IO-coherent (quite common).
> >
> 
>  This is the start of the problem, having cached mappings of memory that
>  is also being accessed non-coherently is going to cause issues one way
>  or another. On top of the speculative cache fills that have to be
>  constantly fought back against with CMOs like below; some coherent
>  interconnects behave badly when you mix coherent and non-coherent access
>  (snoop filters get messed up).
> 
>  The solution is to either always have the addresses marked non-coherent
>  (like device memory, no-map carveouts), or if you really want to use
>  regular system memory allocated at runtime, then all cached mappings of
>  it need to be dropped, even the kernel logical address (area as painful
>  as that would be).
> 
> >>>
> >>> I agree it's broken, hence my desire to remove it :)
> >>>
> >>> The other problem is that uncached buffers are being used for
> >>> performance reason so anything that would involve getting
> >>> rid of the logical address would probably negate any performance
> >>> benefit.
> >>>
> >>
> >> I wouldn't go as far as to remove them just yet.. Liam seems pretty
> >> adamant that they have valid uses. I'm just not sure performance is one
> >> of them, maybe in the case of software locks between devices or
> >> something where there needs to be a lot of back and forth interleaved
> >> access on small amounts of data?
> >>
> > 
> > I wasn't aware that ARM considered this not supported, I thought it was 
> > supported but they advised against it because of the potential performance 
> > impact.
> > 
> 
> Not sure what you mean by "this" being not supported, do you mean mixed
> attribute mappings? If so, it will certainly cause problems, and the
> problems will change from platform 

Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-17 Thread Andrew F. Davis
On 1/16/19 4:54 PM, Liam Mark wrote:
> On Wed, 16 Jan 2019, Andrew F. Davis wrote:
> 
>> On 1/16/19 9:19 AM, Brian Starkey wrote:
>>> Hi :-)
>>>
>>> On Tue, Jan 15, 2019 at 12:40:16PM -0600, Andrew F. Davis wrote:
 On 1/15/19 12:38 PM, Andrew F. Davis wrote:
> On 1/15/19 11:45 AM, Liam Mark wrote:
>> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
>>
>>> On 1/14/19 11:13 AM, Liam Mark wrote:
 On Fri, 11 Jan 2019, Andrew F. Davis wrote:

> Buffers may not be mapped from the CPU so skip cache maintenance here.
> Accesses from the CPU to a cached heap should be bracketed with
> {begin,end}_cpu_access calls so maintenance should not be needed 
> anyway.
>
> Signed-off-by: Andrew F. Davis 
> ---
>  drivers/staging/android/ion/ion.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/staging/android/ion/ion.c 
> b/drivers/staging/android/ion/ion.c
> index 14e48f6eb734..09cb5a8e2b09 100644
> --- a/drivers/staging/android/ion/ion.c
> +++ b/drivers/staging/android/ion/ion.c
> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct 
> dma_buf_attachment *attachment,
>  
>   table = a->table;
>  
> - if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> - direction))
> + if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
> +   direction, DMA_ATTR_SKIP_CPU_SYNC))

 Unfortunately I don't think you can do this for a couple reasons.
 You can't rely on {begin,end}_cpu_access calls to do cache maintenance.
 If the calls to {begin,end}_cpu_access were made before the call to 
 dma_buf_attach then there won't have been a device attached so the 
 calls 
 to {begin,end}_cpu_access won't have done any cache maintenance.

>>>
>>> That should be okay though, if you have no attachments (or all
>>> attachments are IO-coherent) then there is no need for cache
>>> maintenance. Unless you mean a sequence where a non-io-coherent device
>>> is attached later after data has already been written. Does that
>>> sequence need supporting? 
>>
>> Yes, but also I think there are cases where CPU access can happen before 
>> in Android, but I will focus on later for now.
>>
>>> DMA-BUF doesn't have to allocate the backing
>>> memory until map_dma_buf() time, and that should only happen after all
>>> the devices have attached so it can know where to put the buffer. So we
>>> shouldn't expect any CPU access to buffers before all the devices are
>>> attached and mapped, right?
>>>
>>
>> Here is an example where CPU access can happen later in Android.
>>
>> Camera device records video -> software post processing -> video device 
>> (who does compression of raw data) and writes to a file
>>
>> In this example assume the buffer is cached and the devices are not 
>> IO-coherent (quite common).
>>
>
> This is the start of the problem, having cached mappings of memory that
> is also being accessed non-coherently is going to cause issues one way
> or another. On top of the speculative cache fills that have to be
> constantly fought back against with CMOs like below; some coherent
> interconnects behave badly when you mix coherent and non-coherent access
> (snoop filters get messed up).
>
> The solution is to either always have the addresses marked non-coherent
> (like device memory, no-map carveouts), or if you really want to use
> regular system memory allocated at runtime, then all cached mappings of
> it need to be dropped, even the kernel logical address (area as painful
> as that would be).
>>>
>>> Ouch :-( I wasn't aware about these potential interconnect issues. How
>>> "real" is that? It seems that we aren't really hitting that today on
>>> real devices.
>>>
>>
>> Sadly there is at least one real device like this now (TI AM654). We
>> spent some time working with the ARM interconnect spec designers to see
>> if this was allowed behavior, final conclusion was mixing coherent and
>> non-coherent accesses is never a good idea.. So we have been working to
>> try to minimize any cases of mixed attributes [0], if a region is
>> coherent then everyone in the system needs to treat it as such and
>> vice-versa, even clever CMO that work on other systems wont save you
>> here. :(
>>
>> [0] https://github.com/ARM-software/arm-trusted-firmware/pull/1553
>>
>>
>
>> ION buffer is allocated.
>>
>> //Camera device records video
>> dma_buf_attach
>> dma_map_attachment (buffer needs to be cleaned)
>
> Why does the buffer need to be cleaned here? I just got through reading
> 

Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-17 Thread Andrew F. Davis
On 1/16/19 4:48 PM, Liam Mark wrote:
> On Wed, 16 Jan 2019, Andrew F. Davis wrote:
> 
>> On 1/15/19 1:05 PM, Laura Abbott wrote:
>>> On 1/15/19 10:38 AM, Andrew F. Davis wrote:
 On 1/15/19 11:45 AM, Liam Mark wrote:
> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
>
>> On 1/14/19 11:13 AM, Liam Mark wrote:
>>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
>>>
 Buffers may not be mapped from the CPU so skip cache maintenance
 here.
 Accesses from the CPU to a cached heap should be bracketed with
 {begin,end}_cpu_access calls so maintenance should not be needed
 anyway.

 Signed-off-by: Andrew F. Davis 
 ---
   drivers/staging/android/ion/ion.c | 7 ---
   1 file changed, 4 insertions(+), 3 deletions(-)

 diff --git a/drivers/staging/android/ion/ion.c
 b/drivers/staging/android/ion/ion.c
 index 14e48f6eb734..09cb5a8e2b09 100644
 --- a/drivers/staging/android/ion/ion.c
 +++ b/drivers/staging/android/ion/ion.c
 @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct
 dma_buf_attachment *attachment,
     table = a->table;
   -    if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
 -    direction))
 +    if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
 +  direction, DMA_ATTR_SKIP_CPU_SYNC))
>>>
>>> Unfortunately I don't think you can do this for a couple reasons.
>>> You can't rely on {begin,end}_cpu_access calls to do cache
>>> maintenance.
>>> If the calls to {begin,end}_cpu_access were made before the call to
>>> dma_buf_attach then there won't have been a device attached so the
>>> calls
>>> to {begin,end}_cpu_access won't have done any cache maintenance.
>>>
>>
>> That should be okay though, if you have no attachments (or all
>> attachments are IO-coherent) then there is no need for cache
>> maintenance. Unless you mean a sequence where a non-io-coherent device
>> is attached later after data has already been written. Does that
>> sequence need supporting?
>
> Yes, but also I think there are cases where CPU access can happen before
> in Android, but I will focus on later for now.
>
>> DMA-BUF doesn't have to allocate the backing
>> memory until map_dma_buf() time, and that should only happen after all
>> the devices have attached so it can know where to put the buffer. So we
>> shouldn't expect any CPU access to buffers before all the devices are
>> attached and mapped, right?
>>
>
> Here is an example where CPU access can happen later in Android.
>
> Camera device records video -> software post processing -> video device
> (who does compression of raw data) and writes to a file
>
> In this example assume the buffer is cached and the devices are not
> IO-coherent (quite common).
>

 This is the start of the problem, having cached mappings of memory that
 is also being accessed non-coherently is going to cause issues one way
 or another. On top of the speculative cache fills that have to be
 constantly fought back against with CMOs like below; some coherent
 interconnects behave badly when you mix coherent and non-coherent access
 (snoop filters get messed up).

 The solution is to either always have the addresses marked non-coherent
 (like device memory, no-map carveouts), or if you really want to use
 regular system memory allocated at runtime, then all cached mappings of
 it need to be dropped, even the kernel logical address (area as painful
 as that would be).

>>>
>>> I agree it's broken, hence my desire to remove it :)
>>>
>>> The other problem is that uncached buffers are being used for
>>> performance reason so anything that would involve getting
>>> rid of the logical address would probably negate any performance
>>> benefit.
>>>
>>
>> I wouldn't go as far as to remove them just yet.. Liam seems pretty
>> adamant that they have valid uses. I'm just not sure performance is one
>> of them, maybe in the case of software locks between devices or
>> something where there needs to be a lot of back and forth interleaved
>> access on small amounts of data?
>>
> 
> I wasn't aware that ARM considered this not supported, I thought it was 
> supported but they advised against it because of the potential performance 
> impact.
> 

Not sure what you mean by "this" being not supported, do you mean mixed
attribute mappings? If so, it will certainly cause problems, and the
problems will change from platform to platform, avoid at all costs is my
understanding of ARM's position.

> This is after all supported in the DMA APIs and up until now devices have 
> been successfully commercializing with this configurations, and I think 
> they will continue to 

Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-17 Thread Christoph Hellwig
On Wed, Jan 16, 2019 at 10:17:23AM -0600, Andrew F. Davis wrote:
> I wouldn't go as far as to remove them just yet.. Liam seems pretty
> adamant that they have valid uses. I'm just not sure performance is one
> of them, maybe in the case of software locks between devices or
> something where there needs to be a lot of back and forth interleaved
> access on small amounts of data?

We shouldn't add unused features to start with.  If mainline users for
them appear we can always add them back once we've reviewed the uses
cases and agree that they are sane and have no better solution.
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-16 Thread Liam Mark
On Wed, 16 Jan 2019, Andrew F. Davis wrote:

> On 1/16/19 9:19 AM, Brian Starkey wrote:
> > Hi :-)
> > 
> > On Tue, Jan 15, 2019 at 12:40:16PM -0600, Andrew F. Davis wrote:
> >> On 1/15/19 12:38 PM, Andrew F. Davis wrote:
> >>> On 1/15/19 11:45 AM, Liam Mark wrote:
>  On Tue, 15 Jan 2019, Andrew F. Davis wrote:
> 
> > On 1/14/19 11:13 AM, Liam Mark wrote:
> >> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
> >>
> >>> Buffers may not be mapped from the CPU so skip cache maintenance here.
> >>> Accesses from the CPU to a cached heap should be bracketed with
> >>> {begin,end}_cpu_access calls so maintenance should not be needed 
> >>> anyway.
> >>>
> >>> Signed-off-by: Andrew F. Davis 
> >>> ---
> >>>  drivers/staging/android/ion/ion.c | 7 ---
> >>>  1 file changed, 4 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/drivers/staging/android/ion/ion.c 
> >>> b/drivers/staging/android/ion/ion.c
> >>> index 14e48f6eb734..09cb5a8e2b09 100644
> >>> --- a/drivers/staging/android/ion/ion.c
> >>> +++ b/drivers/staging/android/ion/ion.c
> >>> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct 
> >>> dma_buf_attachment *attachment,
> >>>  
> >>>   table = a->table;
> >>>  
> >>> - if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> >>> - direction))
> >>> + if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
> >>> +   direction, DMA_ATTR_SKIP_CPU_SYNC))
> >>
> >> Unfortunately I don't think you can do this for a couple reasons.
> >> You can't rely on {begin,end}_cpu_access calls to do cache maintenance.
> >> If the calls to {begin,end}_cpu_access were made before the call to 
> >> dma_buf_attach then there won't have been a device attached so the 
> >> calls 
> >> to {begin,end}_cpu_access won't have done any cache maintenance.
> >>
> >
> > That should be okay though, if you have no attachments (or all
> > attachments are IO-coherent) then there is no need for cache
> > maintenance. Unless you mean a sequence where a non-io-coherent device
> > is attached later after data has already been written. Does that
> > sequence need supporting? 
> 
>  Yes, but also I think there are cases where CPU access can happen before 
>  in Android, but I will focus on later for now.
> 
> > DMA-BUF doesn't have to allocate the backing
> > memory until map_dma_buf() time, and that should only happen after all
> > the devices have attached so it can know where to put the buffer. So we
> > shouldn't expect any CPU access to buffers before all the devices are
> > attached and mapped, right?
> >
> 
>  Here is an example where CPU access can happen later in Android.
> 
>  Camera device records video -> software post processing -> video device 
>  (who does compression of raw data) and writes to a file
> 
>  In this example assume the buffer is cached and the devices are not 
>  IO-coherent (quite common).
> 
> >>>
> >>> This is the start of the problem, having cached mappings of memory that
> >>> is also being accessed non-coherently is going to cause issues one way
> >>> or another. On top of the speculative cache fills that have to be
> >>> constantly fought back against with CMOs like below; some coherent
> >>> interconnects behave badly when you mix coherent and non-coherent access
> >>> (snoop filters get messed up).
> >>>
> >>> The solution is to either always have the addresses marked non-coherent
> >>> (like device memory, no-map carveouts), or if you really want to use
> >>> regular system memory allocated at runtime, then all cached mappings of
> >>> it need to be dropped, even the kernel logical address (area as painful
> >>> as that would be).
> > 
> > Ouch :-( I wasn't aware about these potential interconnect issues. How
> > "real" is that? It seems that we aren't really hitting that today on
> > real devices.
> > 
> 
> Sadly there is at least one real device like this now (TI AM654). We
> spent some time working with the ARM interconnect spec designers to see
> if this was allowed behavior, final conclusion was mixing coherent and
> non-coherent accesses is never a good idea.. So we have been working to
> try to minimize any cases of mixed attributes [0], if a region is
> coherent then everyone in the system needs to treat it as such and
> vice-versa, even clever CMO that work on other systems wont save you
> here. :(
> 
> [0] https://github.com/ARM-software/arm-trusted-firmware/pull/1553
> 
> 
> >>>
>  ION buffer is allocated.
> 
>  //Camera device records video
>  dma_buf_attach
>  dma_map_attachment (buffer needs to be cleaned)
> >>>
> >>> Why does the buffer need to be cleaned here? I just got through reading
> >>> the thread linked by Laura in the other reply. I 

Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-16 Thread Liam Mark
On Wed, 16 Jan 2019, Andrew F. Davis wrote:

> On 1/15/19 1:05 PM, Laura Abbott wrote:
> > On 1/15/19 10:38 AM, Andrew F. Davis wrote:
> >> On 1/15/19 11:45 AM, Liam Mark wrote:
> >>> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
> >>>
>  On 1/14/19 11:13 AM, Liam Mark wrote:
> > On Fri, 11 Jan 2019, Andrew F. Davis wrote:
> >
> >> Buffers may not be mapped from the CPU so skip cache maintenance
> >> here.
> >> Accesses from the CPU to a cached heap should be bracketed with
> >> {begin,end}_cpu_access calls so maintenance should not be needed
> >> anyway.
> >>
> >> Signed-off-by: Andrew F. Davis 
> >> ---
> >>   drivers/staging/android/ion/ion.c | 7 ---
> >>   1 file changed, 4 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/staging/android/ion/ion.c
> >> b/drivers/staging/android/ion/ion.c
> >> index 14e48f6eb734..09cb5a8e2b09 100644
> >> --- a/drivers/staging/android/ion/ion.c
> >> +++ b/drivers/staging/android/ion/ion.c
> >> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct
> >> dma_buf_attachment *attachment,
> >>     table = a->table;
> >>   -    if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> >> -    direction))
> >> +    if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
> >> +  direction, DMA_ATTR_SKIP_CPU_SYNC))
> >
> > Unfortunately I don't think you can do this for a couple reasons.
> > You can't rely on {begin,end}_cpu_access calls to do cache
> > maintenance.
> > If the calls to {begin,end}_cpu_access were made before the call to
> > dma_buf_attach then there won't have been a device attached so the
> > calls
> > to {begin,end}_cpu_access won't have done any cache maintenance.
> >
> 
>  That should be okay though, if you have no attachments (or all
>  attachments are IO-coherent) then there is no need for cache
>  maintenance. Unless you mean a sequence where a non-io-coherent device
>  is attached later after data has already been written. Does that
>  sequence need supporting?
> >>>
> >>> Yes, but also I think there are cases where CPU access can happen before
> >>> in Android, but I will focus on later for now.
> >>>
>  DMA-BUF doesn't have to allocate the backing
>  memory until map_dma_buf() time, and that should only happen after all
>  the devices have attached so it can know where to put the buffer. So we
>  shouldn't expect any CPU access to buffers before all the devices are
>  attached and mapped, right?
> 
> >>>
> >>> Here is an example where CPU access can happen later in Android.
> >>>
> >>> Camera device records video -> software post processing -> video device
> >>> (who does compression of raw data) and writes to a file
> >>>
> >>> In this example assume the buffer is cached and the devices are not
> >>> IO-coherent (quite common).
> >>>
> >>
> >> This is the start of the problem, having cached mappings of memory that
> >> is also being accessed non-coherently is going to cause issues one way
> >> or another. On top of the speculative cache fills that have to be
> >> constantly fought back against with CMOs like below; some coherent
> >> interconnects behave badly when you mix coherent and non-coherent access
> >> (snoop filters get messed up).
> >>
> >> The solution is to either always have the addresses marked non-coherent
> >> (like device memory, no-map carveouts), or if you really want to use
> >> regular system memory allocated at runtime, then all cached mappings of
> >> it need to be dropped, even the kernel logical address (area as painful
> >> as that would be).
> >>
> > 
> > I agree it's broken, hence my desire to remove it :)
> > 
> > The other problem is that uncached buffers are being used for
> > performance reason so anything that would involve getting
> > rid of the logical address would probably negate any performance
> > benefit.
> > 
> 
> I wouldn't go as far as to remove them just yet.. Liam seems pretty
> adamant that they have valid uses. I'm just not sure performance is one
> of them, maybe in the case of software locks between devices or
> something where there needs to be a lot of back and forth interleaved
> access on small amounts of data?
> 

I wasn't aware that ARM considered this not supported, I thought it was 
supported but they advised against it because of the potential performance 
impact.

This is after all supported in the DMA APIs and up until now devices have 
been successfully commercializing with this configurations, and I think 
they will continue to commercialize with these configurations for quite a 
while.

It would be really unfortunate if support was removed as I think that 
would drive clients away from using upstream ION.

> >>> ION buffer is allocated.
> >>>
> >>> //Camera device records video
> >>> dma_buf_attach
> >>> dma_map_attachment (buffer needs to 

Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-16 Thread Andrew F. Davis
On 1/16/19 9:19 AM, Brian Starkey wrote:
> Hi :-)
> 
> On Tue, Jan 15, 2019 at 12:40:16PM -0600, Andrew F. Davis wrote:
>> On 1/15/19 12:38 PM, Andrew F. Davis wrote:
>>> On 1/15/19 11:45 AM, Liam Mark wrote:
 On Tue, 15 Jan 2019, Andrew F. Davis wrote:

> On 1/14/19 11:13 AM, Liam Mark wrote:
>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
>>
>>> Buffers may not be mapped from the CPU so skip cache maintenance here.
>>> Accesses from the CPU to a cached heap should be bracketed with
>>> {begin,end}_cpu_access calls so maintenance should not be needed anyway.
>>>
>>> Signed-off-by: Andrew F. Davis 
>>> ---
>>>  drivers/staging/android/ion/ion.c | 7 ---
>>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/staging/android/ion/ion.c 
>>> b/drivers/staging/android/ion/ion.c
>>> index 14e48f6eb734..09cb5a8e2b09 100644
>>> --- a/drivers/staging/android/ion/ion.c
>>> +++ b/drivers/staging/android/ion/ion.c
>>> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct 
>>> dma_buf_attachment *attachment,
>>>  
>>> table = a->table;
>>>  
>>> -   if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
>>> -   direction))
>>> +   if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
>>> + direction, DMA_ATTR_SKIP_CPU_SYNC))
>>
>> Unfortunately I don't think you can do this for a couple reasons.
>> You can't rely on {begin,end}_cpu_access calls to do cache maintenance.
>> If the calls to {begin,end}_cpu_access were made before the call to 
>> dma_buf_attach then there won't have been a device attached so the calls 
>> to {begin,end}_cpu_access won't have done any cache maintenance.
>>
>
> That should be okay though, if you have no attachments (or all
> attachments are IO-coherent) then there is no need for cache
> maintenance. Unless you mean a sequence where a non-io-coherent device
> is attached later after data has already been written. Does that
> sequence need supporting? 

 Yes, but also I think there are cases where CPU access can happen before 
 in Android, but I will focus on later for now.

> DMA-BUF doesn't have to allocate the backing
> memory until map_dma_buf() time, and that should only happen after all
> the devices have attached so it can know where to put the buffer. So we
> shouldn't expect any CPU access to buffers before all the devices are
> attached and mapped, right?
>

 Here is an example where CPU access can happen later in Android.

 Camera device records video -> software post processing -> video device 
 (who does compression of raw data) and writes to a file

 In this example assume the buffer is cached and the devices are not 
 IO-coherent (quite common).

>>>
>>> This is the start of the problem, having cached mappings of memory that
>>> is also being accessed non-coherently is going to cause issues one way
>>> or another. On top of the speculative cache fills that have to be
>>> constantly fought back against with CMOs like below; some coherent
>>> interconnects behave badly when you mix coherent and non-coherent access
>>> (snoop filters get messed up).
>>>
>>> The solution is to either always have the addresses marked non-coherent
>>> (like device memory, no-map carveouts), or if you really want to use
>>> regular system memory allocated at runtime, then all cached mappings of
>>> it need to be dropped, even the kernel logical address (area as painful
>>> as that would be).
> 
> Ouch :-( I wasn't aware about these potential interconnect issues. How
> "real" is that? It seems that we aren't really hitting that today on
> real devices.
> 

Sadly there is at least one real device like this now (TI AM654). We
spent some time working with the ARM interconnect spec designers to see
if this was allowed behavior, final conclusion was mixing coherent and
non-coherent accesses is never a good idea.. So we have been working to
try to minimize any cases of mixed attributes [0], if a region is
coherent then everyone in the system needs to treat it as such and
vice-versa, even clever CMO that work on other systems wont save you
here. :(

[0] https://github.com/ARM-software/arm-trusted-firmware/pull/1553


>>>
 ION buffer is allocated.

 //Camera device records video
 dma_buf_attach
 dma_map_attachment (buffer needs to be cleaned)
>>>
>>> Why does the buffer need to be cleaned here? I just got through reading
>>> the thread linked by Laura in the other reply. I do like +Brian's
>>
>> Actually +Brian this time :)
>>
>>> suggestion of tracking if the buffer has had CPU access since the last
>>> time and only flushing the cache if it has. As unmapped heaps never get
>>> CPU mapped this would never be the case for unmapped 

Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-16 Thread Andrew F. Davis
On 1/15/19 1:05 PM, Laura Abbott wrote:
> On 1/15/19 10:38 AM, Andrew F. Davis wrote:
>> On 1/15/19 11:45 AM, Liam Mark wrote:
>>> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
>>>
 On 1/14/19 11:13 AM, Liam Mark wrote:
> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
>
>> Buffers may not be mapped from the CPU so skip cache maintenance
>> here.
>> Accesses from the CPU to a cached heap should be bracketed with
>> {begin,end}_cpu_access calls so maintenance should not be needed
>> anyway.
>>
>> Signed-off-by: Andrew F. Davis 
>> ---
>>   drivers/staging/android/ion/ion.c | 7 ---
>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/staging/android/ion/ion.c
>> b/drivers/staging/android/ion/ion.c
>> index 14e48f6eb734..09cb5a8e2b09 100644
>> --- a/drivers/staging/android/ion/ion.c
>> +++ b/drivers/staging/android/ion/ion.c
>> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct
>> dma_buf_attachment *attachment,
>>     table = a->table;
>>   -    if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
>> -    direction))
>> +    if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
>> +  direction, DMA_ATTR_SKIP_CPU_SYNC))
>
> Unfortunately I don't think you can do this for a couple reasons.
> You can't rely on {begin,end}_cpu_access calls to do cache
> maintenance.
> If the calls to {begin,end}_cpu_access were made before the call to
> dma_buf_attach then there won't have been a device attached so the
> calls
> to {begin,end}_cpu_access won't have done any cache maintenance.
>

 That should be okay though, if you have no attachments (or all
 attachments are IO-coherent) then there is no need for cache
 maintenance. Unless you mean a sequence where a non-io-coherent device
 is attached later after data has already been written. Does that
 sequence need supporting?
>>>
>>> Yes, but also I think there are cases where CPU access can happen before
>>> in Android, but I will focus on later for now.
>>>
 DMA-BUF doesn't have to allocate the backing
 memory until map_dma_buf() time, and that should only happen after all
 the devices have attached so it can know where to put the buffer. So we
 shouldn't expect any CPU access to buffers before all the devices are
 attached and mapped, right?

>>>
>>> Here is an example where CPU access can happen later in Android.
>>>
>>> Camera device records video -> software post processing -> video device
>>> (who does compression of raw data) and writes to a file
>>>
>>> In this example assume the buffer is cached and the devices are not
>>> IO-coherent (quite common).
>>>
>>
>> This is the start of the problem, having cached mappings of memory that
>> is also being accessed non-coherently is going to cause issues one way
>> or another. On top of the speculative cache fills that have to be
>> constantly fought back against with CMOs like below; some coherent
>> interconnects behave badly when you mix coherent and non-coherent access
>> (snoop filters get messed up).
>>
>> The solution is to either always have the addresses marked non-coherent
>> (like device memory, no-map carveouts), or if you really want to use
>> regular system memory allocated at runtime, then all cached mappings of
>> it need to be dropped, even the kernel logical address (area as painful
>> as that would be).
>>
> 
> I agree it's broken, hence my desire to remove it :)
> 
> The other problem is that uncached buffers are being used for
> performance reason so anything that would involve getting
> rid of the logical address would probably negate any performance
> benefit.
> 

I wouldn't go as far as to remove them just yet.. Liam seems pretty
adamant that they have valid uses. I'm just not sure performance is one
of them, maybe in the case of software locks between devices or
something where there needs to be a lot of back and forth interleaved
access on small amounts of data?

>>> ION buffer is allocated.
>>>
>>> //Camera device records video
>>> dma_buf_attach
>>> dma_map_attachment (buffer needs to be cleaned)
>>
>> Why does the buffer need to be cleaned here? I just got through reading
>> the thread linked by Laura in the other reply. I do like +Brian's
>> suggestion of tracking if the buffer has had CPU access since the last
>> time and only flushing the cache if it has. As unmapped heaps never get
>> CPU mapped this would never be the case for unmapped heaps, it solves my
>> problem.
>>
>>> [camera device writes to buffer]
>>> dma_buf_unmap_attachment (buffer needs to be invalidated)
>>
>> It doesn't know there will be any further CPU access, it could get freed
>> after this for all we know, the invalidate can be saved until the CPU
>> requests access again.
>>
>>> dma_buf_detach  (device cannot stay attached because it is being sent

Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-16 Thread Brian Starkey
Hi :-)

On Tue, Jan 15, 2019 at 12:40:16PM -0600, Andrew F. Davis wrote:
> On 1/15/19 12:38 PM, Andrew F. Davis wrote:
> > On 1/15/19 11:45 AM, Liam Mark wrote:
> >> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
> >>
> >>> On 1/14/19 11:13 AM, Liam Mark wrote:
>  On Fri, 11 Jan 2019, Andrew F. Davis wrote:
> 
> > Buffers may not be mapped from the CPU so skip cache maintenance here.
> > Accesses from the CPU to a cached heap should be bracketed with
> > {begin,end}_cpu_access calls so maintenance should not be needed anyway.
> >
> > Signed-off-by: Andrew F. Davis 
> > ---
> >  drivers/staging/android/ion/ion.c | 7 ---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/staging/android/ion/ion.c 
> > b/drivers/staging/android/ion/ion.c
> > index 14e48f6eb734..09cb5a8e2b09 100644
> > --- a/drivers/staging/android/ion/ion.c
> > +++ b/drivers/staging/android/ion/ion.c
> > @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct 
> > dma_buf_attachment *attachment,
> >  
> > table = a->table;
> >  
> > -   if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> > -   direction))
> > +   if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
> > + direction, DMA_ATTR_SKIP_CPU_SYNC))
> 
>  Unfortunately I don't think you can do this for a couple reasons.
>  You can't rely on {begin,end}_cpu_access calls to do cache maintenance.
>  If the calls to {begin,end}_cpu_access were made before the call to 
>  dma_buf_attach then there won't have been a device attached so the calls 
>  to {begin,end}_cpu_access won't have done any cache maintenance.
> 
> >>>
> >>> That should be okay though, if you have no attachments (or all
> >>> attachments are IO-coherent) then there is no need for cache
> >>> maintenance. Unless you mean a sequence where a non-io-coherent device
> >>> is attached later after data has already been written. Does that
> >>> sequence need supporting? 
> >>
> >> Yes, but also I think there are cases where CPU access can happen before 
> >> in Android, but I will focus on later for now.
> >>
> >>> DMA-BUF doesn't have to allocate the backing
> >>> memory until map_dma_buf() time, and that should only happen after all
> >>> the devices have attached so it can know where to put the buffer. So we
> >>> shouldn't expect any CPU access to buffers before all the devices are
> >>> attached and mapped, right?
> >>>
> >>
> >> Here is an example where CPU access can happen later in Android.
> >>
> >> Camera device records video -> software post processing -> video device 
> >> (who does compression of raw data) and writes to a file
> >>
> >> In this example assume the buffer is cached and the devices are not 
> >> IO-coherent (quite common).
> >>
> > 
> > This is the start of the problem, having cached mappings of memory that
> > is also being accessed non-coherently is going to cause issues one way
> > or another. On top of the speculative cache fills that have to be
> > constantly fought back against with CMOs like below; some coherent
> > interconnects behave badly when you mix coherent and non-coherent access
> > (snoop filters get messed up).
> > 
> > The solution is to either always have the addresses marked non-coherent
> > (like device memory, no-map carveouts), or if you really want to use
> > regular system memory allocated at runtime, then all cached mappings of
> > it need to be dropped, even the kernel logical address (area as painful
> > as that would be).

Ouch :-( I wasn't aware about these potential interconnect issues. How
"real" is that? It seems that we aren't really hitting that today on
real devices.

> > 
> >> ION buffer is allocated.
> >>
> >> //Camera device records video
> >> dma_buf_attach
> >> dma_map_attachment (buffer needs to be cleaned)
> > 
> > Why does the buffer need to be cleaned here? I just got through reading
> > the thread linked by Laura in the other reply. I do like +Brian's
> 
> Actually +Brian this time :)
> 
> > suggestion of tracking if the buffer has had CPU access since the last
> > time and only flushing the cache if it has. As unmapped heaps never get
> > CPU mapped this would never be the case for unmapped heaps, it solves my
> > problem.
> > 
> >> [camera device writes to buffer]
> >> dma_buf_unmap_attachment (buffer needs to be invalidated)
> > 
> > It doesn't know there will be any further CPU access, it could get freed
> > after this for all we know, the invalidate can be saved until the CPU
> > requests access again.

We don't have any API to allow the invalidate to happen on CPU access
if all devices already detached. We need a struct device pointer to
give to the DMA API, otherwise on arm64 there'll be no invalidate.

I had a chat with a few people internally after the previous
discussion with Liam. One suggestion 

Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-15 Thread Andrew F. Davis
On 1/14/19 11:13 AM, Liam Mark wrote:
> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
> 
>> Buffers may not be mapped from the CPU so skip cache maintenance here.
>> Accesses from the CPU to a cached heap should be bracketed with
>> {begin,end}_cpu_access calls so maintenance should not be needed anyway.
>>
>> Signed-off-by: Andrew F. Davis 
>> ---
>>  drivers/staging/android/ion/ion.c | 7 ---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/staging/android/ion/ion.c 
>> b/drivers/staging/android/ion/ion.c
>> index 14e48f6eb734..09cb5a8e2b09 100644
>> --- a/drivers/staging/android/ion/ion.c
>> +++ b/drivers/staging/android/ion/ion.c
>> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct 
>> dma_buf_attachment *attachment,
>>  
>>  table = a->table;
>>  
>> -if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
>> -direction))
>> +if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
>> +  direction, DMA_ATTR_SKIP_CPU_SYNC))
> 
> Unfortunately I don't think you can do this for a couple reasons.
> You can't rely on {begin,end}_cpu_access calls to do cache maintenance.
> If the calls to {begin,end}_cpu_access were made before the call to 
> dma_buf_attach then there won't have been a device attached so the calls 
> to {begin,end}_cpu_access won't have done any cache maintenance.
> 

That should be okay though, if you have no attachments (or all
attachments are IO-coherent) then there is no need for cache
maintenance. Unless you mean a sequence where a non-io-coherent device
is attached later after data has already been written. Does that
sequence need supporting? DMA-BUF doesn't have to allocate the backing
memory until map_dma_buf() time, and that should only happen after all
the devices have attached so it can know where to put the buffer. So we
shouldn't expect any CPU access to buffers before all the devices are
attached and mapped, right?

> Also ION no longer provides DMA ready memory, so if you are not doing CPU 
> access then there is no requirement (that I am aware of) for you to call 
> {begin,end}_cpu_access before passing the buffer to the device and if this 
> buffer is cached and your device is not IO-coherent then the cache maintenance
> in ion_map_dma_buf and ion_unmap_dma_buf is required.
> 

If I am not doing any CPU access then why do I need CPU cache
maintenance on the buffer?

Andrew

>>  return ERR_PTR(-ENOMEM);
>>  
>>  return table;
>> @@ -272,7 +272,8 @@ static void ion_unmap_dma_buf(struct dma_buf_attachment 
>> *attachment,
>>struct sg_table *table,
>>enum dma_data_direction direction)
>>  {
>> -dma_unmap_sg(attachment->dev, table->sgl, table->nents, direction);
>> +dma_unmap_sg_attrs(attachment->dev, table->sgl, table->nents,
>> +   direction, DMA_ATTR_SKIP_CPU_SYNC);
>>  }
>>  
>>  static int ion_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
>> -- 
>> 2.19.1
>>
>> ___
>> devel mailing list
>> de...@linuxdriverproject.org
>> http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
>>
> 
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project
> 
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-15 Thread Laura Abbott

On 1/15/19 10:38 AM, Andrew F. Davis wrote:

On 1/15/19 11:45 AM, Liam Mark wrote:

On Tue, 15 Jan 2019, Andrew F. Davis wrote:


On 1/14/19 11:13 AM, Liam Mark wrote:

On Fri, 11 Jan 2019, Andrew F. Davis wrote:


Buffers may not be mapped from the CPU so skip cache maintenance here.
Accesses from the CPU to a cached heap should be bracketed with
{begin,end}_cpu_access calls so maintenance should not be needed anyway.

Signed-off-by: Andrew F. Davis 
---
  drivers/staging/android/ion/ion.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/android/ion/ion.c 
b/drivers/staging/android/ion/ion.c
index 14e48f6eb734..09cb5a8e2b09 100644
--- a/drivers/staging/android/ion/ion.c
+++ b/drivers/staging/android/ion/ion.c
@@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct 
dma_buf_attachment *attachment,
  
  	table = a->table;
  
-	if (!dma_map_sg(attachment->dev, table->sgl, table->nents,

-   direction))
+   if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
+ direction, DMA_ATTR_SKIP_CPU_SYNC))


Unfortunately I don't think you can do this for a couple reasons.
You can't rely on {begin,end}_cpu_access calls to do cache maintenance.
If the calls to {begin,end}_cpu_access were made before the call to
dma_buf_attach then there won't have been a device attached so the calls
to {begin,end}_cpu_access won't have done any cache maintenance.



That should be okay though, if you have no attachments (or all
attachments are IO-coherent) then there is no need for cache
maintenance. Unless you mean a sequence where a non-io-coherent device
is attached later after data has already been written. Does that
sequence need supporting?


Yes, but also I think there are cases where CPU access can happen before
in Android, but I will focus on later for now.


DMA-BUF doesn't have to allocate the backing
memory until map_dma_buf() time, and that should only happen after all
the devices have attached so it can know where to put the buffer. So we
shouldn't expect any CPU access to buffers before all the devices are
attached and mapped, right?



Here is an example where CPU access can happen later in Android.

Camera device records video -> software post processing -> video device
(who does compression of raw data) and writes to a file

In this example assume the buffer is cached and the devices are not
IO-coherent (quite common).



This is the start of the problem, having cached mappings of memory that
is also being accessed non-coherently is going to cause issues one way
or another. On top of the speculative cache fills that have to be
constantly fought back against with CMOs like below; some coherent
interconnects behave badly when you mix coherent and non-coherent access
(snoop filters get messed up).

The solution is to either always have the addresses marked non-coherent
(like device memory, no-map carveouts), or if you really want to use
regular system memory allocated at runtime, then all cached mappings of
it need to be dropped, even the kernel logical address (area as painful
as that would be).



I agree it's broken, hence my desire to remove it :)

The other problem is that uncached buffers are being used for
performance reason so anything that would involve getting
rid of the logical address would probably negate any performance
benefit.


ION buffer is allocated.

//Camera device records video
dma_buf_attach
dma_map_attachment (buffer needs to be cleaned)


Why does the buffer need to be cleaned here? I just got through reading
the thread linked by Laura in the other reply. I do like +Brian's
suggestion of tracking if the buffer has had CPU access since the last
time and only flushing the cache if it has. As unmapped heaps never get
CPU mapped this would never be the case for unmapped heaps, it solves my
problem.


[camera device writes to buffer]
dma_buf_unmap_attachment (buffer needs to be invalidated)


It doesn't know there will be any further CPU access, it could get freed
after this for all we know, the invalidate can be saved until the CPU
requests access again.


dma_buf_detach  (device cannot stay attached because it is being sent down
the pipeline and Camera doesn't know the end of the use case)



This seems like a broken use-case, I understand the desire to keep
everything as modular as possible and separate the steps, but at this
point no one owns this buffers backing memory, not the CPU or any
device. I would go as far as to say DMA-BUF should be free now to
de-allocate the backing storage if it wants, that way it could get ready
for the next attachment, which may change the required backing memory
completely.

All devices should attach before the first mapping, and only let go
after the task is complete, otherwise this buffers data needs copied off
to a different location or the CPU needs to take ownership in-between.



Maybe it's broken but it's the status quo and we spent a good

Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-15 Thread Andrew F. Davis
On 1/15/19 12:38 PM, Andrew F. Davis wrote:
> On 1/15/19 11:45 AM, Liam Mark wrote:
>> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
>>
>>> On 1/14/19 11:13 AM, Liam Mark wrote:
 On Fri, 11 Jan 2019, Andrew F. Davis wrote:

> Buffers may not be mapped from the CPU so skip cache maintenance here.
> Accesses from the CPU to a cached heap should be bracketed with
> {begin,end}_cpu_access calls so maintenance should not be needed anyway.
>
> Signed-off-by: Andrew F. Davis 
> ---
>  drivers/staging/android/ion/ion.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/staging/android/ion/ion.c 
> b/drivers/staging/android/ion/ion.c
> index 14e48f6eb734..09cb5a8e2b09 100644
> --- a/drivers/staging/android/ion/ion.c
> +++ b/drivers/staging/android/ion/ion.c
> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct 
> dma_buf_attachment *attachment,
>  
>   table = a->table;
>  
> - if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> - direction))
> + if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
> +   direction, DMA_ATTR_SKIP_CPU_SYNC))

 Unfortunately I don't think you can do this for a couple reasons.
 You can't rely on {begin,end}_cpu_access calls to do cache maintenance.
 If the calls to {begin,end}_cpu_access were made before the call to 
 dma_buf_attach then there won't have been a device attached so the calls 
 to {begin,end}_cpu_access won't have done any cache maintenance.

>>>
>>> That should be okay though, if you have no attachments (or all
>>> attachments are IO-coherent) then there is no need for cache
>>> maintenance. Unless you mean a sequence where a non-io-coherent device
>>> is attached later after data has already been written. Does that
>>> sequence need supporting? 
>>
>> Yes, but also I think there are cases where CPU access can happen before 
>> in Android, but I will focus on later for now.
>>
>>> DMA-BUF doesn't have to allocate the backing
>>> memory until map_dma_buf() time, and that should only happen after all
>>> the devices have attached so it can know where to put the buffer. So we
>>> shouldn't expect any CPU access to buffers before all the devices are
>>> attached and mapped, right?
>>>
>>
>> Here is an example where CPU access can happen later in Android.
>>
>> Camera device records video -> software post processing -> video device 
>> (who does compression of raw data) and writes to a file
>>
>> In this example assume the buffer is cached and the devices are not 
>> IO-coherent (quite common).
>>
> 
> This is the start of the problem, having cached mappings of memory that
> is also being accessed non-coherently is going to cause issues one way
> or another. On top of the speculative cache fills that have to be
> constantly fought back against with CMOs like below; some coherent
> interconnects behave badly when you mix coherent and non-coherent access
> (snoop filters get messed up).
> 
> The solution is to either always have the addresses marked non-coherent
> (like device memory, no-map carveouts), or if you really want to use
> regular system memory allocated at runtime, then all cached mappings of
> it need to be dropped, even the kernel logical address (area as painful
> as that would be).
> 
>> ION buffer is allocated.
>>
>> //Camera device records video
>> dma_buf_attach
>> dma_map_attachment (buffer needs to be cleaned)
> 
> Why does the buffer need to be cleaned here? I just got through reading
> the thread linked by Laura in the other reply. I do like +Brian's

Actually +Brian this time :)

> suggestion of tracking if the buffer has had CPU access since the last
> time and only flushing the cache if it has. As unmapped heaps never get
> CPU mapped this would never be the case for unmapped heaps, it solves my
> problem.
> 
>> [camera device writes to buffer]
>> dma_buf_unmap_attachment (buffer needs to be invalidated)
> 
> It doesn't know there will be any further CPU access, it could get freed
> after this for all we know, the invalidate can be saved until the CPU
> requests access again.
> 
>> dma_buf_detach  (device cannot stay attached because it is being sent down 
>> the pipeline and Camera doesn't know the end of the use case)
>>
> 
> This seems like a broken use-case, I understand the desire to keep
> everything as modular as possible and separate the steps, but at this
> point no one owns this buffers backing memory, not the CPU or any
> device. I would go as far as to say DMA-BUF should be free now to
> de-allocate the backing storage if it wants, that way it could get ready
> for the next attachment, which may change the required backing memory
> completely.
> 
> All devices should attach before the first mapping, and only let go
> after the task is complete, otherwise this buffers data needs copied off
> to a different 

Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-15 Thread Andrew F. Davis
On 1/15/19 11:45 AM, Liam Mark wrote:
> On Tue, 15 Jan 2019, Andrew F. Davis wrote:
> 
>> On 1/14/19 11:13 AM, Liam Mark wrote:
>>> On Fri, 11 Jan 2019, Andrew F. Davis wrote:
>>>
 Buffers may not be mapped from the CPU so skip cache maintenance here.
 Accesses from the CPU to a cached heap should be bracketed with
 {begin,end}_cpu_access calls so maintenance should not be needed anyway.

 Signed-off-by: Andrew F. Davis 
 ---
  drivers/staging/android/ion/ion.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)

 diff --git a/drivers/staging/android/ion/ion.c 
 b/drivers/staging/android/ion/ion.c
 index 14e48f6eb734..09cb5a8e2b09 100644
 --- a/drivers/staging/android/ion/ion.c
 +++ b/drivers/staging/android/ion/ion.c
 @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct 
 dma_buf_attachment *attachment,
  
table = a->table;
  
 -  if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
 -  direction))
 +  if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
 +direction, DMA_ATTR_SKIP_CPU_SYNC))
>>>
>>> Unfortunately I don't think you can do this for a couple reasons.
>>> You can't rely on {begin,end}_cpu_access calls to do cache maintenance.
>>> If the calls to {begin,end}_cpu_access were made before the call to 
>>> dma_buf_attach then there won't have been a device attached so the calls 
>>> to {begin,end}_cpu_access won't have done any cache maintenance.
>>>
>>
>> That should be okay though, if you have no attachments (or all
>> attachments are IO-coherent) then there is no need for cache
>> maintenance. Unless you mean a sequence where a non-io-coherent device
>> is attached later after data has already been written. Does that
>> sequence need supporting? 
> 
> Yes, but also I think there are cases where CPU access can happen before 
> in Android, but I will focus on later for now.
> 
>> DMA-BUF doesn't have to allocate the backing
>> memory until map_dma_buf() time, and that should only happen after all
>> the devices have attached so it can know where to put the buffer. So we
>> shouldn't expect any CPU access to buffers before all the devices are
>> attached and mapped, right?
>>
> 
> Here is an example where CPU access can happen later in Android.
> 
> Camera device records video -> software post processing -> video device 
> (who does compression of raw data) and writes to a file
> 
> In this example assume the buffer is cached and the devices are not 
> IO-coherent (quite common).
> 

This is the start of the problem, having cached mappings of memory that
is also being accessed non-coherently is going to cause issues one way
or another. On top of the speculative cache fills that have to be
constantly fought back against with CMOs like below; some coherent
interconnects behave badly when you mix coherent and non-coherent access
(snoop filters get messed up).

The solution is to either always have the addresses marked non-coherent
(like device memory, no-map carveouts), or if you really want to use
regular system memory allocated at runtime, then all cached mappings of
it need to be dropped, even the kernel logical address (area as painful
as that would be).

> ION buffer is allocated.
> 
> //Camera device records video
> dma_buf_attach
> dma_map_attachment (buffer needs to be cleaned)

Why does the buffer need to be cleaned here? I just got through reading
the thread linked by Laura in the other reply. I do like +Brian's
suggestion of tracking if the buffer has had CPU access since the last
time and only flushing the cache if it has. As unmapped heaps never get
CPU mapped this would never be the case for unmapped heaps, it solves my
problem.

> [camera device writes to buffer]
> dma_buf_unmap_attachment (buffer needs to be invalidated)

It doesn't know there will be any further CPU access, it could get freed
after this for all we know, the invalidate can be saved until the CPU
requests access again.

> dma_buf_detach  (device cannot stay attached because it is being sent down 
> the pipeline and Camera doesn't know the end of the use case)
> 

This seems like a broken use-case, I understand the desire to keep
everything as modular as possible and separate the steps, but at this
point no one owns this buffers backing memory, not the CPU or any
device. I would go as far as to say DMA-BUF should be free now to
de-allocate the backing storage if it wants, that way it could get ready
for the next attachment, which may change the required backing memory
completely.

All devices should attach before the first mapping, and only let go
after the task is complete, otherwise this buffers data needs copied off
to a different location or the CPU needs to take ownership in-between.

> //buffer is send down the pipeline
> 
> // Usersapce software post processing occurs
> mmap buffer

Perhaps the invalidate should happen here in mmap.

> 

Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-15 Thread Liam Mark
On Tue, 15 Jan 2019, Andrew F. Davis wrote:

> On 1/14/19 11:13 AM, Liam Mark wrote:
> > On Fri, 11 Jan 2019, Andrew F. Davis wrote:
> > 
> >> Buffers may not be mapped from the CPU so skip cache maintenance here.
> >> Accesses from the CPU to a cached heap should be bracketed with
> >> {begin,end}_cpu_access calls so maintenance should not be needed anyway.
> >>
> >> Signed-off-by: Andrew F. Davis 
> >> ---
> >>  drivers/staging/android/ion/ion.c | 7 ---
> >>  1 file changed, 4 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/staging/android/ion/ion.c 
> >> b/drivers/staging/android/ion/ion.c
> >> index 14e48f6eb734..09cb5a8e2b09 100644
> >> --- a/drivers/staging/android/ion/ion.c
> >> +++ b/drivers/staging/android/ion/ion.c
> >> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct 
> >> dma_buf_attachment *attachment,
> >>  
> >>table = a->table;
> >>  
> >> -  if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> >> -  direction))
> >> +  if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
> >> +direction, DMA_ATTR_SKIP_CPU_SYNC))
> > 
> > Unfortunately I don't think you can do this for a couple reasons.
> > You can't rely on {begin,end}_cpu_access calls to do cache maintenance.
> > If the calls to {begin,end}_cpu_access were made before the call to 
> > dma_buf_attach then there won't have been a device attached so the calls 
> > to {begin,end}_cpu_access won't have done any cache maintenance.
> > 
> 
> That should be okay though, if you have no attachments (or all
> attachments are IO-coherent) then there is no need for cache
> maintenance. Unless you mean a sequence where a non-io-coherent device
> is attached later after data has already been written. Does that
> sequence need supporting? 

Yes, but also I think there are cases where CPU access can happen before 
in Android, but I will focus on later for now.

> DMA-BUF doesn't have to allocate the backing
> memory until map_dma_buf() time, and that should only happen after all
> the devices have attached so it can know where to put the buffer. So we
> shouldn't expect any CPU access to buffers before all the devices are
> attached and mapped, right?
> 

Here is an example where CPU access can happen later in Android.

Camera device records video -> software post processing -> video device 
(who does compression of raw data) and writes to a file

In this example assume the buffer is cached and the devices are not 
IO-coherent (quite common).

ION buffer is allocated.

//Camera device records video
dma_buf_attach
dma_map_attachment (buffer needs to be cleaned)
[camera device writes to buffer]
dma_buf_unmap_attachment (buffer needs to be invalidated)
dma_buf_detach  (device cannot stay attached because it is being sent down 
the pipeline and Camera doesn't know the end of the use case)

//buffer is send down the pipeline

// Usersapce software post processing occurs
mmap buffer
DMA_BUF_IOCTL_SYNC IOCT with flags DMA_BUF_SYNC_START // No CMO since no 
devices attached to buffer
[CPU reads/writes to the buffer]
DMA_BUF_IOCTL_SYNC IOCTL with flags DMA_BUF_SYNC_END // No CMO since no 
devices attached to buffer
munmap buffer

//buffer is send down the pipeline
// Buffer is send to video device (who does compression of raw data) and 
writes to a file
dma_buf_attach
dma_map_attachment (buffer needs to be cleaned)
[video device writes to buffer]
dma_buf_unmap_attachment 
dma_buf_detach  (device cannot stay attached because it is being sent down 
the pipeline and Video doesn't know the end of the use case)



> > Also ION no longer provides DMA ready memory, so if you are not doing CPU 
> > access then there is no requirement (that I am aware of) for you to call 
> > {begin,end}_cpu_access before passing the buffer to the device and if this 
> > buffer is cached and your device is not IO-coherent then the cache 
> > maintenance
> > in ion_map_dma_buf and ion_unmap_dma_buf is required.
> > 
> 
> If I am not doing any CPU access then why do I need CPU cache
> maintenance on the buffer?
> 

Because ION no longer provides DMA ready memory.
Take the above example.

ION allocates memory from buddy allocator and requests zeroing.
Zeros are written to the cache.

You pass the buffer to the camera device which is not IO-coherent.
The camera devices writes directly to the buffer in DDR.
Since you didn't clean the buffer a dirty cache line (one of the zeros) is 
evicted from the cache, this zero overwrites data the camera device has 
written which corrupts your data.

Liam

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap

2019-01-14 Thread Liam Mark
On Fri, 11 Jan 2019, Andrew F. Davis wrote:

> Buffers may not be mapped from the CPU so skip cache maintenance here.
> Accesses from the CPU to a cached heap should be bracketed with
> {begin,end}_cpu_access calls so maintenance should not be needed anyway.
> 
> Signed-off-by: Andrew F. Davis 
> ---
>  drivers/staging/android/ion/ion.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/staging/android/ion/ion.c 
> b/drivers/staging/android/ion/ion.c
> index 14e48f6eb734..09cb5a8e2b09 100644
> --- a/drivers/staging/android/ion/ion.c
> +++ b/drivers/staging/android/ion/ion.c
> @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct 
> dma_buf_attachment *attachment,
>  
>   table = a->table;
>  
> - if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> - direction))
> + if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
> +   direction, DMA_ATTR_SKIP_CPU_SYNC))

Unfortunately I don't think you can do this for a couple reasons.
You can't rely on {begin,end}_cpu_access calls to do cache maintenance.
If the calls to {begin,end}_cpu_access were made before the call to 
dma_buf_attach then there won't have been a device attached so the calls 
to {begin,end}_cpu_access won't have done any cache maintenance.

Also ION no longer provides DMA ready memory, so if you are not doing CPU 
access then there is no requirement (that I am aware of) for you to call 
{begin,end}_cpu_access before passing the buffer to the device and if this 
buffer is cached and your device is not IO-coherent then the cache maintenance
in ion_map_dma_buf and ion_unmap_dma_buf is required.

>   return ERR_PTR(-ENOMEM);
>  
>   return table;
> @@ -272,7 +272,8 @@ static void ion_unmap_dma_buf(struct dma_buf_attachment 
> *attachment,
> struct sg_table *table,
> enum dma_data_direction direction)
>  {
> - dma_unmap_sg(attachment->dev, table->sgl, table->nents, direction);
> + dma_unmap_sg_attrs(attachment->dev, table->sgl, table->nents,
> +direction, DMA_ATTR_SKIP_CPU_SYNC);
>  }
>  
>  static int ion_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
> -- 
> 2.19.1
> 
> ___
> devel mailing list
> de...@linuxdriverproject.org
> http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
> 

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel