Re: [PATCH v4 7/7] dma-buf: system_heap: Add a system-uncached heap re-using the system heap

2020-10-30 Thread John Stultz
On Fri, Oct 30, 2020 at 12:51 AM Hillf Danton  wrote:
> On Thu, 29 Oct 2020 21:04:30 -0700 John Stultz wrote:
> >
> > But I'll try to share my thoughts:
> >
> > So the system heap allows for allocation of non-contiguous buffers
> > (currently allocated from page_alloc), which we keep track using
> > sglists.
> > Since the resulting dmabufs are shared between multiple devices, we
> > want to provide a *specific type of memory* (in this case
> > non-contiguous system memory), rather than what the underlying
> > dma_alloc_attr() allocates for a specific device.
>
> If the memory slice(just a page for simple case) from
> dma_alloc_attr(for device-A) does not work for device-B, nor can
> page_alloc do the job imho.

Right. That's why userland chooses which heap to allocate from, as
it's the only one that knows the path the buffer will take.

thanks
-john
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v4 7/7] dma-buf: system_heap: Add a system-uncached heap re-using the system heap

2020-10-30 Thread Hillf Danton
On Thu, 29 Oct 2020 15:28:34 -0700 John Stultz wrote:
> On Thu, Oct 29, 2020 at 12:10 AM Hillf Danton  wrote:
> > On Thu, 29 Oct 2020 00:16:24 + John Stultz wrote:
> > > @@ -194,6 +210,9 @@ static int system_heap_mmap(struct dma_buf *dmabuf, 
> > > struct vm_area_struct *vma)
> > >   struct sg_page_iter piter;
> > >   int ret;
> > >
> > > + if (buffer->uncached)
> > > + vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
> > > +
> >
> > Wonder why you turn back to dma_mmap_wc() and friends?
> 
> Sorry, can you expand on what you are proposing here instead?  I'm not
> sure I see how dma_alloc/mmap/*_wc() quite fits here.

I just wondered if *_wc() could save you two minutes or three. Can you
shed some light on your concerns about their unfitness?
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v4 7/7] dma-buf: system_heap: Add a system-uncached heap re-using the system heap

2020-10-30 Thread Hillf Danton
On Thu, 29 Oct 2020 21:04:30 -0700 John Stultz wrote:
> 
> But I'll try to share my thoughts:
> 
> So the system heap allows for allocation of non-contiguous buffers
> (currently allocated from page_alloc), which we keep track using
> sglists.
> Since the resulting dmabufs are shared between multiple devices, we
> want to provide a *specific type of memory* (in this case
> non-contiguous system memory), rather than what the underlying
> dma_alloc_attr() allocates for a specific device.

If the memory slice(just a page for simple case) from
dma_alloc_attr(for device-A) does not work for device-B, nor can
page_alloc do the job imho.
> 
> My sense is dma_mmap_wc() likely ought to be paired with switching to
> using dma_alloc_wc() as well, which calls down to dma_alloc_attr().
> Maybe one could use dma_alloc_attr against the heap device to allocate
> chunks that we track in the sglist. But I'm not sure how that saves us
> much other than possibly swapping dma_mmap_wc() for remap_pfn_range()?
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v4 7/7] dma-buf: system_heap: Add a system-uncached heap re-using the system heap

2020-10-29 Thread John Stultz
On Thu, Oct 29, 2020 at 7:48 PM Hillf Danton  wrote:
> On Thu, 29 Oct 2020 15:28:34 -0700 John Stultz wrote:
> > On Thu, Oct 29, 2020 at 12:10 AM Hillf Danton  wrote:
> > > On Thu, 29 Oct 2020 00:16:24 + John Stultz wrote:
> > > > @@ -194,6 +210,9 @@ static int system_heap_mmap(struct dma_buf *dmabuf, 
> > > > struct vm_area_struct *vma)
> > > >   struct sg_page_iter piter;
> > > >   int ret;
> > > >
> > > > + if (buffer->uncached)
> > > > + vma->vm_page_prot = 
> > > > pgprot_writecombine(vma->vm_page_prot);
> > > > +
> > >
> > > Wonder why you turn back to dma_mmap_wc() and friends?
> >
> > Sorry, can you expand on what you are proposing here instead?  I'm not
> > sure I see how dma_alloc/mmap/*_wc() quite fits here.
>
> I just wondered if *_wc() could save you two minutes or three. Can you
> shed some light on your concerns about their unfitness?

Sorry, I feel a bit daft here. I'm still not exactly sure what you're
proposing, and your reply of saving minutes doesn't really clarify
things.
So I'm not sure it's a good use of time to try to (most likely,
incorrectly) refute all the possible things you might be suggesting.
:)

But I'll try to share my thoughts:

So the system heap allows for allocation of non-contiguous buffers
(currently allocated from page_alloc), which we keep track using
sglists.
Since the resulting dmabufs are shared between multiple devices, we
want to provide a *specific type of memory* (in this case
non-contiguous system memory), rather than what the underlying
dma_alloc_attr() allocates for a specific device.

My sense is dma_mmap_wc() likely ought to be paired with switching to
using dma_alloc_wc() as well, which calls down to dma_alloc_attr().
Maybe one could use dma_alloc_attr against the heap device to allocate
chunks that we track in the sglist. But I'm not sure how that saves us
much other than possibly swapping dma_mmap_wc() for remap_pfn_range()?

But again, I suspect I've mischaracterized what you're actually
suggesting. So please let me know what you're thinking and I'm happy
to consider it.

thanks
-john
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v4 7/7] dma-buf: system_heap: Add a system-uncached heap re-using the system heap

2020-10-29 Thread John Stultz
On Thu, Oct 29, 2020 at 12:10 AM Hillf Danton  wrote:
> On Thu, 29 Oct 2020 00:16:24 + John Stultz wrote:
> > @@ -194,6 +210,9 @@ static int system_heap_mmap(struct dma_buf *dmabuf, 
> > struct vm_area_struct *vma)
> >   struct sg_page_iter piter;
> >   int ret;
> >
> > + if (buffer->uncached)
> > + vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
> > +
>
> Wonder why you turn back to dma_mmap_wc() and friends?

Sorry, can you expand on what you are proposing here instead?  I'm not
sure I see how dma_alloc/mmap/*_wc() quite fits here.

thanks
-john
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v4 7/7] dma-buf: system_heap: Add a system-uncached heap re-using the system heap

2020-10-29 Thread Hillf Danton
On Thu, 29 Oct 2020 00:16:24 + John Stultz wrote:
> 
> This adds a heap that allocates non-contiguous buffers that are
> marked as writecombined, so they are not cached by the CPU.
> 
> This is useful, as most graphics buffers are usually not touched
> by the CPU or only written into once by the CPU. So when mapping
> the buffer over and over between devices, we can skip the CPU
> syncing, which saves a lot of cache management overhead, greatly
> improving performance.
> 
> For folk using ION, there was a ION_FLAG_CACHED flag, which
> signaled if the returned buffer should be CPU cacheable or not.
> With DMA-BUF heaps, we do not yet have such a flag, and by default
> the current heaps (system and cma) produce CPU cachable buffers.
> So for folks transitioning from ION to DMA-BUF Heaps, this fills
> in some of that missing functionality.
> 
> There has been a suggestion to make this functionality a flag
> (DMAHEAP_FLAG_UNCACHED?) on the system heap, similar to how
> ION used the ION_FLAG_CACHED. But I want to make sure an
> _UNCACHED flag would truely be a generic attribute across all
> heaps. So far that has been unclear, so having it as a separate
> heap seemes better for now. (But I'm open to discussion on this
> point!)
> 
> This is a rework of earlier efforts to add a uncached system heap,
> done utilizing the exisitng system heap, adding just a bit of
> logic to handle the uncached case.
> 
> Feedback would be very welcome!
> 
> Many thanks to Liam Mark for his help to get this working.
> 
> Pending opensource users of this code include:
> * AOSP HiKey960 gralloc:
>   - https://android-review.googlesource.com/c/device/linaro/hikey/+/1399519
>   - Visibly improves performance over the system heap
> * AOSP Codec2 (possibly, needs more review):
>   - 
> https://android-review.googlesource.com/c/platform/frameworks/av/+/1360640/17/media/codec2/vndk/C2DmaBufAllocator.cpp#325
> 
> Cc: Sumit Semwal 
> Cc: Liam Mark 
> Cc: Laura Abbott 
> Cc: Brian Starkey 
> Cc: Hridya Valsaraju 
> Cc: Suren Baghdasaryan 
> Cc: Sandeep Patil 
> Cc: Daniel Mentz 
> Cc: Chris Goldsworthy 
> Cc: Ørjan Eide 
> Cc: Robin Murphy 
> Cc: Ezequiel Garcia 
> Cc: Simon Ser 
> Cc: James Jones 
> Cc: linux-me...@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: John Stultz 
> ---
> v4:
> * Make sys_uncached_heap static, as
> Reported-by: kernel test robot 
> * Fix wrong return value, caught by smatch
> Reported-by: kernel test robot 
> Reported-by: Dan Carpenter 
> * Ensure we call flush/invalidate_kernel_vmap_range() in the
>   uncached cases to try to address feedback about VIVT caches
>   from Christoph
> * Reorder a few lines as suggested by BrianS
> * Avoid holding the initial mapping for the lifetime of the buffer
>   as suggested by BrianS
> * Fix a unlikely race between allocate and updating the dma_mask
>   that BrianS noticed.
> ---
>  drivers/dma-buf/heaps/system_heap.c | 111 
>  1 file changed, 95 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/dma-buf/heaps/system_heap.c 
> b/drivers/dma-buf/heaps/system_heap.c
> index ef4b2c1032df..a328c76249d2 100644
> --- a/drivers/dma-buf/heaps/system_heap.c
> +++ b/drivers/dma-buf/heaps/system_heap.c
> @@ -22,6 +22,7 @@
>  #include 
>  
>  static struct dma_heap *sys_heap;
> +static struct dma_heap *sys_uncached_heap;
>  
>  struct system_heap_buffer {
>   struct dma_heap *heap;
> @@ -31,6 +32,8 @@ struct system_heap_buffer {
>   struct sg_table sg_table;
>   int vmap_cnt;
>   void *vaddr;
> +
> + bool uncached;
>  };
>  
>  struct dma_heap_attachment {
> @@ -38,6 +41,8 @@ struct dma_heap_attachment {
>   struct sg_table *table;
>   struct list_head list;
>   bool mapped;
> +
> + bool uncached;
>  };
>  
>  #define HIGH_ORDER_GFP  (((GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN \
> @@ -94,7 +99,7 @@ static int system_heap_attach(struct dma_buf *dmabuf,
>   a->dev = attachment->dev;
>   INIT_LIST_HEAD(>list);
>   a->mapped = false;
> -
> + a->uncached = buffer->uncached;
>   attachment->priv = a;
>  
>   mutex_lock(>lock);
> @@ -124,9 +129,13 @@ static struct sg_table *system_heap_map_dma_buf(struct 
> dma_buf_attachment *attac
>  {
>   struct dma_heap_attachment *a = attachment->priv;
>   struct sg_table *table = a->table;
> + int attr = 0;
>   int ret;
>  
> - ret = dma_map_sgtable(attachment->dev, table, direction, 0);
> + if (a->uncached)
> + attr = DMA_ATTR_SKIP_CPU_SYNC;
> +
> + ret = dma_map_sgtable(attachment->dev, table, direction, attr);
>   if (ret)
>   return ERR_PTR(ret);
>  
> @@ -139,9 +148,12 @@ static void system_heap_unmap_dma_buf(struct 
> dma_buf_attachment *attachment,
> enum dma_data_direction direction)
>  {
>   struct dma_heap_attachment *a = attachment->priv;
> + int attr = 0;
>  
> + if (a->uncached)
> + attr = 

[PATCH v4 7/7] dma-buf: system_heap: Add a system-uncached heap re-using the system heap

2020-10-28 Thread John Stultz
This adds a heap that allocates non-contiguous buffers that are
marked as writecombined, so they are not cached by the CPU.

This is useful, as most graphics buffers are usually not touched
by the CPU or only written into once by the CPU. So when mapping
the buffer over and over between devices, we can skip the CPU
syncing, which saves a lot of cache management overhead, greatly
improving performance.

For folk using ION, there was a ION_FLAG_CACHED flag, which
signaled if the returned buffer should be CPU cacheable or not.
With DMA-BUF heaps, we do not yet have such a flag, and by default
the current heaps (system and cma) produce CPU cachable buffers.
So for folks transitioning from ION to DMA-BUF Heaps, this fills
in some of that missing functionality.

There has been a suggestion to make this functionality a flag
(DMAHEAP_FLAG_UNCACHED?) on the system heap, similar to how
ION used the ION_FLAG_CACHED. But I want to make sure an
_UNCACHED flag would truely be a generic attribute across all
heaps. So far that has been unclear, so having it as a separate
heap seemes better for now. (But I'm open to discussion on this
point!)

This is a rework of earlier efforts to add a uncached system heap,
done utilizing the exisitng system heap, adding just a bit of
logic to handle the uncached case.

Feedback would be very welcome!

Many thanks to Liam Mark for his help to get this working.

Pending opensource users of this code include:
* AOSP HiKey960 gralloc:
  - https://android-review.googlesource.com/c/device/linaro/hikey/+/1399519
  - Visibly improves performance over the system heap
* AOSP Codec2 (possibly, needs more review):
  - 
https://android-review.googlesource.com/c/platform/frameworks/av/+/1360640/17/media/codec2/vndk/C2DmaBufAllocator.cpp#325

Cc: Sumit Semwal 
Cc: Liam Mark 
Cc: Laura Abbott 
Cc: Brian Starkey 
Cc: Hridya Valsaraju 
Cc: Suren Baghdasaryan 
Cc: Sandeep Patil 
Cc: Daniel Mentz 
Cc: Chris Goldsworthy 
Cc: Ørjan Eide 
Cc: Robin Murphy 
Cc: Ezequiel Garcia 
Cc: Simon Ser 
Cc: James Jones 
Cc: linux-me...@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: John Stultz 
---
v4:
* Make sys_uncached_heap static, as
Reported-by: kernel test robot 
* Fix wrong return value, caught by smatch
Reported-by: kernel test robot 
Reported-by: Dan Carpenter 
* Ensure we call flush/invalidate_kernel_vmap_range() in the
  uncached cases to try to address feedback about VIVT caches
  from Christoph
* Reorder a few lines as suggested by BrianS
* Avoid holding the initial mapping for the lifetime of the buffer
  as suggested by BrianS
* Fix a unlikely race between allocate and updating the dma_mask
  that BrianS noticed.
---
 drivers/dma-buf/heaps/system_heap.c | 111 
 1 file changed, 95 insertions(+), 16 deletions(-)

diff --git a/drivers/dma-buf/heaps/system_heap.c 
b/drivers/dma-buf/heaps/system_heap.c
index ef4b2c1032df..a328c76249d2 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -22,6 +22,7 @@
 #include 
 
 static struct dma_heap *sys_heap;
+static struct dma_heap *sys_uncached_heap;
 
 struct system_heap_buffer {
struct dma_heap *heap;
@@ -31,6 +32,8 @@ struct system_heap_buffer {
struct sg_table sg_table;
int vmap_cnt;
void *vaddr;
+
+   bool uncached;
 };
 
 struct dma_heap_attachment {
@@ -38,6 +41,8 @@ struct dma_heap_attachment {
struct sg_table *table;
struct list_head list;
bool mapped;
+
+   bool uncached;
 };
 
 #define HIGH_ORDER_GFP  (((GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN \
@@ -94,7 +99,7 @@ static int system_heap_attach(struct dma_buf *dmabuf,
a->dev = attachment->dev;
INIT_LIST_HEAD(>list);
a->mapped = false;
-
+   a->uncached = buffer->uncached;
attachment->priv = a;
 
mutex_lock(>lock);
@@ -124,9 +129,13 @@ static struct sg_table *system_heap_map_dma_buf(struct 
dma_buf_attachment *attac
 {
struct dma_heap_attachment *a = attachment->priv;
struct sg_table *table = a->table;
+   int attr = 0;
int ret;
 
-   ret = dma_map_sgtable(attachment->dev, table, direction, 0);
+   if (a->uncached)
+   attr = DMA_ATTR_SKIP_CPU_SYNC;
+
+   ret = dma_map_sgtable(attachment->dev, table, direction, attr);
if (ret)
return ERR_PTR(ret);
 
@@ -139,9 +148,12 @@ static void system_heap_unmap_dma_buf(struct 
dma_buf_attachment *attachment,
  enum dma_data_direction direction)
 {
struct dma_heap_attachment *a = attachment->priv;
+   int attr = 0;
 
+   if (a->uncached)
+   attr = DMA_ATTR_SKIP_CPU_SYNC;
a->mapped = false;
-   dma_unmap_sgtable(attachment->dev, table, direction, 0);
+   dma_unmap_sgtable(attachment->dev, table, direction, attr);
 }
 
 static int system_heap_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
@@ -155,10 +167,12 @@ 

[PATCH v4 7/7] dma-buf: system_heap: Add a system-uncached heap re-using the system heap

2020-10-16 Thread John Stultz
This adds a heap that allocates non-contiguous buffers that are
marked as writecombined, so they are not cached by the CPU.

This is useful, as most graphics buffers are usually not touched
by the CPU or only written into once by the CPU. So when mapping
the buffer over and over between devices, we can skip the CPU
syncing, which saves a lot of cache management overhead, greatly
improving performance.

For folk using ION, there was a ION_FLAG_CACHED flag, which
signaled if the returned buffer should be CPU cacheable or not.
With DMA-BUF heaps, we do not yet have such a flag, and by default
the current heaps (system and cma) produce CPU cachable buffers.
So for folks transitioning from ION to DMA-BUF Heaps, this fills
in some of that missing functionality.

There has been a suggestion to make this functionality a flag
(DMAHEAP_FLAG_UNCACHED?) on the system heap, similar to how
ION used the ION_FLAG_CACHED. But I want to make sure an
_UNCACHED flag would truely be a generic attribute across all
heaps. So far that has been unclear, so having it as a separate
heap seemes better for now. (But I'm open to discussion on this
point!)

This is a rework of earlier efforts to add a uncached system heap,
done utilizing the exisitng system heap, adding just a bit of
logic to handle the uncached case.

Feedback would be very welcome!

Many thanks to Liam Mark for his help to get this working.

Pending opensource users of this code include:
* AOSP HiKey960 gralloc:
  - https://android-review.googlesource.com/c/device/linaro/hikey/+/1399519
  - Visibly improves performance over the system heap
* AOSP Codec2 (possibly, needs more review):
  - 
https://android-review.googlesource.com/c/platform/frameworks/av/+/1360640/17/media/codec2/vndk/C2DmaBufAllocator.cpp#325

Cc: Sumit Semwal 
Cc: Liam Mark 
Cc: Laura Abbott 
Cc: Brian Starkey 
Cc: Hridya Valsaraju 
Cc: Suren Baghdasaryan 
Cc: Sandeep Patil 
Cc: Daniel Mentz 
Cc: Chris Goldsworthy 
Cc: Ørjan Eide 
Cc: Robin Murphy 
Cc: Ezequiel Garcia 
Cc: Simon Ser 
Cc: James Jones 
Cc: linux-me...@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: John Stultz 
---
v4:
* Make sys_uncached_heap static, as
Reported-by: kernel test robot 
* Fix wrong return value, caught by smatch
Reported-by: kernel test robot 
Reported-by: Dan Carpenter 
* Ensure we call flush/invalidate_kernel_vmap_range() in the
  uncached cases to try to address feedback about VIVT caches
  from Christoph
* Reorder a few lines as suggested by BrianS
* Avoid holding the initial mapping for the lifetime of the buffer
  as suggested by BrianS
* Fix a unlikely race between allocate and updating the dma_mask
  that BrianS noticed.
---
 drivers/dma-buf/heaps/system_heap.c | 111 
 1 file changed, 95 insertions(+), 16 deletions(-)

diff --git a/drivers/dma-buf/heaps/system_heap.c 
b/drivers/dma-buf/heaps/system_heap.c
index ef4b2c1032df..a328c76249d2 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -22,6 +22,7 @@
 #include 
 
 static struct dma_heap *sys_heap;
+static struct dma_heap *sys_uncached_heap;
 
 struct system_heap_buffer {
struct dma_heap *heap;
@@ -31,6 +32,8 @@ struct system_heap_buffer {
struct sg_table sg_table;
int vmap_cnt;
void *vaddr;
+
+   bool uncached;
 };
 
 struct dma_heap_attachment {
@@ -38,6 +41,8 @@ struct dma_heap_attachment {
struct sg_table *table;
struct list_head list;
bool mapped;
+
+   bool uncached;
 };
 
 #define HIGH_ORDER_GFP  (((GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN \
@@ -94,7 +99,7 @@ static int system_heap_attach(struct dma_buf *dmabuf,
a->dev = attachment->dev;
INIT_LIST_HEAD(>list);
a->mapped = false;
-
+   a->uncached = buffer->uncached;
attachment->priv = a;
 
mutex_lock(>lock);
@@ -124,9 +129,13 @@ static struct sg_table *system_heap_map_dma_buf(struct 
dma_buf_attachment *attac
 {
struct dma_heap_attachment *a = attachment->priv;
struct sg_table *table = a->table;
+   int attr = 0;
int ret;
 
-   ret = dma_map_sgtable(attachment->dev, table, direction, 0);
+   if (a->uncached)
+   attr = DMA_ATTR_SKIP_CPU_SYNC;
+
+   ret = dma_map_sgtable(attachment->dev, table, direction, attr);
if (ret)
return ERR_PTR(ret);
 
@@ -139,9 +148,12 @@ static void system_heap_unmap_dma_buf(struct 
dma_buf_attachment *attachment,
  enum dma_data_direction direction)
 {
struct dma_heap_attachment *a = attachment->priv;
+   int attr = 0;
 
+   if (a->uncached)
+   attr = DMA_ATTR_SKIP_CPU_SYNC;
a->mapped = false;
-   dma_unmap_sgtable(attachment->dev, table, direction, 0);
+   dma_unmap_sgtable(attachment->dev, table, direction, attr);
 }
 
 static int system_heap_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
@@ -155,10 +167,12 @@