Re: [RFC][PATCH v2 2/2] dma-heap: Add a system-uncached heap
On Fri, Aug 14, 2020 at 9:15 AM Ezequiel Garcia wrote: > Thanks for the patch. > > On Fri, 14 Aug 2020 at 03:25, John Stultz wrote: > > > > This adds a heap that allocates non-contiguous buffers that are > > marked as writecombined, so they are not cached by the CPU. > > > > What's the rationale for exposing the memory > attribute as a new heap, instead of just introducing flags? > > I guess this calls for some guidelines on what situations > call for a separate heap, and when it's just a modifier flag. YES! :) A big part of this patch is to start a discussion and feel out what properties of heaps are generic enough to be flags, and what aspects are unique enough to deserve having their own heap implementation. ION used the ION_FLAG_CACHED bit for this and considered it a generic property (though by default all buffers were uncached). This seemed to cause enough friction in reviews that we dropped it and used cachable buffers for the initial DMA BUF heaps. Further, I want to make sure we avoid the custom flag abuse that ION saw, especially with vendor heaps. So I think having each unique behavior being a separate heap is a reasonable stance. That said, we added the (currently unused) heap-flags field to the interface as there may be some attributes or modalities that are truly generic across heaps. So if we want to add an UNCACHED flag instead, I'm open to that.. however I want to make sure it has clear general meaning so that its behavior is consistent across all heaps and architectures (or produces an error if it's not supported). thanks -john ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [RFC][PATCH v2 2/2] dma-heap: Add a system-uncached heap
On Fri, Aug 14, 2020 at 2:17 AM Daniel Vetter wrote: > > On Fri, Aug 14, 2020 at 06:24:58AM +, John Stultz wrote: > > This adds a heap that allocates non-contiguous buffers that are > > marked as writecombined, so they are not cached by the CPU. > > > > This is useful, as most graphics buffers are usually not touched > > by the CPU or only written into once by the CPU. So when mapping > > the buffer over and over between devices, we can skip the CPU > > syncing, which saves a lot of cache management overhead, greatly > > improving performance. > > > > For folk using ION, there was a ION_FLAG_CACHED flag, which > > signaled if the returned buffer should be CPU cacheable or not. > > With DMA-BUF heaps, we have no such flag, and by default the > > current heaps (system and cma) produce CPU cachable buffers. > > So for folks transitioning from ION to DMA-BUF Heaps, this fills > > in some of that missing functionality. > > > > This does have a few "ugly" bits that were required to get > > the buffer properly flushed out initially which I'd like to > > improve. So feedback would be very welcome! > > > > Many thanks to Liam Mark for his help to get this working. > > > > Cc: Sumit Semwal > > Cc: Andrew F. Davis > > Cc: Benjamin Gaignard > > Cc: Liam Mark > > Cc: Laura Abbott > > Cc: Brian Starkey > > Cc: Hridya Valsaraju > > Cc: Robin Murphy > > Cc: linux-me...@vger.kernel.org > > Cc: dri-devel@lists.freedesktop.org > > Signed-off-by: John Stultz > > --- > > v2: > > * Fix build issue on sh reported-by: kernel test robot > > * Rework to use for_each_sgtable_sg(), dma_map_sgtable(), and > > for_each_sg_page() along with numerous other cleanups suggested > > by Robin Murphy > > Mildly annoying me, but where's the userspace for this? Do we have a > gralloc somewhere that works with open driver stacks and uses this? So this is still early RFC, but I've added support to the HiKey960 gralloc, and have pending patches (following DRM rules) I pushed here: https://android-review.googlesource.com/c/device/linaro/hikey/+/1399519 And avoiding the cache syncing overhead improves performance nicely on that board. There is also work in progress to change the codec2 implementation in AOSP (which uses ion directly), to move over to DMA BUF heaps and as it used the !ION_FLAG_CACHED case there this heap would be useful to match ION's functionality. The latest patches for that are here: https://android-review.googlesource.com/c/platform/frameworks/av/+/1360640 (though I'm expecting a deeper rework on those) thanks -john ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [RFC][PATCH v2 2/2] dma-heap: Add a system-uncached heap
Hi John, Thanks for the patch. On Fri, 14 Aug 2020 at 03:25, John Stultz wrote: > > This adds a heap that allocates non-contiguous buffers that are > marked as writecombined, so they are not cached by the CPU. > What's the rationale for exposing the memory attribute as a new heap, instead of just introducing flags? I guess this calls for some guidelines on what situations call for a separate heap, and when it's just a modifier flag. Thanks! Ezequiel > This is useful, as most graphics buffers are usually not touched > by the CPU or only written into once by the CPU. So when mapping > the buffer over and over between devices, we can skip the CPU > syncing, which saves a lot of cache management overhead, greatly > improving performance. > > For folk using ION, there was a ION_FLAG_CACHED flag, which > signaled if the returned buffer should be CPU cacheable or not. > With DMA-BUF heaps, we have no such flag, and by default the > current heaps (system and cma) produce CPU cachable buffers. > So for folks transitioning from ION to DMA-BUF Heaps, this fills > in some of that missing functionality. > > This does have a few "ugly" bits that were required to get > the buffer properly flushed out initially which I'd like to > improve. So feedback would be very welcome! > > Many thanks to Liam Mark for his help to get this working. > > Cc: Sumit Semwal > Cc: Andrew F. Davis > Cc: Benjamin Gaignard > Cc: Liam Mark > Cc: Laura Abbott > Cc: Brian Starkey > Cc: Hridya Valsaraju > Cc: Robin Murphy > Cc: linux-me...@vger.kernel.org > Cc: dri-devel@lists.freedesktop.org > Signed-off-by: John Stultz > --- > v2: > * Fix build issue on sh reported-by: kernel test robot > * Rework to use for_each_sgtable_sg(), dma_map_sgtable(), and > for_each_sg_page() along with numerous other cleanups suggested > by Robin Murphy > --- > drivers/dma-buf/heaps/Kconfig| 10 + > drivers/dma-buf/heaps/Makefile | 1 + > drivers/dma-buf/heaps/system_uncached_heap.c | 371 +++ > 3 files changed, 382 insertions(+) > create mode 100644 drivers/dma-buf/heaps/system_uncached_heap.c > > diff --git a/drivers/dma-buf/heaps/Kconfig b/drivers/dma-buf/heaps/Kconfig > index a5eef06c4226..420b0ed0a512 100644 > --- a/drivers/dma-buf/heaps/Kconfig > +++ b/drivers/dma-buf/heaps/Kconfig > @@ -5,6 +5,16 @@ config DMABUF_HEAPS_SYSTEM > Choose this option to enable the system dmabuf heap. The system heap > is backed by pages from the buddy allocator. If in doubt, say Y. > > +config DMABUF_HEAPS_SYSTEM_UNCACHED > + bool "DMA-BUF Uncached System Heap" > + depends on DMABUF_HEAPS > + help > + Choose this option to enable the uncached system dmabuf heap. This > + heap is backed by pages from the buddy allocator, but pages are > setup > + for write combining. This avoids cache management overhead, and can > + be faster if pages are mostly untouched by the cpu. If in doubt, > + say Y. > + > config DMABUF_HEAPS_CMA > bool "DMA-BUF CMA Heap" > depends on DMABUF_HEAPS && DMA_CMA > diff --git a/drivers/dma-buf/heaps/Makefile b/drivers/dma-buf/heaps/Makefile > index 6e54cdec3da0..085685ec478f 100644 > --- a/drivers/dma-buf/heaps/Makefile > +++ b/drivers/dma-buf/heaps/Makefile > @@ -1,4 +1,5 @@ > # SPDX-License-Identifier: GPL-2.0 > obj-y += heap-helpers.o > obj-$(CONFIG_DMABUF_HEAPS_SYSTEM) += system_heap.o > +obj-$(CONFIG_DMABUF_HEAPS_SYSTEM_UNCACHED) += system_uncached_heap.o > obj-$(CONFIG_DMABUF_HEAPS_CMA) += cma_heap.o > diff --git a/drivers/dma-buf/heaps/system_uncached_heap.c > b/drivers/dma-buf/heaps/system_uncached_heap.c > new file mode 100644 > index ..3b8e699bcae7 > --- /dev/null > +++ b/drivers/dma-buf/heaps/system_uncached_heap.c > @@ -0,0 +1,371 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Uncached System DMA-Heap exporter > + * > + * Copyright (C) 2020 Linaro Ltd. > + * > + * Based off of Andrew Davis' SRAM heap: > + * Copyright (C) 2019 Texas Instruments Incorporated - http://www.ti.com/ > + * Andrew F. Davis > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +struct uncached_heap { > + struct dma_heap *heap; > +}; > + > +struct uncached_heap_buffer { > + struct dma_heap *heap; > + struct list_head attachments; > + struct mutex lock; > + unsigned long len; > + struct sg_table sg_table; > + int vmap_cnt; > + void *vaddr; > +}; > + > +struct dma_heap_attachment { > + struct device *dev; > + struct sg_table *table; > + struct list_head list; > +}; > + > +static struct sg_table *dup_sg_table(struct sg_table *table) > +{ > + struct sg_table *new_table; > + int ret, i; > + struct scatterlist *sg, *new_sg; > + > + new_table =
Re: [RFC][PATCH v2 2/2] dma-heap: Add a system-uncached heap
On Fri, Aug 14, 2020 at 06:24:58AM +, John Stultz wrote: > This adds a heap that allocates non-contiguous buffers that are > marked as writecombined, so they are not cached by the CPU. > > This is useful, as most graphics buffers are usually not touched > by the CPU or only written into once by the CPU. So when mapping > the buffer over and over between devices, we can skip the CPU > syncing, which saves a lot of cache management overhead, greatly > improving performance. > > For folk using ION, there was a ION_FLAG_CACHED flag, which > signaled if the returned buffer should be CPU cacheable or not. > With DMA-BUF heaps, we have no such flag, and by default the > current heaps (system and cma) produce CPU cachable buffers. > So for folks transitioning from ION to DMA-BUF Heaps, this fills > in some of that missing functionality. > > This does have a few "ugly" bits that were required to get > the buffer properly flushed out initially which I'd like to > improve. So feedback would be very welcome! > > Many thanks to Liam Mark for his help to get this working. > > Cc: Sumit Semwal > Cc: Andrew F. Davis > Cc: Benjamin Gaignard > Cc: Liam Mark > Cc: Laura Abbott > Cc: Brian Starkey > Cc: Hridya Valsaraju > Cc: Robin Murphy > Cc: linux-me...@vger.kernel.org > Cc: dri-devel@lists.freedesktop.org > Signed-off-by: John Stultz > --- > v2: > * Fix build issue on sh reported-by: kernel test robot > * Rework to use for_each_sgtable_sg(), dma_map_sgtable(), and > for_each_sg_page() along with numerous other cleanups suggested > by Robin Murphy Mildly annoying me, but where's the userspace for this? Do we have a gralloc somewhere that works with open driver stacks and uses this? Without that I'm somewhat afraid we'll engineer ourselves something that looks like it should work, but doesn't really work in reality. -Daniel > --- > drivers/dma-buf/heaps/Kconfig| 10 + > drivers/dma-buf/heaps/Makefile | 1 + > drivers/dma-buf/heaps/system_uncached_heap.c | 371 +++ > 3 files changed, 382 insertions(+) > create mode 100644 drivers/dma-buf/heaps/system_uncached_heap.c > > diff --git a/drivers/dma-buf/heaps/Kconfig b/drivers/dma-buf/heaps/Kconfig > index a5eef06c4226..420b0ed0a512 100644 > --- a/drivers/dma-buf/heaps/Kconfig > +++ b/drivers/dma-buf/heaps/Kconfig > @@ -5,6 +5,16 @@ config DMABUF_HEAPS_SYSTEM > Choose this option to enable the system dmabuf heap. The system heap > is backed by pages from the buddy allocator. If in doubt, say Y. > > +config DMABUF_HEAPS_SYSTEM_UNCACHED > + bool "DMA-BUF Uncached System Heap" > + depends on DMABUF_HEAPS > + help > + Choose this option to enable the uncached system dmabuf heap. This > + heap is backed by pages from the buddy allocator, but pages are setup > + for write combining. This avoids cache management overhead, and can > + be faster if pages are mostly untouched by the cpu. If in doubt, > + say Y. > + > config DMABUF_HEAPS_CMA > bool "DMA-BUF CMA Heap" > depends on DMABUF_HEAPS && DMA_CMA > diff --git a/drivers/dma-buf/heaps/Makefile b/drivers/dma-buf/heaps/Makefile > index 6e54cdec3da0..085685ec478f 100644 > --- a/drivers/dma-buf/heaps/Makefile > +++ b/drivers/dma-buf/heaps/Makefile > @@ -1,4 +1,5 @@ > # SPDX-License-Identifier: GPL-2.0 > obj-y+= heap-helpers.o > obj-$(CONFIG_DMABUF_HEAPS_SYSTEM)+= system_heap.o > +obj-$(CONFIG_DMABUF_HEAPS_SYSTEM_UNCACHED) += system_uncached_heap.o > obj-$(CONFIG_DMABUF_HEAPS_CMA) += cma_heap.o > diff --git a/drivers/dma-buf/heaps/system_uncached_heap.c > b/drivers/dma-buf/heaps/system_uncached_heap.c > new file mode 100644 > index ..3b8e699bcae7 > --- /dev/null > +++ b/drivers/dma-buf/heaps/system_uncached_heap.c > @@ -0,0 +1,371 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Uncached System DMA-Heap exporter > + * > + * Copyright (C) 2020 Linaro Ltd. > + * > + * Based off of Andrew Davis' SRAM heap: > + * Copyright (C) 2019 Texas Instruments Incorporated - http://www.ti.com/ > + * Andrew F. Davis > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +struct uncached_heap { > + struct dma_heap *heap; > +}; > + > +struct uncached_heap_buffer { > + struct dma_heap *heap; > + struct list_head attachments; > + struct mutex lock; > + unsigned long len; > + struct sg_table sg_table; > + int vmap_cnt; > + void *vaddr; > +}; > + > +struct dma_heap_attachment { > + struct device *dev; > + struct sg_table *table; > + struct list_head list; > +}; > + > +static struct sg_table *dup_sg_table(struct sg_table *table) > +{ > + struct sg_table *new_table; > + int ret, i; > + struct scatterlist *sg, *new_sg; > + > + new_table = kzalloc(sizeof(*new_table), GFP_KERNEL); > +
[RFC][PATCH v2 2/2] dma-heap: Add a system-uncached heap
This adds a heap that allocates non-contiguous buffers that are marked as writecombined, so they are not cached by the CPU. This is useful, as most graphics buffers are usually not touched by the CPU or only written into once by the CPU. So when mapping the buffer over and over between devices, we can skip the CPU syncing, which saves a lot of cache management overhead, greatly improving performance. For folk using ION, there was a ION_FLAG_CACHED flag, which signaled if the returned buffer should be CPU cacheable or not. With DMA-BUF heaps, we have no such flag, and by default the current heaps (system and cma) produce CPU cachable buffers. So for folks transitioning from ION to DMA-BUF Heaps, this fills in some of that missing functionality. This does have a few "ugly" bits that were required to get the buffer properly flushed out initially which I'd like to improve. So feedback would be very welcome! Many thanks to Liam Mark for his help to get this working. Cc: Sumit Semwal Cc: Andrew F. Davis Cc: Benjamin Gaignard Cc: Liam Mark Cc: Laura Abbott Cc: Brian Starkey Cc: Hridya Valsaraju Cc: Robin Murphy Cc: linux-me...@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: John Stultz --- v2: * Fix build issue on sh reported-by: kernel test robot * Rework to use for_each_sgtable_sg(), dma_map_sgtable(), and for_each_sg_page() along with numerous other cleanups suggested by Robin Murphy --- drivers/dma-buf/heaps/Kconfig| 10 + drivers/dma-buf/heaps/Makefile | 1 + drivers/dma-buf/heaps/system_uncached_heap.c | 371 +++ 3 files changed, 382 insertions(+) create mode 100644 drivers/dma-buf/heaps/system_uncached_heap.c diff --git a/drivers/dma-buf/heaps/Kconfig b/drivers/dma-buf/heaps/Kconfig index a5eef06c4226..420b0ed0a512 100644 --- a/drivers/dma-buf/heaps/Kconfig +++ b/drivers/dma-buf/heaps/Kconfig @@ -5,6 +5,16 @@ config DMABUF_HEAPS_SYSTEM Choose this option to enable the system dmabuf heap. The system heap is backed by pages from the buddy allocator. If in doubt, say Y. +config DMABUF_HEAPS_SYSTEM_UNCACHED + bool "DMA-BUF Uncached System Heap" + depends on DMABUF_HEAPS + help + Choose this option to enable the uncached system dmabuf heap. This + heap is backed by pages from the buddy allocator, but pages are setup + for write combining. This avoids cache management overhead, and can + be faster if pages are mostly untouched by the cpu. If in doubt, + say Y. + config DMABUF_HEAPS_CMA bool "DMA-BUF CMA Heap" depends on DMABUF_HEAPS && DMA_CMA diff --git a/drivers/dma-buf/heaps/Makefile b/drivers/dma-buf/heaps/Makefile index 6e54cdec3da0..085685ec478f 100644 --- a/drivers/dma-buf/heaps/Makefile +++ b/drivers/dma-buf/heaps/Makefile @@ -1,4 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 obj-y += heap-helpers.o obj-$(CONFIG_DMABUF_HEAPS_SYSTEM) += system_heap.o +obj-$(CONFIG_DMABUF_HEAPS_SYSTEM_UNCACHED) += system_uncached_heap.o obj-$(CONFIG_DMABUF_HEAPS_CMA) += cma_heap.o diff --git a/drivers/dma-buf/heaps/system_uncached_heap.c b/drivers/dma-buf/heaps/system_uncached_heap.c new file mode 100644 index ..3b8e699bcae7 --- /dev/null +++ b/drivers/dma-buf/heaps/system_uncached_heap.c @@ -0,0 +1,371 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Uncached System DMA-Heap exporter + * + * Copyright (C) 2020 Linaro Ltd. + * + * Based off of Andrew Davis' SRAM heap: + * Copyright (C) 2019 Texas Instruments Incorporated - http://www.ti.com/ + * Andrew F. Davis + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +struct uncached_heap { + struct dma_heap *heap; +}; + +struct uncached_heap_buffer { + struct dma_heap *heap; + struct list_head attachments; + struct mutex lock; + unsigned long len; + struct sg_table sg_table; + int vmap_cnt; + void *vaddr; +}; + +struct dma_heap_attachment { + struct device *dev; + struct sg_table *table; + struct list_head list; +}; + +static struct sg_table *dup_sg_table(struct sg_table *table) +{ + struct sg_table *new_table; + int ret, i; + struct scatterlist *sg, *new_sg; + + new_table = kzalloc(sizeof(*new_table), GFP_KERNEL); + if (!new_table) + return ERR_PTR(-ENOMEM); + + ret = sg_alloc_table(new_table, table->nents, GFP_KERNEL); + if (ret) { + kfree(new_table); + return ERR_PTR(-ENOMEM); + } + + new_sg = new_table->sgl; + for_each_sgtable_sg(table, sg, i) { + sg_set_page(new_sg, sg_page(sg), sg->length, sg->offset); + new_sg = sg_next(new_sg); + } + + return new_table; +} + +static int dma_heap_attach(struct dma_buf *dmabuf, + struct