Re: [PATCH] media: videobuf2: sync caches for dmabuf memory

2024-07-11 Thread Tomasz Figa
On Thu, Jun 20, 2024 at 3:52 PM Hans Verkuil  wrote:
>
> On 19/06/2024 06:19, Tomasz Figa wrote:
> > On Wed, Jun 19, 2024 at 1:24 AM Nicolas Dufresne  
> > wrote:
> >>
> >> On Tue, Jun 18, 2024 at 16:47 +0900, Tomasz Figa wrote:
> >>> Hi TaoJiang,
> >>>
> >>> On Tue, Jun 18, 2024 at 4:30 PM TaoJiang  wrote:
> >>>>
> >>>> From: Ming Qian 
> >>>>
> >>>> When the memory type is VB2_MEMORY_DMABUF, the v4l2 device can't know
> >>>> whether the dma buffer is coherent or synchronized.
> >>>>
> >>>> The videobuf2-core will skip cache syncs as it thinks the DMA exporter
> >>>> should take care of cache syncs.
> >>>>
> >>>> But in fact it's likely that the client doesn't
> >>>> synchronize the dma buf before qbuf() or after dqbuf(), and it's
> >>>> difficult to find this type of error directly.
> >>>>
> >>>> I think it's helpful that videobuf2-core can call
> >>>> dma_buf_end_cpu_access() and dma_buf_begin_cpu_access() to handle the
> >>>> cache syncs.
> >>>>
> >>>> Signed-off-by: Ming Qian 
> >>>> Signed-off-by: TaoJiang 
> >>>> ---
> >>>>  .../media/common/videobuf2/videobuf2-core.c   | 22 +++
> >>>>  1 file changed, 22 insertions(+)
> >>>>
> >>>
> >>> Sorry, that patch is incorrect. I believe you're misunderstanding the
> >>> way DMA-buf buffers should be managed in userspace. It's userspace's
> >>> responsibility to call the DMA_BUF_IOCTL_SYNC ioctl [1] to signal the
> >>> start and end of CPU access to the kernel and trigger the necessary
> >>> cache synchronization.
> >>>
> >>> [1] https://docs.kernel.org/driver-api/dma-buf.html#dma-buffer-ioctls
> >>>
> >>> So, really sorry, but it's a NAK.
> >>
> >>
> >>
> >> This patch *could* make sense if it were inside the UVC driver, for
> >> example, as this driver can import dmabuf, do CPU memcpy, and omits the
> >> required sync calls (unless that got added recently, I can easily have
> >> missed it).
> >
> > Yeah, currently V4L2 drivers don't call the in-kernel
> > dma_buf_{begin,end}_cpu_access() when they need to access the buffers
> > from the CPU, while my quick grep [1] reveals that we have 68 files
> > retrieving plane vaddr by calling vb2_plane_vaddr() (not necessarily a
> > 100% guarantee of CPU access being done, but rather likely so).
> >
> > I also repeated the same thing with VB2_DMABUF [2] and tried to
> > attribute both lists to specific drivers (by retaining the path until
> > the first - or _ [3]; which seemed to be relatively accurate), leading
> > to the following drivers that claim support for DMABUF while also
> > retrieving plane vaddr (without proper synchronization - no drivers
> > currently call any begin/end CPU access):
> >
> >  i2c/video
> >  pci/bt8xx/bttv
> >  pci/cobalt/cobalt
> >  pci/cx18/cx18
> >  pci/tw5864/tw5864
> >  pci/tw686x/tw686x
> >  platform/allegro
> >  platform/amphion/vpu
> >  platform/chips
> >  platform/intel/pxa
> >  platform/marvell/mcam
> >  platform/mediatek/jpeg/mtk
> >  platform/mediatek/vcodec/decoder/mtk
> >  platform/mediatek/vcodec/encoder/mtk
> >  platform/nuvoton/npcm
> >  platform/nvidia/tegra
> >  platform/nxp/imx
> >  platform/renesas/rcar
> >  platform/renesas/vsp1/vsp1
> >  platform/rockchip/rkisp1/rkisp1
> >  platform/samsung/exynos4
> >  platform/samsung/s5p
> >  platform/st/sti/delta/delta
> >  platform/st/sti/hva/hva
> >  platform/verisilicon/hantro
> >  usb/au0828/au0828
> >  usb/cx231xx/cx231xx
> >  usb/dvb
> >  usb/em28xx/em28xx
> >  usb/gspca/gspca.c
> >  usb/hackrf/hackrf.c
> >  usb/stk1160/stk1160
> >  usb/uvc/uvc
> >
> > which means we potentially have ~30 drivers which likely don't handle
> > imported DMABUFs correctly (there is still a chance that DMABUF is
> > advertised for one queue, while vaddr is used for another).
> >
> > I think we have two options:
> > 1) add vb2_{begin/end}_cpu_access() helpers, carefully audit each
> > driver and add calls to those
>
> I actually started on that 9 (!) years ago:
>
> https://git.linuxtv.org/hverkuil/media_tree.git/log/?h=vb2-cpu-access
>

Re: [PATCH] media: videobuf2: sync caches for dmabuf memory

2024-06-18 Thread Tomasz Figa
On Wed, Jun 19, 2024 at 1:24 AM Nicolas Dufresne  wrote:
>
> On Tue, Jun 18, 2024 at 16:47 +0900, Tomasz Figa wrote:
> > Hi TaoJiang,
> >
> > On Tue, Jun 18, 2024 at 4:30 PM TaoJiang  wrote:
> > >
> > > From: Ming Qian 
> > >
> > > When the memory type is VB2_MEMORY_DMABUF, the v4l2 device can't know
> > > whether the dma buffer is coherent or synchronized.
> > >
> > > The videobuf2-core will skip cache syncs as it thinks the DMA exporter
> > > should take care of cache syncs.
> > >
> > > But in fact it's likely that the client doesn't
> > > synchronize the dma buf before qbuf() or after dqbuf(), and it's
> > > difficult to find this type of error directly.
> > >
> > > I think it's helpful that videobuf2-core can call
> > > dma_buf_end_cpu_access() and dma_buf_begin_cpu_access() to handle the
> > > cache syncs.
> > >
> > > Signed-off-by: Ming Qian 
> > > Signed-off-by: TaoJiang 
> > > ---
> > >  .../media/common/videobuf2/videobuf2-core.c   | 22 +++
> > >  1 file changed, 22 insertions(+)
> > >
> >
> > Sorry, that patch is incorrect. I believe you're misunderstanding the
> > way DMA-buf buffers should be managed in userspace. It's userspace's
> > responsibility to call the DMA_BUF_IOCTL_SYNC ioctl [1] to signal the
> > start and end of CPU access to the kernel and trigger the necessary
> > cache synchronization.
> >
> > [1] https://docs.kernel.org/driver-api/dma-buf.html#dma-buffer-ioctls
> >
> > So, really sorry, but it's a NAK.
>
>
>
> This patch *could* make sense if it were inside the UVC driver, for example,
> as this driver can import dmabuf, do CPU memcpy, and omits the required sync
> calls (unless that got added recently, I can easily have missed it).

Yeah, currently V4L2 drivers don't call the in-kernel
dma_buf_{begin,end}_cpu_access() when they need to access the buffers
from the CPU, while my quick grep [1] reveals that we have 68 files
retrieving plane vaddr by calling vb2_plane_vaddr() (not necessarily a
100% guarantee of CPU access being done, but rather likely so).

I also repeated the same thing with VB2_DMABUF [2] and tried to
attribute both lists to specific drivers (by retaining the path until
the first - or _ [3]; which seemed to be relatively accurate), leading
to the following drivers that claim support for DMABUF while also
retrieving plane vaddr (without proper synchronization - no drivers
currently call any begin/end CPU access):

 i2c/video
 pci/bt8xx/bttv
 pci/cobalt/cobalt
 pci/cx18/cx18
 pci/tw5864/tw5864
 pci/tw686x/tw686x
 platform/allegro
 platform/amphion/vpu
 platform/chips
 platform/intel/pxa
 platform/marvell/mcam
 platform/mediatek/jpeg/mtk
 platform/mediatek/vcodec/decoder/mtk
 platform/mediatek/vcodec/encoder/mtk
 platform/nuvoton/npcm
 platform/nvidia/tegra
 platform/nxp/imx
 platform/renesas/rcar
 platform/renesas/vsp1/vsp1
 platform/rockchip/rkisp1/rkisp1
 platform/samsung/exynos4
 platform/samsung/s5p
 platform/st/sti/delta/delta
 platform/st/sti/hva/hva
 platform/verisilicon/hantro
 usb/au0828/au0828
 usb/cx231xx/cx231xx
 usb/dvb
 usb/em28xx/em28xx
 usb/gspca/gspca.c
 usb/hackrf/hackrf.c
 usb/stk1160/stk1160
 usb/uvc/uvc

which means we potentially have ~30 drivers which likely don't handle
imported DMABUFs correctly (there is still a chance that DMABUF is
advertised for one queue, while vaddr is used for another).

I think we have two options:
1) add vb2_{begin/end}_cpu_access() helpers, carefully audit each
driver and add calls to those
2) take a heavy gun approach and just call vb2_begin_cpu_access()
whenever vb2_plane_vaddr() is called and then vb2_end_cpu_access()
whenever vb2_buffer_done() is called (if begin was called before).

The latter has the disadvantage that drivers have no control over the
timing of the cache sync, so it could end up with less-than-optimal
performance. There could also be some more complex cases, where the
driver needs to mix DMA and CPU accesses to the buffer, so the fixed
sequence just wouldn't work for them. (But then they just wouldn't
work today either.)

Hans, Marek, do you have any thoughts? (I'd personally just go with 2
and if any driver in the future needs something else, they could call
begin/end CPU access manually.)
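For the record, option 2 can be modeled in a few lines. The sketch below is plain userspace C, not kernel code: the model_* names are hypothetical stand-ins for vb2_plane_vaddr(), vb2_buffer_done() and the dma_buf_{begin,end}_cpu_access() calls, and only illustrate the proposed sequencing.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model of option 2 -- NOT kernel code.  The model_* names
 * are hypothetical stand-ins for the vb2 and dma-buf calls named in
 * the text, used only to illustrate the proposed sequencing.
 */
struct model_buffer {
	void *vaddr;		/* plane kernel mapping */
	bool cpu_synced;	/* begin_cpu_access has been called */
	int begin_calls;
	int end_calls;
};

/* Would wrap dma_buf_begin_cpu_access() on every plane. */
static void model_begin_cpu_access(struct model_buffer *vb)
{
	vb->begin_calls++;
	vb->cpu_synced = true;
}

/* Would wrap dma_buf_end_cpu_access() on every plane. */
static void model_end_cpu_access(struct model_buffer *vb)
{
	vb->end_calls++;
	vb->cpu_synced = false;
}

/* vb2_plane_vaddr(): begin CPU access on first use only. */
static void *model_plane_vaddr(struct model_buffer *vb)
{
	if (!vb->cpu_synced)
		model_begin_cpu_access(vb);
	return vb->vaddr;
}

/* vb2_buffer_done(): end CPU access only if it was begun. */
static void model_buffer_done(struct model_buffer *vb)
{
	if (vb->cpu_synced)
		model_end_cpu_access(vb);
}
```

With this sequencing, repeated vaddr lookups on the same buffer sync only once, and buffers never touched by the CPU pay no cache-maintenance cost at all.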

[1] git grep vb2_plane_vaddr | cut -d":" -f 1 | sort | uniq
[2] git grep VB2_DMABUF | cut -d":" -f 1 | sort | uniq
[3] by running [1] and [2] through | cut -d"-" -f 1 | cut -d"_" -f 1 | uniq
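For anyone wanting to reproduce the attribution trick from [3] without a media tree checkout, here is a self-contained sketch on three sample paths (using `sort -u` in place of the original `sort | uniq`):

```shell
# Keep each path up to the first '-' or '_' and de-duplicate,
# mirroring footnote [3].
printf '%s\n' \
    drivers/media/usb/uvc/uvc_queue.c \
    drivers/media/usb/uvc/uvc_video.c \
    drivers/media/pci/cx18/cx18-fileops.c |
    cut -d"-" -f 1 | cut -d"_" -f 1 | sort -u
# -> drivers/media/pci/cx18/cx18
# -> drivers/media/usb/uvc/uvc
```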

Best,
Tomasz

>
> But generally speaking, bracketing all driver with CPU access synchronization
> does not make sense indeed, so I second the rejection.
>
> Nicolas
&

Re: [PATCH] media: videobuf2: sync caches for dmabuf memory

2024-06-18 Thread Tomasz Figa
Hi TaoJiang,

On Tue, Jun 18, 2024 at 4:30 PM TaoJiang  wrote:
>
> From: Ming Qian 
>
> When the memory type is VB2_MEMORY_DMABUF, the v4l2 device can't know
> whether the dma buffer is coherent or synchronized.
>
> The videobuf2-core will skip cache syncs as it thinks the DMA exporter
> should take care of cache syncs.
>
> But in fact it's likely that the client doesn't
> synchronize the dma buf before qbuf() or after dqbuf(), and it's
> difficult to find this type of error directly.
>
> I think it's helpful that videobuf2-core can call
> dma_buf_end_cpu_access() and dma_buf_begin_cpu_access() to handle the
> cache syncs.
>
> Signed-off-by: Ming Qian 
> Signed-off-by: TaoJiang 
> ---
>  .../media/common/videobuf2/videobuf2-core.c   | 22 +++
>  1 file changed, 22 insertions(+)
>

Sorry, that patch is incorrect. I believe you're misunderstanding the
way DMA-buf buffers should be managed in userspace. It's userspace's
responsibility to call the DMA_BUF_IOCTL_SYNC ioctl [1] to signal the
start and end of CPU access to the kernel and trigger the necessary
cache synchronization.

[1] https://docs.kernel.org/driver-api/dma-buf.html#dma-buffer-ioctls

So, really sorry, but it's a NAK.
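For reference, the userspace side of that contract is small. A hedged sketch (buf_fd is assumed to be a DMA-buf file descriptor obtained elsewhere, e.g. exported by a dma-heap or another driver):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/*
 * Sketch of the userspace contract described above: bracket every CPU
 * access to a DMA-buf with DMA_BUF_IOCTL_SYNC.  buf_fd is assumed to
 * be a DMA-buf fd obtained elsewhere.
 */
static int cpu_access_begin(int buf_fd, uint64_t rw_flags)
{
	struct dma_buf_sync sync = { .flags = DMA_BUF_SYNC_START | rw_flags };

	return ioctl(buf_fd, DMA_BUF_IOCTL_SYNC, &sync) ? -errno : 0;
}

static int cpu_access_end(int buf_fd, uint64_t rw_flags)
{
	struct dma_buf_sync sync = { .flags = DMA_BUF_SYNC_END | rw_flags };

	return ioctl(buf_fd, DMA_BUF_IOCTL_SYNC, &sync) ? -errno : 0;
}
```

A client filling an OUTPUT buffer with the CPU would call cpu_access_begin(fd, DMA_BUF_SYNC_WRITE) before writing and cpu_access_end() when done, before VIDIOC_QBUF; reading a dequeued CAPTURE buffer gets the same bracketing with DMA_BUF_SYNC_READ.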

Best regards,
Tomasz

> diff --git a/drivers/media/common/videobuf2/videobuf2-core.c 
> b/drivers/media/common/videobuf2/videobuf2-core.c
> index 358f1fe42975..4734ff9cf3ce 100644
> --- a/drivers/media/common/videobuf2/videobuf2-core.c
> +++ b/drivers/media/common/videobuf2/videobuf2-core.c
> @@ -340,6 +340,17 @@ static void __vb2_buf_mem_prepare(struct vb2_buffer *vb)
> vb->synced = 1;
> for (plane = 0; plane < vb->num_planes; ++plane)
> call_void_memop(vb, prepare, vb->planes[plane].mem_priv);
> +
> +   if (vb->memory != VB2_MEMORY_DMABUF)
> +   return;
> +   for (plane = 0; plane < vb->num_planes; ++plane) {
> +   struct dma_buf *dbuf = vb->planes[plane].dbuf;
> +
> +   if (!dbuf)
> +   continue;
> +
> +   dma_buf_end_cpu_access(dbuf, vb->vb2_queue->dma_dir);
> +   }
>  }
>
>  /*
> @@ -356,6 +367,17 @@ static void __vb2_buf_mem_finish(struct vb2_buffer *vb)
> vb->synced = 0;
> for (plane = 0; plane < vb->num_planes; ++plane)
> call_void_memop(vb, finish, vb->planes[plane].mem_priv);
> +
> +   if (vb->memory != VB2_MEMORY_DMABUF)
> +   return;
> +   for (plane = 0; plane < vb->num_planes; ++plane) {
> +   struct dma_buf *dbuf = vb->planes[plane].dbuf;
> +
> +   if (!dbuf)
> +   continue;
> +
> +   dma_buf_begin_cpu_access(dbuf, vb->vb2_queue->dma_dir);
> +   }
>  }
>
>  /*
> --
> 2.43.0-rc1
>


Re: [PATCH v6,12/24] media: mediatek: vcodec: add interface to allocate/free secure memory

2024-06-17 Thread Tomasz Figa
On Mon, Jun 17, 2024 at 3:53 PM Yong Wu (吴勇)  wrote:
>
> On Wed, 2024-06-12 at 14:22 +0900, Tomasz Figa wrote:
> >
> > External email : Please do not click links or open attachments until
> > you have verified the sender or the content.
> >  On Thu, May 16, 2024 at 08:20:50PM +0800, Yunfei Dong wrote:
> > > Need to call dma heap interface to allocate/free secure memory when
> > playing
> > > secure video.
> > >
> > > Signed-off-by: Yunfei Dong 
> > > ---
> > >  .../media/platform/mediatek/vcodec/Kconfig|   1 +
> > >  .../mediatek/vcodec/common/mtk_vcodec_util.c  | 122
> > +-
> > >  .../mediatek/vcodec/common/mtk_vcodec_util.h  |   3 +
> > >  3 files changed, 123 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/media/platform/mediatek/vcodec/Kconfig
> > b/drivers/media/platform/mediatek/vcodec/Kconfig
> > > index bc8292232530..707865703e61 100644
> > > --- a/drivers/media/platform/mediatek/vcodec/Kconfig
> > > +++ b/drivers/media/platform/mediatek/vcodec/Kconfig
> > > @@ -17,6 +17,7 @@ config VIDEO_MEDIATEK_VCODEC
>
> [snip]
>
> > > -void mtk_vcodec_mem_free(void *priv, struct mtk_vcodec_mem *mem)
> > > +static int mtk_vcodec_mem_alloc_sec(struct mtk_vcodec_dec_ctx
> > *ctx, struct mtk_vcodec_mem *mem)
> > > +{
> > > +struct device *dev = &ctx->dev->plat_dev->dev;
> > > +struct dma_buf *dma_buffer;
> > > +struct dma_heap *vdec_heap;
> > > +struct dma_buf_attachment *attach;
> > > +struct sg_table *sgt;
> > > +unsigned long size = mem->size;
> > > +int ret = 0;
> > > +
> > > +if (!size)
> > > +return -EINVAL;
> > > +
> > > +vdec_heap = dma_heap_find("restricted_mtk_cma");
> > > +if (!vdec_heap) {
> > > +mtk_v4l2_vdec_err(ctx, "dma heap find failed!");
> > > +return -EPERM;
> > > +}
> >
> > How is the heap name determined here? My recollection is that the
> > heap
> > name comes from the heap node in the DT, so it may vary depending on
> > the
> > board.
> >
> > Is the heap name documented anywhere in the DT bindings?
> >
> > Shouldn't we rather query DT for a phandle to the right heap?
> >
>
> Hi Tomasz,
>
> This heap name does not come from dt-binding. It is hard-coded in the
> driver[1]. Because the heap driver is a pure SW driver, there is no
> corresponding HW unit, and there is no way to add a dtsi node.
>
> [1]
> https://lore.kernel.org/linux-mediatek/20240515112308.10171-10-yong...@mediatek.com/

Okay, I see. Thanks for clarifying.

Best regards,
Tomasz


Re: [PATCH v6,12/24] media: mediatek: vcodec: add interface to allocate/free secure memory

2024-06-11 Thread Tomasz Figa
On Thu, May 16, 2024 at 08:20:50PM +0800, Yunfei Dong wrote:
> Need to call dma heap interface to allocate/free secure memory when playing
> secure video.
> 
> Signed-off-by: Yunfei Dong 
> ---
>  .../media/platform/mediatek/vcodec/Kconfig|   1 +
>  .../mediatek/vcodec/common/mtk_vcodec_util.c  | 122 +-
>  .../mediatek/vcodec/common/mtk_vcodec_util.h  |   3 +
>  3 files changed, 123 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/media/platform/mediatek/vcodec/Kconfig 
> b/drivers/media/platform/mediatek/vcodec/Kconfig
> index bc8292232530..707865703e61 100644
> --- a/drivers/media/platform/mediatek/vcodec/Kconfig
> +++ b/drivers/media/platform/mediatek/vcodec/Kconfig
> @@ -17,6 +17,7 @@ config VIDEO_MEDIATEK_VCODEC
>   depends on VIDEO_MEDIATEK_VPU || !VIDEO_MEDIATEK_VPU
>   depends on MTK_SCP || !MTK_SCP
>   depends on MTK_SMI || (COMPILE_TEST && MTK_SMI=n)
> + depends on DMABUF_HEAPS
>   select VIDEOBUF2_DMA_CONTIG
>   select V4L2_MEM2MEM_DEV
>   select VIDEO_MEDIATEK_VCODEC_VPU if VIDEO_MEDIATEK_VPU
> diff --git a/drivers/media/platform/mediatek/vcodec/common/mtk_vcodec_util.c 
> b/drivers/media/platform/mediatek/vcodec/common/mtk_vcodec_util.c
> index c60e4c193b25..5958dcd7965a 100644
> --- a/drivers/media/platform/mediatek/vcodec/common/mtk_vcodec_util.c
> +++ b/drivers/media/platform/mediatek/vcodec/common/mtk_vcodec_util.c
> @@ -5,9 +5,11 @@
>  *Tiffany Lin 
>  */
>  
> +#include 
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "../decoder/mtk_vcodec_dec_drv.h"
>  #include "../encoder/mtk_vcodec_enc_drv.h"
> @@ -45,7 +47,7 @@ int mtk_vcodec_write_vdecsys(struct mtk_vcodec_dec_ctx 
> *ctx, unsigned int reg,
>  }
>  EXPORT_SYMBOL(mtk_vcodec_write_vdecsys);
>  
> -int mtk_vcodec_mem_alloc(void *priv, struct mtk_vcodec_mem *mem)
> +static int mtk_vcodec_mem_alloc_nor(void *priv, struct mtk_vcodec_mem *mem)
>  {
>   enum mtk_instance_type inst_type = *((unsigned int *)priv);
>   struct platform_device *plat_dev;
> @@ -75,9 +77,71 @@ int mtk_vcodec_mem_alloc(void *priv, struct mtk_vcodec_mem 
> *mem)
>  
>   return 0;
>  }
> -EXPORT_SYMBOL(mtk_vcodec_mem_alloc);
>  
> -void mtk_vcodec_mem_free(void *priv, struct mtk_vcodec_mem *mem)
> +static int mtk_vcodec_mem_alloc_sec(struct mtk_vcodec_dec_ctx *ctx, struct 
> mtk_vcodec_mem *mem)
> +{
> + struct device *dev = &ctx->dev->plat_dev->dev;
> + struct dma_buf *dma_buffer;
> + struct dma_heap *vdec_heap;
> + struct dma_buf_attachment *attach;
> + struct sg_table *sgt;
> + unsigned long size = mem->size;
> + int ret = 0;
> +
> + if (!size)
> + return -EINVAL;
> +
> + vdec_heap = dma_heap_find("restricted_mtk_cma");
> + if (!vdec_heap) {
> + mtk_v4l2_vdec_err(ctx, "dma heap find failed!");
> + return -EPERM;
> + }

How is the heap name determined here? My recollection is that the heap
name comes from the heap node in the DT, so it may vary depending on the
board.

Is the heap name documented anywhere in the DT bindings?

Shouldn't we rather query DT for a phandle to the right heap?

> +
> + dma_buffer = dma_heap_buffer_alloc(vdec_heap, size, 
> DMA_HEAP_VALID_FD_FLAGS,
> +DMA_HEAP_VALID_HEAP_FLAGS);
> + if (IS_ERR_OR_NULL(dma_buffer)) {
> + mtk_v4l2_vdec_err(ctx, "dma heap alloc size=0x%lx failed!", 
> size);
> + return PTR_ERR(dma_buffer);

This will be incorrect if NULL was returned, because the function will
return 0. Does dma_heap_buffer_alloc() actually return NULL?
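The pitfall is easy to demonstrate with the (trivial) definitions of these helpers, re-implemented here in userspace purely for illustration -- the real ones live in include/linux/err.h:

```c
#include <assert.h>

/*
 * Userspace re-implementation of the kernel's error-pointer helpers
 * (see include/linux/err.h), only to illustrate the review comment.
 */
#define MAX_ERRNO	4095

#define IS_ERR_VALUE(x)	((unsigned long)(x) >= (unsigned long)-MAX_ERRNO)

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return IS_ERR_VALUE((unsigned long)ptr);
}

static inline int IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}
```

So if the allocator really can return NULL, `ret = PTR_ERR(dma_buffer)` yields 0 and the caller reports success without a buffer; the fix is either to check IS_ERR() alone (if NULL is impossible) or to map NULL to an explicit error such as -ENOMEM.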

> + }
> +
> + attach = dma_buf_attach(dma_buffer, dev);
> + if (IS_ERR_OR_NULL(attach)) {
> + mtk_v4l2_vdec_err(ctx, "dma attach size=0x%lx failed!", size);
> + ret = PTR_ERR(attach);

Ditto.

> + goto err_attach;
> + }
> +
> + sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
> + if (IS_ERR_OR_NULL(sgt)) {
> + mtk_v4l2_vdec_err(ctx, "dma map attach size=0x%lx failed!", 
> size);
> + ret = PTR_ERR(sgt);

Ditto.

> + goto err_sgt;
> + }
> +
> + mem->va = dma_buffer;

Isn't this field supposed to point to the kernel mapping of the buffer
itself? If we need to store the dma_buf pointer, we should probably add
a separate field to avoid (potentially serious) bugs.

> + mem->dma_addr = (dma_addr_t)sg_dma_address((sgt)->sgl);

Why is this type cast necessary here?

> +
> + if (!mem->va || !mem->dma_addr) {

I don't think either of these two conditions is possible, since we
already checked for successful completion of the functions above. Also,
0 is a valid DMA address, so it shouldn't be considered an error.

> + mtk_v4l2_vdec_err(ctx, "dma buffer size=0x%lx failed!", size);
> + ret = -EPERM;
> + goto err_addr;
> + }
> +
> + mem->attach = attach;
> + mem->sgt = sgt;
> +
> + return 0;
> +err_addr:

Re: [PATCH v6,04/24] v4l: add documentation for restricted memory flag

2024-06-11 Thread Tomasz Figa
On Wed, May 22, 2024 at 02:16:22PM +0300, Laurent Pinchart wrote:
> Hi Jefrey,
> 
> Thank you for the patch.
> 
> On Thu, May 16, 2024 at 08:20:42PM +0800, Yunfei Dong wrote:
> > From: Jeffrey Kardatzke 
> > 
> > Adds documentation for V4L2_MEMORY_FLAG_RESTRICTED.
> > 
> > Signed-off-by: Jeffrey Kardatzke 
> > Signed-off-by: Yunfei Dong 
> > ---
> >  Documentation/userspace-api/media/v4l/buffer.rst | 10 +-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/Documentation/userspace-api/media/v4l/buffer.rst 
> > b/Documentation/userspace-api/media/v4l/buffer.rst
> > index 52bbee81c080..807e43bfed2b 100644
> > --- a/Documentation/userspace-api/media/v4l/buffer.rst
> > +++ b/Documentation/userspace-api/media/v4l/buffer.rst
> > @@ -696,7 +696,7 @@ enum v4l2_memory
> >  
> >  .. _memory-flags:
> >  
> > -Memory Consistency Flags
> > +Memory Flags
> >  
> >  
> >  .. raw:: latex
> > @@ -728,6 +728,14 @@ Memory Consistency Flags
> > only if the buffer is used for :ref:`memory mapping ` I/O and the
> > queue reports the :ref:`V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS
> > ` capability.
> > +* .. _`V4L2-MEMORY-FLAG-RESTRICTED`:
> > +
> > +  - ``V4L2_MEMORY_FLAG_RESTRICTED``
> > +  - 0x0002
> > +  - The queued buffers are expected to be in restricted memory. If 
> > not, an
> > +   error will be returned. This flag can only be used with 
> > ``V4L2_MEMORY_DMABUF``.
> > +   Typically restricted buffers are allocated using a restricted dma-heap. 
> > This flag
> > +   can only be specified if the ``V4L2_BUF_CAP_SUPPORTS_RESTRICTED_MEM`` 
> > is set.
> 
> Why is this flag needed ? Given that the usage model requires the V4L2
> device to be a dma buf importer, why would userspace set the
> V4L2_BUF_CAP_SUPPORTS_RESTRICTED_MEM flag and pass a non-restricted
> buffer to the device ?

Given that the flag is specified at REQBUF / CREATE_BUFS time, it's
actually useful to tell the driver the queue is operating in restricted
(aka secure) mode.

I suppose we could handle that at the time of a first QBUF, but that
would make the driver initialization and validation quite a bit of pain.
So I'd say that the design being proposed here makes things simpler and
more clear, even if it doesn't add any extra functionality.

> 
> The V4L2_BUF_CAP_SUPPORTS_RESTRICTED_MEM flag also needs to be
> documented in the relevant section, I don't think that's done in this
> series.
> 

+1

Best regards,
Tomasz

> >  
> >  .. raw:: latex
> >  
> 
> -- 
> Regards,
> 
> Laurent Pinchart


Re: [PATCH v6,03/24] v4l2: verify restricted dmabufs are used in restricted queue

2024-06-11 Thread Tomasz Figa
On Thu, May 16, 2024 at 08:20:41PM +0800, Yunfei Dong wrote:
> From: Jeffrey Kardatzke 
> 
> Verifies in the dmabuf implementations that, if the restricted memory
> flag is set for a queue, the dmabuf submitted to the queue is
> unmappable.
> 
> Signed-off-by: Jeffrey Kardatzke 
> Signed-off-by: Yunfei Dong 
> ---
>  drivers/media/common/videobuf2/videobuf2-dma-contig.c | 8 
>  drivers/media/common/videobuf2/videobuf2-dma-sg.c | 8 
>  2 files changed, 16 insertions(+)
> 
> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c 
> b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> index 3d4fd4ef5310..35a3c1c01eae 100644
> --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> @@ -710,6 +710,14 @@ static int vb2_dc_map_dmabuf(void *mem_priv)
>   return -EINVAL;
>   }
>  
> + /* Verify the dmabuf is restricted if we are in restricted mode, this 
> is done
> +  * by validating there is no page entry for the dmabuf.
> +  */

Kernel coding style [1] defines multi-line comments to start with an empty
line.

[1] https://www.kernel.org/doc/html/latest/process/coding-style.html#commenting

> + if (buf->vb->vb2_queue->restricted_mem && 
> !sg_dma_is_restricted(sgt->sgl)) {
> + pr_err("restricted queue requires restricted dma_buf");
> + return -EINVAL;

This would leak the mapping. We need to unmap the attachment here.

> + }
> +
>   /* checking if dmabuf is big enough to store contiguous chunk */
>   contig_size = vb2_dc_get_contiguous_size(sgt);
>   if (contig_size < buf->size) {
> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c 
> b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> index 6975a71d740f..2399a9c074ba 100644
> --- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> @@ -570,6 +570,14 @@ static int vb2_dma_sg_map_dmabuf(void *mem_priv)
>   return -EINVAL;
>   }
>  
> + /* Verify the dmabuf is restricted if we are in restricted mode, this 
> is done
> +  * by validating there is no page entry for the dmabuf.
> +  */

Ditto.

> + if (buf->vb->vb2_queue->restricted_mem && 
> !sg_dma_is_restricted(sgt->sgl)) {
> + pr_err("restricted queue requires restricted dma_buf");
> + return -EINVAL;

Ditto.

Best regards,
Tomasz

> + }
> +
>   buf->dma_sgt = sgt;
>   buf->vaddr = NULL;
>  
> -- 
> 2.25.1
> 


Re: [RFC]: shmem fd for non-DMA buffer sharing cross drivers

2023-08-23 Thread Tomasz Figa
On Wed, Aug 23, 2023 at 4:11 PM Hsia-Jun Li  wrote:
>
>
>
> On 8/23/23 12:46, Tomasz Figa wrote:
> > CAUTION: Email originated externally, do not click links or open 
> > attachments unless you recognize the sender and know the content is safe.
> >
> >
> > Hi Hsia-Jun,
> >
> > On Tue, Aug 22, 2023 at 8:14 PM Hsia-Jun Li  wrote:
> >>
> >> Hello
> >>
> >> I would like to introduce a usage of SHMEM similar to DMA-buf, the major
> >> purpose of that is sharing metadata or just a pure container for cross
> >> drivers.
> >>
> >> We need to exchange some sort of metadata between drivers, like dynamic
> >> HDR data between video4linux2 and DRM.
> >
> > If the metadata isn't too big, would it be enough to just have the
> > kernel copy_from_user() to a kernel buffer in the ioctl code?
> >
> >> Or the graphics frame buffer is
> >> too complex to be described with plain plane's DMA-buf fd.
> >> An issue between DRM and V4L2 is that DRM can only support 4 planes
> >> while it is 8 for V4L2. It would be pretty hard for DRM to extend its
> >> interface to support those 4 more planes, which would lead to revision
> >> of many standards like Vulkan, EGL.
> >
> > Could you explain how a shmem buffer could be used to support frame
> > buffers with more than 4 planes?
> If you are asking why we need this:

I'm asking how your proposal to use shmem FD solves the problem for those cases.

> 1. metadata like dynamic HDR tone data
> 2. DRM also challenges with this problem, let me quote what sima said:
> "another trick that we iirc used for afbc is that sometimes the planes
> have a fixed layout
> like nv12
> and so logically it's multiple planes, but you only need one plane slot
> to describe the buffer
> since I think afbc had the "we need more than 4 planes" issue too"
>
> Unfortunately, there are vendor pixel formats that are not fixed layout.
>
> 3. Secure(REE, trusted video piepline) info.
>
> As for how to assign such metadata:
> in the case of a drm fb_id, it is simple, we just add a drm plane property
> for it. The V4L2 interface is not as flexible; we could only pass it into
> the CAPTURE request_fd as a control.
> >>
> >> Also, there is no reason to consume a device's memory for content
> >> that the device can't read, or to waste an IOMMU entry on such data.
> >
> > That's right, but DMA-buf doesn't really imply any of those. DMA-buf
> > is just a kernel object with some backing memory. It's up to the
> > allocator to decide how the backing memory is allocated and up to the
> > importer on whether it would be mapped into an IOMMU.
> >
> I just want to say it can't be allocated in the same place as those
> DMA-bufs (graphics or compressed bitstream).
> This could also answer your first question: if we place this kind of
> buffer in a plane for DMABUF (importing) in V4L2, the V4L2 core would
> try to prepare it, which could map it into the IOMMU.
>

V4L2 core will prepare it according to the struct device that is given
to it. For the planes that don't have to go to the hardware a struct
device could be given that doesn't require any DMA mapping. Also you
can check how the uvcvideo driver handles it. It doesn't use the vb2
buffers directly, but always writes to them using CPU (due to how the
UVC protocol is designed).

> >> Usually, such a metadata would be the value should be written to a
> >> hardware's registers, a 4KiB page would be 1024 items of 32 bits registers.
> >>
> >> Still, I have some problems with SHMEM:
> >> 1. I don't want the userspace to modify the contents of the SHMEM allocated
> >> by the kernel, is there a way to do so?
> >
> > This is generally impossible without doing any of the two:
> > 1) copying the contents to an internal buffer not accessible to the
> > userspace, OR
> > 2) modifying any of the buffer mappings to read-only
> >
> > 2) can actually be more costly than 1) (depending on the architecture,
> > data size, etc.), so we shouldn't just discard the option of a simple
> > copy_from_user() in the ioctl.
> >
> I don't want the userspace to access it at all. So that won't be a problem.

In this case, wouldn't it be enough to have a DMA-buf exporter that
doesn't provide the mmap op?

> >> 2. Should I create a helper function for installing the SHMEM file as a fd?
> >
> > We already have the udmabuf device [1] to turn a memfd into a DMA-buf,
> > so maybe that would be enough?
> >
> > [1] 
> > https://elixir.bootlin.com/linux/v6.5-rc7/source/drivers/dma-buf/udmabuf.c
> >
> It is the kernel driver that allocates this buffer. For example, v4l2
> CAPTURE allocates a buffer for metadata on VIDIOC_REQBUFS.
> Or GBM give you a fd which is assigned with a surface.
>
> So we need a kernel interface.

Sorry, I'm confused. If we're talking about buffers allocated by the
specific allocators like V4L2 or GBM, why do we need SHMEM at all?

Best,
Tomasz

> > Best,
> > Tomasz
> >
> >>
> >> --
> >> Hsia-Jun(Randy) Li
>
> --
> Hsia-Jun(Randy) Li


Re: [RFC]: shmem fd for non-DMA buffer sharing cross drivers

2023-08-22 Thread Tomasz Figa
Hi Hsia-Jun,

On Tue, Aug 22, 2023 at 8:14 PM Hsia-Jun Li  wrote:
>
> Hello
>
> I would like to introduce a usage of SHMEM similar to DMA-buf, the major
> purpose of that is sharing metadata or just a pure container for cross
> drivers.
>
> We need to exchange some sort of metadata between drivers, like dynamic
> HDR data between video4linux2 and DRM.

If the metadata isn't too big, would it be enough to just have the
kernel copy_from_user() to a kernel buffer in the ioctl code?

> Or the graphics frame buffer is
> too complex to be described with plain plane's DMA-buf fd.
> An issue between DRM and V4L2 is that DRM can only support 4 planes
> while it is 8 for V4L2. It would be pretty hard for DRM to extend its
> interface to support those 4 more planes, which would lead to revision
> of many standards like Vulkan, EGL.

Could you explain how a shmem buffer could be used to support frame
buffers with more than 4 planes?

>
> Also, there is no reason to consume a device's memory for content
> that the device can't read, or to waste an IOMMU entry on such data.

That's right, but DMA-buf doesn't really imply any of those. DMA-buf
is just a kernel object with some backing memory. It's up to the
allocator to decide how the backing memory is allocated and up to the
importer on whether it would be mapped into an IOMMU.

> Usually, such a metadata would be the value should be written to a
> hardware's registers, a 4KiB page would be 1024 items of 32 bits registers.
>
> Still, I have some problems with SHMEM:
> 1. I don't want the userspace to modify the contents of the SHMEM allocated
> by the kernel, is there a way to do so?

This is generally impossible without doing any of the two:
1) copying the contents to an internal buffer not accessible to the
userspace, OR
2) modifying any of the buffer mappings to read-only

2) can actually be more costly than 1) (depending on the architecture,
data size, etc.), so we shouldn't just discard the option of a simple
copy_from_user() in the ioctl.
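If read-only mappings are acceptable, memfd sealing gives a cheap variant of 2): the producer writes the metadata first, then seals the fd so that no future writable mapping (or write()) is possible. A sketch, assuming Linux >= 5.1 for F_SEAL_FUTURE_WRITE; make_readonly_metadata_fd() is a made-up name for illustration:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Sketch of option 2): the producer side writes the metadata first,
 * then seals the memfd so no *future* writable mapping or write() is
 * possible.  Assumes Linux >= 5.1 (F_SEAL_FUTURE_WRITE).
 */
static int make_readonly_metadata_fd(const void *data, size_t size)
{
	int fd = memfd_create("metadata", MFD_ALLOW_SEALING);

	if (fd < 0)
		return -1;
	if (write(fd, data, size) != (ssize_t)size ||
	    fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK | F_SEAL_FUTURE_WRITE) < 0) {
		close(fd);
		return -1;
	}
	return fd;	/* mmap(PROT_WRITE, MAP_SHARED) on this fd now fails */
}
```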

> 2. Should I create a helper function for installing the SHMEM file as a fd?

We already have the udmabuf device [1] to turn a memfd into a DMA-buf,
so maybe that would be enough?

[1] https://elixir.bootlin.com/linux/v6.5-rc7/source/drivers/dma-buf/udmabuf.c
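For completeness, a sketch of that route -- memfd in, DMA-buf fd out. It assumes CONFIG_UDMABUF (/dev/udmabuf), a page-aligned size, and a memfd created with MFD_ALLOW_SEALING; memfd_to_dmabuf() is a made-up helper name:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/udmabuf.h>

/*
 * Sketch of the udmabuf route from [1]: turn a memfd into a DMA-buf.
 * Assumes CONFIG_UDMABUF (/dev/udmabuf), a page-aligned size, and a
 * memfd created with MFD_ALLOW_SEALING.
 */
static int memfd_to_dmabuf(int memfd, size_t size)
{
	struct udmabuf_create create = {
		.memfd = memfd,
		.offset = 0,
		.size = size,
	};
	int dev_fd, buf_fd;

	/* udmabuf requires the memfd to be sealed against shrinking. */
	if (fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK) < 0)
		return -1;

	dev_fd = open("/dev/udmabuf", O_RDWR);
	if (dev_fd < 0)
		return -1;

	buf_fd = ioctl(dev_fd, UDMABUF_CREATE, &create);
	close(dev_fd);
	return buf_fd < 0 ? -1 : buf_fd;	/* a DMA-buf fd, or -1 */
}
```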

Best,
Tomasz

>
> --
> Hsia-Jun(Randy) Li


Re: Try to address the DMA-buf coherency problem

2022-12-11 Thread Tomasz Figa
On Fri, Dec 9, 2022 at 6:32 PM Pekka Paalanen  wrote:
>
> On Fri, 9 Dec 2022 17:26:06 +0900
> Tomasz Figa  wrote:
>
> > On Mon, Dec 5, 2022 at 5:29 PM Christian König  
> > wrote:
> > >
> > > Hi Tomasz,
> > >
> > > Am 05.12.22 um 07:41 schrieb Tomasz Figa:
> > > > [SNIP]
> > > >> In other words explicit ownership transfer is not something we would
> > > >> want as requirement in the framework, cause otherwise we break tons of
> > > >> use cases which require concurrent access to the underlying buffer.
> > > >>
> > > >> When a device driver needs explicit ownership transfer it's perfectly
> > > >> possible to implement this using the dma_fence objects mentioned above.
> > > >> E.g. drivers can already look at who is accessing a buffer currently 
> > > >> and
> > > >> can even grab explicit ownership of it by adding their own dma_fence
> > > >> objects.
> > > >>
> > > >> The only exception is CPU based access, e.g. when something is written
> > > >> with the CPU a cache flush might be necessary and when something is 
> > > >> read
> > > >> with the CPU a cache invalidation might be necessary.
> > > >>
> > > > Okay, that's much clearer now, thanks for clarifying this. So we
> > > > should be covered for the cache maintenance needs originating from CPU
> > > > accesses already, +/- the broken cases which don't call the begin/end
> > > > CPU access routines that I mentioned above.
> > > >
> > > > Similarly, for any ownership transfer between different DMA engines,
> > > > we should be covered either by the userspace explicitly flushing the
> > > > hardware pipeline or attaching a DMA-buf fence to the buffer.
> > > >
> > > > But then, what's left to be solved? :) (Besides the cases of missing
> > > > begin/end CPU access calls.)
> > >
> > > Well there are multiple problems here:
> > >
> > > 1. A lot of userspace applications/frameworks assume that they can
> > > allocate the buffer anywhere and it just works.
> > >
> > > This isn't true at all, we have tons of cases where device can only
> > > access their special memory for certain use cases.
> > > Just look at scanout for displaying on dGPU, neither AMD nor NVidia
> > > supports system memory here. Similar cases exist for audio/video codecs
> > > where intermediate memory is only accessible by certain devices because
> > > of content protection.
> >
> > Ack.
> >
> > Although I think the most common case on mainstream Linux today is
> > properly allocating for device X (e.g. V4L2 video decoder or DRM-based
> > GPU) and hoping that other devices would accept the buffers just fine,
> > which isn't a given on most platforms (although often it's just about
> > pixel format, width/height/stride alignment, tiling, etc. rather than
> > the memory itself). That's why ChromiumOS has minigbm and Android has
> > gralloc that act as the central point of knowledge on buffer
> > allocation.
>
> Hi,
>
> as an anecdote, when I was improving Mutter's cross-DRM-device handling
> (for DisplayLink uses) a few years ago, I implemented several different
> approaches of where to allocate, to try until going for the slowest but
> guaranteed to work case of copying every update into and out of sysram.
>
> It seems there are two different approaches in general for allocation
> and sharing:
>
> 1. Try different things until it works or you run out of options
>
> pro:
> - no need for a single software component to know everything about
>   every device in the system
>
> con:
> - who bothers with fallbacks, if the first try works on my system for
>   my use case I test with? I.e. cost of code in users.
> - trial-and-error can be very laborious (allocate, share with all
>   devices, populate, test)
> - the search space might be huge
>
>
> 2. Have a central component that knows what to do
>
> pro:
> - It might work on the first attempt, so no fallbacks in users.
> - It might be optimal.
>
> con:
> - You need a software component that knows everything about every
>   single combination of hardware in existence, multiplied by use cases.
>
>
> Neither seems good, which brings us back to 
> https://github.com/cubanismo/allocator .
>

I need to refresh my memory on how far we went with that and what the
stoppers were, but it real

Re: Try to address the DMA-buf coherency problem

2022-12-11 Thread Tomasz Figa
On Fri, Dec 9, 2022 at 7:27 PM Christian König  wrote:
>
> Am 09.12.22 um 09:26 schrieb Tomasz Figa:
[snip]
> Yes, same what Daniel said as well. We need to provide some more hints
> which allocator to use from the kernel.
>
> >>>>>> So if a device driver uses cached system memory on an architecture
> >>>>>> where
> >>>>>> devices can't access it, the right approach is clearly to reject
> >>>>>> the access.
> >>>>> I'd like to accent the fact that "requires cache maintenance" != "can't 
> >>>>> access".
> >>>> Well that depends. As said above the exporter exports the buffer as it
> >>>> was allocated.
> >>>>
> >>>> If that means that the exporter provides a piece of memory which requires
> >>>> CPU cache snooping to access correctly then the best thing we can do is
> >>>> to prevent an importer which can't do this from attaching.
> >>> Could you elaborate more about this case? Does it exist in practice?
> >>> Do I assume correctly that it's about sharing a buffer between one DMA
> >>> engine that is cache-coherent and another that is non-coherent, where
> >>> the first one ends up having its accesses always go through some kind
> >>> of a cache (CPU cache, L2/L3/... cache, etc.)?
> >> Yes, exactly that. What happens in this particular use case is that one
> >> device driver wrote to its internal buffer with the CPU (so some cache
> >> lines were dirty) and then a device which couldn't deal with that tried
> >> to access it.
> > If so, shouldn't that driver surround its CPU accesses with
> > begin/end_cpu_access() in the first place?
>
> The problem is that the roles are reversed. The callbacks let the
> exporter know that it needs to flush the caches when the importer is
> done accessing the buffer with the CPU.
>
> But here the exporter is the one accessing the buffer with the CPU and
> the importer then accesses stale data because it doesn't snoop the caches.
>
> What we could do is to force all exporters to use begin/end_cpu_access()
> even on its own buffers and look at all the importers when the access
> is completed. But that would increase the complexity of the handling in
> the exporter.

I feel like they should be doing so anyway, because it often depends
on the SoC integration whether the DMA can do cache snooping or not.

Although arguably, there is a corner case today where if one uses
dma_alloc_coherent() to get a buffer with a coherent CPU mapping for
device X that is declared as cache-coherent, one also expects not to
need to call begin/end_cpu_access(), but those would be needed if the
buffer was to be imported by device Y that is not cache-coherent...

Sounds like after all it's a mess. I guess your patches make it one
step closer to something sensible - the import would just fail in such cases.
Although arguably we should be able to still export from driver Y and
import to driver X just fine if Y allocated the buffer as coherent -
otherwise we would break existing users for whom things worked fine.
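
To make the bracketing convention discussed here concrete, this is roughly
what an exporter touching its own buffer with the CPU would look like. A
non-runnable, illustrative sketch against the in-kernel DMA-buf API (the
function name and context are made up; error paths abbreviated):

```c
/* Illustrative sketch only -- not a complete driver. */
#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/iosys-map.h>
#include <linux/string.h>

static int fill_buffer_with_cpu(struct dma_buf *dmabuf, struct iosys_map *map)
{
	int ret;

	/*
	 * Announce the CPU write, so the framework can do any cache
	 * maintenance needed for non-coherent importers.
	 */
	ret = dma_buf_begin_cpu_access(dmabuf, DMA_TO_DEVICE);
	if (ret)
		return ret;

	memset(map->vaddr, 0, dmabuf->size);	/* the actual CPU access */

	/* Flush dirty cache lines before any device touches the buffer. */
	return dma_buf_end_cpu_access(dmabuf, DMA_TO_DEVICE);
}
```

The point of contention above is exactly whether exporters should be forced
to bracket accesses to their *own* buffers this way.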

>
> In other words we would then have code in the exporters which is only
> written for handling the constraints of the importers. This has a wide
> variety of consequences, especially that this functionality of the
> exporter can't be tested without a proper importer.
>
> I was also thinking about reversing the role of exporter and importer in
> the kernel, but came to the conclusion that doing this under the hood
> without userspace knowing about it is probably not going to work either.
>
> > The case that I was suggesting was of a hardware block that actually
> > sits behind the CPU cache and thus dirties it on writes, not the
> > driver doing that. (I haven't personally encountered such a system,
> > though.)
>
> Never heard of anything like that either, but who knows.
>
> >> We could say that all device drivers must always look at the coherency
> >> of the devices which want to access their buffers. But that would
> >> horribly complicate things for maintaining the drivers because then
> >> drivers would need to take into account requirements from other drivers
> >> while allocating their internal buffers.
> > I think it's partially why we have the allocation part of the DMA
> > mapping API, but currently it's only handling requirements of one
> > device. And we don't have any information from the userspace what
> > other devices the buffer would be used with...
>
> Exactly that, yes.

Re: Try to address the DMA-buf coherency problem

2022-12-09 Thread Tomasz Figa
On Mon, Dec 5, 2022 at 5:29 PM Christian König  wrote:
>
> Hi Tomasz,
>
> Am 05.12.22 um 07:41 schrieb Tomasz Figa:
> > [SNIP]
> >> In other words explicit ownership transfer is not something we would
> >> want as requirement in the framework, cause otherwise we break tons of
> >> use cases which require concurrent access to the underlying buffer.
> >>
> >> When a device driver needs explicit ownership transfer it's perfectly
> >> possible to implement this using the dma_fence objects mentioned above.
> >> E.g. drivers can already look at who is accessing a buffer currently and
> >> can even grab explicit ownership of it by adding their own dma_fence
> >> objects.
> >>
> >> The only exception is CPU based access, e.g. when something is written
> >> with the CPU a cache flush might be necessary and when something is read
> >> with the CPU a cache invalidation might be necessary.
> >>
> > Okay, that's much clearer now, thanks for clarifying this. So we
> > should be covered for the cache maintenance needs originating from CPU
> > accesses already, +/- the broken cases which don't call the begin/end
> > CPU access routines that I mentioned above.
> >
> > Similarly, for any ownership transfer between different DMA engines,
> > we should be covered either by the userspace explicitly flushing the
> > hardware pipeline or attaching a DMA-buf fence to the buffer.
> >
> > But then, what's left to be solved? :) (Besides the cases of missing
> > begin/end CPU access calls.)
>
> Well there are multiple problems here:
>
> 1. A lot of userspace applications/frameworks assume that they can
> allocate the buffer anywhere and it just works.
>
> This isn't true at all, we have tons of cases where device can only
> access their special memory for certain use cases.
> Just look at scanout for displaying on dGPU, neither AMD nor NVidia
> supports system memory here. Similar cases exist for audio/video codecs
> where intermediate memory is only accessible by certain devices because
> of content protection.

Ack.

Although I think the most common case on mainstream Linux today is
properly allocating for device X (e.g. V4L2 video decoder or DRM-based
GPU) and hoping that other devices would accept the buffers just fine,
which isn't a given on most platforms (although often it's just about
pixel format, width/height/stride alignment, tiling, etc. rather than
the memory itself). That's why ChromiumOS has minigbm and Android has
gralloc that act as the central point of knowledge on buffer
allocation.

>
> 2. We don't properly communicate allocation requirements to userspace.
>
> E.g. even if you allocate from DMA-Heaps userspace can currently only
> guess if normal, CMA or even device specific memory is needed.

DMA-buf heaps actually make it even more difficult for the userspace,
because now it needs to pick the right heap. With allocation built
into the specific UAPI (like V4L2), it's at least possible to allocate
for one specific device without having any knowledge about allocation
constraints in the userspace.

>
> 3. We seem to lack some essential parts of those restrictions in the
> documentation.
>

Ack.

> >>>> So if a device driver uses cached system memory on an architecture where
> >>>> devices can't access it, the right approach is clearly to reject
> >>>> the access.
> >>> I'd like to accent the fact that "requires cache maintenance" != "can't 
> >>> access".
> >> Well that depends. As said above the exporter exports the buffer as it
> >> was allocated.
> >>
> >> If that means that the exporter provides a piece of memory which requires
> >> CPU cache snooping to access correctly then the best thing we can do is
> >> to prevent an importer which can't do this from attaching.
> > Could you elaborate more about this case? Does it exist in practice?
> > Do I assume correctly that it's about sharing a buffer between one DMA
> > engine that is cache-coherent and another that is non-coherent, where
> > the first one ends up having its accesses always go through some kind
> > of a cache (CPU cache, L2/L3/... cache, etc.)?
>
> Yes, exactly that. What happens in this particular use case is that one
> device driver wrote to its internal buffer with the CPU (so some cache
> lines were dirty) and then a device which couldn't deal with that tried
> to access it.

If so, shouldn't that driver surround its CPU accesses with
begin/end_cpu_access() in the first place?

The case that I was s

Re: Try to address the DMA-buf coherency problem

2022-12-04 Thread Tomasz Figa
Hi Christian,

On Thu, Nov 17, 2022 at 9:11 PM Christian König
 wrote:
>
> Hi Tomasz,
>
> Am 17.11.22 um 10:35 schrieb Tomasz Figa:
> > Hi Christian and everyone,
> >
> > On Thu, Nov 3, 2022 at 4:14 AM Christian König
> >  wrote:
> >> Am 02.11.22 um 18:10 schrieb Lucas Stach:
> >>> Am Mittwoch, dem 02.11.2022 um 13:21 +0100 schrieb Christian König:
> >>> [SNIP]
> >>>> It would just be doing this for the importer and exactly that
> >>>> would be bad design because we then have handling for the display driver
> >>>> outside of the driver.
> >>>>
> >>> The driver would have to do those cache maintenance operations if it
> >>> directly worked with a non-coherent device. Doing it for the importer
> >>> is just doing it for another device, not the one directly managed by
> >>> the exporter.
> >>>
> >>> I really don't see the difference to the other dma-buf ops: in
> >>> dma_buf_map_attachment the exporter maps the dma-buf on behalf and into
> >>> the address space of the importer. Why would cache maintenance be any
> >>> different?
> >> The issue here is the explicit ownership transfer.
> >>
> >> We intentionally decided against that because it breaks tons of use
> >> cases and is at least by me and a couple of others seen as generally
> >> design failure of the Linux DMA-API.
> > First of all, thanks for starting the discussion and sorry for being
> > late to the party. May I ask you to keep me on CC for any changes that
> > touch the V4L2 videobuf2 framework, as a maintainer of it? I'm okay
> > being copied on the entire series, no need to pick the specific
> > patches. Thanks in advance.
>
> Sorry for that, I've only added people involved in the previous
> discussion. Going to keep you in the loop.
>

No worries. Thanks.

Sorry, for being late with the reply, had a bit of vacation and then
some catching up last week.

> > I agree that we have some design issues in the current DMA-buf
> > framework, but I'd try to approach it a bit differently. Instead of
> > focusing on the issues in the current design, could we write down our
> > requirements and try to come up with how a correct design would look
> > like? (A lot of that has been already mentioned in this thread, but I
> > find it quite difficult to follow and it might not be a complete view
> > either.)
>
> Well, exactly that's what we disagree on.
>
> As far as I can see the current design of DMA-buf is perfectly capable
> of handling all the requirements.
>
> A brief summary of the requirements with some implementation notes:
>
> 1. Device drivers export their memory as it is. E.g. no special handling
> for importers on the exporter side.
>  If an importer can't deal with a specific format, layout, caching
> etc... of the data the correct approach is to reject the attachment.
>  Those parameters are controlled by userspace and negotiating them
> is explicitly not part of the framework.

Ack. I believe it matches the current implementation of the DMA-buf
framework, although as you mentioned, the swiotlb feature of the DMA
mapping framework kind of violates this.

>
> 2. Accesses of the CPU to a buffer are bracketed in begin_cpu_access()
> and end_cpu_access() calls.
>  Here we can insert the CPU cache invalidation/flushes as necessary.

Ack. I think a part of the problem today is that there exist userspace
and kernel code instances that don't insert them and assume that some
magic keeps the cache clean...
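
On the userspace side, the bracketing referred to here is the
DMA_BUF_IOCTL_SYNC ioctl. A hedged sketch of its use around a CPU write;
`dmabuf_fd` and `mapped` are assumed to come from an exporter and a prior
mmap(), so this fragment is not runnable on its own:

```c
#include <linux/dma-buf.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>

/* Bracket a CPU write to a mapped DMA-buf with the sync ioctl. */
static int cpu_write_bracketed(int dmabuf_fd, char *mapped, size_t len)
{
	struct dma_buf_sync sync = {
		.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE,
	};

	if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync))
		return -1;

	memset(mapped, 0xff, len);	/* the CPU access happens here */

	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE;
	return ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
}
```

The "magic keeps the cache clean" assumption is precisely code that skips
both of these ioctl calls and happens to work on coherent systems.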

>
> 3. Accesses of the device HW to a buffer are represented with dma_fence
> objects.
>  It's explicitly allowed to have multiple devices access the buffer
> at the same time.
>

Ack. Again there exists kernel code that doesn't honor or provide DMA
fences (e.g. V4L2).

> 4. Access to the DMA-buf by the HW of an importer is setup by the exporter.
>  In other words the exporter provides a bunch of DMA addresses the
> importer should access.
>  The importer should not try to come up with those on its own.
>
> > That said, let me address a few aspects already mentioned, to make
> > sure that everyone is on the same page.
> >
> >> DMA-Buf lets the exporter set up the DMA addresses the importer uses to
> >> be able to directly decide where a certain operation should go. E.g. we
> >> have cases where for example a P2P write doesn't even go to memory, but
> >> rather a doorbell BAR to trigger another operation. Throwing in CPU
> >> round trips for explicit ownership transfer 

Re: [PATCH mm-unstable v1 16/20] mm/frame-vector: remove FOLL_FORCE usage

2022-11-28 Thread Tomasz Figa
On Mon, Nov 28, 2022 at 5:19 PM David Hildenbrand  wrote:
>
> On 28.11.22 09:17, Hans Verkuil wrote:
> > Hi David,
> >
> > On 27/11/2022 11:35, David Hildenbrand wrote:
> >> On 16.11.22 11:26, David Hildenbrand wrote:
> >>> FOLL_FORCE is really only for ptrace access. According to commit
> >>> 707947247e95 ("media: videobuf2-vmalloc: get_userptr: buffers are always
> >>> writable"), get_vaddr_frames() currently pins all pages writable as a
> >>> workaround for issues with read-only buffers.
> >>>
> >>> FOLL_FORCE, however, seems to be a legacy leftover as it predates
> >>> commit 707947247e95 ("media: videobuf2-vmalloc: get_userptr: buffers are
> >>> always writable"). Let's just remove it.
> >>>
> >>> Once the read-only buffer issue has been resolved, FOLL_WRITE could
> >>> again be set depending on the DMA direction.
> >>>
> >>> Cc: Hans Verkuil 
> >>> Cc: Marek Szyprowski 
> >>> Cc: Tomasz Figa 
> >>> Cc: Marek Szyprowski 
> >>> Cc: Mauro Carvalho Chehab 
> >>> Signed-off-by: David Hildenbrand 
> >>> ---
> >>>drivers/media/common/videobuf2/frame_vector.c | 2 +-
> >>>1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/media/common/videobuf2/frame_vector.c 
> >>> b/drivers/media/common/videobuf2/frame_vector.c
> >>> index 542dde9d2609..062e98148c53 100644
> >>> --- a/drivers/media/common/videobuf2/frame_vector.c
> >>> +++ b/drivers/media/common/videobuf2/frame_vector.c
> >>> @@ -50,7 +50,7 @@ int get_vaddr_frames(unsigned long start, unsigned int 
> >>> nr_frames,
> >>>start = untagged_addr(start);
> >>>  ret = pin_user_pages_fast(start, nr_frames,
> >>> -  FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
> >>> +  FOLL_WRITE | FOLL_LONGTERM,
> >>>  (struct page **)(vec->ptrs));
> >>>if (ret > 0) {
> >>>vec->got_ref = true;
> >>
> >>
> >> Hi Andrew,
> >>
> >> see the discussion at [1] regarding a conflict and how to proceed with
> >> upstreaming. The conflict would be easy to resolve, however, also
> >> the patch description doesn't make sense anymore with [1].
> >
> > Might it be easier and less confusing if you post a v2 of this series
> > with my patch first? That way it is clear that 1) my patch has to come
> > first, and 2) that it is part of a single series and should be merged
> > by the mm subsystem.
> >
> > Less chances of things going wrong that way.
> >
> > Just mention in the v2 cover letter that the first patch was added to
> > make it easy to backport that fix without being hampered by merge
> > conflicts if it was added after your frame_vector.c patch.
>
> Yes, that's the way I would naturally do, it, however, Andrew prefers
> delta updates for minor changes.
>
> @Andrew, whatever you prefer!
>
> Thanks!
>

However you folks proceed with taking this patch, feel free to add my
Acked-by. Thanks!

Best regards,
Tomasz

> --
> Thanks,
>
> David / dhildenb
>


Re: Try to address the DMA-buf coherency problem

2022-11-17 Thread Tomasz Figa
Hi Christian and everyone,

On Thu, Nov 3, 2022 at 4:14 AM Christian König
 wrote:
>
> Am 02.11.22 um 18:10 schrieb Lucas Stach:
> > Am Mittwoch, dem 02.11.2022 um 13:21 +0100 schrieb Christian König:
> > [SNIP]
> >> It would just be doing this for the importer and exactly that
> >> would be bad design because we then have handling for the display driver
> >> outside of the driver.
> >>
> > The driver would have to do those cache maintenance operations if it
> > directly worked with a non-coherent device. Doing it for the importer
> > is just doing it for another device, not the one directly managed by
> > the exporter.
> >
> > I really don't see the difference to the other dma-buf ops: in
> > dma_buf_map_attachment the exporter maps the dma-buf on behalf and into
> > the address space of the importer. Why would cache maintenance be any
> > different?
>
> The issue here is the explicit ownership transfer.
>
> We intentionally decided against that because it breaks tons of use
> cases and is at least by me and a couple of others seen as generally
> design failure of the Linux DMA-API.

First of all, thanks for starting the discussion and sorry for being
late to the party. May I ask you to keep me on CC for any changes that
touch the V4L2 videobuf2 framework, as a maintainer of it? I'm okay
being copied on the entire series, no need to pick the specific
patches. Thanks in advance.

I agree that we have some design issues in the current DMA-buf
framework, but I'd try to approach it a bit differently. Instead of
focusing on the issues in the current design, could we write down our
requirements and try to come up with how a correct design would look
like? (A lot of that has been already mentioned in this thread, but I
find it quite difficult to follow and it might not be a complete view
either.)

That said, let me address a few aspects already mentioned, to make
sure that everyone is on the same page.

>
> DMA-Buf lets the exporter set up the DMA addresses the importer uses to
> be able to directly decide where a certain operation should go. E.g. we
> have cases where for example a P2P write doesn't even go to memory, but
> rather a doorbell BAR to trigger another operation. Throwing in CPU
> round trips for explicit ownership transfer completely breaks that concept.

It sounds like we should have a dma_dev_is_coherent_with_dev() which
accepts two devices (or an array of them?) and tells the caller whether the
devices need explicit ownership transfer. Based on that, your drivers
would install the DMA completion (presumably IRQ) handlers or not.
It's necessary since it's not uncommon that devices A and B could be
in the same coherency domain, while C could be in a different one, but
you may still want them to exchange data through DMA-bufs. Even if it
means the need for some extra round trips it would likely be more
efficient than a full memory copy (might not be true 100% of the
time).
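
The suggested helper might look something like this. Every identifier below
is made up for illustration - no such API exists in the kernel today:

```c
/* Hypothetical -- dma_dev_is_coherent_with_dev() does not exist upstream. */
bool dma_dev_is_coherent_with_dev(struct device *a, struct device *b);

/* 'struct my_ctx' is likewise invented for this sketch. */
static void my_setup_buffer_handover(struct my_ctx *ctx)
{
	if (dma_dev_is_coherent_with_dev(ctx->producer, ctx->consumer)) {
		/* Same coherency domain: no explicit ownership transfer. */
		ctx->needs_sync = false;
	} else {
		/*
		 * Different domains: install a DMA completion (IRQ)
		 * handler that performs cache maintenance on hand-over.
		 */
		ctx->needs_sync = true;
	}
}
```

The design question is whether such a query belongs in the DMA mapping API
or in the DMA-buf framework itself.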

>
> Additional to that a very basic concept of DMA-buf is that the exporter
> provides the buffer as it is and just double checks if the importer can
> access it. For example we have XGMI links which makes memory accessible
> to other devices on the same bus, but not to PCIe device and not even to
> the CPU. Otherwise you wouldn't be able to implement things like secure
> decoding where the data isn't even accessible outside the device to
> device link.

Fully agreed.

>
> So if a device driver uses cached system memory on an architecture where
> devices can't access it, the right approach is clearly to reject
> the access.

I'd like to accent the fact that "requires cache maintenance" != "can't access".

>
> What we can do is to reverse the role of the exporter and importer and
> let the device which needs uncached memory take control. This way this
> device can insert operations as needed, e.g. flush read caches or
> invalidate write caches.
>

(Putting aside the cases when the access is really impossible at all.)
Correct me if I'm wrong, but isn't that because we don't have a proper
hook for the importer to tell the DMA-buf framework to prepare the
buffer for its access?

> This is what we have already done in DMA-buf and what already works
> perfectly fine with use cases which are even more complicated than a
> simple write cache invalidation.
>
>  This is just a software solution which works because of coincidence and
>  not because of engineering.
> >>> By mandating a software fallback for the cases where you would need
> >>> bracketed access to the dma-buf, you simply shift the problem into
> >>> userspace. Userspace then creates the bracket by falling back to some
> >>> other import option that mostly do a copy and then the appropriate
> >>> cache maintenance.
> >>>
> >>> While I understand your sentiment about the DMA-API design being
> >>> inconvenient when things are just coherent by system design, the DMA-
> >>> API design wasn't done this way due to bad engineering, but due to the
> >>> fact that performant 

Re: [PATCH v1 5/6] media: videobuf2: Assert held reservation lock for dma-buf mmapping

2022-11-10 Thread Tomasz Figa
On Fri, Nov 11, 2022 at 5:15 AM Dmitry Osipenko
 wrote:
>
> When userspace mmaps dma-buf's fd, the dma-buf reservation lock must be
> held. Add locking sanity checks to the dma-buf mmaping callbacks to ensure
> that the locking assumptions won't regress in the future.
>
> Suggested-by: Daniel Vetter 
> Signed-off-by: Dmitry Osipenko 
> ---
>  drivers/media/common/videobuf2/videobuf2-dma-contig.c | 3 +++
>  drivers/media/common/videobuf2/videobuf2-dma-sg.c | 3 +++
>  drivers/media/common/videobuf2/videobuf2-vmalloc.c| 3 +++
>  3 files changed, 9 insertions(+)
>

Acked-by: Tomasz Figa 

Best regards,
Tomasz

> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c 
> b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> index 555bd40fa472..7f45a62969f2 100644
> --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> @@ -11,6 +11,7 @@
>   */
>
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -455,6 +456,8 @@ static int vb2_dc_dmabuf_ops_vmap(struct dma_buf *dbuf, 
> struct iosys_map *map)
>  static int vb2_dc_dmabuf_ops_mmap(struct dma_buf *dbuf,
> struct vm_area_struct *vma)
>  {
> +   dma_resv_assert_held(dbuf->resv);
> +
> return vb2_dc_mmap(dbuf->priv, vma);
>  }
>
> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c 
> b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> index 36981a5b5c53..b7f39ee49ed8 100644
> --- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> @@ -10,6 +10,7 @@
>   * the Free Software Foundation.
>   */
>
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -495,6 +496,8 @@ static int vb2_dma_sg_dmabuf_ops_vmap(struct dma_buf 
> *dbuf,
>  static int vb2_dma_sg_dmabuf_ops_mmap(struct dma_buf *dbuf,
> struct vm_area_struct *vma)
>  {
> +   dma_resv_assert_held(dbuf->resv);
> +
> return vb2_dma_sg_mmap(dbuf->priv, vma);
>  }
>
> diff --git a/drivers/media/common/videobuf2/videobuf2-vmalloc.c 
> b/drivers/media/common/videobuf2/videobuf2-vmalloc.c
> index 41db707e43a4..f9b665366365 100644
> --- a/drivers/media/common/videobuf2/videobuf2-vmalloc.c
> +++ b/drivers/media/common/videobuf2/videobuf2-vmalloc.c
> @@ -10,6 +10,7 @@
>   * the Free Software Foundation.
>   */
>
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -316,6 +317,8 @@ static int vb2_vmalloc_dmabuf_ops_vmap(struct dma_buf 
> *dbuf,
>  static int vb2_vmalloc_dmabuf_ops_mmap(struct dma_buf *dbuf,
> struct vm_area_struct *vma)
>  {
> +   dma_resv_assert_held(dbuf->resv);
> +
> return vb2_vmalloc_mmap(dbuf->priv, vma);
>  }
>
> --
> 2.37.3
>


Re: [PATCH RFC 16/19] mm/frame-vector: remove FOLL_FORCE usage

2022-11-07 Thread Tomasz Figa
Hi David,

On Tue, Nov 8, 2022 at 1:19 AM David Hildenbrand  wrote:
>
> FOLL_FORCE is really only for debugger access. According to commit
> 707947247e95 ("media: videobuf2-vmalloc: get_userptr: buffers are always
> writable"), the pinned pages are always writable.

Actually that patch is only a workaround to temporarily disable
support for read-only pages as they seemed to suffer from some
corruption issues in the retrieved user pages. We expect to support
read-only pages as hardware input afterwards. That said, FOLL_FORCE doesn't
sound like the right thing even in that case, but I don't know the
background behind it being added here in the first place. +Hans
Verkuil +Marek Szyprowski do you happen to remember anything about it?

Best regards,
Tomasz

>
> FOLL_FORCE in this case seems to be a legacy leftover. Let's just remove
> it.
>
> Cc: Tomasz Figa 
> Cc: Marek Szyprowski 
> Cc: Mauro Carvalho Chehab 
> Signed-off-by: David Hildenbrand 
> ---
>  drivers/media/common/videobuf2/frame_vector.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/media/common/videobuf2/frame_vector.c 
> b/drivers/media/common/videobuf2/frame_vector.c
> index 542dde9d2609..062e98148c53 100644
> --- a/drivers/media/common/videobuf2/frame_vector.c
> +++ b/drivers/media/common/videobuf2/frame_vector.c
> @@ -50,7 +50,7 @@ int get_vaddr_frames(unsigned long start, unsigned int 
> nr_frames,
> start = untagged_addr(start);
>
> ret = pin_user_pages_fast(start, nr_frames,
> - FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
> + FOLL_WRITE | FOLL_LONGTERM,
>   (struct page **)(vec->ptrs));
> if (ret > 0) {
> vec->got_ref = true;
> --
> 2.38.1
>


Re: [PATCH v3 1/9] dma-buf: Add _unlocked postfix to function names

2022-08-31 Thread Tomasz Figa
On Wed, Aug 24, 2022 at 01:22:40PM +0300, Dmitry Osipenko wrote:
> Add _unlocked postfix to the dma-buf API function names in a preparation
> to move all non-dynamic dma-buf users over to the dynamic locking
> specification. This patch only renames API functions, preparing drivers
> to the common locking convention. Later on, we will make the "unlocked"
> functions to take the reservation lock.
> 
> Acked-by: Christian König 
> Suggested-by: Christian König 
> Signed-off-by: Dmitry Osipenko 
> ---
>  drivers/dma-buf/dma-buf.c | 76 ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c   |  4 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |  4 +-
>  drivers/gpu/drm/armada/armada_gem.c   | 14 ++--
>  drivers/gpu/drm/drm_gem_dma_helper.c  |  6 +-
>  drivers/gpu/drm/drm_gem_shmem_helper.c|  8 +-
>  drivers/gpu/drm/drm_prime.c   | 12 +--
>  drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c   |  6 +-
>  drivers/gpu/drm/exynos/exynos_drm_gem.c   |  2 +-
>  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c| 12 +--
>  .../drm/i915/gem/selftests/i915_gem_dmabuf.c  | 20 ++---
>  drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c |  8 +-
>  drivers/gpu/drm/tegra/gem.c   | 27 +++
>  drivers/infiniband/core/umem_dmabuf.c | 11 +--
>  .../common/videobuf2/videobuf2-dma-contig.c   | 15 ++--
>  .../media/common/videobuf2/videobuf2-dma-sg.c | 12 +--
>  .../common/videobuf2/videobuf2-vmalloc.c  |  6 +-
>  .../platform/nvidia/tegra-vde/dmabuf-cache.c  | 12 +--
>  drivers/misc/fastrpc.c| 12 +--
>  drivers/xen/gntdev-dmabuf.c   | 14 ++--
>  include/linux/dma-buf.h   | 34 +----
>  21 files changed, 162 insertions(+), 153 deletions(-)
> 


For drivers/media/videobuf2:

Acked-by: Tomasz Figa 

Best regards,
Tomasz

> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index 1c912255c5d6..452a6a1f1e60 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -795,7 +795,7 @@ static struct sg_table * __map_dma_buf(struct 
> dma_buf_attachment *attach,
>  }
>  
>  /**
> - * dma_buf_dynamic_attach - Add the device to dma_buf's attachments list
> + * dma_buf_dynamic_attach_unlocked - Add the device to dma_buf's attachments 
> list
>   * @dmabuf:  [in]buffer to attach device to.
>   * @dev: [in]device to be attached.
>   * @importer_ops:[in]importer operations for the attachment
> @@ -817,9 +817,9 @@ static struct sg_table * __map_dma_buf(struct 
> dma_buf_attachment *attach,
>   * indicated with the error code -EBUSY.
>   */
>  struct dma_buf_attachment *
> -dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev,
> -const struct dma_buf_attach_ops *importer_ops,
> -void *importer_priv)
> +dma_buf_dynamic_attach_unlocked(struct dma_buf *dmabuf, struct device *dev,
> + const struct dma_buf_attach_ops *importer_ops,
> + void *importer_priv)
>  {
>   struct dma_buf_attachment *attach;
>   int ret;
> @@ -892,25 +892,25 @@ dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct 
> device *dev,
>   if (dma_buf_is_dynamic(attach->dmabuf))
>   dma_resv_unlock(attach->dmabuf->resv);
>  
> - dma_buf_detach(dmabuf, attach);
> + dma_buf_detach_unlocked(dmabuf, attach);
>   return ERR_PTR(ret);
>  }
> -EXPORT_SYMBOL_NS_GPL(dma_buf_dynamic_attach, DMA_BUF);
> +EXPORT_SYMBOL_NS_GPL(dma_buf_dynamic_attach_unlocked, DMA_BUF);
>  
>  /**
> - * dma_buf_attach - Wrapper for dma_buf_dynamic_attach
> + * dma_buf_attach_unlocked - Wrapper for dma_buf_dynamic_attach
>   * @dmabuf:  [in]buffer to attach device to.
>   * @dev: [in]device to be attached.
>   *
> - * Wrapper to call dma_buf_dynamic_attach() for drivers which still use a 
> static
> - * mapping.
> + * Wrapper to call dma_buf_dynamic_attach_unlocked() for drivers which still
> + * use a static mapping.
>   */
> -struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
> -   struct device *dev)
> +struct dma_buf_attachment *dma_buf_attach_unlocked(struct dma_buf *dmabuf,
> +struct device *dev)
>  {
> - return dma_buf_dynamic_attach(dmabuf, dev, NULL, NULL);
> + return dma_buf_dynamic_attach_unlocked(dmabuf, dev, NULL, NULL);
>  }
> -EXPORT_SYMBOL_NS_GPL(dma_buf_attach, DMA_BUF);
> +EXPORT_SYMBOL_NS_GPL(dma_buf_attach_unlocked, DMA_BUF);
>  
>  static void __unmap_dma_buf(struct dma_buf_attachment *at

Re: [PATCH] [Draft]: media: videobuf2-dma-heap: add a vendor defined memory runtine

2022-08-24 Thread Tomasz Figa
Hi Randy,

Sorry for the late reply, I went on vacation last week.

On Sun, Aug 7, 2022 at 12:23 AM Hsia-Jun Li  wrote:
>
>
>
> On 8/5/22 18:09, Tomasz Figa wrote:
> > CAUTION: Email originated externally, do not click links or open 
> > attachments unless you recognize the sender and know the content is safe.
> >
> >
> > On Tue, Aug 2, 2022 at 9:21 PM ayaka  wrote:
> >>
> >> Sorry, the previous one contains html data.
> >>
> >>> On Aug 2, 2022, at 3:33 PM, Tomasz Figa  wrote:
> >>>
> >>> On Mon, Aug 1, 2022 at 8:43 PM ayaka  wrote:
> >>>>>> On Aug 1, 2022, at 5:46 PM, Tomasz Figa  wrote:
> >>>>>> On Mon, Aug 1, 2022 at 3:44 PM Hsia-Jun Li  
> >>>>>> wrote:
> >>>>>>> On 8/1/22 14:19, Tomasz Figa wrote:
> >>>>>> Hello Tomasz
> >>>>>>> ?Hi Randy,
> >>>>>>> On Mon, Aug 1, 2022 at 5:21 AM  wrote:
> >>>>>>>> From: Randy Li 
> >>>>>>>> This module is still at an early stage; I wrote this to show what
> >>>>>>>> APIs we need here.
> >>>>>>>> Let me explain why we need such a module here.
> >>>>>>>> If you won't allocate buffers from a V4L2 M2M device, this module
> >>>>>>>> may not be very useful. I am sure most users won't know a
> >>>>>>>> device would require them to allocate buffers from a DMA-Heap and then
> >>>>>>>> import those buffers into a V4L2 queue.
> >>>>>>>> Then the question goes back to why DMA-Heap. From the Android
> >>>>>>>> description, we know it is about copyright DRM.
> >>>>>>>> When we allocate a buffer in a DMA-Heap, it may register that buffer
> >>>>>>>> in the trusted execution environment, so that firmware which is running
> >>>>>>>> there, or could only be accessed from there, could use that buffer later.
> >>>>>>>> The answer above leads to another thing which is not done in this
> >>>>>>>> version: the DMA mapping. On some platforms a DMA-Heap
> >>>>>>>> serves as an IOMMU device as well. For general purposes, we would
> >>>>>>>> be better off assuming the device mapping should be done by each device
> >>>>>>>> itself. The problem here is that we only know alloc_devs in those DMA-buf
> >>>>>>>> methods, which are DMA-Heaps in my design; the device from the queue
> >>>>>>>> is not enough, as a plane may request another IOMMU device or table
> >>>>>>>> for mapping.
> >>>>>>>> Signed-off-by: Randy Li 
> >>>>>>>> ---
> >>>>>>>> drivers/media/common/videobuf2/Kconfig|   6 +
> >>>>>>>> drivers/media/common/videobuf2/Makefile   |   1 +
> >>>>>>>> .../common/videobuf2/videobuf2-dma-heap.c | 350 
> >>>>>>>> ++
> >>>>>>>> include/media/videobuf2-dma-heap.h|  30 ++
> >>>>>>>> 4 files changed, 387 insertions(+)
> >>>>>>>> create mode 100644 
> >>>>>>>> drivers/media/common/videobuf2/videobuf2-dma-heap.c
> >>>>>>>> create mode 100644 include/media/videobuf2-dma-heap.h
> >>>>>>> First of all, thanks for the series.
> >>>>>>> Possibly a stupid question, but why not just allocate the DMA-bufs
> >>>>>>> directly from the DMA-buf heap device in the userspace and just import
> >>>>>>> the buffers to the V4L2 device using V4L2_MEMORY_DMABUF?
> >>>>>> Sometimes the allocation policy could be very complex; let's suppose a
> >>>>>> multi-plane pixel format with frame buffer compression enabled.
> >>>>>> Its luma and chroma data could be allocated from a pool which is delegated
> >>>>>> for large buffers, while its metadata would come from a pool from which many
> >>>>>> use

Re: [PATCH 2/2] [WIP]: media: Add Synaptics compressed tiled format

2022-08-22 Thread Tomasz Figa
On Sat, Aug 20, 2022 at 12:44 AM Hsia-Jun Li  wrote:
>
>
>
> On 8/19/22 23:28, Nicolas Dufresne wrote:
> >
> > Le vendredi 19 août 2022 à 02:13 +0300, Laurent Pinchart a écrit :
> >> On Thu, Aug 18, 2022 at 02:33:42PM +0800, Hsia-Jun Li wrote:
> >>> On 8/18/22 14:06, Tomasz Figa wrote:
> >>>> On Tue, Aug 9, 2022 at 1:28 AM Hsia-Jun Li  
> >>>> wrote:
> >>>>>
> >>>>> From: "Hsia-Jun(Randy) Li" 
> >>>>>
> >>>>> Most of the details have been written in the drm patch.
> >>
> >> This patch still needs a description of the format, which should go to
> >> Documentation/userspace-api/media/v4l/.
> >>
> >>>>> Please notice that the tiled formats here require
> >>>>> one more plane for storing the motion vector metadata.
> >>>>> This buffer won't be compressed, so you can't append
> >>>>> it to the luma or chroma plane.
> >>>>
> >>>> Does the motion vector buffer need to be exposed to userspace? Is the
> >>>> decoder stateless (requires userspace to specify the reference frames)
> >>>> or stateful (manages the entire decoding process internally)?
> >>>
> >>> No, users don't need to access them at all. Just they need a different
> >>> dma-heap.
> >>>
> >>> You would only get the stateful version of both encoder and decoder.
> >>
> >> Shouldn't the motion vectors be stored in a separate V4L2 buffer,
> >> submitted through a different queue then ?
> >
> > Imho, I believe these should be invisible to users and pooled separately to
> > reduce the overhead. The number of references is usually lower than the
> > number of allocated display buffers.
> >
> You can't. The motion vector buffer can't be shared with the luma and chroma
> data planes, nor with the data plane for the compression metadata.

I believe what Nicolas is suggesting is to just keep the MV buffer
handling completely separate from video buffers. Just keep a map
between frame buffer and MV buffer in the driver and use the right
buffer when triggering a decode.

>
> You could consider this a security requirement (the memory region for
> the MV could only be accessed by the decoder) or a hardware limitation.
>
> It is also not very easy to manage such a large buffer that would change
> when the resolution changes.

How does it differ from managing additional planes of video buffers?

Best regards,
Tomasz

> >>
> >>>>> Signed-off-by: Hsia-Jun(Randy) Li 
> >>>>> ---
> >>>>>drivers/media/v4l2-core/v4l2-common.c | 1 +
> >>>>>drivers/media/v4l2-core/v4l2-ioctl.c  | 2 ++
> >>>>>include/uapi/linux/videodev2.h| 2 ++
> >>>>>3 files changed, 5 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/media/v4l2-core/v4l2-common.c 
> >>>>> b/drivers/media/v4l2-core/v4l2-common.c
> >>>>> index e0fbe6ba4b6c..f645278b3055 100644
> >>>>> --- a/drivers/media/v4l2-core/v4l2-common.c
> >>>>> +++ b/drivers/media/v4l2-core/v4l2-common.c
> >>>>> @@ -314,6 +314,7 @@ const struct v4l2_format_info *v4l2_format_info(u32 
> >>>>> format)
> >>>>>   { .format = V4L2_PIX_FMT_SGBRG12,   .pixel_enc = 
> >>>>> V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 
> >>>>> 0, 0 }, .hdiv = 1, .vdiv = 1 },
> >>>>>   { .format = V4L2_PIX_FMT_SGRBG12,   .pixel_enc = 
> >>>>> V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 
> >>>>> 0, 0 }, .hdiv = 1, .vdiv = 1 },
> >>>>>   { .format = V4L2_PIX_FMT_SRGGB12,   .pixel_enc = 
> >>>>> V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 
> >>>>> 0, 0 }, .hdiv = 1, .vdiv = 1 },
> >>>>> +   { .format = V4L2_PIX_FMT_NV12M_V4H1C, .pixel_enc = 
> >>>>> V4L2_PIXEL_ENC_YUV, .mem_planes = 5, .comp_planes = 2, .bpp = { 1, 2, 
> >>>>> 0, 0 }, .hdiv = 2, .vdiv = 2, .block_w = { 128, 128 }, .block_h = { 
> >>>>> 128, 128 } },
> >>>>>

Re: [PATCH 1/2] drm/fourcc: Add Synaptics VideoSmart tiled modifiers

2022-08-17 Thread Tomasz Figa
Hi Randy,

On Tue, Aug 9, 2022 at 1:28 AM Hsia-Jun Li  wrote:
>
> From: "Hsia-Jun(Randy) Li" 
>
> Memory Traffic Reduction (MTR) is a module in the Synaptics
> VideoSmart platform that can process lossless-compressed images
> and cache tile memory lines.
>
> Those modifiers only record the parameters that would affect the pixel
> layout or memory layout. Whether physical memory page mapping
> is used is not a part of the format.
>
> We would allocate the same size of memory for uncompressed
> and compressed luma and chroma data, while the compressed buffer
> would request two extra planes holding the metadata for
> the decompression.
>
> The reason why we need to allocate the same size of memory for
> the compressed frame:
> 1. The compression ratio is not fixed, and different platforms could
> use different compression protocols. These protocols are completely
> vendor proprietary; other devices won't be able to use them.
> It is not necessary to define their versions here.
>
> 2. The video codec could discard the compression attribute when
> intra block copy is applied to this frame. It would waste lots of
> time on re-allocation.
>
> I am wondering if it would be better to add an additional plane property to
> describe whether the current framebuffer is compressed.
> The compression flag would still be a part of the format modifier,
> because the compressed version has two extra metadata planes.
>
> Signed-off-by: Hsia-Jun(Randy) Li 
> ---
>  include/uapi/drm/drm_fourcc.h | 49 +++
>  1 file changed, 49 insertions(+)
>
> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> index 0206f812c569..b67884e8bc69 100644
> --- a/include/uapi/drm/drm_fourcc.h
> +++ b/include/uapi/drm/drm_fourcc.h
> @@ -381,6 +381,7 @@ extern "C" {
>  #define DRM_FORMAT_MOD_VENDOR_ARM 0x08
>  #define DRM_FORMAT_MOD_VENDOR_ALLWINNER 0x09
>  #define DRM_FORMAT_MOD_VENDOR_AMLOGIC 0x0a
> +#define DRM_FORMAT_MOD_VENDOR_SYNAPTICS 0x0b
>
>  /* add more to the end as needed */
>
> @@ -1452,6 +1453,54 @@ drm_fourcc_canonicalize_nvidia_format_mod(__u64 
> modifier)
>  #define AMD_FMT_MOD_CLEAR(field) \
> (~((__u64)AMD_FMT_MOD_##field##_MASK << AMD_FMT_MOD_##field##_SHIFT))
>
> +/*
> + * Synaptics VideoSmart modifiers
> + *
> + *   Macro
> + * Bits  Param Description
> + *   - 
> -
> + *
> + *  7:0  f Scan direction description.
> + *
> + *   0 = Invalid
> + *   1 = V4, the scan would always start from vertical for 4 
> pixel
> + *   then move back to the start pixel of the next horizontal
> + *   direction.
> + *   2 = Reserved for future use.

I guess 2..255 are all reserved for future use?

> + *
> + * 15:8  m The times of pattern repeat in the right angle direction from
> + * the first scan direction.
> + *
> + * 19:16 p The padding bits after the whole scan, could be zero.

What is the meaning of "scan" and "whole scan" here?

Best regards,
Tomasz

> + *
> + * 35:20 - Reserved for future use.  Must be zero.
> + *
> + * 36:36 c Compression flag.
> + *
> + * 55:37 - Reserved for future use.  Must be zero.
> + *
> + */
> +
> +#define DRM_FORMAT_MOD_SYNA_V4_TILED   fourcc_mod_code(SYNAPTICS, 1)
> +
> +#define DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(f, m, p, c) \
> +   fourcc_mod_code(SYNAPTICS, (((f) & 0xff) | \
> +(((m) & 0xff) << 8) | \
> +(((p) & 0xf) << 16) | \
> +(((c) & 0x1) << 36)))
> +
> +#define DRM_FORMAT_MOD_SYNA_V4H1 \
> +   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 1, 0, 0)
> +
> +#define DRM_FORMAT_MOD_SYNA_V4H3P8 \
> +   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 3, 8, 0)
> +
> +#define DRM_FORMAT_MOD_SYNA_V4H1_COMPRESSED \
> +   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 1, 0, 1)
> +
> +#define DRM_FORMAT_MOD_SYNA_V4H3P8_COMPRESSED \
> +   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 3, 8, 1)
> +
>  #if defined(__cplusplus)
>  }
>  #endif
> --
> 2.17.1
>


Re: [PATCH 2/2] [WIP]: media: Add Synaptics compressed tiled format

2022-08-17 Thread Tomasz Figa
Hi Randy,

On Tue, Aug 9, 2022 at 1:28 AM Hsia-Jun Li  wrote:
>
> From: "Hsia-Jun(Randy) Li" 
>
> Most of the details have been written in the drm patch.
> Please notice that the tiled formats here require
> one more plane for storing the motion vector metadata.
> This buffer won't be compressed, so you can't append
> it to the luma or chroma plane.

Does the motion vector buffer need to be exposed to userspace? Is the
decoder stateless (requires userspace to specify the reference frames)
or stateful (manages the entire decoding process internally)?

Best regards,
Tomasz

>
> Signed-off-by: Hsia-Jun(Randy) Li 
> ---
>  drivers/media/v4l2-core/v4l2-common.c | 1 +
>  drivers/media/v4l2-core/v4l2-ioctl.c  | 2 ++
>  include/uapi/linux/videodev2.h| 2 ++
>  3 files changed, 5 insertions(+)
>
> diff --git a/drivers/media/v4l2-core/v4l2-common.c 
> b/drivers/media/v4l2-core/v4l2-common.c
> index e0fbe6ba4b6c..f645278b3055 100644
> --- a/drivers/media/v4l2-core/v4l2-common.c
> +++ b/drivers/media/v4l2-core/v4l2-common.c
> @@ -314,6 +314,7 @@ const struct v4l2_format_info *v4l2_format_info(u32 
> format)
> { .format = V4L2_PIX_FMT_SGBRG12,   .pixel_enc = 
> V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 0, 0 
> }, .hdiv = 1, .vdiv = 1 },
> { .format = V4L2_PIX_FMT_SGRBG12,   .pixel_enc = 
> V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 0, 0 
> }, .hdiv = 1, .vdiv = 1 },
> { .format = V4L2_PIX_FMT_SRGGB12,   .pixel_enc = 
> V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 0, 0 
> }, .hdiv = 1, .vdiv = 1 },
> +   { .format = V4L2_PIX_FMT_NV12M_V4H1C, .pixel_enc = 
> V4L2_PIXEL_ENC_YUV, .mem_planes = 5, .comp_planes = 2, .bpp = { 1, 2, 0, 0 }, 
> .hdiv = 2, .vdiv = 2, .block_w = { 128, 128 }, .block_h = { 128, 128 } },
> };
> unsigned int i;
>
> diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c 
> b/drivers/media/v4l2-core/v4l2-ioctl.c
> index e6fd355a2e92..8f65964aff08 100644
> --- a/drivers/media/v4l2-core/v4l2-ioctl.c
> +++ b/drivers/media/v4l2-core/v4l2-ioctl.c
> @@ -1497,6 +1497,8 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
> case V4L2_PIX_FMT_MT21C:descr = "Mediatek Compressed 
> Format"; break;
> case V4L2_PIX_FMT_QC08C:descr = "QCOM Compressed 
> 8-bit Format"; break;
> case V4L2_PIX_FMT_QC10C:descr = "QCOM Compressed 
> 10-bit Format"; break;
> +   case V4L2_PIX_FMT_NV12M_V4H1C:  descr = "Synaptics Compressed 
> 8-bit tiled Format";break;
> +   case V4L2_PIX_FMT_NV12M_10_V4H3P8C: descr = "Synaptics 
> Compressed 10-bit tiled Format";break;
> default:
> if (fmt->description[0])
> return;
> diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
> index 01e630f2ec78..7e928cb69e7c 100644
> --- a/include/uapi/linux/videodev2.h
> +++ b/include/uapi/linux/videodev2.h
> @@ -661,6 +661,8 @@ struct v4l2_pix_format {
>  #define V4L2_PIX_FMT_NV12MT_16X16 v4l2_fourcc('V', 'M', '1', '2') /* 12  
> Y/CbCr 4:2:0 16x16 tiles */
>  #define V4L2_PIX_FMT_NV12M_8L128  v4l2_fourcc('N', 'A', '1', '2') /* 
> Y/CbCr 4:2:0 8x128 tiles */
>  #define V4L2_PIX_FMT_NV12M_10BE_8L128 v4l2_fourcc_be('N', 'T', '1', '2') /* 
> Y/CbCr 4:2:0 10-bit 8x128 tiles */
> +#define V4L2_PIX_FMT_NV12M_V4H1C v4l2_fourcc('S', 'Y', '1', '2')   /* 12  
> Y/CbCr 4:2:0 tiles */
> +#define V4L2_PIX_FMT_NV12M_10_V4H3P8C v4l2_fourcc('S', 'Y', '1', '0')   /* 
> 12  Y/CbCr 4:2:0 10-bits tiles */
>
>  /* Bayer formats - see http://www.siliconimaging.com/RGB%20Bayer.htm */
>  #define V4L2_PIX_FMT_SBGGR8  v4l2_fourcc('B', 'A', '8', '1') /*  8  BGBG.. 
> GRGR.. */
> --
> 2.17.1
>


Re: [PATCH] [Draft]: media: videobuf2-dma-heap: add a vendor defined memory runtine

2022-08-05 Thread Tomasz Figa
On Tue, Aug 2, 2022 at 9:21 PM ayaka  wrote:
>
> Sorry, the previous one contains html data.
>
> > On Aug 2, 2022, at 3:33 PM, Tomasz Figa  wrote:
> >
> > On Mon, Aug 1, 2022 at 8:43 PM ayaka  wrote:
> >>>> On Aug 1, 2022, at 5:46 PM, Tomasz Figa  wrote:
> >>>> On Mon, Aug 1, 2022 at 3:44 PM Hsia-Jun Li  
> >>>> wrote:
> >>>>> On 8/1/22 14:19, Tomasz Figa wrote:
> >>>> Hello Tomasz
> >>>>> ?Hi Randy,
> >>>>> On Mon, Aug 1, 2022 at 5:21 AM  wrote:
> >>>>>> From: Randy Li 
> >>>>>> This module is still at an early stage; I wrote this to show what
> >>>>>> APIs we need here.
> >>>>>> Let me explain why we need such a module here.
> >>>>>> If you won't allocate buffers from a V4L2 M2M device, this module
> >>>>>> may not be very useful. I am sure most users won't know a
> >>>>>> device would require them to allocate buffers from a DMA-Heap and then
> >>>>>> import those buffers into a V4L2 queue.
> >>>>>> Then the question goes back to why DMA-Heap. From the Android
> >>>>>> description, we know it is about copyright DRM.
> >>>>>> When we allocate a buffer in a DMA-Heap, it may register that buffer
> >>>>>> in the trusted execution environment, so that firmware which is running
> >>>>>> there, or could only be accessed from there, could use that buffer later.
> >>>>>> The answer above leads to another thing which is not done in this
> >>>>>> version: the DMA mapping. On some platforms a DMA-Heap
> >>>>>> serves as an IOMMU device as well. For general purposes, we would
> >>>>>> be better off assuming the device mapping should be done by each device
> >>>>>> itself. The problem here is that we only know alloc_devs in those DMA-buf
> >>>>>> methods, which are DMA-Heaps in my design; the device from the queue
> >>>>>> is not enough, as a plane may request another IOMMU device or table
> >>>>>> for mapping.
> >>>>>> Signed-off-by: Randy Li 
> >>>>>> ---
> >>>>>> drivers/media/common/videobuf2/Kconfig|   6 +
> >>>>>> drivers/media/common/videobuf2/Makefile   |   1 +
> >>>>>> .../common/videobuf2/videobuf2-dma-heap.c | 350 ++
> >>>>>> include/media/videobuf2-dma-heap.h|  30 ++
> >>>>>> 4 files changed, 387 insertions(+)
> >>>>>> create mode 100644 drivers/media/common/videobuf2/videobuf2-dma-heap.c
> >>>>>> create mode 100644 include/media/videobuf2-dma-heap.h
> >>>>> First of all, thanks for the series.
> >>>>> Possibly a stupid question, but why not just allocate the DMA-bufs
> >>>>> directly from the DMA-buf heap device in the userspace and just import
> >>>>> the buffers to the V4L2 device using V4L2_MEMORY_DMABUF?
> >>>> Sometimes the allocation policy could be very complex; let's suppose a
> >>>> multi-plane pixel format with frame buffer compression enabled.
> >>>> Its luma and chroma data could be allocated from a pool which is delegated
> >>>> for large buffers, while its metadata would come from a pool from which many
> >>>> users could take a few slices (like a system pool).
> >>>> Then when we have a new user knowing nothing about this platform, if we
> >>>> just configure the alloc_devs in each queue well, the user won't need
> >>>> to know those complex rules.
> >>>> The real situation could be more complex: Samsung MFC's left and right
> >>>> banks could be regarded as two pools, and many devices would benefit from
> >>>> this, either in allocation times or in the secure-buffer policy.
> >>>> In our design, when we need to do some secure decoding (DRM video),
> >>>> codec2 would allocate buffers from the pool delegated for that, while
> >>>> for non-DRM video, users need not care about this.
> >>> I'm a little bit surprised about this, because on 

Re: [PATCH] [Draft]: media: videobuf2-dma-heap: add a vendor defined memory runtine

2022-08-02 Thread Tomasz Figa
On Mon, Aug 1, 2022 at 8:43 PM ayaka  wrote:
>
>
>
>
> > On Aug 1, 2022, at 5:46 PM, Tomasz Figa  wrote:
> >
> >
> >> On Mon, Aug 1, 2022 at 3:44 PM Hsia-Jun Li  wrote:
> >>> On 8/1/22 14:19, Tomasz Figa wrote:
> >> Hello Tomasz
> >>> ?Hi Randy,
> >>> On Mon, Aug 1, 2022 at 5:21 AM  wrote:
> >>>> From: Randy Li 
> >>>> This module is still at an early stage; I wrote this to show what
> >>>> APIs we need here.
> >>>> Let me explain why we need such a module here.
> >>>> If you won't allocate buffers from a V4L2 M2M device, this module
> >>>> may not be very useful. I am sure most users won't know a
> >>>> device would require them to allocate buffers from a DMA-Heap and then
> >>>> import those buffers into a V4L2 queue.
> >>>> Then the question goes back to why DMA-Heap. From the Android
> >>>> description, we know it is about copyright DRM.
> >>>> When we allocate a buffer in a DMA-Heap, it may register that buffer
> >>>> in the trusted execution environment, so that firmware which is running
> >>>> there, or could only be accessed from there, could use that buffer later.
> >>>> The answer above leads to another thing which is not done in this
> >>>> version: the DMA mapping. On some platforms a DMA-Heap
> >>>> serves as an IOMMU device as well. For general purposes, we would
> >>>> be better off assuming the device mapping should be done by each device
> >>>> itself. The problem here is that we only know alloc_devs in those DMA-buf
> >>>> methods, which are DMA-Heaps in my design; the device from the queue
> >>>> is not enough, as a plane may request another IOMMU device or table
> >>>> for mapping.
> >>>> Signed-off-by: Randy Li 
> >>>> ---
> >>>> drivers/media/common/videobuf2/Kconfig|   6 +
> >>>> drivers/media/common/videobuf2/Makefile   |   1 +
> >>>> .../common/videobuf2/videobuf2-dma-heap.c | 350 ++
> >>>> include/media/videobuf2-dma-heap.h|  30 ++
> >>>> 4 files changed, 387 insertions(+)
> >>>> create mode 100644 drivers/media/common/videobuf2/videobuf2-dma-heap.c
> >>>> create mode 100644 include/media/videobuf2-dma-heap.h
> >>> First of all, thanks for the series.
> >>> Possibly a stupid question, but why not just allocate the DMA-bufs
> >>> directly from the DMA-buf heap device in the userspace and just import
> >>> the buffers to the V4L2 device using V4L2_MEMORY_DMABUF?
> >> Sometimes the allocation policy could be very complex; let's suppose a
> >> multi-plane pixel format with frame buffer compression enabled.
> >> Its luma and chroma data could be allocated from a pool which is delegated
> >> for large buffers, while its metadata would come from a pool from which many
> >> users could take a few slices (like a system pool).
> >> Then when we have a new user knowing nothing about this platform, if we
> >> just configure the alloc_devs in each queue well, the user won't need
> >> to know those complex rules.
> >> The real situation could be more complex: Samsung MFC's left and right
> >> banks could be regarded as two pools, and many devices would benefit from
> >> this, either in allocation times or in the secure-buffer policy.
> >> In our design, when we need to do some secure decoding (DRM video),
> >> codec2 would allocate buffers from the pool delegated for that, while
> >> for non-DRM video, users need not care about this.
> >
> > I'm a little bit surprised about this, because on Android all the
> > graphics buffers are allocated from the system IAllocator and imported
> > to the specific devices.
> In the non-tunnel mode, yes it is, while the tunnel mode is completely vendor
> defined. Neither HWC nor codec2 cares about where the buffers come from;
> you could do whatever you want.
>
> Besides, there is DRM video on the GNU/Linux platform; I heard WebKit has made
> a huge effort here, and PlayReady is one that could work on non-Android Linux.
> > Would it make sense to instead extend the UAPI to expose enough
> > information about the allocatio

Re: [PATCH v1 0/6] Move all drivers to a common dma-buf locking convention

2022-07-19 Thread Tomasz Figa
On Fri, Jul 15, 2022 at 9:53 AM Dmitry Osipenko
 wrote:
>
> Hello,
>
> This series moves all drivers to a dynamic dma-buf locking specification.
> From now on all dma-buf importers are made responsible for holding
> dma-buf's reservation lock around all operations performed over dma-bufs.
> This common locking convention allows us to utilize the reservation lock more
> broadly around the kernel without fear of potential deadlocks.
>
> This patchset passes all i915 selftests. It was also tested using VirtIO,
> Panfrost, Lima and Tegra drivers. I tested cases of display+GPU,
> display+V4L and GPU+V4L dma-buf sharing, which covers the majority of kernel
> drivers since the rest of the drivers share the same or similar code paths.
>
> This is a continuation of [1] where Christian König asked to factor out
> the dma-buf locking changes into separate series.
>
> [1] 
> https://lore.kernel.org/dri-devel/20220526235040.678984-1-dmitry.osipe...@collabora.com/
>
> Dmitry Osipenko (6):
>   dma-buf: Add _unlocked postfix to function names
>   drm/gem: Take reservation lock for vmap/vunmap operations
>   dma-buf: Move all dma-bufs to dynamic locking specification
>   dma-buf: Acquire wait-wound context on attachment
>   media: videobuf2: Stop using internal dma-buf lock
>   dma-buf: Remove internal lock
>
>  drivers/dma-buf/dma-buf.c | 198 +++---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c   |   4 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |   4 +-
>  drivers/gpu/drm/armada/armada_gem.c   |  14 +-
>  drivers/gpu/drm/drm_client.c  |   4 +-
>  drivers/gpu/drm/drm_gem.c |  28 +++
>  drivers/gpu/drm/drm_gem_cma_helper.c  |   6 +-
>  drivers/gpu/drm/drm_gem_framebuffer_helper.c  |   6 +-
>  drivers/gpu/drm/drm_gem_shmem_helper.c|   6 +-
>  drivers/gpu/drm/drm_prime.c   |  12 +-
>  drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c   |   6 +-
>  drivers/gpu/drm/exynos/exynos_drm_gem.c   |   2 +-
>  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c|  20 +-
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c|   2 +-
>  drivers/gpu/drm/i915/gem/i915_gem_object.h|   6 +-
>  .../drm/i915/gem/selftests/i915_gem_dmabuf.c  |  20 +-
>  drivers/gpu/drm/i915/i915_gem_evict.c |   2 +-
>  drivers/gpu/drm/i915/i915_gem_ww.c|  26 ++-
>  drivers/gpu/drm/i915/i915_gem_ww.h|  15 +-
>  drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c |   8 +-
>  drivers/gpu/drm/qxl/qxl_object.c  |  17 +-
>  drivers/gpu/drm/qxl/qxl_prime.c   |   4 +-
>  drivers/gpu/drm/tegra/gem.c   |  27 +--
>  drivers/infiniband/core/umem_dmabuf.c |  11 +-
>  .../common/videobuf2/videobuf2-dma-contig.c   |  26 +--
>  .../media/common/videobuf2/videobuf2-dma-sg.c |  23 +-
>  .../common/videobuf2/videobuf2-vmalloc.c  |  17 +-

For the videobuf2 changes:

Acked-by: Tomasz Figa 

Best regards,
Tomasz


Re: [PATCH v6 09/17] media/videbuf1|2: Mark follow_pfn usage as unsafe

2020-11-20 Thread Tomasz Figa
On Fri, Nov 20, 2020 at 9:08 PM Hans Verkuil  wrote:
>
> On 20/11/2020 11:51, Daniel Vetter wrote:
> > On Fri, Nov 20, 2020 at 11:39 AM Hans Verkuil  wrote:
> >>
> >> On 20/11/2020 10:18, Daniel Vetter wrote:
> >>> On Fri, Nov 20, 2020 at 9:28 AM Hans Verkuil  wrote:
> >>>>
> >>>> On 20/11/2020 09:06, Hans Verkuil wrote:
> >>>>> On 19/11/2020 15:41, Daniel Vetter wrote:
> >>>>>> The media model assumes that buffers are all preallocated, so that
> >>>>>> when a media pipeline is running we never miss a deadline because the
> >>>>>> buffers aren't allocated or available.
> >>>>>>
> >>>>>> This means we cannot fix the v4l follow_pfn usage through
> >>>>>> mmu_notifier, without breaking how this all works. The only real fix
> >>>>>> is to deprecate userptr support for VM_IO | VM_PFNMAP mappings and
> >>>>>> tell everyone to cut over to dma-buf memory sharing for zerocopy.
> >>>>>>
> >>>>>> userptr for normal memory will keep working as-is, this only affects
> >>>>>> the zerocopy userptr usage enabled in 50ac952d2263 ("[media]
> >>>>>> videobuf2-dma-sg: Support io userptr operations on io memory").
> >>>>>>
> >>>>>> Acked-by: Tomasz Figa 
> >>>>>
> >>>>> Acked-by: Hans Verkuil 
> >>>>
> >>>> Actually, cancel this Acked-by.
> >>>>
> >>>> So let me see if I understand this right: VM_IO | VM_PFNMAP mappings can
> >>>> move around. There is a mmu_notifier that can be used to be notified when
> >>>> that happens, but that can't be used with media buffers since those 
> >>>> buffers
> >>>> must always be available and in the same place.
> >>>>
> >>>> So follow_pfn is replaced by unsafe_follow_pfn to signal that what is 
> >>>> attempted
> >>>> is unsafe and unreliable.
> >>>>
> >>>> If CONFIG_STRICT_FOLLOW_PFN is set, then unsafe_follow_pfn will fail, if 
> >>>> it
> >>>> is unset, then it writes a warning to the kernel log but just continues 
> >>>> while
> >>>> still unsafe.
> >>>>
> >>>> I am very much inclined to just drop VM_IO | VM_PFNMAP support in the 
> >>>> media
> >>>> subsystem. For vb2 there is a working alternative in the form of dmabuf, 
> >>>> and
> >>>> frankly for vb1 I don't care. If someone really needs this for a vb1 
> >>>> driver,
> >>>> then they can do the work to convert that driver to vb2.
> >>>>
> >>>> I've added Mauro to the CC list and I'll ping a few more people to see 
> >>>> what
> >>>> they think, but in my opinion support for USERPTR + VM_IO | VM_PFNMAP
> >>>> should just be killed off.
> >>>>
> >>>> If others would like to keep it, then frame_vector.c needs a comment 
> >>>> before
> >>>> the 'while' explaining why the unsafe_follow_pfn is there and that using
> >>>> dmabuf is the proper alternative to use. That will make it easier for
> >>>> developers to figure out why they see a kernel warning and what to do to
> >>>> fix it, rather than having to dig through the git history for the reason.
> >>>
> >>> I'm happy to add a comment, but otherwise if you all want to ditch
> >>> this, can we do this as a follow up on top? There's quite a bit of
> >>> code that can be deleted and I'd like to not hold up this patch set
> >>> here on that - it's already a fairly sprawling pain touching about 7
> >>> different subsystems (ok only 6-ish now since the s390 patch landed).
> >>> For the comment, is the explanation next to unsafe_follow_pfn not good
> >>> enough?
> >>
> >> No, because that doesn't mention that you should use dma-buf as a 
> >> replacement.
> >> That's really the critical piece of information I'd like to see. That 
> >> doesn't
> >> belong in unsafe_follow_pfn, it needs to be in frame_vector.c since it's
> >> vb2 specific.
> >
> > Ah makes sense, I'll add that.
> >
> >>>
> >>> So ... can I get you to un-cancel yo

Re: [PATCH v6 09/17] media/videbuf1|2: Mark follow_pfn usage as unsafe

2020-11-20 Thread Tomasz Figa
On Fri, Nov 20, 2020 at 5:28 PM Hans Verkuil  wrote:
>
> On 20/11/2020 09:06, Hans Verkuil wrote:
> > On 19/11/2020 15:41, Daniel Vetter wrote:
> >> The media model assumes that buffers are all preallocated, so that
> >> when a media pipeline is running we never miss a deadline because the
> >> buffers aren't allocated or available.
> >>
> >> This means we cannot fix the v4l follow_pfn usage through
> >> mmu_notifier, without breaking how this all works. The only real fix
> >> is to deprecate userptr support for VM_IO | VM_PFNMAP mappings and
> >> tell everyone to cut over to dma-buf memory sharing for zerocopy.
> >>
> >> userptr for normal memory will keep working as-is, this only affects
> >> the zerocopy userptr usage enabled in 50ac952d2263 ("[media]
> >> videobuf2-dma-sg: Support io userptr operations on io memory").
> >>
> >> Acked-by: Tomasz Figa 
> >
> > Acked-by: Hans Verkuil 
>
> Actually, cancel this Acked-by.
>
> So let me see if I understand this right: VM_IO | VM_PFNMAP mappings can
> move around. There is a mmu_notifier that can be used to be notified when
> that happens, but that can't be used with media buffers since those buffers
> must always be available and in the same place.
>
> So follow_pfn is replaced by unsafe_follow_pfn to signal that what is attempted
> is unsafe and unreliable.
>
> If CONFIG_STRICT_FOLLOW_PFN is set, then unsafe_follow_pfn will fail, if it
> is unset, then it writes a warning to the kernel log but just continues while
> still unsafe.
>
> I am very much inclined to just drop VM_IO | VM_PFNMAP support in the media
> subsystem. For vb2 there is a working alternative in the form of dmabuf, and
> frankly for vb1 I don't care. If someone really needs this for a vb1 driver,
> then they can do the work to convert that driver to vb2.
>
> I've added Mauro to the CC list and I'll ping a few more people to see what
> they think, but in my opinion support for USERPTR + VM_IO | VM_PFNMAP
> should just be killed off.
>
> If others would like to keep it, then frame_vector.c needs a comment before
> the 'while' explaining why the unsafe_follow_pfn is there and that using
> dmabuf is the proper alternative to use. That will make it easier for
> developers to figure out why they see a kernel warning and what to do to
> fix it, rather than having to dig through the git history for the reason.

I'm all for dropping that code.

Best regards,
Tomasz

>
> Regards,
>
> Hans
>
> >
> > Thanks!
> >
> >   Hans
> >
> >> Signed-off-by: Daniel Vetter 
> >> Cc: Jason Gunthorpe 
> >> Cc: Kees Cook 
> >> Cc: Dan Williams 
> >> Cc: Andrew Morton 
> >> Cc: John Hubbard 
> >> Cc: Jérôme Glisse 
> >> Cc: Jan Kara 
> >> Cc: Dan Williams 
> >> Cc: linux...@kvack.org
> >> Cc: linux-arm-ker...@lists.infradead.org
> >> Cc: linux-samsung-...@vger.kernel.org
> >> Cc: linux-me...@vger.kernel.org
> >> Cc: Pawel Osciak 
> >> Cc: Marek Szyprowski 
> >> Cc: Kyungmin Park 
> >> Cc: Tomasz Figa 
> >> Cc: Laurent Dufour 
> >> Cc: Vlastimil Babka 
> >> Cc: Daniel Jordan 
> >> Cc: Michel Lespinasse 
> >> Signed-off-by: Daniel Vetter 
> >> --
> >> v3:
> >> - Reference the commit that enabled the zerocopy userptr use case to
> >>   make it abundantly clear that this patch only affects that, and not
> >>   normal memory userptr. The old commit message already explained that
> >>   normal memory userptr is unaffected, but I guess that was not clear
> >>   enough.
> >> ---
> >>  drivers/media/common/videobuf2/frame_vector.c | 2 +-
> >>  drivers/media/v4l2-core/videobuf-dma-contig.c | 2 +-
> >>  2 files changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/media/common/videobuf2/frame_vector.c 
> >> b/drivers/media/common/videobuf2/frame_vector.c
> >> index a0e65481a201..1a82ec13ea00 100644
> >> --- a/drivers/media/common/videobuf2/frame_vector.c
> >> +++ b/drivers/media/common/videobuf2/frame_vector.c
> >> @@ -70,7 +70,7 @@ int get_vaddr_frames(unsigned long start, unsigned int 
> >> nr_frames,
> >>  break;
> >>
> >>  while (ret < nr_frames && start + PAGE_SIZE <= vma->vm_end) {
> >> -err = follow_pfn(vma, start, &nums[ret]);
> >> +err = unsafe_follow_pfn(vma, start, &nums[ret]);

Re: [PATCH v5 05/15] mm/frame-vector: Use FOLL_LONGTERM

2020-11-02 Thread Tomasz Figa
On Fri, Oct 30, 2020 at 3:38 PM Daniel Vetter  wrote:
>
> On Fri, Oct 30, 2020 at 3:11 PM Tomasz Figa  wrote:
> >
> > On Fri, Oct 30, 2020 at 11:08 AM Daniel Vetter  wrote:
> > >
> > > This is used by media/videbuf2 for persistent dma mappings, not just
> > > for a single dma operation and then freed again, so needs
> > > FOLL_LONGTERM.
> > >
> > > Unfortunately current pup_locked doesn't support FOLL_LONGTERM due to
> > > locking issues. Rework the code to pull the pup path out from the
> > > mmap_sem critical section as suggested by Jason.
> > >
> > > By relying entirely on the vma checks in pin_user_pages and follow_pfn
> > > (for vm_flags and vma_is_fsdax) we can also streamline the code a lot.
> > >
> > > Signed-off-by: Daniel Vetter 
> > > Cc: Jason Gunthorpe 
> > > Cc: Pawel Osciak 
> > > Cc: Marek Szyprowski 
> > > Cc: Kyungmin Park 
> > > Cc: Tomasz Figa 
> > > Cc: Mauro Carvalho Chehab 
> > > Cc: Andrew Morton 
> > > Cc: John Hubbard 
> > > Cc: Jérôme Glisse 
> > > Cc: Jan Kara 
> > > Cc: Dan Williams 
> > > Cc: linux...@kvack.org
> > > Cc: linux-arm-ker...@lists.infradead.org
> > > Cc: linux-samsung-...@vger.kernel.org
> > > Cc: linux-me...@vger.kernel.org
> > > Signed-off-by: Daniel Vetter 
> > > --
> > > v2: Streamline the code and further simplify the loop checks (Jason)
> > >
> > > v5: Review from Tomasz:
> > > - fix page counting for the follow_pfn case by resetting ret
> > > - drop gup_flags parameter, now unused
> > > ---
> > >  .../media/common/videobuf2/videobuf2-memops.c |  3 +-
> > >  include/linux/mm.h|  2 +-
> > >  mm/frame_vector.c | 53 ++-
> > >  3 files changed, 19 insertions(+), 39 deletions(-)
> > >
> >
> > Thanks, looks good to me now.
> >
> > Acked-by: Tomasz Figa 
> >
> > From reading the code, this is quite unlikely to introduce any
> > behavior changes, but just to be safe, did you have a chance to test
> > this with some V4L2 driver?
>
> Nah, unfortunately not.

I believe we don't have any setup that could exercise the IO/PFNMAP
user pointers, but it should be possible to exercise the basic userptr
path by enabling the virtual (fake) video driver, vivid or
CONFIG_VIDEO_VIVID, in your kernel and then using yavta [1] with
--userptr and --capture= (and possibly some more
options) to grab a couple of frames from the test pattern generator.

Does it sound like something that you could give a try? Feel free to
ping me on IRC (tfiga on #v4l or #dri-devel) if you need any help.

[1] https://git.ideasonboard.org/yavta.git

Best regards,
Tomasz

> -Daniel
>
> >
> > Best regards,
> > Tomasz
> >
> > > diff --git a/drivers/media/common/videobuf2/videobuf2-memops.c 
> > > b/drivers/media/common/videobuf2/videobuf2-memops.c
> > > index 6e9e05153f4e..9dd6c27162f4 100644
> > > --- a/drivers/media/common/videobuf2/videobuf2-memops.c
> > > +++ b/drivers/media/common/videobuf2/videobuf2-memops.c
> > > @@ -40,7 +40,6 @@ struct frame_vector *vb2_create_framevec(unsigned long 
> > > start,
> > > unsigned long first, last;
> > > unsigned long nr;
> > > struct frame_vector *vec;
> > > -   unsigned int flags = FOLL_FORCE | FOLL_WRITE;
> > >
> > > first = start >> PAGE_SHIFT;
> > > last = (start + length - 1) >> PAGE_SHIFT;
> > > @@ -48,7 +47,7 @@ struct frame_vector *vb2_create_framevec(unsigned long 
> > > start,
> > > vec = frame_vector_create(nr);
> > > if (!vec)
> > > return ERR_PTR(-ENOMEM);
> > > -   ret = get_vaddr_frames(start & PAGE_MASK, nr, flags, vec);
> > > +   ret = get_vaddr_frames(start & PAGE_MASK, nr, vec);
> > > if (ret < 0)
> > > goto out_destroy;
> > > /* We accept only complete set of PFNs */
> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > index ef360fe70aaf..d6b8e30dce2e 100644
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> > > @@ -1765,7 +1765,7 @@ struct frame_vector {
> > >  struct frame_vector *frame_vector_create(unsigned int nr_frames);
> > >  void frame_vector_destroy(struct frame_vector *vec);
> > >  int get_vaddr_frames(unsigned long start, unsi

Re: [PATCH v5 05/15] mm/frame-vector: Use FOLL_LONGTERM

2020-10-30 Thread Tomasz Figa
On Fri, Oct 30, 2020 at 11:08 AM Daniel Vetter  wrote:
>
> This is used by media/videbuf2 for persistent dma mappings, not just
> for a single dma operation and then freed again, so needs
> FOLL_LONGTERM.
>
> Unfortunately current pup_locked doesn't support FOLL_LONGTERM due to
> locking issues. Rework the code to pull the pup path out from the
> mmap_sem critical section as suggested by Jason.
>
> By relying entirely on the vma checks in pin_user_pages and follow_pfn
> (for vm_flags and vma_is_fsdax) we can also streamline the code a lot.
>
> Signed-off-by: Daniel Vetter 
> Cc: Jason Gunthorpe 
> Cc: Pawel Osciak 
> Cc: Marek Szyprowski 
> Cc: Kyungmin Park 
> Cc: Tomasz Figa 
> Cc: Mauro Carvalho Chehab 
> Cc: Andrew Morton 
> Cc: John Hubbard 
> Cc: Jérôme Glisse 
> Cc: Jan Kara 
> Cc: Dan Williams 
> Cc: linux...@kvack.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-samsung-...@vger.kernel.org
> Cc: linux-me...@vger.kernel.org
> Signed-off-by: Daniel Vetter 
> --
> v2: Streamline the code and further simplify the loop checks (Jason)
>
> v5: Review from Tomasz:
> - fix page counting for the follow_pfn case by resetting ret
> - drop gup_flags parameter, now unused
> ---
>  .../media/common/videobuf2/videobuf2-memops.c |  3 +-
>  include/linux/mm.h|  2 +-
>  mm/frame_vector.c | 53 ++++++-
>  3 files changed, 19 insertions(+), 39 deletions(-)
>

Thanks, looks good to me now.

Acked-by: Tomasz Figa 

From reading the code, this is quite unlikely to introduce any
behavior changes, but just to be safe, did you have a chance to test
this with some V4L2 driver?

Best regards,
Tomasz

> diff --git a/drivers/media/common/videobuf2/videobuf2-memops.c 
> b/drivers/media/common/videobuf2/videobuf2-memops.c
> index 6e9e05153f4e..9dd6c27162f4 100644
> --- a/drivers/media/common/videobuf2/videobuf2-memops.c
> +++ b/drivers/media/common/videobuf2/videobuf2-memops.c
> @@ -40,7 +40,6 @@ struct frame_vector *vb2_create_framevec(unsigned long 
> start,
> unsigned long first, last;
> unsigned long nr;
> struct frame_vector *vec;
> -   unsigned int flags = FOLL_FORCE | FOLL_WRITE;
>
> first = start >> PAGE_SHIFT;
> last = (start + length - 1) >> PAGE_SHIFT;
> @@ -48,7 +47,7 @@ struct frame_vector *vb2_create_framevec(unsigned long 
> start,
> vec = frame_vector_create(nr);
> if (!vec)
> return ERR_PTR(-ENOMEM);
> -   ret = get_vaddr_frames(start & PAGE_MASK, nr, flags, vec);
> +   ret = get_vaddr_frames(start & PAGE_MASK, nr, vec);
> if (ret < 0)
> goto out_destroy;
> /* We accept only complete set of PFNs */
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ef360fe70aaf..d6b8e30dce2e 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1765,7 +1765,7 @@ struct frame_vector {
>  struct frame_vector *frame_vector_create(unsigned int nr_frames);
>  void frame_vector_destroy(struct frame_vector *vec);
>  int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
> -unsigned int gup_flags, struct frame_vector *vec);
> +struct frame_vector *vec);
>  void put_vaddr_frames(struct frame_vector *vec);
>  int frame_vector_to_pages(struct frame_vector *vec);
>  void frame_vector_to_pfns(struct frame_vector *vec);
> diff --git a/mm/frame_vector.c b/mm/frame_vector.c
> index 10f82d5643b6..f8c34b895c76 100644
> --- a/mm/frame_vector.c
> +++ b/mm/frame_vector.c
> @@ -32,13 +32,12 @@
>   * This function takes care of grabbing mmap_lock as necessary.
>   */
>  int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
> -unsigned int gup_flags, struct frame_vector *vec)
> +struct frame_vector *vec)
>  {
> struct mm_struct *mm = current->mm;
> struct vm_area_struct *vma;
> int ret = 0;
> int err;
> -   int locked;
>
> if (nr_frames == 0)
> return 0;
> @@ -48,40 +47,26 @@ int get_vaddr_frames(unsigned long start, unsigned int 
> nr_frames,
>
> start = untagged_addr(start);
>
> -   mmap_read_lock(mm);
> -   locked = 1;
> -   vma = find_vma_intersection(mm, start, start + 1);
> -   if (!vma) {
> -   ret = -EFAULT;
> -   goto out;
> -   }
> -
> -   /*
> -* While get_vaddr_frames() could be used for transient (kernel
> -* controlled lifetime) pinning of memory pages all current
> -* users establish long term (userspace c

Re: [PATCH v4 05/15] mm/frame-vector: Use FOLL_LONGTERM

2020-10-26 Thread Tomasz Figa
Hi Daniel,

On Mon, Oct 26, 2020 at 11:58:08AM +0100, Daniel Vetter wrote:
> This is used by media/videbuf2 for persistent dma mappings, not just
> for a single dma operation and then freed again, so needs
> FOLL_LONGTERM.
> 
> Unfortunately current pup_locked doesn't support FOLL_LONGTERM due to
> locking issues. Rework the code to pull the pup path out from the
> mmap_sem critical section as suggested by Jason.
> 
> By relying entirely on the vma checks in pin_user_pages and follow_pfn
> (for vm_flags and vma_is_fsdax) we can also streamline the code a lot.
> 
> Signed-off-by: Daniel Vetter 
> Cc: Jason Gunthorpe 
> Cc: Pawel Osciak 
> Cc: Marek Szyprowski 
> Cc: Kyungmin Park 
> Cc: Tomasz Figa 
> Cc: Mauro Carvalho Chehab 
> Cc: Andrew Morton 
> Cc: John Hubbard 
> Cc: Jérôme Glisse 
> Cc: Jan Kara 
> Cc: Dan Williams 
> Cc: linux...@kvack.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-samsung-...@vger.kernel.org
> Cc: linux-me...@vger.kernel.org
> Signed-off-by: Daniel Vetter 
> --
> v2: Streamline the code and further simplify the loop checks (Jason)
> ---
>  mm/frame_vector.c | 50 ++-
>  1 file changed, 15 insertions(+), 35 deletions(-)
> 

Thank you for the patch. Please see my comments inline.

> diff --git a/mm/frame_vector.c b/mm/frame_vector.c
> index 10f82d5643b6..d44779e56313 100644
> --- a/mm/frame_vector.c
> +++ b/mm/frame_vector.c
> @@ -38,7 +38,6 @@ int get_vaddr_frames(unsigned long start, unsigned int 
> nr_frames,
>   struct vm_area_struct *vma;
>   int ret = 0;
>   int err;
> - int locked;
>  
>   if (nr_frames == 0)
>   return 0;
> @@ -48,40 +47,25 @@ int get_vaddr_frames(unsigned long start, unsigned int 
> nr_frames,
>  
>   start = untagged_addr(start);
>  
> - mmap_read_lock(mm);
> - locked = 1;
> - vma = find_vma_intersection(mm, start, start + 1);
> - if (!vma) {
> - ret = -EFAULT;
> - goto out;
> - }
> -
> - /*
> -  * While get_vaddr_frames() could be used for transient (kernel
> -  * controlled lifetime) pinning of memory pages all current
> -  * users establish long term (userspace controlled lifetime)
> -  * page pinning. Treat get_vaddr_frames() like
> -  * get_user_pages_longterm() and disallow it for filesystem-dax
> -  * mappings.
> -  */
> - if (vma_is_fsdax(vma)) {
> - ret = -EOPNOTSUPP;
> - goto out;
> - }
> -
> - if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
> + ret = pin_user_pages_fast(start, nr_frames,
> +   FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
> +   (struct page **)(vec->ptrs));
> + if (ret > 0) {
>   vec->got_ref = true;
>   vec->is_pfns = false;
> - ret = pin_user_pages_locked(start, nr_frames,
> - gup_flags, (struct page **)(vec->ptrs), &locked);

Should we drop the gup_flags argument, since it's ignored now?

> - goto out;
> + goto out_unlocked;
>   }
>  

Should we initialize ret with 0 here, since pin_user_pages_fast() can
return a negative error code, but below we use it as a counter for the
looked up frames?

Best regards,
Tomasz

> + mmap_read_lock(mm);
>   vec->got_ref = false;
>   vec->is_pfns = true;
>   do {
>   unsigned long *nums = frame_vector_pfns(vec);
>  
> + vma = find_vma_intersection(mm, start, start + 1);
> + if (!vma)
> + break;
> +
>   while (ret < nr_frames && start + PAGE_SIZE <= vma->vm_end) {
>   err = follow_pfn(vma, start, &nums[ret]);
>   if (err) {
> @@ -92,17 +76,13 @@ int get_vaddr_frames(unsigned long start, unsigned int 
> nr_frames,
>   start += PAGE_SIZE;
>   ret++;
>   }
> - /*
> -  * We stop if we have enough pages or if VMA doesn't completely
> -  * cover the tail page.
> -  */
> - if (ret >= nr_frames || start < vma->vm_end)
> + /* Bail out if VMA doesn't completely cover the tail page. */
> + if (start < vma->vm_end)
>   break;
> - vma = find_vma_intersection(mm, start, start + 1);
> - } while (vma && vma->vm_flags & (VM_IO | VM_PFNMAP));
> + } while (ret < nr_frames);
>  out:
> - if (locked)
> - mmap_read_unlock(mm);
> + mmap_read_unlock(mm);
> +out_unlocked:
>   if (!ret)
>   ret = -EFAULT;
>   if (ret > 0)
> -- 
> 2.28.0
> 
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v4 06/15] media: videobuf2: Move frame_vector into media subsystem

2020-10-26 Thread Tomasz Figa
On Mon, Oct 26, 2020 at 11:58:09AM +0100, Daniel Vetter wrote:
> It's the only user. This also garbage collects the CONFIG_FRAME_VECTOR
> symbol from all over the tree (well just one place, somehow omap media
> driver still had this in its Kconfig, despite not using it).
> 
> Reviewed-by: John Hubbard 
> Acked-by: Mauro Carvalho Chehab 
> Signed-off-by: Daniel Vetter 
> Cc: Jason Gunthorpe 
> Cc: Pawel Osciak 
> Cc: Marek Szyprowski 
> Cc: Kyungmin Park 
> Cc: Tomasz Figa 
> Cc: Mauro Carvalho Chehab 
> Cc: Andrew Morton 
> Cc: John Hubbard 
> Cc: Jérôme Glisse 
> Cc: Jan Kara 
> Cc: Dan Williams 
> Cc: linux...@kvack.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-samsung-...@vger.kernel.org
> Cc: linux-me...@vger.kernel.org
> Cc: Daniel Vetter 
> Signed-off-by: Daniel Vetter 
> --
> v3:
> - Create a new frame_vector.h header for this (Mauro)
> ---
>  drivers/media/common/videobuf2/Kconfig|  1 -
>  drivers/media/common/videobuf2/Makefile   |  1 +
>  .../media/common/videobuf2}/frame_vector.c|  2 +
>  drivers/media/platform/omap/Kconfig   |  1 -
>  include/linux/mm.h| 42 -
>  include/media/frame_vector.h  | 47 +++
>  include/media/videobuf2-core.h|  1 +
>  mm/Kconfig|  3 --
>  mm/Makefile   |  1 -
>  9 files changed, 51 insertions(+), 48 deletions(-)
>  rename {mm => drivers/media/common/videobuf2}/frame_vector.c (99%)
>  create mode 100644 include/media/frame_vector.h
> 

Acked-by: Tomasz Figa 

Best regards,
Tomasz

> diff --git a/drivers/media/common/videobuf2/Kconfig 
> b/drivers/media/common/videobuf2/Kconfig
> index edbc99ebba87..d2223a12c95f 100644
> --- a/drivers/media/common/videobuf2/Kconfig
> +++ b/drivers/media/common/videobuf2/Kconfig
> @@ -9,7 +9,6 @@ config VIDEOBUF2_V4L2
>  
>  config VIDEOBUF2_MEMOPS
>   tristate
> - select FRAME_VECTOR
>  
>  config VIDEOBUF2_DMA_CONTIG
>   tristate
> diff --git a/drivers/media/common/videobuf2/Makefile 
> b/drivers/media/common/videobuf2/Makefile
> index 77bebe8b202f..54306f8d096c 100644
> --- a/drivers/media/common/videobuf2/Makefile
> +++ b/drivers/media/common/videobuf2/Makefile
> @@ -1,5 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0
>  videobuf2-common-objs := videobuf2-core.o
> +videobuf2-common-objs += frame_vector.o
>  
>  ifeq ($(CONFIG_TRACEPOINTS),y)
>videobuf2-common-objs += vb2-trace.o
> diff --git a/mm/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
> similarity index 99%
> rename from mm/frame_vector.c
> rename to drivers/media/common/videobuf2/frame_vector.c
> index d44779e56313..6590987c14bd 100644
> --- a/mm/frame_vector.c
> +++ b/drivers/media/common/videobuf2/frame_vector.c
> @@ -8,6 +8,8 @@
>  #include 
>  #include 
>  
> +#include 
> +
>  /**
>   * get_vaddr_frames() - map virtual addresses to pfns
>   * @start:   starting user address
> diff --git a/drivers/media/platform/omap/Kconfig 
> b/drivers/media/platform/omap/Kconfig
> index f73b5893220d..de16de46c0f4 100644
> --- a/drivers/media/platform/omap/Kconfig
> +++ b/drivers/media/platform/omap/Kconfig
> @@ -12,6 +12,5 @@ config VIDEO_OMAP2_VOUT
>   depends on VIDEO_V4L2
>   select VIDEOBUF2_DMA_CONTIG
>   select OMAP2_VRFB if ARCH_OMAP2 || ARCH_OMAP3
> - select FRAME_VECTOR
>   help
> V4L2 Display driver support for OMAP2/3 based boards.
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 16b799a0522c..acd60fbf1a5a 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1743,48 +1743,6 @@ int account_locked_vm(struct mm_struct *mm, unsigned 
> long pages, bool inc);
>  int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
>   struct task_struct *task, bool bypass_rlim);
>  
> -/* Container for pinned pfns / pages */
> -struct frame_vector {
> - unsigned int nr_allocated;  /* Number of frames we have space for */
> - unsigned int nr_frames; /* Number of frames stored in ptrs array */
> - bool got_ref;   /* Did we pin pages by getting page ref? */
> - bool is_pfns;   /* Does array contain pages or pfns? */
> - void *ptrs[];   /* Array of pinned pfns / pages. Use
> -  * pfns_vector_pages() or pfns_vector_pfns()
> -  * for access */
> -};
> -
> -struct frame_vector *frame_vector_create(unsigned int nr_frames);
> -void frame_vector_destroy(struct frame_vector *vec);
> -int get_vaddr_frames(unsigned long start, u

Re: [PATCH v4 09/15] media/videbuf1|2: Mark follow_pfn usage as unsafe

2020-10-26 Thread Tomasz Figa
Hi Daniel,

On Mon, Oct 26, 2020 at 11:58:12AM +0100, Daniel Vetter wrote:
> The media model assumes that buffers are all preallocated, so that
> when a media pipeline is running we never miss a deadline because the
> buffers aren't allocated or available.
> 
> This means we cannot fix the v4l follow_pfn usage through
> mmu_notifier, without breaking how this all works. The only real fix
> is to deprecate userptr support for VM_IO | VM_PFNMAP mappings and
> tell everyone to cut over to dma-buf memory sharing for zerocopy.
> 
> userptr for normal memory will keep working as-is, this only affects
> the zerocopy userptr usage enabled in 50ac952d2263 ("[media]
> videobuf2-dma-sg: Support io userptr operations on io memory").

Note that this is true only for the videobuf2 change. The videobuf1 code
was like this all the time and does not support normal memory in the
dma_contig variant (because normal memory is rarely physically contiguous).

If my understanding is correct that the CONFIG_STRICT_FOLLOW_PFN is not
enabled by default, we stay backwards compatible, with only whoever
decides to turn it on risking a breakage.

I agree that this is a good first step towards deprecating this legacy
code, so:

Acked-by: Tomasz Figa 

Of course the last word goes to Mauro. :)

Best regards,
Tomasz

> 
> Signed-off-by: Daniel Vetter 
> Cc: Jason Gunthorpe 
> Cc: Kees Cook 
> Cc: Dan Williams 
> Cc: Andrew Morton 
> Cc: John Hubbard 
> Cc: Jérôme Glisse 
> Cc: Jan Kara 
> Cc: Dan Williams 
> Cc: linux...@kvack.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-samsung-...@vger.kernel.org
> Cc: linux-me...@vger.kernel.org
> Cc: Pawel Osciak 
> Cc: Marek Szyprowski 
> Cc: Kyungmin Park 
> Cc: Tomasz Figa 
> Cc: Laurent Dufour 
> Cc: Vlastimil Babka 
> Cc: Daniel Jordan 
> Cc: Michel Lespinasse 
> Signed-off-by: Daniel Vetter 
> --
> v3:
> - Reference the commit that enabled the zerocopy userptr use case to
>   make it abundantly clear that this patch only affects that, and not
>   normal memory userptr. The old commit message already explained that
>   normal memory userptr is unaffected, but I guess that was not clear
>   enough.
> ---
>  drivers/media/common/videobuf2/frame_vector.c | 2 +-
>  drivers/media/v4l2-core/videobuf-dma-contig.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/media/common/videobuf2/frame_vector.c 
> b/drivers/media/common/videobuf2/frame_vector.c
> index 6590987c14bd..e630494da65c 100644
> --- a/drivers/media/common/videobuf2/frame_vector.c
> +++ b/drivers/media/common/videobuf2/frame_vector.c
> @@ -69,7 +69,7 @@ int get_vaddr_frames(unsigned long start, unsigned int 
> nr_frames,
>   break;
>  
>   while (ret < nr_frames && start + PAGE_SIZE <= vma->vm_end) {
> - err = follow_pfn(vma, start, &nums[ret]);
> + err = unsafe_follow_pfn(vma, start, &nums[ret]);
>   if (err) {
>   if (ret == 0)
>   ret = err;
> diff --git a/drivers/media/v4l2-core/videobuf-dma-contig.c 
> b/drivers/media/v4l2-core/videobuf-dma-contig.c
> index 52312ce2ba05..821c4a76ab96 100644
> --- a/drivers/media/v4l2-core/videobuf-dma-contig.c
> +++ b/drivers/media/v4l2-core/videobuf-dma-contig.c
> @@ -183,7 +183,7 @@ static int videobuf_dma_contig_user_get(struct 
> videobuf_dma_contig_memory *mem,
>   user_address = untagged_baddr;
>  
>   while (pages_done < (mem->size >> PAGE_SHIFT)) {
> - ret = follow_pfn(vma, user_address, &this_pfn);
> + ret = unsafe_follow_pfn(vma, user_address, &this_pfn);
>   if (ret)
>   break;
>  
> -- 
> 2.28.0
> 


Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn

2020-10-10 Thread Tomasz Figa
Hi Mauro,

On Fri, Oct 9, 2020 at 2:37 PM Mauro Carvalho Chehab wrote:
>
> On Fri, 9 Oct 2020 09:21:11 -0300
> Jason Gunthorpe wrote:
>
> > On Fri, Oct 09, 2020 at 12:34:21PM +0200, Mauro Carvalho Chehab wrote:
> > > Hi,
> > >
> > > Em Fri,  9 Oct 2020 09:59:26 +0200
> > > Daniel Vetter  escreveu:
> > >
> > > > Way back it was a reasonable assumptions that iomem mappings never
> > > > change the pfn range they point at. But this has changed:
> > > >
> > > > - gpu drivers dynamically manage their memory nowadays, invalidating
> > > > ptes with unmap_mapping_range when buffers get moved
> > > >
> > > > - contiguous dma allocations have moved from dedicated carvetouts to
> > > > cma regions. This means if we miss the unmap the pfn might contain
> > > > pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> > > >
> > > > - even /dev/mem now invalidates mappings when the kernel requests that
> > > > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> > > > ("/dev/mem: Revoke mappings when a driver claims the region")
> > > >
> > > > Accessing pfns obtained from ptes without holding all the locks is
> > > > therefore no longer a good idea.
> > > >
> > > > Unfortunately there's some users where this is not fixable (like v4l
> > > > userptr of iomem mappings) or involves a pile of work (vfio type1
> > > > iommu). For now annotate these as unsafe and splat appropriately.
> > > >
> > > > This patch adds an unsafe_follow_pfn, which later patches will then
> > > > roll out to all appropriate places.
> > >
> > > NACK, as this breaks an existing userspace API on media.
> >
> > It doesn't break it. You get a big warning the thing is broken and it
> > keeps working.
> >
> > We can't leave such a huge security hole open - it impacts other
> > subsystems, distros need to be able to run in a secure mode.
>
> Well, if distros disable it, then apps will break.
>

Do we have any information on userspace that actually needs this functionality?

Note that we're _not_ talking here about the complete USERPTR
functionality, but rather just the very corner case of carveout memory
not backed by struct pages.

Given that the current in-tree ways of reserving carveout memory, such
as shared-dma-pool, actually give memory backed by struct pages, do we
even have a source of such legacy memory in the kernel today?

I think that given that this is a very niche functionality, we could
have it disabled by default for security reasons and if someone
_really_ (i.e. there is no replacement) needs it, they probably need
to use a custom kernel build anyway for their exotic hardware setup
(with PFN-backed carveout memory), so they can enable it.

> > > While I agree that using the userptr on media is something that
> > > new drivers may not support, as DMABUF is a better way of
> > > handling it, changing this for existing ones is a big no,
> > > as it may break usersapace.
> >
> > media community needs to work to fix this, not pretend it is OK to
> > keep going as-is.
>
> > Dealing with security issues is the one case where an uABI break might
> > be acceptable.
> >
> > If you want to NAK it then you need to come up with the work to do
> > something here correctly that will support the old drivers without the
> > kernel taint.
> >
> > Unfortunately making things uncomfortable for the subsystem is the big
> > hammer the core kernel needs to use to actually get this security work
> > done by those responsible.
>
>
> I'm not pretending that this is ok. Just pointing that the approach
> taken is NOT OK.
>
> I'm not a mm/ expert, but, from what I understood from Daniel's patch
> description is that this is unsafe *only if*  __GFP_MOVABLE is used.
>
> Well, no drivers inside the media subsystem uses such flag, although
> they may rely on some infrastructure that could be using it behind
> the bars.
>
> If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> flag that it would be denying the core mm code to set __GFP_MOVABLE.
>
> Please let address the issue on this way, instead of broken an
> userspace API that it is there since 1991.

Note that USERPTR as a whole generally has been considered deprecated
in V4L2 for many years and people have been actively discouraged to
use it. And, still, we're just talking here about the very rare corner
case, not the whole USERPTR API.

Best regards,
Tomasz


Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn

2020-10-10 Thread Tomasz Figa
Hi Daniel,

On Fri, Oct 9, 2020 at 7:52 PM Daniel Vetter  wrote:
>
> On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe  wrote:
> >
> > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> >
> > > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > > description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> >
> > No, it is unconditionally unsafe. The CMA movable mappings are
> > specific VMAs that will have bad issues here, but there are other
> > types too.
> >
> > The only way to do something at a VMA level is to have a list of OK
> > VMAs, eg because they were creatd via a special mmap helper from the
> > media subsystem.
> >
> > > Well, no drivers inside the media subsystem uses such flag, although
> > > they may rely on some infrastructure that could be using it behind
> > > the bars.
> >
> > It doesn't matter, nothing prevents the user from calling media APIs
> > on mmaps it gets from other subsystems.
>
> I think a good first step would be to disable userptr of non struct
> page backed storage going forward for any new hw support. Even on
> existing drivers. dma-buf sharing has been around for long enough now
> that this shouldn't be a problem. Unfortunately right now this doesn't
> seem to exist, so the entire problem keeps getting perpetuated.
>
> > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> >
> > We can't tell from the VMA these kinds of details..
> >
> > It has to go the other direction, evey mmap that might be used as a
> > userptr here has to be found and the VMA specially created to allow
> > its use. At least that is a kernel only change, but will need people
> > with the HW to do this work.
>
> I think the only reasonable way to keep this working is:
> - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
> - add dma-buf export support to fbdev and v4l

I assume you mean V4L2 and not the obsolete V4L that is emulated in
the userspace by libv4l. If so, every video device that uses videobuf2
gets DMA-buf export for free and there is nothing needed to enable it.

We probably still have a few legacy drivers using videobuf (non-2),
but IMHO those should be safe to put behind some disabled-by-default
Kconfig symbol or even completely drop, as the legacy framework has
been deprecated for many years already.

> - roll this out everywhere we still need it.
>
> Realistically this just isn't going to happen. And anything else just
> reimplements half of dma-buf, which is kinda pointless (you need
> minimally refcounting and some way to get at a promise of a permanent
> sg list for dma. Plus probably the vmap for kernel cpu access.
>
> > > Please let's address the issue this way, instead of breaking a
> > > userspace API that has been there since 1991.
> >
> > It has happened before :( It took 4 years for RDMA to undo the uAPI
> > breakage caused by a security fix for something that was a 15 years
> > old bug.
>
> Yeah we have a bunch of these on the drm side too. Some of them are
> really just "you have to upgrade userspace", and there's no real fix
> for the security nightmare without that.

I think we need to phase out such userspace indeed. The Kconfig symbol
allows enabling the unsafe functionality for anyone who still needs
it, so I think it's not entirely a breakage.
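
A disabled-by-default Kconfig gate of the kind discussed above might look like the fragment below; the symbol name, dependency, and wording are made up for illustration and do not come from any actual patch:

```kconfig
config VIDEO_LEGACY_PFN_USERPTR
	bool "Allow USERPTR on VM_IO/VM_PFNMAP mappings (UNSAFE, deprecated)"
	depends on VIDEO_V4L2
	default n
	help
	  Keep the legacy zero-copy userptr path that follows raw PFNs of
	  VM_IO/VM_PFNMAP mappings. This bypasses normal page refcounting
	  and can lead to use-after-free; enabling it taints the kernel.
	  Use DMA-buf sharing instead. If unsure, say N.
```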

Best regards,
Tomasz
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 2/2] mm/frame-vec: use FOLL_LONGTERM

2020-10-07 Thread Tomasz Figa
On Wed, Oct 7, 2020 at 4:23 PM Daniel Vetter  wrote:
>
> On Wed, Oct 7, 2020 at 4:12 PM Tomasz Figa  wrote:
> >
> > On Wed, Oct 7, 2020 at 4:09 PM Daniel Vetter  wrote:
> > >
> > > On Wed, Oct 7, 2020 at 3:34 PM Tomasz Figa  wrote:
> > > >
> > > > On Wed, Oct 7, 2020 at 3:06 PM Jason Gunthorpe  wrote:
> > > > >
> > > > > On Wed, Oct 07, 2020 at 02:58:33PM +0200, Daniel Vetter wrote:
> > > > > > On Wed, Oct 7, 2020 at 2:48 PM Tomasz Figa  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Wed, Oct 7, 2020 at 2:44 PM Jason Gunthorpe  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Wed, Oct 07, 2020 at 02:33:56PM +0200, Marek Szyprowski 
> > > > > > > > wrote:
> > > > > > > > > Well, it was in vb2_get_vma() function, but now I see that it 
> > > > > > > > > has been
> > > > > > > > > lost in fb639eb39154 and 6690c8c78c74 some time ago...
> > > > > > > >
> > > > > > > > There is no guarantee that holding a get on the file says
> > > > > > > > anything
> > > > > > > > about the VMA. This needed to check that the file was some 
> > > > > > > > special
> > > > > > > > kind of file that promised the VMA layout and file lifetime are
> > > > > > > > connected.
> > > > > > > >
> > > > > > > > Also, cloning a VMA outside the mm world is just really bad. 
> > > > > > > > That
> > > > > > > > would screw up many assumptions the drivers make.
> > > > > > > >
> > > > > > > > If it is all obsolete I say we hide it behind a default n config
> > > > > > > > symbol and taint the kernel if anything uses it.
> > > > > > > >
> > > > > > > > Add a big comment above the follow_pfn to warn others away from 
> > > > > > > > this
> > > > > > > > code.
> > > > > > >
> > > > > > > Sadly it's just verbally declared as deprecated and not formally 
> > > > > > > noted
> > > > > > > anywhere. There are a lot of userspace applications relying on user
> > > > > > > pointer support.
> > > > > >
> > > > > > userptr can stay, it's the userptr abuse for zero-copy buffer sharing
> > > > > > which doesn't work anymore. At least without major surgery (you'd 
> > > > > > need
> > > > > > an mmu notifier to zap mappings and recreate them, and that pretty
> > > > > > much breaks the v4l model of preallocating all buffers to make sure 
> > > > > > we
> > > > > > never underflow the buffer queue). And static mappings are not 
> > > > > > coming
> > > > > > back I think, we'll go ever more into the direction of dynamic
> > > > > > mappings and moving stuff around as needed.
> > > > >
> > > > > Right, and to be clear, the last time I saw a security flaw of this
> > > > > magnitude from a subsystem badly mis-designing itself, Linus's
> > > > > knee-jerk reaction was to propose to remove the whole subsystem.
> > > > >
> > > > > Please don't take status-quo as acceptable, V4L community has to work
> > > > > to resolve this, uABI breakage or not. The follow_pfn related code
> > > > > must be compiled out of normal distro kernel builds.
> > > >
> > > > I think the userptr zero-copy hack should be able to go away indeed,
> > > > given that we now have CMA that allows having carveouts backed by
> > > > struct pages and having the memory represented as DMA-buf normally.
> > >
> > > Not sure whether there's a confusion here: dma-buf supports memory not
> > > backed by struct page.
> > >
> >
> > That's new to me. The whole API relies on sg_tables a lot, which in
> > turn rely on struct page pointers to describe the physical memory.
>
> You're not allowed to look at struct page pointers from the importer
> side, those might not be there. Which isn't the prettiest thing, but
> it works. And even if there's a struct

Re: [PATCH 2/2] mm/frame-vec: use FOLL_LONGTERM

2020-10-07 Thread Tomasz Figa
On Wed, Oct 7, 2020 at 4:09 PM Daniel Vetter  wrote:
>
> On Wed, Oct 7, 2020 at 3:34 PM Tomasz Figa  wrote:
> >
> > On Wed, Oct 7, 2020 at 3:06 PM Jason Gunthorpe  wrote:
> > >
> > > On Wed, Oct 07, 2020 at 02:58:33PM +0200, Daniel Vetter wrote:
> > > > On Wed, Oct 7, 2020 at 2:48 PM Tomasz Figa  wrote:
> > > > >
> > > > > On Wed, Oct 7, 2020 at 2:44 PM Jason Gunthorpe  wrote:
> > > > > >
> > > > > > On Wed, Oct 07, 2020 at 02:33:56PM +0200, Marek Szyprowski wrote:
> > > > > > > Well, it was in vb2_get_vma() function, but now I see that it has 
> > > > > > > been
> > > > > > > lost in fb639eb39154 and 6690c8c78c74 some time ago...
> > > > > >
> > > > > > There is no guarantee that holding a get on the file says anything
> > > > > > about the VMA. This needed to check that the file was some special
> > > > > > kind of file that promised the VMA layout and file lifetime are
> > > > > > connected.
> > > > > >
> > > > > > Also, cloning a VMA outside the mm world is just really bad. That
> > > > > > would screw up many assumptions the drivers make.
> > > > > >
> > > > > > If it is all obsolete I say we hide it behind a default n config
> > > > > > symbol and taint the kernel if anything uses it.
> > > > > >
> > > > > > Add a big comment above the follow_pfn to warn others away from this
> > > > > > code.
> > > > >
> > > > > Sadly it's just verbally declared as deprecated and not formally noted
> > > > > anywhere. There are a lot of userspace applications relying on user
> > > > > pointer support.
> > > >
> > > > userptr can stay, it's the userptr abuse for zero-copy buffer sharing
> > > > which doesn't work anymore. At least without major surgery (you'd need
> > > > an mmu notifier to zap mappings and recreate them, and that pretty
> > > > much breaks the v4l model of preallocating all buffers to make sure we
> > > > never underflow the buffer queue). And static mappings are not coming
> > > > back I think, we'll go ever more into the direction of dynamic
> > > > mappings and moving stuff around as needed.
> > >
> > > Right, and to be clear, the last time I saw a security flaw of this
> > > magnitude from a subsystem badly mis-designing itself, Linus's
> > > knee-jerk reaction was to propose to remove the whole subsystem.
> > >
> > > Please don't take status-quo as acceptable, V4L community has to work
> > > to resolve this, uABI breakage or not. The follow_pfn related code
> > > must be compiled out of normal distro kernel builds.
> >
> > I think the userptr zero-copy hack should be able to go away indeed,
> > given that we now have CMA that allows having carveouts backed by
> > struct pages and having the memory represented as DMA-buf normally.
>
> Not sure whether there's a confusion here: dma-buf supports memory not
> backed by struct page.
>

That's new to me. The whole API relies on sg_tables a lot, which in
turn rely on struct page pointers to describe the physical memory.

> > How about the regular userptr use case, though?
> >
> > The existing code resolves the user pointer into pages by following
> > the get_vaddr_frames() -> frame_vector_to_pages() ->
> > sg_alloc_table_from_pages() / vm_map_ram() approach.
> > get_vaddr_frames() seems to use pin_user_pages() behind the scenes if
> > the vma is not an IO or a PFNMAP, falling back to follow_pfn()
> > otherwise.
>
> Yeah pin_user_pages is fine, it's just the VM_IO | VM_PFNMAP vma that
> don't work.

Ack.

> >
> > Is your intention to drop get_vaddr_frames(), or could we still keep
> > using it and, if vec->is_pfns is true:
> > a) if CONFIG_VIDEO_LEGACY_PFN_USERPTR is set, taint the kernel
> > b) otherwise just undo and fail?
>
> I'm typing that patch series (plus a pile more) right now.

Cool, thanks!

We also need to bring back the vma_open() that somehow disappeared
around 4.2, as Marek found.

Best regards,
Tomasz


Re: [PATCH 2/2] mm/frame-vec: use FOLL_LONGTERM

2020-10-07 Thread Tomasz Figa
On Wed, Oct 7, 2020 at 3:06 PM Jason Gunthorpe  wrote:
>
> On Wed, Oct 07, 2020 at 02:58:33PM +0200, Daniel Vetter wrote:
> > On Wed, Oct 7, 2020 at 2:48 PM Tomasz Figa  wrote:
> > >
> > > On Wed, Oct 7, 2020 at 2:44 PM Jason Gunthorpe  wrote:
> > > >
> > > > On Wed, Oct 07, 2020 at 02:33:56PM +0200, Marek Szyprowski wrote:
> > > > > Well, it was in vb2_get_vma() function, but now I see that it has been
> > > > > lost in fb639eb39154 and 6690c8c78c74 some time ago...
> > > >
> > > > There is no guarantee that holding a get on the file says anything
> > > > about the VMA. This needed to check that the file was some special
> > > > kind of file that promised the VMA layout and file lifetime are
> > > > connected.
> > > >
> > > > Also, cloning a VMA outside the mm world is just really bad. That
> > > > would screw up many assumptions the drivers make.
> > > >
> > > > If it is all obsolete I say we hide it behind a default n config
> > > > symbol and taint the kernel if anything uses it.
> > > >
> > > > Add a big comment above the follow_pfn to warn others away from this
> > > > code.
> > >
> > > Sadly it's just verbally declared as deprecated and not formally noted
> > > anywhere. There are a lot of userspace applications relying on user
> > > pointer support.
> >
> > userptr can stay, it's the userptr abuse for zero-copy buffer sharing
> > which doesn't work anymore. At least without major surgery (you'd need
> > an mmu notifier to zap mappings and recreate them, and that pretty
> > much breaks the v4l model of preallocating all buffers to make sure we
> > never underflow the buffer queue). And static mappings are not coming
> > back I think, we'll go ever more into the direction of dynamic
> > mappings and moving stuff around as needed.
>
> Right, and to be clear, the last time I saw a security flaw of this
> magnitude from a subsystem badly mis-designing itself, Linus's
> knee-jerk reaction was to propose to remove the whole subsystem.
>
> Please don't take status-quo as acceptable, V4L community has to work
> to resolve this, uABI breakage or not. The follow_pfn related code
> must be compiled out of normal distro kernel builds.

I think the userptr zero-copy hack should be able to go away indeed,
given that we now have CMA that allows having carveouts backed by
struct pages and having the memory represented as DMA-buf normally.

How about the regular userptr use case, though?

The existing code resolves the user pointer into pages by following
the get_vaddr_frames() -> frame_vector_to_pages() ->
sg_alloc_table_from_pages() / vm_map_ram() approach.
get_vaddr_frames() seems to use pin_user_pages() behind the scenes if
the vma is not an IO or a PFNMAP, falling back to follow_pfn()
otherwise.

Is your intention to drop get_vaddr_frames(), or could we still keep
using it and, if vec->is_pfns is true:
a) if CONFIG_VIDEO_LEGACY_PFN_USERPTR is set, taint the kernel
b) otherwise just undo and fail?
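
At the pseudocode level, options a)/b) could be sketched roughly as below inside the vb2 userptr resolution path. This is an uncompiled sketch, not a real patch: the Kconfig symbol is the one proposed in this thread, and the surrounding control flow is hypothetical.

```c
/* Hypothetical sketch of options a)/b); error handling abbreviated. */
ret = get_vaddr_frames(start, nr, vec);
if (ret > 0 && vec->is_pfns) {
	/* VM_IO / VM_PFNMAP vma: raw PFNs, no pinned struct pages. */
	if (IS_ENABLED(CONFIG_VIDEO_LEGACY_PFN_USERPTR)) {
		add_taint(TAINT_USER, LOCKDEP_STILL_OK);	/* a) taint */
	} else {
		put_vaddr_frames(vec);				/* b) undo */
		return -EINVAL;					/*    and fail */
	}
}
```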

Best regards,
Tomasz


Re: [PATCH 2/2] mm/frame-vec: use FOLL_LONGTERM

2020-10-07 Thread Tomasz Figa
On Sat, Oct 3, 2020 at 1:31 AM Jason Gunthorpe  wrote:
>
> On Fri, Oct 02, 2020 at 08:16:48PM +0200, Daniel Vetter wrote:
> > On Fri, Oct 2, 2020 at 8:06 PM Jason Gunthorpe  wrote:
> > > On Fri, Oct 02, 2020 at 07:53:03PM +0200, Daniel Vetter wrote:
> > > > For $reasons I've stumbled over this code and I'm not sure the change
> > > > to the new gup functions in 55a650c35fea ("mm/gup: frame_vector:
> > > > convert get_user_pages() --> pin_user_pages()") was entirely correct.
> > > >
> > > > This here is used for long term buffers (not just quick I/O) like
> > > > RDMA, and John notes this in his patch. But I thought the rule for
> > > > these is that they need to add FOLL_LONGTERM, which John's patch
> > > > didn't do.
> > > >
> > > > There is already a dax specific check (added in b7f0554a56f2 ("mm:
> > > > fail get_vaddr_frames() for filesystem-dax mappings")), so this seems
> > > > like the prudent thing to do.
> > > >
> > > > Signed-off-by: Daniel Vetter 
> > > > Cc: Andrew Morton 
> > > > Cc: John Hubbard 
> > > > Cc: Jérôme Glisse 
> > > > Cc: Jan Kara 
> > > > Cc: Dan Williams 
> > > > Cc: linux...@kvack.org
> > > > Cc: linux-arm-ker...@lists.infradead.org
> > > > Cc: linux-samsung-...@vger.kernel.org
> > > > Cc: linux-me...@vger.kernel.org
> > > > Hi all,
> > > >
> > > > I stumbled over this and figured typing this patch can't hurt. Really
> > > > just to maybe learn a few things about how gup/pup is supposed to be
> > > > used (we have a bit of that in drivers/gpu), this here isn't really
> > > > related to anything I'm doing.
> > >
> > > FOLL_FORCE is a pretty big clue it should be FOLL_LONGTERM, IMHO
> >
> > Since you're here ... I've noticed that ib sets FOLL_FORCE when the ib
> > verb access mode indicates possible writes. I'm not really clear on
> > why FOLL_WRITE isn't enough any why you need to be able to write
> > through a vma that's write protected currently.
>
> Ah, FOLL_FORCE | FOLL_WRITE means *read* confusingly enough, and the
> only reason you'd want this version for read is if you are doing
> longterm stuff. I wrote about this recently:
>
> https://lore.kernel.org/linux-mm/20200928235739.gu9...@ziepe.ca/
>
> > > Since every driver does this wrong anything that uses this is creating
> > > terrifying security issues.
> > >
> > > IMHO this whole API should be deleted :(
> >
> > Yeah that part I just tried to conveniently ignore. I guess this dates
> > back to a time when ioremaps where at best fixed, and there wasn't
> > anything like a gpu driver dynamically managing vram around, resulting
> > in random entirely unrelated things possibly being mapped to that set
> > of pfns.
>
> No, it was always wrong. Prior to GPU like cases the lifetime of the
> PTE was tied to the vma and when the vma becomes free the driver can
> move the things in the PTEs to 'free'. Easy to trigger use-after-free
> issues and devices like RDMA have security contexts attached to these
> PTEs so it becomes a serious security bug to do something like this.
>
> > The underlying follow_pfn is also used in other places within
> > drivers/media, so this doesn't seem to be an accident, but actually
> > intentional.
>
> Looking closely, there are very few users, most *seem* pointless, but
> maybe there is a crazy reason?
>
> The sequence
>   get_vaddr_frames();
>   frame_vector_to_pages();
>   sg_alloc_table_from_pages();
>
> Should be written
>   pin_user_pages_fast(FOLL_LONGTERM);
>   sg_alloc_table_from_pages()
>
> There is some 'special' code in frame_vector_to_pages() that tries to
> get a struct page for things from a VM_IO or VM_PFNMAP...
>
> Oh snap, that is *completely* broken! If the first VMA is IO|PFNMAP
> then get_vaddr_frames() iterates over all VMAs in the range, of any
> kind and extracts the PTEs then blindly references them! This means it
> can be used to use after free normal RAM struct pages!! Gah!
>
> Wow. Okay. That has to go.
>
> So, I *think* we can assume there are no sane cases where
> frame_vector_to_pages() succeeds but pin_user_pages() wasn't called.
>
> That means the users here:
>  - habanalabs:  Hey Oded can you fix this up?
>
>  - gpu/exynos: Daniel can you get someone there to stop using it?
>
>  - media/videobuf via vb2_dma_sg_get_userptr()
>
> Should all be switched to the standard pin_user_pages sequence
> above.
>
> That leaves the only interesting places as vb2_dc_get_userptr() and
> vb2_vmalloc_get_userptr() which both completely fail to follow the
> REQUIRED behavior in the function's comment about checking PTEs. It
> just DMA maps them. Badly broken.

Note that vb2_vmalloc is only used for in-kernel CPU usage, e.g. the
contents being copied by the driver between vb2 buffers and some
hardware FIFO or other dedicated buffers. The memory does not go to
any hardware DMA.

Could you elaborate on what "the REQUIRED behavior" is? I can see that
both follow the get_vaddr_frames() -> frame_vector_to_pages() flow, as
you mentioned. Perhaps the only change needed is switching to
pin_user_pa
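
For reference, the standard sequence Jason describes would look roughly like this. It is an uncompiled sketch against the ~v5.9 core APIs with error handling elided; the variable names and surrounding context are made up.

```c
/* Sketch only: pin long-term, then build the sg_table from the pages. */
struct page **pages = kvmalloc_array(nr_pages, sizeof(*pages), GFP_KERNEL);
int pinned = pin_user_pages_fast(start, nr_pages,
				 FOLL_WRITE | FOLL_LONGTERM, pages);
if (pinned == nr_pages) {
	struct sg_table sgt;

	sg_alloc_table_from_pages(&sgt, pages, nr_pages,
				  start & ~PAGE_MASK, size, GFP_KERNEL);
	/* ... dma_map_sgtable(dev, &sgt, dir, 0); use the buffer ... */
}
/* teardown once DMA is done */
unpin_user_pages(pages, pinned);
```

Unlike the frame_vector path, this never touches VM_IO/VM_PFNMAP PTEs, so the pinned pages stay valid for the lifetime of the mapping.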

Re: [PATCH 2/2] mm/frame-vec: use FOLL_LONGTERM

2020-10-07 Thread Tomasz Figa
On Wed, Oct 7, 2020 at 2:44 PM Jason Gunthorpe  wrote:
>
> On Wed, Oct 07, 2020 at 02:33:56PM +0200, Marek Szyprowski wrote:
> > Well, it was in vb2_get_vma() function, but now I see that it has been
> > lost in fb639eb39154 and 6690c8c78c74 some time ago...
>
> There is no guarantee that holding a get on the file says anything
> about the VMA. This needed to check that the file was some special
> kind of file that promised the VMA layout and file lifetime are
> connected.
>
> Also, cloning a VMA outside the mm world is just really bad. That
> would screw up many assumptions the drivers make.
>
> If it is all obsolete I say we hide it behind a default n config
> symbol and taint the kernel if anything uses it.
>
> Add a big comment above the follow_pfn to warn others away from this
> code.

Sadly it's just verbally declared as deprecated and not formally noted
anywhere. There are a lot of userspace applications relying on user
pointer support.


Re: [PATCH 1/2] mm/frame-vec: Drop gup_flags from get_vaddr_frames()

2020-10-02 Thread Tomasz Figa
On Fri, Oct 2, 2020 at 7:53 PM Daniel Vetter  wrote:
>
> FOLL_WRITE | FOLL_FORCE is really the only reasonable thing to do for
> simple dma device that can't guarantee write protection. Which is also
> what all the callers are using.
>
> So just simplify this.
>
> Signed-off-by: Daniel Vetter 
> Cc: Inki Dae 
> Cc: Joonyoung Shim 
> Cc: Seung-Woo Kim 
> Cc: Kyungmin Park 
> Cc: Kukjin Kim 
> Cc: Krzysztof Kozlowski 
> Cc: Pawel Osciak 
> Cc: Marek Szyprowski 
> Cc: Tomasz Figa 
> Cc: Andrew Morton 
> Cc: Oded Gabbay 
> Cc: Omer Shpigelman 
> Cc: Tomer Tayar 
> Cc: Greg Kroah-Hartman 
> Cc: Pawel Piskorski 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-samsung-...@vger.kernel.org
> Cc: linux-me...@vger.kernel.org
> Cc: linux...@kvack.org
> ---
>  drivers/gpu/drm/exynos/exynos_drm_g2d.c   | 3 +--
>  drivers/media/common/videobuf2/videobuf2-memops.c | 3 +--
>  drivers/misc/habanalabs/common/memory.c   | 3 +--
>  include/linux/mm.h| 2 +-
>  mm/frame_vector.c | 4 ++--
>  5 files changed, 6 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c 
> b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
> index 967a5cdc120e..ac452842bab3 100644
> --- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
> +++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
> @@ -480,8 +480,7 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct 
> g2d_data *g2d,
> goto err_free;
> }
>
> -   ret = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
> -   g2d_userptr->vec);
> +   ret = get_vaddr_frames(start, npages, g2d_userptr->vec);
> if (ret != npages) {
> DRM_DEV_ERROR(g2d->dev,
>   "failed to get user pages from userptr.\n");
> diff --git a/drivers/media/common/videobuf2/videobuf2-memops.c 
> b/drivers/media/common/videobuf2/videobuf2-memops.c
> index 6e9e05153f4e..9dd6c27162f4 100644
> --- a/drivers/media/common/videobuf2/videobuf2-memops.c
> +++ b/drivers/media/common/videobuf2/videobuf2-memops.c
> @@ -40,7 +40,6 @@ struct frame_vector *vb2_create_framevec(unsigned long 
> start,
> unsigned long first, last;
> unsigned long nr;
> struct frame_vector *vec;
> -   unsigned int flags = FOLL_FORCE | FOLL_WRITE;
>
> first = start >> PAGE_SHIFT;
> last = (start + length - 1) >> PAGE_SHIFT;
> @@ -48,7 +47,7 @@ struct frame_vector *vb2_create_framevec(unsigned long 
> start,
> vec = frame_vector_create(nr);
> if (!vec)
> return ERR_PTR(-ENOMEM);
> -   ret = get_vaddr_frames(start & PAGE_MASK, nr, flags, vec);
> +   ret = get_vaddr_frames(start & PAGE_MASK, nr, vec);
> if (ret < 0)
> goto out_destroy;
> /* We accept only complete set of PFNs */

For drivers/media/common/videobuf2/:

Acked-by: Tomasz Figa 

Best regards,
Tomasz


Re: [PATCH v3 0/4] dma-buf: Flag vmap'ed memory as system or I/O memory

2020-09-25 Thread Tomasz Figa
Hi Thomas,

On Fri, Sep 25, 2020 at 01:55:57PM +0200, Thomas Zimmermann wrote:
> Dma-buf provides vmap() and vunmap() for retriving and releasing mappings
> of dma-buf memory in kernel address space. The functions operate with plain
> addresses and the assumption is that the memory can be accessed with load
> and store operations. This is not the case on some architectures (e.g.,
> sparc64) where I/O memory can only be accessed with dedicated instructions.
> 
> This patchset introduces struct dma_buf_map, which contains the address of
> a buffer and a flag that tells whether system- or I/O-memory instructions
> are required.
> 
> Some background: updating the DRM framebuffer console on sparc64 makes the
> kernel panic. This is because the framebuffer memory cannot be accessed with
> system-memory instructions. We currently employ a workaround in DRM to
> address this specific problem. [1]
> 
> To resolve the problem, we'd like to address it at the most common point,
> which is the dma-buf framework. The dma-buf mapping ideally knows if I/O
> instructions are required and exports this information to it's users. The
> new structure struct dma_buf_map stores the buffer address and a flag that
> signals I/O memory. Affected users of the buffer (e.g., drivers, frameworks)
> can then access the memory accordingly.
> 
> This patchset only introduces struct dma_buf_map, and updates struct dma_buf
> and it's interfaces. Further patches can update dma-buf users. For example,
> there's a prototype patchset for DRM that fixes the framebuffer problem. [2]
> 
> Further work: TTM, one of DRM's memory managers, already exports an
> is_iomem flag of its own. It could later be switched over to exporting struct
> dma_buf_map, thus simplifying some code. Several DRM drivers expect their
> fbdev console to operate on I/O memory. These could possibly be switched over
> to the generic fbdev emulation, as soon as the generic code uses struct
> dma_buf_map.
> 
> v3:
>   * update fastrpc driver (kernel test robot)
>   * expand documentation (Daniel)
>   * move documentation into separate patch
> v2:
>   * always clear map parameter in dma_buf_vmap() (Daniel)
>   * include dma-buf-heaps and i915 selftests (kernel test robot)
>   * initialize cma_obj before using it in drm_gem_cma_free_object()
> (kernel test robot)
> 
> [1] https://lore.kernel.org/dri-devel/20200725191012.ga434...@ravnborg.org/
> [2] 
> https://lore.kernel.org/dri-devel/20200806085239.4606-1-tzimmerm...@suse.de/
> 
> Thomas Zimmermann (4):
>   dma-buf: Add struct dma-buf-map for storing struct dma_buf.vaddr_ptr
>   dma-buf: Use struct dma_buf_map in dma_buf_vmap() interfaces
>   dma-buf: Use struct dma_buf_map in dma_buf_vunmap() interfaces
>   dma-buf: Document struct dma_buf_map
> 
>  Documentation/driver-api/dma-buf.rst  |   9 +
>  drivers/dma-buf/dma-buf.c |  42 ++--
>  drivers/dma-buf/heaps/heap-helpers.c  |  10 +-
>  drivers/gpu/drm/drm_gem_cma_helper.c  |  20 +-
>  drivers/gpu/drm/drm_gem_shmem_helper.c|  17 +-
>  drivers/gpu/drm/drm_prime.c   |  15 +-
>  drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c   |  13 +-
>  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c|  13 +-
>  .../drm/i915/gem/selftests/i915_gem_dmabuf.c  |  18 +-
>  .../gpu/drm/i915/gem/selftests/mock_dmabuf.c  |  14 +-
>  drivers/gpu/drm/tegra/gem.c   |  23 ++-
>  .../common/videobuf2/videobuf2-dma-contig.c   |  17 +-
>  .../media/common/videobuf2/videobuf2-dma-sg.c |  19 +-
>  .../common/videobuf2/videobuf2-vmalloc.c  |  21 +-
>  drivers/misc/fastrpc.c|   6 +-
>  include/drm/drm_prime.h   |   5 +-
>  include/linux/dma-buf-map.h   | 193 ++++++
>  include/linux/dma-buf.h   |  11 +-
>  18 files changed, 372 insertions(+), 94 deletions(-)
>  create mode 100644 include/linux/dma-buf-map.h

For drivers/media/common/videobuf2 changes:

Acked-by: Tomasz Figa 

Best regards,
Tomasz


Re: [PATCH v10 30/30] videobuf2: use sgtable-based scatterlist wrappers

2020-09-10 Thread Tomasz Figa
On Thu, Sep 10, 2020 at 11:17 AM Hans Verkuil  wrote:
>
> On 04/09/2020 15:17, Marek Szyprowski wrote:
> > Use recently introduced common wrappers operating directly on the struct
> > sg_table objects and scatterlist page iterators to make the code a bit
> > more compact, robust, easier to follow and copy/paste safe.
> >
> > No functional change, because the code already properly did all the
> > scatterlist related calls.
> >
> > Signed-off-by: Marek Szyprowski 
> > Reviewed-by: Robin Murphy 
>
> Acked-by: Hans Verkuil 
>
> Note that I agree with Marek to keep returning -EIO. If we want to propagate
> low-level errors, then that should be done in a separate patch. But I think 
> EIO
> is fine.

As I mentioned, there are 2 different cases here - UAPI and kAPI. I
agree that we should keep -EIO for UAPI, but kAPI is another story.
But if we're convinced that -EIO is also fine for the latter, I'm fine
with that.

Best regards,
Tomasz

>
> Regards,
>
> Hans
>
> > ---
> >  .../common/videobuf2/videobuf2-dma-contig.c   | 34 ---
> >  .../media/common/videobuf2/videobuf2-dma-sg.c | 32 +++--
> >  .../common/videobuf2/videobuf2-vmalloc.c  | 12 +++
> >  3 files changed, 31 insertions(+), 47 deletions(-)
> >
> > diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c 
> > b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> > index ec3446cc45b8..1b242d844dde 100644
> > --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> > +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> > @@ -58,10 +58,10 @@ static unsigned long vb2_dc_get_contiguous_size(struct 
> > sg_table *sgt)
> >   unsigned int i;
> >   unsigned long size = 0;
> >
> > - for_each_sg(sgt->sgl, s, sgt->nents, i) {
> > + for_each_sgtable_dma_sg(sgt, s, i) {
> >   if (sg_dma_address(s) != expected)
> >   break;
> > - expected = sg_dma_address(s) + sg_dma_len(s);
> > + expected += sg_dma_len(s);
> >   size += sg_dma_len(s);
> >   }
> >   return size;
> > @@ -103,8 +103,7 @@ static void vb2_dc_prepare(void *buf_priv)
> >   if (!sgt)
> >   return;
> >
> > - dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->orig_nents,
> > -buf->dma_dir);
> > + dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
> >  }
> >
> >  static void vb2_dc_finish(void *buf_priv)
> > @@ -115,7 +114,7 @@ static void vb2_dc_finish(void *buf_priv)
> >   if (!sgt)
> >   return;
> >
> > - dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->orig_nents, 
> > buf->dma_dir);
> > + dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
> >  }
> >
> >  /*/
> > @@ -275,8 +274,8 @@ static void vb2_dc_dmabuf_ops_detach(struct dma_buf 
> > *dbuf,
> >* memory locations do not require any explicit cache
> >* maintenance prior or after being used by the device.
> >*/
> > - dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
> > -attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
> > + dma_unmap_sgtable(db_attach->dev, sgt, attach->dma_dir,
> > +   DMA_ATTR_SKIP_CPU_SYNC);
> >   sg_free_table(sgt);
> >   kfree(attach);
> >   db_attach->priv = NULL;
> > @@ -301,8 +300,8 @@ static struct sg_table *vb2_dc_dmabuf_ops_map(
> >
> >   /* release any previous cache */
> >   if (attach->dma_dir != DMA_NONE) {
> > - dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
> > -attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
> > + dma_unmap_sgtable(db_attach->dev, sgt, attach->dma_dir,
> > +   DMA_ATTR_SKIP_CPU_SYNC);
> >   attach->dma_dir = DMA_NONE;
> >   }
> >
> > @@ -310,9 +309,8 @@ static struct sg_table *vb2_dc_dmabuf_ops_map(
> >* mapping to the client with new direction, no cache sync
> >* required see comment in vb2_dc_dmabuf_ops_detach()
> >*/
> > - sgt->nents = dma_map_sg_attrs(db_attach->dev, sgt->sgl, 
> > sgt->orig_nents,
> > -   dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
> > - if (!sgt->nents) {
> > + if (dma_map_sgtable(db_attach->dev, sgt, dma_dir,
> > + DMA_ATTR_SKIP_CPU_SYNC)) {
> >   pr_err("failed to map scatterlist\n");
> >   mutex_unlock(lock);
> >   return ERR_PTR(-EIO);
> > @@ -455,8 +453,8 @@ static void vb2_dc_put_userptr(void *buf_priv)
> >* No need to sync to CPU, it's already synced to the CPU
> >* since the finish() memop will have been called before this.
> >*/
> > - dma_unmap_sg_attrs(buf->dev, sgt->sgl, sgt->orig_nents,
> > -buf->dma_dir, DMA_ATTR_SKIP_CP

Re: [PATCH v10 30/30] videobuf2: use sgtable-based scatterlist wrappers

2020-09-07 Thread Tomasz Figa
On Mon, Sep 7, 2020 at 4:02 PM Marek Szyprowski
 wrote:
>
> Hi Tomasz,
>
> On 07.09.2020 15:07, Tomasz Figa wrote:
> > On Fri, Sep 4, 2020 at 3:35 PM Marek Szyprowski
> >  wrote:
> >> Use recently introduced common wrappers operating directly on the struct
> >> sg_table objects and scatterlist page iterators to make the code a bit
> >> more compact, robust, easier to follow and copy/paste safe.
> >>
> >> No functional change, because the code already properly did all the
> >> scatterlist related calls.
> >>
> >> Signed-off-by: Marek Szyprowski 
> >> Reviewed-by: Robin Murphy 
> >> ---
> >>   .../common/videobuf2/videobuf2-dma-contig.c   | 34 ---

Re: [PATCH v10 30/30] videobuf2: use sgtable-based scatterlist wrappers

2020-09-07 Thread Tomasz Figa
Hi Marek,

On Fri, Sep 4, 2020 at 3:35 PM Marek Szyprowski
 wrote:
>
> Use recently introduced common wrappers operating directly on the struct
> sg_table objects and scatterlist page iterators to make the code a bit
> more compact, robust, easier to follow and copy/paste safe.
>
> No functional change, because the code already properly did all the
> scatterlist related calls.
>
> Signed-off-by: Marek Szyprowski 
> Reviewed-by: Robin Murphy 
> ---
>  .../common/videobuf2/videobuf2-dma-contig.c   | 34 ---
>  .../media/common/videobuf2/videobuf2-dma-sg.c | 32 +++--
>  .../common/videobuf2/videobuf2-vmalloc.c  | 12 +++
>  3 files changed, 31 insertions(+), 47 deletions(-)
>

Thanks for the patch! Please see my comments inline.

> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c 
> b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> index ec3446cc45b8..1b242d844dde 100644
> --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> @@ -58,10 +58,10 @@ static unsigned long vb2_dc_get_contiguous_size(struct 
> sg_table *sgt)
> unsigned int i;
> unsigned long size = 0;
>
> -   for_each_sg(sgt->sgl, s, sgt->nents, i) {
> +   for_each_sgtable_dma_sg(sgt, s, i) {
> if (sg_dma_address(s) != expected)
> break;
> -   expected = sg_dma_address(s) + sg_dma_len(s);
> +   expected += sg_dma_len(s);
> size += sg_dma_len(s);
> }
> return size;
> @@ -103,8 +103,7 @@ static void vb2_dc_prepare(void *buf_priv)
> if (!sgt)
> return;
>
> -   dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->orig_nents,
> -  buf->dma_dir);
> +   dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
>  }
>
>  static void vb2_dc_finish(void *buf_priv)
> @@ -115,7 +114,7 @@ static void vb2_dc_finish(void *buf_priv)
> if (!sgt)
> return;
>
> -   dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->orig_nents, 
> buf->dma_dir);
> +   dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
>  }
>
>  /*/
> @@ -275,8 +274,8 @@ static void vb2_dc_dmabuf_ops_detach(struct dma_buf *dbuf,
>  * memory locations do not require any explicit cache
>  * maintenance prior or after being used by the device.
>  */
> -   dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
> -  attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
> +   dma_unmap_sgtable(db_attach->dev, sgt, attach->dma_dir,
> + DMA_ATTR_SKIP_CPU_SYNC);
> sg_free_table(sgt);
> kfree(attach);
> db_attach->priv = NULL;
> @@ -301,8 +300,8 @@ static struct sg_table *vb2_dc_dmabuf_ops_map(
>
> /* release any previous cache */
> if (attach->dma_dir != DMA_NONE) {
> -   dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
> -  attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
> +   dma_unmap_sgtable(db_attach->dev, sgt, attach->dma_dir,
> + DMA_ATTR_SKIP_CPU_SYNC);
> attach->dma_dir = DMA_NONE;
> }
>
> @@ -310,9 +309,8 @@ static struct sg_table *vb2_dc_dmabuf_ops_map(
>  * mapping to the client with new direction, no cache sync
>  * required see comment in vb2_dc_dmabuf_ops_detach()
>  */
> -   sgt->nents = dma_map_sg_attrs(db_attach->dev, sgt->sgl, 
> sgt->orig_nents,
> - dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
> -   if (!sgt->nents) {
> +   if (dma_map_sgtable(db_attach->dev, sgt, dma_dir,
> +   DMA_ATTR_SKIP_CPU_SYNC)) {
> pr_err("failed to map scatterlist\n");
> mutex_unlock(lock);
> return ERR_PTR(-EIO);

Unlike dma_map_sg_attrs(), dma_map_sgtable() now returns an error code
of its own. Is it intentional to ignore that code here and return a
hardcoded -EIO?

> @@ -455,8 +453,8 @@ static void vb2_dc_put_userptr(void *buf_priv)
>  * No need to sync to CPU, it's already synced to the CPU
>  * since the finish() memop will have been called before this.
>  */
> -   dma_unmap_sg_attrs(buf->dev, sgt->sgl, sgt->orig_nents,
> -  buf->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
> +   dma_unmap_sgtable(buf->dev, sgt, buf->dma_dir,
> + DMA_ATTR_SKIP_CPU_SYNC);
> pages = frame_vector_pages(buf->vec);
> /* sgt should exist only if vector contains pages... */
> BUG_ON(IS_ERR(pages));
> @@ -553,9 +551,8 @@ static void *vb2_dc_get_userptr(struct device *dev, 
> unsigned long vaddr,
>   

Re: [PATCH v3] drm/mediatek: check plane visibility in atomic_update

2020-06-22 Thread Tomasz Figa
On Mon, Jun 22, 2020 at 11:31:06PM +0800, Hsin-Yi Wang wrote:
> Disable the plane if it's not visible. Otherwise mtk_ovl_layer_config()
> would proceed with invalid plane and we may see vblank timeout.
> 
> Fixes: 119f5173628a ("drm/mediatek: Add DRM Driver for Mediatek SoC MT8173.")
> Signed-off-by: Hsin-Yi Wang 
> Reviewed-by: Chun-Kuang Hu 
> Change-Id: Id5341d60ddfffc88a38d9db0caa089b2d6a1d29c
> ---
> v3: Address comment
> v2: Add fixes tag
> ---
>  drivers/gpu/drm/mediatek/mtk_drm_plane.c | 25 ++--
>  1 file changed, 15 insertions(+), 10 deletions(-)
> 

+/- the Change-Id pointed out by Matthias:

Reviewed-by: Tomasz Figa 

Best regards,
Tomasz
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v2] drm/mediatek: check plane visibility in atomic_update

2020-06-22 Thread Tomasz Figa
Hi Hsin-Yi,

On Mon, Jun 22, 2020 at 11:01:09PM +0800, Hsin-Yi Wang wrote:
> Disable the plane if it's not visible. Otherwise mtk_ovl_layer_config()
> would proceed with invalid plane and we may see vblank timeout.
> 
> Fixes: 119f5173628a ("drm/mediatek: Add DRM Driver for Mediatek SoC MT8173.")
> Signed-off-by: Hsin-Yi Wang 
> Reviewed-by: Chun-Kuang Hu 
> ---
> v2: Add fixes tag
> ---
>  drivers/gpu/drm/mediatek/mtk_drm_plane.c | 23 +--
>  1 file changed, 13 insertions(+), 10 deletions(-)
> 

Thank you for the patch. Please see my comments inline.

> diff --git a/drivers/gpu/drm/mediatek/mtk_drm_plane.c 
> b/drivers/gpu/drm/mediatek/mtk_drm_plane.c
> index c2bd683a87c8..74dc71c7f3b6 100644
> --- a/drivers/gpu/drm/mediatek/mtk_drm_plane.c
> +++ b/drivers/gpu/drm/mediatek/mtk_drm_plane.c
> @@ -164,6 +164,16 @@ static int mtk_plane_atomic_check(struct drm_plane 
> *plane,
>  true, true);
>  }
>  
> +static void mtk_plane_atomic_disable(struct drm_plane *plane,
> +  struct drm_plane_state *old_state)
> +{
> + struct mtk_plane_state *state = to_mtk_plane_state(plane->state);
> +
> + state->pending.enable = false;
> + wmb(); /* Make sure the above parameter is set before update */
> + state->pending.dirty = true;
> +}
> +
>  static void mtk_plane_atomic_update(struct drm_plane *plane,
>   struct drm_plane_state *old_state)
>  {
> @@ -178,6 +188,9 @@ static void mtk_plane_atomic_update(struct drm_plane 
> *plane,
>   if (!crtc || WARN_ON(!fb))
>   return;
>  
> + if (!plane->state->visible)
> + return mtk_plane_atomic_disable(plane, old_state);

nit: Both this function and mtk_plane_atomic_disable() return void.
Perhaps the helper call and the return, without a value, should rather
be two separate statements?

Best regards,
Tomasz


Re: [libcamera-devel] [PATCH v2] drm/fourcc: Add bayer formats and modifiers

2020-06-19 Thread Tomasz Figa
Hi Niklas,

On Fri, May 22, 2020 at 01:52:01AM +0200, Niklas Söderlund wrote:
> Bayer formats are used with cameras and contain green, red and blue
> components, with alternating lines of red and green, and blue and green
> pixels in different orders. For each block of 2x2 pixels there is one
> pixel with a red filter, two with a green filter, and one with a blue
> filter. The filters can be arranged in different patterns.
> 
> Add DRM fourcc formats to describe the most common Bayer formats. Also
> add a modifiers to describe the custom packing layouts used by the Intel
> IPU3 and in the MIPI (Mobile Industry Processor Interface) CSI-2
> specification.
> 
> Signed-off-by: Niklas Söderlund 
> ---
> * Changes since v1
> - Rename the defines from DRM_FORMAT_SRGGB8 to DRM_FORMAT_BAYER_RGGB8.
> - Update the fourcc codes passed to fourcc_code() to avoid a conflict.
> - Add diagrams for all Bayer formats memory layout.
> - Update documentation.
> ---
>  include/uapi/drm/drm_fourcc.h | 205 ++
>  1 file changed, 205 insertions(+)
> 
> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> index 8bc0b31597d80737..d07dd24b49bde6c1 100644
> --- a/include/uapi/drm/drm_fourcc.h
> +++ b/include/uapi/drm/drm_fourcc.h
> @@ -285,6 +285,73 @@ extern "C" {
>  #define DRM_FORMAT_YUV444fourcc_code('Y', 'U', '2', '4') /* 
> non-subsampled Cb (1) and Cr (2) planes */
>  #define DRM_FORMAT_YVU444fourcc_code('Y', 'V', '2', '4') /* 
> non-subsampled Cr (1) and Cb (2) planes */
>  
> +/*
> + * Bayer formats
> + *
> + * Bayer formats contain green, red and blue components, with alternating 
> lines
> + * of red and green, and blue and green pixels in different orders. For each
> + * block of 2x2 pixels there is one pixel with a red filter, two with a green
> + * filter, and one with a blue filter. The filters can be arranged in 
> different
> + * patterns.
> + *
> + * For example, RGGB:
> + *   row0: RGRGRGRG...
> + *   row1: GBGBGBGB...
> + *   row2: RGRGRGRG...
> + *   row3: GBGBGBGB...
> + *   ...
> + *

I wonder if we're operating on the right level of abstraction within this
proposal.

The sensor itself transfers only sequential pixels, as read
out from its matrix. Whether a given pixel corresponds to a red, green
or blue color filter actually depends on the filter layer, which could
actually vary between integrations of the same sensor. (See Fujifilm
X-Trans, which uses regular Sony sensors with their own filter pattern
[1].)

Moreover, the sensor resolution is specified as the number of pixels
horizontally and the number of lines vertically, without considering
the color pattern.

If we consider that, wouldn't the data stream coming from the sensor be
essentially DRM_FORMAT_R8/R10/R12/etc.?

Then, on top of that, we would have the packing, which I believe is
defined well in this document +/- being entangled with the Bayer
pattern.

What do you think?

[1] https://en.wikipedia.org/wiki/Fujifilm_X-Trans_sensor

Best regards,
Tomasz


Re: [RFC PATCH] drm/virtio: Export resource handles via DMA-buf API

2019-10-16 Thread Tomasz Figa
On Wed, Oct 16, 2019 at 6:18 PM Daniel Vetter  wrote:
>
> On Wed, Oct 16, 2019 at 12:19:02PM +0900, Tomasz Figa wrote:
> > On Wed, Oct 9, 2019 at 12:04 AM Daniel Vetter  wrote:
> > >
> > > On Tue, Oct 08, 2019 at 07:49:39PM +0900, Tomasz Figa wrote:
> > > > On Tue, Oct 8, 2019 at 7:03 PM Daniel Vetter  wrote:
> > > > >
> > > > > On Sat, Oct 05, 2019 at 02:41:54PM +0900, Tomasz Figa wrote:
> > > > > > Hi Daniel, Gerd,
> > > > > >
> > > > > > On Tue, Sep 17, 2019 at 10:23 PM Daniel Vetter  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Thu, Sep 12, 2019 at 06:41:21PM +0900, Tomasz Figa wrote:
> > > > > > > > This patch is an early RFC to judge the direction we are 
> > > > > > > > following in
> > > > > > > > our virtualization efforts in Chrome OS. The purpose is to 
> > > > > > > > start a
> > > > > > > > discussion on how to handle buffer sharing between multiple 
> > > > > > > > virtio
> > > > > > > > devices.
> > > > > > > >
> > > > > > > > On a side note, we are also working on a virtio video decoder 
> > > > > > > > interface
> > > > > > > > and implementation, with a V4L2 driver for Linux. Those will be 
> > > > > > > > posted
> > > > > > > > for review in the near future as well.
> > > > > > > >
> > > > > > > > Any feedback will be appreciated! Thanks in advance.
> > > > > > > >
> > > > > > > > ===
> > > > > > > >
> > > > > > > > With the range of use cases for virtualization expanding, there 
> > > > > > > > is going
> > > > > > > > to be more virtio devices added to the ecosystem. Devices such 
> > > > > > > > as video
> > > > > > > > decoders, encoders, cameras, etc. typically work together with 
> > > > > > > > the
> > > > > > > > display and GPU in a pipeline manner, which can only be 
> > > > > > > > implemented
> > > > > > > > efficiently by sharing the buffers between producers and 
> > > > > > > > consumers.
> > > > > > > >
> > > > > > > > Existing buffer management framework in Linux, such as the 
> > > > > > > > videobuf2
> > > > > > > > framework in V4L2, implements all the DMA-buf handling inside 
> > > > > > > > generic
> > > > > > > > code and do not expose any low level information about the 
> > > > > > > > buffers to
> > > > > > > > the drivers.
> > > > > > > >
> > > > > > > > To seamlessly enable buffer sharing with drivers using such 
> > > > > > > > frameworks,
> > > > > > > > make the virtio-gpu driver expose the resource handle as the 
> > > > > > > > DMA address
> > > > > > > > of the buffer returned from the DMA-buf mapping operation. 
> > > > > > > > Arguably, the
> > > > > > > > resource handle is a kind of DMA address already, as it is the 
> > > > > > > > buffer
> > > > > > > > identifier that the device needs to access the backing memory, 
> > > > > > > > which is
> > > > > > > > exactly the same role a DMA address provides for native devices.
> > > > > > > >
> > > > > > > > A virtio driver that does memory management fully on its own 
> > > > > > > > would have
> > > > > > > > code similar to following. The code is identical to what a 
> > > > > > > > regular
> > > > > > > > driver for real hardware would do to import a DMA-buf.
> > > > > > > >
> > > > > > > > static int virtio_foo_get_resource_handle(struct virtio_foo 
> > > > > > > > *foo,
> > > > > > > > struct dma_buf 
> > > > > > > > *dma_buf, u32 *id)
> > > > > > > > {

Re: [RFC PATCH] drm/virtio: Export resource handles via DMA-buf API

2019-10-16 Thread Tomasz Figa
On Wed, Oct 16, 2019 at 3:12 PM Gerd Hoffmann  wrote:
>
>   Hi,
>
> > up later when given a buffer index. But we would still need to make
> > the DMA-buf itself importable. For virtio-gpu I guess that would mean
> > returning an sg_table backed by the shadow buffer pages.
>
> The virtio-gpu driver in drm-misc-next supports dma-buf exports.

Good to know, thanks.

Best regards,
Tomasz

Re: [RFC PATCH] drm/virtio: Export resource handles via DMA-buf API

2019-10-15 Thread Tomasz Figa
On Wed, Oct 9, 2019 at 12:04 AM Daniel Vetter  wrote:
>
> On Tue, Oct 08, 2019 at 07:49:39PM +0900, Tomasz Figa wrote:
> > On Tue, Oct 8, 2019 at 7:03 PM Daniel Vetter  wrote:
> > >
> > > On Sat, Oct 05, 2019 at 02:41:54PM +0900, Tomasz Figa wrote:
> > > > Hi Daniel, Gerd,
> > > >
> > > > On Tue, Sep 17, 2019 at 10:23 PM Daniel Vetter  wrote:
> > > > >
> > > > > On Thu, Sep 12, 2019 at 06:41:21PM +0900, Tomasz Figa wrote:
> > > > > > This patch is an early RFC to judge the direction we are following 
> > > > > > in
> > > > > > our virtualization efforts in Chrome OS. The purpose is to start a
> > > > > > discussion on how to handle buffer sharing between multiple virtio
> > > > > > devices.
> > > > > >
> > > > > > On a side note, we are also working on a virtio video decoder 
> > > > > > interface
> > > > > > and implementation, with a V4L2 driver for Linux. Those will be 
> > > > > > posted
> > > > > > for review in the near future as well.
> > > > > >
> > > > > > Any feedback will be appreciated! Thanks in advance.
> > > > > >
> > > > > > ===
> > > > > >
> > > > > > With the range of use cases for virtualization expanding, there is 
> > > > > > going
> > > > > > to be more virtio devices added to the ecosystem. Devices such as 
> > > > > > video
> > > > > > decoders, encoders, cameras, etc. typically work together with the
> > > > > > display and GPU in a pipeline manner, which can only be implemented
> > > > > > efficiently by sharing the buffers between producers and consumers.
> > > > > >
> > > > > > Existing buffer management framework in Linux, such as the videobuf2
> > > > > > framework in V4L2, implements all the DMA-buf handling inside 
> > > > > > generic
> > > > > > code and do not expose any low level information about the buffers 
> > > > > > to
> > > > > > the drivers.
> > > > > >
> > > > > > To seamlessly enable buffer sharing with drivers using such 
> > > > > > frameworks,
> > > > > > make the virtio-gpu driver expose the resource handle as the DMA 
> > > > > > address
> > > > > > of the buffer returned from the DMA-buf mapping operation. 
> > > > > > Arguably, the
> > > > > > resource handle is a kind of DMA address already, as it is the 
> > > > > > buffer
> > > > > > identifier that the device needs to access the backing memory, 
> > > > > > which is
> > > > > > exactly the same role a DMA address provides for native devices.
> > > > > >
> > > > > > A virtio driver that does memory management fully on its own would 
> > > > > > have
> > > > > > code similar to following. The code is identical to what a regular
> > > > > > driver for real hardware would do to import a DMA-buf.
> > > > > >
> > > > > > static int virtio_foo_get_resource_handle(struct virtio_foo *foo,
> > > > > > struct dma_buf *dma_buf, 
> > > > > > u32 *id)
> > > > > > {
> > > > > >   struct dma_buf_attachment *attach;
> > > > > >   struct sg_table *sgt;
> > > > > >   int ret = 0;
> > > > > >
> > > > > >   attach = dma_buf_attach(dma_buf, foo->dev);
> > > > > >   if (IS_ERR(attach))
> > > > > >   return PTR_ERR(attach);
> > > > > >
> > > > > >   sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
> > > > > >   if (IS_ERR(sgt)) {
> > > > > >   ret = PTR_ERR(sgt);
> > > > > >   goto err_detach;
> > > > > >   }
> > > > > >
> > > > > >   if (sgt->nents != 1) {
> > > > > >   ret = -EINVAL;
> > > > > >   goto err_unmap;
> > > > > >   }
> > > > > >
> > > > > >   *id = sg_dma_address(sgt->sgl);

Re: [RFC PATCH] drm/virtio: Export resource handles via DMA-buf API

2019-10-08 Thread Tomasz Figa
On Tue, Oct 8, 2019 at 7:03 PM Daniel Vetter  wrote:
>
> On Sat, Oct 05, 2019 at 02:41:54PM +0900, Tomasz Figa wrote:
> > Hi Daniel, Gerd,
> >
> > On Tue, Sep 17, 2019 at 10:23 PM Daniel Vetter  wrote:
> > >
> > > On Thu, Sep 12, 2019 at 06:41:21PM +0900, Tomasz Figa wrote:
> > > > This patch is an early RFC to judge the direction we are following in
> > > > our virtualization efforts in Chrome OS. The purpose is to start a
> > > > discussion on how to handle buffer sharing between multiple virtio
> > > > devices.
> > > >
> > > > On a side note, we are also working on a virtio video decoder interface
> > > > and implementation, with a V4L2 driver for Linux. Those will be posted
> > > > for review in the near future as well.
> > > >
> > > > Any feedback will be appreciated! Thanks in advance.
> > > >
> > > > ===
> > > >
> > > > With the range of use cases for virtualization expanding, there is going
> > > > to be more virtio devices added to the ecosystem. Devices such as video
> > > > decoders, encoders, cameras, etc. typically work together with the
> > > > display and GPU in a pipeline manner, which can only be implemented
> > > > efficiently by sharing the buffers between producers and consumers.
> > > >
> > > > Existing buffer management framework in Linux, such as the videobuf2
> > > > framework in V4L2, implements all the DMA-buf handling inside generic
> > > > code and do not expose any low level information about the buffers to
> > > > the drivers.
> > > >
> > > > To seamlessly enable buffer sharing with drivers using such frameworks,
> > > > make the virtio-gpu driver expose the resource handle as the DMA address
> > > > of the buffer returned from the DMA-buf mapping operation. Arguably, the
> > > > resource handle is a kind of DMA address already, as it is the buffer
> > > > identifier that the device needs to access the backing memory, which is
> > > > exactly the same role a DMA address provides for native devices.
> > > >
> > > > A virtio driver that does memory management fully on its own would have
> > > > code similar to following. The code is identical to what a regular
> > > > driver for real hardware would do to import a DMA-buf.
> > > >
> > > > static int virtio_foo_get_resource_handle(struct virtio_foo *foo,
> > > > struct dma_buf *dma_buf, u32 
> > > > *id)
> > > > {
> > > >   struct dma_buf_attachment *attach;
> > > >   struct sg_table *sgt;
> > > >   int ret = 0;
> > > >
> > > >   attach = dma_buf_attach(dma_buf, foo->dev);
> > > >   if (IS_ERR(attach))
> > > >   return PTR_ERR(attach);
> > > >
> > > >   sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
> > > >   if (IS_ERR(sgt)) {
> > > >   ret = PTR_ERR(sgt);
> > > >   goto err_detach;
> > > >   }
> > > >
> > > >   if (sgt->nents != 1) {
> > > >   ret = -EINVAL;
> > > >   goto err_unmap;
> > > >   }
> > > >
> > > >   *id = sg_dma_address(sgt->sgl);
> > >
> > > I agree with Gerd, this looks pretty horrible to me.
> > >
> > > The usual way we've done these kind of special dma-bufs is:
> > >
> > > - They all get allocated at the same place, through some library or
> > >   whatever.
> > >
> > > - You add a dma_buf_is_virtio(dma_buf) function, or maybe something that
> > >   also upcasts or returns NULL, which checks for dma_buf->ops.
> > >
> >
> > Thanks for a lot of valuable feedback and sorry for the late reply.
> >
> > While I agree that stuffing the resource ID in sg_dma_address() is
> > quite ugly (for example, the regular address arithmetic doesn't work),
> > I still believe we need to convey information about these buffers
> > using regular kernel interfaces.
> >
> > Drivers in some subsystems like DRM tend to open code any buffer
> > management and then it wouldn't be any problem to do what you
> > suggested. However, other subsystems have generic frameworks for
> > buffer management, like videobuf2 for V4L2. Those assume regular
DMA-bufs that can be handled with regular dma_buf_() API and described
using sgtables and/or pfn vectors and/or DMA addresses.

Re: [RFC PATCH] drm/virtio: Export resource handles via DMA-buf API

2019-10-04 Thread Tomasz Figa
Hi Daniel, Gerd,

On Tue, Sep 17, 2019 at 10:23 PM Daniel Vetter  wrote:
>
> On Thu, Sep 12, 2019 at 06:41:21PM +0900, Tomasz Figa wrote:
> > This patch is an early RFC to judge the direction we are following in
> > our virtualization efforts in Chrome OS. The purpose is to start a
> > discussion on how to handle buffer sharing between multiple virtio
> > devices.
> >
> > On a side note, we are also working on a virtio video decoder interface
> > and implementation, with a V4L2 driver for Linux. Those will be posted
> > for review in the near future as well.
> >
> > Any feedback will be appreciated! Thanks in advance.
> >
> > ===
> >
> > With the range of use cases for virtualization expanding, there is going
> > to be more virtio devices added to the ecosystem. Devices such as video
> > decoders, encoders, cameras, etc. typically work together with the
> > display and GPU in a pipeline manner, which can only be implemented
> > efficiently by sharing the buffers between producers and consumers.
> >
> > Existing buffer management framework in Linux, such as the videobuf2
> > framework in V4L2, implements all the DMA-buf handling inside generic
> > code and do not expose any low level information about the buffers to
> > the drivers.
> >
> > To seamlessly enable buffer sharing with drivers using such frameworks,
> > make the virtio-gpu driver expose the resource handle as the DMA address
> > of the buffer returned from the DMA-buf mapping operation. Arguably, the
> > resource handle is a kind of DMA address already, as it is the buffer
> > identifier that the device needs to access the backing memory, which is
> > exactly the same role a DMA address provides for native devices.
> >
> > A virtio driver that does memory management fully on its own would have
> > code similar to following. The code is identical to what a regular
> > driver for real hardware would do to import a DMA-buf.
> >
> > static int virtio_foo_get_resource_handle(struct virtio_foo *foo,
> > struct dma_buf *dma_buf, u32 *id)
> > {
> >   struct dma_buf_attachment *attach;
> >   struct sg_table *sgt;
> >   int ret = 0;
> >
> >   attach = dma_buf_attach(dma_buf, foo->dev);
> >   if (IS_ERR(attach))
> >   return PTR_ERR(attach);
> >
> >   sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
> >   if (IS_ERR(sgt)) {
> >   ret = PTR_ERR(sgt);
> >   goto err_detach;
> >   }
> >
> >   if (sgt->nents != 1) {
> >   ret = -EINVAL;
> >   goto err_unmap;
> >   }
> >
> >   *id = sg_dma_address(sgt->sgl);
>
> I agree with Gerd, this looks pretty horrible to me.
>
> The usual way we've done these kind of special dma-bufs is:
>
> - They all get allocated at the same place, through some library or
>   whatever.
>
> - You add a dma_buf_is_virtio(dma_buf) function, or maybe something that
>   also upcasts or returns NULL, which checks for dma_buf->ops.
>

Thanks for a lot of valuable feedback and sorry for the late reply.

While I agree that stuffing the resource ID in sg_dma_address() is
quite ugly (for example, the regular address arithmetic doesn't work),
I still believe we need to convey information about these buffers
using regular kernel interfaces.

Drivers in some subsystems like DRM tend to open code any buffer
management and then it wouldn't be any problem to do what you
suggested. However, other subsystems have generic frameworks for
buffer management, like videobuf2 for V4L2. Those assume regular
DMA-bufs that can be handled with regular dma_buf_() API and described
using sgtables and/or pfn vectors and/or DMA addresses.

> - Once you've upcasted at runtime by checking for ->ops, you can add
>   whatever fancy interfaces you want. Including a real&proper interface to
>   get at whatever underlying id you need to for real buffer sharing
>   between virtio devices.
>
> In a way virtio buffer/memory ids are a kind of private bus, entirely
> distinct from the dma_addr_t bus. So can't really stuff them under this
> same thing like we e.g. do with pci peer2peer.

As I mentioned earlier, there is no single "dma_addr_t bus". Each
device (as in struct device) can be on its own different DMA bus, with
a different DMA address space. There is not even a guarantee that a
DMA address obtained for one PCI device will be valid for another if
they are on different buses, which could have different address
mappings.

Putting that aside, we're think

Re: [RFC PATCH] drm/virtio: Export resource handles via DMA-buf API

2019-09-13 Thread Tomasz Figa
On Fri, Sep 13, 2019 at 5:07 PM Gerd Hoffmann  wrote:
>
>   Hi,
>
> > > > To seamlessly enable buffer sharing with drivers using such frameworks,
> > > > make the virtio-gpu driver expose the resource handle as the DMA address
> > > > of the buffer returned from the DMA-buf mapping operation.  Arguably, 
> > > > the
> > > > resource handle is a kind of DMA address already, as it is the buffer
> > > > identifier that the device needs to access the backing memory, which is
> > > > exactly the same role a DMA address provides for native devices.
> >
> > First of all, thanks for taking a look at this.
> >
> > > No.  A scatter list has guest dma addresses, period.  Stuffing something
> > > else into a scatterlist is asking for trouble, things will go seriously
> > > wrong when someone tries to use such a fake scatterlist as real 
> > > scatterlist.
> >
> > What is a "guest dma address"? The definition of a DMA address in the
> > Linux DMA API is an address internal to the DMA master address space.
> > For virtio, the resource handle namespace may be such an address
> > space.
>
> No.  DMA master address space in virtual machines is pretty much the
> same it is on physical machines.  So, on x86 without iommu, identical
> to (guest) physical address space.  You can't re-define it like that.
>

That's not true. Even on x86 without an IOMMU, the DMA address space
can differ from the physical address space. The translation could
still be just an addition or subtraction of a constant, but the two
spaces are defined separately, with no guarantee that any simple
mapping between them exists.

See https://www.kernel.org/doc/Documentation/DMA-API.txt

"A CPU cannot reference a dma_addr_t directly because there may be
translation between its physical address space and the DMA address space."

> > However, we could as well introduce a separate DMA address
> > space if resource handles are not the right way to refer to the memory
> > from other virtio devices.
>
> s/other virtio devices/other devices/
>
> dma-bufs are for buffer sharing between devices, not limited to virtio.
> You can't re-define that in some virtio-specific way.
>

We don't need to limit this to virtio devices only. In fact I actually
foresee this having a use case with the emulated USB host controller,
which isn't a virtio device.

That said, I deliberately referred to virtio to keep the scope of the
problem in control. If there is a solution that could work without
such assumption, I'm more than open to discuss it, of course.

> > > Also note that "the DMA address of the buffer" is bonkers in virtio-gpu
> > > context.  virtio-gpu resources are not required to be physically
> > > contiguous in memory, so typically you actually need a scatter list to
> > > describe them.
> >
> > There is no such requirement even on a bare metal system, see any
> > system that has an IOMMU, which is typical on ARM/ARM64. The DMA
> > address space must be contiguous only from the DMA master point of
> > view.
>
> Yes, the iommu (if present) could remap your scatterlist that way.  You
> can't depend on that though.

The IOMMU doesn't need to exist physically, though. After all, guest
memory may not be physically contiguous in the host already, but with
your definition of DMA address we would refer to it as contiguous. As
per my understanding of the DMA address, anything that lets the DMA
master access the target memory would qualify and there would be no
need for an IOMMU in between.

>
> What is the plan here?  Host-side buffer sharing I guess?  So you are
> looking for some way to pass buffer handles from the guest to the host,
> even in case those buffers are not created by your driver but imported
> from somewhere else?

Exactly. The very specific first scenario that we want to start with
is allocating host memory through virtio-gpu and using that memory
both as output of a video decoder and as input (texture) to Virgil3D.
The memory needs to be specifically allocated by the host as only the
host can know the requirements for memory allocation of the video
decode accelerator hardware.

Best regards,
Tomasz

Re: [RFC PATCH] drm/virtio: Export resource handles via DMA-buf API

2019-09-12 Thread Tomasz Figa
Hi Gerd,

On Thu, Sep 12, 2019 at 9:38 PM Gerd Hoffmann  wrote:
>
>   Hi,
>
> > To seamlessly enable buffer sharing with drivers using such frameworks,
> > make the virtio-gpu driver expose the resource handle as the DMA address
> > of the buffer returned from the DMA-buf mapping operation.  Arguably, the
> > resource handle is a kind of DMA address already, as it is the buffer
> > identifier that the device needs to access the backing memory, which is
> > exactly the same role a DMA address provides for native devices.

First of all, thanks for taking a look at this.

>
> No.  A scatter list has guest dma addresses, period.  Stuffing something
> else into a scatterlist is asking for trouble, things will go seriously
> wrong when someone tries to use such a fake scatterlist as real scatterlist.

What is a "guest dma address"? The definition of a DMA address in the
Linux DMA API is an address internal to the DMA master address space.
For virtio, the resource handle namespace may be such an address
space. However, we could as well introduce a separate DMA address
space if resource handles are not the right way to refer to the memory
from other virtio devices.

>
> Also note that "the DMA address of the buffer" is bonkers in virtio-gpu
> context.  virtio-gpu resources are not required to be physically
> contiguous in memory, so typically you actually need a scatter list to
> describe them.

There is no such requirement even on a bare metal system, see any
system that has an IOMMU, which is typical on ARM/ARM64. The DMA
address space must be contiguous only from the DMA master point of
view.

Best regards,
Tomasz

[RFC PATCH] drm/virtio: Export resource handles via DMA-buf API

2019-09-12 Thread Tomasz Figa
This patch is an early RFC to judge the direction we are following in
our virtualization efforts in Chrome OS. The purpose is to start a
discussion on how to handle buffer sharing between multiple virtio
devices.

On a side note, we are also working on a virtio video decoder interface
and implementation, with a V4L2 driver for Linux. Those will be posted
for review in the near future as well.

Any feedback will be appreciated! Thanks in advance.

===

With the range of use cases for virtualization expanding, there is going
to be more virtio devices added to the ecosystem. Devices such as video
decoders, encoders, cameras, etc. typically work together with the
display and GPU in a pipeline manner, which can only be implemented
efficiently by sharing the buffers between producers and consumers.

The existing buffer management frameworks in Linux, such as the
videobuf2 framework in V4L2, implement all of the DMA-buf handling
inside generic code and do not expose any low-level information about
the buffers to the drivers.

To seamlessly enable buffer sharing with drivers using such frameworks,
make the virtio-gpu driver expose the resource handle as the DMA address
of the buffer returned from the DMA-buf mapping operation. Arguably, the
resource handle is a kind of DMA address already, as it is the buffer
identifier that the device needs to access the backing memory, which is
exactly the same role a DMA address provides for native devices.

A virtio driver that does memory management fully on its own would have
code similar to the following. The code is identical to what a regular
driver for real hardware would do to import a DMA-buf.

static int virtio_foo_get_resource_handle(struct virtio_foo *foo,
                                          struct dma_buf *dma_buf, u32 *id)
{
        struct dma_buf_attachment *attach;
        struct sg_table *sgt;
        int ret = 0;

        attach = dma_buf_attach(dma_buf, foo->dev);
        if (IS_ERR(attach))
                return PTR_ERR(attach);

        sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
        if (IS_ERR(sgt)) {
                ret = PTR_ERR(sgt);
                goto err_detach;
        }

        if (sgt->nents != 1) {
                ret = -EINVAL;
                goto err_unmap;
        }

        *id = sg_dma_address(sgt->sgl);

err_unmap:
        dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
err_detach:
        dma_buf_detach(dma_buf, attach);

        return ret;
}

On the other hand, a virtio driver that uses an existing kernel
framework to manage buffers would not need to explicitly handle anything
at all, as the framework part responsible for importing DMA-bufs would
already do the work. For example, a V4L2 driver using the videobuf2
framework would just call the vb2_dma_contig_plane_dma_addr() function
to get what the above open-coded function would return.

Signed-off-by: Tomasz Figa 
---
 drivers/gpu/drm/virtio/virtgpu_drv.c   |  2 +
 drivers/gpu/drm/virtio/virtgpu_drv.h   |  4 ++
 drivers/gpu/drm/virtio/virtgpu_prime.c | 81 ++
 3 files changed, 87 insertions(+)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c 
b/drivers/gpu/drm/virtio/virtgpu_drv.c
index 0fc32fa0b3c0..ac095f813134 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.c
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
@@ -210,6 +210,8 @@ static struct drm_driver driver = {
 #endif
.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
+   .gem_prime_export = virtgpu_gem_prime_export,
+   .gem_prime_import = virtgpu_gem_prime_import,
.gem_prime_get_sg_table = virtgpu_gem_prime_get_sg_table,
.gem_prime_import_sg_table = virtgpu_gem_prime_import_sg_table,
.gem_prime_vmap = virtgpu_gem_prime_vmap,
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index e28829661724..687cfce91885 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -367,6 +367,10 @@ void virtio_gpu_object_free_sg_table(struct 
virtio_gpu_object *bo);
 int virtio_gpu_object_wait(struct virtio_gpu_object *bo, bool no_wait);
 
 /* virtgpu_prime.c */
+struct dma_buf *virtgpu_gem_prime_export(struct drm_gem_object *obj,
+int flags);
+struct drm_gem_object *virtgpu_gem_prime_import(struct drm_device *dev,
+   struct dma_buf *buf);
 struct sg_table *virtgpu_gem_prime_get_sg_table(struct drm_gem_object *obj);
 struct drm_gem_object *virtgpu_gem_prime_import_sg_table(
struct drm_device *dev, struct dma_buf_attachment *attach,
diff --git a/drivers/gpu/drm/virtio/virtgpu_prime.c 
b/drivers/gpu/drm/virtio/virtgpu_prime.c
index dc642a884b88..562eb1a2ed5b 100644
--- a/drivers/gpu/drm/virtio/virtgpu_prime.c
+++ b/drivers/gpu/drm/virtio/virtgpu_prime.c
@@ -22,6 +22,9 @@
  * Authors: Andreas Pokorny
  */
 
+#include 
+#include 
+

Re: [PATCH] drm/mediatek: make imported PRIME buffers contiguous

2019-07-23 Thread Tomasz Figa
On Tue, Jul 23, 2019 at 2:34 PM Alexandre Courbot  wrote:
>
> This driver requires imported PRIME buffers to appear contiguously in
> its IO address space. Make sure this is the case by setting the maximum
> DMA segment size to a better value than the default 64K on the DMA
> device, and use said DMA device when importing PRIME buffers.
>
> Signed-off-by: Alexandre Courbot 
> ---
>  drivers/gpu/drm/mediatek/mtk_drm_drv.c | 47 --
>  drivers/gpu/drm/mediatek/mtk_drm_drv.h |  2 ++
>  2 files changed, 46 insertions(+), 3 deletions(-)
>

Reviewed-by: Tomasz Figa 

Best regards,
Tomasz


Re: [PATCH] of/device: add blacklist for iommu dma_ops

2019-06-04 Thread Tomasz Figa
On Mon, Jun 3, 2019 at 7:48 PM Rob Clark  wrote:
>
> On Sun, Jun 2, 2019 at 11:25 PM Tomasz Figa  wrote:
> >
> > On Mon, Jun 3, 2019 at 4:40 AM Rob Clark  wrote:
> > >
> > > On Fri, May 10, 2019 at 7:35 AM Rob Clark  wrote:
> > > >
> > > > On Tue, Dec 4, 2018 at 2:29 PM Rob Herring  wrote:
> > > > >
> > > > > On Sat, Dec 1, 2018 at 10:54 AM Rob Clark  wrote:
> > > > > >
> > > > > > This solves a problem we see with drm/msm, caused by getting
> > > > > > iommu_dma_ops while we attach our own domain and manage it directly 
> > > > > > at
> > > > > > the iommu API level:
> > > > > >
> > > > > >   [0038] user address but active_mm is swapper
> > > > > >   Internal error: Oops: 9605 [#1] PREEMPT SMP
> > > > > >   Modules linked in:
> > > > > >   CPU: 7 PID: 70 Comm: kworker/7:1 Tainted: GW 
> > > > > > 4.19.3 #90
> > > > > >   Hardware name: xxx (DT)
> > > > > >   Workqueue: events deferred_probe_work_func
> > > > > >   pstate: 80c9 (Nzcv daif +PAN +UAO)
> > > > > >   pc : iommu_dma_map_sg+0x7c/0x2c8
> > > > > >   lr : iommu_dma_map_sg+0x40/0x2c8
> > > > > >   sp : ff80095eb4f0
> > > > > >   x29: ff80095eb4f0 x28: 
> > > > > >   x27: ffc0f9431578 x26: 
> > > > > >   x25:  x24: 0003
> > > > > >   x23: 0001 x22: ffc0fa9ac010
> > > > > >   x21:  x20: ffc0fab40980
> > > > > >   x19: ffc0fab40980 x18: 0003
> > > > > >   x17: 01c4 x16: 0007
> > > > > >   x15: 000e x14: 
> > > > > >   x13:  x12: 0028
> > > > > >   x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
> > > > > >   x9 :  x8 : ffc0fab409a0
> > > > > >   x7 :  x6 : 0002
> > > > > >   x5 : 0001 x4 : 
> > > > > >   x3 : 0001 x2 : 0002
> > > > > >   x1 : ffc0f9431578 x0 : 
> > > > > >   Process kworker/7:1 (pid: 70, stack limit = 0x17d08ffb)
> > > > > >   Call trace:
> > > > > >iommu_dma_map_sg+0x7c/0x2c8
> > > > > >__iommu_map_sg_attrs+0x70/0x84
> > > > > >get_pages+0x170/0x1e8
> > > > > >msm_gem_get_iova+0x8c/0x128
> > > > > >_msm_gem_kernel_new+0x6c/0xc8
> > > > > >msm_gem_kernel_new+0x4c/0x58
> > > > > >dsi_tx_buf_alloc_6g+0x4c/0x8c
> > > > > >msm_dsi_host_modeset_init+0xc8/0x108
> > > > > >msm_dsi_modeset_init+0x54/0x18c
> > > > > >_dpu_kms_drm_obj_init+0x430/0x474
> > > > > >dpu_kms_hw_init+0x5f8/0x6b4
> > > > > >msm_drm_bind+0x360/0x6c8
> > > > > >try_to_bring_up_master.part.7+0x28/0x70
> > > > > >component_master_add_with_match+0xe8/0x124
> > > > > >msm_pdev_probe+0x294/0x2b4
> > > > > >platform_drv_probe+0x58/0xa4
> > > > > >really_probe+0x150/0x294
> > > > > >driver_probe_device+0xac/0xe8
> > > > > >__device_attach_driver+0xa4/0xb4
> > > > > >bus_for_each_drv+0x98/0xc8
> > > > > >__device_attach+0xac/0x12c
> > > > > >device_initial_probe+0x24/0x30
> > > > > >bus_probe_device+0x38/0x98
> > > > > >deferred_probe_work_func+0x78/0xa4
> > > > > >process_one_work+0x24c/0x3dc
> > > > > >worker_thread+0x280/0x360
> > > > > >kthread+0x134/0x13c
> > > > > >ret_from_fork+0x10/0x18
> > > > > >   Code: d284 91000725 6b17039f 5400048a (f9401f40)
> > > > > >   ---[ end trace f22dda57f3648e2c ]---
> > > > > >   Kernel panic - not syncing: Fatal exception
> > > > > >   SMP: stopping secondary CPUs
> > > > > >   Kernel Offset: disabled
> > > > > >   CPU features: 0x0,22802a18
> > 

Re: [PATCH] of/device: add blacklist for iommu dma_ops

2019-06-02 Thread Tomasz Figa
On Mon, Jun 3, 2019 at 4:40 AM Rob Clark  wrote:
>
> On Fri, May 10, 2019 at 7:35 AM Rob Clark  wrote:
> >
> > On Tue, Dec 4, 2018 at 2:29 PM Rob Herring  wrote:
> > >
> > > On Sat, Dec 1, 2018 at 10:54 AM Rob Clark  wrote:
> > > >
> > > > This solves a problem we see with drm/msm, caused by getting
> > > > iommu_dma_ops while we attach our own domain and manage it directly at
> > > > the iommu API level:
> > > >
> > > >   [0038] user address but active_mm is swapper
> > > >   Internal error: Oops: 9605 [#1] PREEMPT SMP
> > > >   Modules linked in:
> > > >   CPU: 7 PID: 70 Comm: kworker/7:1 Tainted: GW 4.19.3 
> > > > #90
> > > >   Hardware name: xxx (DT)
> > > >   Workqueue: events deferred_probe_work_func
> > > >   pstate: 80c9 (Nzcv daif +PAN +UAO)
> > > >   pc : iommu_dma_map_sg+0x7c/0x2c8
> > > >   lr : iommu_dma_map_sg+0x40/0x2c8
> > > >   sp : ff80095eb4f0
> > > >   x29: ff80095eb4f0 x28: 
> > > >   x27: ffc0f9431578 x26: 
> > > >   x25:  x24: 0003
> > > >   x23: 0001 x22: ffc0fa9ac010
> > > >   x21:  x20: ffc0fab40980
> > > >   x19: ffc0fab40980 x18: 0003
> > > >   x17: 01c4 x16: 0007
> > > >   x15: 000e x14: 
> > > >   x13:  x12: 0028
> > > >   x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
> > > >   x9 :  x8 : ffc0fab409a0
> > > >   x7 :  x6 : 0002
> > > >   x5 : 0001 x4 : 
> > > >   x3 : 0001 x2 : 0002
> > > >   x1 : ffc0f9431578 x0 : 
> > > >   Process kworker/7:1 (pid: 70, stack limit = 0x17d08ffb)
> > > >   Call trace:
> > > >iommu_dma_map_sg+0x7c/0x2c8
> > > >__iommu_map_sg_attrs+0x70/0x84
> > > >get_pages+0x170/0x1e8
> > > >msm_gem_get_iova+0x8c/0x128
> > > >_msm_gem_kernel_new+0x6c/0xc8
> > > >msm_gem_kernel_new+0x4c/0x58
> > > >dsi_tx_buf_alloc_6g+0x4c/0x8c
> > > >msm_dsi_host_modeset_init+0xc8/0x108
> > > >msm_dsi_modeset_init+0x54/0x18c
> > > >_dpu_kms_drm_obj_init+0x430/0x474
> > > >dpu_kms_hw_init+0x5f8/0x6b4
> > > >msm_drm_bind+0x360/0x6c8
> > > >try_to_bring_up_master.part.7+0x28/0x70
> > > >component_master_add_with_match+0xe8/0x124
> > > >msm_pdev_probe+0x294/0x2b4
> > > >platform_drv_probe+0x58/0xa4
> > > >really_probe+0x150/0x294
> > > >driver_probe_device+0xac/0xe8
> > > >__device_attach_driver+0xa4/0xb4
> > > >bus_for_each_drv+0x98/0xc8
> > > >__device_attach+0xac/0x12c
> > > >device_initial_probe+0x24/0x30
> > > >bus_probe_device+0x38/0x98
> > > >deferred_probe_work_func+0x78/0xa4
> > > >process_one_work+0x24c/0x3dc
> > > >worker_thread+0x280/0x360
> > > >kthread+0x134/0x13c
> > > >ret_from_fork+0x10/0x18
> > > >   Code: d284 91000725 6b17039f 5400048a (f9401f40)
> > > >   ---[ end trace f22dda57f3648e2c ]---
> > > >   Kernel panic - not syncing: Fatal exception
> > > >   SMP: stopping secondary CPUs
> > > >   Kernel Offset: disabled
> > > >   CPU features: 0x0,22802a18
> > > >   Memory Limit: none
> > > >
> > > > The problem is that when drm/msm does its own iommu_attach_device(),
> > > > now the domain returned by iommu_get_domain_for_dev() is drm/msm's
> > > > domain, and it doesn't have domain->iova_cookie.
> > > >
> > > > We kind of avoided this problem prior to sdm845/dpu because the iommu
> > > > was attached to the mdp node in dt, which is a child of the toplevel
> > > > mdss node (which corresponds to the dev passed in dma_map_sg()).  But
> > > > with sdm845, now the iommu is attached at the mdss level so we hit the
> > > > iommu_dma_ops in dma_map_sg().
> > > >
> > > > But auto allocating/attaching a domain before the driver is probed was
> > > > already a blocking problem for enabling per-context pagetables for the
> > > > GPU.  This problem is also now solved with this patch.
> > > >
> > > > Fixes: 97890ba9289c dma-mapping: detect and configure IOMMU in 
> > > > of_dma_configure
> > > > Tested-by: Douglas Anderson 
> > > > Signed-off-by: Rob Clark 
> > > > ---
> > > > This is an alternative/replacement for [1].  What it lacks in elegance
> > > > it makes up for in practicality ;-)
> > > >
> > > > [1] https://patchwork.freedesktop.org/patch/264930/
> > > >
> > > >  drivers/of/device.c | 22 ++
> > > >  1 file changed, 22 insertions(+)
> > > >
> > > > diff --git a/drivers/of/device.c b/drivers/of/device.c
> > > > index 5957cd4fa262..15ffee00fb22 100644
> > > > --- a/drivers/of/device.c
> > > > +++ b/drivers/of/device.c
> > > > @@ -72,6 +72,14 @@ int of_device_add(struct platform_device *ofdev)
> > > > return device_add(&ofdev->dev);
> > > >  }
> > > >
> > > > +static const struct of_device_id iommu_blacklist[] = {
> > > > +   { .compatible = "qcom,mdp4" },
> > > 

Re: Support for 2D engines/blitters in V4L2 and DRM

2019-04-21 Thread Tomasz Figa
On Sat, Apr 20, 2019 at 12:31 AM Nicolas Dufresne  wrote:
>
> Le vendredi 19 avril 2019 à 13:27 +0900, Tomasz Figa a écrit :
> > On Fri, Apr 19, 2019 at 9:30 AM Nicolas Dufresne  
> > wrote:
> > > Le jeudi 18 avril 2019 à 10:18 +0200, Daniel Vetter a écrit :
> > > > > It would be cool if both could be used concurrently and not just 
> > > > > return
> > > > > -EBUSY when the device is used with the other subsystem.
> > > >
> > > > We live in this world already :-) I think there's even patches (or 
> > > > merged
> > > > already) to add fences to v4l, for Android.
> > >
> > > This work is currently suspended. It will require some feature on DRM
> > > display to really make this useful, but there is also a lot of
> > > challenges in V4L2. In GFX space, most of the use cases are about
> > > rendering as soon as possible. Though, in multimedia we have two
> > > problems, we need to synchronize the frame rendering with the audio,
> > > and output buffers may come out of order due to how video CODECs are
> > > made.
> > >
> > > In the first, we'd need a mechanism where we can schedule a render at a
> > > specific time or vblank. We can of course already implement this in
> > > software, but with fences, the scheduling would need to be done in the
> > > driver. Then if the fence is signalled earlier, the driver should hold
> > > on until the delay is met. If the fence got signalled late, we also
> > > need to think of a workflow. As we can't schedule more than one render
> > > in DRM at one time, I don't really see yet how to make that work.
> > >
> > > For the second, it's complicated on V4L2 side. Currently we signal
> > > buffers when they are ready in the display order. With fences, we
> > > receive early pairs buffer and fence (in decoding order). There exist
> > > cases where reordering is done by the driver (stateful CODEC). We
> > > cannot schedule these immediately we would need a new mechanism to know
> > > which one comes next. If we just reuse the current mechanism, it would void
> > > the fence usage since the fence will always be signalled by the time it
> > > reaches DRM or other v4l2 component.
> > >
> > > There are also other issues: for a video capture pipeline, if you are not
> > > rendering ASAP, you need the HW timestamp in order to schedule. Again,
> > > we'd get the fence early, but the actual timestamp will be signalled at
> > > the very last minute, so we also risk turning the fence into pure
> > > overhead. Note that as we speak, I have colleagues who are
> > > experimenting with frame timestamp prediction that slaves to the
> > > effective timestamp (catching up over time). But we still have issues
> > > when the capture driver skipped a frame (missed a capture window).
> >
> > Note that a fence has a timestamp internally and it can be queried for
> > it from the user space if exposed as a sync file:
> > https://elixir.bootlin.com/linux/v5.1-rc5/source/drivers/dma-buf/sync_file.c#L386
>
> Don't we need something the other way around ? This seems to be the
> timestamp of when it was triggered (I'm not familiar with this though).
>

Honestly, I'm not fully sure what this timestamp is expected to be.

For video capture pipeline the fence would signal once the whole frame
is captured, so I think it could be a reasonable value to consider
later in the pipeline?

> >
> > Fences in V4L2 would be also useful for stateless decoders and any
> > mem-to-mem processors that operate in order, like the blitters
> > mentioned here or actually camera ISPs, which can be often chained
> > into relatively sophisticated pipelines.
>
> I agree fence can be used to optimize specific corner cases. They are
> not as critical in V4L2 since we have async queues.

I wouldn't call those corner cases. A stateful decoder is actually one
of the opposite extremes, because one would normally just decode and
show the frame, so not much complexity is needed to handle it, and
async queues actually work quite well.

I don't think async queues are very helpful for any more complicated
use cases. The userspace still needs to wake up and push the buffers
through the pipeline. If you have some depth across the whole
pipeline, with queues always having some buffers waiting to be
processed, fences indeed wouldn't change too much (+/- the CPU
time/power wasted on context switches). However, with real time use
cases, such as anything involving streaming from cameras, image
processing stage

Re: Support for 2D engines/blitters in V4L2 and DRM

2019-04-18 Thread Tomasz Figa
On Fri, Apr 19, 2019 at 9:30 AM Nicolas Dufresne  wrote:
>
> Le jeudi 18 avril 2019 à 10:18 +0200, Daniel Vetter a écrit :
> > > It would be cool if both could be used concurrently and not just return
> > > -EBUSY when the device is used with the other subsystem.
> >
> > We live in this world already :-) I think there's even patches (or merged
> > already) to add fences to v4l, for Android.
>
> This work is currently suspended. It will require some feature on DRM
> display to really make this useful, but there is also a lot of
> challanges in V4L2. In GFX space, most of the use case are about
> rendering as soon as possible. Though, in multimedia we have two
> problems, we need to synchronize the frame rendering with the audio,
> and output buffers may come out of order due to how video CODECs are
> made.
>
> In the first, we'd need a mechanism where we can schedule a render at a
> specific time or vblank. We can of course already implement this in
> software, but with fences, the scheduling would need to be done in the
> driver. Then if the fence is signalled earlier, the driver should hold
> on until the delay is met. If the fence got signalled late, we also
> need to think of a workflow. As we can't schedule more than one render
> in DRM at one time, I don't really see yet how to make that work.
>
> For the second, it's complicated on V4L2 side. Currently we signal
> buffers when they are ready in the display order. With fences, we
> receive early pairs buffer and fence (in decoding order). There exist
> cases where reordering is done by the driver (stateful CODEC). We
> cannot schedule these immediately we would need a new mechanism to know
> which one comes next. If we just reuse the current mechanism, it would void
> the fence usage since the fence will always be signalled by the time it
> reaches DRM or other v4l2 component.
>
> There are also other issues: for a video capture pipeline, if you are not
> rendering ASAP, you need the HW timestamp in order to schedule. Again,
> we'd get the fence early, but the actual timestamp will be signalled at
> the very last minute, so we also risk turning the fence into pure
> overhead. Note that as we speak, I have colleagues who are
> experimenting with frame timestamp prediction that slaves to the
> effective timestamp (catching up over time). But we still have issues
> when the capture driver skipped a frame (missed a capture window).

Note that a fence has a timestamp internally and it can be queried for
it from the user space if exposed as a sync file:
https://elixir.bootlin.com/linux/v5.1-rc5/source/drivers/dma-buf/sync_file.c#L386

Fences in V4L2 would be also useful for stateless decoders and any
mem-to-mem processors that operate in order, like the blitters
mentioned here or actually camera ISPs, which can be often chained
into relatively sophisticated pipelines.

Best regards,
Tomasz

Re: Support for 2D engines/blitters in V4L2 and DRM

2019-04-18 Thread Tomasz Figa
On Thu, Apr 18, 2019 at 6:14 PM Paul Kocialkowski
 wrote:
>
> Hi,
>
> On Thu, 2019-04-18 at 18:09 +0900, Tomasz Figa wrote:
> > On Thu, Apr 18, 2019 at 5:55 PM Paul Kocialkowski
> >  wrote:
> > > Hi Daniel,
> > >
> > > On Thu, 2019-04-18 at 10:18 +0200, Daniel Vetter wrote:
> > > > On Wed, Apr 17, 2019 at 08:10:15PM +0200, Paul Kocialkowski wrote:
> > > > > Hi Nicolas,
> > > > >
> > > > > I'm detaching this thread from our V4L2 stateless decoding spec since
> > > > > it has drifted off and would certainly be interesting to DRM folks as
> > > > > well!
> > > > >
> > > > > For context: I was initially talking about writing up support for the
> > > > > Allwinner 2D engine as a DRM render driver, where I'd like to be able
> > > > > to batch jobs that affect the same destination buffer to only signal
> > > > > the out fence once when the batch is done. We have a similar issue in
> > > > > v4l2 where we'd like the destination buffer for a set of requests 
> > > > > (each
> > > > > covering one H264 slice) to be marked as done once the set was 
> > > > > decoded.
> > > > >
> >
> > Out of curiosity, what area did you find a 2D blitter useful for?
>
> The initial motivation is to bring up a DDX with that for platforms
> that have 2D engines but no free software GPU drivers yet.
>
> I also have a personal project in the works where I'd like to implement
> accelerated UI rendering in 2D. The idea is to avoid using GL entirely.
>
> That last point is in part because I have a GPU-less device that I want
> to get going with mainline: http://linux-sunxi.org/F60_Action_Camera

Okay, thanks.

I feel like the typical DRM model with a render node and a userspace
library would make sense for these specific use cases on these
specific hardware platforms then.

Hopefully the availability of open drivers for 3D engines continues to improve.

Best regards,
Tomasz

Re: Support for 2D engines/blitters in V4L2 and DRM

2019-04-18 Thread Tomasz Figa
On Thu, Apr 18, 2019 at 5:55 PM Paul Kocialkowski
 wrote:
>
> Hi Daniel,
>
> On Thu, 2019-04-18 at 10:18 +0200, Daniel Vetter wrote:
> > On Wed, Apr 17, 2019 at 08:10:15PM +0200, Paul Kocialkowski wrote:
> > > Hi Nicolas,
> > >
> > > I'm detaching this thread from our V4L2 stateless decoding spec since
> > > it has drifted off and would certainly be interesting to DRM folks as
> > > well!
> > >
> > > For context: I was initially talking about writing up support for the
> > > Allwinner 2D engine as a DRM render driver, where I'd like to be able
> > > to batch jobs that affect the same destination buffer to only signal
> > > the out fence once when the batch is done. We have a similar issue in
> > > v4l2 where we'd like the destination buffer for a set of requests (each
> > > covering one H264 slice) to be marked as done once the set was decoded.
> > >

Out of curiosity, what area did you find a 2D blitter useful for?

Best regards,
Tomasz

Re: [PATCH v2 1/5] drm/rockchip: fix fb references in async update

2019-03-12 Thread Tomasz Figa
On Wed, Mar 13, 2019 at 12:52 AM Boris Brezillon
 wrote:
>
> On Tue, 12 Mar 2019 12:34:45 -0300
> Helen Koike  wrote:
>
> > On 3/12/19 3:34 AM, Boris Brezillon wrote:
> > > On Mon, 11 Mar 2019 23:21:59 -0300
> > > Helen Koike  wrote:
> > >
> > >> In the case of async update, modifications are done in place, i.e. in the
> > >> current plane state, so the new_state is prepared and the new_state is
> > >> cleanup up (instead of the old_state, diferrently on what happen in a
> > >
> > >   ^ cleaned up  ^ differently (but maybe
> > > "unlike what happens" is more appropriate here).
> > >
> > >> normal sync update).
> > >> To cleanup the old_fb properly, it needs to be placed in the new_state
> > >> in the end of async_update, so cleanup call will unreference the old_fb
> > >> correctly.
> > >>
> > >> Also, the previous code had a:
> > >>
> > >>plane_state = plane->funcs->atomic_duplicate_state(plane);
> > >>...
> > >>swap(plane_state, plane->state);
> > >>
> > >>if (plane->state->fb && plane->state->fb != new_state->fb) {
> > >>...
> > >>}
> > >>
> > >> Which was wrong, as the fb were just assigned to be equal, so this if
> > >> statement never evaluates to true.
> > >>
> > >> Another detail is that the function drm_crtc_vblank_get() can only be
> > >> called when vop->is_enabled is true, otherwise it has no effect and
> > >> throws a WARN_ON().
> > >>
> > >> Calling drm_atomic_set_fb_for_plane() (which gets a reference to the new
> > >> fb and puts the old fb) is not required, as it is taken care of by
> > >> drm_mode_cursor_universal() when calling
> > >> drm_atomic_helper_update_plane().
> > >>
> > >> Signed-off-by: Helen Koike 
> > >>
> > >> ---
> > >> Hello,
> > >>
> > >> I tested on the rockchip ficus v1.1 using igt plane_cursor_legacy and
> > >> kms_cursor_legacy and I didn't see any regressions.
> > >>
> > >> Changes in v2: None
> > >>
> > >>  drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 42 -
> > >>  1 file changed, 24 insertions(+), 18 deletions(-)
> > >>
> > >> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c 
> > >> b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> > >> index c7d4c6073ea5..a1ee8c156a7b 100644
> > >> --- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> > >> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> > >> @@ -912,30 +912,31 @@ static void vop_plane_atomic_async_update(struct 
> > >> drm_plane *plane,
> > >>  struct drm_plane_state *new_state)
> > >>  {
> > >>struct vop *vop = to_vop(plane->state->crtc);
> > >> -  struct drm_plane_state *plane_state;
> > >> +  struct drm_framebuffer *old_fb = plane->state->fb;
> > >>
> > >> -  plane_state = plane->funcs->atomic_duplicate_state(plane);
> > >> -  plane_state->crtc_x = new_state->crtc_x;
> > >> -  plane_state->crtc_y = new_state->crtc_y;
> > >> -  plane_state->crtc_h = new_state->crtc_h;
> > >> -  plane_state->crtc_w = new_state->crtc_w;
> > >> -  plane_state->src_x = new_state->src_x;
> > >> -  plane_state->src_y = new_state->src_y;
> > >> -  plane_state->src_h = new_state->src_h;
> > >> -  plane_state->src_w = new_state->src_w;
> > >> -
> > >> -  if (plane_state->fb != new_state->fb)
> > >> -  drm_atomic_set_fb_for_plane(plane_state, new_state->fb);
> > >> -
> > >> -  swap(plane_state, plane->state);
> > >> -
> > >> -  if (plane->state->fb && plane->state->fb != new_state->fb) {
> > >> +  /*
> > >> +   * A scanout can still be occurring, so we can't drop the reference to
> > >> +   * the old framebuffer. To solve this we get a reference to old_fb and
> > >> +   * set a worker to release it later.
> > >
> > > Hm, doesn't look like an async update to me if we have to wait for the
> > > next VBLANK to happen to get the new content on the screen. Maybe we
> > > should reject async updates when old_fb != new_fb in the rk
> > > ->async_check() hook.
> >
> > Unless I am misunderstanding this, we don't wait here, we just grab a
> > reference to the fb in case it is still being used by the hw, so it
> > doesn't get released prematurely.
>
> I was just reacting to the comment that says the new FB should stay
> around until the next VBLANK event happens. If the FB must stay around
> that probably means the HW is still using it, which made me wonder if this
> HW actually supports async update (where async means "update now and
> don't care about tearing"). Or maybe it takes some time to switch
> to the new FB and waiting for the next VBLANK to release the old FB was
> an easy solution to not wait for the flip to actually happen in
> ->async_update() (which is kind of a combination of async+non-blocking).

The hardware switches framebuffers on vblank, so whatever framebuffer
is currently being scanned out needs to stay there until the hardware
switches to the new one in its shadow registers. If that doesn't
happen, you get IOMMU faults and the display controller stops working
since we don't have any fault handling curr

Re: [PATCH] drm: add capability DRM_CAP_ASYNC_UPDATE

2018-12-20 Thread Tomasz Figa
On Thu, Dec 20, 2018 at 7:47 PM Daniel Vetter  wrote:
>
> On Thu, Dec 20, 2018 at 10:07 AM Tomasz Figa  wrote:
> >
> > Hi Helen,
> >
> > On Fri, Dec 14, 2018 at 10:35 AM Helen Koike  
> > wrote:
> > >
> > > Hi Tomasz,
> > >
> > > On 12/13/18 2:02 AM, Tomasz Figa wrote:
> > > > On Thu, Dec 6, 2018 at 1:12 AM Helen Koike  
> > > > wrote:
> > > >>
> > > >> Hi Ville
> > > >>
> > > >> On 11/27/18 11:34 AM, Ville Syrjälä wrote:
> > > >>> On Fri, Nov 23, 2018 at 07:53:26PM -0200, Helen Koike wrote:
> > > >>>> Allow userspace to identify if the driver supports async update.
> > > >>>
> > > >>> And what exactly is an "async update"?
> > > >>
> > > >> I agree we are lacking docs on this, I'll send in the next version as
> > > >> soon as we agree on a name (please see below).
> > > >>
> > > >> For reference:
> > > >> https://lists.freedesktop.org/archives/dri-devel/2017-April/138586.html
> > > >>
> > > >>>
> > > >>> I keep asking people to come up with a better name for this, and
> > > >>> to document what it actually means. Recently I've been thinking we should
> > > >>> maybe just adopt the vulkan terminology (immediate/fifo/mailbox) to
> > > >>> avoid introducing yet another set of names for the same thing. We'd
> > > >>> still want to document things properly though.
> > > >>
> > > >> Another name that was suggested was "atomic amend"; this feature
> > > >> basically allows userspace to complement an update it sent
> > > >> previously (i.e. one that is in the queue and wasn't committed yet)
> > > >> by adding a plane update to the next commit. So what do you think
> > > >> about renaming it to "atomic amend"?
> > > >
> > > > Note that it doesn't seem to be what the code currently is doing. For
> > > > example, for cursor updates, it doesn't seem to be working on the
> > > > currently pending commit, but just directly issues an atomic async
> > > > update call to the planes. The code actually seems to fall back to a
> > > > normal sync commit, if there is an already pending commit touching the
> > > > same plane or including a modeset.
> > >
> > > It should fail as discussed at:
> > > https://patchwork.freedesktop.org/patch/243088/
> > >
> > > There was the following code inside the drm_atomic_helper_async_check()
> > > in the previous patch which would fallback to a normal commit if there
> > > isn't any pending commit to amend:
> > >
> > > +   if (!old_plane_state->commit)
> > > +   return -EINVAL;
> > >
> > > In the v2 of the patch https://patchwork.freedesktop.org/patch/263712/
> > > this got removed, which means that async update will be enabled
> > > anyway. So the following code is wrong:
> > >
> > > -   if (state->legacy_cursor_update)
> > > +   if (state->async_update || state->legacy_cursor_update)
> > > state->async_update = !drm_atomic_helper_async_check(dev, 
> > > state);
> > >
> > > Does it make sense? If yes I'll fix this in the next version of the
> > > Atomic IOCTL patch (and also those two patches should be in the same
> > > series, I'll send them together next time).
> > >
> > > Thanks for pointing this out.
> > >
> > > Please let me know if you still don't agree on the name "atomic amend",
> > > or if I am missing something.
> >
> > I'll defer it to the DRM maintainers. From Chrome OS perspective, we
> > need a way to commit the cursor plane asynchronously from other
> > commits any time the cursor changes its position or framebuffer. As
> > long as the new API allows that and the maintainers are fine with it,
> > I think I should be okay with it too.
>
> If this is just about the cursor, why is the current legacy cursor
> ioctl not good enough? It's 2 ioctls instead of one, but I'm not sure
> if we want to support having a normal atomic commit and a cursor
> update in the same atomic ioctl; coming up with reasonable semantics
> for that will be complicated.
>
> Pointer to code that uses this

Re: [PATCH] drm: add capability DRM_CAP_ASYNC_UPDATE

2018-12-20 Thread Tomasz Figa
Hi Helen,

On Fri, Dec 14, 2018 at 10:35 AM Helen Koike  wrote:
>
> Hi Tomasz,
>
> On 12/13/18 2:02 AM, Tomasz Figa wrote:
> > On Thu, Dec 6, 2018 at 1:12 AM Helen Koike  
> > wrote:
> >>
> >> Hi Ville
> >>
> >> On 11/27/18 11:34 AM, Ville Syrjälä wrote:
> >>> On Fri, Nov 23, 2018 at 07:53:26PM -0200, Helen Koike wrote:
> >>>> Allow userspace to identify if the driver supports async update.
> >>>
> >>> And what exactly is an "async update"?
> >>
> >> I agree we are lacking docs on this, I'll send in the next version as
> >> soon as we agree on a name (please see below).
> >>
> >> For reference:
> >> https://lists.freedesktop.org/archives/dri-devel/2017-April/138586.html
> >>
> >>>
> >>> I keep asking people to come up with a better name for this, and to
> >>> document what it actually means. Recently I've been thinking we should
> >>> maybe just adopt the vulkan terminology (immediate/fifo/mailbox) to
> >>> avoid introducing yet another set of names for the same thing. We'd
> >>> still want to document things properly though.
> >>
> >> Another name that was suggested was "atomic amend"; this feature basically
> >> allows userspace to complement an update previously sent (i.e. it's in
> >> the queue and wasn't committed yet) by adding a plane update to the
> >> next commit. So what do you think of renaming it to "atomic amend"?
> >
> > Note that it doesn't seem to be what the code currently is doing. For
> > example, for cursor updates, it doesn't seem to be working on the
> > currently pending commit, but just directly issues an atomic async
> > update call to the planes. The code actually seems to fall back to a
> > normal sync commit, if there is an already pending commit touching the
> > same plane or including a modeset.
>
> It should fail as discussed at:
> https://patchwork.freedesktop.org/patch/243088/
>
> There was the following code inside the drm_atomic_helper_async_check()
> in the previous patch which would fallback to a normal commit if there
> isn't any pending commit to amend:
>
> +   if (!old_plane_state->commit)
> +   return -EINVAL;
>
> In the v2 of the patch https://patchwork.freedesktop.org/patch/263712/
> this got removed, which means that async update will be enabled
> anyway. So the following code is wrong:
>
> -   if (state->legacy_cursor_update)
> +   if (state->async_update || state->legacy_cursor_update)
> state->async_update = !drm_atomic_helper_async_check(dev, 
> state);
>
> Does it make sense? If yes I'll fix this in the next version of the
> Atomic IOCTL patch (and also those two patches should be in the same
> series, I'll send them together next time).
>
> Thanks for pointing this out.
>
> Please let me know if you still don't agree on the name "atomic amend",
> or if I am missing something.

I'll defer it to the DRM maintainers. From Chrome OS perspective, we
need a way to commit the cursor plane asynchronously from other
commits any time the cursor changes its position or framebuffer. As
long as the new API allows that and the maintainers are fine with it,
I think I should be okay with it too.

Best regards,
Tomasz

>
> Helen
>
> >
> > Best regards,
> > Tomasz
> >
> >> Or do you suggest another name? I am not familiar with vulkan terminology.
> >>
> >>
> >> Thanks
> >> Helen
> >>
> >>>
> >>>>
> >>>> Signed-off-by: Enric Balletbo i Serra 
> >>>> [prepared for upstream]
> >>>> Signed-off-by: Helen Koike 
> >>>>
> >>>> ---
> >>>> Hi,
> >>>>
> > > >>>> This patch introduces the ASYNC_UPDATE cap, which originated from the
> > > >>>> discussion regarding DRM_MODE_ATOMIC_AMEND on [1], to allow userspace
> > > >>>> to discover that async_update exists.
> >>>>
> >>>> This was tested using a small program that exercises the uAPI for easy
> >>>> sanity testing. The program was created by Alexandros and modified by
> >>>> Enric to test the capability flag [2].
> >>>>
> >>>> The test worked on a rockchip Ficus v1.1 board on top of mainline plus
> >>>> the patch to update cursors asynchronously through atomic plus the patch
> >>>> that introduces the ATO

Re: [PATCH] dts: rockchip: rk3066: add qos_hdmi and HCLK_HDMI to pmu node

2018-12-16 Thread Tomasz Figa
Hi Johan,

On Sun, Dec 16, 2018 at 12:03 AM Johan Jonker  wrote:
>
> An MK808 TV stick with an rk3066 processor boots normally, with logo and console.
> After the boot the monitor remains black.
> This patch fixes a vblank timeout crash by adding qos_hdmi and
> HCLK_HDMI to the pmu node.
> The HCLK_HDMI clock and the RK3066_PD_VIO power domain
> will now turn on and off together.
>
> Signed-off-by: Johan Jonker 
> ---
>  arch/arm/boot/dts/rk3066a.dtsi | 6 --
>  arch/arm/boot/dts/rk3xxx.dtsi  | 5 +
>  2 files changed, 9 insertions(+), 2 deletions(-)
>

Thanks for the patch. Unfortunately, it looks like you didn't add the
necessary mailing lists to the recipient list. For reference, the
./scripts/get_maintainer.pl script in the kernel source tree should be
able to give you a reasonable recipient list. For now, I added the
mailing lists on CC and replied without snipping, so people should
still be able to review the patch.

Other than that, it looks reasonable to me, but we need someone with
access to SoC documentation to check it. Heiko, Sandy, is that
something you would be able to help with?

> diff --git a/arch/arm/boot/dts/rk3066a.dtsi b/arch/arm/boot/dts/rk3066a.dtsi
> index 30dc8af0b..6e7cdde84 100644
> --- a/arch/arm/boot/dts/rk3066a.dtsi
> +++ b/arch/arm/boot/dts/rk3066a.dtsi
> @@ -672,13 +672,15 @@
>  <&cru ACLK_IPP>,
>  <&cru HCLK_IPP>,
>  <&cru ACLK_RGA>,
> -<&cru HCLK_RGA>;
> +<&cru HCLK_RGA>,
> +<&cru HCLK_HDMI>;
> pm_qos = <&qos_lcdc0>,
>  <&qos_lcdc1>,
>  <&qos_cif0>,
>  <&qos_cif1>,
>  <&qos_ipp>,
> -<&qos_rga>;
> +<&qos_rga>,
> +<&qos_hdmi>;
> };
>
> pd_video@RK3066_PD_VIDEO {
> diff --git a/arch/arm/boot/dts/rk3xxx.dtsi b/arch/arm/boot/dts/rk3xxx.dtsi
> index 97307a405..1f9496e81 100644
> --- a/arch/arm/boot/dts/rk3xxx.dtsi
> +++ b/arch/arm/boot/dts/rk3xxx.dtsi
> @@ -187,6 +187,11 @@
> reg = <0x1012f280 0x20>;
> };
>
> +   qos_hdmi: qos@1012f300 {
> +   compatible = "syscon";
> +   reg = <0x1012f300 0x20>;
> +   };
> +

Is this really common for all rk3xxx SoCs?

> usb_otg: usb@1018 {
> compatible = "rockchip,rk3066-usb", "snps,dwc2";
> reg = <0x1018 0x4>;
> --
> 2.11.0
>

Best regards,
Tomasz
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [BUG] drm_rockchip: rk3066_hdmi: No driver support for vblank timestamp query.

2018-12-13 Thread Tomasz Figa
Hi Johan,

On Fri, Dec 14, 2018 at 2:32 AM Johan Jonker  wrote:
>
> Bug fix? (PART 7)
>
> A little bit of success here.
> Penguins and other colors are on the screen.
> DRM and FB old style seems to work with DVI-D.
> Pure HDMI and sound not tested.
> I think 'someone' forgot to add HCLK_HDMI to the pmu node.
>
> Added a qos_hdmi idea. Let me know if that's OK?
> Please advise what for qos_hdmi address I can use here.

Thanks for continuing to investigate this, and good to hear that you've
made some progress. Would you be able to send a patch following the
patch submission guidelines? Please see the link below.
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#

Best regards,
Tomasz


Re: [PATCH v2] drm/atomic: add ATOMIC_AMEND flag to the Atomic IOCTL.

2018-12-12 Thread Tomasz Figa
Hi Helen,

On Sat, Nov 24, 2018 at 6:54 AM Helen Koike  wrote:
>
> This flag tells the core to jump ahead of the queued update if the
> conditions in drm_atomic_async_check() are met. That means we are only
> able to do an async update if no modeset is pending and no update for
> the same plane is queued.

First of all, thanks for the patch. Please see my comments below.

If the description above applies (and AFAICT that's what the existing
code indeed does), then this doesn't sound like "amend" to me. It
sounds exactly like what the kernel code calls it - "async update" - or
perhaps "instantaneous commit" would describe it better?

>
> It uses the already in place infrastructure for async updates.
>
> It is useful for cursor updates and async PageFlips over the atomic
> ioctl; otherwise, in some cases, updates may be delayed to the point
> that the user will notice. Note that for now it's only enabled for
> cursor planes.
>
> DRM_MODE_ATOMIC_AMEND should be passed to the Atomic IOCTL to use this
> feature.
>
> Signed-off-by: Gustavo Padovan 
> Signed-off-by: Enric Balletbo i Serra 
> [updated for upstream]
> Signed-off-by: Helen Koike 
> ---
> Hi,
>
> This is the second attempt to introduce the new ATOMIC_AMEND flag for atomic
> operations, see the commit message for a more detailed description.
>
> This was tested using a small program that exercises the uAPI for easy
> sanity testing. The program was created by Alexandros and modified by
> Enric to test the capability flag [2].
>
> To test, just build the program and use the --atomic flag to use the cursor
> plane with the ATOMIC_AMEND flag. E.g.
>
>   drm_cursor --atomic
>
> The test worked on a rockchip Ficus v1.1 board on top of mainline plus
> the patch to update cursors asynchronously through atomic for the
> drm/rockchip driver plus the DRM_CAP_ASYNC_UPDATE patch.
>
> Alexandros also did a proof-of-concept to use this flag and draw cursors
> using atomic if possible on ozone [1].
>
> Thanks
> Helen
>
> [1] https://chromium-review.googlesource.com/c/chromium/src/+/1092711
> [2] https://gitlab.collabora.com/eballetbo/drm-cursor/commits/async-capability
>
>
> Changes in v2:
> - rebase tree
> - do not fall back to a non-async update if there isn't any
> pending commit to amend
>
> Changes in v1:
> - https://patchwork.freedesktop.org/patch/243088/
> - Only enable it if userspace requests it.
> - Only allow async update for cursor type planes.
> - Rename ASYNC_UPDATE for ATOMIC_AMEND.
>
>  drivers/gpu/drm/drm_atomic_helper.c | 6 +-
>  drivers/gpu/drm/drm_atomic_uapi.c   | 6 ++
>  include/uapi/drm/drm_mode.h | 4 +++-
>  3 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
> b/drivers/gpu/drm/drm_atomic_helper.c
> index 269f1a74de38..333190c6a0a4 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -934,7 +934,7 @@ int drm_atomic_helper_check(struct drm_device *dev,
> if (ret)
> return ret;
>
> -   if (state->legacy_cursor_update)
> +   if (state->async_update || state->legacy_cursor_update)
> state->async_update = !drm_atomic_helper_async_check(dev, 
> state);
>
> return ret;
> @@ -1602,6 +1602,10 @@ int drm_atomic_helper_async_check(struct drm_device 
> *dev,
> if (new_plane_state->fence)
> return -EINVAL;
>
> +   /* Only allow async update for cursor type planes. */
> +   if (plane->type != DRM_PLANE_TYPE_CURSOR)
> +   return -EINVAL;
> +

So the existing upstream code already allowed this for any plane and
we're restricting it to cursor planes only. Is this expected? Are there
no potential users that have already started using this for other plane
types?

Best regards,
Tomasz


Re: [PATCH] drm: add capability DRM_CAP_ASYNC_UPDATE

2018-12-12 Thread Tomasz Figa
On Thu, Dec 6, 2018 at 1:12 AM Helen Koike  wrote:
>
> Hi Ville
>
> On 11/27/18 11:34 AM, Ville Syrjälä wrote:
> > On Fri, Nov 23, 2018 at 07:53:26PM -0200, Helen Koike wrote:
> >> Allow userspace to identify if the driver supports async update.
> >
> > And what exactly is an "async update"?
>
> I agree we are lacking docs on this, I'll send in the next version as
> soon as we agree on a name (please see below).
>
> For reference:
> https://lists.freedesktop.org/archives/dri-devel/2017-April/138586.html
>
> >
> > I keep asking people to come up with a better name for this, and to
> > document what it actually means. Recently I've been thinking we should
> > maybe just adopt the vulkan terminology (immediate/fifo/mailbox) to
> > avoid introducing yet another set of names for the same thing. We'd
> > still want to document things properly though.
>
> Another name that was suggested was "atomic amend"; this feature basically
> allows userspace to complement an update previously sent (i.e. it's in
> the queue and wasn't committed yet) by adding a plane update to the
> next commit. So what do you think of renaming it to "atomic amend"?

Note that it doesn't seem to be what the code currently is doing. For
example, for cursor updates, it doesn't seem to be working on the
currently pending commit, but just directly issues an atomic async
update call to the planes. The code actually seems to fall back to a
normal sync commit, if there is an already pending commit touching the
same plane or including a modeset.

Best regards,
Tomasz

> Or do you suggest another name? I am not familiar with vulkan terminology.
>
>
> Thanks
> Helen
>
> >
> >>
> >> Signed-off-by: Enric Balletbo i Serra 
> >> [prepared for upstream]
> >> Signed-off-by: Helen Koike 
> >>
> >> ---
> >> Hi,
> >>
> >> This patch introduces the ASYNC_UPDATE cap, which originated from the
> >> discussion regarding DRM_MODE_ATOMIC_AMEND on [1], to allow userspace
> >> to discover that async_update exists.
> >>
> >> This was tested using a small program that exercises the uAPI for easy
> >> sanity testing. The program was created by Alexandros and modified by
> >> Enric to test the capability flag [2].
> >>
> >> The test worked on a rockchip Ficus v1.1 board on top of mainline plus
> >> the patch to update cursors asynchronously through atomic plus the patch
> >> that introduces the ATOMIC_AMEND flag for the drm/rockchip driver.
> >>
> >> To test, just build the program and use the --atomic flag to use the cursor
> >> plane with the ATOMIC_AMEND flag. E.g.
> >>
> >>   drm_cursor --atomic
> >>
> >> [1] https://patchwork.freedesktop.org/patch/243088/
> >> [2] 
> >> https://gitlab.collabora.com/eballetbo/drm-cursor/commits/async-capability
> >>
> >> Thanks
> >> Helen
> >>
> >>
> >>  drivers/gpu/drm/drm_ioctl.c | 11 +++
> >>  include/uapi/drm/drm.h  |  1 +
> >>  2 files changed, 12 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
> >> index 94bd872d56c4..4a7e0f874171 100644
> >> --- a/drivers/gpu/drm/drm_ioctl.c
> >> +++ b/drivers/gpu/drm/drm_ioctl.c
> >> @@ -31,6 +31,7 @@
> >>  #include 
> >>  #include 
> >>  #include 
> >> +#include 
> >>  #include "drm_legacy.h"
> >>  #include "drm_internal.h"
> >>  #include "drm_crtc_internal.h"
> >> @@ -229,6 +230,7 @@ static int drm_getcap(struct drm_device *dev, void 
> >> *data, struct drm_file *file_
> >>  {
> >>  struct drm_get_cap *req = data;
> >>  struct drm_crtc *crtc;
> >> +struct drm_plane *plane;
> >>
> >>  req->value = 0;
> >>
> >> @@ -292,6 +294,15 @@ static int drm_getcap(struct drm_device *dev, void 
> >> *data, struct drm_file *file_
> >>  case DRM_CAP_CRTC_IN_VBLANK_EVENT:
> >>  req->value = 1;
> >>  break;
> >> +case DRM_CAP_ASYNC_UPDATE:
> >> +req->value = 1;
> >> +list_for_each_entry(plane, &dev->mode_config.plane_list, 
> >> head) {
> >> +if (!plane->helper_private->atomic_async_update) {
> >> +req->value = 0;
> >> +break;
> >> +}
> >> +}
> >> +break;
> >>  default:
> >>  return -EINVAL;
> >>  }
> >> diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
> >> index 300f336633f2..ff01540cbb1d 100644
> >> --- a/include/uapi/drm/drm.h
> >> +++ b/include/uapi/drm/drm.h
> >> @@ -649,6 +649,7 @@ struct drm_gem_open {
> >>  #define DRM_CAP_PAGE_FLIP_TARGET0x11
> >>  #define DRM_CAP_CRTC_IN_VBLANK_EVENT0x12
> >>  #define DRM_CAP_SYNCOBJ 0x13
> >> +#define DRM_CAP_ASYNC_UPDATE0x14
> >>
> >>  /** DRM_IOCTL_GET_CAP ioctl argument type */
> >>  struct drm_get_cap {
> >> --
> >> 2.19.1
> >>
> >> ___
> >> dri-devel mailing list
> >> dri-devel@lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >

Re: [BUG] drm_rockchip: rk3066_hdmi: No driver support for vblank timestamp query.

2018-12-09 Thread Tomasz Figa
Hi Johan,

It looks like the VOP on RK3066 is not officially supported upstream, so
what you're seeing is not a bug; it's just expected behavior, because
nobody has had the time (or need) to enable support for your hardware yet.

I added all the people who may potentially be thinking of adding support
for this SoC, but they have no obligation to do so. If you are in urgent
need, I think you may have more luck asking your hardware or SoC vendor
directly.

Best regards,
Tomasz


On Sat, Dec 1, 2018 at 3:53 AM Johan Jonker  wrote:

> Hi,
>
> This is about a TV stick called MK808.
> Enabled VOP an HDMI for rk3066.
> Able to see pinguins at boot.
>
> Found similar bug reports for rk3399.
>
> http://lists.infradead.org/pipermail/linux-rockchip/2018-April/020426.html
> http://lists.infradead.org/pipermail/linux-rockchip/2018-April/020427.html
> http://lists.infradead.org/pipermail/linux-rockchip/2018-April/020428.html
>
>
> This patch doesn't seem to work for me.
>
> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> @@ -1601,6 +1601,8 @@ static void vop_unbind(struct device *dev, struct
> device *master, void *data)
>   {
>  struct vop *vop = dev_get_drvdata(dev);
>
> +   // Pair with the initial disable_irq()
> +   enable_irq(vop->irq);
>
> Compared to rk3399 the rk3066 doesn't seem to have iommu.
>
> [0.383273] rockchip-drm display-subsystem:
> [drm:rockchip_drm_platform_probe] no iommu attached for /vop@1010c000,
> using non-iommu buffers
>
> Bugs as usual:
>
>
> [4.730057] rockchip-vop 1010c000.vop: [drm:vop_crtc_atomic_flush]
> *ERROR* VOP vblank IRQ stuck for 10 ms
>
> [  596.422383] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR*
> [CRTC:30:crtc-0] flip_done timed out
>
> [  606.661508] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR*
> [CONNECTOR:33:HDMI-A-1] flip_done timed out
>
> [  616.901899] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR*
> [PLANE:28:plane-0] flip_done timed out
>
>
>


Re: [PATCH] of/device: add blacklist for iommu dma_ops

2018-12-02 Thread Tomasz Figa
device is%sbehind an iommu\n",
> iommu ? " " : " not ");
>
> +   /*
> +* There is at least one case where the driver wants to directly
> +* manage the IOMMU, but if we end up with iommu dma_ops, that
> +* interferes with the drivers ability to use dma_map_sg() for
> +* cache operations.  Since we don't currently have a better
> +* solution, and this code runs before the driver is probed and
> +* has a chance to intervene, use a simple blacklist to avoid
> +* ending up with iommu dma_ops:
> +*/
> +   if (of_match_device(iommu_blacklist, dev)) {
> +   dev_dbg(dev, "skipping iommu hookup\n");
> +   iommu = NULL;
> +   }
> +
> arch_setup_dma_ops(dev, dma_addr, size, iommu, coherent);
>
> return 0;
> --
> 2.19.2
>

+Marek Szyprowski who I believe had a similar problem with Exynos DRM before.

Reviewed-by: Tomasz Figa 

Best regards,
Tomasz


Re: [PATCH v3 1/1] drm: msm: Replace dma_map_sg with dma_sync_sg*

2018-12-02 Thread Tomasz Figa
On Sat, Dec 1, 2018 at 3:47 AM Rob Clark  wrote:
>
> On Fri, Nov 30, 2018 at 9:05 PM Tomasz Figa  wrote:
> >
> > On Thu, Nov 29, 2018 at 4:23 PM Tomasz Figa  wrote:
> > >
> > > On Thu, Nov 29, 2018 at 12:03 PM Robin Murphy  
> > > wrote:
> > > >
> > > > On 29/11/2018 19:57, Tomasz Figa wrote:
> > > > > On Thu, Nov 29, 2018 at 11:40 AM Jordan Crouse 
> > > > >  wrote:
> > > > >>
> > > > >> On Thu, Nov 29, 2018 at 01:48:15PM -0500, Rob Clark wrote:
> > > > >>> On Thu, Nov 29, 2018 at 10:54 AM Christoph Hellwig  
> > > > >>> wrote:
> > > > >>>>
> > > > >>>> On Thu, Nov 29, 2018 at 09:42:50AM -0500, Rob Clark wrote:
> > > > >>>>> Maybe the thing we need to do is just implement a blacklist of
> > > > >>>>> compatible strings for devices which should skip the automatic
> > > > >>>>> iommu/dma hookup.  Maybe a bit ugly, but it would also solve a 
> > > > >>>>> problem
> > > > >>>>> preventing us from enabling per-process pagetables for a5xx 
> > > > >>>>> (where we
> > > > >>>>> need to control the domain/context-bank that is allocated by the 
> > > > >>>>> dma
> > > > >>>>> api).
> > > > >>>>
> > > > >>>> You can detach from the dma map attachment using 
> > > > >>>> arm_iommu_detach_device,
> > > > >>>> which a few drm drivers do, but I don't think this is the problem.
> > > > >>>
> > > > >>> I think even with detach, we wouldn't end up with the context-bank
> > > > >>> that the gpu firmware was hard-coded to expect, and so it would
> > > > >>> overwrite the incorrect page table address register.  (I could be
> > > > >>> mis-remembering that, Jordan spent more time looking at that.  But 
> > > > >>> it
> > > > >>> was something along those lines.)
> > > > >>
> > > > >> Right - basically the DMA domain steals context bank 0 and the GPU 
> > > > >> is hard coded
> > > > >> to use that context bank for pagetable switching.
> > > > >>
> > > > >> I believe the Tegra guys also had a similar problem with a hard 
> > > > >> coded context
> > > > >> bank.
> > > >
> > > > AIUI, they don't need a specific hardware context, they just need to
> > > > know which one they're actually using, which the domain abstraction 
> > > > hides.
> > > >
> > > > > Wait, if we detach the GPU/display struct device from the default
> > > > > domain and attach it to a newly allocated domain, wouldn't the newly
> > > > > allocated domain use the context bank we need? Note that we're already
> > > >
> > > > The arm-smmu driver doesn't, but there's no fundamental reason it
> > > > couldn't. That should just need code to refcount domain users and
> > > > release hardware contexts for domains with no devices currently 
> > > > attached.
> > > >
> > > > Robin.
> > > >
> > > > > doing that, except that we're doing it behind the back of the DMA
> > > > > mapping subsystem, so that it keeps using the IOMMU version of the DMA
> > > > > ops for the device and doing any mapping operations on the default
> > > > > domain. If we ask the DMA mapping to detach, wouldn't it essentially
> > > > > solve the problem?
> > >
> > > Thanks Robin.
> > >
> > > Still, my point is that the MSM DRM driver attaches the GPU struct
> > > device to a new domain it allocates using iommu_domain_alloc() and it
> > > seems to work fine, so I believe it's not the problem we're looking
> > > into with this patch.
> >
> > Could we just make the MSM DRM call arch_teardown_dma_ops() and then
> > arch_setup_dma_ops() with the `iommu` argument set to NULL and be done
> > with it?
>
> I don't think those are exported to modules?
>

Indeed, if we compile MSM DRM as modules, it wouldn't work...

> I have actually a simpler patch, that adds a small blacklist to check
> in of_dma_configure() before calling arch_setup_dma_ops(), which can
> replace this patch.  It also solves the problem of dma api allocating
> the context bank that the gpu wants to use for context-switching, and
> should be a simple thing to backport to stable branches.
>
> I was just spending some time trying to figure out what changed
> recently to start causing dma_map_sg() to opps on boot for us, so I
> could write a more detailed commit msg.

Yeah, that sounds much better, thanks. Reviewed that patch.

Best regards,
Tomasz


Re: [PATCH v3 1/1] drm: msm: Replace dma_map_sg with dma_sync_sg*

2018-11-30 Thread Tomasz Figa
On Thu, Nov 29, 2018 at 4:23 PM Tomasz Figa  wrote:
>
> On Thu, Nov 29, 2018 at 12:03 PM Robin Murphy  wrote:
> >
> > On 29/11/2018 19:57, Tomasz Figa wrote:
> > > On Thu, Nov 29, 2018 at 11:40 AM Jordan Crouse  
> > > wrote:
> > >>
> > >> On Thu, Nov 29, 2018 at 01:48:15PM -0500, Rob Clark wrote:
> > >>> On Thu, Nov 29, 2018 at 10:54 AM Christoph Hellwig  wrote:
> > >>>>
> > >>>> On Thu, Nov 29, 2018 at 09:42:50AM -0500, Rob Clark wrote:
> > >>>>> Maybe the thing we need to do is just implement a blacklist of
> > >>>>> compatible strings for devices which should skip the automatic
> > >>>>> iommu/dma hookup.  Maybe a bit ugly, but it would also solve a problem
> > >>>>> preventing us from enabling per-process pagetables for a5xx (where we
> > >>>>> need to control the domain/context-bank that is allocated by the dma
> > >>>>> api).
> > >>>>
> > >>>> You can detach from the dma map attachment using 
> > >>>> arm_iommu_detach_device,
> > >>>> which a few drm drivers do, but I don't think this is the problem.
> > >>>
> > >>> I think even with detach, we wouldn't end up with the context-bank
> > >>> that the gpu firmware was hard-coded to expect, and so it would
> > >>> overwrite the incorrect page table address register.  (I could be
> > >>> mis-remembering that, Jordan spent more time looking at that.  But it
> > >>> was something along those lines.)
> > >>
> > >> Right - basically the DMA domain steals context bank 0 and the GPU is 
> > >> hard coded
> > >> to use that context bank for pagetable switching.
> > >>
> > >> I believe the Tegra guys also had a similar problem with a hard coded 
> > >> context
> > >> bank.
> >
> > AIUI, they don't need a specific hardware context, they just need to
> > know which one they're actually using, which the domain abstraction hides.
> >
> > > Wait, if we detach the GPU/display struct device from the default
> > > domain and attach it to a newly allocated domain, wouldn't the newly
> > > allocated domain use the context bank we need? Note that we're already
> >
> > The arm-smmu driver doesn't, but there's no fundamental reason it
> > couldn't. That should just need code to refcount domain users and
> > release hardware contexts for domains with no devices currently attached.
> >
> > Robin.
> >
> > > doing that, except that we're doing it behind the back of the DMA
> > > mapping subsystem, so that it keeps using the IOMMU version of the DMA
> > > ops for the device and doing any mapping operations on the default
> > > domain. If we ask the DMA mapping to detach, wouldn't it essentially
> > > solve the problem?
>
> Thanks Robin.
>
> Still, my point is that the MSM DRM driver attaches the GPU struct
> device to a new domain it allocates using iommu_domain_alloc() and it
> seems to work fine, so I believe it's not the problem we're looking
> into with this patch.

Could we just make the MSM DRM call arch_teardown_dma_ops() and then
arch_setup_dma_ops() with the `iommu` argument set to NULL and be done
with it?

Best regards,
Tomasz


Re: [PATCH v3 1/1] drm: msm: Replace dma_map_sg with dma_sync_sg*

2018-11-29 Thread Tomasz Figa
On Thu, Nov 29, 2018 at 12:03 PM Robin Murphy  wrote:
>
> On 29/11/2018 19:57, Tomasz Figa wrote:
> > On Thu, Nov 29, 2018 at 11:40 AM Jordan Crouse  
> > wrote:
> >>
> >> On Thu, Nov 29, 2018 at 01:48:15PM -0500, Rob Clark wrote:
> >>> On Thu, Nov 29, 2018 at 10:54 AM Christoph Hellwig  wrote:
> >>>>
> >>>> On Thu, Nov 29, 2018 at 09:42:50AM -0500, Rob Clark wrote:
> >>>>> Maybe the thing we need to do is just implement a blacklist of
> >>>>> compatible strings for devices which should skip the automatic
> >>>>> iommu/dma hookup.  Maybe a bit ugly, but it would also solve a problem
> >>>>> preventing us from enabling per-process pagetables for a5xx (where we
> >>>>> need to control the domain/context-bank that is allocated by the dma
> >>>>> api).
> >>>>
> >>>> You can detach from the dma map attachment using arm_iommu_detach_device,
> >>>> which a few drm drivers do, but I don't think this is the problem.
> >>>
> >>> I think even with detach, we wouldn't end up with the context-bank
> >>> that the gpu firmware was hard-coded to expect, and so it would
> >>> overwrite the incorrect page table address register.  (I could be
> >>> mis-remembering that, Jordan spent more time looking at that.  But it
> >>> was something along those lines.)
> >>
> >> Right - basically the DMA domain steals context bank 0 and the GPU is hard 
> >> coded
> >> to use that context bank for pagetable switching.
> >>
> >> I believe the Tegra guys also had a similar problem with a hard coded 
> >> context
> >> bank.
>
> AIUI, they don't need a specific hardware context, they just need to
> know which one they're actually using, which the domain abstraction hides.
>
> > Wait, if we detach the GPU/display struct device from the default
> > domain and attach it to a newly allocated domain, wouldn't the newly
> > allocated domain use the context bank we need? Note that we're already
>
> The arm-smmu driver doesn't, but there's no fundamental reason it
> couldn't. That should just need code to refcount domain users and
> release hardware contexts for domains with no devices currently attached.
>
> Robin.
>
> > doing that, except that we're doing it behind the back of the DMA
> > mapping subsystem, so that it keeps using the IOMMU version of the DMA
> > ops for the device and doing any mapping operations on the default
> > domain. If we ask the DMA mapping to detach, wouldn't it essentially
> > solve the problem?

Thanks Robin.

Still, my point is that the MSM DRM driver attaches the GPU struct
device to a new domain it allocates using iommu_domain_alloc() and it
seems to work fine, so I believe it's not the problem we're looking
into with this patch.

Best regards,
Tomasz


Re: [PATCH v3 1/1] drm: msm: Replace dma_map_sg with dma_sync_sg*

2018-11-29 Thread Tomasz Figa
On Thu, Nov 29, 2018 at 11:40 AM Jordan Crouse  wrote:
>
> On Thu, Nov 29, 2018 at 01:48:15PM -0500, Rob Clark wrote:
> > On Thu, Nov 29, 2018 at 10:54 AM Christoph Hellwig  wrote:
> > >
> > > On Thu, Nov 29, 2018 at 09:42:50AM -0500, Rob Clark wrote:
> > > > Maybe the thing we need to do is just implement a blacklist of
> > > > compatible strings for devices which should skip the automatic
> > > > iommu/dma hookup.  Maybe a bit ugly, but it would also solve a problem
> > > > preventing us from enabling per-process pagetables for a5xx (where we
> > > > need to control the domain/context-bank that is allocated by the dma
> > > > api).
> > >
> > > You can detach from the dma map attachment using arm_iommu_detach_device,
> > > which a few drm drivers do, but I don't think this is the problem.
> >
> > I think even with detach, we wouldn't end up with the context-bank
> > that the gpu firmware was hard-coded to expect, and so it would
> > overwrite the incorrect page table address register.  (I could be
> > mis-remembering that, Jordan spent more time looking at that.  But it
> > was something along those lines.)
>
> Right - basically the DMA domain steals context bank 0 and the GPU is hard 
> coded
> to use that context bank for pagetable switching.
>
> I believe the Tegra guys also had a similar problem with a hard coded context
> bank.

Wait, if we detach the GPU/display struct device from the default
domain and attach it to a newly allocated domain, wouldn't the newly
allocated domain use the context bank we need? Note that we're already
doing that, except that we're doing it behind the back of the DMA
mapping subsystem, so that it keeps using the IOMMU version of the DMA
ops for the device and doing any mapping operations on the default
domain. If we ask the DMA mapping to detach, wouldn't it essentially
solve the problem?
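For reference, the flow being discussed could be sketched roughly as below. This is a non-runnable kernel-code sketch under the assumption that the driver owns the domain lifecycle; arm_iommu_detach_device() is the 32-bit ARM DMA-IOMMU glue, and error handling is omitted:

```c
/* Sketch: detach the GPU's struct device from the default DMA domain
 * and attach it to a driver-allocated domain, so the DMA mapping
 * subsystem no longer operates on the default domain behind our back.
 * Kernel-internal APIs; error handling omitted. */
struct iommu_domain *domain;

/* Tell the DMA-IOMMU glue to let go of the device. */
arm_iommu_detach_device(dev);

/* Allocate an unmanaged domain and attach the device to it. */
domain = iommu_domain_alloc(&platform_bus_type);
iommu_attach_device(domain, dev);

/* From here on, mappings are managed explicitly through
 * iommu_map()/iommu_unmap() rather than through the DMA API. */
```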

Best regards,
Tomasz


Re: [PATCH v3 1/1] drm: msm: Replace dma_map_sg with dma_sync_sg*

2018-11-29 Thread Tomasz Figa
[CC Marek]

On Thu, Nov 29, 2018 at 9:09 AM Daniel Vetter  wrote:
>
> On Thu, Nov 29, 2018 at 5:57 PM Christoph Hellwig  wrote:
> > On Thu, Nov 29, 2018 at 05:28:07PM +0100, Daniel Vetter wrote:
> > > Just spend a bit of time reading through the implementations already
> > > merged. Is the struct device *dev parameter actually needed anywhere?
> > > dma-api definitely needs it, because we need that to pick the right iommu.
> > > But for cache management from what I've seen the target device doesn't
> > > matter, all the target specific stuff will be handled by the iommu.
> >
> > It looks like only mips ever uses the dev argument, and even there
> > the function it passes dev to ignores it.  So it could probably be removed.
> >

Whether the cache maintenance operation needs to actually do anything
or not is a function of `dev`. We can have some devices that are
coherent with CPU caches, and some that are not, on the same system.

> > >
> > > Dropping the dev parameter would make this a perfect fit for coherency
> > > management of buffers used by multiple devices. Right now there's all
> > > kinds of nasty tricks for that use cases needed to avoid redundant
> > > flushes.
> >
> > Note that one thing I'd like to avoid is exposing these functions directly
> > to drivers, as that will get us into all kinds of abuses.
>
> What kind of abuse do you expect? It could very well be that gpu folks
> call that "standard use case" ... At least on x86 with the i915 driver
> we pretty much rely on architectural guarantees for how cache flushes
> work very much. Down to userspace doing the cache flushing for
> mappings the kernel has set up.

i915 is a very specific case of a fully contained,
architecture-specific hardware subsystem, where you can just hardcode
all integration details inside the driver, because nobody else would
care.

In ARM world, you can have the same IP blocks licensed by multiple SoC
vendors with different integration details and that often includes the
option of coherency.

>
> > So I'd much prefer if we could have iommu APIs wrapping these that are
> > specific to actual use cases that we understand well.
> >
> > As for the buffer sharing: at least for the DMA API side I want to
> > move the current buffer sharing users away from dma_alloc_coherent
> > (and coherent dma_alloc_attrs users) and the remapping done in there
> > required for non-coherent architectures.  Instead I'd like to allocate
> > plain old pages, and then just dma map them for each device separately,
> > with DMA_ATTR_SKIP_CPU_SYNC passed for all but the first user to map
> > or last user to unmap.  On the iommu side it could probably work
> > similar.
>
> I think this is what's often done. Except then there's also the issue
> of how to get at the cma allocator if your device needs something
> contiguous. There's a lot of that still going on in graphics/media.

I suppose one could just expose CMA with the default pool directly.
It's just an allocator, so I'm not sure why it would need any
device-specific information.

There is also the use case of using CMA with device-specific pools of
memory reusable by the system when not used by the device and those
would have to somehow get the pool to allocate from, but I wonder if
struct device is the right way to pass such information. I'd see the
pool given explicitly like cma_alloc(struct cma_pool *, size, flags)
and perhaps a wrapper cma_alloc_default(size, flags) that is just a
simple macro calling cma_alloc(&cma_pool_default, size, flags).

Best regards,
Tomasz


Re: [PATCH v2 1/1] drm: msm: Replace dma_map_sg with dma_sync_sg*

2018-11-27 Thread Tomasz Figa
Hi Vivek,

On Tue, Nov 27, 2018 at 6:37 AM Vivek Gautam
 wrote:
>
> dma_map_sg() expects a DMA domain. However, the drm devices
> have been traditionally using unmanaged iommu domain which
> is non-dma type. Using dma mapping APIs with that domain is bad.
>
> Replace dma_map_sg() calls with dma_sync_sg_for_device{|cpu}()
> to do the cache maintenance.
>
> Signed-off-by: Vivek Gautam 
> Suggested-by: Tomasz Figa 
> Cc: Rob Clark 
> Cc: Jordan Crouse 
> Cc: Sean Paul 
> ---
>
> Changes since v1:
>  - Addressed Jordan's and Tomasz's comments for
>- moving sg dma addresses preparation out of the conditional
>  check to the main path so we do it irrespective of whether
>  the buffer is cached or uncached.
>- Enhance the comment to explain this DMA address preparation.
>

Thanks for the patch. Some comments inline.

>  drivers/gpu/drm/msm/msm_gem.c | 31 ++-
>  1 file changed, 22 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
> index 00c795ced02c..1811ac23a31c 100644
> --- a/drivers/gpu/drm/msm/msm_gem.c
> +++ b/drivers/gpu/drm/msm/msm_gem.c
> @@ -81,6 +81,8 @@ static struct page **get_pages(struct drm_gem_object *obj)
> struct drm_device *dev = obj->dev;
> struct page **p;
> int npages = obj->size >> PAGE_SHIFT;
> +   struct scatterlist *s;
> +   int i;
>
> if (use_pages(obj))
> p = drm_gem_get_pages(obj);
> @@ -104,12 +106,21 @@ static struct page **get_pages(struct drm_gem_object 
> *obj)
> return ptr;
> }
>
> -   /* For non-cached buffers, ensure the new pages are clean
> +   /*
> +* dma_sync_sg_*() flush the physical pages, so point
> +* sg->dma_address to the physical ones for the right 
> behavior.

The two halves of the sequence don't really relate to each other. An
sglist has the `page` field for the purpose of pointing to physical
pages. The `dma_address` field is for DMA addresses, which are not
equivalent to physical addresses. I'd rewrite it like this:

/*
 * Some implementations of the DMA mapping ops expect
 * physical addresses of the pages to be stored as DMA
 * addresses of the sglist entries. To work around it,
 * set them here explicitly.
 */

> +*/
> +   for_each_sg(msm_obj->sgt->sgl, s, msm_obj->sgt->nents, i)
> +   sg_dma_address(s) = sg_phys(s);
> +
> +   /*
> +* For non-cached buffers, ensure the new pages are clean
>  * because display controller, GPU, etc. are not coherent:
>  */
> -   if (msm_obj->flags & (MSM_BO_WC|MSM_BO_UNCACHED))
> -   dma_map_sg(dev->dev, msm_obj->sgt->sgl,
> -   msm_obj->sgt->nents, 
> DMA_BIDIRECTIONAL);
> +   if (msm_obj->flags & (MSM_BO_WC | MSM_BO_UNCACHED))
> +   dma_sync_sg_for_device(dev->dev, msm_obj->sgt->sgl,
> +  msm_obj->sgt->nents,
> +  DMA_TO_DEVICE);

Why change from DMA_BIDIRECTIONAL?

> }
>
> return msm_obj->pages;
> @@ -133,14 +144,16 @@ static void put_pages(struct drm_gem_object *obj)
>
> if (msm_obj->pages) {
> if (msm_obj->sgt) {
> -   /* For non-cached buffers, ensure the new
> +   /*
> +* For non-cached buffers, ensure the new
>  * pages are clean because display controller,
>  * GPU, etc. are not coherent:
>  */
> -   if (msm_obj->flags & (MSM_BO_WC|MSM_BO_UNCACHED))
> -   dma_unmap_sg(obj->dev->dev, msm_obj->sgt->sgl,
> -msm_obj->sgt->nents,
> -DMA_BIDIRECTIONAL);
> +   if (msm_obj->flags & (MSM_BO_WC | MSM_BO_UNCACHED))
> +   dma_sync_sg_for_cpu(obj->dev->dev,
> +   msm_obj->sgt->sgl,
> +   msm_obj->sgt->nents,
> +   DMA_FROM_DEVICE);

Ditto.

Best regards,
Tomasz


Re: [BUG] drm_rockchip: rk3066_hdmi: No driver support for vblank timestamp query.

2018-11-27 Thread Tomasz Figa
On Tue, Nov 27, 2018 at 6:11 PM Tomasz Figa  wrote:
>
> Hi Johan,
>
> Adding the dri-devel mailing lists and some maintainers.
>
> On Sun, Nov 25, 2018 at 12:53 AM Johan Jonker  wrote:
> >
> > For a TV stick called MK808 with rk3066 processor I would like to enable
> > the VOP and HDMI.
> > With boot logo and console enabled I can see two penguins and a cursor for
> > a second and then the display is black.
> > From an internet search I learned that the rk3066 doesn't support a
> > vblank counter.
> > Yet they have replaced the Rockchip custom wait_for_vblanks with a drm
> > helper.
>
> All of the Rockchip SoCs supported by the upstream driver lack a
> vblank counter and the code that you refer to is designed for this
> limitation in particular. I suspect the problem you're seeing is
> unrelated.
>
> >
> > Question for the authors and reviewers:
> >
> > -Did anyone test this conversion on a rk3066?
>
> Not sure. I have tested internally on RK3399.
>
> > -Can someone with more knowledge explain what happens in this crash
> > report below.
>
> To me, it looks like the vblank interrupt
>

Oops. I meant:

It looks like the vblank interrupt doesn't fire for some reason.

> > -Can this be fixed?
> >
>
> Very likely. :)
>
> > Please contact if more info is needed.
> >
>
> I don't see display support enabled in the rk3066-mk808.dts in latest 
> upstream:
> https://elixir.bootlin.com/linux/latest/source/arch/arm/boot/dts/rk3066a-mk808.dts
>
> Do you have any local changes in your kernel sources?
>
> Best regards,
> Tomasz
>
> > Kind regards,
> >
> > Johan Jonker
> >
> > ///
> >
> > Reference:
> >
> > -drm/rockchip: Replace custom wait_for_vblanks with helper
> > https://patchwork.kernel.org/patch/9331351/
> >
> > -drm/rockchip: don't wait for vblank if fb hasn't changed
> > https://patchwork.kernel.org/patch/8024741/
> >
> > -explain why we can't wait_for_vblanks
> > https://lore.kernel.org/patchwork/patch/635586/
> >
> > -drm/rockchip: Convert to support atomic API
> > https://patchwork.kernel.org/patch/7732341/
> >
> > -drm/rockchip: Convert to support atomic API
> > https://patchwork.kernel.org/patch/7868601/
> >
> > https://lkml.org/lkml/2017/9/4/251
> >
> >
> > //
> >
> > [0.897682] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> > [0.904616] [drm] No driver support for vblank timestamp query.
> >
> > [1.440201] [drm:drm_calc_timestamping_constants] crtc 30: hwmode:
> > htotal 1650, vtotal 750, vdisplay 720
> > [1.440212] [drm:drm_calc_timestamping_constants] crtc 30: clock
> > 74250 kHz framedur 1666 linedur 2
> >
> > [1.440307] [drm:drm_crtc_vblank_on] crtc 0, vblank enabled 0,
> > inmodeset 0
> >
> > [1.451189] [drm:drm_vblank_enable] enabling vblank on crtc 0, ret: 0
> >
> > [1.451202] [drm:drm_update_vblank_count] updating vblank count on
> > crtc 0: current=1, diff=0, hw=0 hw_last=0
> > .
> > .
> > .
> > .
> > [4.303883] [drm:drm_update_vblank_count] updating vblank count on
> > crtc 0: current=113, diff=1, hw=0 hw_last=0
> > [4.304140] Warning: unable to open an initial console.
> >
> >
> > [6.777825] [ cut here ]
> > [6.786554] WARNING: CPU: 0 PID: 19 at
> > drivers/gpu/drm/drm_atomic_helper.c:1406
> > drm_atomic_helper_wait_for_vblanks.part.1+0x298/0x2a0
> > [6.800803] [CRTC:30:crtc-0] vblank wait timed out
> > [6.806701] Modules linked in:
> > [6.810317] CPU: 0 PID: 19 Comm: kworker/0:1 Not tainted
> > 4.20.0-rc1-g0d694f0b3-dirty #19
> > [6.819938] Hardware name: Rockchip (Device Tree)
> > [6.825498] Workqueue: events output_poll_execute
> > [6.831062] [] (unwind_backtrace) from []
> > (show_stack+0x10/0x14)
> > [6.840191] [] (show_stack) from []
> > (dump_stack+0x88/0x9c)
> > [6.848729] [] (dump_stack) from []
> > (__warn+0xf8/0x110)
> > [6.856936] [] (__warn) from []
> > (warn_slowpath_fmt+0x48/0x6c)
> > [6.873544] [] (warn_slowpath_fmt) from []
> > (drm_atomic_helper_wait_for_vblanks.part.1+0x298/0x2a0)
> > [6.886148] [] (drm_atomic_helper_wait_for_vblanks.part.1)
> > from [] (rockchip_atomic_helper_commit_tail_rpm+0x17c/0x194)
> > [6.900896] [] (rockchip_atomic_helper_commit_tai

Re: [BUG] drm_rockchip: rk3066_hdmi: No driver support for vblank timestamp query.

2018-11-27 Thread Tomasz Figa
Hi Johan,

Adding the dri-devel mailing lists and some maintainers.

On Sun, Nov 25, 2018 at 12:53 AM Johan Jonker  wrote:
>
> For a TV stick called MK808 with rk3066 processor I would like to enable
> the VOP and HDMI.
> With boot logo and console enabled I can see two penguins and a cursor for
> a second and then the display is black.
> From an internet search I learned that the rk3066 doesn't support a
> vblank counter.
> Yet they have replaced the Rockchip custom wait_for_vblanks with a drm
> helper.

All of the Rockchip SoCs supported by the upstream driver lack a
vblank counter and the code that you refer to is designed for this
limitation in particular. I suspect the problem you're seeing is
unrelated.

>
> Question for the authors and reviewers:
>
> -Did anyone test this conversion on a rk3066?

Not sure. I have tested internally on RK3399.

> -Can someone with more knowledge explain what happens in this crash
> report below.

To me, it looks like the vblank interrupt

> -Can this be fixed?
>

Very likely. :)

> Please contact if more info is needed.
>

I don't see display support enabled in the rk3066-mk808.dts in latest upstream:
https://elixir.bootlin.com/linux/latest/source/arch/arm/boot/dts/rk3066a-mk808.dts

Do you have any local changes in your kernel sources?

Best regards,
Tomasz

> Kind regards,
>
> Johan Jonker
>
> ///
>
> Reference:
>
> -drm/rockchip: Replace custom wait_for_vblanks with helper
> https://patchwork.kernel.org/patch/9331351/
>
> -drm/rockchip: don't wait for vblank if fb hasn't changed
> https://patchwork.kernel.org/patch/8024741/
>
> -explain why we can't wait_for_vblanks
> https://lore.kernel.org/patchwork/patch/635586/
>
> -drm/rockchip: Convert to support atomic API
> https://patchwork.kernel.org/patch/7732341/
>
> -drm/rockchip: Convert to support atomic API
> https://patchwork.kernel.org/patch/7868601/
>
> https://lkml.org/lkml/2017/9/4/251
>
>
> //
>
> [0.897682] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> [0.904616] [drm] No driver support for vblank timestamp query.
>
> [1.440201] [drm:drm_calc_timestamping_constants] crtc 30: hwmode:
> htotal 1650, vtotal 750, vdisplay 720
> [1.440212] [drm:drm_calc_timestamping_constants] crtc 30: clock
> 74250 kHz framedur 1666 linedur 2
>
> [1.440307] [drm:drm_crtc_vblank_on] crtc 0, vblank enabled 0,
> inmodeset 0
>
> [1.451189] [drm:drm_vblank_enable] enabling vblank on crtc 0, ret: 0
>
> [1.451202] [drm:drm_update_vblank_count] updating vblank count on
> crtc 0: current=1, diff=0, hw=0 hw_last=0
> .
> .
> .
> .
> [4.303883] [drm:drm_update_vblank_count] updating vblank count on
> crtc 0: current=113, diff=1, hw=0 hw_last=0
> [4.304140] Warning: unable to open an initial console.
>
>
> [6.777825] [ cut here ]
> [6.786554] WARNING: CPU: 0 PID: 19 at
> drivers/gpu/drm/drm_atomic_helper.c:1406
> drm_atomic_helper_wait_for_vblanks.part.1+0x298/0x2a0
> [6.800803] [CRTC:30:crtc-0] vblank wait timed out
> [6.806701] Modules linked in:
> [6.810317] CPU: 0 PID: 19 Comm: kworker/0:1 Not tainted
> 4.20.0-rc1-g0d694f0b3-dirty #19
> [6.819938] Hardware name: Rockchip (Device Tree)
> [6.825498] Workqueue: events output_poll_execute
> [6.831062] [] (unwind_backtrace) from []
> (show_stack+0x10/0x14)
> [6.840191] [] (show_stack) from []
> (dump_stack+0x88/0x9c)
> [6.848729] [] (dump_stack) from []
> (__warn+0xf8/0x110)
> [6.856936] [] (__warn) from []
> (warn_slowpath_fmt+0x48/0x6c)
> [6.873544] [] (warn_slowpath_fmt) from []
> (drm_atomic_helper_wait_for_vblanks.part.1+0x298/0x2a0)
> [6.886148] [] (drm_atomic_helper_wait_for_vblanks.part.1)
> from [] (rockchip_atomic_helper_commit_tail_rpm+0x17c/0x194)
> [6.900896] [] (rockchip_atomic_helper_commit_tail_rpm)
> from [] (commit_tail+0x40/0x6c)
> [6.912370] [] (commit_tail) from []
> (drm_atomic_helper_commit+0xbc/0x128)
> [6.922515] [] (drm_atomic_helper_commit) from []
> (restore_fbdev_mode_atomic+0x1cc/0x1dc)
> [6.934196] [] (restore_fbdev_mode_atomic) from
> [] (drm_fb_helper_restore_fbdev_mode_unlocked+0x54/0xa0)
> [6.947410] [] (drm_fb_helper_restore_fbdev_mode_unlocked)
> from [] (drm_fb_helper_set_par+0x30/0x54)
> [6.960213] [] (drm_fb_helper_set_par) from []
> (drm_fb_helper_hotplug_event.part.9+0x90/0xa8)
> [6.972305] [] (drm_fb_helper_hotplug_event.part.9) from
> [] (drm_kms_helper_hotplug_event+0x24/0x30)
> [6.985108] [] (drm_kms_helper_hotplug_event) from
> [] (output_poll_execute+0x188/0x1a0)
> [6.996583] [] (output_poll_execute) from []
> (process_one_work+0x218/0x508)
> [7.006829] [] (process_one_work) from []
> (worker_thread+0x30/0x59c)
> [7.016360] [] (worker_thread) from []
> (kthread+0x124/0x154)
> [7.025073] [] (kthread) from []
> (ret_from_fork+0x14/0x2c)
> [7.033567] Exception stack(0xeea3dfb0 to 0xeea3dff8)

Re: [PATCH v3] drm/rockchip: update cursors asynchronously through atomic.

2018-11-26 Thread Tomasz Figa
Hi Gustavo,

On Tue, Nov 27, 2018 at 8:54 AM Gustavo Padovan
 wrote:
>
> Hi Tomasz,
>
> On 11/23/18 12:27 AM, Tomasz Figa wrote:
> > Hi Helen,
> >
> > On Fri, Nov 23, 2018 at 8:31 AM Helen Koike  
> > wrote:
> >> Hi Tomasz,
> >>
> >> On 11/20/18 4:48 AM, Tomasz Figa wrote:
> >>> Hi Helen,
> >>>
> >>> On Tue, Nov 20, 2018 at 4:08 AM Helen Koike  
> >>> wrote:
> >>>> From: Enric Balletbo i Serra 
> >>>>
> >>>> Add support to async updates of cursors by using the new atomic
> >>>> interface for that.
> >>>>
> >>>> Signed-off-by: Enric Balletbo i Serra 
> >>>> [updated for upstream]
> >>>> Signed-off-by: Helen Koike 
> >>>>
> >>>> ---
> >>>> Hello,
> >>>>
> >>>> This is the third version of the async-plane update support for the
> >>>> Rockchip driver.
> >>>>
> >>> Thanks for a quick respin. Please see my comments inline. (I'll try to
> >>> be better at responding from now on...)
> >>>
> >>>> I tested running igt kms_cursor_legacy and kms_atomic tests using a 
> >>>> 96Boards Ficus.
> >>>>
> >>>> Note that before the patch, the following igt tests failed:
> >>>>
> >>>>  basic-flip-before-cursor-atomic
> >>>>  basic-flip-before-cursor-legacy
> >>>>  cursor-vs-flip-atomic
> >>>>  cursor-vs-flip-legacy
> >>>>  cursor-vs-flip-toggle
> >>>>  flip-vs-cursor-atomic
> >>>>  flip-vs-cursor-busy-crc-atomic
> >>>>  flip-vs-cursor-busy-crc-legacy
> >>>>  flip-vs-cursor-crc-atomic
> >>>>  flip-vs-cursor-crc-legacy
> >>>>  flip-vs-cursor-legacy
> >>>>
> >>>> Full log: https://people.collabora.com/~koike/results-4.20/html/
> >>>>
> >>>> Now with the patch applied the following were fixed:
> >>>>  basic-flip-before-cursor-atomic
> >>>>  basic-flip-before-cursor-legacy
> >>>>  flip-vs-cursor-atomic
> >>>>  flip-vs-cursor-legacy
> >>>>
> >>>> Full log: https://people.collabora.com/~koike/results-4.20-async/html/
> >>> Could you also test modetest, with the -C switch to test the legacy
> >>> cursor API? I remember it triggering crashes due to synchronization
> >>> issues easily.
> >> Sure. I tested with
> >> $ modetest -M rockchip -s 37:1920x1080 -C
> >>
> >> I also vary the mode but I couldn't trigger any crashes.
> >>
> >>>> Tomasz, as you mentioned in v2 about waiting for the hardware before updating
> >>>> the framebuffer, now I call the loop you pointed out in the async path,
> >>>> was that what you had in mind? Or do you think it would make sense to
> >>>> call the vop_crtc_atomic_flush() instead of just exposing that loop?
> >>>>
> >>>> Thanks
> >>>> Helen
> >>>>
> >>>> Changes in v3:
> >>>> - Rebased on top of drm-misc
> >>>> - Fix missing include in rockchip_drm_vop.c
> >>>> - New function vop_crtc_atomic_commit_flush
> >>>>
> >>>> Changes in v2:
> >>>> - v2: https://patchwork.freedesktop.org/patch/254180/
> >>>> - Change the framebuffer as well to cover jumpy cursor when hovering
> >>>>text boxes or hyperlink. (Tomasz)
> >>>> - Use the PSR inhibit mechanism when accessing VOP hardware instead of
> >>>>PSR flushing (Tomasz)
> >>>>
> >>>> Changes in v1:
> >>>> - Rebased on top of drm-misc
> >>>> - In async_check call drm_atomic_helper_check_plane_state to check that
> >>>>the desired plane is valid and update various bits of derived state
> >>>>(clipped coordinates etc.)
> >>>> - In async_check allow to configure new scaling in the fast path.
> >>>> - In async_update force to flush all registered PSR encoders.
> >>>> - In async_update call atomic_update directly.
> >>>> - In async_update call vop_cfg_done needed to set the vop registers and 
> >>>> take eff

Re: [PATCH v3] drm/rockchip: update cursors asynchronously through atomic.

2018-11-22 Thread Tomasz Figa
Hi Michael,

On Fri, Nov 23, 2018 at 1:58 PM Michael Zoran  wrote:
>
> On Fri, 2018-11-23 at 11:27 +0900, Tomasz Figa wrote:
> >
> > The point here is not about setting and resetting the plane->fb
> > pointer. It's about what happens inside
> > drm_atomic_set_fb_for_plane().
> >
> > It calls drm_framebuffer_get() for the new fb and
> > drm_framebuffer_put() for the old fb. In result, if the fb changes,
> > the old fb, which had its reference count incremented in the atomic
> > commit that set it to the plane before, has its reference count
> > decremented. Moreover, if the new reference count becomes 0,
> > drm_framebuffer_put() will immediately free the buffer.
> >
> > Freeing a buffer when the hardware is still scanning out of it isn't
> > a
> > good idea, is it?
>
> No, it's not.  But the board I submitted the patch for doesn't have
> anything like hot swapable ram.  The ram access is still going to work,
> just it might display something it shouldn't. Say for example if that
> frame buffer got reused by somethig else and filled with new data in
> the very small window.

Thanks for a quick reply!

To clarify, on the Rockchip platform this patch is for (and many other
arm/arm64 SoCs) the display controller is behind an IOMMU. Freeing the
buffer would mean unmapping the related IOVAs from the IOMMU. If the
hardware is still scanning out from the unmapped addresses, it would
cause IOMMU page faults. We don't have any good IOMMU page fault
handling in the kernel, so on most platforms that would likely end up
stalling the display controller completely (on Rockchip it does).

>
> But yes, I agree the best solution would be to not release the buffer
> until the next vblank.
>
> Perhaps a good solution would be for the DRM api to have the concept of
> a deferred release?  Meaning if the put() call just added the frame
> buffer to a list that DRM core could walk during the vblank.  That
> might be better then every single driver trying to work up a custom
> solution.

Agreed.

Best regards,
Tomasz


Re: [PATCH v3] drm/rockchip: update cursors asynchronously through atomic.

2018-11-22 Thread Tomasz Figa
Hi Helen,

On Fri, Nov 23, 2018 at 8:31 AM Helen Koike  wrote:
>
> Hi Tomasz,
>
> On 11/20/18 4:48 AM, Tomasz Figa wrote:
> > Hi Helen,
> >
> > On Tue, Nov 20, 2018 at 4:08 AM Helen Koike  
> > wrote:
> >>
> >> From: Enric Balletbo i Serra 
> >>
> >> Add support to async updates of cursors by using the new atomic
> >> interface for that.
> >>
> >> Signed-off-by: Enric Balletbo i Serra 
> >> [updated for upstream]
> >> Signed-off-by: Helen Koike 
> >>
> >> ---
> >> Hello,
> >>
> >> This is the third version of the async-plane update support for the
> >> Rockchip driver.
> >>
> >
> > Thanks for a quick respin. Please see my comments inline. (I'll try to
> > be better at responding from now on...)
> >
> >> I tested running igt kms_cursor_legacy and kms_atomic tests using a 
> >> 96Boards Ficus.
> >>
> >> Note that before the patch, the following igt tests failed:
> >>
> >> basic-flip-before-cursor-atomic
> >> basic-flip-before-cursor-legacy
> >> cursor-vs-flip-atomic
> >> cursor-vs-flip-legacy
> >> cursor-vs-flip-toggle
> >> flip-vs-cursor-atomic
> >> flip-vs-cursor-busy-crc-atomic
> >> flip-vs-cursor-busy-crc-legacy
> >> flip-vs-cursor-crc-atomic
> >> flip-vs-cursor-crc-legacy
> >> flip-vs-cursor-legacy
> >>
> >> Full log: https://people.collabora.com/~koike/results-4.20/html/
> >>
> >> Now with the patch applied the following were fixed:
> >> basic-flip-before-cursor-atomic
> >> basic-flip-before-cursor-legacy
> >> flip-vs-cursor-atomic
> >> flip-vs-cursor-legacy
> >>
> >> Full log: https://people.collabora.com/~koike/results-4.20-async/html/
> >
> > Could you also test modetest, with the -C switch to test the legacy
> > cursor API? I remember it triggering crashes due to synchronization
> > issues easily.
>
> Sure. I tested with
> $ modetest -M rockchip -s 37:1920x1080 -C
>
> I also vary the mode but I couldn't trigger any crashes.
>
> >
> >>
> >> Tomasz, as you mentioned in v2 about waiting for the hardware before updating
> >> the framebuffer, now I call the loop you pointed out in the async path,
> >> was that what you had in mind? Or do you think it would make sense to
> >> call the vop_crtc_atomic_flush() instead of just exposing that loop?
> >>
> >> Thanks
> >> Helen
> >>
> >> Changes in v3:
> >> - Rebased on top of drm-misc
> >> - Fix missing include in rockchip_drm_vop.c
> >> - New function vop_crtc_atomic_commit_flush
> >>
> >> Changes in v2:
> >> - v2: https://patchwork.freedesktop.org/patch/254180/
> >> - Change the framebuffer as well to cover jumpy cursor when hovering
> >>   text boxes or hyperlink. (Tomasz)
> >> - Use the PSR inhibit mechanism when accessing VOP hardware instead of
> >>   PSR flushing (Tomasz)
> >>
> >> Changes in v1:
> >> - Rebased on top of drm-misc
> >> - In async_check call drm_atomic_helper_check_plane_state to check that
> >>   the desired plane is valid and update various bits of derived state
> >>   (clipped coordinates etc.)
> >> - In async_check allow to configure new scaling in the fast path.
> >> - In async_update force to flush all registered PSR encoders.
> >> - In async_update call atomic_update directly.
> >> - In async_update call vop_cfg_done needed to set the vop registers and 
> >> take effect.
> >>
> >>  drivers/gpu/drm/rockchip/rockchip_drm_fb.c  |  36 ---
> >>  drivers/gpu/drm/rockchip/rockchip_drm_psr.c |  37 +++
> >>  drivers/gpu/drm/rockchip/rockchip_drm_psr.h |   3 +
> >>  drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 108 +---
> >>  4 files changed, 131 insertions(+), 53 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_fb.c 
> >> b/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
> >> index ea18cb2a76c0..08bec50d9c5d 100644
> >> --- a/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
> >> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
> >> @@ -127,42 +127,6 @@ rockchip_user_fb_create(struct drm_device *dev, 
> >> struct drm_file *file_priv,
> >> return ERR_PTR(ret);
> >>  }
> >>
>

Re: [PATCH 1/1] drm: msm: Replace dma_map_sg with dma_sync_sg*

2018-11-20 Thread Tomasz Figa
Hi Jordan, Vivek,

On Wed, Nov 21, 2018 at 12:41 AM Jordan Crouse  wrote:
>
> On Tue, Nov 20, 2018 at 03:24:37PM +0530, Vivek Gautam wrote:
> > dma_map_sg() expects a DMA domain. However, the drm devices
> > have been traditionally using unmanaged iommu domain which
> > is non-dma type. Using dma mapping APIs with that domain is bad.
> >
> > Replace dma_map_sg() calls with dma_sync_sg_for_device{|cpu}()
> > to do the cache maintenance.
> >
> > Signed-off-by: Vivek Gautam 
> > Suggested-by: Tomasz Figa 
> > ---
> >
> > Tested on an MTP sdm845:
> > https://github.com/vivekgautam1/linux/tree/v4.19/sdm845-mtp-display-working
> >
> >  drivers/gpu/drm/msm/msm_gem.c | 27 ---
> >  1 file changed, 20 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
> > index 00c795ced02c..d7a7af610803 100644
> > --- a/drivers/gpu/drm/msm/msm_gem.c
> > +++ b/drivers/gpu/drm/msm/msm_gem.c
> > @@ -81,6 +81,8 @@ static struct page **get_pages(struct drm_gem_object *obj)
> >   struct drm_device *dev = obj->dev;
> >   struct page **p;
> >   int npages = obj->size >> PAGE_SHIFT;
> > + struct scatterlist *s;
> > + int i;
> >
> >   if (use_pages(obj))
> >   p = drm_gem_get_pages(obj);
> > @@ -107,9 +109,19 @@ static struct page **get_pages(struct drm_gem_object 
> > *obj)
> >   /* For non-cached buffers, ensure the new pages are clean
> >* because display controller, GPU, etc. are not coherent:
> >*/
> > - if (msm_obj->flags & (MSM_BO_WC|MSM_BO_UNCACHED))
> > - dma_map_sg(dev->dev, msm_obj->sgt->sgl,
> > - msm_obj->sgt->nents, 
> > DMA_BIDIRECTIONAL);
> > + if (msm_obj->flags & (MSM_BO_WC | MSM_BO_UNCACHED)) {
> > + /*
> > +  * Fake up the SG table so that dma_sync_sg_*()
> > +  * can be used to flush the pages associated with it.
> > +  */
>
> We aren't really faking.  The table is real, we are just slightly abusing the
> sg_dma_address() which makes this comment a bit misleading. Instead I would
> probably say something like:
>
> /* dma_sync_sg_* flushes pages using sg_dma_address() so point it at the
>  * physical page for the right behavior */
>
> Or something like that.
>

It's actually quite complicated, but I agree that the comment isn't
very precise. The cases are as follows:
- arm64 iommu_dma_ops use sg_phys()
https://elixir.bootlin.com/linux/v4.20-rc3/source/arch/arm64/mm/dma-mapping.c#L599
- swiotlb_dma_ops used on arm64 if no IOMMU is available use
sg->dma_address directly:
https://elixir.bootlin.com/linux/v4.20-rc3/source/kernel/dma/swiotlb.c#L832
- arm_dma_ops use sg_dma_address():
https://elixir.bootlin.com/linux/v4.20-rc3/source/arch/arm/mm/dma-mapping.c#L1130
- arm iommu_ops use sg_page():
https://elixir.bootlin.com/linux/v4.20-rc3/source/arch/arm/mm/dma-mapping.c#L1869

Sounds like a mess...

> > + for_each_sg(msm_obj->sgt->sgl, s,
> > + msm_obj->sgt->nents, i)
> > + sg_dma_address(s) = sg_phys(s);
> > +
>
> I'm wondering - wouldn't we want to do this association for cached buffers too,
> so
> we could sync them correctly in cpu_prep and cpu_fini?  Maybe it wouldn't hurt
> to put this association in the main path (obviously the sync should stay 
> inside
> the conditional for uncached buffers).
>

I guess it wouldn't hurt indeed. Note that cpu_prep/fini seem to be
missing the sync call currently.

P.S. Jordan, not sure if it's my Gmail or your email client, but your
message had all the recipients in a Reply-to header, except you, so
pressing Reply to all in my case led to a message that didn't have you
in recipients anymore...

Best regards,
Tomasz
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v3] drm/rockchip: update cursors asynchronously through atomic.

2018-11-19 Thread Tomasz Figa
Hi Helen,

On Tue, Nov 20, 2018 at 4:08 AM Helen Koike  wrote:
>
> From: Enric Balletbo i Serra 
>
> Add support to async updates of cursors by using the new atomic
> interface for that.
>
> Signed-off-by: Enric Balletbo i Serra 
> [updated for upstream]
> Signed-off-by: Helen Koike 
>
> ---
> Hello,
>
> This is the third version of the async-plane update support for the
> Rockchip driver.
>

Thanks for a quick respin. Please see my comments inline. (I'll try to
be better at responding from now on...)

> I tested running igt kms_cursor_legacy and kms_atomic tests using a 96Boards 
> Ficus.
>
> Note that before the patch, the following igt tests failed:
>
> basic-flip-before-cursor-atomic
> basic-flip-before-cursor-legacy
> cursor-vs-flip-atomic
> cursor-vs-flip-legacy
> cursor-vs-flip-toggle
> flip-vs-cursor-atomic
> flip-vs-cursor-busy-crc-atomic
> flip-vs-cursor-busy-crc-legacy
> flip-vs-cursor-crc-atomic
> flip-vs-cursor-crc-legacy
> flip-vs-cursor-legacy
>
> Full log: https://people.collabora.com/~koike/results-4.20/html/
>
> Now with the patch applied the following were fixed:
> basic-flip-before-cursor-atomic
> basic-flip-before-cursor-legacy
> flip-vs-cursor-atomic
> flip-vs-cursor-legacy
>
> Full log: https://people.collabora.com/~koike/results-4.20-async/html/

Could you also test modetest, with the -C switch to test the legacy
cursor API? I remember it triggering crashes due to synchronization
issues easily.

>
> Tomasz, as you mentioned in v2 about waiting for the hardware before
> updating the framebuffer, I now call the loop you pointed out in the
> async path. Was that what you had in mind? Or do you think it would make
> sense to call vop_crtc_atomic_flush() instead of just exposing that loop?
>
> Thanks
> Helen
>
> Changes in v3:
> - Rebased on top of drm-misc
> - Fix missing include in rockchip_drm_vop.c
> - New function vop_crtc_atomic_commit_flush
>
> Changes in v2:
> - v2: https://patchwork.freedesktop.org/patch/254180/
> - Change the framebuffer as well to cover jumpy cursor when hovering
>   text boxes or hyperlink. (Tomasz)
> - Use the PSR inhibit mechanism when accessing VOP hardware instead of
>   PSR flushing (Tomasz)
>
> Changes in v1:
> - Rebased on top of drm-misc
> - In async_check call drm_atomic_helper_check_plane_state to check that
>   the desired plane is valid and update various bits of derived state
>   (clipped coordinates etc.)
> - In async_check allow to configure new scaling in the fast path.
> - In async_update force to flush all registered PSR encoders.
> - In async_update call atomic_update directly.
> - In async_update call vop_cfg_done needed to set the vop registers and take 
> effect.
>
>  drivers/gpu/drm/rockchip/rockchip_drm_fb.c  |  36 ---
>  drivers/gpu/drm/rockchip/rockchip_drm_psr.c |  37 +++
>  drivers/gpu/drm/rockchip/rockchip_drm_psr.h |   3 +
>  drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 108 +---
>  4 files changed, 131 insertions(+), 53 deletions(-)
>
> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_fb.c 
> b/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
> index ea18cb2a76c0..08bec50d9c5d 100644
> --- a/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
> @@ -127,42 +127,6 @@ rockchip_user_fb_create(struct drm_device *dev, struct 
> drm_file *file_priv,
> return ERR_PTR(ret);
>  }
>
> -static void
> -rockchip_drm_psr_inhibit_get_state(struct drm_atomic_state *state)
> -{
> -   struct drm_crtc *crtc;
> -   struct drm_crtc_state *crtc_state;
> -   struct drm_encoder *encoder;
> -   u32 encoder_mask = 0;
> -   int i;
> -
> -   for_each_old_crtc_in_state(state, crtc, crtc_state, i) {
> -   encoder_mask |= crtc_state->encoder_mask;
> -   encoder_mask |= crtc->state->encoder_mask;
> -   }
> -
> -   drm_for_each_encoder_mask(encoder, state->dev, encoder_mask)
> -   rockchip_drm_psr_inhibit_get(encoder);
> -}
> -
> -static void
> -rockchip_drm_psr_inhibit_put_state(struct drm_atomic_state *state)
> -{
> -   struct drm_crtc *crtc;
> -   struct drm_crtc_state *crtc_state;
> -   struct drm_encoder *encoder;
> -   u32 encoder_mask = 0;
> -   int i;
> -
> -   for_each_old_crtc_in_state(state, crtc, crtc_state, i) {
> -   encoder_mask |= crtc_state->encoder_mask;
> -   encoder_mask |= crtc->state->encoder_mask;
> -   }
> -
> -   drm_for_each_encoder_mask(encoder, state->dev, encoder_mask)
> -   rockchip_drm_psr_inhibit_put(encoder);
> -}
> -
>  static void
>  rockchip_atomic_helper_commit_tail_rpm(struct drm_atomic_state *old_state)
>  {
> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_psr.c 
> b/drivers/gpu/drm/rockchip/rockchip_drm_psr.c
> index 01ff3c858875..22a70ab6e214 100644
> --- a/drivers/gpu/drm/rockchip/rockchip_drm_psr.c
>

Re: [PATCH v2] drm/rockchip: update cursors asynchronously through atomic.

2018-11-14 Thread Tomasz Figa
Hi Enric,

On Fri, Sep 28, 2018 at 5:27 PM Enric Balletbo i Serra
 wrote:
>
> Add support to async updates of cursors by using the new atomic
> interface for that.
>
> Signed-off-by: Enric Balletbo i Serra 
> ---
> Hi,
>
> This is a second version of the patch to add async-plane update support
> to the Rockchip driver.
>

I'm really sorry for the super late reply. I couldn't get cycles to
look at this earlier. :(

> The patch was tested on a Samsung Chromebook Plus in two ways.
>
>  1. Running all igt kms_cursor_legacy and kms_atomic tests and see that
> there is no regression after the patch.
> Note that before the patch, the following igt tests failed:
> - basic-flip-before-cursor-legacy
> - basic-flip-before-cursor-atomic
> - cursor-vs-flip-legacy
> - cursor-vs-flip-toggle
> - flip-vs-cursor-atomic
> - flip-vs-cursor-crc-atomic
> - flip-vs-cursor-crc-legacy
> - flip-vs-cursor-legacy
> - flip-vs-cursor-crc-atomic
> - flip-vs-cursor-crc-legacy

Are the last 2 tests repeated?

>
> With the patch applied only two tests don't fail (these two are
> expected to not pass right now):
> - flip-vs-cursor-crc-atomic
> - flip-vs-cursor-crc-legacy

Did you mean "don't pass"?

>
> You can check full IGT test report here:
> - Before the patch:
> 
> https://people.collabora.com/~eballetbo/tests/igt/samsung-chromebook-plus/igt-1.23/4.19-rc5/index.html
> - With the patch applied:
> 
> https://people.collabora.com/~eballetbo/tests/igt/samsung-chromebook-plus/igt-1.23/4.19-rc5-async-update/index.html
>
>  2. Running weston using the atomic API.
>
> Best regards,
>   Enric
>
> Changes in v2:
> - Change the framebuffer as well to cover jumpy cursor when hovering
>   text boxes or hyperlink. (Tomasz)
> - Use the PSR inhibit mechanism when accessing VOP hardware instead of
>   PSR flushing (Tomasz)
>
> Changes in v1:
> - Rebased on top of drm-misc
> - In async_check call drm_atomic_helper_check_plane_state to check that
>   the desired plane is valid and update various bits of derived state
>   (clipped coordinates etc.)
> - In async_check allow to configure new scaling in the fast path.
> - In async_update force to flush all registered PSR encoders.
> - In async_update call atomic_update directly.
> - In async_update call vop_cfg_done needed to set the vop registers and take 
> effect.
>
>  drivers/gpu/drm/rockchip/rockchip_drm_fb.c  | 36 
>  drivers/gpu/drm/rockchip/rockchip_drm_psr.c | 37 
>  drivers/gpu/drm/rockchip/rockchip_drm_psr.h |  3 +
>  drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 62 +
>  4 files changed, 102 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_fb.c 
> b/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
> index 7b6f7227d476..aec9a997de13 100644
> --- a/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
> @@ -128,24 +128,6 @@ rockchip_user_fb_create(struct drm_device *dev, struct 
> drm_file *file_priv,
> return ERR_PTR(ret);
>  }
>
> -static void
> -rockchip_drm_psr_inhibit_get_state(struct drm_atomic_state *state)
> -{
> -   struct drm_crtc *crtc;
> -   struct drm_crtc_state *crtc_state;
> -   struct drm_encoder *encoder;
> -   u32 encoder_mask = 0;
> -   int i;
> -
> -   for_each_old_crtc_in_state(state, crtc, crtc_state, i) {
> -   encoder_mask |= crtc_state->encoder_mask;
> -   encoder_mask |= crtc->state->encoder_mask;
> -   }
> -
> -   drm_for_each_encoder_mask(encoder, state->dev, encoder_mask)
> -   rockchip_drm_psr_inhibit_get(encoder);
> -}
> -
>  uint32_t rockchip_drm_get_vblank_ns(struct drm_display_mode *mode)
>  {
> uint64_t vblank_time = mode->vtotal - mode->vdisplay;
> @@ -156,24 +138,6 @@ uint32_t rockchip_drm_get_vblank_ns(struct 
> drm_display_mode *mode)
> return vblank_time;
>  }
>
> -static void
> -rockchip_drm_psr_inhibit_put_state(struct drm_atomic_state *state)
> -{
> -   struct drm_crtc *crtc;
> -   struct drm_crtc_state *crtc_state;
> -   struct drm_encoder *encoder;
> -   u32 encoder_mask = 0;
> -   int i;
> -
> -   for_each_old_crtc_in_state(state, crtc, crtc_state, i) {
> -   encoder_mask |= crtc_state->encoder_mask;
> -   encoder_mask |= crtc->state->encoder_mask;
> -   }
> -
> -   drm_for_each_encoder_mask(encoder, state->dev, encoder_mask)
> -   rockchip_drm_psr_inhibit_put(encoder);
> -}
> -
>  static void
>  rockchip_atomic_helper_commit_tail_rpm(struct drm_atomic_state *old_state)
>  {
> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_psr.c 
> b/drivers/gpu/drm/rockchip/rockchip_drm_psr.c
> index 79d00d861a31..1635485955d3 100644
> --- a/drivers/gpu/drm/rockchip/rockchip_drm_psr.c
> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_psr.c
> @@ -13,6 +13,7 @@
>   */
>
>  #include 
> +#include 
>  #include 
>
>  #includ

Re: [RFC v4 03/18] vb2: Move cache synchronisation from buffer done to dqbuf handler

2018-10-05 Thread Tomasz Figa
Hi Sakari, Hans,

On Tue, May 9, 2017 at 12:05 AM Sakari Ailus
 wrote:
>
> The cache synchronisation may be a time consuming operation and thus not
> best performed in an interrupt which is a typical context for
> vb2_buffer_done() calls. This may consume up to tens of ms on some
> machines, depending on the buffer size.
>
> Signed-off-by: Sakari Ailus 
> Acked-by: Hans Verkuil 
> ---
>  drivers/media/v4l2-core/videobuf2-core.c | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/media/v4l2-core/videobuf2-core.c 
> b/drivers/media/v4l2-core/videobuf2-core.c
> index 8bf3369..e866115 100644
> --- a/drivers/media/v4l2-core/videobuf2-core.c
> +++ b/drivers/media/v4l2-core/videobuf2-core.c
> @@ -889,7 +889,6 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum 
> vb2_buffer_state state)
>  {
> struct vb2_queue *q = vb->vb2_queue;
> unsigned long flags;
> -   unsigned int plane;
>
> if (WARN_ON(vb->state != VB2_BUF_STATE_ACTIVE))
> return;
> @@ -910,10 +909,6 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum 
> vb2_buffer_state state)
> dprintk(4, "done processing on buffer %d, state: %d\n",
> vb->index, state);
>
> -   /* sync buffers */
> -   for (plane = 0; plane < vb->num_planes; ++plane)
> -   call_void_memop(vb, finish, vb->planes[plane].mem_priv);
> -
> spin_lock_irqsave(&q->done_lock, flags);
> if (state == VB2_BUF_STATE_QUEUED ||
> state == VB2_BUF_STATE_REQUEUEING) {
> @@ -1573,6 +1568,10 @@ static void __vb2_dqbuf(struct vb2_buffer *vb)
>
> vb->state = VB2_BUF_STATE_DEQUEUED;
>
> +   /* sync buffers */
> +   for (i = 0; i < vb->num_planes; ++i)
> +   call_void_memop(vb, finish, vb->planes[i].mem_priv);
> +

Sorry for digging up this old patch. Posting for reference, in case
someone decides to use or take it over.

This piece of code seems to be executed after the queue's .buf_finish()
callback. The latter is allowed to access the buffer contents, so it
looks like we're breaking it, because it would now access the buffer
before the cache is synchronized.
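To make the ordering invariant concrete, here is a minimal toy model (invented names, not the vb2 API): the cache "finish" memop has to run before any callback that reads the buffer contents:

```c
enum toy_step { TOY_SYNC = 1, TOY_BUF_FINISH };

struct toy_vb {
	enum toy_step order[2];
	int n;
};

/* Memop "finish": make the buffer contents visible to the CPU. */
static void toy_cache_finish(struct toy_vb *vb)
{
	vb->order[vb->n++] = TOY_SYNC;
}

/* Driver's .buf_finish(): allowed to read the buffer contents. */
static void toy_driver_buf_finish(struct toy_vb *vb)
{
	vb->order[vb->n++] = TOY_BUF_FINISH;
}

/* Correct dequeue order: sync caches first, then let the driver look. */
static void toy_dqbuf(struct toy_vb *vb)
{
	toy_cache_finish(vb);
	toy_driver_buf_finish(vb);
}
```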

Best regards,
Tomasz


Re: [PATCH] drm/rockchip: update cursors asynchronously through atomic.

2018-08-13 Thread Tomasz Figa
On Mon, Aug 13, 2018 at 4:26 PM Heiko Stuebner  wrote:
>
> > On Monday, 13 August 2018, 09:11:07 CEST, Tomasz Figa wrote:
> > Hi Enric,
> >
> > On Tue, Aug 7, 2018 at 12:53 AM Enric Balletbo i Serra
> >  wrote:
> > >
> > > Add support to async updates of cursors by using the new atomic
> > > interface for that.
> > >
> > > Signed-off-by: Enric Balletbo i Serra 
> > > ---
> > > Hi,
> > >
> > > This first version is slightly different from the RFC, note that I did
> > > not maintain the Sean reviewed tag for that reason. With this version I
> > > don't touch the atomic_update function and all is implemented in the
> > > async_check/update functions. See the changelog for a list of changes.
> > >
> > > The patch was tested on a Samsung Chromebook Plus in two ways.
> > >
> > >  1. Running all igt kms_cursor_legacy and kms_atomic@plane_cursor_legacy
> > > tests and see that there is no regression after the patch.
> > >
> > >  2. Running weston using the atomic API.
> >
> > Thanks for the patch. This feature might look like a really minor
> > thing, but we really had a hard time dealing with user complaints, so
> > having this in upstream would be a really useful thing.
> >
> > Let me post some comments inline.
> >
> > >
> > > Best regards,
> > >   Enric
> > >
> > > Changes in v1:
> > > - Rebased on top of drm-misc
> > > - In async_check call drm_atomic_helper_check_plane_state to check that
> > >   the desired plane is valid and update various bits of derived state
> > >   (clipped coordinates etc.)
> > > - In async_check allow to configure new scaling in the fast path.
> > > - In async_update force to flush all registered PSR encoders.
> > > - In async_update call atomic_update directly.
> > > - In async_update call vop_cfg_done needed to set the vop registers and 
> > > take effect.
> > >
> > >  drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 53 +
> > >  1 file changed, 53 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c 
> > > b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> > > index e9f91278137d..dab70056ee73 100644
> > > --- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> > > +++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> > > @@ -811,10 +811,63 @@ static void vop_plane_atomic_update(struct 
> > > drm_plane *plane,
> > > spin_unlock(&vop->reg_lock);
> > >  }
> > >
> > > +static int vop_plane_atomic_async_check(struct drm_plane *plane,
> > > +   struct drm_plane_state *state)
> > > +{
> > > +   struct vop_win *vop_win = to_vop_win(plane);
> > > +   const struct vop_win_data *win = vop_win->data;
> > > +   int min_scale = win->phy->scl ? FRAC_16_16(1, 8) :
> > > +   DRM_PLANE_HELPER_NO_SCALING;
> > > +   int max_scale = win->phy->scl ? FRAC_16_16(8, 1) :
> > > +   DRM_PLANE_HELPER_NO_SCALING;
> > > +   int ret;
> > > +
> > > +   if (plane != state->crtc->cursor)
> > > +   return -EINVAL;
> > > +
> > > +   if (!plane->state)
> > > +   return -EINVAL;
> > > +
> > > +   if (!plane->state->fb ||
> > > +   plane->state->fb != state->fb)
> > > +   return -EINVAL;
> >
> > While it covers quite a big part of cursor movements, you may
> > still expect a jumpy cursor when hovering over text boxes or hyperlinks,
> > since it changes the cursor image. Our downstream patch [1] actually
> > took care of changing the framebuffer as well, although with the added
> > complexity of referencing the old buffer at update time and releasing
> > it in a flip work.
> >
> > [1] 
> > https://chromium.git.corp.google.com/chromiumos/third_party/kernel/+/1ad887e1a1349991c9e137b48cb32086e65347fc%5E%21/
>
> [1] 
> https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/394492
> for non-google people ;-)
>

Thanks, not sure how that internal link sneaked into my clipboard.
Should have checked what I pasted. :P

Best regards,
Tomasz


Re: [PATCH] drm/rockchip: update cursors asynchronously through atomic.

2018-08-13 Thread Tomasz Figa
Hi Enric,

On Tue, Aug 7, 2018 at 12:53 AM Enric Balletbo i Serra
 wrote:
>
> Add support to async updates of cursors by using the new atomic
> interface for that.
>
> Signed-off-by: Enric Balletbo i Serra 
> ---
> Hi,
>
> This first version is slightly different from the RFC, note that I did
> not maintain the Sean reviewed tag for that reason. With this version I
> don't touch the atomic_update function and all is implemented in the
> async_check/update functions. See the changelog for a list of changes.
>
> The patch was tested on a Samsung Chromebook Plus in two ways.
>
>  1. Running all igt kms_cursor_legacy and kms_atomic@plane_cursor_legacy
> tests and see that there is no regression after the patch.
>
>  2. Running weston using the atomic API.

Thanks for the patch. This feature might look like a really minor
thing, but we really had a hard time dealing with user complaints, so
having this in upstream would be a really useful thing.

Let me post some comments inline.

>
> Best regards,
>   Enric
>
> Changes in v1:
> - Rebased on top of drm-misc
> - In async_check call drm_atomic_helper_check_plane_state to check that
>   the desired plane is valid and update various bits of derived state
>   (clipped coordinates etc.)
> - In async_check allow to configure new scaling in the fast path.
> - In async_update force to flush all registered PSR encoders.
> - In async_update call atomic_update directly.
> - In async_update call vop_cfg_done needed to set the vop registers and take 
> effect.
>
>  drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 53 +
>  1 file changed, 53 insertions(+)
>
> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c 
> b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> index e9f91278137d..dab70056ee73 100644
> --- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> @@ -811,10 +811,63 @@ static void vop_plane_atomic_update(struct drm_plane 
> *plane,
> spin_unlock(&vop->reg_lock);
>  }
>
> +static int vop_plane_atomic_async_check(struct drm_plane *plane,
> +   struct drm_plane_state *state)
> +{
> +   struct vop_win *vop_win = to_vop_win(plane);
> +   const struct vop_win_data *win = vop_win->data;
> +   int min_scale = win->phy->scl ? FRAC_16_16(1, 8) :
> +   DRM_PLANE_HELPER_NO_SCALING;
> +   int max_scale = win->phy->scl ? FRAC_16_16(8, 1) :
> +   DRM_PLANE_HELPER_NO_SCALING;
> +   int ret;
> +
> +   if (plane != state->crtc->cursor)
> +   return -EINVAL;
> +
> +   if (!plane->state)
> +   return -EINVAL;
> +
> +   if (!plane->state->fb ||
> +   plane->state->fb != state->fb)
> +   return -EINVAL;

While it covers quite a big part of cursor movements, you may
still expect a jumpy cursor when hovering over text boxes or hyperlinks,
since it changes the cursor image. Our downstream patch [1] actually
took care of changing the framebuffer as well, although with the added
complexity of referencing the old buffer at update time and releasing
it in a flip work.

[1] 
https://chromium.git.corp.google.com/chromiumos/third_party/kernel/+/1ad887e1a1349991c9e137b48cb32086e65347fc%5E%21/
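The scheme described above (pin the old framebuffer at update time, release it from flip work) can be sketched as a toy refcount model (invented names; the real code would use DRM framebuffer reference helpers):

```c
#include <stddef.h>

/* Toy framebuffer with a plain refcount; the real code would use
 * drm_framebuffer get/put helpers. These names are made up. */
struct toy_fb { int refcount; };

static void toy_fb_get(struct toy_fb *fb) { fb->refcount++; }
static void toy_fb_put(struct toy_fb *fb) { fb->refcount--; }

struct toy_plane {
	struct toy_fb *fb;     /* currently scanned-out fb */
	struct toy_fb *old_fb; /* kept alive until flip work runs */
};

/* Async update with an fb change: pin the old fb before swapping,
 * since the hardware may still scan it out for a while. */
static void toy_async_update(struct toy_plane *p, struct toy_fb *new_fb)
{
	toy_fb_get(p->fb);
	p->old_fb = p->fb;
	p->fb = new_fb;
	/* ... program hardware with new_fb ... */
}

/* Flip work, run once the hardware switched scanout. */
static void toy_flip_work(struct toy_plane *p)
{
	toy_fb_put(p->old_fb);
	p->old_fb = NULL;
}
```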

> +
> +   ret = drm_atomic_helper_check_plane_state(plane->state,
> + plane->crtc->state,
> + min_scale, max_scale,
> + true, true);
> +   return ret;
> +}
> +
> +static void vop_plane_atomic_async_update(struct drm_plane *plane,
> + struct drm_plane_state *new_state)
> +{
> +   struct vop *vop = to_vop(plane->state->crtc);
> +
> +   plane->state->crtc_x = new_state->crtc_x;
> +   plane->state->crtc_y = new_state->crtc_y;
> +   plane->state->crtc_h = new_state->crtc_h;
> +   plane->state->crtc_w = new_state->crtc_w;
> +   plane->state->src_x = new_state->src_x;
> +   plane->state->src_y = new_state->src_y;
> +   plane->state->src_h = new_state->src_h;
> +   plane->state->src_w = new_state->src_w;
> +
> +   if (vop->is_enabled) {
> +   rockchip_drm_psr_flush_all(plane->dev);

We should use the inhibit mechanism when accessing VOP hardware. While
the flush is expected to keep the PSR disabled for at least 100 ms,
we're not in any atomic (pun not intended) context here and might get
preempted for some unspecified time in very high load conditions.
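The distinction between a timed flush and the inhibit mechanism can be modeled with a toy counter (invented names, not the real rockchip PSR code): while an inhibit reference is held, PSR stays off no matter how long the holder is preempted:

```c
/* Toy PSR state: PSR may only be active while nobody inhibits it.
 * Simplified stand-in for the inhibit_get()/inhibit_put() pairing. */
struct toy_psr {
	int inhibit_count;
	int active;
};

static void toy_psr_inhibit_get(struct toy_psr *psr)
{
	psr->inhibit_count++;
	psr->active = 0; /* force PSR off while someone needs the hardware */
}

static void toy_psr_inhibit_put(struct toy_psr *psr)
{
	if (--psr->inhibit_count == 0)
		psr->active = 1; /* last holder gone, safe to re-enter PSR */
}
```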

Best regards,
Tomasz


Re: [PATCH v4 1/2] drm/rockchip: vop: split out core clock enablement into separate functions

2018-06-18 Thread Tomasz Figa
Hi Heiko,

On Tue, Jun 12, 2018 at 10:20 PM Heiko Stuebner  wrote:
>
> Judging from the iommu code, both the hclk and aclk are necessary for
> register access. Split them off into separate functions from the regular
> vop enablement, so that we can use them elsewhere as well.
>
> Fixes: d0b912bd4c23 ("iommu/rockchip: Request irqs in rk_iommu_probe()")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Heiko Stuebner 
> Tested-by: Ezequiel Garcia 
> ---
>  drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 44 +++--
>  1 file changed, 31 insertions(+), 13 deletions(-)

Reviewed-by: Tomasz Figa 

Best regards,
Tomasz


Re: [PATCH v3 2/2] drm/rockchip: vop: fix irq disabled after vop driver probed

2018-06-18 Thread Tomasz Figa
Hi Heiko,

On Tue, Jun 12, 2018 at 9:15 PM Heiko Stuebner  wrote:
>
> From: Sandy Huang 
>
> The vop irq is shared between vop and iommu and irq probing in the
> iommu driver moved to the probe function recently. This can in some
> cases lead to a stall if the irq is triggered while the vop driver
> still has it disabled, but the vop irq handler gets called.
>
> But there is no real need to disable the irq, as the vop can simply
> also track its enabled state and ignore irqs in that case.
> For this we can simply check the power-domain state of the vop,
> similar to how the iommu driver does it.
>
> So remove the enable/disable handling and add appropriate condition
> to the irq handler.
>
> changes in v2:
> - move to just check the power-domain state
> - add clock handling
> changes in v3:
> - clarify comment to speak of runtime-pm not power-domain
[snip]
> @@ -1209,8 +1215,11 @@ static irqreturn_t vop_isr(int irq, void *data)
> spin_unlock(&vop->irq_lock);
>
> /* This is expected for vop iommu irqs, since the irq is shared */
> -   if (!active_irqs)
> -   return IRQ_NONE;
> +   if (!active_irqs) {
> +   ret = IRQ_NONE;
> +   vop_core_clks_disable(vop);

nit: If we're adding "out:", couldn't we also add "out_clks:" and move
the call to vop_core_clks_disable() there?

> +   goto out;
> +   }
>
> if (active_irqs & DSP_HOLD_VALID_INTR) {
> complete(&vop->dsp_hold_completion);
> @@ -1236,6 +1245,10 @@ static irqreturn_t vop_isr(int irq, void *data)
> DRM_DEV_ERROR(vop->dev, "Unknown VOP IRQs: %#02x\n",
>       active_irqs);
>
> +   vop_core_clks_disable(vop);
> +
> +out:
> +   pm_runtime_put(vop->dev);
> return ret;
>  }
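For illustration, the label layout suggested in the nit above might look like this sketch (handler body and helpers stubbed out; not the actual driver code):

```c
/* Stubs standing in for the real clock/runtime-PM helpers. */
static int clks_disabled, pm_puts;
static void toy_clks_disable(void) { clks_disabled++; }
static void toy_pm_put(void)       { pm_puts++; }

#define TOY_IRQ_NONE    0
#define TOY_IRQ_HANDLED 1

/* Both exits fall through the label so the clocks are disabled and
 * the runtime PM reference dropped exactly once on every path. */
static int toy_vop_isr(unsigned int active_irqs)
{
	int ret = TOY_IRQ_NONE;

	if (!active_irqs)
		goto out_clks; /* shared irq, not ours */

	/* ... handle and ack the interrupts ... */
	ret = TOY_IRQ_HANDLED;

out_clks:
	toy_clks_disable();
	toy_pm_put();
	return ret;
}
```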

Other than that:

Reviewed-by: Tomasz Figa 

Best regards,
Tomasz


Re: [PATCH] drm/rockchip: vop: fix irq disabled after vop driver probed

2018-05-24 Thread Tomasz Figa
Hi Heiko, Sandy,

On Fri, May 25, 2018 at 7:07 AM Heiko Stübner  wrote:

> From: Sandy Huang 

> The vop irq is shared between vop and iommu and irq probing in the
> iommu driver moved to the probe function recently. This can in some
> cases lead to a stall if the irq is triggered while the vop driver
> still has it disabled.

> But there is no real need to disable the irq, as the vop can simply
> also track its enabled state and ignore irqs in that case.

> So remove the enable/disable handling and add appropriate condition
> to the irq handler.

> Signed-off-by: Sandy Huang 
> [added an actual commit message]
> Signed-off-by: Heiko Stuebner 
> ---
> Hi Ezequiel,

> this patch came from a discussion I had with Rockchip people over the
> iommu changes and resulting issues back in april, but somehow was
> forgotten and not posted to the lists. Correcting that now.

> So removing the enable/disable voodoo on the shared interrupt is
> the preferred way.

>   drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 13 ++---
>   1 file changed, 7 insertions(+), 7 deletions(-)

> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> index 510cdf0..61493d4 100644
> --- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> @@ -549,8 +549,6 @@ static int vop_enable(struct drm_crtc *crtc)

>  spin_unlock(&vop->reg_lock);

> -   enable_irq(vop->irq);
> -

While this one should be okay (+/- my comment for vop_isr()), because the
hardware is already powered on and clocked at this point...

>  drm_crtc_vblank_on(crtc);

>  return 0;
> @@ -596,8 +594,6 @@ static void vop_crtc_atomic_disable(struct drm_crtc
> *crtc,

>  vop_dsp_hold_valid_irq_disable(vop);

> -   disable_irq(vop->irq);
> -
>  vop->is_enabled = false;

...this one is more tricky. There might be an interrupt handler still
running at this point. disable_irq() waits for any running handler to
complete before disabling, so we might want to call synchronize_irq() after
setting is_enabled to false.
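In miniature, the ordering being suggested here (with synchronize_irq() and friends replaced by stubs that just record when they ran) is:

```c
enum { EV_CLEAR_ENABLED = 1, EV_SYNC_IRQ, EV_TEARDOWN };

static int events[3], nevents;
static void record(int ev) { events[nevents++] = ev; }

/* Disable path: clear the enabled flag first, then wait for any
 * in-flight handler to finish before tearing anything else down. */
static void toy_vop_disable(void)
{
	record(EV_CLEAR_ENABLED); /* vop->is_enabled = false; */
	record(EV_SYNC_IRQ);      /* synchronize_irq(vop->irq); */
	record(EV_TEARDOWN);      /* power off, gate clocks, ... */
}
```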


>  /*
> @@ -1168,6 +1164,13 @@ static irqreturn_t vop_isr(int irq, void *data)
>  int ret = IRQ_NONE;

>  /*
> +* since the irq is shared with iommu, iommu irq is enabled before vop
> +* enable, so before vop enable we do nothing.
> +*/
> +   if (!vop->is_enabled)
> +   return IRQ_NONE;

This doesn't seem to be properly synchronized. We don't even call it under
a spinlock, so no barriers are issued. Perhaps we should make this atomic_t?
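For reference, in userspace C11 terms the pattern would look like this (a sketch using stdatomic; the kernel would use atomic_t or READ_ONCE/WRITE_ONCE plus barriers instead):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Sketch only: is_enabled as an atomic flag. The release store on
 * the writer side pairs with the acquire load on the reader side,
 * so once the ISR observes true it also observes every write made
 * before the device was enabled. */
static atomic_bool toy_is_enabled;

static void toy_enable(void)
{
	/* ... program hardware ... */
	atomic_store_explicit(&toy_is_enabled, true, memory_order_release);
}

static void toy_disable(void)
{
	atomic_store_explicit(&toy_is_enabled, false, memory_order_release);
}

static bool toy_isr_should_run(void)
{
	return atomic_load_explicit(&toy_is_enabled, memory_order_acquire);
}
```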

Best regards,
Tomasz


minigbm/cros_gralloc handle struct (Was: Re: [drm_hwc] PSA: drm_hwc submissions via gitlab)

2018-05-06 Thread Tomasz Figa
[Really taking this to another thread. Leaving CC as is for the time being.]

On Sat, May 5, 2018 at 4:16 AM Mauro Rossi  wrote:

>
> On 4 May 2018 at 19:51, "Alistair Strachan"  wrote:
>
> On Fri, May 4, 2018 at 10:35 AM Sean Paul  wrote:
>
>> On Fri, May 04, 2018 at 05:19:50PM +, Mauro Rossi wrote:
>
> [snip]
>
>> > Another question is for Intel and Chromeos developers: are you
>> > planning to update your minigbm projects to the new common
>> > gralloc_handle.h handle structure in latest libdrm?
>> >
>>
>> I assume yes, but I'm not hooked in to what's happening with minigbm.
>>
>
> +1. We can take this to another thread. The main issue is that minigbm
> assumes a 1:1 planes-to-handles paradigm; the new generic handle does not.
>
>
> Thanks for the info
>

I'm personally skeptical about this, but adding Gurchetan, who is a better
person to comment on this. My biggest concern is that it might not be
possible to define a common handle that is really good for everyone. Also,
since the handle would effectively be a stable ABI between independent
components, changing it to accommodate future features (or discovered
issues) would be problematic.

Generally we designed our graphics stack in Chrome OS / ARC++ in a way that
does not rely on the handle struct at all, so that we are free to change
the struct in any way we want in the future, without any compatibility
concerns (such as the plane layout issue mentioned before).

For the needs of EGL and most other consumers, we are doing well without
any API extensions (most of the time we get some more complete struct,
such as ANativeWindow(Buffer), and for ambiguous formats we adopted the
lock_ycbcr method).

Our hwcomposer is the only exception for which we added some perform calls
to retrieve data such as HAL pixel format, width, height, stride and
backing store (unique ID within some private namespace of the gralloc
instance, having similar properties to GEM handle within one DRI FD).

Best regards,
Tomasz


Re: [PATCH v6 28/30] drm/rockchip: Disable PSR from reboot notifier

2018-04-16 Thread Tomasz Figa
Hi Andrzej,

On Mon, Apr 16, 2018 at 6:57 PM Andrzej Hajda  wrote:

> On 05.04.2018 11:49, Enric Balletbo i Serra wrote:
> > From: Tomasz Figa 
> >
> > It looks like the driver subsystem detaches devices from power domains
> > at shutdown without consent of the drivers.

> It looks a bit strange. Could you elaborate more on it? Could you show
> the code performing the detach?

Not only does it look strange, it is strange. The code was present in 4.4:

https://elixir.bootlin.com/linux/v4.4.128/source/drivers/base/platform.c#L553

but was apparently removed in 4.5:

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/base/platform.c?h=next-20180416&id=2d30bb0b3889adf09b342722b2ce596c0763bc93

So we might not need this patch anymore.

Best regards,
Tomasz


Re: [RFC 0/1] DRM Node Probing functionality

2018-04-15 Thread Tomasz Figa
Hi Rob,

On Thu, Apr 12, 2018 at 12:56 PM Robert Foss 
wrote:

> *Resend the whole actual series*

> This patch implements a simple function for matching DRM device FDs
> against the desired properties of a device that you are looking for.

> The properties that are possible to look for are the elements of
> DrmVersion and DrmDevice.

> The discussion that led to this series can be found here:
> https://lists.freedesktop.org/archives/mesa-dev/2018-January/183878.html

> In the previous discussion we left off having settled on implementing
> this in mesa/src/loader/loader.c, which I initially did. But the code
> ended up being so generic that there's no reason for it not to live in
> libdrm, since it can be used for any compositor and mesa.

> The implementer will still have to iterate through all of the devices
> available on the target, and see if they match. This additional
> functionality could be moved into libdrm at a later point if it turns
> out that all of the users do this in the same manner.
> If there is some variety, for example with selecting fallback devices
> if nothing matches, maybe it is best left up to the user of libdrm.

> The biggest problem with the approach as implemented, the way I see it,
> is how we match against the DrmVersion/DrmDevice of a given FD.
> Currently we use DrmVersion & DrmDevice as supplied by the caller to
> define what we are looking for.
> This is a little bit problematic, especially for DrmDevice, since the
> elements of the struct do not have enough bit space to signal that we
> are uninterested in looking for that specific element. DrmDevice uses
> drmDevicesEqual() to do what amounts to a memcmp of the DrmDevice
> argument and the one of the FD. So looking for any device on any PCI
> bus with just the PCI vendor ID supplied isn't possible.

> An alternative Daniel Stone suggested is adding enums for different
> properties and allowing the caller to supply a list of properties that
> are interesting and their values. In terms of long-term maintainership
> this might be less pleasant than the approach of the current
> implementation.

I'm personally okay with how patch 1 implements the matching. Thanks for
this work. Looking forward to the Mesa probing helper, which uses this! :)
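For comparison, the flags-based alternative mentioned in the cover letter could look roughly like this sketch (all names invented, not the actual libdrm proposal): the caller sets one bit per field it cares about and the matcher ignores the rest.

```c
#include <stdint.h>

/* Invented flags; one bit per field the caller wants matched. */
#define TOY_MATCH_BUS    (1u << 0)
#define TOY_MATCH_VENDOR (1u << 1)
#define TOY_MATCH_DEVICE (1u << 2)

struct toy_drm_device {
	int      bus;       /* e.g. PCI vs platform */
	uint16_t vendor_id;
	uint16_t device_id;
};

/* Returns 1 if every field selected in `flags` matches. Fields
 * without a set bit are "don't care", so matching any device of one
 * PCI vendor needs only TOY_MATCH_VENDOR, unlike a full memcmp. */
static int toy_device_match(const struct toy_drm_device *dev,
			    const struct toy_drm_device *want,
			    uint32_t flags)
{
	if ((flags & TOY_MATCH_BUS) && dev->bus != want->bus)
		return 0;
	if ((flags & TOY_MATCH_VENDOR) && dev->vendor_id != want->vendor_id)
		return 0;
	if ((flags & TOY_MATCH_DEVICE) && dev->device_id != want->device_id)
		return 0;
	return 1;
}
```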

P.S. I normally use my @chromium.org email for mailing lists.

Best regards,
Tomasz


Re: [PATCH libdrm 1/2] intel: Do not use libpciaccess on Android

2018-03-27 Thread Tomasz Figa
On Wed, Mar 21, 2018 at 2:36 AM, Emil Velikov  wrote:
> From: Tomasz Figa 
>
> This patch makes the code no longer rely on libpciaccess when compiled
> for Android, to eliminate the ioperm() and iopl() syscalls required by
> that library. As a side effect, the mappable aperture size is hardcoded
> to 64 MiB on Android; however, nothing seems to rely on this value
> anyway, as checked by grepping the relevant code in drm_gralloc and Mesa.
>
> Cc: John Stultz 
> Cc: Rob Herring 
> Cc: John Stultz 
> Cc: Tomasz Figa 
> Signed-off-by: Tomasz Figa 
> [Emil Velikov: rebase against master]
> Signed-off-by: Emil Velikov 
> ---
> Tomasz, I've taken the liberty of pulling the patch from the Android
> tree. Hope you don't mind.

Thanks Emil for digging it up. I have no objections.

For reference, we used this as a quick hack before moving the build of
graphics components to the Chrome OS side. After that, for a short time,
we had libpciaccess built with autotools (and a small patch disabling
the port-IO-related stuff). Eventually we got rid of it completely, as
Mesa stopped using libdrm_intel for i965 (and we don't use i915).

Best regards,
Tomasz


Re: [PATCH v7 6/6] drm/msm: iommu: Replace runtime calls with runtime suppliers

2018-02-22 Thread Tomasz Figa
On Thu, Feb 22, 2018 at 10:45 PM, Robin Murphy  wrote:
> [sorry, I had intended to reply sooner but clearly forgot]
>
>
> On 16/02/18 00:13, Tomasz Figa wrote:
>>
>> On Fri, Feb 16, 2018 at 2:14 AM, Robin Murphy 
>> wrote:
>>>
>>> On 15/02/18 04:17, Tomasz Figa wrote:
>>> [...]
>>>>>
>>>>>
>>>>> Could you elaborate on what kind of locking you are concerned about?
>>>>> As I explained before, the normally happening fast path would lock
>>>>> dev->power_lock only for the brief moment of incrementing the runtime
>>>>> PM usage counter.
>>>>
>>>>
>>>>
>>>> My bad, that's not even it.
>>>>
>>>> The atomic usage counter is incremented beforehand, without any
>>>> locking [1] and the spinlock is acquired only for the sake of
>>>> validating that device's runtime PM state remained valid indeed [2],
>>>> which would be the case in the fast path of the same driver doing two
>>>> mappings in parallel, with the master powered on (and so the SMMU,
>>>> through device links; if master was not powered on already, powering
>>>> on the SMMU is unavoidable anyway and it would add much more latency
>>>> than the spinlock itself).
>>>
>>>
>>>
>>> We now have no locking at all in the map path, and only a per-domain
>>> lock around TLB sync in unmap which is unfortunately necessary for
>>> correctness; the latter isn't too terrible, since in "serious"
>>> hardware it should only be serialising a few cpus serving the same
>>> device against each other (e.g. for multiple queues on a single NIC).
>>>
>>> Putting in a global lock which serialises *all* concurrent map and
>>> unmap calls for *all* unrelated devices makes things worse. Period.
>>> Even if the lock itself were held for the minimum possible time, i.e.
>>> trivially "spin_lock(&lock); spin_unlock(&lock)", the cost of
>>> repeatedly bouncing that one cache line around between 96 CPUs across
>>> two sockets is not negligible.
>>
>>
>> Fair enough. Note that we're in a quite interesting situation now:
>>   a) We need to have runtime PM enabled on Qualcomm SoC to have power
>> properly managed,
>>   b) We need to have lock-free map/unmap on such distributed systems,
>>   c) If runtime PM is enabled, we need to call into runtime PM from any
>> code that does hardware accesses, otherwise the IOMMU API (and so DMA
>> API and then any V4L2 driver) becomes unusable.
>>
>> I can see one more way that could potentially let us have all the
>> three. How about enabling runtime PM only on selected implementations
>> (e.g. qcom,smmu) and then having all the runtime PM calls surrounded
>> with if (pm_runtime_enabled()), which is lockless?
>
>
> Yes, that's the kind of thing I was gravitating towards - my vague thought
> was adding some flag to the smmu_domain, but pm_runtime_enabled() does look
> conceptually a lot cleaner.

Great, thanks. Looks like we're in agreement now. \o/

Vivek, does this sound reasonable to you?

Best regards,
Tomasz


Re: [PATCH v7 6/6] drm/msm: iommu: Replace runtime calls with runtime suppliers

2018-02-22 Thread Tomasz Figa
On Fri, Feb 16, 2018 at 9:13 AM, Tomasz Figa  wrote:
> On Fri, Feb 16, 2018 at 2:14 AM, Robin Murphy  wrote:
>> On 15/02/18 04:17, Tomasz Figa wrote:
>> [...]
>>>>
>>>> Could you elaborate on what kind of locking you are concerned about?
>>>> As I explained before, the normally happening fast path would lock
>>>> dev->power_lock only for the brief moment of incrementing the runtime
>>>> PM usage counter.
>>>
>>>
>>> My bad, that's not even it.
>>>
>>> The atomic usage counter is incremented beforehand, without any
>>> locking [1] and the spinlock is acquired only for the sake of
>>> validating that device's runtime PM state remained valid indeed [2],
>>> which would be the case in the fast path of the same driver doing two
>>> mappings in parallel, with the master powered on (and so the SMMU,
>>> through device links; if master was not powered on already, powering
>>> on the SMMU is unavoidable anyway and it would add much more latency
>>> than the spinlock itself).
>>
>>
>> We now have no locking at all in the map path, and only a per-domain lock
>> around TLB sync in unmap which is unfortunately necessary for correctness;
>> the latter isn't too terrible, since in "serious" hardware it should only be
>> serialising a few cpus serving the same device against each other (e.g. for
>> multiple queues on a single NIC).
>>
>> Putting in a global lock which serialises *all* concurrent map and unmap
>> calls for *all* unrelated devices makes things worse. Period. Even if the
>> lock itself were held for the minimum possible time, i.e. trivially
>> "spin_lock(&lock); spin_unlock(&lock)", the cost of repeatedly bouncing that
>> one cache line around between 96 CPUs across two sockets is not negligible.
>
> Fair enough. Note that we're in a quite interesting situation now:
>  a) We need to have runtime PM enabled on Qualcomm SoC to have power
> properly managed,
>  b) We need to have lock-free map/unmap on such distributed systems,
>  c) If runtime PM is enabled, we need to call into runtime PM from any
> code that does hardware accesses, otherwise the IOMMU API (and so DMA
> API and then any V4L2 driver) becomes unusable.
>
> I can see one more way that could potentially let us have all the
> three. How about enabling runtime PM only on selected implementations
> (e.g. qcom,smmu) and then having all the runtime PM calls surrounded
> with if (pm_runtime_enabled()), which is lockless?
>

Sorry for pinging, but any opinion on this kind of approach?

Best regards,
Tomasz

>>
>>> [1]
>>> http://elixir.free-electrons.com/linux/v4.16-rc1/source/drivers/base/power/runtime.c#L1028
>>> [2]
>>> http://elixir.free-electrons.com/linux/v4.16-rc1/source/drivers/base/power/runtime.c#L613
>>>
>>> In any case, I can't imagine this working with V4L2 or anything else
>>> relying on any memory management more generic than calling IOMMU API
>>> directly from the driver, with the IOMMU device having runtime PM
>>> enabled, but without managing the runtime PM from the IOMMU driver's
>>> callbacks that need access to the hardware. As I mentioned before,
>>> only the IOMMU driver knows when exactly the real hardware access
>>> needs to be done (e.g. Rockchip/Exynos don't need to do that for
>>> map/unmap if the power is down, but some implementations of SMMU with
>>> TLB powered separately might need to do so).
>>
>>
>> It's worth noting that Exynos and Rockchip are relatively small
>> self-contained IP blocks integrated closely with the interfaces of their
>> relevant master devices; SMMU is an architecture, implementations of which
>> may be large, distributed, and have complex and wildly differing internal
>> topologies. As such, it's a lot harder to make hardware-specific assumptions
>> and/or be correct for all possible cases.
>>
>> Don't get me wrong, I do ultimately agree that the IOMMU driver is the only
>> agent who ultimately knows what calls are going to be necessary for whatever
>> operation it's performing on its own hardware*; it's just that for SMMU it
>> needs to be implemented in a way that has zero impact on the cases where it
>> doesn't matter, because it's not viable to specialise that driver for any
>> particular IP implementation/use-case.
>
> Still, exactly the same holds for the low power embedded use cases,
> where we strive for the lowest possible power consumption while keeping
> performance high as well. And so the SMMU code is expected to also work
> with our use cases, such as V4L2 or DRM drivers. Since these points
> don't hold for the current SMMU code, I could say that it has already
> been specialized for large, distributed implementations.
>
> Best regards,
> Tomasz


Re: [PATCH v7 6/6] drm/msm: iommu: Replace runtime calls with runtime suppliers

2018-02-15 Thread Tomasz Figa
On Fri, Feb 16, 2018 at 2:14 AM, Robin Murphy  wrote:
> On 15/02/18 04:17, Tomasz Figa wrote:
> [...]
>>>
>>> Could you elaborate on what kind of locking you are concerned about?
>>> As I explained before, the normally happening fast path would lock
>>> dev->power_lock only for the brief moment of incrementing the runtime
>>> PM usage counter.
>>
>>
>> My bad, that's not even it.
>>
>> The atomic usage counter is incremented beforehand, without any
>> locking [1] and the spinlock is acquired only for the sake of
>> validating that device's runtime PM state remained valid indeed [2],
>> which would be the case in the fast path of the same driver doing two
>> mappings in parallel, with the master powered on (and so the SMMU,
>> through device links; if master was not powered on already, powering
>> on the SMMU is unavoidable anyway and it would add much more latency
>> than the spinlock itself).
>
>
> We now have no locking at all in the map path, and only a per-domain lock
> around TLB sync in unmap which is unfortunately necessary for correctness;
> the latter isn't too terrible, since in "serious" hardware it should only be
> serialising a few cpus serving the same device against each other (e.g. for
> multiple queues on a single NIC).
>
> Putting in a global lock which serialises *all* concurrent map and unmap
> calls for *all* unrelated devices makes things worse. Period. Even if the
> lock itself were held for the minimum possible time, i.e. trivially
> "spin_lock(&lock); spin_unlock(&lock)", the cost of repeatedly bouncing that
> one cache line around between 96 CPUs across two sockets is not negligible.

Fair enough. Note that we're in a quite interesting situation now:
 a) We need to have runtime PM enabled on Qualcomm SoC to have power
properly managed,
 b) We need to have lock-free map/unmap on such distributed systems,
 c) If runtime PM is enabled, we need to call into runtime PM from any
code that does hardware accesses, otherwise the IOMMU API (and so DMA
API and then any V4L2 driver) becomes unusable.

I can see one more way that could potentially let us have all the
three. How about enabling runtime PM only on selected implementations
(e.g. qcom,smmu) and then having all the runtime PM calls surrounded
with if (pm_runtime_enabled()), which is lockless?
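For illustration, here is a self-contained userspace model of that guard; mock_rpm_get()/mock_rpm_put() and smmu_map_model() are made-up stand-ins for pm_runtime_get_sync()/pm_runtime_put() and the real map path, so this is a sketch of the control flow rather than kernel code (it also assumes the enabled flag is set once at probe and never changes, which is the premise of enabling runtime PM only on selected implementations):

```c
#include <assert.h>
#include <stdbool.h>

/* Model of an SMMU instance; rpm_enabled is set once at probe time
 * and read locklessly afterwards. */
struct smmu_model {
	bool rpm_enabled;
	int  active_gets;   /* outstanding mocked pm_runtime_get calls */
	int  total_gets;    /* how many times the slow path was taken */
};

static void mock_rpm_get(struct smmu_model *s)
{
	s->active_gets++;
	s->total_gets++;
}

static void mock_rpm_put(struct smmu_model *s)
{
	s->active_gets--;
}

static void smmu_map_model(struct smmu_model *s)
{
	/* Platforms that never enable runtime PM (the big distributed
	 * SMMUs) take only this branch, touching no lock and no shared
	 * cache line. */
	if (s->rpm_enabled)
		mock_rpm_get(s);

	/* ... page table update would go here ... */

	if (s->rpm_enabled)
		mock_rpm_put(s);
}
```

The point of the pattern is that the distributed server systems pay only for a read of a never-written flag, while the Qualcomm-style SoCs still get their power domains managed.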

>
>> [1]
>> http://elixir.free-electrons.com/linux/v4.16-rc1/source/drivers/base/power/runtime.c#L1028
>> [2]
>> http://elixir.free-electrons.com/linux/v4.16-rc1/source/drivers/base/power/runtime.c#L613
>>
>> In any case, I can't imagine this working with V4L2 or anything else
>> relying on any memory management more generic than calling IOMMU API
>> directly from the driver, with the IOMMU device having runtime PM
>> enabled, but without managing the runtime PM from the IOMMU driver's
>> callbacks that need access to the hardware. As I mentioned before,
>> only the IOMMU driver knows when exactly the real hardware access
>> needs to be done (e.g. Rockchip/Exynos don't need to do that for
>> map/unmap if the power is down, but some implementations of SMMU with
>> TLB powered separately might need to do so).
>
>
> It's worth noting that Exynos and Rockchip are relatively small
> self-contained IP blocks integrated closely with the interfaces of their
> relevant master devices; SMMU is an architecture, implementations of which
> may be large, distributed, and have complex and wildly differing internal
> topologies. As such, it's a lot harder to make hardware-specific assumptions
> and/or be correct for all possible cases.
>
> Don't get me wrong, I do ultimately agree that the IOMMU driver is the only
> agent who ultimately knows what calls are going to be necessary for whatever
> operation it's performing on its own hardware*; it's just that for SMMU it
> needs to be implemented in a way that has zero impact on the cases where it
> doesn't matter, because it's not viable to specialise that driver for any
> particular IP implementation/use-case.

Still, exactly the same holds for the low power embedded use cases,
where we strive for the lowest possible power consumption while keeping
performance high as well. And so the SMMU code is expected to also work
with our use cases, such as V4L2 or DRM drivers. Since these points
don't hold for the current SMMU code, I could say that it has already
been specialized for large, distributed implementations.

Best regards,
Tomasz


Re: [PATCH v7 6/6] drm/msm: iommu: Replace runtime calls with runtime suppliers

2018-02-14 Thread Tomasz Figa
On Thu, Feb 15, 2018 at 12:17 PM, Tomasz Figa  wrote:
> On Thu, Feb 15, 2018 at 1:03 AM, Robin Murphy  wrote:
>> On 14/02/18 10:33, Vivek Gautam wrote:
>>>
>>> On Wed, Feb 14, 2018 at 2:46 PM, Tomasz Figa  wrote:
>>>
>>> Adding Jordan to this thread as well.
>>>
>>>> On Wed, Feb 14, 2018 at 6:13 PM, Vivek Gautam
>>>>  wrote:
>>>>>
>>>>> Hi Tomasz,
>>>>>
>>>>> On Wed, Feb 14, 2018 at 11:08 AM, Tomasz Figa 
>>>>> wrote:
>>>>>>
>>>>>> On Wed, Feb 14, 2018 at 1:17 PM, Vivek Gautam
>>>>>>  wrote:
>>>>>>>
>>>>>>> Hi Tomasz,
>>>>>>>
>>>>>>> On Wed, Feb 14, 2018 at 8:31 AM, Tomasz Figa 
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On Wed, Feb 14, 2018 at 11:13 AM, Rob Clark 
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> On Tue, Feb 13, 2018 at 8:59 PM, Tomasz Figa 
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> On Wed, Feb 14, 2018 at 3:03 AM, Rob Clark 
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Feb 13, 2018 at 4:10 AM, Tomasz Figa 
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Vivek,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the patch. Please see my comments inline.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Feb 7, 2018 at 7:31 PM, Vivek Gautam
>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> While handling the concerned iommu, there should not be a
>>>>>>>>>>>>> need to power control the drm devices from iommu interface.
>>>>>>>>>>>>> If these drm devices need to be powered around this time,
>>>>>>>>>>>>> the respective drivers should take care of this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Replace the pm_runtime_get/put_sync() with
>>>>>>>>>>>>> pm_runtime_get/put_suppliers() calls, to power-up
>>>>>>>>>>>>> the connected iommu through the device link interface.
>>>>>>>>>>>>> In case the device link is not setup these get/put_suppliers()
>>>>>>>>>>>>> calls will be a no-op, and the iommu driver should take care of
>>>>>>>>>>>>> powering on its devices accordingly.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Vivek Gautam 
>>>>>>>>>>>>> ---
>>>>>>>>>>>>>   drivers/gpu/drm/msm/msm_iommu.c | 16 
>>>>>>>>>>>>>   1 file changed, 8 insertions(+), 8 deletions(-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/msm/msm_iommu.c
>>>>>>>>>>>>> b/drivers/gpu/drm/msm/msm_iommu.c
>>>>>>>>>>>>> index b23d33622f37..1ab629bbee69 100644
>>>>>>>>>>>>> --- a/drivers/gpu/drm/msm/msm_iommu.c
>>>>>>>>>>>>> +++ b/drivers/gpu/drm/msm/msm_iommu.c
>>>>>>>>>>>>> @@ -40,9 +40,9 @@ static int msm_iommu_attach(struct msm_mmu
>>>>>>>>>>>>> *mmu, const char * const *names,
>>>>>>>>>>>>>  struct msm_iommu *iommu = to_msm_iommu(mmu);
>>>>>>>>>>>>>  int ret;
>>>>>>>>>>>>>
>>>>>>>>>>>>> -   pm_runtime_get_sync(mmu->dev);
>>>>>>>>>>>>> +   pm_runtime_get_suppliers(mmu->dev);
>>>>>>>>>>>>>  ret = iommu_attach_device(iommu->domain, mmu->dev);
>>>>>>>>>>>>> -   pm_runtime_put_sync(mmu->dev);

Re: [Freedreno] [PATCH v7 6/6] drm/msm: iommu: Replace runtime calls with runtime suppliers

2018-02-14 Thread Tomasz Figa
On Thu, Feb 15, 2018 at 1:12 AM, Rob Clark  wrote:
> On Wed, Feb 14, 2018 at 10:48 AM, Jordan Crouse  
> wrote:
>> On Wed, Feb 14, 2018 at 12:31:29PM +0900, Tomasz Figa wrote:
>>>
>>> - When submitting commands to the GPU, the GPU driver will
>>> pm_runtime_get_sync() on the GPU device, which will automatically do
>>> the same on all the linked suppliers, which would also include the
>>> SMMU itself. The role of device links here is exactly that the GPU
>>> driver doesn't have to care which other devices need to be brought up.
>>
>> This is true.  Assuming that the device link works correctly we would
>> not need to explicitly power the SMMU which makes my point entirely
>> moot.
>
> Just to point out what motivated this patchset, the biggest problem is
> iommu_unmap() because that can happen when GPU is not powered on (or
> in the v4l2 case, because some other device dropped its reference to
> the dma-buf allowing it to be free'd).  Currently we pm get/put the
> GPU device around unmap, but it is kinda silly to boot up the GPU just
> to unmap a buffer.

Note that in V4L2 both mapping and unmapping can happen completely
without involving the driver. So AFAICT the approach being implemented
by this patchset will not work, because there will be no one to power
up the IOMMU before the operation. Moreover, there are platforms for
which there is no reason to power up the IOMMU just for map/unmap,
because the hardware state is lost anyway and the only real work
needed is updating the page tables in memory. (I feel like this is
actually true for most of the platforms in the wild, but this is based
purely on the fairly large number of platforms I have worked with; I
haven't looked for more general evidence.)
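As a rough model of what such a driver does (all names here are illustrative, not a kernel API): the page-table update always happens in memory, and the hardware access — reduced here to a TLB-flush counter — is skipped while the domain is powered down, since the TLB contents are lost across a power cycle anyway:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NPTES 16

/* Illustrative model of an IOMMU domain of the Rockchip/Exynos kind
 * described above. */
struct iommu_model {
	uint64_t pt[NPTES];   /* in-memory page table, always valid */
	bool powered;
	int  tlb_flushes;     /* counts real hardware accesses */
};

static void model_unmap(struct iommu_model *m, unsigned int pte)
{
	m->pt[pte] = 0;        /* always: pure memory update */
	if (m->powered)
		m->tlb_flushes++;  /* hardware access only if powered */
}
```

With this structure, an unmap triggered by a dma-buf release while the master (and hence the IOMMU) is suspended costs nothing but a memory write, and nothing needs to be powered up by anyone.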

>
> (Semi-related, I would also like to batch map/unmap's, I just haven't
> gotten around to implementing it yet.. but that would be another case
> where a single get_supplier()/put_supplier() outside of the iommu
> would make sense instead of pm_get/put() inside the iommu driver's
> ->unmap().)
>
> If you really dislike the get/put_supplier() approach, then perhaps we
> need iommu_pm_get()/iommu_pm_put() operations that the iommu user
> could use to accomplish the same thing?

I'm afraid this wouldn't work for V4L2 either. And I still haven't
been given any evidence that the approach I'm suggesting, which relies
only on existing pieces of infrastructure and which worked for both
Exynos and Rockchip, including V4L2, wouldn't work for SMMU and/or QC
SoCs.

Best regards,
Tomasz

