Re: [RFC]: shmem fd for non-DMA buffer sharing cross drivers

2023-08-25 Thread Hsia-Jun Li




On 8/25/23 15:40, Pekka Paalanen wrote:

Subject: Re: [RFC]: shmem fd for non-DMA buffer sharing cross drivers
From: Pekka Paalanen
Date: 8/25/23, 15:40
To: Hsia-Jun Li
CC: Tomasz Figa, linux...@kvack.org, dri-devel@lists.freedesktop.org,
Linux Media Mailing List, hu...@google.com, a...@linux-foundation.org,
Simon Ser, Hans Verkuil, dani...@collabora.com, ayaka,
linux-ker...@vger.kernel.org, Nicolas Dufresne




On Wed, 23 Aug 2023 15:11:23 +0800
Hsia-Jun Li  wrote:


On 8/23/23 12:46, Tomasz Figa wrote:



Hi Hsia-Jun,

On Tue, Aug 22, 2023 at 8:14 PM Hsia-Jun Li  wrote:

Hello

I would like to introduce a usage of SHMEM similar to DMA-buf. Its major
purpose is sharing metadata, or serving as a pure container, across
drivers.

We need to exchange some sort of metadata between drivers, like dynamic
HDR data between video4linux2 and DRM.

If the metadata isn't too big, would it be enough to just have the
kernel copy_from_user() to a kernel buffer in the ioctl code?
   

Or the graphics frame buffer is
too complex to be described with a plain per-plane DMA-buf fd.
An issue between DRM and V4L2 is that DRM supports only 4 planes
while V4L2 supports 8. It would be pretty hard for DRM to extend its
interface to support 4 more planes, which would lead to revisions of
many standards like Vulkan and EGL.

Could you explain how a shmem buffer could be used to support frame
buffers with more than 4 planes?
If you are asking why we need this:

1. Metadata like dynamic HDR tone data.
2. DRM also struggles with this problem; let me quote what sima said:
"another trick that we iirc used for afbc is that sometimes the planes
have a fixed layout
like nv12
and so logically it's multiple planes, but you only need one plane slot
to describe the buffer
since I think afbc had the "we need more than 4 planes" issue too"

Unfortunately, there are vendor pixel formats that do not have a fixed layout.

3. Secure (REE, trusted video pipeline) info.

As for how to assign such metadata:
in the case of a DRM fb_id it is simple, we just add a DRM plane property
for it. The V4L2 interface is not as flexible; we could only put it into
the CAPTURE request_fd as a control.

Also, there is no reason to consume a device's memory for content
that the device can't read, or to waste an IOMMU entry on such data.

That's right, but DMA-buf doesn't really imply any of those. DMA-buf
is just a kernel object with some backing memory. It's up to the
allocator to decide how the backing memory is allocated and up to the
importer on whether it would be mapped into an IOMMU.
   

I just want to say it can't be allocated in the same place as
those DMA bufs (graphics or compressed bitstream).
This could also answer your first question: if we place this kind
of buffer in a plane for DMABUF (importing) in V4L2, the V4L2 core would
try to prepare it, which could map it into the IOMMU.


Usually, such metadata would be values to be written to a
hardware's registers; a 4KiB page would hold 1024 32-bit registers.

Still, I have some problems with SHMEM:
1. I don't want userspace to modify the contents of the SHMEM allocated
by the kernel. Is there a way to do so?

This is generally impossible without doing any of the two:
1) copying the contents to an internal buffer not accessible to the
userspace, OR
2) modifying any of the buffer mappings to read-only

2) can actually be more costly than 1) (depending on the architecture,
data size, etc.), so we shouldn't just discard the option of a simple
copy_from_user() in the ioctl.
   

I don't want userspace to access it at all, so that won't be a problem.

Hi,

if userspace cannot access things like an image's HDR metadata, then it
will be impossible for userspace to program KMS to have the correct
color pipeline, or to send intended HDR metadata to a video sink.

You cannot leave userspace out of HDR metadata handling, because quite
probably the V4L2 buffer is not the only thing on screen. That means
there must be composition of multiple sources with different image
properties and metadata, which means it is no longer obvious what HDR
metadata should be sent to the video sink.

Even if it is a TV-like application rather than a windowed desktop, you
will still have other contents to composite: OSD (volume indicators,
channels indicators, program guide, ...), sub-titles, channel logos,
notifications... These components ideally should not change their
appearance arbitrarily with the main program content and metadata
changes. Either the metadata sent to the video sink is kept static and
the main program adapted on the fly, or main program metadata is sent
to the video sink and the additional content is adapted on the fly.

There is only one set of HDR

Re: [RFC]: shmem fd for non-DMA buffer sharing cross drivers

2023-08-25 Thread Hsia-Jun Li




On 8/23/23 21:15, Tomasz Figa wrote:



On Wed, Aug 23, 2023 at 4:11 PM Hsia-Jun Li  wrote:




On 8/23/23 12:46, Tomasz Figa wrote:



Hi Hsia-Jun,

On Tue, Aug 22, 2023 at 8:14 PM Hsia-Jun Li  wrote:


Hello

I would like to introduce a usage of SHMEM similar to DMA-buf. Its major
purpose is sharing metadata, or serving as a pure container, across
drivers.

We need to exchange some sort of metadata between drivers, like dynamic
HDR data between video4linux2 and DRM.


If the metadata isn't too big, would it be enough to just have the
kernel copy_from_user() to a kernel buffer in the ioctl code?


Or the graphics frame buffer is
too complex to be described with a plain per-plane DMA-buf fd.
An issue between DRM and V4L2 is that DRM supports only 4 planes
while V4L2 supports 8. It would be pretty hard for DRM to extend its
interface to support 4 more planes, which would lead to revisions of
many standards like Vulkan and EGL.


Could you explain how a shmem buffer could be used to support frame
buffers with more than 4 planes?
If you are asking why we need this:


I'm asking how your proposal to use shmem FD solves the problem for those cases.

The shmem fd is a reference to a metadata container (a C struct in the
kernel). Drivers (V4L2 and DRM) could then read this metadata when they
process the major buffer (the SHMEM buf is the buffer associated with a
major buffer, like the graphics buffer).

1. Metadata like dynamic HDR tone data.
2. DRM also struggles with this problem; let me quote what sima said:
"another trick that we iirc used for afbc is that sometimes the planes
have a fixed layout
like nv12
and so logically it's multiple planes, but you only need one plane slot
to describe the buffer
since I think afbc had the "we need more than 4 planes" issue too"

Unfortunately, there are vendor pixel formats that do not have a fixed layout.

3. Secure (REE, trusted video pipeline) info.

As for how to assign such metadata:
in the case of a DRM fb_id it is simple, we just add a DRM plane property
for it. The V4L2 interface is not as flexible; we could only put it into
the CAPTURE request_fd as a control.


Also, there is no reason to consume a device's memory for content
that the device can't read, or to waste an IOMMU entry on such data.


That's right, but DMA-buf doesn't really imply any of those. DMA-buf
is just a kernel object with some backing memory. It's up to the
allocator to decide how the backing memory is allocated and up to the
importer on whether it would be mapped into an IOMMU.


I just want to say it can't be allocated in the same place as
those DMA bufs (graphics or compressed bitstream).
This could also answer your first question: if we place this kind
of buffer in a plane for DMABUF (importing) in V4L2, the V4L2 core would
try to prepare it, which could map it into the IOMMU.



V4L2 core will prepare it according to the struct device that is given
to it. For the planes that don't have to go to the hardware a struct
device could be given that doesn't require any DMA mapping. Also you
can check how the uvcvideo driver handles it. It doesn't use the vb2

Is that because it uses vb2_vmalloc_memops?
That vb2_vmalloc_attach_dmabuf() wouldn't do anything.

buffers directly, but always writes to them using CPU (due to how the

Yes, I noticed it would copy the URB buffer to the vb2 buffer.

UVC protocol is designed).
I don't know what stops that. Is it because we can't assume xHCI or EHCI
have an IOMMU?


I think that is not what I want, if you were not talking about
META_CAPTURE, which would be an isolated dev node.

For example, we have an NV15 (2 planes) buffer with its HDR data.
We need its NV15 planes to be accessed by DMA directly, or it would be a
performance issue (so the UVC memcpy is not acceptable), while its HDR
data, which we just read from the device's registers or somewhere, should
ship with exactly that buffer.


Even if we could extend the vb2_mem_ops interfaces to let them know which
plane is which (e.g. planes 0 and 1 are graphics, plane 2 is the metadata),
the purpose here is to not involve the metadata buffer in any DMA buffer
procedure.

Usually, such metadata would be values to be written to a
hardware's registers; a 4KiB page would hold 1024 32-bit registers.

Still, I have some problems with SHMEM:
1. I don't want userspace to modify the contents of the SHMEM allocated
by the kernel. Is there a way to do so?


This is generally impossible without doing any of the two:
1) copying the contents to an internal buffer not accessible to the
userspace, OR
2) modifying any of the buffer mappings to read-only

2) can actually be

Re: [RFC]: shmem fd for non-DMA buffer sharing cross drivers

2023-08-23 Thread Hsia-Jun Li




On 8/23/23 12:46, Tomasz Figa wrote:



Hi Hsia-Jun,

On Tue, Aug 22, 2023 at 8:14 PM Hsia-Jun Li  wrote:


Hello

I would like to introduce a usage of SHMEM similar to DMA-buf. Its major
purpose is sharing metadata, or serving as a pure container, across
drivers.

We need to exchange some sort of metadata between drivers, like dynamic
HDR data between video4linux2 and DRM.


If the metadata isn't too big, would it be enough to just have the
kernel copy_from_user() to a kernel buffer in the ioctl code?


Or the graphics frame buffer is
too complex to be described with a plain per-plane DMA-buf fd.
An issue between DRM and V4L2 is that DRM supports only 4 planes
while V4L2 supports 8. It would be pretty hard for DRM to extend its
interface to support 4 more planes, which would lead to revisions of
many standards like Vulkan and EGL.


Could you explain how a shmem buffer could be used to support frame
buffers with more than 4 planes?
If you are asking why we need this:

1. Metadata like dynamic HDR tone data.
2. DRM also struggles with this problem; let me quote what sima said:
"another trick that we iirc used for afbc is that sometimes the planes
have a fixed layout
like nv12
and so logically it's multiple planes, but you only need one plane slot
to describe the buffer
since I think afbc had the "we need more than 4 planes" issue too"

Unfortunately, there are vendor pixel formats that do not have a fixed layout.

3. Secure (REE, trusted video pipeline) info.

As for how to assign such metadata:
in the case of a DRM fb_id it is simple, we just add a DRM plane property
for it. The V4L2 interface is not as flexible; we could only put it into
the CAPTURE request_fd as a control.


Also, there is no reason to consume a device's memory for content
that the device can't read, or to waste an IOMMU entry on such data.


That's right, but DMA-buf doesn't really imply any of those. DMA-buf
is just a kernel object with some backing memory. It's up to the
allocator to decide how the backing memory is allocated and up to the
importer on whether it would be mapped into an IOMMU.

I just want to say it can't be allocated in the same place as
those DMA bufs (graphics or compressed bitstream).
This could also answer your first question: if we place this kind
of buffer in a plane for DMABUF (importing) in V4L2, the V4L2 core would
try to prepare it, which could map it into the IOMMU.



Usually, such metadata would be values to be written to a
hardware's registers; a 4KiB page would hold 1024 32-bit registers.

Still, I have some problems with SHMEM:
1. I don't want userspace to modify the contents of the SHMEM allocated
by the kernel. Is there a way to do so?


This is generally impossible without doing any of the two:
1) copying the contents to an internal buffer not accessible to the
userspace, OR
2) modifying any of the buffer mappings to read-only

2) can actually be more costly than 1) (depending on the architecture,
data size, etc.), so we shouldn't just discard the option of a simple
copy_from_user() in the ioctl.


I don't want userspace to access it at all, so that won't be a problem.

2. Should I create a helper function for installing the SHMEM file as a fd?


We already have the udmabuf device [1] to turn a memfd into a DMA-buf,
so maybe that would be enough?

[1] https://elixir.bootlin.com/linux/v6.5-rc7/source/drivers/dma-buf/udmabuf.c

It is the kernel driver that allocates this buffer. For example, a V4L2
CAPTURE queue allocates a buffer for metadata on VIDIOC_REQBUFS.

Or GBM gives you a fd which is associated with a surface.

So we need a kernel interface.

Best,
Tomasz



--
Hsia-Jun(Randy) Li


--
Hsia-Jun(Randy) Li


Re: [RFC]: shmem fd for non-DMA buffer sharing cross drivers

2023-08-22 Thread Hsia-Jun Li




On 8/23/23 03:55, Nicolas Dufresne wrote:



Hi,

Le mardi 22 août 2023 à 19:14 +0800, Hsia-Jun Li a écrit :

Hello

I would like to introduce a usage of SHMEM similar to DMA-buf. Its major
purpose is sharing metadata, or serving as a pure container, across
drivers.

We need to exchange some sort of metadata between drivers, like dynamic
HDR data between video4linux2 and DRM. Or the graphics frame buffer is
too complex to be described with a plain per-plane DMA-buf fd.
An issue between DRM and V4L2 is that DRM supports only 4 planes
while V4L2 supports 8. It would be pretty hard for DRM to extend its
interface to support 4 more planes, which would lead to revisions of
many standards like Vulkan and EGL.

Also, there is no reason to consume a device's memory for content
that the device can't read, or to waste an IOMMU entry on such data.
Usually, such metadata would be values to be written to a
hardware's registers; a 4KiB page would hold 1024 32-bit registers.

Still, I have some problems with SHMEM:
1. I don't want userspace to modify the contents of the SHMEM allocated
by the kernel. Is there a way to do so?
2. Should I create a helper function for installing the SHMEM file as a fd?


Please have a look at memfd and the seal feature, it does cover the reason why


That is the implementation I need, but it would affect userspace, not
kernel space. Should I extend a kAPI for memfd or just take the
implementation for SHMEM?

This interface needs to offer three things:
1. a fd for userspace to exchange between drivers
2. a kernel virtual address for access
3. userspace SEAL

Meanwhile, I am wondering whether we should offer a generic context
header for such usage. Or we need other fields in a driver to describe it.

struct shmem_generic_container {
	u64 format;	/* use DRM modifier vendor bits but */
	u32 size;	/* size of the payload */
	u8 payload[];
};
/* format linear for nesting dolls context */
struct shmem_nesting_container {
	u32 num;
	u64 formats[num];
	u32 sizes[num];
	u32 offsets[num];	/* offset from the payload below */
	u8 payload[];
};

unsealed shared memory requires full trust. For controls, SEAL_WRITE is
even needed since, with appropriate timing, a malicious process can modify
the data in between validation and allocation, possibly causing a memory
overflow.

https://man7.org/linux/man-pages/man2/memfd_create.2.html
File sealing
In the absence of file sealing, processes that communicate via
shared memory must either trust each other, or take measures to
deal with the possibility that an untrusted peer may manipulate
the shared memory region in problematic ways.  For example, an
untrusted peer might modify the contents of the shared memory at
any time, or shrink the shared memory region.  The former
possibility leaves the local process vulnerable to time-of-check-
to-time-of-use race conditions (typically dealt with by copying
data from the shared memory region before checking and using it).
The latter possibility leaves the local process vulnerable to
SIGBUS signals when an attempt is made to access a now-
nonexistent location in the shared memory region.  (Dealing with
this possibility necessitates the use of a handler for the SIGBUS
signal.)

Dealing with untrusted peers imposes extra complexity on code
that employs shared memory.  Memory sealing enables that extra
complexity to be eliminated, by allowing a process to operate
secure in the knowledge that its peer can't modify the shared
memory in an undesired fashion.
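
The sealing behaviour described above can be tried from userspace with a small C sketch. It only uses memfd_create(2) and fcntl(2) as documented in the man pages; the helper names and the "metadata" label are assumptions made for illustration, and it shows the userspace side only, not the kernel interface the thread is asking about.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a memfd holding `len` bytes copied from
 * `data`, then seal it so that no holder of the fd can write to it or
 * resize it anymore. Returns the fd, or -1 on failure.
 * The name "metadata" is only a debugging label.
 */
static int make_sealed_memfd(const void *data, size_t len)
{
	int fd = memfd_create("metadata", MFD_CLOEXEC | MFD_ALLOW_SEALING);

	if (fd < 0)
		return -1;

	if (write(fd, data, len) != (ssize_t)len)
		goto err;

	/* F_SEAL_SEAL additionally forbids adding further seals later. */
	if (fcntl(fd, F_ADD_SEALS,
		  F_SEAL_WRITE | F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_SEAL) < 0)
		goto err;

	return fd;
err:
	close(fd);
	return -1;
}

/* Returns 1 if a write to the sealed fd is rejected with EPERM, else 0. */
static int write_is_rejected(int fd)
{
	char byte = 0;

	errno = 0;
	return pwrite(fd, &byte, 1, 0) < 0 && errno == EPERM;
}
```

A receiver would typically call fcntl(fd, F_GET_SEALS) and check for F_SEAL_WRITE and F_SEAL_SHRINK before trusting the contents, which closes the time-of-check-to-time-of-use window mentioned in the excerpt.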

[...]

regards,
Nicolas


--
Hsia-Jun(Randy) Li


[RFC]: shmem fd for non-DMA buffer sharing cross drivers

2023-08-22 Thread Hsia-Jun Li

Hello

I would like to introduce a usage of SHMEM similar to DMA-buf. Its major
purpose is sharing metadata, or serving as a pure container, across
drivers.


We need to exchange some sort of metadata between drivers, like dynamic
HDR data between video4linux2 and DRM. Or the graphics frame buffer is
too complex to be described with a plain per-plane DMA-buf fd.
An issue between DRM and V4L2 is that DRM supports only 4 planes
while V4L2 supports 8. It would be pretty hard for DRM to extend its
interface to support 4 more planes, which would lead to revisions of
many standards like Vulkan and EGL.


Also, there is no reason to consume a device's memory for content
that the device can't read, or to waste an IOMMU entry on such data.
Usually, such metadata would be values to be written to a
hardware's registers; a 4KiB page would hold 1024 32-bit registers.


Still, I have some problems with SHMEM:
1. I don't want userspace to modify the contents of the SHMEM allocated
by the kernel. Is there a way to do so?

2. Should I create a helper function for installing the SHMEM file as a fd?

--
Hsia-Jun(Randy) Li


[PATCH] dma-buf/heaps: map CMA all pages for user

2023-08-21 Thread Hsia-Jun Li

Mapping pages on demand raises a CPU exception for every page fault,
which is not a good idea. Map all pages at mmap() time instead.

Signed-off-by: Hsia-Jun(Randy) Li 
---
 drivers/dma-buf/heaps/cma_heap.c | 26 +++---
 1 file changed, 3 insertions(+), 23 deletions(-)

diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c
index ee899f8e6721..7d0b15ad21a7 100644
--- a/drivers/dma-buf/heaps/cma_heap.c
+++ b/drivers/dma-buf/heaps/cma_heap.c
@@ -160,35 +160,15 @@ static int cma_heap_dma_buf_end_cpu_access(struct dma_buf *dmabuf,
 	return 0;
 }
 
-static vm_fault_t cma_heap_vm_fault(struct vm_fault *vmf)
-{
-	struct vm_area_struct *vma = vmf->vma;
-	struct cma_heap_buffer *buffer = vma->vm_private_data;
-
-	if (vmf->pgoff > buffer->pagecount)
-		return VM_FAULT_SIGBUS;
-
-	vmf->page = buffer->pages[vmf->pgoff];
-	get_page(vmf->page);
-
-	return 0;
-}
-
-static const struct vm_operations_struct dma_heap_vm_ops = {
-	.fault = cma_heap_vm_fault,
-};
-
 static int cma_heap_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
 {
 	struct cma_heap_buffer *buffer = dmabuf->priv;
-
 	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0)
 		return -EINVAL;
 
-	vma->vm_ops = &dma_heap_vm_ops;
-	vma->vm_private_data = buffer;
-
-	return 0;
+	return remap_pfn_range(vma, vma->vm_start,
+			       page_to_pfn(buffer->pages[vma->vm_pgoff]),
+			       vma->vm_end - vma->vm_start, vma->vm_page_prot);
 }
 
 static void *cma_heap_do_vmap(struct cma_heap_buffer *buffer)
--
2.17.1



[PATCH] RFC: drm: Create a alloc helper flags blob

2023-03-22 Thread Hsia-Jun Li
From: "Hsia-Jun(Randy) Li" 

In Android, we can call gralloc to allocate a
graphics buffer for the decoder, display or encoder.
On GNU/Linux we don't have such a framework; the only
thing we have is GBM.
Unfortunately, platforms that don't have a GPU may not
ship the GBM library, or GBM is part of a proprietary
GPU driver. They may not know the allocation requirements
of the other display devices.

So it would be better to offer a generic interface
for applications that allocate buffers from a third place,
like DMA-heap or DRM dumb buffers.

The storage of this blob differs from the modifier
blob: userspace wants the relation between the format key and
the modifier data. It would be better to let applications look up
the allocation flags they want.

Signed-off-by: Hsia-Jun(Randy) Li 
---
 include/uapi/drm/drm_mode.h | 36 
 1 file changed, 36 insertions(+)

diff --git a/include/uapi/drm/drm_mode.h b/include/uapi/drm/drm_mode.h
index 46becedf5b2f..ee5b4d5aee0a 100644
--- a/include/uapi/drm/drm_mode.h
+++ b/include/uapi/drm/drm_mode.h
@@ -218,6 +218,11 @@ extern "C" {
 #define DRM_MODE_CONTENT_PROTECTION_DESIRED 1
 #define DRM_MODE_CONTENT_PROTECTION_ENABLED 2
 
+/* DRM buffer allocation flags */
+#define DRM_BUF_ALLOC_FLAG_DUMB_IMPORT (1UL << 63)
+#define DRM_BUF_ALLOC_FLAG_SEPARATE_PLANE  (1UL << 62)
+/* bits 0~31 were reserved for DMA-heap heap_flags */
+
 /**
  * struct drm_mode_modeinfo - Display mode information.
  * @clock: pixel clock in kHz
@@ -1168,6 +1173,37 @@ struct drm_format_modifier {
__u64 modifier;
 };
 
+struct drm_buf_alloc_flags_blob {
+#define FORMAT_BLOB_CURRENT 1
+   /* Version of this blob format */
+   __u32 version;
+
+   /* Flags */
+   __u32 flags;
+
+   /* Number of fourcc formats supported */
+   __u32 count_formats;
+
+   /* Where in this blob the formats exist (in bytes) */
+   __u32 formats_offset;
+
+   /* Number of drm_buf_alloc_flags */
+   __u32 count_alloc_flags;
+
+   /* Where in this blob the alloc flags exist (in bytes) */
+   __u32 alloc_flags_offset;
+
+   /* __u32 formats[] */
+   /* struct drm_buf_alloc_flags alloc_flags[] */
+};
+
+struct drm_buf_alloc_flags {
+   __u32 format;
+   __u32 pad;
+   __u64 modifier_mask;
+   __u64 flags;
+};
+
 /**
  * struct drm_mode_create_blob - Create New blob property
  *
-- 
2.17.1
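
As an illustration of how userspace might consume the blob proposed in this RFC, here is a minimal C sketch that walks the entry table using the offsets in the header. The two structs mirror the patch with fixed-width types; lookup_alloc_flags() is a hypothetical helper, and matching on the format alone (ignoring modifier_mask) is an assumption, since the patch does not spell out the lookup rules.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the structs proposed in the patch (RFC, not upstream). */
struct drm_buf_alloc_flags_blob {
	uint32_t version;
	uint32_t flags;
	uint32_t count_formats;
	uint32_t formats_offset;
	uint32_t count_alloc_flags;
	uint32_t alloc_flags_offset;
};

struct drm_buf_alloc_flags {
	uint32_t format;
	uint32_t pad;
	uint64_t modifier_mask;
	uint64_t flags;
};

/*
 * Hypothetical helper: find the allocation flags advertised for `fourcc`.
 * Returns 0 and stores the flags in *out on success, -1 if the blob is
 * malformed or has no entry for that format.
 */
static int lookup_alloc_flags(const void *blob, size_t blob_len,
			      uint32_t fourcc, uint64_t *out)
{
	struct drm_buf_alloc_flags_blob hdr;
	struct drm_buf_alloc_flags entry;
	const uint8_t *base = blob;
	size_t need;
	uint32_t i;

	if (blob_len < sizeof(hdr))
		return -1;
	memcpy(&hdr, base, sizeof(hdr));

	/* Reject blobs whose entry table would run past the end. */
	need = (size_t)hdr.count_alloc_flags * sizeof(entry);
	if (hdr.alloc_flags_offset > blob_len ||
	    need > blob_len - hdr.alloc_flags_offset)
		return -1;

	for (i = 0; i < hdr.count_alloc_flags; i++) {
		memcpy(&entry,
		       base + hdr.alloc_flags_offset + i * sizeof(entry),
		       sizeof(entry));
		if (entry.format == fourcc) {
			*out = entry.flags;
			return 0;
		}
	}
	return -1;
}
```

The memcpy() calls avoid unaligned reads, and the bounds check keeps a malformed blob from being read past its end.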



[PATCH v6 2/2] Documentation/gpu: Add Synaptics tiling formats documentation

2023-03-22 Thread Hsia-Jun Li
From: Randy Li 

Signed-off-by: Randy Li 
Signed-off-by: Hsia-Jun(Randy) Li 
---
 Documentation/gpu/synaptics.rst | 81 +
 1 file changed, 81 insertions(+)
 create mode 100644 Documentation/gpu/synaptics.rst

diff --git a/Documentation/gpu/synaptics.rst b/Documentation/gpu/synaptics.rst
new file mode 100644
index ..4185ca536bf1
--- /dev/null
+++ b/Documentation/gpu/synaptics.rst
@@ -0,0 +1,81 @@
+.. SPDX-License-Identifier: GFDL-1.1-no-invariants-or-later
+
+================
+Synaptics Tiling
+================
+
+The tiling pixel formats on the Synaptics VideoSmart platform have
+many variants. Tiles can form a group of tiles; pixels within
+the group's (nearest) width and height are stored into tiles.
+Meanwhile, the tiles in a group may not follow the dimension layout:
+tiles can form a small group of tiles, then that (sub)group
+of tiles forms a bigger group. We won't describe the dimension
+layout inside the group of tiles here. The layout of the group
+of tiles is fixed by the group width and height parameters
+within the same generation of the platform.
+
+Compression
+===========
+Synaptics' proprietary lossless image compression protocol
+minimizes the amount of data transferred (less memory bandwidth
+consumption) between devices. It usually applies to the tiling
+pixel formats.
+
+Each component requires an extra buffer, whose length is page aligned,
+for storing the compression metadata. A 32-byte parameter
+set also comes with each compression metadata buffer.
+
+A component here corresponds to a signal type (i.e. luma, chroma).
+Components could be encoded into one or multiple metadata planes, but
+their compression parameters are still individual.
+
+Pixel format modifiers
+======================
+Additional alignment requirements for the stride and size of a memory
+plane could apply beyond what is mentioned below. Always remember to
+negotiate with all the devices in the pipeline before allocation.
+
+.. flat-table:: Synaptics Image Format Modifiers
+
+   * - Identifier
+     - Fourcc
+     - Details
+
+   * - DRM_FORMAT_MOD_SYNA_V4H1
+     - DRM_FORMAT_NV12
+     - The plain uncompressed 8 bits tile format. It is similar to
+       Intel's Y-tile, but it won't take any pixel from the next X direction
+       in a tile group. The line stride and image height must be aligned to
+       a multiple of 16. The height of the chrominance plane is increased by 8.
+
+   * - DRM_FORMAT_MOD_SYNA_V4H3P8
+     - DRM_FORMAT_NV15
+     - The plain uncompressed 10 bits tile format. It stores pixels in 2D
+       3x4 tiles with 8 bits of padding for each tile, so that a tile fits
+       a 128 bits cache line.
+
+   * - DRM_FORMAT_MOD_SYNA_V4H1_64L4_COMPRESSED
+     - DRM_FORMAT_NV12
+     - Group of tiles and compressed variant of ``DRM_FORMAT_MOD_SYNA_V4H1``.
+       A group of tiles contains 64x4 pixels, where a tile has 1x4
+       pixels.
+
+   * - DRM_FORMAT_MOD_SYNA_V4H3P8_64L4_COMPRESSED
+     - DRM_FORMAT_NV15
+     - Group of tiles and compressed variant of ``DRM_FORMAT_MOD_SYNA_V4H3P8``.
+       A group of tiles contains 48x4 pixels, where a tile has 3x4 pixels
+       and 8 bits of padding at the end of the tile. A group of tiles is
+       256 bytes.
+
+   * - ``DRM_FORMAT_MOD_SYNA_V4H1_128L128_COMPRESSED``
+     - DRM_FORMAT_NV12
+     - Group of tiles and compressed variant of ``DRM_FORMAT_MOD_SYNA_V4H1``.
+       A group of tiles contains 128x32 pixels, where a tile has 1x4
+       pixels.
+
+   * - ``DRM_FORMAT_MOD_SYNA_V4H3P8_128L128_COMPRESSED``
+     - DRM_FORMAT_NV15
+     - Group of tiles and compressed variant of ``DRM_FORMAT_MOD_SYNA_V4H3P8``.
+       A group of tiles contains 96x128 pixels, where a tile has 3x4 pixels
+       and 8 bits of padding at the end of the tile. A group of tiles is
+       16 KiB.
-- 
2.17.1
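
The tile and group sizes quoted in the table for the 10-bit formats can be cross-checked with a little arithmetic. The following C sketch assumes 10 bits per sample and counts a single component; the helper names are made up for this example and do not come from the patch.

```c
#include <stdint.h>

/* Bits occupied by one tile: w x h samples of `bpp` bits plus padding. */
static uint32_t tile_bits(uint32_t w, uint32_t h, uint32_t bpp,
			  uint32_t pad_bits)
{
	return w * h * bpp + pad_bits;
}

/* Bytes occupied by a group of tiles covering got_w x got_h samples. */
static uint32_t got_bytes(uint32_t got_w, uint32_t got_h,
			  uint32_t tile_w, uint32_t tile_h,
			  uint32_t bpp, uint32_t pad_bits)
{
	uint32_t tiles = (got_w / tile_w) * (got_h / tile_h);

	return tiles * tile_bits(tile_w, tile_h, bpp, pad_bits) / 8;
}
```

With a 3x4 tile of 10-bit samples and 8 padding bits, tile_bits(3, 4, 10, 8) gives 128 bits, matching the cache-line remark for V4H3P8. A 48x4 group holds 16 such tiles, so got_bytes(48, 4, 3, 4, 10, 8) yields 256 bytes, and a 96x128 group holds 1024 tiles, so got_bytes(96, 128, 3, 4, 10, 8) yields 16384 bytes (16 KiB).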



[PATCH v6 1/2] drm/fourcc: Add Synaptics VideoSmart tiled modifiers

2023-03-22 Thread Hsia-Jun Li
From: "Hsia-Jun(Randy) Li" 

Those modifiers only record the parameters that affect pixel
layout or memory layout. Whether physical memory page mapping
is used is not a part of the format.

Signed-off-by: Hsia-Jun(Randy) Li 
---
 include/uapi/drm/drm_fourcc.h | 75 +++
 1 file changed, 75 insertions(+)

diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index de703c6be969..ee13250f06f4 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -419,6 +419,7 @@ extern "C" {
 #define DRM_FORMAT_MOD_VENDOR_ARM 0x08
 #define DRM_FORMAT_MOD_VENDOR_ALLWINNER 0x09
 #define DRM_FORMAT_MOD_VENDOR_AMLOGIC 0x0a
+#define DRM_FORMAT_MOD_VENDOR_SYNAPTICS 0x0b
 
 /* add more to the end as needed */
 
@@ -1519,6 +1520,80 @@ drm_fourcc_canonicalize_nvidia_format_mod(__u64 modifier)
 #define AMD_FMT_MOD_CLEAR(field) \
(~((__u64)AMD_FMT_MOD_##field##_MASK << AMD_FMT_MOD_##field##_SHIFT))
 
+/*
+ * Synaptics VideoSmart modifiers
+ *
+ * Tiles can be arranged in Groups of Tiles (GOTs), i.e. small tiles
+ * within a bigger tile. GOT size and layout vary based on platform and
+ * performance concerns. When compression is applied, it is possible
+ * that a GOT contains two tile types; these parameters can't
+ * describe the secondary tile type.
+ *
+ * Besides, an array of eight 4-byte entries (32 bytes) is needed to store
+ * some compression parameters for a compression metadata plane.
+ *
+ *       Macro
+ * Bits  Param Description
+ * ----  ----- -----------------------------------------------------------------
+ *
+ *  7:0  f Scan direction description.
+ *
+ *   0 = Invalid
+ *   1 = V4, the scan would always start from vertical for 4 pixel
+ *   then move back to the start pixel of the next horizontal
+ *   direction.
+ *   2 = Reserved for future use.
+ *
+ * 15:8  m The times of pattern repeat in the right angle direction from
+ * the first scan direction.
+ *
+ * 19:16 p The padding bits after the whole scan, could be zero.
+ *
+ * 20:20 g GOT packing flag.
+ *
+ * 23:21 - Reserved for future use.  Must be zero.
+ *
+ * 27:24 h log2(horizontal) of bytes, in GOTs.
+ *
+ * 31:28 v log2(vertical) of bytes, in GOTs.
+ *
+ * 35:32 - Reserved for future use.  Must be zero.
+ *
+ * 36:36 c Compression flag.
+ *
+ * 55:37 - Reserved for future use.  Must be zero.
+ *
+ */
+
+#define DRM_FORMAT_MOD_SYNA_V4_TILED   fourcc_mod_code(SYNAPTICS, 1)
+
+#define DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(f, m, p, g, h, v, c) \
+   fourcc_mod_code(SYNAPTICS, ((__u64)((f) & 0xff) | \
+((__u64)((m) & 0xff) << 8) | \
+((__u64)((p) & 0xf) << 16) | \
+((__u64)((g) & 0x1) << 20) | \
+((__u64)((h) & 0xf) << 24) | \
+((__u64)((v) & 0xf) << 28) | \
+((__u64)((c) & 0x1) << 36)))
+
+#define DRM_FORMAT_MOD_SYNA_V4H1 \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 1, 0, 0, 0, 0, 0)
+
+#define DRM_FORMAT_MOD_SYNA_V4H3P8 \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 3, 8, 0, 0, 0, 0)
+
+#define DRM_FORMAT_MOD_SYNA_V4H1_64L4_COMPRESSED \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 1, 0, 1, 6, 2, 1)
+
+#define DRM_FORMAT_MOD_SYNA_V4H3P8_64L4_COMPRESSED \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 3, 8, 1, 6, 2, 1)
+
+#define DRM_FORMAT_MOD_SYNA_V4H1_128L128_COMPRESSED \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 1, 0, 1, 7, 7, 1)
+
+#define DRM_FORMAT_MOD_SYNA_V4H3P8_128L128_COMPRESSED \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 3, 8, 1, 7, 7, 1)
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.17.1
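
To make the bit layout in the comment table concrete, here is a small self-contained C sketch that packs the fields the same way as DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D and decodes a few of them again. The vendor placement follows fourcc_mod_code() in drm_fourcc.h (vendor in bits 63:56); the syna_mod*() helper names are hypothetical, and treating h and v as log2 of the GOT dimensions follows the comment table rather than any hardware documentation.

```c
#include <stdint.h>

#define SYNA_VENDOR 0x0bULL /* DRM_FORMAT_MOD_VENDOR_SYNAPTICS in the patch */

/* Same field packing as DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D in the patch. */
static uint64_t syna_mod(unsigned f, unsigned m, unsigned p, unsigned g,
			 unsigned h, unsigned v, unsigned c)
{
	uint64_t val = (uint64_t)(f & 0xff) |
		       ((uint64_t)(m & 0xff) << 8) |
		       ((uint64_t)(p & 0xf) << 16) |
		       ((uint64_t)(g & 0x1) << 20) |
		       ((uint64_t)(h & 0xf) << 24) |
		       ((uint64_t)(v & 0xf) << 28) |
		       ((uint64_t)(c & 0x1) << 36);

	/* fourcc_mod_code(): vendor in bits 63:56, value in bits 55:0. */
	return (SYNA_VENDOR << 56) | (val & 0x00ffffffffffffffULL);
}

/* Hypothetical decode helpers for the GOT size and compression fields. */
static unsigned syna_mod_got_width(uint64_t mod)
{
	return 1u << ((mod >> 24) & 0xf);	/* bits 27:24 hold log2 */
}

static unsigned syna_mod_got_height(uint64_t mod)
{
	return 1u << ((mod >> 28) & 0xf);	/* bits 31:28 hold log2 */
}

static int syna_mod_is_compressed(uint64_t mod)
{
	return (int)((mod >> 36) & 0x1);	/* bit 36 */
}
```

For DRM_FORMAT_MOD_SYNA_V4H3P8_64L4_COMPRESSED, i.e. syna_mod(1, 3, 8, 1, 6, 2, 1), this packs to 0x0b00001026180301 and decodes to a 64 by 4 GOT with the compression flag set.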



[PATCH v6 0/2] Add pixel formats used in Synaptics SoC

2023-03-22 Thread Hsia-Jun Li
From: "Hsia-Jun(Randy) Li" 

Those pixel formats are used in Synaptics's VideoSmart series SoCs,
like the VS640 and VS680. This time I only disclose the pixel formats used
in the video codecs and display pipeline. Actually, any device connected to
the MTR module could support those tiling and compressed pixel formats.

https://synaptics.com/products/multimedia-solutions

Changelog:
v6:
Refresh and fix warnings in its document.
v5:
Moving back the document and rewriting the description.
v4:
Removed the patches for V4L2, V4L2 would use the drm_fourcc.h .
Moving the documents to the mesa project.
v3:
There was a mistake in format macro.
Correcting the description of 64L4 variant modifiers.
v2:
The DRM modifiers in the first draft is too simple, it can't tell
the tiles in group attribute in memory layout.
Removing the v4l2 fourcc. Adding a document for the future v4l2 extended
fmt.
v1:
first draft of DRM modifiers
Try to put basic tile formats into v4l2 fourcc

Hsia-Jun(Randy) Li (1):
  drm/fourcc: Add Synaptics VideoSmart tiled modifiers

Randy Li (1):
  Documentation/gpu: Add Synaptics tiling formats documentation

 Documentation/gpu/synaptics.rst | 81 +
 include/uapi/drm/drm_fourcc.h   | 75 ++
 2 files changed, 156 insertions(+)
 create mode 100644 Documentation/gpu/synaptics.rst

-- 
2.17.1



[PATCH v5 1/2] drm/fourcc: Add Synaptics VideoSmart tiled modifiers

2022-11-30 Thread Hsia-Jun Li
From: "Hsia-Jun(Randy) Li" 

Those modifiers only record the parameters that affect pixel
layout or memory layout. Whether physical memory page mapping
is used is not a part of the format.

Signed-off-by: Hsia-Jun(Randy) Li 
---
 include/uapi/drm/drm_fourcc.h | 76 +++
 1 file changed, 76 insertions(+)

diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index bc056f2d537d..e0905f573f43 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -407,6 +407,7 @@ extern "C" {
 #define DRM_FORMAT_MOD_VENDOR_ARM 0x08
 #define DRM_FORMAT_MOD_VENDOR_ALLWINNER 0x09
 #define DRM_FORMAT_MOD_VENDOR_AMLOGIC 0x0a
+#define DRM_FORMAT_MOD_VENDOR_SYNAPTICS 0x0b
 
 /* add more to the end as needed */
 
@@ -1507,6 +1508,81 @@ drm_fourcc_canonicalize_nvidia_format_mod(__u64 modifier)
 #define AMD_FMT_MOD_CLEAR(field) \
(~((__u64)AMD_FMT_MOD_##field##_MASK << AMD_FMT_MOD_##field##_SHIFT))
 
+/*
+ * Synaptics VideoSmart modifiers
+ *
+ * Tiles can be arranged in Groups of Tiles (GOTs), i.e. small tiles
+ * within a larger tile. GOT size and layout vary with the platform and
+ * performance concerns.
+ *
+ * Besides, an array of eight 4-byte entries (32 bytes) is needed to store
+ * some compression parameters for a compression metadata plane.
+ *
+ * Further information can be found in
+ * Documentation/gpu/synaptics.rst
+ *
+ *       Macro
+ * Bits  Param Description
+ * ----  ----- -----------------------------------------------------------------
+ *
+ *  7:0  f Scan direction description.
+ *
+ *   0 = Invalid
+ *   1 = V4, the scan always runs vertically for 4 pixels,
+ *   then moves back to the starting row at the next horizontal
+ *   position.
+ *   2 = Reserved for future use.
+ *
+ * 15:8  m The number of times the pattern repeats in the direction
+ * perpendicular to the first scan direction.
+ *
+ * 19:16 p The padding bits after the whole scan; may be zero.
+ *
+ * 20:20 g GOT packing flag.
+ *
+ * 23:21 - Reserved for future use.  Must be zero.
+ *
+ * 27:24 h log2(horizontal) of pixels, in GOTs.
+ *
+ * 31:28 v log2(vertical) of pixels, in GOTs.
+ *
+ * 35:32 - Reserved for future use.  Must be zero.
+ *
+ * 36:36 c Compression flag.
+ *
+ * 55:37 - Reserved for future use.  Must be zero.
+ *
+ */
+
+#define DRM_FORMAT_MOD_SYNA_V4_TILED   fourcc_mod_code(SYNAPTICS, 1)
+
+#define DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(f, m, p, g, h, v, c) \
+   fourcc_mod_code(SYNAPTICS, ((__u64)((f) & 0xff) | \
+((__u64)((m) & 0xff) << 8) | \
+((__u64)((p) & 0xf) << 16) | \
+((__u64)((g) & 0x1) << 20) | \
+((__u64)((h) & 0xf) << 24) | \
+((__u64)((v) & 0xf) << 28) | \
+((__u64)((c) & 0x1) << 36)))
+
+#define DRM_FORMAT_MOD_SYNA_V4H1 \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 1, 0, 0, 0, 0, 0)
+
+#define DRM_FORMAT_MOD_SYNA_V4H3P8 \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 3, 8, 0, 0, 0, 0)
+
+#define DRM_FORMAT_MOD_SYNA_V4H1_64L4_COMPRESSED \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 1, 0, 1, 6, 2, 1)
+
+#define DRM_FORMAT_MOD_SYNA_V4H3P8_64L4_COMPRESSED \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 3, 8, 1, 6, 2, 1)
+
+#define DRM_FORMAT_MOD_SYNA_V4H1_128L128_COMPRESSED \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 1, 0, 1, 7, 7, 1)
+
+#define DRM_FORMAT_MOD_SYNA_V4H3P8_128L128_COMPRESSED \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 3, 8, 1, 7, 7, 1)
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.37.3
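
For readers following the bit layout in the comment of the patch above, here
is a minimal sketch in C of how such a modifier could be packed and unpacked
in userspace. It is not part of the patch. The names SYNA_MOD, SYNA_VENDOR,
struct syna_mod_fields and syna_mod_decode() are hypothetical, and the macro
is re-declared locally (with the vendor id 0x0b and the field shifts taken
from the patch) so that it builds without the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins mirroring the patch; not the kernel header itself. */
#define SYNA_VENDOR 0x0bULL
#define SYNA_MOD(f, m, p, g, h, v, c) \
	((SYNA_VENDOR << 56) | \
	 ((uint64_t)((f) & 0xff)) | \
	 ((uint64_t)((m) & 0xff) << 8) | \
	 ((uint64_t)((p) & 0xf) << 16) | \
	 ((uint64_t)((g) & 0x1) << 20) | \
	 ((uint64_t)((h) & 0xf) << 24) | \
	 ((uint64_t)((v) & 0xf) << 28) | \
	 ((uint64_t)((c) & 0x1) << 36))

/* Hypothetical container for the decoded fields. */
struct syna_mod_fields {
	unsigned int scan;       /* bits  7:0  (f) */
	unsigned int repeat;     /* bits 15:8  (m) */
	unsigned int padding;    /* bits 19:16 (p) */
	unsigned int got;        /* bit  20    (g) */
	unsigned int log2_h;     /* bits 27:24 (h) */
	unsigned int log2_v;     /* bits 31:28 (v) */
	unsigned int compressed; /* bit  36    (c) */
};

/* Returns 0 on success, -1 if the vendor byte is not Synaptics. */
static int syna_mod_decode(uint64_t mod, struct syna_mod_fields *out)
{
	assert(out != NULL);

	if ((mod >> 56) != SYNA_VENDOR)
		return -1;

	out->scan       = (unsigned int)(mod & 0xff);
	out->repeat     = (unsigned int)((mod >> 8) & 0xff);
	out->padding    = (unsigned int)((mod >> 16) & 0xf);
	out->got        = (unsigned int)((mod >> 20) & 0x1);
	out->log2_h     = (unsigned int)((mod >> 24) & 0xf);
	out->log2_v     = (unsigned int)((mod >> 28) & 0xf);
	out->compressed = (unsigned int)((mod >> 36) & 0x1);
	return 0;
}
```

With these definitions, SYNA_MOD(1, 3, 8, 1, 6, 2, 1) corresponds to the
arguments of DRM_FORMAT_MOD_SYNA_V4H3P8_64L4_COMPRESSED, and decoding it
gives back repeat = 3, padding = 8, log2_h = 6, log2_v = 2 and
compressed = 1.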



[PATCH v5 2/2] Documentation/gpu: Add Synaptics tiling formats documentation

2022-11-30 Thread Hsia-Jun Li
From: Randy Li 

Signed-off-by: Randy Li 
Signed-off-by: Hsia-Jun(Randy) Li 
---
 Documentation/gpu/drivers.rst   |   1 +
 Documentation/gpu/synaptics.rst | 104 
 2 files changed, 105 insertions(+)
 create mode 100644 Documentation/gpu/synaptics.rst

diff --git a/Documentation/gpu/drivers.rst b/Documentation/gpu/drivers.rst
index 3a52f48215a3..7e820c93d994 100644
--- a/Documentation/gpu/drivers.rst
+++ b/Documentation/gpu/drivers.rst
@@ -18,6 +18,7 @@ GPU Driver Documentation
xen-front
afbc
komeda-kms
+   synaptics
 
 .. only::  subproject and html
 
diff --git a/Documentation/gpu/synaptics.rst b/Documentation/gpu/synaptics.rst
new file mode 100644
index ..b0564d2fe3ce
--- /dev/null
+++ b/Documentation/gpu/synaptics.rst
@@ -0,0 +1,104 @@
+.. SPDX-License-Identifier: GFDL-1.1-no-invariants-or-later
+
+================
+Synaptics Tiling
+================
+
+The tiled pixel formats on the Synaptics VideoSmart platform have
+many variants. Tiles can form a group of tiles (GOT); the pixels
+within a group rectangle are stored in tiles.
+Two parameters in a modifier describe the (nearest) width and
+height, in pixels, of a group.
+
+Meanwhile, the tiles in a group may not follow a plain 2D
+layout: tiles can form a small group of tiles, and that (sub)group
+of tiles can then form a bigger group. The layout inside a group of
+tiles is not described here. For a given group width and height, the
+layout of the group of tiles is fixed within the same generation
+of the platform.
+
+Compression
+===========
+The proprietary lossless image compression protocol in Synaptics
+hardware minimizes the amount of data transferred between devices
+(lower memory bandwidth consumption). It is usually applied to the
+tiled pixel formats.
+
+Each component requires an extra page-aligned buffer for storing
+the compression metadata. A 32-byte parameter set also comes
+with each compression metadata buffer.
+
+A component here corresponds to a signal type (e.g. luma, chroma).
+Components can be encoded into one or multiple metadata planes, but
+their compression parameters remain individual.
+
+Pixel format modifiers
+======================
+Additional alignment requirements for the stride and size of a memory
+plane may apply beyond what is mentioned below. Always negotiate
+with all the devices in the pipeline before allocation.
+
+.. raw:: latex
+
+\small
+
+.. tabularcolumns:: |p{5.8cm}|p{1.2cm}|p{10.3cm}|
+
+.. cssclass:: longtable
+
+.. flat-table:: Synaptics Image Format Modifiers
+:header-rows:  1
+:stub-columns: 0
+:widths:   3 1 8
+
+* - Identifier
+  - Fourcc
+  - Details
+* .. _DRM-FORMAT-MOD-SYNA-V4H1:
+
+  - ``DRM_FORMAT_MOD_SYNA_V4H1``
+  - ``DRM_FORMAT_NV12``
+  - The plain uncompressed 8-bit tiled format. It resembles Intel's
+    Y-tile, but it does not take any pixel from the next X direction
+    in a tile group. The line stride and image height must be aligned to
+    a multiple of 16. The height of the chrominance plane has 8 added.
+* .. _DRM-FORMAT-MOD-SYNA-V4H3P8:
+
+  - ``DRM_FORMAT_MOD_SYNA_V4H3P8``
+  - ``DRM_FORMAT_NV15``
+  - The plain uncompressed 10-bit tiled format. It stores pixels in 2D
+    3x4 tiles with 8 bits of padding per tile, so that each tile fills a
+    128-bit cache line.
+* .. _DRM-FORMAT-MOD-SYNA-V4H1-64L4-COMPRESSED:
+
+  - ``DRM_FORMAT_MOD_SYNA_V4H1_64L4_COMPRESSED``
+  - ``DRM_FORMAT_NV12``
+  - Group-of-tiles, compressed variant of ``DRM_FORMAT_MOD_SYNA_V4H1``.
+    A group of tiles contains 64x4 pixels, where a tile has 1x4
+    pixels.
+* .. _DRM-FORMAT-MOD-SYNA-V4H3P8-64L4-COMPRESSED:
+
+  - ``DRM_FORMAT_MOD_SYNA_V4H3P8_64L4_COMPRESSED``
+  - ``DRM_FORMAT_NV15``
+  - Group-of-tiles, compressed variant of ``DRM_FORMAT_MOD_SYNA_V4H3P8``.
+    A group of tiles contains 48x4 pixels, where a tile has 3x4 pixels
+    and 8 bits of padding at the end of the tile. A group of tiles is
+    256 bytes.
+* .. _DRM-FORMAT-MOD-SYNA-V4H1-128L128-COMPRESSED:
+
+  - ``DRM_FORMAT_MOD_SYNA_V4H1_128L128_COMPRESSED``
+  - ``DRM_FORMAT_NV12``
+  - Group-of-tiles, compressed variant of ``DRM_FORMAT_MOD_SYNA_V4H1``.
+    A group of tiles contains 128x32 pixels, where a tile has 1x4 pixels.
+* .. _DRM-FORMAT-MOD-SYNA-V4H3P8-128L128-COMPRESSED:
+
+  - ``DRM_FORMAT_MOD_SYNA_V4H3P8_128L128_COMPRESSED``
+  - ``DRM_FORMAT_NV15``
+  - Group-of-tiles, compressed variant of ``DRM_FORMAT_MOD_SYNA_V4H3P8``.
+    A group of tiles contains 96x128 pixels, where a tile has 3x4 pixels
+    and 8 bits of padding at the end of the tile. A group of tiles is
+    16 KiB.
+
+.. raw:: latex
+
+\normalsize
-- 
2.37.3
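
The tile and group sizes quoted in the table of the document above can be
cross-checked with a little arithmetic. The sketch below is only an
illustration under stated assumptions: it looks at a single component plane,
ignores compression and any extra stride or height alignment, and the helper
names tile_bits() and got_bytes() are hypothetical, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Bits occupied by one tile: w x h pixels of bpp bits each, plus padding. */
static uint32_t tile_bits(uint32_t w, uint32_t h, uint32_t bpp, uint32_t pad)
{
	return w * h * bpp + pad;
}

/* Bytes occupied by a group of tiles covering got_w x got_h pixels. */
static uint32_t got_bytes(uint32_t got_w, uint32_t got_h,
			  uint32_t tile_w, uint32_t tile_h,
			  uint32_t bpp, uint32_t pad)
{
	uint32_t tiles;

	/* The group is assumed to be an exact multiple of the tile size. */
	assert(got_w % tile_w == 0 && got_h % tile_h == 0);

	tiles = (got_w / tile_w) * (got_h / tile_h);
	return tiles * tile_bits(tile_w, tile_h, bpp, pad) / 8;
}
```

For the 10-bit V4H3P8 layout, tile_bits(3, 4, 10, 8) is 3 * 4 * 10 + 8 =
128 bits. A 48x4 group then holds 16 tiles, i.e. 256 bytes, and a 96x128
group holds 32 * 32 = 1024 tiles, i.e. 16 KiB, which agrees with the figures
in the table.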



[PATCH v5 0/2] Add pixel formats used in Synaptics SoC

2022-11-30 Thread Hsia-Jun Li
These pixel formats are used in Synaptics' VideoSmart series SoCs,
such as the VS640 and VS680. This time I only disclose the pixel formats
used in the video codecs and display pipeline. In fact, any device
connected to the MTR module could support these tiled and compressed
pixel formats.

https://synaptics.com/products/multimedia-solutions

Changelog:
v5:
Moved the document back and rewrote the description.
v4:
Removed the patches for V4L2; V4L2 would use drm_fourcc.h.
Moved the documents to the mesa project.
v3:
Fixed a mistake in the format macro.
Corrected the description of the 64L4 variant modifiers.
v2:
The DRM modifiers in the first draft were too simple; they could not
describe the tiles-in-group attribute of the memory layout.
Removed the v4l2 fourcc. Added a document for the future v4l2 extended
fmt.
v1:
First draft of the DRM modifiers.
Tried to put basic tile formats into v4l2 fourcc.

Hsia-Jun(Randy) Li (1):
  drm/fourcc: Add Synaptics VideoSmart tiled modifiers

Randy Li (1):
  Documentation/gpu: Add Synaptics tiling formats documentation

 Documentation/gpu/drivers.rst   |   1 +
 Documentation/gpu/synaptics.rst | 104 
 include/uapi/drm/drm_fourcc.h   |  76 +++
 3 files changed, 181 insertions(+)
 create mode 100644 Documentation/gpu/synaptics.rst

-- 
2.37.3



Re: [RFC] drm/fourcc: Add a modifier for contiguous memory

2022-11-29 Thread Hsia-Jun Li




On 11/29/22 18:42, Daniel Stone wrote:



Hi Randy,

On Tue, 29 Nov 2022 at 10:11, Hsia-Jun Li  wrote:

Currently, we assume that all pixel formats are multi-planar and that
devices can give each component its own memory plane.
That may not hold for every device in the world: for a device without
an IOMMU this may not be possible.

Besides, when we export a handle through PRIME, the upstream
device (such as a capture card or camera) may not support non-contiguous
memory. It would be better to allocate the handle in contiguous memory
in the first place.

One may think that, since memory allocation is done in user space, we
could do the trick there. But dumb_create() is sometimes not the right
API for that.

"Note that userspace is not allowed to use such objects for render
acceleration - drivers must create their own private ioctls for such a
use case."
"Note that dumb objects may not be used for gpu acceleration, as has
been attempted on some ARM embedded platforms. Such drivers really must
have a hardware-specific ioctl to allocate suitable buffer objects."

We then need to rely on device-specific APIs. This modifier would help
their libraries calculate the right size for contiguous memory. It
would also be useful for drivers that support rendering to dumb buffers.


As a buffer can only have a single modifier, this isn't practical.
Usually only legacy or low-cost devices would request this
modifier. They are unlikely to support tiled formats (or we would add
support for them).


But yes, we had better not set a trap for ourselves.

Contiguous needs to be negotiated separately and out of band. See e.g.
dma-heaps for this.
I don't really like the Android way here. It would be fine if we lived
in a world without hot-plug.


V4L2 already has a way to negotiate the memory layouts a device can
support. Android gralloc uses fixed platform settings to decide the
memory layout and buffer size. That is not flexible.


So would it be better for me to add a common property (a list with the
same length as the formats property) to drm_plane?


Cheers,
Daniel


--
Hsia-Jun(Randy) Li


Re: [RFC] drm/fourcc: Add a modifier for contiguous memory

2022-11-29 Thread Hsia-Jun Li




On 11/29/22 18:18, Simon Ser wrote:



Format modifiers are for the buffer layout only, not for the allocation
parameters, placement, etc. See the doc comment at the top of
drm_fourcc.h.
On the V4L2 mailing list there is a proposal to drop the pixel
formats (not the codec formats) from the V4L2 header completely, because
the number of tiled pixel formats keeps growing.
But we can't get rid of the non-contiguous variants (the FOURCCs in
V4L2 that share a value with DRM are all for contiguous memory).


Until this problem is solved, I believe support for tiled formats in
V4L2 will never be stable.


The most common approach here is to hack the pixel format modifier, so
that a userspace library can take this into account at allocation time
and when reading the properties of the drm_planes.


Alternatively, could we add a common plane property that indicates
whether the driver requires a contiguous memory plane for a format?

--
Hsia-Jun(Randy) Li


[RFC] drm/fourcc: Add a modifier for contiguous memory

2022-11-29 Thread Hsia-Jun Li
From: "Hsia-Jun(Randy) Li" 

Hello All

Currently, we assume that all pixel formats are multi-planar and that
devices can give each component its own memory plane.
That may not hold for every device in the world: for a device without
an IOMMU this may not be possible.

Besides, when we export a handle through PRIME, the upstream
device (such as a capture card or camera) may not support non-contiguous
memory. It would be better to allocate the handle in contiguous memory
in the first place.

One may think that, since memory allocation is done in user space, we
could do the trick there. But dumb_create() is sometimes not the right
API for that.

"Note that userspace is not allowed to use such objects for render
acceleration - drivers must create their own private ioctls for such a
use case."
"Note that dumb objects may not be used for gpu acceleration, as has
been attempted on some ARM embedded platforms. Such drivers really must
have a hardware-specific ioctl to allocate suitable buffer objects."

We then need to rely on device-specific APIs. This modifier would help
their libraries calculate the right size for contiguous memory. It
would also be useful for drivers that support rendering to dumb buffers.

Signed-off-by: Hsia-Jun(Randy) Li 
---
 include/uapi/drm/drm_fourcc.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index bc056f2d537d..ec039ced8257 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -473,6 +473,11 @@ extern "C" {
  */
 #define DRM_FORMAT_MOD_LINEAR  fourcc_mod_code(NONE, 0)
 
+/*
+ * Contiguous memory
+ */
+#define DRM_FORMAT_MOD_CONTIG_MEM  fourcc_mod_code(NONE, 1)
+
 /*
  * Deprecated: use DRM_FORMAT_MOD_LINEAR instead
  *
-- 
2.17.1
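
The objection raised in the replies to this RFC, that a buffer can only have
a single modifier, can be illustrated with a short sketch in C. This is an
illustration only and not code from any of the patches: MOD_CODE is a local
stand-in for fourcc_mod_code(), MOD_SYNA_V4H1 uses the value of a modifier
proposed in a separate series, and mod_matches() and mod_with_contig_flag()
are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for fourcc_mod_code(): vendor in bits 63:56, value below. */
#define MOD_CODE(vendor, val) \
	(((uint64_t)(vendor) << 56) | \
	 ((uint64_t)(val) & 0x00ffffffffffffffULL))

#define MOD_LINEAR     MOD_CODE(0x00, 0)
#define MOD_CONTIG_MEM MOD_CODE(0x00, 1)      /* value proposed in this RFC */
#define MOD_SYNA_V4H1  MOD_CODE(0x0b, 0x0101) /* from the Synaptics series */

/* A buffer carries exactly one modifier, and it is compared by equality. */
static int mod_matches(uint64_t buffer_mod, uint64_t wanted)
{
	return buffer_mod == wanted;
}

/* Hypothetical (and broken) attempt to treat CONTIG_MEM as a flag bit. */
static uint64_t mod_with_contig_flag(uint64_t layout_mod)
{
	return layout_mod | MOD_CONTIG_MEM;
}
```

Because modifiers are opaque codes rather than flag sets, a tiled buffer that
is also contiguous cannot be described this way. OR-ing MOD_CONTIG_MEM into
MOD_SYNA_V4H1 returns MOD_SYNA_V4H1 unchanged, since bit 0 is already part of
the tiled layout code, so the "contiguous" information is lost.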



Re: [PATCH v4] drm/fourcc: Add Synaptics VideoSmart tiled modifiers

2022-11-23 Thread Hsia-Jun Li




On 11/24/22 01:27, Daniel Vetter wrote:



On Thu, Nov 24, 2022 at 01:14:48AM +0800, Randy Li wrote:



On Nov 24, 2022, at 12:42 AM, Daniel Vetter  wrote:

On Wed, Nov 23, 2022 at 10:58:11PM +0800, Jisheng Zhang wrote:

On Wed, Nov 23, 2022 at 05:19:57PM +0800, Hsia-Jun Li wrote:
From: "Hsia-Jun(Randy) Li" 
Memory Traffic Reduction (MTR) is a module in the Synaptics
VideoSmart platform that handles lossless image compression
and caches tile memory lines.
These modifiers only record the parameters that affect the pixel
layout or memory layout. Whether physical memory page mapping
is used is not part of the format.
We allocate the same amount of memory for uncompressed
and compressed luma and chroma data, while a compressed buffer
requires two extra planes holding the metadata for
decompression.
Signed-off-by: Hsia-Jun(Randy) Li 
---
include/uapi/drm/drm_fourcc.h | 75 +++
1 file changed, 75 insertions(+)
diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index bc056f2d537d..ca0b4ca70b36 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -407,6 +407,7 @@ extern "C" {
#define DRM_FORMAT_MOD_VENDOR_ARM 0x08
#define DRM_FORMAT_MOD_VENDOR_ALLWINNER 0x09
#define DRM_FORMAT_MOD_VENDOR_AMLOGIC 0x0a
+#define DRM_FORMAT_MOD_VENDOR_SYNAPTICS 0x0b

Any users in the mainline tree?

Not yet. I believe a V4L2 codec would be the first one.
Many patches are still needed for V4L2, which currently does
not support format modifiers. You can find the discussion on the
linux-media list.

This does need agreement from the DRM maintainers. Three of us are
inclined to drop the pixel formats from video4linux2.h and keep only the
codec formats in the new extended V4L2 format negotiation interface. All
the pixel formats would go to drm_fourcc.h, but we have not decided how
to represent hardware that requires contiguous memory.


Uh no.

These enums are maintained in drm_fourcc.h, by drm maintainers. You
_cannot_ mix them up with the fourcc enums that video4linux2.h has, that's
a completely different enum space because fourcc codes are _not_ a
standard.



What we in V4L2 are trying to solve is the non-contiguous memory plane
formats in V4L2; we don’t want to add any more of them. Besides, the
values for pixel formats are the same in V4L2 and DRM.

Please do not ever mix up drm_fourcc format modifiers with v4l2 fourcc
codes, that will result in complete chaos. There's a reason why there's
only one authoritative source for these.



In the previous version the build failed, because a driver’s
header (ipu-v3) included both the V4L2 and DRM headers. I can’t add
another format modifier macro to V4L2.
If DRM doesn’t like the idea of V4L2 using the fourcc from DRM, I should
inform people about that.

We are not bringing NV12M and the like into drm_fourcc.h; we hate that.

Note that drm_fourcc.h serves as the vendor-neutral registry for these
numbers, and they're referenced in both gl and vk extensions. So this is
the one case where we do _not_ require in-kernel users or open source
userspace.


The first user of these pixel formats would be the software pixel reader for
GStreamer; I am planning to add an unpacker for the two uncompressed pixel
formats.

If there is someone interested in an in-kernel or open userspace driver
though it would be really great to have their acks before merging. Just to
make sure that the modifiers will work with both upstream and downstream
driver stacks.

This patch has been reviewed internally; it is good enough to describe our
pixel formats.


I just realized that we've failed to document this, I'll type up a patch.


As for the format itself, I have sent the document to mesa; you can find
an MR there.


Please include the link to that MR in the patch description.

mesa !19921

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19921

I would like to do that once the document has had more review.

-Daniel


-Daniel



/* add more to the end as needed */
@@ -1507,6 +1508,80 @@ drm_fourcc_canonicalize_nvidia_format_mod(__u64 modifier)
#define AMD_FMT_MOD_CLEAR(field) \
   (~((__u64)AMD_FMT_MOD_##field##_MASK << AMD_FMT_MOD_##field##_SHIFT))
+/*
+ * Synaptics VideoSmart modifiers
+ *
+ * Tiles could be arranged in Groups of Tiles (GOTs), it is a small tile
+ * within a tile. GOT size and layout varies based on platform and
+ * performance concern. When the compression is applied, it is possible
+ * that we would have two tile type in the GOT, these parameters can't
+ * tell the secondary tile type.
+ *
+ * Besides, an 8 size 4 bytes arrary (32 bytes) would be need to store
+ * some compression parameters for a compression met

[PATCH v4] drm/fourcc: Add Synaptics VideoSmart tiled modifiers

2022-11-23 Thread Hsia-Jun Li
From: "Hsia-Jun(Randy) Li" 

Memory Traffic Reduction (MTR) is a module in the Synaptics
VideoSmart platform that handles lossless image compression
and caches tile memory lines.

These modifiers only record the parameters that affect the pixel
layout or memory layout. Whether physical memory page mapping
is used is not part of the format.

We allocate the same amount of memory for uncompressed
and compressed luma and chroma data, while a compressed buffer
requires two extra planes holding the metadata for
decompression.

Signed-off-by: Hsia-Jun(Randy) Li 
---
 include/uapi/drm/drm_fourcc.h | 75 +++
 1 file changed, 75 insertions(+)

diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index bc056f2d537d..ca0b4ca70b36 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -407,6 +407,7 @@ extern "C" {
 #define DRM_FORMAT_MOD_VENDOR_ARM 0x08
 #define DRM_FORMAT_MOD_VENDOR_ALLWINNER 0x09
 #define DRM_FORMAT_MOD_VENDOR_AMLOGIC 0x0a
+#define DRM_FORMAT_MOD_VENDOR_SYNAPTICS 0x0b
 
 /* add more to the end as needed */
 
@@ -1507,6 +1508,80 @@ drm_fourcc_canonicalize_nvidia_format_mod(__u64 modifier)
 #define AMD_FMT_MOD_CLEAR(field) \
(~((__u64)AMD_FMT_MOD_##field##_MASK << AMD_FMT_MOD_##field##_SHIFT))
 
+/*
+ * Synaptics VideoSmart modifiers
+ *
+ * Tiles can be arranged in Groups of Tiles (GOTs), i.e. small tiles
+ * within a larger tile. GOT size and layout vary with the platform and
+ * performance concerns. When compression is applied, a GOT may contain
+ * two tile types; these parameters cannot describe the secondary
+ * tile type.
+ *
+ * Besides, an array of eight 4-byte entries (32 bytes) is needed to store
+ * some compression parameters for a compression metadata plane.
+ *
+ *       Macro
+ * Bits  Param Description
+ * ----  ----- -----------------------------------------------------------------
+ *
+ *  7:0  f Scan direction description.
+ *
+ *   0 = Invalid
+ *   1 = V4, the scan always runs vertically for 4 pixels,
+ *   then moves back to the starting row at the next horizontal
+ *   position.
+ *   2 = Reserved for future use.
+ *
+ * 15:8  m The number of times the pattern repeats in the direction
+ * perpendicular to the first scan direction.
+ *
+ * 19:16 p The padding bits after the whole scan; may be zero.
+ *
+ * 20:20 g GOT packing flag.
+ *
+ * 23:21 - Reserved for future use.  Must be zero.
+ *
+ * 27:24 h log2(horizontal) of bytes, in GOTs.
+ *
+ * 31:28 v log2(vertical) of bytes, in GOTs.
+ *
+ * 35:32 - Reserved for future use.  Must be zero.
+ *
+ * 36:36 c Compression flag.
+ *
+ * 55:37 - Reserved for future use.  Must be zero.
+ *
+ */
+
+#define DRM_FORMAT_MOD_SYNA_V4_TILED   fourcc_mod_code(SYNAPTICS, 1)
+
+#define DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(f, m, p, g, h, v, c) \
+   fourcc_mod_code(SYNAPTICS, ((__u64)((f) & 0xff) | \
+((__u64)((m) & 0xff) << 8) | \
+((__u64)((p) & 0xf) << 16) | \
+((__u64)((g) & 0x1) << 20) | \
+((__u64)((h) & 0xf) << 24) | \
+((__u64)((v) & 0xf) << 28) | \
+((__u64)((c) & 0x1) << 36)))
+
+#define DRM_FORMAT_MOD_SYNA_V4H1 \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 1, 0, 0, 0, 0, 0)
+
+#define DRM_FORMAT_MOD_SYNA_V4H3P8 \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 3, 8, 0, 0, 0, 0)
+
+#define DRM_FORMAT_MOD_SYNA_V4H1_64L4_COMPRESSED \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 1, 0, 1, 6, 2, 1)
+
+#define DRM_FORMAT_MOD_SYNA_V4H3P8_64L4_COMPRESSED \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 3, 8, 1, 6, 2, 1)
+
+#define DRM_FORMAT_MOD_SYNA_V4H1_128L128_COMPRESSED \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 1, 0, 1, 7, 7, 1)
+
+#define DRM_FORMAT_MOD_SYNA_V4H3P8_128L128_COMPRESSED \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 3, 8, 1, 7, 7, 1)
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.17.1



[PATCH v3 4/4] media: docs: Add Synaptics tile modifiers

2022-11-01 Thread Hsia-Jun Li
From: Randy Li 

The Synaptics VideoSmart platform uses too many pixel formats
to store them all in the fourcc namespace.

Signed-off-by: Randy Li 
Signed-off-by: Hsia-Jun(Randy) Li 
---
 .../media/v4l/pixfmt-synaptics.rst| 86 +++
 .../userspace-api/media/v4l/pixfmt.rst|  1 +
 2 files changed, 87 insertions(+)
 create mode 100644 Documentation/userspace-api/media/v4l/pixfmt-synaptics.rst

diff --git a/Documentation/userspace-api/media/v4l/pixfmt-synaptics.rst 
b/Documentation/userspace-api/media/v4l/pixfmt-synaptics.rst
new file mode 100644
index ..edf6525a3ef4
--- /dev/null
+++ b/Documentation/userspace-api/media/v4l/pixfmt-synaptics.rst
@@ -0,0 +1,86 @@
+.. SPDX-License-Identifier: GFDL-1.1-no-invariants-or-later
+
+.. _pixfmt-synaptics:
+
+********************************
+Synaptics Pixel Format Modifiers
+********************************
+
+The tiled pixel formats on the Synaptics VideoSmart platform have
+many variants. Only the most widely used pixel format modifiers are
+listed here. The values should be the same as the ones defined in the
+``drm_fourcc.h`` file.
+
+.. tabularcolumns:: |p{6.6cm}|p{2.2cm}|p{8.5cm}|
+
+.. raw:: latex
+
+   \small
+
+.. _reserved-formats:
+
+.. flat-table:: Synaptics Image Format Modifiers
+   :header-rows:  1
+   :stub-columns: 0
+   :widths:   3 1 4
+
+   * - Identifier
+ - Code
+ - Details
+   * .. _V4L2-PIX-FMT-MOD-SYNA-V4H1:
+
+ - ``V4L2_PIX_FMT_MOD_SYNA_V4H1``
+ - '0x0b000101'
+ - The plain uncompressed 8-bit tiled format. It resembles Intel's
+   Y-tile, but it does not take any pixel from the next X direction
+   in a tile group. The line stride and image height must be aligned to
+   a multiple of 16. The height of the chrominance plane has 8 added.
+   This modifier is currently used in conjunction with
+   ``V4L2_PIX_FMT_NV12`` or ``V4L2_PIX_FMT_NV12M``.
+   * .. _V4L2-PIX-FMT-MOD-SYNA-V4H3P8:
+
+ - ``V4L2_PIX_FMT_MOD_SYNA_V4H3P8``
+ - '0x0b080301'
+ - The plain uncompressed 10-bit tiled format. It stores pixels in 2D
+   3x4 tiles with 8 bits of padding per tile, so that each tile fills a
+   128-bit cache line. This modifier is used in conjunction with
+   ``V4L2_PIX_FMT_NV15``.
+   * .. _V4L2-PIX-FMT-MOD-SYNA-V4H1-64L4C:
+
+ - ``V4L2_PIX_FMT_MOD_SYNA_V4H1_64L4C``
+ - '0x0b0026100101'
+ - Compressed ``V4L2_PIX_FMT_MOD_SYNA_V4H1``. It stores 64x4 pixels
+   in 1x4 tiles. Each plane requires a meta plane (MTR plane) for
+   decompression. An MTR plane has a 32-byte parameter set.
+   * .. _V4L2-PIX-FMT-MOD-SYNA-V4H3P8-64L4C:
+
+ - ``V4L2_PIX_FMT_MOD_SYNA_V4H3P8_64L4C``
+ - '0x0b0026180301'
+ - Compressed ``V4L2_PIX_FMT_MOD_SYNA_V4H3P8``. It stores 48x4 pixels
+   in 3x4 tiles; each tile has 8 bits of padding, so that a tile
+   is 16 bytes (128 bits).
+
+   Each plane requires a meta plane (MTR plane) for decompression.
+   An MTR plane has a 32-byte parameter set.
+
+   * .. _V4L2-PIX-FMT-MOD-SYNA-V4H1-128L128C:
+
+ - ``V4L2_PIX_FMT_MOD_SYNA_V4H1_128L128C``
+ - '0x0b0077100101'
+ - Compressed ``V4L2_PIX_FMT_MOD_SYNA_V4H1``. It stores 128x32 pixels
+   in 1x4 tiles. Each plane requires a meta plane (MTR plane) for
+   decompression. An MTR plane has a 32-byte parameter set.
+   * .. _V4L2-PIX-FMT-MOD-SYNA-V4H3P8-128L128C:
+
+ - ``V4L2_PIX_FMT_MOD_SYNA_V4H3P8_128L128C``
+ - '0x0b0077180301'
+ - Compressed ``V4L2_PIX_FMT_MOD_SYNA_V4H3P8``. It stores 96x128 pixels
+   in 3x4 tiles; each tile has 8 bits of padding. A group of tiles
+   is then 16 KiB.
+
+   Each plane requires a meta plane (MTR plane) for
+   decompression. An MTR plane has a 32-byte parameter set.
+
+.. raw:: latex
+
+   \normalsize
diff --git a/Documentation/userspace-api/media/v4l/pixfmt.rst 
b/Documentation/userspace-api/media/v4l/pixfmt.rst
index 11dab4a90630..738a160a4c41 100644
--- a/Documentation/userspace-api/media/v4l/pixfmt.rst
+++ b/Documentation/userspace-api/media/v4l/pixfmt.rst
@@ -36,3 +36,4 @@ see also :ref:`VIDIOC_G_FBUF `.)
 colorspaces
 colorspaces-defs
 colorspaces-details
+pixfmt-synaptics
-- 
2.17.1



[PATCH v3 3/4] media: videodev2.h: add Synaptics tiled modifiers

2022-11-01 Thread Hsia-Jun Li
From: "Hsia-Jun(Randy) Li" 

These modifiers have the same values as the ones defined
in drm_fourcc.h; they are just named in V4L2 style.

Signed-off-by: Hsia-Jun(Randy) Li 
---
 include/uapi/linux/videodev2.h | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
index d00b2e9c0c54..71136f29362e 100644
--- a/include/uapi/linux/videodev2.h
+++ b/include/uapi/linux/videodev2.h
@@ -820,6 +820,8 @@ struct v4l2_pix_format {
  *  F O R M A T   M O D I F I E R S
  */
 /* Vendor Ids: */
+#define V4L2_PIX_FMT_MOD_VENDOR_SYNAPTICS 0x0b
+
 #define V4L2_PIX_FMT_RESERVED   ((1ULL << 56) - 1)
 
 #define fourcc_mod_get_vendor(modifier) \
@@ -835,6 +837,34 @@ struct v4l2_pix_format {
 #define V4L2_PIX_FMT_MOD_INVALID  fourcc_mod_code(NONE, V4L2_PIX_FMT_RESERVED)
 #define V4L2_PIX_FMT_MOD_LINEAR   fourcc_mod_code(NONE, 0)
 
+/* Synaptics VideoSmart modifiers */
+#define V4L2_PIX_FMT_MOD_SYNA_V4_TILED fourcc_mod_code(SYNAPTICS, 1)
+#define V4L2_PIX_FMT_MOD_SYNA_MTR_LINEAR_2D(f, m, p, g, h, v, c) \
+   fourcc_mod_code(SYNAPTICS, ((__u64)((f) & 0xff) | \
+((__u64)((m) & 0xff) << 8) | \
+((__u64)((p) & 0xf) << 16) | \
+((__u64)((g) & 0x1) << 20) | \
+((__u64)((h) & 0xf) << 24) | \
+((__u64)((v) & 0xf) << 28) | \
+((__u64)((c) & 0x1) << 36)))
+
+#define V4L2_PIX_FMT_MOD_SYNA_V4H1 \
+   V4L2_PIX_FMT_MOD_SYNA_MTR_LINEAR_2D(1, 1, 0, 0, 0, 0, 0)
+
+#define V4L2_PIX_FMT_MOD_SYNA_V4H3P8 \
+   V4L2_PIX_FMT_MOD_SYNA_MTR_LINEAR_2D(1, 3, 8, 0, 0, 0, 0)
+
+#define V4L2_PIX_FMT_MOD_SYNA_V4H1_64L4C \
+   V4L2_PIX_FMT_MOD_SYNA_MTR_LINEAR_2D(1, 1, 0, 1, 6, 2, 1)
+
+#define V4L2_PIX_FMT_MOD_SYNA_V4H3P8_64L4C \
+   V4L2_PIX_FMT_MOD_SYNA_MTR_LINEAR_2D(1, 3, 8, 1, 6, 2, 1)
+
+#define V4L2_PIX_FMT_MOD_SYNA_V4H1_128L128C \
+   V4L2_PIX_FMT_MOD_SYNA_MTR_LINEAR_2D(1, 1, 0, 1, 7, 7, 1)
+
+#define V4L2_PIX_FMT_MOD_SYNA_V4H3P8_128L128C \
+   V4L2_PIX_FMT_MOD_SYNA_MTR_LINEAR_2D(1, 3, 8, 1, 7, 7, 1)
 
 /*
  * F O R M A T   E N U M E R A T I O N
-- 
2.17.1



[PATCH v3 2/4] media: videodev2.h: add pixel format modifiers

2022-11-01 Thread Hsia-Jun Li
From: "Hsia-Jun(Randy) Li" 

Signed-off-by: Hsia-Jun(Randy) Li 
---
 include/uapi/linux/videodev2.h | 20 
 1 file changed, 20 insertions(+)

diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
index 29da1f4b4578..d00b2e9c0c54 100644
--- a/include/uapi/linux/videodev2.h
+++ b/include/uapi/linux/videodev2.h
@@ -816,6 +816,26 @@ struct v4l2_pix_format {
 #define V4L2_PIX_FMT_FLAG_PREMUL_ALPHA 0x0001
 #define V4L2_PIX_FMT_FLAG_SET_CSC  0x0002
 
+/*
+ *  F O R M A T   M O D I F I E R S
+ */
+/* Vendor Ids: */
+#define V4L2_PIX_FMT_RESERVED   ((1ULL << 56) - 1)
+
+#define fourcc_mod_get_vendor(modifier) \
+   (((modifier) >> 56) & 0xff)
+
+#define fourcc_mod_is_vendor(modifier, vendor) \
+   (fourcc_mod_get_vendor(modifier) == V4L2_PIX_FMT_MOD_VENDOR_## vendor)
+
+#define fourcc_mod_code(vendor, val) \
+   ((((__u64)V4L2_PIX_FMT_MOD_VENDOR_## vendor) << 56) | ((val) & 0x00ffffffffffffffULL))
+
+/* Format Modifier tokens */
+#define V4L2_PIX_FMT_MOD_INVALID  fourcc_mod_code(NONE, V4L2_PIX_FMT_RESERVED)
+#define V4L2_PIX_FMT_MOD_LINEAR   fourcc_mod_code(NONE, 0)
+
+
 /*
  * F O R M A T   E N U M E R A T I O N
  */
-- 
2.17.1



[PATCH v3 1/4] drm/fourcc: Add Synaptics VideoSmart tiled modifiers

2022-11-01 Thread Hsia-Jun Li
From: "Hsia-Jun(Randy) Li" 

Memory Traffic Reduction (MTR) is a module in the Synaptics
VideoSmart platform that handles lossless image compression
and caches tile memory lines.

These modifiers only record the parameters that affect the pixel
layout or memory layout. Whether physical memory page mapping
is used is not part of the format.

We allocate the same amount of memory for uncompressed
and compressed luma and chroma data, while a compressed buffer
requires two extra planes holding the metadata for
decompression.

Signed-off-by: Hsia-Jun(Randy) Li 
---
 include/uapi/drm/drm_fourcc.h | 75 +++
 1 file changed, 75 insertions(+)

diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index bc056f2d537d..ca0b4ca70b36 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -407,6 +407,7 @@ extern "C" {
 #define DRM_FORMAT_MOD_VENDOR_ARM 0x08
 #define DRM_FORMAT_MOD_VENDOR_ALLWINNER 0x09
 #define DRM_FORMAT_MOD_VENDOR_AMLOGIC 0x0a
+#define DRM_FORMAT_MOD_VENDOR_SYNAPTICS 0x0b
 
 /* add more to the end as needed */
 
@@ -1507,6 +1508,80 @@ drm_fourcc_canonicalize_nvidia_format_mod(__u64 modifier)
 #define AMD_FMT_MOD_CLEAR(field) \
(~((__u64)AMD_FMT_MOD_##field##_MASK << AMD_FMT_MOD_##field##_SHIFT))
 
+/*
+ * Synaptics VideoSmart modifiers
+ *
+ * Tiles can be arranged in Groups of Tiles (GOTs), i.e. small tiles
+ * within a larger tile. GOT size and layout vary with the platform and
+ * performance concerns. When compression is applied, a GOT may contain
+ * two tile types; these parameters cannot describe the secondary
+ * tile type.
+ *
+ * Besides, an array of eight 4-byte entries (32 bytes) is needed to store
+ * some compression parameters for a compression metadata plane.
+ *
+ *   Macro
+ * Bits  Param Description
+ * ----- ----- -----------------------------------------------------------
+ *
+ *  7:0  f Scan direction description.
+ *
+ *   0 = Invalid
+ *   1 = V4, the scan always runs vertically for 4 pixels,
+ *   then moves back to the starting row at the next horizontal
+ *   position.
+ *   2 = Reserved for future use.
+ *
+ * 15:8  m Number of times the pattern repeats in the direction
+ * perpendicular to the first scan direction.
+ *
+ * 19:16 p The padding bits after the whole scan, could be zero.
+ *
+ * 20:20 g GOT packing flag.
+ *
+ * 23:21 - Reserved for future use.  Must be zero.
+ *
+ * 27:24 h log2(horizontal) of bytes, in GOTs.
+ *
+ * 31:28 v log2(vertical) of bytes, in GOTs.
+ *
+ * 35:32 - Reserved for future use.  Must be zero.
+ *
+ * 36:36 c Compression flag.
+ *
+ * 55:37 - Reserved for future use.  Must be zero.
+ *
+ */
+
+#define DRM_FORMAT_MOD_SYNA_V4_TILED   fourcc_mod_code(SYNAPTICS, 1)
+
+#define DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(f, m, p, g, h, v, c) \
+   fourcc_mod_code(SYNAPTICS, ((__u64)((f) & 0xff) | \
+((__u64)((m) & 0xff) << 8) | \
+((__u64)((p) & 0xf) << 16) | \
+((__u64)((g) & 0x1) << 20) | \
+((__u64)((h) & 0xf) << 24) | \
+((__u64)((v) & 0xf) << 28) | \
+((__u64)((c) & 0x1) << 36)))
+
+#define DRM_FORMAT_MOD_SYNA_V4H1 \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 1, 0, 0, 0, 0, 0)
+
+#define DRM_FORMAT_MOD_SYNA_V4H3P8 \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 3, 8, 0, 0, 0, 0)
+
+#define DRM_FORMAT_MOD_SYNA_V4H1_64L4_COMPRESSED \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 1, 0, 1, 6, 2, 1)
+
+#define DRM_FORMAT_MOD_SYNA_V4H3P8_64L4_COMPRESSED \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 3, 8, 1, 6, 2, 1)
+
+#define DRM_FORMAT_MOD_SYNA_V4H1_128L128_COMPRESSED \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 1, 0, 1, 7, 7, 1)
+
+#define DRM_FORMAT_MOD_SYNA_V4H3P8_128L128_COMPRESSED \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 3, 8, 1, 7, 7, 1)
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.17.1
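
To make the bit table in the patch above easier to check by hand, here is a small sketch that packs and unpacks the seven parameters using the positions the comment lists (f at 7:0, m at 15:8, p at 19:16, g at bit 20, h at 27:24, v at 31:28, c at bit 36). The struct and function names are hypothetical helpers for this illustration, not part of the proposed header; only the field layout and the vendor ID 0x0b come from the patch.

```c
#include <assert.h> /* for assert() in caller-side checks */
#include <stdint.h>

#define EX_VENDOR_SYNAPTICS 0x0bULL /* vendor ID proposed in the patch */

/* One member per macro parameter of the proposed modifier layout. */
struct ex_syna_fields {
	unsigned int f; /* bits  7:0  scan direction */
	unsigned int m; /* bits 15:8  pattern repeat count */
	unsigned int p; /* bits 19:16 padding bits */
	unsigned int g; /* bit  20    GOT packing flag */
	unsigned int h; /* bits 27:24 log2 horizontal GOT size */
	unsigned int v; /* bits 31:28 log2 vertical GOT size */
	unsigned int c; /* bit  36    compression flag */
};

/* Build the 64-bit modifier; every field is widened before shifting. */
static inline uint64_t ex_syna_pack(const struct ex_syna_fields *s)
{
	uint64_t val = (uint64_t)(s->f & 0xff) |
		       ((uint64_t)(s->m & 0xff) << 8) |
		       ((uint64_t)(s->p & 0xf) << 16) |
		       ((uint64_t)(s->g & 0x1) << 20) |
		       ((uint64_t)(s->h & 0xf) << 24) |
		       ((uint64_t)(s->v & 0xf) << 28) |
		       ((uint64_t)(s->c & 0x1) << 36);

	return (EX_VENDOR_SYNAPTICS << 56) | (val & 0x00ffffffffffffffULL);
}

/* Split a modifier back into its fields (the vendor byte is ignored). */
static inline struct ex_syna_fields ex_syna_unpack(uint64_t mod)
{
	struct ex_syna_fields s = {
		.f = (unsigned int)(mod & 0xff),
		.m = (unsigned int)((mod >> 8) & 0xff),
		.p = (unsigned int)((mod >> 16) & 0xf),
		.g = (unsigned int)((mod >> 20) & 0x1),
		.h = (unsigned int)((mod >> 24) & 0xf),
		.v = (unsigned int)((mod >> 28) & 0xf),
		.c = (unsigned int)((mod >> 36) & 0x1),
	};

	return s;
}
```

With this layout, the parameters (1, 1, 0, 1, 6, 2, 1) used for the 64L4 compressed variant pack to 0x0b00001026100101, and (1, 3, 8, 0, 0, 0, 0) pack to 0x0b00000000080301.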



[PATCH v3 0/4] Add pixel formats used in Synatpics SoC

2022-11-01 Thread Hsia-Jun Li
From: "Hsia-Jun(Randy) Li" 

These pixel formats are used in Synaptics' VideoSmart series SoCs,
such as the VS640 and VS680. This time I am only disclosing the pixel
formats used in the video codecs and display pipeline. In fact, any
device connected to the MTR module could support these tiled and
compressed pixel formats.

We may not be able to post any drivers here in the short term. Most of
the work on this platform is done in the Trusted Execution Environment,
and we do not use OP-TEE, not even its client framework.

Please note that the video codecs may need one more memory plane than
the display case. That extra plane is for the decoder's internal use.
It can't be appended to the luma or chroma buffer as many other drivers
do, because this buffer may only be accessed by the video codec itself
and requires different memory security attributes. There is no proper
place in v4l2 m2m to allocate such a large buffer, since we don't know
when the user will stop allocating graphics buffers. Although we could
allocate it in a step such as STREAMON, that would cause an unusual
delay at the start of video playback.

https://synaptics.com/products/multimedia-solutions

Changelog
v3:
Fixed a mistake in the format macro.
Corrected the description of the 64L4 variant modifiers.
v2:
The DRM modifiers in the first draft were too simple; they couldn't
describe the tiles-in-group attribute of the memory layout.
Removed the v4l2 fourcc. Added a document for the future v4l2 extended
fmt.
v1:
First draft of the DRM modifiers.
Tried to put the basic tile formats into v4l2 fourcc.

Hsia-Jun(Randy) Li (3):
  drm/fourcc: Add Synaptics VideoSmart tiled modifiers
  media: videodev2.h: add pixel format modifiers
  media: videodev2.h: add Synaptics tiled modifiers

Randy Li (1):
  media: docs: Add Synpatics tile modifiers

 .../media/v4l/pixfmt-synaptics.rst| 80 +++
 .../userspace-api/media/v4l/pixfmt.rst|  1 +
 include/uapi/drm/drm_fourcc.h | 75 +
 include/uapi/linux/videodev2.h| 50 
 4 files changed, 206 insertions(+)
 create mode 100644 Documentation/userspace-api/media/v4l/pixfmt-synaptics.rst

-- 
2.17.1



[PATCH v2 2/2] media: docs: Add Synpatics tile modifiers

2022-10-30 Thread Hsia-Jun Li
From: Randy Li 

The Synaptics VideoSmart platform uses too many pixel formats
to store them all in the fourcc namespace.

Signed-off-by: Randy Li 
---
 .../media/v4l/pixfmt-synaptics.rst| 80 +++
 .../userspace-api/media/v4l/pixfmt.rst|  1 +
 2 files changed, 81 insertions(+)
 create mode 100644 Documentation/userspace-api/media/v4l/pixfmt-synaptics.rst

diff --git a/Documentation/userspace-api/media/v4l/pixfmt-synaptics.rst 
b/Documentation/userspace-api/media/v4l/pixfmt-synaptics.rst
new file mode 100644
index 000000000000..bc86737febb7
--- /dev/null
+++ b/Documentation/userspace-api/media/v4l/pixfmt-synaptics.rst
@@ -0,0 +1,80 @@
+.. SPDX-License-Identifier: GFDL-1.1-no-invariants-or-later
+
+.. _pixfmt-synaptics:
+
+********************************
+Synaptics Pixel Format Modifiers
+********************************
+
+The tiled pixel formats on the Synaptics VideoSmart platform have
+many variants. Only the most widely used pixel format modifiers are
+listed here. The values should be the same as the ones defined in the
+``drm_fourcc.h`` file.
+
+.. tabularcolumns:: |p{6.6cm}|p{2.2cm}|p{8.5cm}|
+
+.. raw:: latex
+
+   \small
+
+.. _reserved-formats:
+
+.. flat-table:: Synaptics Image Format Modifiers
+   :header-rows:  1
+   :stub-columns: 0
+   :widths:   3 1 4
+
+   * - Identifier
+ - Code
+ - Details
+   * .. _V4L2-PIX-FMT-MOD-SYNA-V4H1:
+
+ - ``V4L2_PIX_FMT_MOD_SYNA_V4H1``
+ - '0x0b000101'
+ - The plain uncompressed 8-bit tile format. It is similar to
+   Intel's Y-tile, but it won't take any pixel from the next X direction
+   in a tile group. The line stride and image height must be aligned to
+   a multiple of 16. The height of the chrominance plane is increased by 8.
+   This modifier is currently used in conjunction with ``V4L2_PIX_FMT_NV12``
+   or ``V4L2_PIX_FMT_NV12M``.
+   * .. _V4L2-PIX-FMT-MOD-SYNA-V4H3P8:
+
+ - ``V4L2_PIX_FMT_MOD_SYNA_V4H3P8``
+ - '0x0b080301'
+ - The plain uncompressed 10-bit tile format. It stores pixels in 2D
+   3x4 tiles with 8 bits of padding in each tile. Tiles are then packed
+   into a 128-byte cache line. This modifier is used in conjunction with
+   ``V4L2_PIX_FMT_NV15``.
+   * .. _V4L2-PIX-FMT-MOD-SYNA-V4H1-64L4C:
+
+ - ``V4L2_PIX_FMT_MOD_SYNA_V4H1_64L4C``
+ - '0x0b0026010101'
+ - Compressed ``V4L2_PIX_FMT_MOD_SYNA_V4H1``. It stores 64x4 pixels
+   in 1x4 tiles. Each plane requires a meta plane (MTR plane) for
+   decompression. An MTR plane carries a 32-byte parameter set.
+   * .. _V4L2-PIX-FMT-MOD-SYNA-V4H3P8-64L4C:
+
+ - ``V4L2_PIX_FMT_MOD_SYNA_V4H3P8_64L4C``
+ - '0x0b0026090301'
+ - Compressed ``V4L2_PIX_FMT_MOD_SYNA_V4H3P8``. It stores 64x4 pixels
+   in tiles. Each plane requires a meta plane (MTR plane) for
+   decompression. An MTR plane carries a 32-byte parameter set.
+
+   * .. _V4L2-PIX-FMT-MOD-SYNA-V4H1-128L128C:
+
+ - ``V4L2_PIX_FMT_MOD_SYNA_V4H1_128L128C``
+ - '0x0b0077010101'
+ - Compressed ``V4L2_PIX_FMT_MOD_SYNA_V4H1``. It stores 128x128 pixels
+   in 1x4 tiles. Each plane requires a meta plane (MTR plane) for
+   decompression. An MTR plane carries a 32-byte parameter set.
+   * .. _V4L2-PIX-FMT-MOD-SYNA-V4H3P8-128L128C:
+
+ - ``V4L2_PIX_FMT_MOD_SYNA_V4H3P8_128L128C``
+ - '0x0b0077090301'
+ - Compressed ``V4L2_PIX_FMT_MOD_SYNA_V4H3P8``. It stores 128x128 pixels
+   in tiles. Each plane requires a meta plane (MTR plane) for
+   decompression. An MTR plane carries a 32-byte parameter set.
+
+.. raw:: latex
+
+   \normalsize
diff --git a/Documentation/userspace-api/media/v4l/pixfmt.rst 
b/Documentation/userspace-api/media/v4l/pixfmt.rst
index 11dab4a90630..bfe4fdb52b6b 100644
--- a/Documentation/userspace-api/media/v4l/pixfmt.rst
+++ b/Documentation/userspace-api/media/v4l/pixfmt.rst
@@ -36,3 +36,4 @@ see also :ref:`VIDIOC_G_FBUF `.)
 colorspaces
 colorspaces-defs
 colorspaces-details
+   pixfmt-synaptics 
-- 
2.17.1



[PATCH v2 1/2] drm/fourcc: Add Synaptics VideoSmart tiled modifiers

2022-10-30 Thread Hsia-Jun Li
From: "Hsia-Jun(Randy) Li" 

Memory Traffic Reduction (MTR) is a module in the Synaptics
VideoSmart platform that can process losslessly compressed images
and cache the tile memory lines.

These modifiers only record the parameters that affect the pixel
layout or memory layout. Whether physical memory page mapping
is used is not part of the format.

We allocate the same amount of memory for uncompressed
and compressed luma and chroma data, while the compressed buffer
requires two extra planes holding the metadata for
decompression.

Signed-off-by: Hsia-Jun(Randy) Li 
---
 include/uapi/drm/drm_fourcc.h | 75 +++
 1 file changed, 75 insertions(+)

diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index bc056f2d537d..4b587a4694f7 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -407,6 +407,7 @@ extern "C" {
 #define DRM_FORMAT_MOD_VENDOR_ARM 0x08
 #define DRM_FORMAT_MOD_VENDOR_ALLWINNER 0x09
 #define DRM_FORMAT_MOD_VENDOR_AMLOGIC 0x0a
+#define DRM_FORMAT_MOD_VENDOR_SYNAPTICS 0x0b
 
 /* add more to the end as needed */
 
@@ -1507,6 +1508,80 @@ drm_fourcc_canonicalize_nvidia_format_mod(__u64 modifier)
 #define AMD_FMT_MOD_CLEAR(field) \
(~((__u64)AMD_FMT_MOD_##field##_MASK << AMD_FMT_MOD_##field##_SHIFT))
 
+/*
+ * Synaptics VideoSmart modifiers
+ *
+ * Tiles could be arranged in Groups of Tiles (GOTs); a GOT is a small tile
+ * within a tile. GOT size and layout vary with the platform and
+ * performance concerns. When compression is applied, a GOT may hold
+ * two tile types; these parameters can't describe the secondary
+ * tile type.
+ *
+ * Besides, an array of eight 4-byte entries (32 bytes) is needed to store
+ * some compression parameters for a compression metadata plane.
+ *
+ *   Macro
+ * Bits  Param Description
+ * ----- ----- -----------------------------------------------------------
+ *
+ *  7:0  f Scan direction description.
+ *
+ *   0 = Invalid
+ *   1 = V4, the scan always runs vertically for 4 pixels,
+ *   then moves back to the starting row at the next horizontal
+ *   position.
+ *   2 = Reserved for future use.
+ *
+ * 15:8  m Number of times the pattern repeats in the direction
+ * perpendicular to the first scan direction.
+ *
+ * 19:16 p The padding bits after the whole scan, could be zero.
+ *
+ * 20:20 g GOT packing flag.
+ *
+ * 23:21 - Reserved for future use.  Must be zero.
+ *
+ * 27:24 h log2(horizontal) of bytes, in GOTs.
+ *
+ * 31:28 v log2(vertical) of bytes, in GOTs.
+ *
+ * 35:32 - Reserved for future use.  Must be zero.
+ *
+ * 36:36 c Compression flag.
+ *
+ * 55:37 - Reserved for future use.  Must be zero.
+ *
+ */
+
+#define DRM_FORMAT_MOD_SYNA_V4_TILED   fourcc_mod_code(SYNAPTICS, 1)
+
+#define DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(f, m, p, g, h, v, c) \
+   fourcc_mod_code(SYNAPTICS, (((f) & 0xff) | \
+(((m) & 0xff) << 8) | \
+(((p) & 0xf) << 16) | \
+(((g) & 0x1) << 16) | \
+(((h) & 0xf) << 24) | \
+(((v) & 0xf) << 28) | \
+(((c) & 0x1) << 36)))
+
+#define DRM_FORMAT_MOD_SYNA_V4H1 \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 1, 0, 0, 0, 0, 0)
+
+#define DRM_FORMAT_MOD_SYNA_V4H3P8 \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 3, 8, 0, 0, 0, 0)
+
+#define DRM_FORMAT_MOD_SYNA_V4H1_64L4_COMPRESSED \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 1, 0, 1, 6, 2, 1)
+
+#define DRM_FORMAT_MOD_SYNA_V4H3P8_64L4_COMPRESSED \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 3, 8, 1, 6, 2, 1)
+
+#define DRM_FORMAT_MOD_SYNA_V4H1_128L128_COMPRESSED \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 1, 0, 1, 7, 7, 1)
+
+#define DRM_FORMAT_MOD_SYNA_V4H3P8_128L128_COMPRESSED \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 3, 8, 1, 7, 7, 1)
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.17.1
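
Reading the v2 macro above against the bit table in its own comment, two things stand out: the macro shifts the `g` flag by 16, the same position as the low bit of `p`, while the table places `g` at bit 20; and the operands are not widened to 64 bits before shifting, although shifting a 32-bit int by 36 positions is undefined behaviour in C. The sketch below, with hypothetical `ex_` helper names, shows the collision and the widened form. It is an illustration of those two observations, not a statement of which one the later changelog refers to.

```c
#include <assert.h> /* for assert() in caller-side checks */
#include <stdint.h>

/* p and g packed the way the v2 macro body does: both shifted by 16. */
static inline uint64_t ex_pack_pg_shift16(unsigned int p, unsigned int g)
{
	return ((uint64_t)(p & 0xf) << 16) | ((uint64_t)(g & 0x1) << 16);
}

/* p and g packed the way the bit table describes: p at 19:16, g at bit 20. */
static inline uint64_t ex_pack_pg_table(unsigned int p, unsigned int g)
{
	return ((uint64_t)(p & 0xf) << 16) | ((uint64_t)(g & 0x1) << 20);
}

/*
 * Compression flag at bit 36. The operand is widened to 64 bits first,
 * because a shift count of 36 exceeds the width of a 32-bit int.
 */
static inline uint64_t ex_pack_c(unsigned int c)
{
	return (uint64_t)(c & 0x1) << 36;
}
```

With the shift-by-16 packing, (p = 1, g = 0) and (p = 0, g = 1) produce the same value, so the two fields cannot be told apart; with the table layout they stay distinct.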



[PATCH v2 0/2] Add pixel formats used in Synatpics SoC

2022-10-30 Thread Hsia-Jun Li
From: "Hsia-Jun(Randy) Li" 

These pixel formats are used in Synaptics' VideoSmart series SoCs,
such as the VS640 and VS680. This time I am only disclosing the pixel
formats used in the video codecs and display pipeline. In fact, any
device connected to the MTR module could support these tiled and
compressed pixel formats. More detail about the MTR module can be
found in the first patch of this series.

We may not be able to post any drivers here in the short term. Most of
the work on this platform is done in the Trusted Execution Environment,
and we do not use OP-TEE, not even its client framework.

Please note that the video codecs may use one more memory plane than
the display case. That extra plane is for the decoder's internal use.
It can't be appended to the luma or chroma buffer as many other drivers
do, because this buffer may only be accessed by the video codec itself
and requires different memory security attributes. There is no proper
place in v4l2 m2m to allocate such a large buffer, since we don't know
when the user will stop allocating graphics buffers. Although we could
allocate it in a step such as STREAMON, that would cause an unusual
delay at the start of video playback.

https://synaptics.com/products/multimedia-solutions

Changelog
v2:
The DRM modifiers in the first draft were too simple; they couldn't
describe the tiles-in-group attribute of the memory layout.
Removed the v4l2 fourcc. Added a document for the future v4l2 extended
fmt.
v1:
First draft of the DRM modifiers.
Tried to put the basic tile formats into v4l2 fourcc.

Hsia-Jun(Randy) Li (1):
  drm/fourcc: Add Synaptics VideoSmart tiled modifiers

Randy Li (1):
  media: docs: Add Synpatics tile modifiers

 .../media/v4l/pixfmt-synaptics.rst| 80 +++
 .../userspace-api/media/v4l/pixfmt.rst|  1 +
 include/uapi/drm/drm_fourcc.h | 75 +
 3 files changed, 156 insertions(+)
 create mode 100644 Documentation/userspace-api/media/v4l/pixfmt-synaptics.rst

-- 
2.17.1



Re: [PATCH 2/2] [WIP]: media: Add Synaptics compressed tiled format

2022-08-23 Thread Hsia-Jun Li




On 8/22/22 22:15, Nicolas Dufresne wrote:



Le samedi 20 août 2022 à 08:10 +0800, Hsia-Jun Li a écrit :


On 8/20/22 03:17, Nicolas Dufresne wrote:



Le vendredi 19 août 2022 à 23:44 +0800, Hsia-Jun Li a écrit :


On 8/19/22 23:28, Nicolas Dufresne wrote:



Le vendredi 19 août 2022 à 02:13 +0300, Laurent Pinchart a écrit :

On Thu, Aug 18, 2022 at 02:33:42PM +0800, Hsia-Jun Li wrote:

On 8/18/22 14:06, Tomasz Figa wrote:

On Tue, Aug 9, 2022 at 1:28 AM Hsia-Jun Li  wrote:


From: "Hsia-Jun(Randy) Li" 

The most of detail has been written in the drm.


This patch still needs a description of the format, which should go to
Documentation/userspace-api/media/v4l/.


Please notice that the tiled formats here request
one more plane for storing the motion vector metadata.
This buffer won't be compressed, so you can't append
it to luma or chroma plane.


Does the motion vector buffer need to be exposed to userspace? Is the
decoder stateless (requires userspace to specify the reference frames)
or stateful (manages the entire decoding process internally)?


No, users don't need to access them at all. Just they need a different
dma-heap.

You would only get the stateful version of both encoder and decoder.


Shouldn't the motion vectors be stored in a separate V4L2 buffer,
submitted through a different queue then ?


Imho, I believe these should be invisible to users and pooled separately to
reduce the overhead. The number of reference is usually lower then the number of
allocated display buffers.


You can't. The motion vector buffer can't share with the luma and chroma
data planes, nor the data plane for the compression meta data.

You could consider this as a security requirement(the memory region for
the MV could only be accessed by the decoder) or hardware limitation.

It is also not very easy to manage such a large buffer that would change
when the resolution changed.


Your argument are just aiming toward the fact that you should not let the user
allocate these in the first place. They should not be bound to the v4l2 buffer.
Allocate these in your driver, and leave to your user the pixel buffer (and
compress meta) allocation work.


What I want to say is that userspace could allocate the buffers and then
make the v4l2 decoder import them, but each plane should come from
the right DMA-heap. Userspace usually has a better picture of memory
usage, so this would bring some flexibility.

Currently, there is another thing that bothers me: I need to allocate a
small piece of memory (less than 128KiB) for the compression metadata
buffers, as I mentioned here. These pieces of memory should be located
in a small region, or performance could be badly hurt. Besides, we don't
support an IOMMU for this kind of data.

Any idea how to assign a small piece of memory from pre-allocated
memory or a selected region (I don't think I could reserve it in a
DMA-heap) to a plane in an MMAP-type buffer?

A V4L2 driver should first implement the V4L2 semantic before adding optional
use case like buffer importation. For this reason, your V4L2 driver should know
all the memory requirements and how to allocate that memory. 

Yes, that is what I intend to. Or I just smuggle those things somewhere.
Later on, your

importing driver will have to validate that the userland did it right at
importation. This is to follow V4L2 semantic and security model. If you move
simply trust the userland (gralloc), you are not doing it right.
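
As a rough illustration of the import-time validation described above, the sketch below compares what userspace handed over for each plane against what the driver requires for the current format. Everything here is hypothetical: the struct names, the `heap_id` tag, and the idea that a driver can learn which heap a buffer came from are assumptions made for the example, since the thread does not say how that would be discovered.

```c
#include <assert.h> /* for assert() in caller-side checks */
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-plane requirement computed by the driver. */
struct ex_plane_req {
	size_t min_size; /* smallest acceptable buffer size in bytes */
	int heap_id;     /* illustrative tag for the required heap */
};

/* Hypothetical description of one plane imported from userspace. */
struct ex_plane_import {
	size_t size;
	int heap_id;
};

/*
 * Return 0 when every imported plane is large enough and comes from the
 * expected heap, or -EINVAL on the first plane that does not qualify.
 */
static int ex_validate_import(const struct ex_plane_req *req,
			      const struct ex_plane_import *imp,
			      unsigned int num_planes)
{
	unsigned int i;

	for (i = 0; i < num_planes; i++) {
		if (imp[i].size < req[i].min_size)
			return -EINVAL;
		if (imp[i].heap_id != req[i].heap_id)
			return -EINVAL;
	}

	return 0;
}
```

The point of the sketch is only that the driver, not userspace, owns the requirements table, so a mistake by the allocator is rejected at import instead of being trusted.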


Yes, that is what I tried to describe in the other thread
https://lore.kernel.org/linux-media/b4b3306f-c3b4-4594-bdf9-4bbc59c62...@soulik.info/
I have no problem with letting userspace decide where and how to
allocate the memory, but we need a new protocol here to let
userspace do it right.


Besides, I am not very satisfied with the dynamic resolution change
steps, if I understand them correctly. Buffer reallocation should happen
when we receive the event, not wait until the drain is done. A resolution
increase is very common when you are playing a network stream. It would
be better if the decoder decided how many buffers it needs for the
previous sequence, while userspace reallocates the rest of the
buffers in the CAPTURE queue.

Other driver handle this just fine, if your v4l2 driver implement the v4l2
resolution change mechanism, is should be very simple to manage.


This is a limitation of the queue design of V4L2. While streaming the buffers
associated with the queue must currently be large enough to support the selec

Re: [PATCH 2/2] [WIP]: media: Add Synaptics compressed tiled format

2022-08-23 Thread Hsia-Jun Li




On 8/23/22 14:05, Tomasz Figa wrote:



On Sat, Aug 20, 2022 at 12:44 AM Hsia-Jun Li  wrote:




On 8/19/22 23:28, Nicolas Dufresne wrote:



Le vendredi 19 août 2022 à 02:13 +0300, Laurent Pinchart a écrit :

On Thu, Aug 18, 2022 at 02:33:42PM +0800, Hsia-Jun Li wrote:

On 8/18/22 14:06, Tomasz Figa wrote:

On Tue, Aug 9, 2022 at 1:28 AM Hsia-Jun Li  wrote:


From: "Hsia-Jun(Randy) Li" 

The most of detail has been written in the drm.


This patch still needs a description of the format, which should go to
Documentation/userspace-api/media/v4l/.


Please notice that the tiled formats here request
one more plane for storing the motion vector metadata.
This buffer won't be compressed, so you can't append
it to luma or chroma plane.


Does the motion vector buffer need to be exposed to userspace? Is the
decoder stateless (requires userspace to specify the reference frames)
or stateful (manages the entire decoding process internally)?


No, users don't need to access them at all. Just they need a different
dma-heap.

You would only get the stateful version of both encoder and decoder.


Shouldn't the motion vectors be stored in a separate V4L2 buffer,
submitted through a different queue then ?


Imho, I believe these should be invisible to users and pooled separately to
reduce the overhead. The number of reference is usually lower then the number of
allocated display buffers.


You can't. The motion vector buffer can't share with the luma and chroma
data planes, nor the data plane for the compression meta data.


I believe what Nicolas is suggesting is to just keep the MV buffer
handling completely separate from video buffers. Just keep a map
between frame buffer and MV buffer in the driver and use the right
buffer when triggering a decode.
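
The driver-side map suggested above could look roughly like the following sketch: a table indexed by the frame buffer's index that lazily creates a private MV buffer and recreates it when the required size changes (for example after a resolution change). This is plain userspace C with `malloc()` standing in for whatever allocator a real driver would use; the names, the table size and the release policy are all assumptions made for the illustration.

```c
#include <assert.h> /* for assert() in caller-side checks */
#include <stddef.h>
#include <stdlib.h>

#define EX_MAX_FRAMES 32 /* hypothetical cap on frame buffer indices */

/* One private motion-vector buffer, never exposed to userspace. */
struct ex_mv_buf {
	void *mem;
	size_t size;
};

struct ex_decoder {
	struct ex_mv_buf mv[EX_MAX_FRAMES]; /* indexed by frame buffer index */
	size_t mv_size;                     /* size needed at the current resolution */
};

/* Look up (and create or resize if needed) the MV buffer paired with a frame. */
static struct ex_mv_buf *ex_mv_get(struct ex_decoder *dec, unsigned int index)
{
	struct ex_mv_buf *b;

	if (index >= EX_MAX_FRAMES || dec->mv_size == 0)
		return NULL;

	b = &dec->mv[index];
	if (b->mem && b->size != dec->mv_size) {
		/* Stale buffer from a previous resolution: drop it. */
		free(b->mem);
		b->mem = NULL;
		b->size = 0;
	}
	if (!b->mem) {
		b->mem = malloc(dec->mv_size);
		if (!b->mem)
			return NULL;
		b->size = dec->mv_size;
	}

	return b;
}

/* Free every MV buffer, e.g. when the frame buffers themselves go away. */
static void ex_mv_release_all(struct ex_decoder *dec)
{
	unsigned int i;

	for (i = 0; i < EX_MAX_FRAMES; i++) {
		free(dec->mv[i].mem);
		dec->mv[i].mem = NULL;
		dec->mv[i].size = 0;
	}
}
```

When a decode is triggered for frame buffer `index`, the driver would call `ex_mv_get(dec, index)` and program the hardware with the returned buffer, so the MV memory never needs to appear as a plane of the video buffer.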



You could consider this as a security requirement(the memory region for
the MV could only be accessed by the decoder) or hardware limitation.

It is also not very easy to manage such a large buffer that would change
when the resolution changed.


How does it differ from managing additional planes of video buffers?
I should say I am not against his suggestion, provided I can get a
DMA-heap v4l2 allocator merged into the kernel in the future. I think we
would need two heaps here, one for normal video and one for secure
video, but I don't have much idea how to determine whether we are
decoding a secure or non-secure video (by design the kernel doesn't
know; only the hardware and the TEE care about that).


There is just one place where I think managing the buffer my way would
be simpler: when the decoder goes to the drain stage, the MV buffer is
destroyed along with the data buffer, and it is created when the data
buffer is created.
I know that doing the mapping between them is not a lot of work. I
just need to convince the others to accept that the MV buffer is not
allocated externally.


Best regards,
Tomasz




Signed-off-by: Hsia-Jun(Randy) Li 
---
drivers/media/v4l2-core/v4l2-common.c | 1 +
drivers/media/v4l2-core/v4l2-ioctl.c  | 2 ++
include/uapi/linux/videodev2.h| 2 ++
3 files changed, 5 insertions(+)

diff --git a/drivers/media/v4l2-core/v4l2-common.c 
b/drivers/media/v4l2-core/v4l2-common.c
index e0fbe6ba4b6c..f645278b3055 100644
--- a/drivers/media/v4l2-core/v4l2-common.c
+++ b/drivers/media/v4l2-core/v4l2-common.c
@@ -314,6 +314,7 @@ const struct v4l2_format_info *v4l2_format_info(u32 format)
   { .format = V4L2_PIX_FMT_SGBRG12,   .pixel_enc = 
V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 0, 0 }, 
.hdiv = 1, .vdiv = 1 },
   { .format = V4L2_PIX_FMT_SGRBG12,   .pixel_enc = 
V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 0, 0 }, 
.hdiv = 1, .vdiv = 1 },
   { .format = V4L2_PIX_FMT_SRGGB12,   .pixel_enc = 
V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 0, 0 }, 
.hdiv = 1, .vdiv = 1 },
+   { .format = V4L2_PIX_FMT_NV12M_V4H1C, .pixel_enc = 
V4L2_PIXEL_ENC_YUV, .mem_planes = 5, .comp_planes = 2, .bpp = { 1, 2, 0, 0 }, 
.hdiv = 2, .vdiv = 2, .block_w = { 128, 128 }, .block_h = { 128, 128 } },
   };
   unsigned int i;

diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c 
b/drivers/media/v4l2-core/v4l2-ioctl.c
index e6fd355a2e92..8f65964aff08 100644
--- a/drivers/media/v4l2-core/v4l2-ioctl.c
+++ b/drivers/media/v4l2-core/v4l2-ioctl.c
@@ -1497,6 +1497,8 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
   case V4L2_PIX_FMT_MT21C:descr = "Mediatek Compressed 
Format"; break;
   case V4L2_PIX_FMT_

Re: [PATCH 2/2] [WIP]: media: Add Synaptics compressed tiled format

2022-08-19 Thread Hsia-Jun Li




On 8/20/22 03:17, Nicolas Dufresne wrote:



Le vendredi 19 août 2022 à 23:44 +0800, Hsia-Jun Li a écrit :


On 8/19/22 23:28, Nicolas Dufresne wrote:



Le vendredi 19 août 2022 à 02:13 +0300, Laurent Pinchart a écrit :

On Thu, Aug 18, 2022 at 02:33:42PM +0800, Hsia-Jun Li wrote:

On 8/18/22 14:06, Tomasz Figa wrote:

On Tue, Aug 9, 2022 at 1:28 AM Hsia-Jun Li  wrote:


From: "Hsia-Jun(Randy) Li" 

The most of detail has been written in the drm.


This patch still needs a description of the format, which should go to
Documentation/userspace-api/media/v4l/.


Please notice that the tiled formats here request
one more plane for storing the motion vector metadata.
This buffer won't be compressed, so you can't append
it to luma or chroma plane.


Does the motion vector buffer need to be exposed to userspace? Is the
decoder stateless (requires userspace to specify the reference frames)
or stateful (manages the entire decoding process internally)?


No, users don't need to access them at all. Just they need a different
dma-heap.

You would only get the stateful version of both encoder and decoder.


Shouldn't the motion vectors be stored in a separate V4L2 buffer,
submitted through a different queue then ?


Imho, I believe these should be invisible to users and pooled separately to
reduce the overhead. The number of reference is usually lower then the number of
allocated display buffers.


You can't. The motion vector buffer can't share with the luma and chroma
data planes, nor the data plane for the compression meta data.

You could consider this as a security requirement(the memory region for
the MV could only be accessed by the decoder) or hardware limitation.

It is also not very easy to manage such a large buffer that would change
when the resolution changed.


Your argument are just aiming toward the fact that you should not let the user
allocate these in the first place. They should not be bound to the v4l2 buffer.
Allocate these in your driver, and leave to your user the pixel buffer (and
compress meta) allocation work.

What I want to say is that userspace could allocate the buffers and then
make the v4l2 decoder import them, but each plane should come from
the right DMA-heap. Userspace usually has a better picture of memory
usage, so this would bring some flexibility.


Currently, there is another thing that bothers me: I need to allocate a
small piece of memory (less than 128KiB) for the compression metadata
buffers, as I mentioned here. These pieces of memory should be located
in a small region, or performance could be badly hurt. Besides, we don't
support an IOMMU for this kind of data.


Any idea how to assign a small piece of memory from pre-allocated
memory or a selected region (I don't think I could reserve it in a
DMA-heap) to a plane in an MMAP-type buffer?


Besides, I am not very satisfied with the dynamic resolution change
steps, if I understand them correctly. Buffer reallocation should happen
when we receive the event, not wait until the drain is done. A resolution
increase is very common when you are playing a network stream. It would
be better if the decoder decided how many buffers it needs for the
previous sequence, while userspace reallocates the rest of the
buffers in the CAPTURE queue.

Other driver handle this just fine, if your v4l2 driver implement the v4l2
resolution change mechanism, is should be very simple to manage.




Signed-off-by: Hsia-Jun(Randy) Li 
---
drivers/media/v4l2-core/v4l2-common.c | 1 +
drivers/media/v4l2-core/v4l2-ioctl.c  | 2 ++
include/uapi/linux/videodev2.h| 2 ++
3 files changed, 5 insertions(+)

diff --git a/drivers/media/v4l2-core/v4l2-common.c 
b/drivers/media/v4l2-core/v4l2-common.c
index e0fbe6ba4b6c..f645278b3055 100644
--- a/drivers/media/v4l2-core/v4l2-common.c
+++ b/drivers/media/v4l2-core/v4l2-common.c
@@ -314,6 +314,7 @@ const struct v4l2_format_info *v4l2_format_info(u32 format)
   { .format = V4L2_PIX_FMT_SGBRG12,   .pixel_enc = 
V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 0, 0 }, 
.hdiv = 1, .vdiv = 1 },
   { .format = V4L2_PIX_FMT_SGRBG12,   .pixel_enc = 
V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 0, 0 }, 
.hdiv = 1, .vdiv = 1 },
   { .format = V4L2_PIX_FMT_SRGGB12,   .pixel_enc = 
V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 0, 0 }, 
.hdiv = 1, .vdiv = 1 },
+   { .format = V4L2_PIX_FMT_NV12M_V4H1C, .pixel_enc = 
V4L2_PIXEL_ENC_YUV, .mem_planes = 5, .comp_planes = 2, .bpp = { 1, 2, 0, 0 }, 
.hdiv = 2, .vdiv =

Re: [PATCH 2/2] [WIP]: media: Add Synaptics compressed tiled format

2022-08-19 Thread Hsia-Jun Li




On 8/19/22 23:28, Nicolas Dufresne wrote:



Le vendredi 19 août 2022 à 02:13 +0300, Laurent Pinchart a écrit :

On Thu, Aug 18, 2022 at 02:33:42PM +0800, Hsia-Jun Li wrote:

On 8/18/22 14:06, Tomasz Figa wrote:

On Tue, Aug 9, 2022 at 1:28 AM Hsia-Jun Li  wrote:


From: "Hsia-Jun(Randy) Li" 

The most of detail has been written in the drm.


This patch still needs a description of the format, which should go to
Documentation/userspace-api/media/v4l/.


Please notice that the tiled formats here request
one more plane for storing the motion vector metadata.
This buffer won't be compressed, so you can't append
it to luma or chroma plane.


Does the motion vector buffer need to be exposed to userspace? Is the
decoder stateless (requires userspace to specify the reference frames)
or stateful (manages the entire decoding process internally)?


No, users don't need to access them at all. Just they need a different
dma-heap.

You would only get the stateful version of both encoder and decoder.


Shouldn't the motion vectors be stored in a separate V4L2 buffer,
submitted through a different queue then ?


Imho, I believe these should be invisible to users and pooled separately to
reduce the overhead. The number of reference is usually lower then the number of
allocated display buffers.

You can't. The motion vector buffer can't share with the luma and chroma 
data planes, nor the data plane for the compression meta data.


You could consider this as a security requirement(the memory region for 
the MV could only be accessed by the decoder) or hardware limitation.


It is also not very easy to manage such a large buffer that would change 
when the resolution changed.



Signed-off-by: Hsia-Jun(Randy) Li 
---
   drivers/media/v4l2-core/v4l2-common.c | 1 +
   drivers/media/v4l2-core/v4l2-ioctl.c  | 2 ++
   include/uapi/linux/videodev2.h| 2 ++
   3 files changed, 5 insertions(+)

diff --git a/drivers/media/v4l2-core/v4l2-common.c 
b/drivers/media/v4l2-core/v4l2-common.c
index e0fbe6ba4b6c..f645278b3055 100644
--- a/drivers/media/v4l2-core/v4l2-common.c
+++ b/drivers/media/v4l2-core/v4l2-common.c
@@ -314,6 +314,7 @@ const struct v4l2_format_info *v4l2_format_info(u32 format)
  { .format = V4L2_PIX_FMT_SGBRG12,   .pixel_enc = 
V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 0, 0 }, 
.hdiv = 1, .vdiv = 1 },
  { .format = V4L2_PIX_FMT_SGRBG12,   .pixel_enc = 
V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 0, 0 }, 
.hdiv = 1, .vdiv = 1 },
  { .format = V4L2_PIX_FMT_SRGGB12,   .pixel_enc = 
V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 0, 0 }, 
.hdiv = 1, .vdiv = 1 },
+   { .format = V4L2_PIX_FMT_NV12M_V4H1C, .pixel_enc = 
V4L2_PIXEL_ENC_YUV, .mem_planes = 5, .comp_planes = 2, .bpp = { 1, 2, 0, 0 }, 
.hdiv = 2, .vdiv = 2, .block_w = { 128, 128 }, .block_h = { 128, 128 } },
  };
  unsigned int i;

diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c 
b/drivers/media/v4l2-core/v4l2-ioctl.c
index e6fd355a2e92..8f65964aff08 100644
--- a/drivers/media/v4l2-core/v4l2-ioctl.c
+++ b/drivers/media/v4l2-core/v4l2-ioctl.c
@@ -1497,6 +1497,8 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
  case V4L2_PIX_FMT_MT21C:descr = "Mediatek Compressed 
Format"; break;
  case V4L2_PIX_FMT_QC08C:descr = "QCOM Compressed 8-bit 
Format"; break;
  case V4L2_PIX_FMT_QC10C:descr = "QCOM Compressed 10-bit 
Format"; break;
+   case V4L2_PIX_FMT_NV12M_V4H1C:  descr = "Synaptics Compressed 8-bit 
tiled Format";break;
+   case V4L2_PIX_FMT_NV12M_10_V4H3P8C: descr = "Synaptics 
Compressed 10-bit tiled Format";break;
  default:
  if (fmt->description[0])
  return;
diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
index 01e630f2ec78..7e928cb69e7c 100644
--- a/include/uapi/linux/videodev2.h
+++ b/include/uapi/linux/videodev2.h
@@ -661,6 +661,8 @@ struct v4l2_pix_format {
   #define V4L2_PIX_FMT_NV12MT_16X16 v4l2_fourcc('V', 'M', '1', '2') /* 12  
Y/CbCr 4:2:0 16x16 tiles */
   #define V4L2_PIX_FMT_NV12M_8L128  v4l2_fourcc('N', 'A', '1', '2') /* 
Y/CbCr 4:2:0 8x128 tiles */
   #define V4L2_PIX_FMT_NV12M_10BE_8L128 v4l2_fourcc_be('N', 'T', '1', '2') /* 
Y/CbCr 4:2:0 10-bit 8x128 tiles */
+#define V4L2_PIX_FMT_NV12M_V4H1C v4l2_fourcc('S', 'Y', '1', '2')   /* 12  
Y/CbCr 4:2:

Re: [PATCH 2/2] [WIP]: media: Add Synaptics compressed tiled format

2022-08-18 Thread Hsia-Jun Li




On 8/19/22 07:13, Laurent Pinchart wrote:



On Thu, Aug 18, 2022 at 02:33:42PM +0800, Hsia-Jun Li wrote:

On 8/18/22 14:06, Tomasz Figa wrote:

On Tue, Aug 9, 2022 at 1:28 AM Hsia-Jun Li  wrote:


From: "Hsia-Jun(Randy) Li" 

Most of the details are written in the DRM patch.


This patch still needs a description of the format, which should go to
Documentation/userspace-api/media/v4l/.


I just want to tell people that we need an extra plane for MVTP, and I don't
have enough space here to place all the pixel formats.


Besides, I was thinking that a modifier in v4l2_ext_pix_format is not enough.
Let's take a compressed NV12 tiled format as an example. We need:
1. luma planes
2. chroma planes
3. compression metadata for luma
4. compression metadata for chroma
5. mvtp
and a single data plane version would be:
1. luma and chroma data
2. compression metadata
3. mvtp

You see, we actually have 3 kinds of data here (not including the
compression options, which I am thinking of storing somewhere else).
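
The two plane lists above can be written down as a pair of C enums. This is only a sketch for discussion: every name in it is hypothetical and none of them exist in V4L2 or in the patch. It merely restates the lists, with five planes in the multi-planar case (matching `.mem_planes = 5` in the patch), three in the single data plane case, and three kinds of data overall.

```c
#include <assert.h>

/* All names below are hypothetical; they are not part of V4L2 or the patch. */

/* The three kinds of data carried by a compressed, tiled NV12 frame. */
enum syna_data_kind {
	SYNA_KIND_PIXEL,	/* luma and chroma samples */
	SYNA_KIND_COMP_META,	/* compression metadata */
	SYNA_KIND_MVTP,		/* motion vector metadata, not compressed */
};

/* Multi-planar variant: luma and chroma sit in separate planes. */
enum syna_mplane {
	SYNA_MPLANE_LUMA,
	SYNA_MPLANE_CHROMA,
	SYNA_MPLANE_LUMA_META,
	SYNA_MPLANE_CHROMA_META,
	SYNA_MPLANE_MVTP,
	SYNA_MPLANE_COUNT,
};

/* Single data plane variant: luma and chroma share one plane. */
enum syna_splane {
	SYNA_SPLANE_DATA,
	SYNA_SPLANE_META,
	SYNA_SPLANE_MVTP,
	SYNA_SPLANE_COUNT,
};

/* Map a plane of the multi-planar variant to the kind of data it holds. */
static enum syna_data_kind syna_mplane_kind(enum syna_mplane plane)
{
	assert(plane < SYNA_MPLANE_COUNT);

	switch (plane) {
	case SYNA_MPLANE_LUMA:
	case SYNA_MPLANE_CHROMA:
		return SYNA_KIND_PIXEL;
	case SYNA_MPLANE_LUMA_META:
	case SYNA_MPLANE_CHROMA_META:
		return SYNA_KIND_COMP_META;
	default:
		return SYNA_KIND_MVTP;
	}
}
```

Keeping the kind separate from the plane index makes it visible that the MVTP plane is the only one that needs a different allocation policy, whichever variant is used.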



Please notice that the tiled formats here request
one more plane for storing the motion vector metadata.
This buffer won't be compressed, so you can't append
it to luma or chroma plane.


Does the motion vector buffer need to be exposed to userspace? Is the
decoder stateless (requires userspace to specify the reference frames)
or stateful (manages the entire decoding process internally)?


No, users don't need to access them at all. They just need a different
dma-heap.

You would only get the stateful version of both encoder and decoder.


Shouldn't the motion vectors be stored in a separate V4L2 buffer,
submitted through a different queue then ?

Yes, I like that.
Proposal: A third buffer type for the reconstruction buffers in V4L2 M2M 
encoder

https://www.spinics.net/lists/linux-media/msg214565.html

The major use of the decoder here is producing non-tiled buffers: our
decoder can produce NV12 or other pixel formats that the GPU likes, but
that must happen at the same time as a frame is decoded.
The reference buffers (what we call the real decoded frames) would still
stay in a tiled format, so more than one queue would be needed here.



Signed-off-by: Hsia-Jun(Randy) Li 
---
   drivers/media/v4l2-core/v4l2-common.c | 1 +
   drivers/media/v4l2-core/v4l2-ioctl.c  | 2 ++
   include/uapi/linux/videodev2.h| 2 ++
   3 files changed, 5 insertions(+)

diff --git a/drivers/media/v4l2-core/v4l2-common.c 
b/drivers/media/v4l2-core/v4l2-common.c
index e0fbe6ba4b6c..f645278b3055 100644
--- a/drivers/media/v4l2-core/v4l2-common.c
+++ b/drivers/media/v4l2-core/v4l2-common.c
@@ -314,6 +314,7 @@ const struct v4l2_format_info *v4l2_format_info(u32 format)
  { .format = V4L2_PIX_FMT_SGBRG12,   .pixel_enc = 
V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 0, 0 }, 
.hdiv = 1, .vdiv = 1 },
  { .format = V4L2_PIX_FMT_SGRBG12,   .pixel_enc = 
V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 0, 0 }, 
.hdiv = 1, .vdiv = 1 },
  { .format = V4L2_PIX_FMT_SRGGB12,   .pixel_enc = 
V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 0, 0 }, 
.hdiv = 1, .vdiv = 1 },
+   { .format = V4L2_PIX_FMT_NV12M_V4H1C, .pixel_enc = 
V4L2_PIXEL_ENC_YUV, .mem_planes = 5, .comp_planes = 2, .bpp = { 1, 2, 0, 0 }, 
.hdiv = 2, .vdiv = 2, .block_w = { 128, 128 }, .block_h = { 128, 128 } },
  };
  unsigned int i;

diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c 
b/drivers/media/v4l2-core/v4l2-ioctl.c
index e6fd355a2e92..8f65964aff08 100644
--- a/drivers/media/v4l2-core/v4l2-ioctl.c
+++ b/drivers/media/v4l2-core/v4l2-ioctl.c
@@ -1497,6 +1497,8 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
  case V4L2_PIX_FMT_MT21C:descr = "Mediatek Compressed 
Format"; break;
  case V4L2_PIX_FMT_QC08C:descr = "QCOM Compressed 8-bit 
Format"; break;
  case V4L2_PIX_FMT_QC10C:descr = "QCOM Compressed 10-bit 
Format"; break;
+   case V4L2_PIX_FMT_NV12M_V4H1C:  descr = "Synaptics Compressed 8-bit 
tiled Format";break;
+   case V4L2_PIX_FMT_NV12M_10_V4H3P8C: descr = "Synaptics 
Compressed 10-bit tiled Format";break;
  default:
  if (fmt->description[0])
  return;
diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
index 01e630f2ec78..7e928cb69e7c 100644
--- a/include/uapi/linux/videodev2.h
+++ b/include/uapi/linux/videodev2.h
@@ -661,6 +661,8 @@ struct v4l2_pix_format {
   #define V4L2_PIX_FMT_NV12MT_16X16 v4l2_fourcc('V', 'M', '1', '2'

Re: [PATCH 1/2] drm/fourcc: Add Synaptics VideoSmart tiled modifiers

2022-08-18 Thread Hsia-Jun Li




On 8/19/22 07:16, Laurent Pinchart wrote:



Hi Hsia-Jun,

Thank you for the patch.

On Tue, Aug 09, 2022 at 12:27:49AM +0800, Hsia-Jun Li wrote:

From: "Hsia-Jun(Randy) Li" 

Memory Traffic Reduction (MTR) is a module in the Synaptics
VideoSmart platform that can process losslessly compressed images
and cache the tile memory line.

These modifiers only record the parameters that affect the pixel
layout or memory layout. Whether physical memory page mapping
is used is not part of the format.

We allocate the same amount of memory for uncompressed
and compressed luma and chroma data, while the compressed buffer
requests two extra planes holding the metadata for
decompression.

The reasons why we need to allocate the same amount of memory for
the compressed frame:
1. The compression ratio is not fixed, and different platforms could
use different compression protocols. These protocols are completely
vendor proprietary, so other devices won't be able to use them.
It is not necessary to define their versions here.

2. The video codec could discard the compression attribute when
intra block copy is applied to the frame. Re-allocating would waste
a lot of time.

I am wondering whether it would be better to add an additional plane
property to describe whether the current framebuffer is compressed.
The compression flag would still be part of the format modifier,
because the compressed version has two extra metadata planes.


Would it be possible to show an example of how these modifiers apply to a
particular format (such as NV12 for instance) ? Otherwise I'm having
trouble understanding how they actually work.

This version didn't contain the big tile information, as I was
considering moving it into the compression options. Uncompressed tiles
in the big tile pixel formats do exist, but I have never seen them used.

Anyway, let me just try to describe the simplest tile here.
For example, an NV12 tiled format (V4H1):
(0, 0) (0, 1) (0, 2)
(1, 0) (1, 1)
(2, 0) (2, 1)
(3, 0)
(0, 0) (1, 0) (2, 0) (3, 0) is a tile, then (0, 1)..(3, 1) after that.
(4, 0) is after (3, y).
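
One reading of the V4H1 description above can be sketched as an index calculation. The sketch assumes that coordinates are written (row, column), that samples are 8 bits wide, that the plane height is a multiple of 4 and that there is no line padding; none of this is confirmed beyond what the mail says, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SYNA_V4_ROWS 4	/* 'V4': four samples are scanned vertically first */

/*
 * Index, counted in samples, of the sample at (row, col) in a V4H1 plane
 * that is `width` samples wide. Hypothetical helper, for illustration only.
 */
static size_t syna_v4h1_index(size_t row, size_t col, size_t width)
{
	size_t band = row / SYNA_V4_ROWS;	/* which group of four rows */
	size_t in_tile = row % SYNA_V4_ROWS;	/* position inside the 4x1 tile */

	assert(col < width);

	/* A band of four rows is stored in full before the next band starts. */
	return band * (SYNA_V4_ROWS * width) + col * SYNA_V4_ROWS + in_tile;
}
```

With a width of 16, (0, 0) to (3, 0) map to indices 0 to 3, (0, 1) follows at index 4, and (4, 0) only appears after the last column of the first band, at index 64.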

For example, an NV15 tiled format (V4H3P8):
(0, 0) (0, 1) (0, 2)
(1, 0) (1, 1)
(2, 0) (2, 1)
(3, 0)

(0, 0) (1, 0) (2, 0) (3, 0) (0, 1) (1, 1) (2, 1) ... (3, 2) are 120 bits
(12 samples of 10 bits each). They are padded with an extra 8 bits, so
(0, 3) is placed after the first 128 bits in memory.



Signed-off-by: Hsia-Jun(Randy) Li 
---
  include/uapi/drm/drm_fourcc.h | 49 +++
  1 file changed, 49 insertions(+)

diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index 0206f812c569..b67884e8bc69 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -381,6 +381,7 @@ extern "C" {
  #define DRM_FORMAT_MOD_VENDOR_ARM 0x08
  #define DRM_FORMAT_MOD_VENDOR_ALLWINNER 0x09
  #define DRM_FORMAT_MOD_VENDOR_AMLOGIC 0x0a
+#define DRM_FORMAT_MOD_VENDOR_SYNAPTICS 0x0b

  /* add more to the end as needed */

@@ -1452,6 +1453,54 @@ drm_fourcc_canonicalize_nvidia_format_mod(__u64 modifier)
  #define AMD_FMT_MOD_CLEAR(field) \
   (~((__u64)AMD_FMT_MOD_##field##_MASK << AMD_FMT_MOD_##field##_SHIFT))

+/*
+ * Synaptics VideoSmart modifiers
+ *
+ *       Macro
+ * Bits  Param Description
+ * ----  ----- -----------------------------------------------------------------
+ *
+ *  7:0  f Scan direction description.
+ *
+ *   0 = Invalid
+ *   1 = V4, the scan would always start from vertical for 4 pixel
+ *   then move back to the start pixel of the next horizontal
+ *   direction.
+ *   2 = Reserved for future use.
+ *
+ * 15:8  m The times of pattern repeat in the right angle direction from
+ * the first scan direction.
+ *
+ * 19:16 p The padding bits after the whole scan, could be zero.
+ *
+ * 35:20 - Reserved for future use.  Must be zero.
+ *
+ * 36:36 c Compression flag.
+ *
+ * 55:37 - Reserved for future use.  Must be zero.
+ *
+ */
+
+#define DRM_FORMAT_MOD_SYNA_V4_TILED fourcc_mod_code(SYNAPTICS, 1)
+
+#define DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(f, m, p, c) \
+ fourcc_mod_code(SYNAPTICS, (((f) & 0xff) | \
+  (((m) & 0xff) << 8) | \
+  (((p) & 0xf) << 16) | \
+  (((c) & 0x1) << 36)))
+
+#define DRM_FORMAT_MOD_SYNA_V4H1 \
+ DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 1, 0, 0)
+
+#define DRM_FORMAT_MOD_SYNA_V4H3P8 \
+ DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 3, 8, 0)
+
+#define DRM_FORMAT_MOD_SYNA_V4H1_COMPRESSED \
+ DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 1, 0, 1)
+
+#define DRM_FORMAT_MOD_SYNA_V4H3P8_COMPRESSED \
+ DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 3, 8, 1)
+
  #if defined(__cplusplus)
  }
  #endif
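
The bit layout documented in the patch quoted above can be exercised with a small stand-alone sketch. It does not include drm_fourcc.h: the vendor value 0x0b is the one proposed by the patch, the packing mirrors the shape of fourcc_mod_code() (vendor in bits 63:56, value in the low 56 bits), and the struct and function names are hypothetical. Each field is widened to 64 bits before shifting, because shifting a plain int argument left by 36 is undefined behaviour in C.

```c
#include <assert.h>
#include <stdint.h>

/* Vendor ID proposed by the patch (DRM_FORMAT_MOD_VENDOR_SYNAPTICS). */
#define SYNA_VENDOR_ID 0x0bULL

/* Decoded modifier fields; struct and function names are hypothetical. */
struct syna_mod_fields {
	unsigned int f;	/* bits 7:0, scan direction (1 = V4) */
	unsigned int m;	/* bits 15:8, pattern repeat count */
	unsigned int p;	/* bits 19:16, padding bits after the whole scan */
	unsigned int c;	/* bit 36, compression flag */
};

/* Pack the fields the way DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D() describes. */
static uint64_t syna_mod_encode(struct syna_mod_fields in)
{
	uint64_t val = ((uint64_t)(in.f & 0xff)) |
		       ((uint64_t)(in.m & 0xff) << 8) |
		       ((uint64_t)(in.p & 0xf) << 16) |
		       ((uint64_t)(in.c & 0x1) << 36);

	/* Vendor in bits 63:56, value in the low 56 bits. */
	return (SYNA_VENDOR_ID << 56) | (val & 0x00ffffffffffffffULL);
}

/* Unpack a modifier; the caller must pass a Synaptics modifier. */
static struct syna_mod_fields syna_mod_decode(uint64_t mod)
{
	struct syna_mod_fields out;

	assert((mod >> 56) == SYNA_VENDOR_ID);

	out.f = (unsigned int)(mod & 0xff);
	out.m = (unsigned int)((mod >> 8) & 0xff);
	out.p = (unsigned int)((mod >> 16) & 0xf);
	out.c = (unsigned int)((mod >> 36) & 0x1);
	return out;
}
```

Under these assumptions the V4H3P8 compressed modifier (f = 1, m = 3, p = 8, c = 1) encodes to 0x0b00001000080301, and decoding it returns the same four fields.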


--
Regards,

Laurent Pinchart


--
Hsia-Jun(Randy) Li


Re: [PATCH 0/2] Add pixel formats used in Synatpics SoC

2022-08-18 Thread Hsia-Jun Li




On 8/19/22 07:08, Laurent Pinchart wrote:



Hi Hsia-Jun,

On Tue, Aug 09, 2022 at 12:27:48AM +0800, Hsia-Jun Li wrote:

From: "Hsia-Jun(Randy) Li" 

These pixel formats are used in Synaptics' VideoSmart series SoCs,
such as the VS640 and VS680. This time I only disclose the pixel formats
used in the video codecs and the display pipeline. In fact, any device
with an MTR module could support these tiled and compressed pixel formats.
More details about the MTR module can be found in the first patch of this
series.

We may not be able to post any drivers here soon: most of the work
on this platform is done in the Trusted Execution Environment, and
we don't use the OP-TEE framework.


Is that so for the display side too, or only for the video decoder ?

These pixel formats are used in both the video decoder and the display
(not the GPU). Besides, the ISP and NPU in the VS680 support some patterns
of them.


Please note that after I reviewed the compression options of our
platform, I found that modifiers alone are not enough to store all the
compression options. I will post a second version.


I may take the same approach as Intel. I will try to disclose more details
here, hoping we can find a better way to describe them.



Please note that the number of memory planes used by the video codecs is 5
when compression is enabled, while it is 4 for the display. The
extra plane in the video codecs is for internal use during decoding.
It can't be appended to the luma or chroma buffer as many other drivers do,
because this buffer can only be accessed by the video codec itself and
requires different memory security attributes. The other reasons are
described in the V4L2 pixel formats patch. I don't know whether a
different number of memory planes between DRM and V4L2 is acceptable.


I don't think that's a problem as such, as long as both the V4L2 and DRM
formats make sense on their own.


I only posted the compressed fourccs for V4L2, because it is really
hard to fit the uncompressed versions of the pixel formats into a fourcc.
It would be better if we could have something like the DRM format
modifiers here.


Agreed, we need modifiers support in V4L2. This has been discussed
previously ([1]), and a proposal ([2]) has been submitted two years ago,
it needs to be revived.

Thank you, I have found v4l2_ext_pix_format. I will put my
comments in the email that posts the Synaptics V4L2 pixel formats.


[1] https://lore.kernel.org/linux-media/20170821155203.GB38943@e107564-lin.cambridge.arm.com/
[2] https://lore.kernel.org/linux-media/20200804192939.2251988-1-helen.koike@collabora.com/


https://synaptics.com/products/multimedia-solutions

Hsia-Jun(Randy) Li (2):
   drm/fourcc: Add Synaptics VideoSmart tiled modifiers
   [WIP]: media: Add Synaptics compressed tiled format

  drivers/media/v4l2-core/v4l2-common.c |  1 +
  drivers/media/v4l2-core/v4l2-ioctl.c  |  2 ++
  include/uapi/drm/drm_fourcc.h | 49 +++
  include/uapi/linux/videodev2.h|  2 ++
  4 files changed, 54 insertions(+)


--
Regards,

Laurent Pinchart


--
Hsia-Jun(Randy) Li


Re: [PATCH 1/2] drm/fourcc: Add Synaptics VideoSmart tiled modifiers

2022-08-17 Thread Hsia-Jun Li




On 8/18/22 14:07, Tomasz Figa wrote:



Hi Randy,

On Tue, Aug 9, 2022 at 1:28 AM Hsia-Jun Li  wrote:


From: "Hsia-Jun(Randy) Li" 

Memory Traffic Reduction (MTR) is a module in the Synaptics
VideoSmart platform that can process losslessly compressed images
and cache the tile memory line.

These modifiers only record the parameters that affect the pixel
layout or memory layout. Whether physical memory page mapping
is used is not part of the format.

We allocate the same amount of memory for uncompressed
and compressed luma and chroma data, while the compressed buffer
requests two extra planes holding the metadata for
decompression.

The reasons why we need to allocate the same amount of memory for
the compressed frame:
1. The compression ratio is not fixed, and different platforms could
use different compression protocols. These protocols are completely
vendor proprietary, so other devices won't be able to use them.
It is not necessary to define their versions here.

2. The video codec could discard the compression attribute when
intra block copy is applied to the frame. Re-allocating would waste
a lot of time.

I am wondering whether it would be better to add an additional plane
property to describe whether the current framebuffer is compressed.
The compression flag would still be part of the format modifier,
because the compressed version has two extra metadata planes.

Signed-off-by: Hsia-Jun(Randy) Li 
---
  include/uapi/drm/drm_fourcc.h | 49 +++
  1 file changed, 49 insertions(+)

diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index 0206f812c569..b67884e8bc69 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -381,6 +381,7 @@ extern "C" {
  #define DRM_FORMAT_MOD_VENDOR_ARM 0x08
  #define DRM_FORMAT_MOD_VENDOR_ALLWINNER 0x09
  #define DRM_FORMAT_MOD_VENDOR_AMLOGIC 0x0a
+#define DRM_FORMAT_MOD_VENDOR_SYNAPTICS 0x0b

  /* add more to the end as needed */

@@ -1452,6 +1453,54 @@ drm_fourcc_canonicalize_nvidia_format_mod(__u64 modifier)
  #define AMD_FMT_MOD_CLEAR(field) \
 (~((__u64)AMD_FMT_MOD_##field##_MASK << AMD_FMT_MOD_##field##_SHIFT))

+/*
+ * Synaptics VideoSmart modifiers
+ *
+ *       Macro
+ * Bits  Param Description
+ * ----  ----- -----------------------------------------------------------------
+ *
+ *  7:0  f Scan direction description.
+ *
+ *   0 = Invalid
+ *   1 = V4, the scan would always start from vertical for 4 pixel
+ *   then move back to the start pixel of the next horizontal
+ *   direction.
+ *   2 = Reserved for future use.


I guess 2..255 are all reserved for future use?

Like Intel, which has Y-tile and X-tile.



+ *
+ * 15:8  m The times of pattern repeat in the right angle direction from
+ * the first scan direction.
+ *
+ * 19:16 p The padding bits after the whole scan, could be zero.


What is the meaning of "scan" and "whole scan" here?

For example, an NV15 tiled format:
(0, 0) (0, 1) (0, 2)
(1, 0) (1, 1)
(2, 0) (2, 1)
(3, 0)

(0, 0) (1, 0) (2, 0) (3, 0) (0, 1) (1, 1) (2, 1) ... (3, 2) are 120 bits
(12 samples of 10 bits each). They are padded with an extra 8 bits, so
(0, 3) is placed after the first 128 bits in memory.
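
The arithmetic in this answer can be sketched as a bit-offset calculation. The sketch assumes that coordinates are written (row, column), that the 8 padding bits follow the 12 samples of each 4x3 group, and that a band of four rows is stored in full before the next band; the bit order inside a sample is not specified in the thread and is not modelled. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
	SYNA_V = 4,		/* samples scanned vertically ('V4') */
	SYNA_H = 3,		/* times the scan repeats horizontally ('H3') */
	SYNA_BITS = 10,		/* bits per sample for a 10-bit format */
	SYNA_PAD = 8,		/* padding bits after the whole scan ('P8') */
	SYNA_GROUP_BITS = SYNA_V * SYNA_H * SYNA_BITS + SYNA_PAD,
};

/*
 * Bit offset of the sample at (row, col) in a V4H3P8 plane that is
 * `width` samples wide. Hypothetical helper, for illustration only.
 */
static uint64_t syna_v4h3p8_bit_offset(uint32_t row, uint32_t col,
				       uint32_t width)
{
	uint64_t groups_per_band = (width + SYNA_H - 1) / SYNA_H;
	uint64_t band = row / SYNA_V;
	uint64_t group = col / SYNA_H;
	/* Inside a group: four rows of one column, then the next column. */
	uint64_t idx = (uint64_t)(col % SYNA_H) * SYNA_V + row % SYNA_V;

	assert(col < width);

	return (band * groups_per_band + group) * SYNA_GROUP_BITS +
	       idx * SYNA_BITS;
}
```

Each group therefore occupies 4 * 3 * 10 + 8 = 128 bits, (3, 2) is the last sample of the first group at bit 110, and (0, 3) starts the second group at bit 128, as described above.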


Besides, I found that even with four 64-bit modifiers it is not possible
to store all the compression options we need. I need to borrow what
Intel did and hide the tile flags somewhere (I would not use the userspace
gmmlib way: our codecs are based on V4L2, and even the DRM framework may
not be a good place).


Although the compression options could affect the memory layout,
userspace really doesn't need to know that.


Best regards,
Tomasz


+ *
+ * 35:20 - Reserved for future use.  Must be zero.
+ *
+ * 36:36 c Compression flag.
+ *
+ * 55:37 - Reserved for future use.  Must be zero.
+ *
+ */
+
+#define DRM_FORMAT_MOD_SYNA_V4_TILED   fourcc_mod_code(SYNAPTICS, 1)
+
+#define DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(f, m, p, c) \
+   fourcc_mod_code(SYNAPTICS, (((f) & 0xff) | \
+(((m) & 0xff) << 8) | \
+(((p) & 0xf) << 16) | \
+(((c) & 0x1) << 36)))
+
+#define DRM_FORMAT_MOD_SYNA_V4H1 \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 1, 0, 0)
+
+#define DRM_FORMAT_MOD_SYNA_V4H3P8 \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 3, 8, 0)
+
+#define DRM_FORMAT_MOD_SYNA_V4H1_COMPRESSED \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 1, 0, 1)
+
+#define DRM_FORMAT_MOD_SYNA_V4H3P8_COMPRESSED \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 3, 8, 1)
+
  #if defined(__cplusplus)
  }
  #endif
--
2.17.1



--
Hsia-Jun(Randy) Li


Re: [PATCH 2/2] [WIP]: media: Add Synaptics compressed tiled format

2022-08-17 Thread Hsia-Jun Li




On 8/18/22 14:06, Tomasz Figa wrote:



Hi Randy,

On Tue, Aug 9, 2022 at 1:28 AM Hsia-Jun Li  wrote:


From: "Hsia-Jun(Randy) Li" 

Most of the details are written in the DRM patch.
Please note that the tiled formats here request
one more plane for storing the motion vector metadata.
This buffer won't be compressed, so you can't append
it to the luma or chroma plane.


Does the motion vector buffer need to be exposed to userspace? Is the
decoder stateless (requires userspace to specify the reference frames)
or stateful (manages the entire decoding process internally)?

No, users don't need to access them at all. They just need a different
dma-heap.


You would only get the stateful version of both encoder and decoder.

Best regards,
Tomasz



Signed-off-by: Hsia-Jun(Randy) Li 
---
  drivers/media/v4l2-core/v4l2-common.c | 1 +
  drivers/media/v4l2-core/v4l2-ioctl.c  | 2 ++
  include/uapi/linux/videodev2.h| 2 ++
  3 files changed, 5 insertions(+)

diff --git a/drivers/media/v4l2-core/v4l2-common.c 
b/drivers/media/v4l2-core/v4l2-common.c
index e0fbe6ba4b6c..f645278b3055 100644
--- a/drivers/media/v4l2-core/v4l2-common.c
+++ b/drivers/media/v4l2-core/v4l2-common.c
@@ -314,6 +314,7 @@ const struct v4l2_format_info *v4l2_format_info(u32 format)
 { .format = V4L2_PIX_FMT_SGBRG12,   .pixel_enc = 
V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 0, 0 }, 
.hdiv = 1, .vdiv = 1 },
 { .format = V4L2_PIX_FMT_SGRBG12,   .pixel_enc = 
V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 0, 0 }, 
.hdiv = 1, .vdiv = 1 },
 { .format = V4L2_PIX_FMT_SRGGB12,   .pixel_enc = 
V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 0, 0 }, 
.hdiv = 1, .vdiv = 1 },
+   { .format = V4L2_PIX_FMT_NV12M_V4H1C, .pixel_enc = 
V4L2_PIXEL_ENC_YUV, .mem_planes = 5, .comp_planes = 2, .bpp = { 1, 2, 0, 0 }, 
.hdiv = 2, .vdiv = 2, .block_w = { 128, 128 }, .block_h = { 128, 128 } },
 };
 unsigned int i;

diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c 
b/drivers/media/v4l2-core/v4l2-ioctl.c
index e6fd355a2e92..8f65964aff08 100644
--- a/drivers/media/v4l2-core/v4l2-ioctl.c
+++ b/drivers/media/v4l2-core/v4l2-ioctl.c
@@ -1497,6 +1497,8 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
 case V4L2_PIX_FMT_MT21C:descr = "Mediatek Compressed 
Format"; break;
 case V4L2_PIX_FMT_QC08C:descr = "QCOM Compressed 8-bit 
Format"; break;
 case V4L2_PIX_FMT_QC10C:descr = "QCOM Compressed 10-bit 
Format"; break;
+   case V4L2_PIX_FMT_NV12M_V4H1C:  descr = "Synaptics Compressed 8-bit 
tiled Format";break;
+   case V4L2_PIX_FMT_NV12M_10_V4H3P8C: descr = "Synaptics 
Compressed 10-bit tiled Format";break;
 default:
 if (fmt->description[0])
 return;
diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
index 01e630f2ec78..7e928cb69e7c 100644
--- a/include/uapi/linux/videodev2.h
+++ b/include/uapi/linux/videodev2.h
@@ -661,6 +661,8 @@ struct v4l2_pix_format {
  #define V4L2_PIX_FMT_NV12MT_16X16 v4l2_fourcc('V', 'M', '1', '2') /* 12  
Y/CbCr 4:2:0 16x16 tiles */
  #define V4L2_PIX_FMT_NV12M_8L128  v4l2_fourcc('N', 'A', '1', '2') /* 
Y/CbCr 4:2:0 8x128 tiles */
  #define V4L2_PIX_FMT_NV12M_10BE_8L128 v4l2_fourcc_be('N', 'T', '1', '2') /* 
Y/CbCr 4:2:0 10-bit 8x128 tiles */
+#define V4L2_PIX_FMT_NV12M_V4H1C v4l2_fourcc('S', 'Y', '1', '2')   /* 12  
Y/CbCr 4:2:0 tiles */
+#define V4L2_PIX_FMT_NV12M_10_V4H3P8C v4l2_fourcc('S', 'Y', '1', '0')   /* 12  
Y/CbCr 4:2:0 10-bits tiles */

  /* Bayer formats - see http://www.siliconimaging.com/RGB%20Bayer.htm */
  #define V4L2_PIX_FMT_SBGGR8  v4l2_fourcc('B', 'A', '8', '1') /*  8  BGBG.. 
GRGR.. */
--
2.17.1



--
Hsia-Jun(Randy) Li


Re: [PATCH] [Draft]: media: videobuf2-dma-heap: add a vendor defined memory runtine

2022-08-17 Thread Hsia-Jun Li




On 8/18/22 13:50, Tomasz Figa wrote:



Hi Randy,

Sorry for the late reply, I went on vacation last week.

On Sun, Aug 7, 2022 at 12:23 AM Hsia-Jun Li  wrote:




On 8/5/22 18:09, Tomasz Figa wrote:



On Tue, Aug 2, 2022 at 9:21 PM ayaka  wrote:


Sorry, the previous one contains html data.


On Aug 2, 2022, at 3:33 PM, Tomasz Figa  wrote:

On Mon, Aug 1, 2022 at 8:43 PM ayaka  wrote:


On Aug 1, 2022, at 5:46 PM, Tomasz Figa  wrote:


On Mon, Aug 1, 2022 at 3:44 PM Hsia-Jun Li  wrote:

On 8/1/22 14:19, Tomasz Figa wrote:

Hello Tomasz

Hi Randy,
On Mon, Aug 1, 2022 at 5:21 AM  wrote:

From: Randy Li 
This module is still at an early stage; I wrote it to show which
APIs we need here.
Let me explain why we need such a module.
If you don't allocate buffers from a V4L2 M2M device, this module
may not be very useful. I am sure most users won't know that a
device would require them to allocate buffers from a DMA-Heap and then
import those buffers into a V4L2 queue.
Then the question goes back to why DMA-Heap. From Android's
description, we know it is about copyright DRM.
When we allocate a buffer from a DMA-Heap, it may register that buffer
in the trusted execution environment, so that the firmware which runs
there, or can only be accessed from there, can use that buffer later.
The answer above leads to another thing which is not done in this
version: the DMA mapping. On some platforms a DMA-Heap is
responsible for an IOMMU device as well, but for the general case we had
better assume that the device mapping is done by each device
itself. The problem is that we only know alloc_devs in those DMA-buf
methods, which are DMA-heaps in my design; the device from the queue
is not enough, as a plane may request another IOMMU device or table
for mapping.
Signed-off-by: Randy Li 
---
drivers/media/common/videobuf2/Kconfig|   6 +
drivers/media/common/videobuf2/Makefile   |   1 +
.../common/videobuf2/videobuf2-dma-heap.c | 350 ++
include/media/videobuf2-dma-heap.h|  30 ++
4 files changed, 387 insertions(+)
create mode 100644 drivers/media/common/videobuf2/videobuf2-dma-heap.c
create mode 100644 include/media/videobuf2-dma-heap.h

First of all, thanks for the series.
Possibly a stupid question, but why not just allocate the DMA-bufs
directly from the DMA-buf heap device in the userspace and just import
the buffers to the V4L2 device using V4L2_MEMORY_DMABUF?

Sometimes the allocation policy can be very complex. Let's suppose a
multi-planar pixel format with frame buffer compression enabled.
Its luma and chroma data could be allocated from a pool dedicated
to large buffers, while its metadata would come from a pool from which
many users take a few small slices (like the system pool).
Then, when a new user knows nothing about this platform, if we
just configure alloc_devs in each queue well, the user won't need
to know those complex rules.
The real situation could be more complex. Samsung MFC's left and right
banks could be regarded as two pools; many devices would benefit from
this, either in allocation time or in the secure buffer policy.
In our design, when we need to do secure decoding (DRM video),
Codec2 allocates buffers from the pool dedicated to that, while for
non-DRM video users need not care about this.

I'm a little bit surprised about this, because on Android all the
graphics buffers are allocated from the system IAllocator and imported
to the specific devices.

In the non-tunnel mode, yes. The tunnel mode, however, is completely vendor
defined. Neither HWC nor codec2 cares about where the buffers come from; you
can do whatever you want.
Besides, there is DRM video on GNU/Linux platforms too. I heard that WebKit has
made a huge effort here, and PlayReady is one that can work on non-Android Linux.

Would it make sense to instead extend the UAPI to expose enough
information about the allocation requirements to the userspace, so it
can allocate correctly?

Yes, it could. But, as I said, it would require users to do more work.

My reasoning here is that it's not a driver's decision to allocate
from a DMA-buf heap (and which one) or not. It's the userspace which
knows that, based on the specific use case that it wants to fulfill.

Although I would like to let users decide that, they just can't, because
it would violate the security rules on some platforms.
For example, video codec and display device could 

[PATCH 1/2] drm/fourcc: Add Synaptics VideoSmart tiled modifiers

2022-08-08 Thread Hsia-Jun Li
From: "Hsia-Jun(Randy) Li" 

Memory Traffic Reduction (MTR) is a module in the Synaptics
VideoSmart platform that can process losslessly compressed images
and cache the tile memory line.

These modifiers only record the parameters that affect the pixel
layout or memory layout. Whether physical memory page mapping
is used is not part of the format.

We allocate the same amount of memory for uncompressed
and compressed luma and chroma data, while the compressed buffer
requests two extra planes holding the metadata for
decompression.

The reasons why we need to allocate the same amount of memory for
the compressed frame:
1. The compression ratio is not fixed, and different platforms could
use different compression protocols. These protocols are completely
vendor proprietary, so other devices won't be able to use them.
It is not necessary to define their versions here.

2. The video codec could discard the compression attribute when
intra block copy is applied to the frame. Re-allocating would waste
a lot of time.

I am wondering whether it would be better to add an additional plane
property to describe whether the current framebuffer is compressed.
The compression flag would still be part of the format modifier,
because the compressed version has two extra metadata planes.

Signed-off-by: Hsia-Jun(Randy) Li 
---
 include/uapi/drm/drm_fourcc.h | 49 +++
 1 file changed, 49 insertions(+)

diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index 0206f812c569..b67884e8bc69 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -381,6 +381,7 @@ extern "C" {
 #define DRM_FORMAT_MOD_VENDOR_ARM 0x08
 #define DRM_FORMAT_MOD_VENDOR_ALLWINNER 0x09
 #define DRM_FORMAT_MOD_VENDOR_AMLOGIC 0x0a
+#define DRM_FORMAT_MOD_VENDOR_SYNAPTICS 0x0b
 
 /* add more to the end as needed */
 
@@ -1452,6 +1453,54 @@ drm_fourcc_canonicalize_nvidia_format_mod(__u64 modifier)
 #define AMD_FMT_MOD_CLEAR(field) \
(~((__u64)AMD_FMT_MOD_##field##_MASK << AMD_FMT_MOD_##field##_SHIFT))
 
+/*
+ * Synaptics VideoSmart modifiers
+ *
+ *       Macro
+ * Bits  Param Description
+ * ----  ----- -----------------------------------------------------------------
+ *
+ *  7:0  f Scan direction description.
+ *
+ *   0 = Invalid
+ *   1 = V4, the scan would always start from vertical for 4 pixel
+ *   then move back to the start pixel of the next horizontal
+ *   direction.
+ *   2 = Reserved for future use.
+ *
+ * 15:8  m The times of pattern repeat in the right angle direction from
+ * the first scan direction.
+ *
+ * 19:16 p The padding bits after the whole scan, could be zero.
+ *
+ * 35:20 - Reserved for future use.  Must be zero.
+ *
+ * 36:36 c Compression flag.
+ *
+ * 55:37 - Reserved for future use.  Must be zero.
+ *
+ */
+
+#define DRM_FORMAT_MOD_SYNA_V4_TILED   fourcc_mod_code(SYNAPTICS, 1)
+
+#define DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(f, m, p, c) \
+   fourcc_mod_code(SYNAPTICS, (((f) & 0xff) | \
+(((m) & 0xff) << 8) | \
+(((p) & 0xf) << 16) | \
+(((c) & 0x1) << 36)))
+
+#define DRM_FORMAT_MOD_SYNA_V4H1 \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 1, 0, 0)
+
+#define DRM_FORMAT_MOD_SYNA_V4H3P8 \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 3, 8, 0)
+
+#define DRM_FORMAT_MOD_SYNA_V4H1_COMPRESSED \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 1, 0, 1)
+
+#define DRM_FORMAT_MOD_SYNA_V4H3P8_COMPRESSED \
+   DRM_FORMAT_MOD_SYNA_MTR_LINEAR_2D(1, 3, 8, 1)
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.17.1
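
The bit layout documented in the comment above can be sketched in plain C
as follows. This is only an illustration under stated assumptions: the
helper names (syna_mod_code, syna_mtr_linear_2d and the extractors) are
hypothetical and not part of the patch; only the vendor ID 0x0b and the
field positions come from the patch. The sketch widens every field to
64 bits before shifting, because with plain int arguments a shift by 36
would exceed the width of int.

```c
#include <assert.h>
#include <stdint.h>

/* Vendor ID proposed by the patch (DRM_FORMAT_MOD_VENDOR_SYNAPTICS). */
#define SYNA_VENDOR 0x0bULL

/* Same layout as fourcc_mod_code(): vendor in bits 63:56, value below. */
static uint64_t syna_mod_code(uint64_t val)
{
	return (SYNA_VENDOR << 56) | (val & 0x00ffffffffffffffULL);
}

/* Hypothetical helper: pack f (7:0), m (15:8), p (19:16) and c (36). */
static uint64_t syna_mtr_linear_2d(unsigned int f, unsigned int m,
				   unsigned int p, unsigned int c)
{
	assert(f <= 0xff && m <= 0xff && p <= 0xf && c <= 1);
	return syna_mod_code((uint64_t)f |
			     ((uint64_t)m << 8) |
			     ((uint64_t)p << 16) |
			     ((uint64_t)c << 36));
}

/* Hypothetical extractors, the inverse of the packing above. */
static unsigned int syna_mod_scan(uint64_t mod)
{
	return (unsigned int)(mod & 0xff);
}

static unsigned int syna_mod_repeat(uint64_t mod)
{
	return (unsigned int)((mod >> 8) & 0xff);
}

static unsigned int syna_mod_padding(uint64_t mod)
{
	return (unsigned int)((mod >> 16) & 0xf);
}

static unsigned int syna_mod_compressed(uint64_t mod)
{
	return (unsigned int)((mod >> 36) & 0x1);
}
```

With these helpers, syna_mtr_linear_2d(1, 3, 8, 1) uses the same
parameters as DRM_FORMAT_MOD_SYNA_V4H3P8_COMPRESSED, and the extractors
recover each field from the resulting 64-bit value.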



[PATCH 2/2] [WIP]: media: Add Synaptics compressed tiled format

2022-08-08 Thread Hsia-Jun Li
From: "Hsia-Jun(Randy) Li" 

Most of the details are described in the drm patch.
Please note that the tiled formats here require one more plane
for storing the motion vector metadata. This buffer won't be
compressed, so you can't append it to the luma or chroma plane.

Signed-off-by: Hsia-Jun(Randy) Li 
---
 drivers/media/v4l2-core/v4l2-common.c | 1 +
 drivers/media/v4l2-core/v4l2-ioctl.c  | 2 ++
 include/uapi/linux/videodev2.h| 2 ++
 3 files changed, 5 insertions(+)

diff --git a/drivers/media/v4l2-core/v4l2-common.c b/drivers/media/v4l2-core/v4l2-common.c
index e0fbe6ba4b6c..f645278b3055 100644
--- a/drivers/media/v4l2-core/v4l2-common.c
+++ b/drivers/media/v4l2-core/v4l2-common.c
@@ -314,6 +314,7 @@ const struct v4l2_format_info *v4l2_format_info(u32 format)
{ .format = V4L2_PIX_FMT_SGBRG12,   .pixel_enc = V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 0, 0 }, .hdiv = 1, .vdiv = 1 },
{ .format = V4L2_PIX_FMT_SGRBG12,   .pixel_enc = V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 0, 0 }, .hdiv = 1, .vdiv = 1 },
{ .format = V4L2_PIX_FMT_SRGGB12,   .pixel_enc = V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 0, 0 }, .hdiv = 1, .vdiv = 1 },
+   { .format = V4L2_PIX_FMT_NV12M_V4H1C, .pixel_enc = V4L2_PIXEL_ENC_YUV, .mem_planes = 5, .comp_planes = 2, .bpp = { 1, 2, 0, 0 }, .hdiv = 2, .vdiv = 2, .block_w = { 128, 128 }, .block_h = { 128, 128 } },
};
unsigned int i;
 
diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
index e6fd355a2e92..8f65964aff08 100644
--- a/drivers/media/v4l2-core/v4l2-ioctl.c
+++ b/drivers/media/v4l2-core/v4l2-ioctl.c
@@ -1497,6 +1497,8 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
case V4L2_PIX_FMT_MT21C: descr = "Mediatek Compressed Format"; break;
case V4L2_PIX_FMT_QC08C: descr = "QCOM Compressed 8-bit Format"; break;
case V4L2_PIX_FMT_QC10C: descr = "QCOM Compressed 10-bit Format"; break;
+   case V4L2_PIX_FMT_NV12M_V4H1C:  descr = "Synaptics Compressed 8-bit tiled Format";break;
+   case V4L2_PIX_FMT_NV12M_10_V4H3P8C: descr = "Synaptics Compressed 10-bit tiled Format";break;
default:
if (fmt->description[0])
return;
diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
index 01e630f2ec78..7e928cb69e7c 100644
--- a/include/uapi/linux/videodev2.h
+++ b/include/uapi/linux/videodev2.h
@@ -661,6 +661,8 @@ struct v4l2_pix_format {
 #define V4L2_PIX_FMT_NV12MT_16X16 v4l2_fourcc('V', 'M', '1', '2') /* 12  Y/CbCr 4:2:0 16x16 tiles */
 #define V4L2_PIX_FMT_NV12M_8L128  v4l2_fourcc('N', 'A', '1', '2') /* Y/CbCr 4:2:0 8x128 tiles */
 #define V4L2_PIX_FMT_NV12M_10BE_8L128 v4l2_fourcc_be('N', 'T', '1', '2') /* Y/CbCr 4:2:0 10-bit 8x128 tiles */
+#define V4L2_PIX_FMT_NV12M_V4H1C v4l2_fourcc('S', 'Y', '1', '2')   /* 12  Y/CbCr 4:2:0 tiles */
+#define V4L2_PIX_FMT_NV12M_10_V4H3P8C v4l2_fourcc('S', 'Y', '1', '0')   /* 12  Y/CbCr 4:2:0 10-bits tiles */
 
 /* Bayer formats - see http://www.siliconimaging.com/RGB%20Bayer.htm */
 #define V4L2_PIX_FMT_SBGGR8  v4l2_fourcc('B', 'A', '8', '1') /*  8  BGBG.. GRGR.. */
-- 
2.17.1
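
For readers comparing the two new V4L2 codes with the DRM modifiers, here
is a minimal sketch of how a V4L2 fourcc is packed: four characters, with
the first one in the least significant byte, as v4l2_fourcc() does. The
function names fourcc and fourcc_to_str are hypothetical and used for
illustration only. A fourcc has just 32 bits and no parameter fields, so
each tiling and compression combination needs its own code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same packing as v4l2_fourcc(): first character in the low byte. */
static uint32_t fourcc(char a, char b, char c, char d)
{
	return (uint32_t)(unsigned char)a |
	       ((uint32_t)(unsigned char)b << 8) |
	       ((uint32_t)(unsigned char)c << 16) |
	       ((uint32_t)(unsigned char)d << 24);
}

/* Hypothetical helper: unpack a code into a NUL-terminated string. */
static void fourcc_to_str(uint32_t code, char out[5])
{
	assert(out != NULL);
	out[0] = (char)(code & 0xff);
	out[1] = (char)((code >> 8) & 0xff);
	out[2] = (char)((code >> 16) & 0xff);
	out[3] = (char)((code >> 24) & 0xff);
	out[4] = '\0';
}
```

With this packing, fourcc('S', 'Y', '1', '2') gives 0x32315953 and
fourcc('S', 'Y', '1', '0') gives 0x30315953.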



[PATCH 0/2] Add pixel formats used in Synatpics SoC

2022-08-08 Thread Hsia-Jun Li
From: "Hsia-Jun(Randy) Li" 

These pixel formats are used in Synaptics' VideoSmart series SoCs, such
as the VS640 and VS680. This time I am only disclosing the pixel formats
used in the video codecs and the display pipeline. In fact, any device
with an MTR module could support these tiled and compressed pixel
formats. More detail about the MTR module can be found in the first
patch of this series.

We may not be able to post any drivers here in the near future: most of
the work on this platform is done in the Trusted Execution Environment,
and we don't use the OP-TEE framework.

Please note that the video codecs use 5 memory planes when compression
is enabled, while the display uses 4. The extra plane in the video
codecs is for the decoder's internal use. It can't be appended to the
luma or chroma buffer as many other drivers do, because this buffer can
only be accessed by the video codec itself and requires different memory
security attributes. The other reasons are described in the v4l pixel
formats patch. I don't know whether a different number of memory planes
between drm and v4l2 is acceptable.

I only posted the compressed fourccs for v4l2, because it is really hard
to express the uncompressed versions of the pixel formats as fourccs.
It would be better if we could have something like the drm format
modifiers here.

https://synaptics.com/products/multimedia-solutions

Hsia-Jun(Randy) Li (2):
  drm/fourcc: Add Synaptics VideoSmart tiled modifiers
  [WIP]: media: Add Synaptics compressed tiled format

 drivers/media/v4l2-core/v4l2-common.c |  1 +
 drivers/media/v4l2-core/v4l2-ioctl.c  |  2 ++
 include/uapi/drm/drm_fourcc.h | 49 +++
 include/uapi/linux/videodev2.h|  2 ++
 4 files changed, 54 insertions(+)

-- 
2.17.1



Re: [PATCH] [Draft]: media: videobuf2-dma-heap: add a vendor defined memory runtine

2022-08-07 Thread Hsia-Jun Li




On 8/5/22 18:09, Tomasz Figa wrote:



On Tue, Aug 2, 2022 at 9:21 PM ayaka  wrote:


Sorry, the previous one contains html data.


On Aug 2, 2022, at 3:33 PM, Tomasz Figa  wrote:

On Mon, Aug 1, 2022 at 8:43 PM ayaka  wrote:


On Aug 1, 2022, at 5:46 PM, Tomasz Figa  wrote:


On Mon, Aug 1, 2022 at 3:44 PM Hsia-Jun Li  wrote:

On 8/1/22 14:19, Tomasz Figa wrote:

Hello Tomasz

Hi Randy,
On Mon, Aug 1, 2022 at 5:21 AM  wrote:

From: Randy Li 
This module is still at an early stage; I wrote it to show what
APIs we need here.
Let me explain why we need such a module.
If you don't allocate buffers from a V4L2 M2M device, this module
may not be very useful. I am sure most users won't know that a
device would require them to allocate buffers from a DMA-Heap and
then import those buffers into a V4L2 queue.
Then the question goes back to why DMA-Heap. From Android's
description, we know it is about copyright DRM.
When we allocate a buffer in a DMA-Heap, it may register that buffer
in the trusted execution environment, so the firmware which is running
there, or which can only be accessed from there, can use that buffer
later.
The answer above leads to another thing which is not done in this
version: the DMA mapping. Although on some platforms a DMA-Heap
corresponds to an IOMMU device as well, for the general case we had
better assume that the device mapping is done by each device itself.
The problem here is that we only know alloc_devs in those DMAbuf
methods, which are DMA-heaps in my design. The device from the queue
is not enough, as a plane may request another IOMMU device or table
for mapping.
Signed-off-by: Randy Li 
---
drivers/media/common/videobuf2/Kconfig|   6 +
drivers/media/common/videobuf2/Makefile   |   1 +
.../common/videobuf2/videobuf2-dma-heap.c | 350 ++
include/media/videobuf2-dma-heap.h|  30 ++
4 files changed, 387 insertions(+)
create mode 100644 drivers/media/common/videobuf2/videobuf2-dma-heap.c
create mode 100644 include/media/videobuf2-dma-heap.h

First of all, thanks for the series.
Possibly a stupid question, but why not just allocate the DMA-bufs
directly from the DMA-buf heap device in the userspace and just import
the buffers to the V4L2 device using V4L2_MEMORY_DMABUF?

Sometimes the allocation policy can be very complex. Suppose a
multi-planar pixel format with frame buffer compression enabled.
Its luma and chroma data could be allocated from a pool dedicated to
large buffers, while its metadata would come from a pool from which many
users take a few small slices (like the system pool).
Then, when a new user who knows nothing about this platform comes along,
if we have configured the alloc_devs in each queue well, that user won't
need to know those complex rules.
The real situation could be more complex: Samsung MFC's left and right
banks could be regarded as two pools, and many devices would benefit
from this, either in allocation time or in secure-buffer policy.
In our design, when we need to do secure decoding (DRM video), codec2
allocates buffers from the pool dedicated to that, while for non-DRM
video users need not care about this.
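
To make the policy above concrete, here is a small sketch of the kind of
per-plane table a driver or platform library would otherwise have to
expose to userspace. Everything in it is hypothetical: the heap paths,
the plane_kind enum and heap_for_plane() are made up for illustration
and do not come from the patch or from any real platform.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum plane_kind { PLANE_LUMA, PLANE_CHROMA, PLANE_META, PLANE_KIND_COUNT };

struct plane_policy {
	const char *normal_heap;	/* used for non-DRM video */
	const char *secure_heap;	/* used for DRM (protected) video */
};

/* Hypothetical heap names; a real platform would define its own. */
static const struct plane_policy policy[PLANE_KIND_COUNT] = {
	[PLANE_LUMA]   = { "/dev/dma_heap/large-pool", "/dev/dma_heap/secure-video" },
	[PLANE_CHROMA] = { "/dev/dma_heap/large-pool", "/dev/dma_heap/secure-video" },
	[PLANE_META]   = { "/dev/dma_heap/system",     "/dev/dma_heap/system" },
};

/* Pick the heap for one plane; returns NULL for an unknown plane kind. */
static const char *heap_for_plane(enum plane_kind kind, int secure)
{
	if ((int)kind < 0 || kind >= PLANE_KIND_COUNT)
		return NULL;
	return secure ? policy[kind].secure_heap : policy[kind].normal_heap;
}
```

With alloc_devs configured per queue, a table like this stays inside the
driver; without it, every userspace client has to carry an equivalent.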

I'm a little bit surprised about this, because on Android all the
graphics buffers are allocated from the system IAllocator and imported
to the specific devices.

In non-tunnel mode, yes, it is. Tunnel mode, however, is completely
vendor defined. Neither HWC nor codec2 cares where the buffers come
from, so you can do whatever you want.
Besides, there is DRM video on GNU/Linux platforms too. I heard that
WebKit has made a huge effort here, and PlayReady is one that can work
on non-Android Linux.

Would it make sense to instead extend the UAPI to expose enough
information about the allocation requirements to the userspace, so it
can allocate correctly?

Yes, it could. But as I said, it would require the users to do more work.

My reasoning here is that it's not a driver's decision to allocate
from a DMA-buf heap (and which one) or not. It's the userspace which
knows that, based on the specific use case that it wants to fulfill.

Although I would like to let the users decide that, users simply can't,
because it would violate the security rules on some platforms.
For example, a video codec and a display device may only be able to
access one region of memory, which no other device or trusted app can
access. Users have to allocate the buffer from the pool the vendor has
chosen.
So why don't we offer a quick way, so that users don't need trial and
error?


In principle, I'm not against integrating DMA-buf heap with vb2,
however