Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

2024-05-15 Thread Nicolas Dufresne
Le mardi 14 mai 2024 à 23:42 +0300, Laurent Pinchart a écrit :
> > You'll hit the same limitation as we hit in GStreamer, which is that KMS 
> > driver
> > only offer allocation for render buffers and most of them are missing 
> > allocators
> > for YUV buffers, even though they can import in these formats. (kms 
> > allocators,
> > except dumb, which has other issues, are format aware).
> 
> My experience on Arm platforms is that the KMS drivers offer allocation
> for scanout buffers, not render buffers, and mostly using the dumb
> allocator API. If the KMS device can scan out YUV natively, YUV buffer
> allocation should be supported. Am I missing something here ?

There are two APIs. Dumb is the legacy allocation API, indeed only used by display
drivers, and it does not include a pixel format or a modifier. The allocation of
YUV buffers has been done through a small hack,

  bpp = number of bits per component (of luma plane if multiple planes)
  width = width
  height = height * X

Where X varies: "3 / 2" is used for 4:2:0 subsampling, "2" for 4:2:2 and "3" for
4:4:4. It is far from ideal: it requires deep knowledge of each format in the
application and cannot allocate each plane separately.

The second is the driver-specific allocation API, which is then abstracted by
GBM. This allows allocating render buffers, notably with modifiers and/or use
cases, but with no support for YUV or multi-planar formats.

Nicolas


Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

2024-05-14 Thread Nicolas Dufresne
Hi,

Le mardi 14 mai 2024 à 23:45 +0300, Laurent Pinchart a écrit :
> > And finally, none of this fixes the issue that the heap allocation are not 
> > being
> > accounted properly and allow of an easy memory DoS. So uaccess should be 
> > granted
> > with care, meaning that defaulting a "desktop" library to that, means it 
> > will
> > most of the time not work at all.
> 
> I think that issue should be fixed, regardless of whether or not we end
> up using dma heaps for libcamera. If we do use them, maybe there will be
> a higher incentive for somebody involved in this conversation to tackle
> that problem first :-) And maybe, as a result, the rest of the Linux
> community will consider with a more open mind usage of dma heaps on
> desktop systems.

The strict reality is that if libcamera offers no alternative, some OSes will
enable it and reduce their security. I totally agree this issue needs to be
fixed regardless of libcamera, or even dma heaps. DMABuf allocation should be
accounted and limited by quotas whether it comes from a GPU, display, V4L2 or
another type of supported device. I would also not recommend dropping your heap
support (or preventing it from being merged) in libcamera.

Nicolas


Re: [PATCH v7 7/8] media: imagination: Round to closest multiple for cropping region

2024-05-14 Thread Nicolas Dufresne
Le samedi 11 mai 2024 à 22:38 +0530, Devarsh Thakkar a écrit :
> Hi Andy,
> 
> Thanks for the quick review.
> On 10/05/24 20:40, Andy Shevchenko wrote:
> > On Fri, May 10, 2024 at 12:10:01AM +0530, Devarsh Thakkar wrote:
> > > If neither of the flags to round down (V4L2_SEL_FLAG_LE) or round up
> > > (V4L2_SEL_FLAG_GE) are specified by the user, then round to nearest
> > > multiple of requested value while updating the crop rectangle coordinates.
> > > 
> > > Use the rounding macro which gives preference to rounding down in case two
> > > nearest values (high and low) are possible to raise the probability of
> > > cropping rectangle falling inside the bound region.
> > 
> > This is arguable. How do we know that the bigger range is supported?
> > The safest side is to go smaller than bigger.
> > 
> 
> Yes and that's what the driver does when do when application passes
> V4L2_SEL_FLAG_LE while doing the selection. If application does not
> specify explicitly whether to round down or round up the cropping
> parameters requested by it (i.e app is neither passing V4L2_SEL_FLAG_LE
> nor V4L2_SEL_FLAG_GE flags), then it is preferred by driver to round the
> cropping parameters to nearest possible value by either rounding down or
> rounding up to align with hardware requirements.
> 
> For e.g. If requested width for cropping region is 127 and HW requires
> width to be multiple of 64 then we would prefer to round it up to 128
> rather than rounding down to a more distant value (i.e. 64), but if
> requested cropping width is 129 then we would prefer to instead round it
> down to 128. But if requested cropping width is 160 then there are two
> nearest possible values 160 - 32 = 128 and 160 + 32 = 192 and in which
> case we prefer the smaller value as you suggested and that's why the
> driver uses round_closest_down.
> 
> For any reason, if still the cropping rectangle falls beyond the bound
> region, then driver will return out of range error (-ERANGE) to
> application.

I would appreciate it if this change were based on specification text, meaning
improving the text if that behaviour is undefined. We might not be able to fix
it everywhere, but we can recommend something.

Nicolas

> 
> Regards
> Devarsh
> 
> 



Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

2024-05-13 Thread Nicolas Dufresne
Le lundi 13 mai 2024 à 11:34 +0300, Laurent Pinchart a écrit :
> On Mon, May 13, 2024 at 10:29:22AM +0200, Maxime Ripard wrote:
> > On Wed, May 08, 2024 at 10:36:08AM +0200, Daniel Vetter wrote:
> > > On Tue, May 07, 2024 at 04:07:39PM -0400, Nicolas Dufresne wrote:
> > > > Hi,
> > > > 
> > > > Le mardi 07 mai 2024 à 21:36 +0300, Laurent Pinchart a écrit :
> > > > > Shorter term, we have a problem to solve, and the best option we have
> > > > > found so far is to rely on dma-buf heaps as a backend for the frame
> > > > > buffer allocatro helper in libcamera for the use case described above.
> > > > > This won't work in 100% of the cases, clearly. It's a stop-gap measure
> > > > > until we can do better.
> > > > 
> > > > Considering the security concerned raised on this thread with dmabuf 
> > > > heap
> > > > allocation not be restricted by quotas, you'd get what you want quickly 
> > > > with
> > > > memfd + udmabuf instead (which is accounted already).
> > > > 
> > > > It was raised that distro don't enable udmabuf, but as stated there by 
> > > > Hans, in
> > > > any cases distro needs to take action to make the softISP works. This
> > > > alternative is easy and does not interfere in anyway with your future 
> > > > plan or
> > > > the libcamera API. You could even have both dmabuf heap (for Raspbian) 
> > > > and the
> > > > safer memfd+udmabuf for the distro with security concerns.
> > > > 
> > > > And for the long term plan, we can certainly get closer by fixing that 
> > > > issue
> > > > with accounting. This issue also applied to v4l2 io-ops, so it would be 
> > > > nice to
> > > > find common set of helpers to fix these exporters.
> > > 
> > > Yeah if this is just for softisp, then memfd + udmabuf is also what I was
> > > about to suggest. Not just as a stopgap, but as the real official thing.
> > > 
> > > udmabuf does kinda allow you to pin memory, but we can easily fix that by
> > > adding the right accounting and then either let mlock rlimits or cgroups
> > > kernel memory limits enforce good behavior.
> > 
> > I think the main drawback with memfd is that it'll be broken for devices
> > without an IOMMU, and while you said that it's uncommon for GPUs, it's
> > definitely not for codecs and display engines.
> 
> If the application wants to share buffers between the camera and a
> display engine or codec, it should arguably not use the libcamera
> FrameBufferAllocator, but allocate the buffers from the display or the
> encoder. memfd wouldn't be used in that case.
> 
> We need to eat our own dogfood though. If we want to push the
> responsibility for buffer allocation in the buffer sharing case to the
> application, we need to modify the cam application to do so when using
> the KMS backend.
> 

Agreed, and the new dma-buf feedback protocol on Wayland can also be used on top of this.

You'll hit the same limitation as we hit in GStreamer, which is that KMS drivers
only offer allocation for render buffers, and most of them are missing allocators
for YUV buffers even though they can import these formats. (KMS allocators,
except dumb, which has other issues, are format aware.)

Nicolas


Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

2024-05-13 Thread Nicolas Dufresne
Le lundi 13 mai 2024 à 15:51 +0200, Maxime Ripard a écrit :
> On Mon, May 13, 2024 at 09:42:00AM -0400, Nicolas Dufresne wrote:
> > Le lundi 13 mai 2024 à 10:29 +0200, Maxime Ripard a écrit :
> > > On Wed, May 08, 2024 at 10:36:08AM +0200, Daniel Vetter wrote:
> > > > On Tue, May 07, 2024 at 04:07:39PM -0400, Nicolas Dufresne wrote:
> > > > > Hi,
> > > > > 
> > > > > Le mardi 07 mai 2024 à 21:36 +0300, Laurent Pinchart a écrit :
> > > > > > Shorter term, we have a problem to solve, and the best option we 
> > > > > > have
> > > > > > found so far is to rely on dma-buf heaps as a backend for the frame
> > > > > > buffer allocatro helper in libcamera for the use case described 
> > > > > > above.
> > > > > > This won't work in 100% of the cases, clearly. It's a stop-gap 
> > > > > > measure
> > > > > > until we can do better.
> > > > > 
> > > > > Considering the security concerned raised on this thread with dmabuf 
> > > > > heap
> > > > > allocation not be restricted by quotas, you'd get what you want 
> > > > > quickly with
> > > > > memfd + udmabuf instead (which is accounted already).
> > > > > 
> > > > > It was raised that distro don't enable udmabuf, but as stated there 
> > > > > by Hans, in
> > > > > any cases distro needs to take action to make the softISP works. This
> > > > > alternative is easy and does not interfere in anyway with your future 
> > > > > plan or
> > > > > the libcamera API. You could even have both dmabuf heap (for 
> > > > > Raspbian) and the
> > > > > safer memfd+udmabuf for the distro with security concerns.
> > > > > 
> > > > > And for the long term plan, we can certainly get closer by fixing 
> > > > > that issue
> > > > > with accounting. This issue also applied to v4l2 io-ops, so it would 
> > > > > be nice to
> > > > > find common set of helpers to fix these exporters.
> > > > 
> > > > Yeah if this is just for softisp, then memfd + udmabuf is also what I 
> > > > was
> > > > about to suggest. Not just as a stopgap, but as the real official thing.
> > > > 
> > > > udmabuf does kinda allow you to pin memory, but we can easily fix that 
> > > > by
> > > > adding the right accounting and then either let mlock rlimits or cgroups
> > > > kernel memory limits enforce good behavior.
> > > 
> > > I think the main drawback with memfd is that it'll be broken for devices
> > > without an IOMMU, and while you said that it's uncommon for GPUs, it's
> > > definitely not for codecs and display engines.
> > 
> > In the context of libcamera, the allocation and the alignment done to the 
> > video
> > frame is done completely blindly. In that context, there is a lot more then 
> > just
> > the allocation type that can go wrong and will lead to a memory copy. The 
> > upside
> > of memfd, is that the read cache will help speeding up the copies if they 
> > are
> > needed.
> 
> dma-heaps provide cacheable buffers too...

Yes, and that is why we have cache hints in V4L2 now. There is no clue that the
softISP code can read to make the right call. The required cache management is
undefined until all the importers are known. I also don't think heaps currently
care to adapt the dma-buf sync behaviour based on the different importers, or on
the addition of a new importer. On top of that, there is insufficient
information on the device to really deduce what is needed.

> 
> > Another important point is that this is only used if the application haven't
> > provided frames. If your embedded application is non-generic, and you have
> > permissions to access the right heap, the application can solve your 
> > specific
> > issue. But in the generic Linux space, Linux kernel API are just 
> > insufficient
> > for the "just work" scenario.
> 
> ... but they also provide semantics around the memory buffers that no
> other allocation API do. There's at least the mediatek secure playback
> series and another one that I've started to work on to allocate ECC
> protected or unprotected buffers that are just the right use case for
> the heaps, and the target frameworks aren't.

Let's agree we are both off topic now. The libcamera softISP is currently purely
software, and cannot write to any form of protected memory. As for ECC, I would
hope this usage will be coded in the application, and that this application has
been authorized to access the appropriate heaps.

And finally, none of this fixes the issue that heap allocations are not being
accounted properly and allow an easy memory DoS. So uaccess should be granted
with care, meaning that defaulting a "desktop" library to it means it will most
of the time not work at all.

Nicolas


Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

2024-05-13 Thread Nicolas Dufresne
Le lundi 13 mai 2024 à 10:29 +0200, Maxime Ripard a écrit :
> On Wed, May 08, 2024 at 10:36:08AM +0200, Daniel Vetter wrote:
> > On Tue, May 07, 2024 at 04:07:39PM -0400, Nicolas Dufresne wrote:
> > > Hi,
> > > 
> > > Le mardi 07 mai 2024 à 21:36 +0300, Laurent Pinchart a écrit :
> > > > Shorter term, we have a problem to solve, and the best option we have
> > > > found so far is to rely on dma-buf heaps as a backend for the frame
> > > > buffer allocatro helper in libcamera for the use case described above.
> > > > This won't work in 100% of the cases, clearly. It's a stop-gap measure
> > > > until we can do better.
> > > 
> > > Considering the security concerned raised on this thread with dmabuf heap
> > > allocation not be restricted by quotas, you'd get what you want quickly 
> > > with
> > > memfd + udmabuf instead (which is accounted already).
> > > 
> > > It was raised that distro don't enable udmabuf, but as stated there by 
> > > Hans, in
> > > any cases distro needs to take action to make the softISP works. This
> > > alternative is easy and does not interfere in anyway with your future 
> > > plan or
> > > the libcamera API. You could even have both dmabuf heap (for Raspbian) 
> > > and the
> > > safer memfd+udmabuf for the distro with security concerns.
> > > 
> > > And for the long term plan, we can certainly get closer by fixing that 
> > > issue
> > > with accounting. This issue also applied to v4l2 io-ops, so it would be 
> > > nice to
> > > find common set of helpers to fix these exporters.
> > 
> > Yeah if this is just for softisp, then memfd + udmabuf is also what I was
> > about to suggest. Not just as a stopgap, but as the real official thing.
> > 
> > udmabuf does kinda allow you to pin memory, but we can easily fix that by
> > adding the right accounting and then either let mlock rlimits or cgroups
> > kernel memory limits enforce good behavior.
> 
> I think the main drawback with memfd is that it'll be broken for devices
> without an IOMMU, and while you said that it's uncommon for GPUs, it's
> definitely not for codecs and display engines.

In the context of libcamera, the allocation and the alignment of the video
frame are done completely blindly. In that context, there is a lot more than
just the allocation type that can go wrong and lead to a memory copy. The
upside of memfd is that the read cache will help speed up the copies if they
are needed.

Another important point is that this is only used if the application hasn't
provided frames. If your embedded application is non-generic, and you have
permission to access the right heap, the application can solve your specific
issue. But in the generic Linux space, the Linux kernel APIs are just
insufficient for the "just work" scenario.

Nicolas


Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

2024-05-07 Thread Nicolas Dufresne
Hi,

Le mardi 07 mai 2024 à 21:36 +0300, Laurent Pinchart a écrit :
> Shorter term, we have a problem to solve, and the best option we have
> found so far is to rely on dma-buf heaps as a backend for the frame
> buffer allocatro helper in libcamera for the use case described above.
> This won't work in 100% of the cases, clearly. It's a stop-gap measure
> until we can do better.

Considering the security concerns raised in this thread about dma-buf heap
allocations not being restricted by quotas, you'd get what you want quickly with
memfd + udmabuf instead (which is accounted already).

It was raised that distros don't enable udmabuf, but as stated there by Hans, in
any case distros need to take action to make the softISP work. This alternative
is easy and does not interfere in any way with your future plans or the
libcamera API. You could even have both dma-buf heaps (for Raspbian) and the
safer memfd+udmabuf for the distros with security concerns.

And for the long-term plan, we can certainly get closer by fixing that issue
with accounting. This issue also applies to V4L2 io-ops, so it would be nice to
find a common set of helpers to fix these exporters.

regards,
Nicolas


Re: [PATCH v5 1/9] drm/mediatek/uapi: Add DRM_MTK_GEM_CREATE_ENCRYPTED flag

2024-04-16 Thread Nicolas Dufresne
Hi,

Le mercredi 03 avril 2024 à 18:26 +0800, Shawn Sung a écrit :
> From: "Jason-JH.Lin" 
> 
> Add DRM_MTK_GEM_CREATE_ENCRYPTED flag to allow user to allocate

Is "ENCRYPTED" a proper name? My expectation is that this would hold data in a
PROTECTED memory region, but that no cryptographic algorithm is involved.

Nicolas

> a secure buffer to support secure video path feature.
> 
> Signed-off-by: Jason-JH.Lin 
> Signed-off-by: Hsiao Chien Sung 
> ---
>  include/uapi/drm/mediatek_drm.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/include/uapi/drm/mediatek_drm.h b/include/uapi/drm/mediatek_drm.h
> index b0dea00bacbc4..e9125de3a24ad 100644
> --- a/include/uapi/drm/mediatek_drm.h
> +++ b/include/uapi/drm/mediatek_drm.h
> @@ -54,6 +54,7 @@ struct drm_mtk_gem_map_off {
>  
>  #define DRM_MTK_GEM_CREATE   0x00
>  #define DRM_MTK_GEM_MAP_OFFSET   0x01
> +#define DRM_MTK_GEM_CREATE_ENCRYPTED 0x02
>  
>  #define DRM_IOCTL_MTK_GEM_CREATE DRM_IOWR(DRM_COMMAND_BASE + \
>   DRM_MTK_GEM_CREATE, struct drm_mtk_gem_create)



Re: [PATCH 1/3] kci-gitlab: Introducing GitLab-CI Pipeline for Kernel Testing

2024-03-07 Thread Nicolas Dufresne
Le jeudi 29 février 2024 à 10:02 +0100, Maxime Ripard a écrit :
> Hi Helen,
> 
> Thanks for working on this
> 
> On Wed, Feb 28, 2024 at 07:55:25PM -0300, Helen Koike wrote:
> > This patch introduces a `.gitlab-ci` file along with a `ci/` folder,
> > defininga basic test pipeline triggered by code pushes to a GitLab-CI
> > instance. This initial version includes static checks (checkpatch and
> > smatch for now) and build tests across various architectures and
> > configurations. It leverages an integrated cache for efficient build
> > times and introduces a flexible 'scenarios' mechanism for
> > subsystem-specific extensions.
> > 
> > [ci: add prerequisites to run check-patch on MRs]
> > Co-developed-by: Tales Aparecida 
> > Signed-off-by: Tales Aparecida 
> > Signed-off-by: Helen Koike 
> > 
> > ---
> > 
> > Hey all,
> > 
> > You can check the validation of this patchset on:
> > https://gitlab.collabora.com/koike/linux/-/pipelines/87035
> > 
> > I would appreciate your feedback on this work, what do you think?
> > 
> > If you would rate from 0 to 5, where:
> > 
> > [ ] 0. I don't think this is useful at all, and I doubt it will ever be. It 
> > doesn't seem worthwhile.
> > [ ] 1. I don't find it useful in its current form.
> > [ ] 2. It might be useful to others, but not for me.
> > [ ] 3. It has potential, but it's not yet something I can incorporate into 
> > my workflow.
> > [ ] 4. This is useful, but it needs some adjustments before I can include 
> > it in my workflow.
> > [ ] 5. This is really useful! I'm eager to start using it right away. Why 
> > didn't you send this earlier? :)
> > 
> > Which rating would you select?
> 
> 4.5 :)
> 
> One thing I'm wondering here is how we're going to cope with the
> different requirements each user / framework has.
> 
> Like, Linus probably want to have a different set of CI before merging a
> PR than (say) linux-next does, or stable, or before doing an actual
> release.
> 
> Similarly, DRM probably has a different set of requirements than
> drm-misc, drm-amd or nouveau.
> 
> I don't see how the current architecture could accomodate for that. I
> know that Gitlab allows to store issues template in a separate repo,
> maybe we could ask them to provide a feature where the actions would be
> separate from the main repo? That way, any gitlab project could provide
> its own set of tests, without conflicting with each others (and we could
> still share them if we wanted to)
> 
> I know some of use had good relationship with Gitlab, so maybe it would
> be worth asking?

As agreed, the .gitlab-ci.yaml file in the default location will go away. It's a
default location, but not a required one, so each subsystem can have its own (or
not have one at all). The different subsystem forks will have to be configured
to point to their respective main CI configuration.

Of course nothing prevents having a common set of configurations for jobs and
job templates. As an example, we could have a common job template for
checkpatch, and let each subsystem add its own sauce on top. That can save the
duplicated effort of parsing the tool results and reporting them in a format
GitLab understands.
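As a sketch of that idea (job names, file paths and the report-conversion helper are illustrative, not taken from the actual series), a shared checkpatch template that a subsystem pipeline extends could look like:

```yaml
# common template, e.g. in a shared include file
.checkpatch:
  stage: static-checks
  script:
    - git format-patch --stdout "$CI_MERGE_REQUEST_DIFF_BASE_SHA".. |
        ./scripts/checkpatch.pl --no-tree - | tee checkpatch.log
    # hypothetical helper converting the log to a format GitLab understands
    - ./ci/checkpatch-to-junit.sh checkpatch.log report.xml
  artifacts:
    reports:
      junit: report.xml

# a subsystem pipeline adds its own sauce on top
media-checkpatch:
  extends: .checkpatch
  variables:
    GIT_DEPTH: "100"
```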

This also allows subsystems to offer different coverage, e.g. fast vs. full
coverage, or perhaps a configuration focused on specific devices. But in
general, just not having a central config is enough to support the idea. What we
have now could be entirely DRM-specific and "commonized" later, when other
subsystems wanting to use GitLab join (Linux Media is among those).

Nicolas


Re: [PATCH 0/3] kci-gitlab: Introducing GitLab-CI Pipeline for Kernel Testing

2024-02-29 Thread Nicolas Dufresne
Hi,

Le jeudi 29 février 2024 à 16:16 +0200, Nikolai Kondrashov a écrit :
> On 2/29/24 2:20 PM, Guillaume Tucker wrote:
> > Hello,
> > 
> > On 28/02/2024 23:55, Helen Koike wrote:
> > > Dear Kernel Community,
> > > 
> > > This patch introduces a `.gitlab-ci` file along with a `ci/` folder, 
> > > defining a
> > > basic test pipeline triggered by code pushes to a GitLab-CI instance. This
> > > initial version includes static checks (checkpatch and smatch for now) 
> > > and build
> > > tests across various architectures and configurations. It leverages an
> > > integrated cache for efficient build times and introduces a flexible 
> > > 'scenarios'
> > > mechanism for subsystem-specific extensions.
> > 
> > This sounds like a nice starting point to me as an additional way
> > to run tests upstream.  I have one particular question as I see a
> > pattern through the rest of the email, please see below.
> > 
> > [...]
> > 
> > > 4. **Collaborative Testing Environment:** The kernel community is already
> > > engaged in numerous testing efforts, including various GitLab-CI 
> > > pipelines such
> > > as DRM-CI, which I maintain, along with other solutions like KernelCI and
> > > BPF-CI. This proposal is designed to further stimulate contributions to 
> > > the
> > > evolving testing landscape. Our goal is to establish a comprehensive 
> > > suite of
> > > common tools and files.
> > 
> > [...]
> > 
> > > **Leveraging External Test Labs:**
> > > We can extend our testing to external labs, similar to what DRM-CI 
> > > currently
> > > does. This includes:
> > > - Lava labs
> > > - Bare metal labs
> > > - Using KernelCI-provided labs
> > > 
> > > **Other integrations**
> > > - Submit results to KCIDB
> > 
> > [...]
> > 
> > > **Join Our Slack Channel:**
> > > We have a Slack channel, #gitlab-ci, on the KernelCI Slack instance 
> > > https://kernelci.slack.com/ .
> > > Feel free to join and contribute to the conversation. The KernelCI team 
> > > has
> > > weekly calls where we also discuss the GitLab-CI pipeline.
> > > 
> > > **Acknowledgments:**
> > > A special thanks to Nikolai Kondrashov, Tales da Aparecida - both from 
> > > Red Hat -
> > > and KernelCI community for their valuable feedback and support in this 
> > > proposal.
> > 
> > Where does this fit on the KernelCI roadmap?
> > 
> > I see it mentioned a few times but it's not entirely clear
> > whether this initiative is an independent one or in some way
> > linked to KernelCI.  Say, are you planning to use the kci tool,
> > new API, compiler toolchains, user-space and Docker images etc?
> > Or, are KernelCI plans evolving to follow this move?
> 
> I would say this is an important part of KernelCI the project, considering 
> its 
> aim to improve testing and CI in the kernel. It's not a part of KernelCI the 
> service as it is right now, although I would say it would be good to have 
> ability to submit KernelCI jobs from GitLab CI and pull results in the same 
> pipeline, as we discussed earlier.

I'd like to add that both CIs have different purposes in the Linux project. This
CI work is pre-merge verification. Everyone needs to run checkpatch and smatch;
this automates it (and will catch those who forgot, or ran them incorrectly).
But it can go further by effectively testing specific patches on real hardware
(with pretty narrow filters). It will help catch submission issues earlier, and
reduce the KernelCI regression rate. As a side effect, the KernelCI
infrastructure will end up catching the "integration" issues, i.e. the issues
that result from simultaneous changes in different trees. Those are also often
more complex, and benefit from the bisection capabilities.

KernelCI tests are also a lot more intensive: they usually cover everything, but
they bundle multiple changes per run. The pre-merge tests will be reduced to
what seems meaningful for the changes. It's important to understand that
pre-merge CI has a time cost, and we need to make sure CI time does not exceed
the merge window period.

Nicolas


Re: [PATCH 1/3] kci-gitlab: Introducing GitLab-CI Pipeline for Kernel Testing

2024-02-29 Thread Nicolas Dufresne
Hi Tim,

just replying below to one of your comment which I happen to be involved in, but
I'll let others reply for the more specific comments.

Le jeudi 29 février 2024 à 02:44 +, Bird, Tim a écrit :
> > -Original Message-
> > From: Helen Koike 
> > 
> 
> 
> > Hey all,
> > 
> > You can check the validation of this patchset on:
> > https://gitlab.collabora.com/koike/linux/-/pipelines/87035
> > 
> > I would appreciate your feedback on this work, what do you think?
> > 
> > If you would rate from 0 to 5, where:
> > 
> > [ ] 0. I don't think this is useful at all, and I doubt it will ever be. It 
> > doesn't seem worthwhile.
> > [ ] 1. I don't find it useful in its current form.
> > [ ] 2. It might be useful to others, but not for me.
> > [ ] 3. It has potential, but it's not yet something I can incorporate into 
> > my workflow.
> > [ ] 4. This is useful, but it needs some adjustments before I can include 
> > it in my workflow.
> > [ ] 5. This is really useful! I'm eager to start using it right away. Why 
> > didn't you send this earlier? :)
> > 
> > Which rating would you select?
> 
> For me, this is a "5".  I don't currently use gitlab, but I might install it 
> just to test this out.
> 
> It looks like there are a large number of YAML files which define the 
> integration between the
> test scripts and gitlab.  Also, there are a number of shell scripts to 
> perform some of the setup
> and tests.  Do you have any idea how difficult it would be to use the shell 
> scripts outside of
> the CI system (e.g. manually)?  That is, are there dependencies between the 
> CI system
> and the shell scripts?

You are effectively the second person I'm aware of to provide similar feedback.
We agreed to conduct an effort to remove the GitLab specifics from the scripts:
avoid using the GitLab CI shell environment in favour of command-line arguments,
and ensure scripts have a "-h" option. This should ease local reproduction and
allow for other CI integrations. After all, the Linux kernel is a large
community and GitLab is just one option for managing CI. It is a big system, so
we rarely "just install it" ourselves. The DRM and Linux Media communities are
using the Freedesktop instance, sharing resources and costs within that
instance. In Linux Media we are developing out-of-tree scripts with a similar
purpose, but we also believe a common set of tools, directly in the kernel tree,
would be a better long-term solution.

> 
> I think the convention, of putting this kind of stuff under a "ci" directory, 
> makes sense.
> And it appears that the sub-dir structure allows for other CI systems to add 
> their
> own config and files also.

CI scripts have the particularity of being very granular, which is very unlike
what a human user would prefer. But when CI fails, you really want to know which
small step failed, and that can sometimes be hidden by more all-encompassing
scripts. We also care a lot about parallelism, since we have hundreds of runners
available to execute these tests.

Short answer: I also like that this is under a "ci" directory; it makes sure the
purpose and intention of this work is clear.

regards,
Nicolas


Re: [PATCH v3,04/21] v4l: add documentation for secure memory flag

2024-01-17 Thread Nicolas Dufresne
Hi,

Le mercredi 06 décembre 2023 à 16:15 +0800, Yunfei Dong a écrit :
> From: Jeffrey Kardatzke 
> 
> Adds documentation for V4L2_MEMORY_FLAG_SECURE.

As I noticed from the DMA heap discussions, shall this also be renamed SECURE ->
RESTRICTED?

regards,
Nicolas

> 
> Signed-off-by: Jeffrey Kardatzke 
> Signed-off-by: Yunfei Dong 
> ---
>  Documentation/userspace-api/media/v4l/buffer.rst | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/userspace-api/media/v4l/buffer.rst 
> b/Documentation/userspace-api/media/v4l/buffer.rst
> index 52bbee81c080..a5a7d1c72d53 100644
> --- a/Documentation/userspace-api/media/v4l/buffer.rst
> +++ b/Documentation/userspace-api/media/v4l/buffer.rst
> @@ -696,7 +696,7 @@ enum v4l2_memory
>  
>  .. _memory-flags:
>  
> -Memory Consistency Flags
> +Memory Flags
>  
>  
>  .. raw:: latex
> @@ -728,6 +728,12 @@ Memory Consistency Flags
>   only if the buffer is used for :ref:`memory mapping ` I/O and the
>   queue reports the :ref:`V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS
>   ` capability.
> +* .. _`V4L2-MEMORY-FLAG-SECURE`:
> +
> +  - ``V4L2_MEMORY_FLAG_SECURE``
> +  - 0x0002
> +  - DMA bufs passed into the queue will be validated to ensure they were
> + allocated from a secure dma-heap.
>  
>  .. raw:: latex
>  



Re: [PATCH 3/9] dma-heap: Provide accessors so that in-kernel drivers can allocate dmabufs from specific heaps

2023-09-12 Thread Nicolas Dufresne
Le mardi 12 septembre 2023 à 08:47 +, Yong Wu (吴勇) a écrit :
> On Mon, 2023-09-11 at 12:12 -0400, Nicolas Dufresne wrote:
> >  
> >  Hi,
> > 
> > Le lundi 11 septembre 2023 à 10:30 +0800, Yong Wu a écrit :
> > > From: John Stultz 
> > > 
> > > This allows drivers who don't want to create their own
> > > DMA-BUF exporter to be able to allocate DMA-BUFs directly
> > > from existing DMA-BUF Heaps.
> > > 
> > > There is some concern that the premise of DMA-BUF heaps is
> > > that userland knows better about what type of heap memory
> > > is needed for a pipeline, so it would likely be best for
> > > drivers to import and fill DMA-BUFs allocated by userland
> > > instead of allocating one themselves, but this is still
> > > up for debate.
> > 
> > 
> > Would be nice for the reviewers to provide the information about the
> > user of
> > this new in-kernel API. I noticed it because I was CCed, but
> > strangely it didn't
> > make it to the mailing list yet and its not clear in the cover what
> > this is used
> > with. 
> > 
> > I can explain in my words though, my read is that this is used to
> > allocate both
> > user visible and driver internal memory segments in MTK VCODEC
> > driver.
> > 
> > I'm somewhat concerned that DMABuf objects are used to abstract
> > secure memory
> > allocation from tee. For framebuffers that are going to be exported
> > and shared
> > its probably fair use, but it seems that internal shared memory and
> > codec
> > specific reference buffers also endup with a dmabuf fd (often called
> > a secure fd
> > in the v4l2 patchset) for data that is not being shared, and requires
> > a 1:1
> > mapping to a tee handle anyway. Is that the design we'd like to
> > follow ? 
> 
> Yes. basically this is right.
> 
> > Can't
> > we directly allocate from the tee, adding needed helper to make this
> > as simple
> > as allocating from a HEAP ?
> 
> If this happens, the memory will always be inside TEE. Here we create a
> new _CMA heap, it will cma_alloc/free dynamically. Reserve it before
> SVP start, and release to kernel after SVP done.

Ok, I see the benefit of having a common driver then. It would add to the
complexity, but having a driver for the tee allocator and v4l2/heaps would be
another option?

>   
> Secondly. the v4l2/drm has the mature driver control flow, like
> drm_gem_prime_import_dev that always use dma_buf ops. So we can use the
> current flow as much as possible without having to re-plan a flow in
> the TEE.

From what I've read from Yunfei's series, this is only partially true for V4L2.
The vb2 queue MMAP feature has dmabuf exportation as optional, but it's not a
problem to always back it up with a dmabuf object. But for internal SHM buffers
used for firmware communication, I've never seen any driver use a DMABuf.

The same applies to primary decode buffers when frame buffer compression or post-
processing is used (or reconstruction buffers in encoders): these are not user
visible and are usually not DMABufs.

> 
> > 
> > Nicolas
> > 
> > > 
> > > Signed-off-by: John Stultz 
> > > Signed-off-by: T.J. Mercier 
> > > Signed-off-by: Yong Wu 
> > > [Yong: Fix the checkpatch alignment warning]
> > > ---
> > >  drivers/dma-buf/dma-heap.c | 60 
> > --
> > >  include/linux/dma-heap.h   | 25 
> > >  2 files changed, 69 insertions(+), 16 deletions(-)
> > > 
> [snip]



Re: [PATCH 3/9] dma-heap: Provide accessors so that in-kernel drivers can allocate dmabufs from specific heaps

2023-09-12 Thread Nicolas Dufresne
On Tuesday, September 12, 2023 at 16:46 +0200, Christian König wrote:
> On 12.09.23 at 10:52, Yong Wu (吴勇) wrote:
> > [SNIP]
> > > But what we should try to avoid is that newly merged drivers provide
> > > both a driver specific UAPI and DMA-heaps. The justification that
> > > this
> > > makes it easier to transit userspace to the new UAPI doesn't really
> > > count.
> > > 
> > > That would be adding UAPI already with a plan to deprecate it and
> > > that
> > > is most likely not helpful considering that UAPI must be supported
> > > forever as soon as it is upstream.
> > Sorry, I didn't understand this. I think we have not change the UAPI.
> > Which code are you referring to?
> 
> Well, what do you need this for if not a new UAPI?
> 
> My assumption here is that you need to export the DMA-heap allocation 
> function so that you can server an UAPI in your new driver. Or what else 
> is that good for?
> 
> As far as I understand you try to upstream your new vcodec driver. So 
> while this change here seems to be a good idea to clean up existing 
> drivers it doesn't look like a good idea for a newly created driver.

MTK VCODEC has been upstream for quite some time now. The other patchset is
trying to add secure decoding/encoding support to that existing upstream driver.

Regarding the uAPI, it seems that this addition to the dmabuf heap internal API is
exactly the opposite. By making heaps available to drivers, the modification to the
V4L2 uAPI is reduced to adding "SECURE_MODE" + "SECURE_HEAP_ID" controls
(this approach has not been debated yet). The heap is being used internally as a
replacement for every allocation, user visible or not.

Nicolas

> 
> Regards,
> Christian.
> 
> > > > So I think this patch is a little confusing in this series, as I
> > > don't
> > > > see much of it actually being used here (though forgive me if I'm
> > > > missing it).
> > > > 
> > > > Instead, It seems it get used in a separate patch series here:
> > > > 
> > > https://lore.kernel.org/all/20230911125936.10648-1-yunfei.d...@mediatek.com/
> > > 
> > > Please try to avoid stuff like that it is really confusing and eats
> > > reviewers time.
> > My fault, I thought dma-buf and media belonged to the different tree,
> > so I send them separately. The cover letter just said "The consumers of
> > the new heap and new interface are our codecs and DRM, which will be
> > sent upstream soon", and there was no vcodec link at that time.
> > 
> > In the next version, we will put the first three patches into the
> > vcodec patchset.
> > 
> > Thanks.
> > 
> 



Re: [PATCH 3/9] dma-heap: Provide accessors so that in-kernel drivers can allocate dmabufs from specific heaps

2023-09-12 Thread Nicolas Dufresne
On Monday, September 11, 2023 at 12:13 +0200, Christian König wrote:
> On 11.09.23 at 04:30, Yong Wu wrote:
> > From: John Stultz 
> > 
> > This allows drivers who don't want to create their own
> > DMA-BUF exporter to be able to allocate DMA-BUFs directly
> > from existing DMA-BUF Heaps.
> > 
> > There is some concern that the premise of DMA-BUF heaps is
> > that userland knows better about what type of heap memory
> > is needed for a pipeline, so it would likely be best for
> > drivers to import and fill DMA-BUFs allocated by userland
> > instead of allocating one themselves, but this is still
> > up for debate.
> 
> The main design goal of having DMA-heaps in the first place is to avoid 
> per driver allocation and this is not necessary because userland know 
> better what type of memory it wants.

If the memory is user visible, yes. When I look at the MTK VCODEC changes, this
seems to be used for internal codec state and SHM buffers used to communicate
with firmware.

> 
> The background is rather that we generally want to decouple allocation 
> from having a device driver connection so that we have better chance 
> that multiple devices can work with the same memory.
> 
> I once create a prototype which gives userspace a hint which DMA-heap to 
> user for which device: 
> https://patchwork.kernel.org/project/linux-media/patch/20230123123756.401692-2-christian.koe...@amd.com/
> 
> Problem is that I don't really have time to look into it and maintain 
> that stuff, but I think from the high level design that is rather the 
> general direction we should push at.
> 
> Regards,
> Christian.
> 
> > 
> > Signed-off-by: John Stultz 
> > Signed-off-by: T.J. Mercier 
> > Signed-off-by: Yong Wu 
> > [Yong: Fix the checkpatch alignment warning]
> > ---
> >   drivers/dma-buf/dma-heap.c | 60 --
> >   include/linux/dma-heap.h   | 25 
> >   2 files changed, 69 insertions(+), 16 deletions(-)
> > 
> > diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
> > index dcc0e38c61fa..908bb30dc864 100644
> > --- a/drivers/dma-buf/dma-heap.c
> > +++ b/drivers/dma-buf/dma-heap.c
> > @@ -53,12 +53,15 @@ static dev_t dma_heap_devt;
> >   static struct class *dma_heap_class;
> >   static DEFINE_XARRAY_ALLOC(dma_heap_minors);
> >   
> > -static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
> > -unsigned int fd_flags,
> > -unsigned int heap_flags)
> > +struct dma_buf *dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
> > + unsigned int fd_flags,
> > + unsigned int heap_flags)
> >   {
> > -   struct dma_buf *dmabuf;
> > -   int fd;
> > +   if (fd_flags & ~DMA_HEAP_VALID_FD_FLAGS)
> > +   return ERR_PTR(-EINVAL);
> > +
> > +   if (heap_flags & ~DMA_HEAP_VALID_HEAP_FLAGS)
> > +   return ERR_PTR(-EINVAL);
> >   
> > /*
> >  * Allocations from all heaps have to begin
> > @@ -66,9 +69,20 @@ static int dma_heap_buffer_alloc(struct dma_heap *heap, 
> > size_t len,
> >  */
> > len = PAGE_ALIGN(len);
> > if (!len)
> > -   return -EINVAL;
> > +   return ERR_PTR(-EINVAL);
> >   
> > -   dmabuf = heap->ops->allocate(heap, len, fd_flags, heap_flags);
> > +   return heap->ops->allocate(heap, len, fd_flags, heap_flags);
> > +}
> > +EXPORT_SYMBOL_GPL(dma_heap_buffer_alloc);
> > +
> > +static int dma_heap_bufferfd_alloc(struct dma_heap *heap, size_t len,
> > +  unsigned int fd_flags,
> > +  unsigned int heap_flags)
> > +{
> > +   struct dma_buf *dmabuf;
> > +   int fd;
> > +
> > +   dmabuf = dma_heap_buffer_alloc(heap, len, fd_flags, heap_flags);
> > if (IS_ERR(dmabuf))
> > return PTR_ERR(dmabuf);
> >   
> > @@ -106,15 +120,9 @@ static long dma_heap_ioctl_allocate(struct file *file, 
> > void *data)
> > if (heap_allocation->fd)
> > return -EINVAL;
> >   
> > -   if (heap_allocation->fd_flags & ~DMA_HEAP_VALID_FD_FLAGS)
> > -   return -EINVAL;
> > -
> > -   if (heap_allocation->heap_flags & ~DMA_HEAP_VALID_HEAP_FLAGS)
> > -   return -EINVAL;
> > -
> > -   fd = dma_heap_buffer_alloc(heap, heap_allocation->len,
> > -  heap_allocation->fd_flags,
> > -  heap_allocation->heap_flags);
> > +   fd = dma_heap_bufferfd_alloc(heap, heap_allocation->len,
> > +heap_allocation->fd_flags,
> > +heap_allocation->heap_flags);
> > if (fd < 0)
> > return fd;
> >   
> > @@ -205,6 +213,7 @@ const char *dma_heap_get_name(struct dma_heap *heap)
> >   {
> > return heap->name;
> >   }
> > +EXPORT_SYMBOL_GPL(dma_heap_get_name);
> >   
> >   struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
> >   {
> > @@ -290,6 +299,24 @@ struct dma_heap *dma_heap_add(const 

Re: [PATCH 3/9] dma-heap: Provide accessors so that in-kernel drivers can allocate dmabufs from specific heaps

2023-09-11 Thread Nicolas Dufresne
Hi,

On Monday, September 11, 2023 at 10:30 +0800, Yong Wu wrote:
> From: John Stultz 
> 
> This allows drivers who don't want to create their own
> DMA-BUF exporter to be able to allocate DMA-BUFs directly
> from existing DMA-BUF Heaps.
> 
> There is some concern that the premise of DMA-BUF heaps is
> that userland knows better about what type of heap memory
> is needed for a pipeline, so it would likely be best for
> drivers to import and fill DMA-BUFs allocated by userland
> instead of allocating one themselves, but this is still
> up for debate.


It would be nice for the reviewers to provide the information about the user of
this new in-kernel API. I noticed it because I was CCed, but strangely it didn't
make it to the mailing list yet and it's not clear in the cover letter what this
is used with. 

I can explain in my words though, my read is that this is used to allocate both
user visible and driver internal memory segments in MTK VCODEC driver.

I'm somewhat concerned that DMABuf objects are used to abstract secure memory
allocation from the tee. For framebuffers that are going to be exported and shared
it's probably fair use, but it seems that internal shared memory and codec
specific reference buffers also end up with a dmabuf fd (often called a secure fd
in the v4l2 patchset) for data that is not being shared, and requires a 1:1
mapping to a tee handle anyway. Is that the design we'd like to follow ? Can't
we directly allocate from the tee, adding the needed helpers to make this as simple
as allocating from a HEAP ?

Nicolas

> 
> Signed-off-by: John Stultz 
> Signed-off-by: T.J. Mercier 
> Signed-off-by: Yong Wu 
> [Yong: Fix the checkpatch alignment warning]
> ---
>  drivers/dma-buf/dma-heap.c | 60 --
>  include/linux/dma-heap.h   | 25 
>  2 files changed, 69 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
> index dcc0e38c61fa..908bb30dc864 100644
> --- a/drivers/dma-buf/dma-heap.c
> +++ b/drivers/dma-buf/dma-heap.c
> @@ -53,12 +53,15 @@ static dev_t dma_heap_devt;
>  static struct class *dma_heap_class;
>  static DEFINE_XARRAY_ALLOC(dma_heap_minors);
>  
> -static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
> -  unsigned int fd_flags,
> -  unsigned int heap_flags)
> +struct dma_buf *dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
> +   unsigned int fd_flags,
> +   unsigned int heap_flags)
>  {
> - struct dma_buf *dmabuf;
> - int fd;
> + if (fd_flags & ~DMA_HEAP_VALID_FD_FLAGS)
> + return ERR_PTR(-EINVAL);
> +
> + if (heap_flags & ~DMA_HEAP_VALID_HEAP_FLAGS)
> + return ERR_PTR(-EINVAL);
>  
>   /*
>* Allocations from all heaps have to begin
> @@ -66,9 +69,20 @@ static int dma_heap_buffer_alloc(struct dma_heap *heap, 
> size_t len,
>*/
>   len = PAGE_ALIGN(len);
>   if (!len)
> - return -EINVAL;
> + return ERR_PTR(-EINVAL);
>  
> - dmabuf = heap->ops->allocate(heap, len, fd_flags, heap_flags);
> + return heap->ops->allocate(heap, len, fd_flags, heap_flags);
> +}
> +EXPORT_SYMBOL_GPL(dma_heap_buffer_alloc);
> +
> +static int dma_heap_bufferfd_alloc(struct dma_heap *heap, size_t len,
> +unsigned int fd_flags,
> +unsigned int heap_flags)
> +{
> + struct dma_buf *dmabuf;
> + int fd;
> +
> + dmabuf = dma_heap_buffer_alloc(heap, len, fd_flags, heap_flags);
>   if (IS_ERR(dmabuf))
>   return PTR_ERR(dmabuf);
>  
> @@ -106,15 +120,9 @@ static long dma_heap_ioctl_allocate(struct file *file, 
> void *data)
>   if (heap_allocation->fd)
>   return -EINVAL;
>  
> - if (heap_allocation->fd_flags & ~DMA_HEAP_VALID_FD_FLAGS)
> - return -EINVAL;
> -
> - if (heap_allocation->heap_flags & ~DMA_HEAP_VALID_HEAP_FLAGS)
> - return -EINVAL;
> -
> - fd = dma_heap_buffer_alloc(heap, heap_allocation->len,
> -heap_allocation->fd_flags,
> -heap_allocation->heap_flags);
> + fd = dma_heap_bufferfd_alloc(heap, heap_allocation->len,
> +  heap_allocation->fd_flags,
> +  heap_allocation->heap_flags);
>   if (fd < 0)
>   return fd;
>  
> @@ -205,6 +213,7 @@ const char *dma_heap_get_name(struct dma_heap *heap)
>  {
>   return heap->name;
>  }
> +EXPORT_SYMBOL_GPL(dma_heap_get_name);
>  
>  struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
>  {
> @@ -290,6 +299,24 @@ struct dma_heap *dma_heap_add(const struct 
> dma_heap_export_info *exp_info)
>   kfree(heap);
>   return err_ret;
>  }
> +EXPORT_SYMBOL_GPL(dma_heap_add);
> +
> +struct dma_heap *dma_heap_find(const char *name)
> +{
> 

Re: [RFC]: shmem fd for non-DMA buffer sharing cross drivers

2023-08-22 Thread Nicolas Dufresne
Hi,

On Tuesday, August 22, 2023 at 19:14 +0800, Hsia-Jun Li wrote:
> Hello
> 
> I would like to introduce a usage of SHMEM similar to DMA-buf, the major 
> purpose of that is sharing metadata or just a pure container across 
> drivers.
> 
> We need to exchange some sort of metadata between drivers, like dynamic 
> HDR data between video4linux2 and DRM. Or the graphics frame buffer is 
> too complex to be described with a plain plane's DMA-buf fd.
> An issue between DRM and V4L2 is that DRM could only support 4 planes 
> while it is 8 for V4L2. It would be pretty hard for DRM to extend its 
> interface to support those 4 more planes, which would lead to revision of 
> many standards like Vulkan, EGL.
> 
> Also, there is no reason to consume a device's memory for content 
> that the device can't read, or to waste an IOMMU entry for such data.
> Usually, such metadata would be the values that should be written to a 
> hardware's registers; a 4KiB page would be 1024 items of 32-bit registers.
> 
> Still, I have some problems with SHMEM:
> 1. I don't want the userspace to modify the content of the SHMEM allocated 
> by the kernel; is there a way to do so?
> 2. Should I create a helper function for installing the SHMEM file as a fd?

Please have a look at memfd and the seal feature; it covers the reason why
unsealed shared memory requires full trust. For controls, F_SEAL_WRITE is even
needed, as with appropriate timing a malicious process can modify the data in
between validation and allocation, causing a possible memory overflow.

https://man7.org/linux/man-pages/man2/memfd_create.2.html
File sealing
   In the absence of file sealing, processes that communicate via
   shared memory must either trust each other, or take measures to
   deal with the possibility that an untrusted peer may manipulate
   the shared memory region in problematic ways.  For example, an
   untrusted peer might modify the contents of the shared memory at
   any time, or shrink the shared memory region.  The former
   possibility leaves the local process vulnerable to time-of-check-
   to-time-of-use race conditions (typically dealt with by copying
   data from the shared memory region before checking and using it).
   The latter possibility leaves the local process vulnerable to
   SIGBUS signals when an attempt is made to access a now-
   nonexistent location in the shared memory region.  (Dealing with
   this possibility necessitates the use of a handler for the SIGBUS
   signal.)

   Dealing with untrusted peers imposes extra complexity on code
   that employs shared memory.  Memory sealing enables that extra
   complexity to be eliminated, by allowing a process to operate
   secure in the knowledge that its peer can't modify the shared
   memory in an undesired fashion.

   [...]

regards,
Nicolas


Re: [v2] media: mediatek: vcodec: fix AV1 decode fail for 36bit iova

2023-08-02 Thread Nicolas Dufresne
Hi,

On Tuesday, July 4, 2023 at 09:51 +0800, Xiaoyong Lu wrote:
> Fix av1 decode fail when iova is 36bit.

I'd change the subject to "media: mediatek: vcodec: fix AV1 decoding on MT8188"
And rephrase this one to:

  Fix AV1 decoding failure when the iova is 36bit.

> 
> Decoder hardware will access incorrect iova address when tile buffer is
> 36bit, it will lead to iommu fault when hardware access dram data.

Suggest to rephrase this:

   Before this fix, the decoder was accessing incorrect addresses with 36bit
   iova tile buffer, leading to iommu faults.

> 
> Fixes: 2f5d0aef37c6 ("media: mediatek: vcodec: support stateless AV1 decoder")
> Signed-off-by: Xiaoyong Lu

With some rework of the commit message, see my suggestions above:

Reviewed-by: Nicolas Dufresne 

> ---
> Changes from v1
> 
> - prefer '|' rather than '+'
> - prefer '&' rather than shift operation
> - add comments for address operations
> 
> v1:
> - VDEC HW can access tile buffer and decode normally.
> - Test ok by mt8195 32bit and mt8188 36bit iova.
> 
> ---
>  .../mediatek/vcodec/vdec/vdec_av1_req_lat_if.c   | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git 
> a/drivers/media/platform/mediatek/vcodec/vdec/vdec_av1_req_lat_if.c 
> b/drivers/media/platform/mediatek/vcodec/vdec/vdec_av1_req_lat_if.c
> index 404a1a23fd402..e9f2393f6a883 100644
> --- a/drivers/media/platform/mediatek/vcodec/vdec/vdec_av1_req_lat_if.c
> +++ b/drivers/media/platform/mediatek/vcodec/vdec/vdec_av1_req_lat_if.c
> @@ -1658,9 +1658,9 @@ static void vdec_av1_slice_setup_tile_buffer(struct 
> vdec_av1_slice_instance *ins
>   u32 allow_update_cdf = 0;
>   u32 sb_boundary_x_m1 = 0, sb_boundary_y_m1 = 0;
>   int tile_info_base;
> - u32 tile_buf_pa;
> + u64 tile_buf_pa;
>   u32 *tile_info_buf = instance->tile.va;
> - u32 pa = (u32)bs->dma_addr;
> + u64 pa = (u64)bs->dma_addr;
>  
>   if (uh->disable_cdf_update == 0)
>   allow_update_cdf = 1;
> @@ -1673,8 +1673,12 @@ static void vdec_av1_slice_setup_tile_buffer(struct 
> vdec_av1_slice_instance *ins
>   tile_info_buf[tile_info_base + 0] = 
> (tile_group->tile_size[tile_num] << 3);
>   tile_buf_pa = pa + tile_group->tile_start_offset[tile_num];
>  
> - tile_info_buf[tile_info_base + 1] = (tile_buf_pa >> 4) << 4;
> - tile_info_buf[tile_info_base + 2] = (tile_buf_pa % 16) << 3;
> + /* save av1 tile high 4bits(bit 32-35) address in lower 4 bits 
> position
> +  * and clear original for hw requirement.
> +  */
> + tile_info_buf[tile_info_base + 1] = (tile_buf_pa & 0xFFFFFFF0ull) |
> + ((tile_buf_pa & 0xF00000000ull) >> 32);
> + tile_info_buf[tile_info_base + 2] = (tile_buf_pa & 0xFull) << 3;
>  
>   sb_boundary_x_m1 =
>   (tile->mi_col_starts[tile_col + 1] - 
> tile->mi_col_starts[tile_col] - 1) &



Re: [PATCH 3/9] drm/verisilicon: Add basic drm driver

2023-07-13 Thread Nicolas Dufresne
On Saturday, July 8, 2023 at 21:11 +0200, Thomas Zimmermann wrote:
> Hi
> 
> On 07.07.23 at 20:09, Nicolas Dufresne wrote:
> [...]
> > > > +config DRM_VERISILICON
> > > > +   tristate "DRM Support for VeriSilicon"
> > > 
> > > Can you rename the driver and files? 'VeriSilicon' seems
> > > unpronounceable. Simply 'StarFive' and starfive/ would be fine.
> > 
> > Are you sure you want to request this ? If the display controller is a
> > Verisilicon design, it will be super odd to use on other SoC that aren't 
> > from
> > StarFive. Think about STM network driver, which is DesignWare.
> 
> It's not a hard requirement. If that's the name, so be it.

If that helps you pronounce it, it is commonly pronounced as:

  very-silicon

Or just a caulking mess if you really hate it :-D

Nicolas

> 
> Best regards
> Thomas
> 
> > 
> > Nicolas
> > 
> > > 
> > > > +   depends on DRM
> > > > +   select DRM_KMS_HELPER
> > > > +   select CMA
> > > > +   select DMA_CMA
> > > > +   help
> > > > + Choose this option if you have a VeriSilicon soc chipset.
> > > > + This driver provides VeriSilicon kernel mode
> > > > + setting and buffer management. It does not
> > > > + provide 2D or 3D acceleration.
> > > > diff --git a/drivers/gpu/drm/verisilicon/Makefile 
> > > > b/drivers/gpu/drm/verisilicon/Makefile
> > > > new file mode 100644
> > > > index ..64ce1b26546c
> > > > --- /dev/null
> > > > +++ b/drivers/gpu/drm/verisilicon/Makefile
> > > > @@ -0,0 +1,6 @@
> > > > +# SPDX-License-Identifier: GPL-2.0
> > > > +
> > > > +vs_drm-objs := vs_drv.o
> > > > +
> > > > +obj-$(CONFIG_DRM_VERISILICON) += vs_drm.o
> > > > +
> > > > diff --git a/drivers/gpu/drm/verisilicon/vs_drv.c 
> > > > b/drivers/gpu/drm/verisilicon/vs_drv.c
> > > > new file mode 100644
> > > > index ..24d333598477
> > > > --- /dev/null
> > > > +++ b/drivers/gpu/drm/verisilicon/vs_drv.c
> > > > @@ -0,0 +1,284 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +/*
> > > > + * Copyright (C) 2023 VeriSilicon Holdings Co., Ltd.
> > > > + */
> > > > +
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +
> > > > +#include "vs_drv.h"
> > > > +
> > > > +#define DRV_NAME   "starfive"
> > > > +#define DRV_DESC   "Starfive DRM driver"
> > > > +#define DRV_DATE   "202305161"
> > > > +#define DRV_MAJOR  1
> > > > +#define DRV_MINOR  0
> > > > +
> > > > +static struct platform_driver vs_drm_platform_driver;
> > > > +
> > > > +static const struct file_operations fops = {
> > > > +   .owner  = THIS_MODULE,
> > > > +   .open   = drm_open,
> > > > +   .release= drm_release,
> > > > +   .unlocked_ioctl = drm_ioctl,
> > > > +   .compat_ioctl   = drm_compat_ioctl,
> > > > +   .poll   = drm_poll,
> > > > +   .read   = drm_read,
> > > > +};
> > > > +
> > > > +static struct drm_driver vs_drm_driver = {
> > > > +   .driver_features= DRIVER_MODESET | DRIVER_ATOMIC | 
> > > > DRIVER_GEM,
> > > > +   .lastclose  = drm_fb_helper_lastclose,
> > > > +   .prime_handle_to_fd = drm_gem_prime_handle_to_fd,
> > > > +   .prime_fd_to_handle = drm_gem_prime_fd_to_handle,
> > > > +   .fops   = ,
> >

Re: [PATCH 3/9] drm/verisilicon: Add basic drm driver

2023-07-07 Thread Nicolas Dufresne
Hi Thomas,

On Monday, June 19, 2023 at 14:59 +0200, Thomas Zimmermann wrote:
> Hi,
> 
> I appreciate that you split the driver into small patches. Please find 
> some comments below.
> 
> On 02.06.23 at 09:40, Keith Zhao wrote:
> > Add a basic platform driver of the DRM driver for JH7110 SoC.
> > 
> > Signed-off-by: Keith Zhao 
> > ---
> >   MAINTAINERS  |   2 +
> >   drivers/gpu/drm/Kconfig  |   2 +
> >   drivers/gpu/drm/Makefile |   1 +
> >   drivers/gpu/drm/verisilicon/Kconfig  |  13 ++
> >   drivers/gpu/drm/verisilicon/Makefile |   6 +
> >   drivers/gpu/drm/verisilicon/vs_drv.c | 284 +++
> >   drivers/gpu/drm/verisilicon/vs_drv.h |  48 +
> >   include/uapi/drm/drm_fourcc.h|  83 
> >   include/uapi/drm/vs_drm.h|  50 +
> >   9 files changed, 489 insertions(+)
> >   create mode 100644 drivers/gpu/drm/verisilicon/Kconfig
> >   create mode 100644 drivers/gpu/drm/verisilicon/Makefile
> >   create mode 100644 drivers/gpu/drm/verisilicon/vs_drv.c
> >   create mode 100644 drivers/gpu/drm/verisilicon/vs_drv.h
> >   create mode 100644 include/uapi/drm/vs_drm.h
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 293aa13d484c..da5b6766a7bb 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -7055,6 +7055,8 @@ L:dri-devel@lists.freedesktop.org
> >   S:Maintained
> >   T:git git://anongit.freedesktop.org/drm/drm-misc
> >   F:Documentation/devicetree/bindings/display/verisilicon/
> > +F: drivers/gpu/drm/verisilicon/
> > +F: include/uapi/drm/vs_drm.h
> >   
> >   DRM DRIVERS FOR VIVANTE GPU IP
> >   M:Lucas Stach 
> > diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> > index ba3fb04bb691..f7e461fa4656 100644
> > --- a/drivers/gpu/drm/Kconfig
> > +++ b/drivers/gpu/drm/Kconfig
> > @@ -371,6 +371,8 @@ source "drivers/gpu/drm/solomon/Kconfig"
> >   
> >   source "drivers/gpu/drm/sprd/Kconfig"
> >   
> > +source "drivers/gpu/drm/verisilicon/Kconfig"
> > +
> >   config DRM_HYPERV
> > tristate "DRM Support for Hyper-V synthetic video device"
> > depends on DRM && PCI && MMU && HYPERV
> > diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> > index a33257d2bc7f..e50622ee4e46 100644
> > --- a/drivers/gpu/drm/Makefile
> > +++ b/drivers/gpu/drm/Makefile
> > @@ -194,3 +194,4 @@ obj-y   += gud/
> >   obj-$(CONFIG_DRM_HYPERV) += hyperv/
> >   obj-y += solomon/
> >   obj-$(CONFIG_DRM_SPRD) += sprd/
> > +obj-$(CONFIG_DRM_VERISILICON) += verisilicon/
> > diff --git a/drivers/gpu/drm/verisilicon/Kconfig 
> > b/drivers/gpu/drm/verisilicon/Kconfig
> > new file mode 100644
> > index ..89d12185f73b
> > --- /dev/null
> > +++ b/drivers/gpu/drm/verisilicon/Kconfig
> > @@ -0,0 +1,13 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +
> > +config DRM_VERISILICON
> > +   tristate "DRM Support for VeriSilicon"
> 
> Can you rename the driver and files? 'VeriSilicon' seems 
> unpronounceable. Simply 'StarFive' and starfive/ would be fine.

Are you sure you want to request this ? If the display controller is a
Verisilicon design, it will be super odd to use it on other SoCs that aren't from
StarFive. Think about the STM network driver, which is a DesignWare design.

Nicolas

> 
> > +   depends on DRM
> > +   select DRM_KMS_HELPER
> > +   select CMA
> > +   select DMA_CMA
> > +   help
> > + Choose this option if you have a VeriSilicon soc chipset.
> > + This driver provides VeriSilicon kernel mode
> > + setting and buffer management. It does not
> > + provide 2D or 3D acceleration.
> > diff --git a/drivers/gpu/drm/verisilicon/Makefile 
> > b/drivers/gpu/drm/verisilicon/Makefile
> > new file mode 100644
> > index ..64ce1b26546c
> > --- /dev/null
> > +++ b/drivers/gpu/drm/verisilicon/Makefile
> > @@ -0,0 +1,6 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +
> > +vs_drm-objs := vs_drv.o
> > +
> > +obj-$(CONFIG_DRM_VERISILICON) += vs_drm.o
> > +
> > diff --git a/drivers/gpu/drm/verisilicon/vs_drv.c 
> > b/drivers/gpu/drm/verisilicon/vs_drv.c
> > new file mode 100644
> > index ..24d333598477
> > --- /dev/null
> > +++ b/drivers/gpu/drm/verisilicon/vs_drv.c
> > @@ -0,0 +1,284 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2023 VeriSilicon Holdings Co., Ltd.
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include "vs_drv.h"
> > +
> > +#define DRV_NAME   "starfive"
> > +#define DRV_DESC   "Starfive DRM driver"
> > +#define DRV_DATE   "202305161"
> > +#define DRV_MAJOR  1
> > +#define DRV_MINOR  0
> > +
> > +static struct platform_driver 

Re: media: mediatek: vcodec: fix AV1 decode fail for 36bit iova

2023-06-28 Thread Nicolas Dufresne
Hi,

On Wednesday, June 28, 2023 at 13:41 +0800, Xiaoyong Lu wrote:
> Decoder hardware will access incorrect iova address when tile buffer is
> 36bit, leading to iommu fault when hardware access dram data.
> 
> Fixes: 2f5d0aef37c6 ("media: mediatek: vcodec: support stateless AV1 decoder")
> Signed-off-by: Xiaoyong Lu
> ---
> - Test ok: mt8195 32bit and mt8188 36bit iova.
> ---
>  .../platform/mediatek/vcodec/vdec/vdec_av1_req_lat_if.c| 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git 
> a/drivers/media/platform/mediatek/vcodec/vdec/vdec_av1_req_lat_if.c 
> b/drivers/media/platform/mediatek/vcodec/vdec/vdec_av1_req_lat_if.c
> index 404a1a23fd40..420222c8a56d 100644
> --- a/drivers/media/platform/mediatek/vcodec/vdec/vdec_av1_req_lat_if.c
> +++ b/drivers/media/platform/mediatek/vcodec/vdec/vdec_av1_req_lat_if.c
> @@ -1658,9 +1658,9 @@ static void vdec_av1_slice_setup_tile_buffer(struct 
> vdec_av1_slice_instance *ins
>   u32 allow_update_cdf = 0;
>   u32 sb_boundary_x_m1 = 0, sb_boundary_y_m1 = 0;
>   int tile_info_base;
> - u32 tile_buf_pa;
> + u64 tile_buf_pa;
>   u32 *tile_info_buf = instance->tile.va;
> - u32 pa = (u32)bs->dma_addr;
> + u64 pa = (u64)bs->dma_addr;
>  
>   if (uh->disable_cdf_update == 0)
>   allow_update_cdf = 1;
> @@ -1673,7 +1673,8 @@ static void vdec_av1_slice_setup_tile_buffer(struct 
> vdec_av1_slice_instance *ins
>   tile_info_buf[tile_info_base + 0] = 
> (tile_group->tile_size[tile_num] << 3);
>   tile_buf_pa = pa + tile_group->tile_start_offset[tile_num];
>  
> - tile_info_buf[tile_info_base + 1] = (tile_buf_pa >> 4) << 4;
> + tile_info_buf[tile_info_base + 1] = (unsigned int)(tile_buf_pa 
> >> 4) << 4 +
> + ((unsigned int)(tile_buf_pa >> 32) & 0xf);

I'm not clear on how this works. In the original code, it was a complicated way
to ignore the 4 least significant bits. Something like this would avoid the cast
and clarify it:

tile_info_buf[tile_info_base + 1] = tile_buf_pa & 0xFFFFFFF0ull;

But in the updated code, if you have 36 bits, you store these 4 bits in the lower
part, which was originally cleared. Can you confirm this is exactly what you
wanted ? And if so add a comment ? It could also be written as (but this is
just me considering this more readable, I also prefer | (or) rather than +, and
hate casting):

tile_info_buf[tile_info_base + 1] = (tile_buf_pa & 0xFFFFFFF0ull) |
(tile_buf_pa & 0xF00000000ull) >> 32;

>   tile_info_buf[tile_info_base + 2] = (tile_buf_pa % 16) << 3;

Is this the same as ?

tile_info_buf[tile_info_base + 2] = (tile_buf_pa & 0xFull) << 3;

> 
>  
>   sb_boundary_x_m1 =



Re: [RFC PATCH v8] media: mediatek: vcodec: support stateless AV1 decoder

2023-03-31 Thread Nicolas Dufresne
Hi Xiao,

On Monday, January 30, 2023 at 20:38 +0800, Xiaoyong Lu wrote:
> Add mediatek av1 decoder linux driver which use the stateless API in
> MT8195.
> 

I think this no longer needs an RFC tag. While at it, it would be nice for the
maintainer to rebase on top of the latest media stage (you still have to pull the
uAPI of course).

> 
> Signed-off-by: Xiaoyong Lu

Tested-by: Nicolas Dufresne 
Reviewed-by: Nicolas Dufresne 

> ---
> Changes from v7:

Please don't forget to include your fluster test results here too. Fluster has 3
test suites; you should provide the score for each of them, and perhaps explain
the failures if any (I think 10bit/422/444 is what remains, and is unsupported
atm).

Also, don't forget to double check with checkpatch (with --strict) to make sure
you have no style issues.

> 
> - change V4L2_CID_STATELESS_AV1_PROFILE to V4L2_CID_MPEG_VIDEO_AV1_PROFILE,
> V4L2_CID_STATELESS_AV1_LEVEL to V4L2_CID_MPEG_VIDEO_AV1_LEVEL to match av1 
> uAPI V4.
> - remove vsi and ctx null check in vdec_av1_slice_init_cdf_table, 
> vdec_av1_slice_init_iq_table for the never true condition.
> - add inline in function vdec_av1_slice_clear_fb, 
> vdec_av1_slice_vsi_from_remote,
> vdec_av1_slice_vsi_to_remote, vdec_av1_slice_setup_state, 
> vdec_av1_slice_setup_operating_mode and vdec_av1_slice_get_dpb_size.
> - remove fb_idx check in vdec_av1_slice_decrease_ref_count.
> - add define AV1_CDF_TABLE_BUFFER_SIZE for magic number 16384.
> - remove intermediate variable "size" at the end of 
> vdec_av1_slice_alloc_working_buffer.
> - use define V4L2_AV1_WARP_MODEL_AFFINE to replace magic number 3 in 
> vdec_av1_slice_setup_gm.
> - change api name vdec_av1_slice_get_relative_dist to 
> vdec_av1_slice_get_sign_bias and return 0 or 1 for the caller directly use.
> - add define AV1_PRIMARY_REF_NONE for magic number 7.
> - remove TODO comment in vdec_av1_slice_update_core.
> - change name irq to irq_enabled in struct vdec_av1_slice_instance.
> - Add newline before return statememt in vdec_av1_slice_init and 
> vdec_av1_slice_flush.
> - remove work_buffer assignment and merge 3 loops with one in 
> vdec_av1_slice_alloc_working_buffer.
> - remove va null check in vdec_av1_slice_free_working_buffer.
> - swap order between vdec_av1_slice_clear_fb and 
> vdec_msg_queue_wait_lat_buf_full in vdec_av1_slice_flush.
> - test by av1 fluster, result is 173/239
> 
> Changes from v6:
> 
> - change slot_id type from u8 to s8
> - test by av1 fluster, result is 173/239
> 
> Changes from v5:
> 
> - change av1 PROFILE and LEVEL cfg
> - test by av1 fluster, result is 173/239
> 
> Changes from v4:
> 
> - convert vb2_find_timestamp to vb2_find_buffer
> - test by av1 fluster, result is 173/239
> 
> Changes from v3:
> 
> - modify comment for struct vdec_av1_slice_slot
> - add define SEG_LVL_ALT_Q
> - change use_lr/use_chroma_lr parse from av1 spec
> - use ARRAY_SIZE to replace size for loop_filter_level and 
> loop_filter_mode_deltas
> - change array size of loop_filter_mode_deltas from 4 to 2
> - add define SECONDARY_FILTER_STRENGTH_NUM_BITS
> - change some hex values from upper case to lower case
> - change *dpb_sz equal to V4L2_AV1_TOTAL_REFS_PER_FRAME + 1
> - test by av1 fluster, result is 173/239
> 
> Changes from v2:
> 
> - Match with av1 uapi v3 modify
> - test by av1 fluster, result is 173/239
> 
> ---
> Reference series:
> [1]: v4 of this series is presend by Daniel Almeida.
>  message-id: 20230103154832.6982-1-daniel.alme...@collabora.com
> 
>  .../media/platform/mediatek/vcodec/Makefile   |1 +
>  .../vcodec/mtk_vcodec_dec_stateless.c |   47 +-
>  .../platform/mediatek/vcodec/mtk_vcodec_drv.h |1 +
>  .../vcodec/vdec/vdec_av1_req_lat_if.c | 2203 +
>  .../platform/mediatek/vcodec/vdec_drv_if.c|4 +
>  .../platform/mediatek/vcodec/vdec_drv_if.h|1 +
>  .../platform/mediatek/vcodec/vdec_msg_queue.c |   27 +
>  .../platform/mediatek/vcodec/vdec_msg_queue.h |4 +
>  8 files changed, 2287 insertions(+), 1 deletion(-)
>  create mode 100644 
> drivers/media/platform/mediatek/vcodec/vdec/vdec_av1_req_lat_if.c
> 
> diff --git a/drivers/media/platform/mediatek/vcodec/Makefile 
> b/drivers/media/platform/mediatek/vcodec/Makefile
> index 93e7a343b5b0e..7537259130072 100644
> --- a/drivers/media/platform/mediatek/vcodec/Makefile
> +++ b/drivers/media/platform/mediatek/vcodec/Makefile
> @@ -10,6 +10,7 @@ mtk-vcodec-dec-y := vdec/vdec_h264_if.o \
>   vdec/vdec_vp8_req_if.o \
>   vdec/vdec_vp9_if.o \
>   vdec/vdec_vp9_req_lat_if.o \
> + vdec/vdec_av1_req_lat_if.o \
>   vdec/vdec_h264_req_if.o \
>   vdec/vdec_h264_r

Re: [PATCH v3 2/7] media: Add Y210, Y212 and Y216 formats

2023-02-24 Thread Nicolas Dufresne
Le jeudi 23 février 2023 à 15:10 +0200, Tomi Valkeinen a écrit :
> Hi,
> 
> On 22/02/2023 17:28, Nicolas Dufresne wrote:
> > Hi Tomi,
> > 
> > Le mercredi 21 décembre 2022 à 11:24 +0200, Tomi Valkeinen a écrit :
> > > Add Y210, Y212 and Y216 formats.
> > > 
> > > Signed-off-by: Tomi Valkeinen 
> > > ---
> > >   .../media/v4l/pixfmt-packed-yuv.rst   | 49 ++-
> > >   drivers/media/v4l2-core/v4l2-ioctl.c  |  3 ++
> > >   include/uapi/linux/videodev2.h|  8 +++
> > >   3 files changed, 58 insertions(+), 2 deletions(-)
> > 
> > It seems you omitted to update v4l2-common.c, Ming Qian had made a 
> > suplicated
> > commit for this, I'll ask him if he can keep the -common changes you forgot.
> 
> Ah, I wasn't aware of the format list in that file.
> 
> I think you refer to the "media: imx-jpeg: Add support for 12 bit 
> extended jpeg" series. Yes, I'm fine if he can add the -common changes 
> there, but I can also send a separate patch. In fact, maybe a separate 
> fix patch is better, so that we can have it merged in the early 6.3 rcs.

I don't think we need to worry about backporting this though. I simply care that
we keep updating -common and encourage using it. The goal of this lib is to
provide a common set of helpers to calculate format-related information. You
don't have to use it at any cost. Allocation is often the cause of memory
corruption issues, and is a very recurrent thing we have to debug and fix.

This was also discussed on IRC yesterday: for the Renesas driver, "just porting
it" to use that could mean duplicating the lookup, as the Renesas driver also
needs its own map to get the HW-specific formats and other information. This
looks like a valid use case to me, and is definitely something -common could
improve on.

Nicolas


Re: [PATCH v3 2/7] media: Add Y210, Y212 and Y216 formats

2023-02-22 Thread Nicolas Dufresne
Hi Tomi,

Le mercredi 21 décembre 2022 à 11:24 +0200, Tomi Valkeinen a écrit :
> Add Y210, Y212 and Y216 formats.
> 
> Signed-off-by: Tomi Valkeinen 
> ---
>  .../media/v4l/pixfmt-packed-yuv.rst   | 49 ++-
>  drivers/media/v4l2-core/v4l2-ioctl.c  |  3 ++
>  include/uapi/linux/videodev2.h|  8 +++
>  3 files changed, 58 insertions(+), 2 deletions(-)

It seems you omitted to update v4l2-common.c. Ming Qian had made a duplicated
commit for this; I'll ask him if he can keep the -common changes you forgot.

> 
> diff --git a/Documentation/userspace-api/media/v4l/pixfmt-packed-yuv.rst 
> b/Documentation/userspace-api/media/v4l/pixfmt-packed-yuv.rst
> index bf283a1b5581..24a771542059 100644
> --- a/Documentation/userspace-api/media/v4l/pixfmt-packed-yuv.rst
> +++ b/Documentation/userspace-api/media/v4l/pixfmt-packed-yuv.rst
> @@ -262,7 +262,12 @@ the second byte and Y'\ :sub:`7-0` in the third byte.
>  =
>  
>  These formats, commonly referred to as YUYV or YUY2, subsample the chroma
> -components horizontally by 2, storing 2 pixels in 4 bytes.
> +components horizontally by 2, storing 2 pixels in a container. The container
> +is 32-bits for 8-bit formats, and 64-bits for 10+-bit formats.
> +
> +The packed YUYV formats with more than 8 bits per component are stored as 
> four
> +16-bit little-endian words. Each word's most significant bits contain one
> +component, and the least significant bits are zero padding.
>  
>  .. raw:: latex
>  
> @@ -270,7 +275,7 @@ components horizontally by 2, storing 2 pixels in 4 bytes.
>  
>  .. tabularcolumns:: 
> |p{3.4cm}|p{1.2cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|
>  
> -.. flat-table:: Packed YUV 4:2:2 Formats
> +.. flat-table:: Packed YUV 4:2:2 Formats in 32-bit container
>  :header-rows: 1
>  :stub-columns: 0
>  
> @@ -337,6 +342,46 @@ components horizontally by 2, storing 2 pixels in 4 
> bytes.
>- Y'\ :sub:`3`
>- Cb\ :sub:`2`
>  
> +.. tabularcolumns:: 
> |p{3.4cm}|p{1.2cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|p{0.8cm}|
> +
> +.. flat-table:: Packed YUV 4:2:2 Formats in 64-bit container
> +:header-rows: 1
> +:stub-columns: 0
> +
> +* - Identifier
> +  - Code
> +  - Word 0
> +  - Word 1
> +  - Word 2
> +  - Word 3
> +* .. _V4L2-PIX-FMT-Y210:
> +
> +  - ``V4L2_PIX_FMT_Y210``
> +  - 'Y210'
> +
> +  - Y'\ :sub:`0` (bits 15-6)
> +  - Cb\ :sub:`0` (bits 15-6)
> +  - Y'\ :sub:`1` (bits 15-6)
> +  - Cr\ :sub:`0` (bits 15-6)
> +* .. _V4L2-PIX-FMT-Y212:
> +
> +  - ``V4L2_PIX_FMT_Y212``
> +  - 'Y212'
> +
> +  - Y'\ :sub:`0` (bits 15-4)
> +  - Cb\ :sub:`0` (bits 15-4)
> +  - Y'\ :sub:`1` (bits 15-4)
> +  - Cr\ :sub:`0` (bits 15-4)
> +* .. _V4L2-PIX-FMT-Y216:
> +
> +  - ``V4L2_PIX_FMT_Y216``
> +  - 'Y216'
> +
> +  - Y'\ :sub:`0` (bits 15-0)
> +  - Cb\ :sub:`0` (bits 15-0)
> +  - Y'\ :sub:`1` (bits 15-0)
> +  - Cr\ :sub:`0` (bits 15-0)
> +
>  .. raw:: latex
>  
>  \normalsize
> diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c 
> b/drivers/media/v4l2-core/v4l2-ioctl.c
> index 875b9a95e3c8..a244d5181120 100644
> --- a/drivers/media/v4l2-core/v4l2-ioctl.c
> +++ b/drivers/media/v4l2-core/v4l2-ioctl.c
> @@ -1449,6 +1449,9 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
>   case V4L2_META_FMT_RK_ISP1_STAT_3A: descr = "Rockchip ISP1 3A 
> Statistics"; break;
>   case V4L2_PIX_FMT_NV12M_8L128:  descr = "NV12M (8x128 Linear)"; break;
>   case V4L2_PIX_FMT_NV12M_10BE_8L128: descr = "10-bit NV12M (8x128 
> Linear, BE)"; break;
> + case V4L2_PIX_FMT_Y210: descr = "10-bit YUYV Packed"; break;
> + case V4L2_PIX_FMT_Y212: descr = "12-bit YUYV Packed"; break;
> + case V4L2_PIX_FMT_Y216: descr = "16-bit YUYV Packed"; break;
>  
>   default:
>   /* Compressed formats */
> diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
> index 51d6a8aa4e17..403db3fb5cfa 100644
> --- a/include/uapi/linux/videodev2.h
> +++ b/include/uapi/linux/videodev2.h
> @@ -621,6 +621,14 @@ struct v4l2_pix_format {
>  #define V4L2_PIX_FMT_YUVX32  v4l2_fourcc('Y', 'U', 'V', 'X') /* 32  
> YUVX-8-8-8-8  */
>  #define V4L2_PIX_FMT_M420v4l2_fourcc('M', '4', '2', '0') /* 12  YUV 
> 4:2:0 2 lines y, 1 line uv interleaved */
>  
> +/*
> + * YCbCr packed format. For each Y2xx format, xx bits of valid data occupy 
> the MSBs
> + * of the 16 bit components, and 16-xx bits of zero padding occupy the LSBs.
> + */
> +#define V4L2_PIX_FMT_Y210v4l2_fourcc('Y', '2', '1', '0') /* 32  YUYV 
> 4:2:2 */
> +#define V4L2_PIX_FMT_Y212v4l2_fourcc('Y', '2', '1', '2') /* 32  YUYV 
> 4:2:2 */
> +#define V4L2_PIX_FMT_Y216v4l2_fourcc('Y', '2', '1', '6') /* 32  YUYV 
> 4:2:2 */
> +
>  /* two planes -- one Y, one Cr + Cb interleaved  */
>  #define V4L2_PIX_FMT_NV12

Re: Try to address the DMA-buf coherency problem

2022-12-06 Thread Nicolas Dufresne
Le lundi 05 décembre 2022 à 09:28 +0100, Christian König a écrit :
> Hi Tomasz,
> 
> Am 05.12.22 um 07:41 schrieb Tomasz Figa:
> > [SNIP]
> > > In other words explicit ownership transfer is not something we would
> > > want as requirement in the framework, cause otherwise we break tons of
> > > use cases which require concurrent access to the underlying buffer.
> > > 
> > > When a device driver needs explicit ownership transfer it's perfectly
> > > possible to implement this using the dma_fence objects mentioned above.
> > > E.g. drivers can already look at who is accessing a buffer currently and
> > > can even grab explicit ownership of it by adding their own dma_fence
> > > objects.
> > > 
> > > The only exception is CPU based access, e.g. when something is written
> > > with the CPU a cache flush might be necessary and when something is read
> > > with the CPU a cache invalidation might be necessary.
> > > 
> > Okay, that's much clearer now, thanks for clarifying this. So we
> > should be covered for the cache maintenance needs originating from CPU
> > accesses already, +/- the broken cases which don't call the begin/end
> > CPU access routines that I mentioned above.
> > 
> > Similarly, for any ownership transfer between different DMA engines,
> > we should be covered either by the userspace explicitly flushing the
> > hardware pipeline or attaching a DMA-buf fence to the buffer.
> > 
> > But then, what's left to be solved? :) (Besides the cases of missing
> > begin/end CPU access calls.)
> 
> Well there are multiple problems here:
> 
> 1. A lot of userspace applications/frameworks assume that it can 
> allocate the buffer anywhere and it just works.

I know you have said that about 10 times, perhaps I'm about to believe it, but
why do you think userspace assumes this ? Did you actually read code that does
this (that isn't meant to run in a controlled environment) ? And can you provide
some example of broken generic userspace ? The DMABuf flow is meant to be trial
and error. At least in GStreamer, yes, mostly only device allocation (when
generically usable) is implemented, but the code that has been contributed will
try and fall back as documented. It still fails sometimes, but that's exactly the
kind of kernel bug your patchset is trying to address. I don't blame anyone
here: why would folks on GStreamer/FFMPEG or any other "generic media
framework" spend so much time implementing "per Linux device" code, when non-
embedded (constrained) Linux has just a handful of users (compared to Windows,
Android, iOS users)?

To me, this shouldn't be the #1 issue. Perhaps it should simply be replaced by
userspace not supporting DMABuf Heaps. Perhaps add that Linux distributions don't
always enable heaps (or allow normal users to access them), though your point 2
gets in the way. Unlike virtual memory, I don't think there is a very good
accounting and reclaiming mechanism for that memory, hence opening these up means
any userspace could possibly impair the system's functioning. If you can't e.g.
limit their usage within containers, this is pretty difficult for generic Linux
to carry. This is a wider problem of course, which likely affects a lot of GPU
usage too, but perhaps it should be in the lower-priority part of the todo.

> 
> This isn't true at all, we have tons of cases where device can only 
> access their special memory for certain use cases.
> Just look at scanout for displaying on dGPU, neither AMD nor NVidia 
> supports system memory here. Similar cases exists for audio/video codecs 
> where intermediate memory is only accessible by certain devices because 
> of content protection.

nit: content protection is not CODEC specific, it's a platform feature; it's also
not really a thing upstream yet from what I'm aware of. This needs a unified
design and documentation imho, but also enough standardisation so that a generic
application can use it. Right now, content protection people have been
complaining that V4L2 (and most generic userspace) doesn't work with their
design, rather than trying to figure out a design that works with the existing
API.

> 
> 2. We don't properly communicate allocation requirements to userspace.
> 
> E.g. even if you allocate from DMA-Heaps userspace can currently only 
> guess if normal, CMA or even device specific memory is needed.
> 
> 3. We seem to lack some essential parts of those restrictions in the 
> documentation.

Agreed (can't always disagree).

regards,
Nicolas

> 
> > > > > So if a device driver uses cached system memory on an architecture 
> > > > > which
> > > > > devices which can't access it the right approach is clearly to reject
> > > > > the access.
> > > > I'd like to accent the fact that "requires cache maintenance" != "can't 
> > > > access".
> > > Well that depends. As said above the exporter exports the buffer as it
> > > was allocated.
> > > 
> > > If that means the the exporter provides a piece of memory which requires
> > > CPU cache snooping to access correctly then the best 

Re: [PATCH 2/7] media: Add Y210, Y212 and Y216 formats

2022-12-06 Thread Nicolas Dufresne
Hi,

Le mardi 06 décembre 2022 à 15:39 +0200, Tomi Valkeinen a écrit :
> Add Y210, Y212 and Y216 formats.
> 
> Signed-off-by: Tomi Valkeinen 

This patch is simply missing an update to:

Documentation/userspace-api/media/v4l/yuv-formats.rst

regards,
Nicolas

> ---
>  drivers/media/v4l2-core/v4l2-ioctl.c | 3 +++
>  include/uapi/linux/videodev2.h   | 8 
>  2 files changed, 11 insertions(+)
> 
> diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c 
> b/drivers/media/v4l2-core/v4l2-ioctl.c
> index 964300deaf62..ba95389a59b5 100644
> --- a/drivers/media/v4l2-core/v4l2-ioctl.c
> +++ b/drivers/media/v4l2-core/v4l2-ioctl.c
> @@ -1449,6 +1449,9 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
>   case V4L2_META_FMT_RK_ISP1_STAT_3A: descr = "Rockchip ISP1 3A 
> Statistics"; break;
>   case V4L2_PIX_FMT_NV12M_8L128:  descr = "NV12M (8x128 Linear)"; break;
>   case V4L2_PIX_FMT_NV12M_10BE_8L128: descr = "10-bit NV12M (8x128 
> Linear, BE)"; break;
> + case V4L2_PIX_FMT_Y210: descr = "10-bit YUYV Packed"; break;
> + case V4L2_PIX_FMT_Y212: descr = "12-bit YUYV Packed"; break;
> + case V4L2_PIX_FMT_Y216: descr = "16-bit YUYV Packed"; break;
>  
>   default:
>   /* Compressed formats */
> diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
> index 877fd61693b8..15b640d2da8a 100644
> --- a/include/uapi/linux/videodev2.h
> +++ b/include/uapi/linux/videodev2.h
> @@ -621,6 +621,14 @@ struct v4l2_pix_format {
>  #define V4L2_PIX_FMT_YUVX32  v4l2_fourcc('Y', 'U', 'V', 'X') /* 32  
> YUVX-8-8-8-8  */
>  #define V4L2_PIX_FMT_M420v4l2_fourcc('M', '4', '2', '0') /* 12  YUV 
> 4:2:0 2 lines y, 1 line uv interleaved */
>  
> +/*
> + * YCbCr packed format. For each Y2xx format, xx bits of valid data occupy 
> the MSBs
> + * of the 16 bit components, and 16-xx bits of zero padding occupy the LSBs.
> + */
> +#define V4L2_PIX_FMT_Y210v4l2_fourcc('Y', '2', '1', '0') /* 32  YUYV 
> 4:2:2 */
> +#define V4L2_PIX_FMT_Y212v4l2_fourcc('Y', '2', '1', '2') /* 32  YUYV 
> 4:2:2 */
> +#define V4L2_PIX_FMT_Y216v4l2_fourcc('Y', '2', '1', '6') /* 32  YUYV 
> 4:2:2 */
> +
>  /* two planes -- one Y, one Cr + Cb interleaved  */
>  #define V4L2_PIX_FMT_NV12v4l2_fourcc('N', 'V', '1', '2') /* 12  Y/CbCr 
> 4:2:0  */
>  #define V4L2_PIX_FMT_NV21v4l2_fourcc('N', 'V', '2', '1') /* 12  Y/CrCb 
> 4:2:0  */



Re: [PATCH 1/7] media: Add 2-10-10-10 RGB formats

2022-12-06 Thread Nicolas Dufresne
Hi,

Le mardi 06 décembre 2022 à 15:39 +0200, Tomi Valkeinen a écrit :
> Add XBGR2101010, ABGR2101010 and BGRA1010102 formats.
> 
> Signed-off-by: Tomi Valkeinen 

This patch is simply missing an update to

Documentation/userspace-api/media/v4l/pixfmt-rgb.rst

regards,
Nicolas

> ---
>  drivers/media/v4l2-core/v4l2-ioctl.c | 3 +++
>  include/uapi/linux/videodev2.h   | 3 +++
>  2 files changed, 6 insertions(+)
> 
> diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c 
> b/drivers/media/v4l2-core/v4l2-ioctl.c
> index fddba75d9074..964300deaf62 100644
> --- a/drivers/media/v4l2-core/v4l2-ioctl.c
> +++ b/drivers/media/v4l2-core/v4l2-ioctl.c
> @@ -1304,6 +1304,9 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
>   case V4L2_PIX_FMT_BGRX32:   descr = "32-bit XBGR 8-8-8-8"; break;
>   case V4L2_PIX_FMT_RGBA32:   descr = "32-bit RGBA 8-8-8-8"; break;
>   case V4L2_PIX_FMT_RGBX32:   descr = "32-bit RGBX 8-8-8-8"; break;
> + case V4L2_PIX_FMT_XBGR2101010:  descr = "32-bit XBGR 2-10-10-10"; break;
> + case V4L2_PIX_FMT_ABGR2101010:  descr = "32-bit ABGR 2-10-10-10"; break;
> + case V4L2_PIX_FMT_BGRA1010102:  descr = "32-bit BGRA 10-10-10-2"; break;
>   case V4L2_PIX_FMT_GREY: descr = "8-bit Greyscale"; break;
>   case V4L2_PIX_FMT_Y4:   descr = "4-bit Greyscale"; break;
>   case V4L2_PIX_FMT_Y6:   descr = "6-bit Greyscale"; break;
> diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
> index 29da1f4b4578..877fd61693b8 100644
> --- a/include/uapi/linux/videodev2.h
> +++ b/include/uapi/linux/videodev2.h
> @@ -576,6 +576,9 @@ struct v4l2_pix_format {
>  #define V4L2_PIX_FMT_RGBX32  v4l2_fourcc('X', 'B', '2', '4') /* 32  
> RGBX-8-8-8-8  */
>  #define V4L2_PIX_FMT_ARGB32  v4l2_fourcc('B', 'A', '2', '4') /* 32  
> ARGB-8-8-8-8  */
>  #define V4L2_PIX_FMT_XRGB32  v4l2_fourcc('B', 'X', '2', '4') /* 32  
> XRGB-8-8-8-8  */
> +#define V4L2_PIX_FMT_XBGR2101010 v4l2_fourcc('R', 'X', '3', '0') /* 32  
> XBGR-2-10-10-10  */
> +#define V4L2_PIX_FMT_ABGR2101010 v4l2_fourcc('R', 'A', '3', '0') /* 32  
> ABGR-2-10-10-10  */
> +#define V4L2_PIX_FMT_BGRA1010102 v4l2_fourcc('A', 'R', '3', '0') /* 32  
> BGRA-10-10-10-2  */
>  
>  /* Grey formats */
>  #define V4L2_PIX_FMT_GREYv4l2_fourcc('G', 'R', 'E', 'Y') /*  8  
> Greyscale */



Re: Try to address the DMA-buf coherency problem

2022-11-25 Thread Nicolas Dufresne
Le mercredi 23 novembre 2022 à 17:33 +0100, Daniel Vetter a écrit :
> On Wed, Nov 23, 2022 at 10:33:38AM +0200, Pekka Paalanen wrote:
> > On Tue, 22 Nov 2022 18:33:59 +0100
> > Christian König  wrote:
> > 
> > > We should have come up with dma-heaps earlier and make it clear that 
> > > exporting a DMA-buf from a device gives you something device specific 
> > > which might or might not work with others.
> > > 
> > > Apart from that I agree, DMA-buf should be capable of handling this. 
> > > Question left is what documentation is missing to make it clear how 
> > > things are supposed to work?
> > 
> > Perhaps somewhat related from Daniel Stone that seems to have been
> > forgotten:
> > https://lore.kernel.org/dri-devel/20210905122742.86029-1-dani...@collabora.com/
> > 
> > It aimed mostly at userspace, but sounds to me like the coherency stuff
> > could use a section of its own there?
> 
> Hm yeah it would be great to land that and then eventually extend. Daniel?

There are a lot of things documented in this document that have been called
completely wrong user-space behaviour in this thread. But it seems to pre-date
the DMA Heaps. The document also assumes that DMA Heaps completely solve the CMA
vs system memory issue. But it also underlines a very important aspect: that
userland is not aware which one to use. What this document suggests, though,
seems more realistic than what has been said here.

It's overall a great document; it's unfortunate that it only made it to the DRM
mailing list.

Nicolas


Re: Try to address the DMA-buf coherency problem

2022-11-19 Thread Nicolas Dufresne
Le vendredi 18 novembre 2022 à 11:32 -0800, Rob Clark a écrit :
> On Thu, Nov 17, 2022 at 7:38 AM Nicolas Dufresne  wrote:
> > 
> > Le jeudi 17 novembre 2022 à 13:10 +0100, Christian König a écrit :
> > > > > DMA-Buf let's the exporter setup the DMA addresses the importer uses 
> > > > > to
> > > > > be able to directly decided where a certain operation should go. E.g. 
> > > > > we
> > > > > have cases where for example a P2P write doesn't even go to memory, 
> > > > > but
> > > > > rather a doorbell BAR to trigger another operation. Throwing in CPU
> > > > > round trips for explicit ownership transfer completely breaks that
> > > > > concept.
> > > > It sounds like we should have a dma_dev_is_coherent_with_dev() which
> > > > accepts two (or an array?) of devices and tells the caller whether the
> > > > devices need explicit ownership transfer.
> > > 
> > > No, exactly that's the concept I'm pushing back on very hard here.
> > > 
> > > In other words explicit ownership transfer is not something we would
> > > want as requirement in the framework, cause otherwise we break tons of
> > > use cases which require concurrent access to the underlying buffer.
> > 
> > I'm not pushing for this solution, but really felt the need to correct you 
> > here.
> > I have quite some experience with ownership transfer mechanism, as this is 
> > how
> > GStreamer framework works since 2000. Concurrent access is a really common 
> > use
> > cases and it is quite well defined in that context. The bracketing system 
> > (in
> > this case called map() unmap(), with flag stating the usage intention like 
> > reads
> > and write) is combined the the refcount. The basic rules are simple:
> 
> This is all CPU oriented, I think Christian is talking about the case
> where ownership transfer happens without CPU involvement, such as via
> GPU waiting on a fence

HW fences and proper ownership aren't incompatible at all. Even if no software is
involved during the usage, software still needs to share the dmabuf (at least
once), and sharing modifies the ownership, which can be made explicit.

p.s. I will agree if someone raises that this is totally off topic

Nicolas
> BR,
> -R



Re: Try to address the DMA-buf coherency problem

2022-11-17 Thread Nicolas Dufresne
Le jeudi 17 novembre 2022 à 13:10 +0100, Christian König a écrit :
> > > DMA-Buf let's the exporter setup the DMA addresses the importer uses to
> > > be able to directly decided where a certain operation should go. E.g. we
> > > have cases where for example a P2P write doesn't even go to memory, but
> > > rather a doorbell BAR to trigger another operation. Throwing in CPU
> > > round trips for explicit ownership transfer completely breaks that
> > > concept.
> > It sounds like we should have a dma_dev_is_coherent_with_dev() which
> > accepts two (or an array?) of devices and tells the caller whether the
> > devices need explicit ownership transfer.
> 
> No, exactly that's the concept I'm pushing back on very hard here.
> 
> In other words explicit ownership transfer is not something we would 
> want as requirement in the framework, cause otherwise we break tons of 
> use cases which require concurrent access to the underlying buffer.

I'm not pushing for this solution, but really felt the need to correct you here.
I have quite some experience with ownership transfer mechanisms, as this is how
the GStreamer framework has worked since 2000. Concurrent access is a really
common use case and it is quite well defined in that context. The bracketing
system (in this case called map()/unmap(), with flags stating the usage
intention, like read and write) is combined with the refcount. The basic rules
are simple:

- An object with a refcount of two or higher is shared, hence read-only
- An object with a refcount of one, mapped for writes, becomes exclusive
- Non-exclusive writes can be done, but that has to be explicit (intentional);
we didn't go as far as Rust in that domain
- Wrappers around these objects can use mechanisms like "copy-on-write" and can
also maintain the state of shadow buffers (e.g. GL upload slow cases) even with
concurrent access

Just hope this clarifies it. The Rust language works, yet it is all based on
explicit ownership transfers. It's not limiting, but it requires a different way
of thinking about how data is to be accessed.

Nicolas



Re: Try to address the DMA-buf coherency problem

2022-11-04 Thread Nicolas Dufresne
Le vendredi 04 novembre 2022 à 10:03 +0100, Christian König a écrit :
> Am 03.11.22 um 23:16 schrieb Nicolas Dufresne:
> > [SNIP]
> > 
> > Was there APIs suggested to actually make it manageable by userland to 
> > allocate
> > from the GPU? Yes, this what Linux Device Allocator idea is for. Is that API
> > ready, no.
> 
> Well, that stuff is absolutely ready: 
> https://elixir.bootlin.com/linux/latest/source/drivers/dma-buf/heaps/system_heap.c#L175
>  
> What do you think I'm talking about all the time?

I'm aware of DMA Heap; it still has a few gaps, but this is unrelated to
coherency (we can discuss offline, with Daniel S.). DMABuf Heap is used in many
forks by vendors in production. There is an upstream proposal for GStreamer, but
review comments were never addressed; in short, it's stalled and waiting for a
volunteer. It might also be based on a very old implementation of DMABuf Heap,
so it needs to be verified in depth, as time has passed.

https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/1391

> 
> DMA-buf has a lengthy section about CPU access to buffers and clearly 
> documents how all of that is supposed to work: 
> https://elixir.bootlin.com/linux/latest/source/drivers/dma-buf/dma-buf.c#L1160
>  
> This includes braketing of CPU access with dma_buf_begin_cpu_access() 
> and dma_buf_end_cpu_access(), as well as transaction management between 
> devices and the CPU and even implicit synchronization.
> 
> This specification is then implemented by the different drivers 
> including V4L2: 
> https://elixir.bootlin.com/linux/latest/source/drivers/media/common/videobuf2/videobuf2-dma-sg.c#L473
> 
> As well as the different DRM drivers: 
> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c#L117
>  
> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c#L234

I know, I implemented the userspace bracketing for this in GStreamer [1] before
DMAbuf Heap was merged, and was one of the reporters of the missing bracketing
in VB2. It was tested against the i915 driver. Note, this is just a fallback;
the performance is terrible, as memory exported by (at least my old) i915 HW is
not cacheable on the CPU. Though, between corrupted images and bad performance
or just bad performance, we decided the latter was better. When the DMABuf is
backed by CPU-cacheable memory, performance is great and the CPU fallback works.
Work is in progress to better handle these two cases generically. For now, the
app sometimes needs to get involved, but this only happens in
embedded/controlled kinds of use cases. What matters is that applications that
need this can do it.

[1] 
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/blob/main/subprojects/gst-plugins-base/gst-libs/gst/allocators/gstdmabuf.c

> 
> This design was then used by us with various media players on different 
> customer projects, including QNAP https://www.qnap.com/en/product/ts-877 
> as well as the newest Tesla 
> https://www.amd.com/en/products/embedded-automotive-solutions
> 
> I won't go into the details here, but we are using exactly the approach 
> I've outlined to let userspace control the DMA between the different 
> device in question. I'm one of the main designers of that and our 
> multimedia and mesa team has up-streamed quite a number of changes for 
> this project.
> 
> I'm not that well into different ARM based solutions because we are just 
> recently getting results that this starts to work with AMD GPUs, but I'm 
> pretty sure that the design should be able to handle that as well.
> 
> So we have clearly prove that this design works, even with special 
> requirements which are way more complex than what we are discussing 
> here. We had cases where we used GStreamer to feed DMA-buf handles into 
> multiple devices with different format requirements and that seems to 
> work fine.

Sounds like you have a love/hate relationship with GStreamer. Glad the framework
is working for you too. The framework has had bidirectional memory allocation
for over a decade; it also has context sharing for stacks like
D3D11/12, GL, Vulkan, CUDA, etc. I genuinely didn't understand what you were
complaining about. As a vendor, you can solve all this in your BSP. Though,
translating BSP patches into generic upstream-able features is not as simple.
The solution that works for a vendor is usually the most cost-effective one. I'm
sure Tesla or AMD Automotive are no exceptions.

> 
> -
> What is clearly a bug in the kernel is that we don't reject things which 
> won't work correctly and this is what this patch here addresses. What we 
> could talk about is backward compatibility for this patch, cause it 
> might look like it breaks things which previously used to work at least 
> pa

Re: Try to address the DMA-buf coherency problem

2022-11-03 Thread Nicolas Dufresne
Le mercredi 02 novembre 2022 à 12:18 +0100, Christian König a écrit :
> Am 01.11.22 um 22:09 schrieb Nicolas Dufresne:
> > [SNIP]
> > > > But the client is just a video player. It doesn't understand how to
> > > > allocate BOs for Panfrost or AMD or etnaviv. So without a universal
> > > > allocator (again ...), 'just allocate on the GPU' isn't a useful
> > > > response to the client.
> > > Well exactly that's the point I'm raising: The client *must* understand
> > > that!
> > > 
> > > See we need to be able to handle all restrictions here, coherency of the
> > > data is just one of them.
> > > 
> > > For example the much more important question is the location of the data
> > > and for this allocating from the V4L2 device is in most cases just not
> > > going to fly.
> > It feels like this is a generic statement and there is no reason it could 
> > not be
> > the other way around.
> 
> And exactly that's my point. You always need to look at both ways to 
> share the buffer and can't assume that one will always work.
> 
> As far as I can see it you guys just allocate a buffer from a V4L2 
> device, fill it with data and send it to Wayland for displaying.

That paragraph is a bit sloppy. By "you guys", what do you mean exactly ? Normal
users will let the V4L2 device allocate and write into its own memory (the
device fills it, not "you guys"). This is done simply because it is guaranteed
to work with the V4L2 device. Most V4L2 devices produce pixel formats and
layouts known by userspace, for which userspace knows for sure it can implement
a GPU shader or a software fallback. I have yet to see one of these formats that
cannot be efficiently imported into a modern GPU and converted using shaders.
I'm also not entirely sure what distinguishes a dGPU from a GPU here, btw.

In many cases, camera-style V4L2 devices will have one producer for many
consumers. Consider a photo application: the streams will likely be captured and
displayed while being encoded by one or more CODECs, and possibly streamed to a
machine-learning model for analysis. The software complexity of communicating
back the list of receiver devices and implementing all their non-standard ways
to allocate memory, with all the resulting combinations of trial and error, is
just ridiculously high. Remember that each GPU has its own allocation methods
and corner cases; this is simply not manageable by "you guys", which I pretty
much assume means everyone writing software for generic Linux these days
(non-Android/ChromeOS).

> 
> To be honest I'm really surprised that the Wayland guys hasn't pushed 
> back on this practice already.
> 
> This only works because the Wayland as well as X display pipeline is 
> smart enough to insert an extra copy when it find that an imported 
> buffer can't be used as a framebuffer directly.

This is a bit inaccurate. The compositors I've worked with (GNOME and Weston)
will only memcpy SHM buffers. For DMABuf, they will fail the import if it is not
usable either by the display or by the GPU. On the GPU side especially (which is
the ultimate compositor fallback), there exist efficient HW copy mechanisms that
may be used, and this is fine: unlike your scanout example, it won't upload over
and over, but will later re-display from a remote (or transformed) copy. Or if
you prefer, it's cached, at the cost of higher memory usage.

I think it would be preferable to speak about device-to-device sharing, since
V4L2 vs GPU is not really representative of the problem. I think "V4L2 vs GPU"
and "you guys" simply contribute to the never-ending and needless friction
around the difficulty that exists with current support for memory sharing in
Linux.

> 
> >   I have colleague who integrated PCIe CODEC (Blaize Xplorer
> > X1600P PCIe Accelerator) hosting their own RAM. There was large amount of 
> > ways
> > to use it. Of course, in current state of DMABuf, you have to be an 
> > exporter to
> > do anything fancy, but it did not have to be like this, its a design 
> > choice. I'm
> > not sure in the end what was the final method used, the driver isn't yet
> > upstream, so maybe that is not even final. What I know is that there is 
> > various
> > condition you may use the CODEC for which the optimal location will vary. 
> > As an
> > example, using the post processor or not, see my next comment for more 
> > details.
> 
> Yeah, and stuff like this was already discussed multiple times. Local 
> memory of devices can only be made available by the exporter, not the 
> importer.
> 
> So in the case of separated camera and encoder you run into exactly the 
> same limitation that some device needs the allocatio

Re: Try to address the DMA-buf coherency problem

2022-11-01 Thread Nicolas Dufresne
Le mardi 01 novembre 2022 à 18:40 +0100, Christian König a écrit :
> Am 28.10.22 um 20:47 schrieb Daniel Stone:
> > Hi Christian,
> > 
> > On Fri, 28 Oct 2022 at 18:50, Christian König
> >  wrote:
> > > Am 28.10.22 um 17:46 schrieb Nicolas Dufresne:
> > > > Though, its not generically possible to reverse these roles. If you 
> > > > want to do
> > > > so, you endup having to do like Android (gralloc) and ChromeOS 
> > > > (minigbm),
> > > > because you will have to allocate DRM buffers that knows about importer 
> > > > specific
> > > > requirements. See link [1] for what it looks like for RK3399, with 
> > > > Motion Vector
> > > > size calculation copied from the kernel driver into a userspace lib 
> > > > (arguably
> > > > that was available from V4L2 sizeimage, but this is technically 
> > > > difficult to
> > > > communicate within the software layers). If you could let the decoder 
> > > > export
> > > > (with proper cache management) the non-generic code would not be needed.
> > > Yeah, but I can also reverse the argument:
> > > 
> > > Getting the parameters for V4L right so that we can share the image is
> > > tricky, but getting the parameters so that the stuff is actually
> > > directly displayable by GPUs is even trickier.
> > > 
> > > Essentially you need to look at both sides and interference to get to a
> > > common ground, e.g. alignment, pitch, width/height, padding, etc.
> > > 
> > > Deciding from which side to allocate from is just one step in this
> > > process. For example most dGPUs can't display directly from system
> > > memory altogether, but it is possible to allocate the DMA-buf through
> > > the GPU driver and then write into device memory with P2P PCI transfers.
> > > 
> > > So as far as I can see switching importer and exporter roles and even
> > > having performant extra fallbacks should be a standard feature of 
> > > userspace.
> > > 
> > > > Another case where reversing the role is difficult is for case where 
> > > > you need to
> > > > multiplex the streams (let's use a camera to illustrate) and share that 
> > > > with
> > > > multiple processes. In these uses case, the DRM importers are volatile, 
> > > > which
> > > > one do you abuse to do allocation from ? In multimedia server like 
> > > > PipeWire, you
> > > > are not really aware if the camera will be used by DRM or not, and if 
> > > > something
> > > > "special" is needed in term of role inversion. It is relatively easy to 
> > > > deal
> > > > with matching modifiers, but using downstream (display/gpu) as an 
> > > > exporter is
> > > > always difficult (and require some level of abuse and guessing).
> > > Oh, very good point! Yeah we do have use cases for this where an input
> > > buffer is both displayed as well as encoded.
> > This is the main issue, yeah.
> > 
> > For a standard media player, they would try to allocate through V4L2
> > and decode through that into locally-allocated buffers. All they know
> > is that there's a Wayland server at the other end of a socket
> > somewhere which will want to import the FD. The server does give you
> > some hints along the way: it will tell you that importing into a
> > particular GPU target device is necessary as the ultimate fallback,
> > and importing into a particular KMS device is preferable as the
> > optimal path to hit an overlay.
> > 
> > So let's say that the V4L2 client does what you're proposing: it
> > allocates a buffer chain, schedules a decode into that buffer, and
> > passes it along to the server to import. The server fails to import
> > the buffer into the GPU, and tells the client this. The client then
> > ... well, it doesn't know that it needs to allocate within the GPU
> > instead, but it knows that doing so might be one thing which would
> > make the request succeed.
> > 
> > But the client is just a video player. It doesn't understand how to
> > allocate BOs for Panfrost or AMD or etnaviv. So without a universal
> > allocator (again ...), 'just allocate on the GPU' isn't a useful
> > response to the client.
> 
> Well exactly that's the point I'm raising: The client *must* understand 
> that!
> 
> See we need to be able to handle all restrictions here, coherency of the 
> data is just one of them.
>

Re: Try to address the DMA-buf coherency problem

2022-10-28 Thread Nicolas Dufresne
Hi,

just dropping some real live use case, sorry I'm not really proposing solutions,
I believe you are much more knowledgeable in this regard.

Le vendredi 28 octobre 2022 à 16:26 +0200, Christian König a écrit :
> Am 28.10.22 um 13:42 schrieb Lucas Stach:
> > Am Freitag, dem 28.10.2022 um 10:40 +0200 schrieb Christian König:
> > > But essentially the right thing to do. The only alternative I can see is
> > > to reverse the role of exporter and importer.
> > > 
> > I don't think that would work generally either, as buffer exporter and
> > importer isn't always a 1:1 thing. As soon as any attached importer has
> > a different coherency behavior than the others, things fall apart.
> 
> I've just mentioned it because somebody noted that when you reverse the 
> roles of exporter and importer with the V4L driver and i915 then the use 
> case suddenly starts working.

Though, it's not generically possible to reverse these roles. If you want to do
so, you end up having to do what Android (gralloc) and ChromeOS (minigbm) do,
because you will have to allocate DRM buffers that know about importer-specific
requirements. See link [1] for what it looks like on RK3399, with the motion
vector size calculation copied from the kernel driver into a userspace lib
(arguably that was available from V4L2 sizeimage, but this is technically
difficult to communicate across the software layers). If you could let the
decoder export (with proper cache management), the non-generic code would not be
needed.

Another case where reversing the roles is difficult is where you need to
multiplex the streams (let's use a camera to illustrate) and share them with
multiple processes. In these use cases, the DRM importers are volatile; which
one do you abuse to allocate from ? In a multimedia server like PipeWire, you
are not really aware whether the camera will be used by DRM or not, and whether
something "special" is needed in terms of role inversion. It is relatively easy
to deal with matching modifiers, but using a downstream device (display/GPU) as
an exporter is always difficult (and requires some level of abuse and guessing).

[1]
https://android.googlesource.com/platform/external/minigbm/+/refs/heads/master/rockchip.c#140

> 
> > > > > For DRM and most V4L2 devices I then fill in the dma_coherent flag 
> > > > > based on the
> > > > > return value of dev_is_dma_coherent(). Exporting drivers are allowed 
> > > > > to clear
> > > > > the flag for their buffers if special handling like the USWC flag in 
> > > > > amdgpu or
> > > > > the uncached allocations for radeon/nouveau are in use.
> > > > > 
> > > > I don't think the V4L2 part works for most ARM systems. The default
> > > > there is for devices to be noncoherent unless explicitly marked
> > > > otherwise. I don't think any of the "devices" writing the video buffers
> > > > in cached memory with the CPU do this. While we could probably mark
> > > > them as coherent, I don't think this is moving in the right direction.
> > > Well why not? Those devices are coherent in the sense of the DMA API
> > > that they don't need an extra CPU copy on sync_to_cpu/sync_to_device.
> > > 
> > > We could come up with a better name for coherency, e.g. snooping for
> > > example. But that is just an documentation detail.
> > > 
> > I agree that those devices copying data into a CPU cacheable buffer
> > should be marked as coherent, just not sure right now if other things
> > like DMA mappings are done on that device, which would require the
> > cache maintenance.
> 
> Yeah, good point.
> 
> > > And this the exact wrong approach as far as I can see. As Daniel noted
> > > as well we absolutely need some kind of coherency between exporter and
> > > importer.
> > > 
> > I think it's important that we are very specific about the thing we are
> > talking about here: I guess when you say coherency you mean hardware
> > enforced coherency on cacheable memory, which is the default on
> > x86/PCI.
> 
> Well, no. What I mean with coherency is that the devices don't need 
> insert special operation to access each others data.
> 
> This can be archived by multiple approaches, e.g. by the PCI coherency 
> requirements, device internal connections (XGMI, NVLink, CXL etc...) as 
> well as using uncached system memory.
> 
> The key point is what we certainly don't want is special operations 
> which say: Ok, now device A can access the data, now device B. 
> because this breaks tons of use cases.

I'm coming back again to the multiplexing case. We keep having mixed use cases
with multiple receivers. In some cases, data may end up on the CPU while also
being encoded in HW. The current approach of disabling the cache does work, but
CPU algorithms truly suffer in performance. Doing a full memcpy to a cached
buffer helps, but remains slower than if the cache had been snooped by the
importer's (here, the encoder's) driver.

> 
> > The other way to enforce coherency is to either insert cache
> > maintenance operations, or make sure that the buffer is not cacheable

Re: [PATCH 2/2] [WIP]: media: Add Synaptics compressed tiled format

2022-08-24 Thread Nicolas Dufresne
Le vendredi 19 août 2022 à 23:44 +0800, Hsia-Jun Li a écrit :
> 
> On 8/19/22 23:28, Nicolas Dufresne wrote:
> > CAUTION: Email originated externally, do not click links or open 
> > attachments unless you recognize the sender and know the content is safe.
> > 
> > 
> > Le vendredi 19 août 2022 à 02:13 +0300, Laurent Pinchart a écrit :
> > > On Thu, Aug 18, 2022 at 02:33:42PM +0800, Hsia-Jun Li wrote:
> > > > On 8/18/22 14:06, Tomasz Figa wrote:
> > > > > On Tue, Aug 9, 2022 at 1:28 AM Hsia-Jun Li  
> > > > > wrote:
> > > > > > 
> > > > > > From: "Hsia-Jun(Randy) Li" 
> > > > > > 
> > > > > > The most of detail has been written in the drm.
> > > 
> > > This patch still needs a description of the format, which should go to
> > > Documentation/userspace-api/media/v4l/.
> > > 
> > > > > > Please notice that the tiled formats here request
> > > > > > one more plane for storing the motion vector metadata.
> > > > > > This buffer won't be compressed, so you can't append
> > > > > > it to luma or chroma plane.
> > > > > 
> > > > > Does the motion vector buffer need to be exposed to userspace? Is the
> > > > > decoder stateless (requires userspace to specify the reference frames)
> > > > > or stateful (manages the entire decoding process internally)?
> > > > 
> > > > No, users don't need to access them at all. Just they need a different
> > > > dma-heap.
> > > > 
> > > > You would only get the stateful version of both encoder and decoder.
> > > 
> > > Shouldn't the motion vectors be stored in a separate V4L2 buffer,
> > > submitted through a different queue then ?
> > 
> > Imho, I believe these should be invisible to users and pooled separately to
> > reduce the overhead. The number of reference is usually lower then the 
> > number of
> > allocated display buffers.
> > 
> You can't. The motion vector buffer can't share with the luma and chroma 
> data planes, nor the data plane for the compression meta data.
> 
> You could consider this as a security requirement(the memory region for 
> the MV could only be accessed by the decoder) or hardware limitation.
> 
> It is also not very easy to manage such a large buffer that would change 
> when the resolution changed.

Your arguments all point toward the fact that you should not let the user
allocate these in the first place. They should not be bound to the v4l2 buffer.
Allocate them in your driver, and leave the pixel buffer (and compression meta)
allocation work to your user.

Other drivers handle this just fine; if your v4l2 driver implements the v4l2
resolution change mechanism, it should be very simple to manage.

> > > 
> > > > > > Signed-off-by: Hsia-Jun(Randy) Li 
> > > > > > ---
> > > > > >drivers/media/v4l2-core/v4l2-common.c | 1 +
> > > > > >drivers/media/v4l2-core/v4l2-ioctl.c  | 2 ++
> > > > > >include/uapi/linux/videodev2.h| 2 ++
> > > > > >3 files changed, 5 insertions(+)
> > > > > > 
> > > > > > diff --git a/drivers/media/v4l2-core/v4l2-common.c 
> > > > > > b/drivers/media/v4l2-core/v4l2-common.c
> > > > > > index e0fbe6ba4b6c..f645278b3055 100644
> > > > > > --- a/drivers/media/v4l2-core/v4l2-common.c
> > > > > > +++ b/drivers/media/v4l2-core/v4l2-common.c
> > > > > > @@ -314,6 +314,7 @@ const struct v4l2_format_info 
> > > > > > *v4l2_format_info(u32 format)
> > > > > >   { .format = V4L2_PIX_FMT_SGBRG12,   
> > > > > > .pixel_enc = V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 
> > > > > > 1, .bpp = { 2, 0, 0, 0 }, .hdiv = 1, .vdiv = 1 },
> > > > > >   { .format = V4L2_PIX_FMT_SGRBG12,   
> > > > > > .pixel_enc = V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 
> > > > > > 1, .bpp = { 2, 0, 0, 0 }, .hdiv = 1, .vdiv = 1 },
> > > > > >   { .format = V4L2_PIX_FMT_SRGGB12,   
> > > > > > .pixel_enc = V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 
> > > > > > 1, .bpp = { 2, 0, 0, 0 }, .hdiv = 1, .vdiv = 1 },
> > > > > > +   { .format = V4L2_PIX_FMT_NV12M_V4H1C, .pixel_enc = 
> > > > > > V4L2_P

Re: [PATCH 2/2] [WIP]: media: Add Synaptics compressed tiled format

2022-08-24 Thread Nicolas Dufresne
Le jeudi 18 août 2022 à 14:33 +0800, Hsia-Jun Li a écrit :
> 
> On 8/18/22 14:06, Tomasz Figa wrote:
> > 
> > Hi Randy,
> > 
> > On Tue, Aug 9, 2022 at 1:28 AM Hsia-Jun Li  wrote:
> > > 
> > > From: "Hsia-Jun(Randy) Li" 
> > > 
> > > The most of detail has been written in the drm.
> > > Please notice that the tiled formats here request
> > > one more plane for storing the motion vector metadata.
> > > This buffer won't be compressed, so you can't append
> > > it to luma or chroma plane.
> > 
> > Does the motion vector buffer need to be exposed to userspace? Is the
> > decoder stateless (requires userspace to specify the reference frames)
> > or stateful (manages the entire decoding process internally)?
> > 
> No, users don't need to access them at all. Just they need a different 
> dma-heap.
> 
> You would only get the stateful version of both encoder and decoder.

Can't you just allocate and manage these internally in the kernel driver without
adding kernel APIs ? This is notably what the Mediatek and (downstream) RPi HEVC
drivers do, as it allows reducing the memory usage quite a lot. In Hantro, we
bind them due to a HW limitation.

Nicolas

> > Best regards,
> > Tomasz
> > 
> > > 
> > > Signed-off-by: Hsia-Jun(Randy) Li 
> > > ---
> > >   drivers/media/v4l2-core/v4l2-common.c | 1 +
> > >   drivers/media/v4l2-core/v4l2-ioctl.c  | 2 ++
> > >   include/uapi/linux/videodev2.h| 2 ++
> > >   3 files changed, 5 insertions(+)
> > > 
> > > diff --git a/drivers/media/v4l2-core/v4l2-common.c 
> > > b/drivers/media/v4l2-core/v4l2-common.c
> > > index e0fbe6ba4b6c..f645278b3055 100644
> > > --- a/drivers/media/v4l2-core/v4l2-common.c
> > > +++ b/drivers/media/v4l2-core/v4l2-common.c
> > > @@ -314,6 +314,7 @@ const struct v4l2_format_info *v4l2_format_info(u32 
> > > format)
> > >  { .format = V4L2_PIX_FMT_SGBRG12,   .pixel_enc = 
> > > V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 
> > > 0, 0 }, .hdiv = 1, .vdiv = 1 },
> > >  { .format = V4L2_PIX_FMT_SGRBG12,   .pixel_enc = 
> > > V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 
> > > 0, 0 }, .hdiv = 1, .vdiv = 1 },
> > >  { .format = V4L2_PIX_FMT_SRGGB12,   .pixel_enc = 
> > > V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 
> > > 0, 0 }, .hdiv = 1, .vdiv = 1 },
> > > +   { .format = V4L2_PIX_FMT_NV12M_V4H1C, .pixel_enc = 
> > > V4L2_PIXEL_ENC_YUV, .mem_planes = 5, .comp_planes = 2, .bpp = { 1, 2, 0, 
> > > 0 }, .hdiv = 2, .vdiv = 2, .block_w = { 128, 128 }, .block_h = { 128, 128 
> > > } },
> > >  };
> > >  unsigned int i;
> > > 
> > > diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c 
> > > b/drivers/media/v4l2-core/v4l2-ioctl.c
> > > index e6fd355a2e92..8f65964aff08 100644
> > > --- a/drivers/media/v4l2-core/v4l2-ioctl.c
> > > +++ b/drivers/media/v4l2-core/v4l2-ioctl.c
> > > @@ -1497,6 +1497,8 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc 
> > > *fmt)
> > >  case V4L2_PIX_FMT_MT21C:descr = "Mediatek 
> > > Compressed Format"; break;
> > >  case V4L2_PIX_FMT_QC08C:descr = "QCOM Compressed 
> > > 8-bit Format"; break;
> > >  case V4L2_PIX_FMT_QC10C:descr = "QCOM Compressed 
> > > 10-bit Format"; break;
> > > +   case V4L2_PIX_FMT_NV12M_V4H1C:  descr = "Synaptics 
> > > Compressed 8-bit tiled Format";break;
> > > +   case V4L2_PIX_FMT_NV12M_10_V4H3P8C: descr = 
> > > "Synaptics Compressed 10-bit tiled Format";break;
> > >  default:
> > >  if (fmt->description[0])
> > >  return;
> > > diff --git a/include/uapi/linux/videodev2.h 
> > > b/include/uapi/linux/videodev2.h
> > > index 01e630f2ec78..7e928cb69e7c 100644
> > > --- a/include/uapi/linux/videodev2.h
> > > +++ b/include/uapi/linux/videodev2.h
> > > @@ -661,6 +661,8 @@ struct v4l2_pix_format {
> > >   #define V4L2_PIX_FMT_NV12MT_16X16 v4l2_fourcc('V', 'M', '1', '2') /* 12 
> > >  Y/CbCr 4:2:0 16x16 tiles */
> > >   #define V4L2_PIX_FMT_NV12M_8L128  v4l2_fourcc('N', 'A', '1', '2') 
> > > /* Y/CbCr 4:2:0 8x128 tiles */
> > >   #define V4L2_PIX_FMT_NV12M_10BE_8L128 v4l2_fourcc_be('N', 'T', '1', 
> > > '2') /* Y/CbCr 4:2:0 10-bit 8x128 tiles */
> > > +#define V4L2_PIX_FMT_NV12M_V4H1C v4l2_fourcc('S', 'Y', '1', '2')   /* 12 
> > >  Y/CbCr 4:2:0 tiles */
> > > +#define V4L2_PIX_FMT_NV12M_10_V4H3P8C v4l2_fourcc('S', 'Y', '1', '0')   
> > > /* 12  Y/CbCr 4:2:0 10-bits tiles */
> > > 
> > >   /* Bayer formats - see 
> > > 

Re: [PATCH 2/2] [WIP]: media: Add Synaptics compressed tiled format

2022-08-23 Thread Nicolas Dufresne
Le mardi 23 août 2022 à 15:03 +0800, Hsia-Jun Li a écrit :
> 
> On 8/23/22 14:05, Tomasz Figa wrote:
> > 
> > On Sat, Aug 20, 2022 at 12:44 AM Hsia-Jun Li  wrote:
> > > 
> > > 
> > > 
> > > On 8/19/22 23:28, Nicolas Dufresne wrote:
> > > > 
> > > > Le vendredi 19 août 2022 à 02:13 +0300, Laurent Pinchart a écrit :
> > > > > On Thu, Aug 18, 2022 at 02:33:42PM +0800, Hsia-Jun Li wrote:
> > > > > > On 8/18/22 14:06, Tomasz Figa wrote:
> > > > > > > On Tue, Aug 9, 2022 at 1:28 AM Hsia-Jun Li 
> > > > > > >  wrote:
> > > > > > > > 
> > > > > > > > From: "Hsia-Jun(Randy) Li" 
> > > > > > > > 
> > > > > > > > The most of detail has been written in the drm.
> > > > > 
> > > > > This patch still needs a description of the format, which should go to
> > > > > Documentation/userspace-api/media/v4l/.
> > > > > 
> > > > > > > > Please notice that the tiled formats here request
> > > > > > > > one more plane for storing the motion vector metadata.
> > > > > > > > This buffer won't be compressed, so you can't append
> > > > > > > > it to luma or chroma plane.
> > > > > > > 
> > > > > > > Does the motion vector buffer need to be exposed to userspace? Is 
> > > > > > > the
> > > > > > > decoder stateless (requires userspace to specify the reference 
> > > > > > > frames)
> > > > > > > or stateful (manages the entire decoding process internally)?
> > > > > > 
> > > > > > No, users don't need to access them at all. Just they need a 
> > > > > > different
> > > > > > dma-heap.
> > > > > > 
> > > > > > You would only get the stateful version of both encoder and decoder.
> > > > > 
> > > > > Shouldn't the motion vectors be stored in a separate V4L2 buffer,
> > > > > submitted through a different queue then ?
> > > > 
> > > > Imho, I believe these should be invisible to users and pooled 
> > > > separately to
> > > > reduce the overhead. The number of reference is usually lower then the 
> > > > number of
> > > > allocated display buffers.
> > > > 
> > > You can't. The motion vector buffer can't share with the luma and chroma
> > > data planes, nor the data plane for the compression meta data.
> > 
> > I believe what Nicolas is suggesting is to just keep the MV buffer
> > handling completely separate from video buffers. Just keep a map
> > between frame buffer and MV buffer in the driver and use the right
> > buffer when triggering a decode.
> > 
> > > 
> > > You could consider this as a security requirement(the memory region for
> > > the MV could only be accessed by the decoder) or hardware limitation.
> > > 
> > > It is also not very easy to manage such a large buffer that would change
> > > when the resolution changed.
> > 
> > How does it differ from managing additional planes of video buffers?
> I should say I am not against his suggestion if I could make a DMA-heap 
> v4l2 allocator merge into kernel in the future. Although I think we need 
> two heaps here one for the normal video and one for the secure video, I 
> don't have much idea on how to determine whether we are decoding a 
> secure or non-secure video here (The design here is that the kernel 
> didn't know, only hardware and TEE care about that).

It's always nice when "the design" gets discussed upstream, so we can raise any
known issues and improve it. Here, not knowing in the kernel driver whether we
are handling secure or non-secure memory would indeed require external
allocation for everything, and V4L2 does not currently work like this. There are
a few use cases (not all of them may apply to your driver, but they exist).

1. Secondary buffers

When a CODEC is combined with a post-processor, the driver is then responsible
for reference frame allocation. In both known s

Re: [PATCH 2/2] [WIP]: media: Add Synaptics compressed tiled format

2022-08-23 Thread Nicolas Dufresne
Le mardi 23 août 2022 à 15:40 +0800, Hsia-Jun Li a écrit :
> > In current state, If your driver can support it, userland does not strictly
> > need
> > to re-allocate if the resolution is changed to smaller. In most SVC
> > scenarios,
> > the largest resolution is known in advance, so pre-allocation can happen to
> > the
> When you play a video from Youtube, you may notice that starting 
> resolution is low, then after it received more data knowning the 
> bandwidth is enough, it would switch to a higher resolution. I don't 
> think it would inform the codecs2 or OMX there is a higher target 
> resolution.
> 
> Besides, for the case of SVC in a conference system, the remote(gatway) 
> would not tell you there is a higer resolution or frame rate because you 
> can't receive it in negotiate stage, it could be permanently(device 
> capability) or just bandwidth problem. Whether we know there is a higher 
> requirement video depends on the transport protocols used here.
> 
> The basic idea of SVC is that the low layer didn't depends on the upper 
> layer, we can't tell how the bitstream usually.

I'm not arguing against the fact that, for drivers without an IOMMU (hitting
directly into the CMA allocator), allocation latency is a massive challenge, and
that a mechanism to smoothly reallocate (rather than mass-reallocate) is needed
in the long run. This is what I'm referring to when saying that folks have
considered extending CREATE_BUFS() with a DELETE_BUFS() ioctl.

Note that there are tons of software tricks you can use to mitigate this. The
simplest one is to use CREATE_BUFS() instead of REQBUFS(). Instead of allocating
all the buffers you need in one go, you allocate them one by one. This
distributes the allocation latency. For stateful CODECs, most OMX-focused
firmware needs to be modified for that, since it sticks to the old OMX spec,
which did not allow run-time allocation.

Another trick is to use a second codec session. Both stateful and stateless
CODECs support concurrent decoding. The MSE requirement is that stream
transitions happen only on keyframe boundaries. Meaning there is no need to
reuse the same session: you can create a new decoder in parallel, and do so
before the drain is complete (after the event, before the last buffer). This
compresses the "setup" latency, at the cost of some extra memory usage. In the
MSE case especially, this is nearly always possible since browsers require
support for more than one concurrent decode. This method also works with
OMX-style CODECs without any modification.

regards,
Nicolas





Re: [PATCH 2/2] [WIP]: media: Add Synaptics compressed tiled format

2022-08-22 Thread Nicolas Dufresne
Le samedi 20 août 2022 à 08:10 +0800, Hsia-Jun Li a écrit :
> 
> On 8/20/22 03:17, Nicolas Dufresne wrote:
> > 
> > Le vendredi 19 août 2022 à 23:44 +0800, Hsia-Jun Li a écrit :
> > > 
> > > On 8/19/22 23:28, Nicolas Dufresne wrote:
> > > > 
> > > > Le vendredi 19 août 2022 à 02:13 +0300, Laurent Pinchart a écrit :
> > > > > On Thu, Aug 18, 2022 at 02:33:42PM +0800, Hsia-Jun Li wrote:
> > > > > > On 8/18/22 14:06, Tomasz Figa wrote:
> > > > > > > On Tue, Aug 9, 2022 at 1:28 AM Hsia-Jun Li 
> > > > > > >  wrote:
> > > > > > > > 
> > > > > > > > From: "Hsia-Jun(Randy) Li" 
> > > > > > > > 
> > > > > > > > The most of detail has been written in the drm.
> > > > > 
> > > > > This patch still needs a description of the format, which should go to
> > > > > Documentation/userspace-api/media/v4l/.
> > > > > 
> > > > > > > > Please notice that the tiled formats here request
> > > > > > > > one more plane for storing the motion vector metadata.
> > > > > > > > This buffer won't be compressed, so you can't append
> > > > > > > > it to luma or chroma plane.
> > > > > > > 
> > > > > > > Does the motion vector buffer need to be exposed to userspace? Is 
> > > > > > > the
> > > > > > > decoder stateless (requires userspace to specify the reference 
> > > > > > > frames)
> > > > > > > or stateful (manages the entire decoding process internally)?
> > > > > > 
> > > > > > No, users don't need to access them at all. Just they need a 
> > > > > > different
> > > > > > dma-heap.
> > > > > > 
> > > > > > You would only get the stateful version of both encoder and decoder.
> > > > > 
> > > > > Shouldn't the motion vectors be stored in a separate V4L2 buffer,
> > > > > submitted through a different queue then ?
> > > > 
> > > > Imho, I believe these should be invisible to users and pooled 
> > > > separately to
> > > > reduce the overhead. The number of reference is usually lower then the 
> > > > number of
> > > > allocated display buffers.
> > > > 
> > > You can't. The motion vector buffer can't share with the luma and chroma
> > > data planes, nor the data plane for the compression meta data.
> > > 
> > > You could consider this as a security requirement(the memory region for
> > > the MV could only be accessed by the decoder) or hardware limitation.
> > > 
> > > It is also not very easy to manage such a large buffer that would change
> > > when the resolution changed.
> > 
> > Your argument are just aiming toward the fact that you should not let the 
> > user
> > allocate these in the first place. They should not be bound to the v4l2 
> > buffer.
> > Allocate these in your driver, and leave to your user the pixel buffer (and
> > compress meta) allocation work.
> > 
> What I want to say is that userspace could allocate buffers then make 
> the v4l2 decoder import these buffers, but each plane should come from 
> the right DMA-heap. Usually userspace knows the memory occupation 
> better, so it would bring some flexibility here.
> 
> Currently, there is another thing that bothers me: I need to allocate a 
> small piece of memory (less than 128 KiB) for the compression metadata 
> buffers I mentioned here. These pieces of memory should be located in a 
> small region, or the performance could be badly hurt; besides, we don't 
> support IOMMU for this kind of data.
> 
> Any idea about assigning a small piece of memory from a pre-allocated 
> memory region or a selected region (I don't think I could reserve them 
> in a DMA-heap) for a plane in an MMAP-type buffer?

A V4L2 driver should first implement the V4L2 semantics before adding optional
use cases like buffer importing. For this reason, your V4L2 driver should know
all the memory requirements

Re: [PATCH 2/2] [WIP]: media: Add Synaptics compressed tiled format

2022-08-19 Thread Nicolas Dufresne
Le vendredi 19 août 2022 à 02:13 +0300, Laurent Pinchart a écrit :
> On Thu, Aug 18, 2022 at 02:33:42PM +0800, Hsia-Jun Li wrote:
> > On 8/18/22 14:06, Tomasz Figa wrote:
> > > On Tue, Aug 9, 2022 at 1:28 AM Hsia-Jun Li  wrote:
> > > > 
> > > > From: "Hsia-Jun(Randy) Li" 
> > > > 
> > > > The most of detail has been written in the drm.
> 
> This patch still needs a description of the format, which should go to
> Documentation/userspace-api/media/v4l/.
> 
> > > > Please notice that the tiled formats here request
> > > > one more plane for storing the motion vector metadata.
> > > > This buffer won't be compressed, so you can't append
> > > > it to luma or chroma plane.
> > > 
> > > Does the motion vector buffer need to be exposed to userspace? Is the
> > > decoder stateless (requires userspace to specify the reference frames)
> > > or stateful (manages the entire decoding process internally)?
> > 
> > No, users don't need to access them at all. Just they need a different 
> > dma-heap.
> > 
> > You would only get the stateful version of both encoder and decoder.
> 
> Shouldn't the motion vectors be stored in a separate V4L2 buffer,
> submitted through a different queue then ?

Imho, I believe these should be invisible to users and pooled separately to
reduce the overhead. The number of references is usually lower than the number
of allocated display buffers.

> 
> > > > Signed-off-by: Hsia-Jun(Randy) Li 
> > > > ---
> > > >   drivers/media/v4l2-core/v4l2-common.c | 1 +
> > > >   drivers/media/v4l2-core/v4l2-ioctl.c  | 2 ++
> > > >   include/uapi/linux/videodev2.h| 2 ++
> > > >   3 files changed, 5 insertions(+)
> > > > 
> > > > diff --git a/drivers/media/v4l2-core/v4l2-common.c 
> > > > b/drivers/media/v4l2-core/v4l2-common.c
> > > > index e0fbe6ba4b6c..f645278b3055 100644
> > > > --- a/drivers/media/v4l2-core/v4l2-common.c
> > > > +++ b/drivers/media/v4l2-core/v4l2-common.c
> > > > @@ -314,6 +314,7 @@ const struct v4l2_format_info *v4l2_format_info(u32 
> > > > format)
> > > >  { .format = V4L2_PIX_FMT_SGBRG12,   .pixel_enc = 
> > > > V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 
> > > > 0, 0 }, .hdiv = 1, .vdiv = 1 },
> > > >  { .format = V4L2_PIX_FMT_SGRBG12,   .pixel_enc = 
> > > > V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 
> > > > 0, 0 }, .hdiv = 1, .vdiv = 1 },
> > > >  { .format = V4L2_PIX_FMT_SRGGB12,   .pixel_enc = 
> > > > V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 
> > > > 0, 0 }, .hdiv = 1, .vdiv = 1 },
> > > > +   { .format = V4L2_PIX_FMT_NV12M_V4H1C, .pixel_enc = 
> > > > V4L2_PIXEL_ENC_YUV, .mem_planes = 5, .comp_planes = 2, .bpp = { 1, 2, 
> > > > 0, 0 }, .hdiv = 2, .vdiv = 2, .block_w = { 128, 128 }, .block_h = { 
> > > > 128, 128 } },
> > > >  };
> > > >  unsigned int i;
> > > > 
> > > > diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c 
> > > > b/drivers/media/v4l2-core/v4l2-ioctl.c
> > > > index e6fd355a2e92..8f65964aff08 100644
> > > > --- a/drivers/media/v4l2-core/v4l2-ioctl.c
> > > > +++ b/drivers/media/v4l2-core/v4l2-ioctl.c
> > > > @@ -1497,6 +1497,8 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc 
> > > > *fmt)
> > > >  case V4L2_PIX_FMT_MT21C:descr = "Mediatek 
> > > > Compressed Format"; break;
> > > >  case V4L2_PIX_FMT_QC08C:descr = "QCOM 
> > > > Compressed 8-bit Format"; break;
> > > >  case V4L2_PIX_FMT_QC10C:descr = "QCOM 
> > > > Compressed 10-bit Format"; break;
> > > > +   case V4L2_PIX_FMT_NV12M_V4H1C:  descr = "Synaptics 
> > > > Compressed 8-bit tiled Format";break;
> > > > +   case V4L2_PIX_FMT_NV12M_10_V4H3P8C: descr = 
> > > > "Synaptics Compressed 10-bit tiled Format";break;
> > > >  default:
> > > >  if (fmt->description[0])
> > > >  return;
> > > > diff --git a/include/uapi/linux/videodev2.h 
> > > > b/include/uapi/linux/videodev2.h
> > > > index 01e630f2ec78..7e928cb69e7c 100644
> > > > --- a/include/uapi/linux/videodev2.h
> > > > +++ b/include/uapi/linux/videodev2.h
> > > > @@ -661,6 +661,8 @@ struct v4l2_pix_format {
> > > >   #define V4L2_PIX_FMT_NV12MT_16X16 v4l2_fourcc('V', 'M', '1', '2') /* 
> > > > 12  Y/CbCr 4:2:0 16x16 tiles */
> > > >   #define V4L2_PIX_FMT_NV12M_8L128  v4l2_fourcc('N', 'A', '1', '2') 
> > > > /* Y/CbCr 4:2:0 8x128 tiles */
> > > >   #define V4L2_PIX_FMT_NV12M_10BE_8L128 v4l2_fourcc_be('N', 'T', '1', 
> > > > '2') /* Y/CbCr 4:2:0 10-bit 8x128 tiles */
> > > > +#define V4L2_PIX_FMT_NV12M_V4H1C v4l2_fourcc('S', 'Y', '1', '2')   /* 
> > > > 12  Y/CbCr 4:2:0 tiles */
> > > > +#define V4L2_PIX_FMT_NV12M_10_V4H3P8C v4l2_fourcc('S', 'Y', '1', '0')  
> > > >  /* 12  Y/CbCr 4:2:0 10-bits tiles */
> > > > 
> > > >   /* Bayer formats - see http://www.siliconimaging.com/RGB%20Bayer.htm 
> > 

Re: [EXT] Re: [PATCH 1/3] dma-buf: heaps: add Linaro secure dmabuf heap support

2022-08-19 Thread Nicolas Dufresne
dedicated security rules for DRM.

What you wrote here is about as much as I have heard about the new security
model coming in newer chips (this is not NXP specific). I think in order to push
forward designs and APIs, it would be logical to first present these
mechanisms, how they work and how they affect drivers and user space. It's not
clear how this mechanism enforces the use of memory that cannot be mapped by the
kernel MMU. Providing an open source kernel and userland to demonstrate and use
this feature is also very helpful for reviewers and adopters, and is also a
requirement in the drm tree.

regards,
Nicolas

>   
>   I'm on vacation until end of this week. I can setup a call next week to 
> discuss this topic if more clarifications are needed.
> 
> Regards.
> 
> -Original Message-
> From: Olivier Masse  
> Sent: Wednesday, August 17, 2022 4:52 PM
> To: nico...@ndufresne.ca; Cyrille Fleury ; 
> brian.star...@arm.com
> Cc: sumit.sem...@linaro.org; linux-ker...@vger.kernel.org; 
> linaro-mm-...@lists.linaro.org; christian.koe...@amd.com; 
> linux-me...@vger.kernel.org; n...@arm.com; Clément Faure 
> ; dri-devel@lists.freedesktop.org; 
> benjamin.gaign...@collabora.com
> Subject: Re: [EXT] Re: [PATCH 1/3] dma-buf: heaps: add Linaro secure dmabuf 
> heap support
> 
> +Cyrille
> 
> Hi Nicolas,
> 
> On mer., 2022-08-17 at 10:29 -0400, Nicolas Dufresne wrote:
> > Caution: EXT Email
> > 
> > Hi Folks,
> > 
> > Le mardi 16 août 2022 à 11:20 +, Olivier Masse a écrit :
> > > Hi Brian,
> > > 
> > > 
> > > On ven., 2022-08-12 at 17:39 +0100, Brian Starkey wrote:
> > > > Caution: EXT Ema
> > > > 
> > 
> > [...]
> > 
> > > > 
> > > > Interesting, that's not how the devices I've worked on operated.
> > > > 
> > > > Are you saying that you have to have a display controller driver 
> > > > running in the TEE to display one of these buffers?
> > > 
> > > In fact the display controller is managing 3 plans : UI, PiP and 
> > > video. The video plan is protected in secure as you can see on slide
> > > 11:
> > > 
> https://static.linaro.org/connect/san19/presentations/san19-107.pdf
> > 
> > 
> > 
> > just wanted to highlight that all the WPE/GStreamer bit in this 
> > presentation is based on NXP Vendor Media CODEC design, which rely on 
> > their own i.MX VPU API. I don't see any effort to extend this to a 
> > wider audience. It is not explaining how this can work with a mainline 
> > kernel with v4l2 stateful or stateless drivers and generic 
> > GStreamer/FFMPEG/Chromium support.
> 
> Maybe Cyrille can explain what it is currently done at NXP level regarding 
> the integration of v4l2 with NXP VPU.
> 
> > 
> > I'm raising this, since I'm worried that no one cares of solving that 
> > high level problem from a generic point of view. In that context, any 
> > additions to the mainline Linux kernel can only be flawed and will 
> > only serves specific vendors and not the larger audience.
> > 
> > Another aspect, is that this design might be bound to a specific (NXP
> > ?)
> > security design. I've learn recently that newer HW is going to use 
> > multiple level of MMU (like virtual machines do) to protect the memory 
> > rather then marking pages. Will all this work for that too ?
> 
> Our fire-walling hardware protects memory behind the MMU and so relies on 
> the physical memory layout; this work relies only on a reserved physical 
> memory region.
> 
> Regards,
> Olivier
> 
> > 
> > regards,
> > Nicolas



Re: [EXT] Re: [PATCH 1/3] dma-buf: heaps: add Linaro secure dmabuf heap support

2022-08-17 Thread Nicolas Dufresne
Hi Folks,

Le mardi 16 août 2022 à 11:20 +, Olivier Masse a écrit :
> Hi Brian,
> 
> 
> On ven., 2022-08-12 at 17:39 +0100, Brian Starkey wrote:
> > Caution: EXT Ema
> > 

[...]

> > 
> > Interesting, that's not how the devices I've worked on operated.
> > 
> > Are you saying that you have to have a display controller driver
> > running in the TEE to display one of these buffers?
> 
> In fact the display controller is managing 3 plans : UI, PiP and
> video. The video plan is protected in secure as you can see on slide
> 11:
> https://static.linaro.org/connect/san19/presentations/san19-107.pdf



just wanted to highlight that all the WPE/GStreamer bits in this presentation are
based on the NXP vendor media CODEC design, which relies on their own i.MX VPU API. I
don't see any effort to extend this to a wider audience. It is not explaining
how this can work with a mainline kernel with v4l2 stateful or stateless drivers
and generic GStreamer/FFmpeg/Chromium support.

I'm raising this since I'm worried that no one cares about solving that high-level
problem from a generic point of view. In that context, any additions to the
mainline Linux kernel can only be flawed and will only serve specific vendors,
not the larger audience.

Another aspect is that this design might be bound to a specific (NXP ?)
security design. I've learned recently that newer HW is going to use multiple
levels of MMU (like virtual machines do) to protect the memory rather than
marking pages. Will all this work for that too ?

regards,
Nicolas


Re: [PATCH 3/5] dma-buf: heaps: add Linaro secure dmabuf heap support

2022-08-16 Thread Nicolas Dufresne
Hi,

Le mardi 02 août 2022 à 11:58 +0200, Olivier Masse a écrit :
> add Linaro secure heap bindings: linaro,secure-heap

Just a curiosity, how is this specific to the Linaro OPTEE OS ? Shouldn't it be
"de-linaro-ified" somehow ?

regards,
Nicolas

> Use genalloc to allocate/free buffers from the buffer pool.
> The buffer pool info comes from the DT.
> Use an sg_table to store the allocated memory info; the length of the sg_table is 1.
> Implement secure_heap_buf_ops to support buffer sharing between different devices:
> 1. Userspace passes this fd to all drivers it wants this buffer
> to share with: first the file descriptor is converted to a dma_buf using
> dma_buf_get(), then the buffer is attached to the device using 
> dma_buf_attach().
> 2. Once the buffer is attached to all devices, userspace can initiate DMA
> access to the shared buffer. In the kernel this is done by calling 
> dma_buf_map_attachment().
> 3. Get the sg_table with dma_buf_map_attachment() in the different devices.
> 
> Signed-off-by: Olivier Masse 
> ---
>  drivers/dma-buf/heaps/Kconfig   |  21 +-
>  drivers/dma-buf/heaps/Makefile  |   1 +
>  drivers/dma-buf/heaps/secure_heap.c | 588 
>  3 files changed, 606 insertions(+), 4 deletions(-)
>  create mode 100644 drivers/dma-buf/heaps/secure_heap.c
> 
> diff --git a/drivers/dma-buf/heaps/Kconfig b/drivers/dma-buf/heaps/Kconfig
> index 6a33193a7b3e..b2406932192e 100644
> --- a/drivers/dma-buf/heaps/Kconfig
> +++ b/drivers/dma-buf/heaps/Kconfig
> @@ -1,8 +1,12 @@
> -config DMABUF_HEAPS_DEFERRED_FREE
> - tristate
> +menuconfig DMABUF_HEAPS_DEFERRED_FREE
> + bool "DMA-BUF heaps deferred-free library"
> + help
> +   Choose this option to enable the DMA-BUF heaps deferred-free library.
>  
> -config DMABUF_HEAPS_PAGE_POOL
> - tristate
> +menuconfig DMABUF_HEAPS_PAGE_POOL
> + bool "DMA-BUF heaps page-pool library"
> + help
> +   Choose this option to enable the DMA-BUF heaps page-pool library.
>  
>  config DMABUF_HEAPS_SYSTEM
>   bool "DMA-BUF System Heap"
> @@ -26,3 +30,12 @@ config DMABUF_HEAPS_DSP
>Choose this option to enable the dsp dmabuf heap. The dsp heap
>is allocated by gen allocater. it's allocated according the dts.
>If in doubt, say Y.
> +
> +config DMABUF_HEAPS_SECURE
> + tristate "DMA-BUF Secure Heap"
> + depends on DMABUF_HEAPS && DMABUF_HEAPS_DEFERRED_FREE
> + help
> +   Choose this option to enable the secure dmabuf heap. The secure heap
> +   pools are defined according to the DT. Heaps are allocated
> +   in the pools using gen allocater.
> +   If in doubt, say Y.
> diff --git a/drivers/dma-buf/heaps/Makefile b/drivers/dma-buf/heaps/Makefile
> index e70722ea615e..08f6aa5919d1 100644
> --- a/drivers/dma-buf/heaps/Makefile
> +++ b/drivers/dma-buf/heaps/Makefile
> @@ -4,3 +4,4 @@ obj-$(CONFIG_DMABUF_HEAPS_PAGE_POOL)  += page_pool.o
>  obj-$(CONFIG_DMABUF_HEAPS_SYSTEM)+= system_heap.o
>  obj-$(CONFIG_DMABUF_HEAPS_CMA)   += cma_heap.o
>  obj-$(CONFIG_DMABUF_HEAPS_DSP)  += dsp_heap.o
> +obj-$(CONFIG_DMABUF_HEAPS_SECURE)+= secure_heap.o
> diff --git a/drivers/dma-buf/heaps/secure_heap.c 
> b/drivers/dma-buf/heaps/secure_heap.c
> new file mode 100644
> index ..31aac5d050b4
> --- /dev/null
> +++ b/drivers/dma-buf/heaps/secure_heap.c
> @@ -0,0 +1,588 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * DMABUF secure heap exporter
> + *
> + * Copyright 2021 NXP.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "deferred-free-helper.h"
> +#include "page_pool.h"
> +
> +#define MAX_SECURE_HEAP 2
> +#define MAX_HEAP_NAME_LEN 32
> +
> +struct secure_heap_buffer {
> + struct dma_heap *heap;
> + struct list_head attachments;
> + struct mutex lock;
> + unsigned long len;
> + struct sg_table sg_table;
> + int vmap_cnt;
> + struct deferred_freelist_item deferred_free;
> + void *vaddr;
> + bool uncached;
> +};
> +
> +struct dma_heap_attachment {
> + struct device *dev;
> + struct sg_table *table;
> + struct list_head list;
> + bool no_map;
> + bool mapped;
> + bool uncached;
> +};
> +
> +struct secure_heap_info {
> + struct gen_pool *pool;
> +
> + bool no_map;
> +};
> +
> +struct rmem_secure {
> + phys_addr_t base;
> + phys_addr_t size;
> +
> + char name[MAX_HEAP_NAME_LEN];
> +
> + bool no_map;
> +};
> +
> +static struct rmem_secure secure_data[MAX_SECURE_HEAP] = {0};
> +static unsigned int secure_data_count;
> +
> +static struct sg_table *dup_sg_table(struct sg_table *table)
> +{
> + struct sg_table *new_table;
> + int ret, i;
> + struct scatterlist *sg, *new_sg;
> +
> + new_table = kzalloc(sizeof(*new_table), GFP_KERNEL);
> + if (!new_table)
> + return ERR_PTR(-ENOMEM);
> +
> + ret = 

Re: DMA-buf and uncached system memory

2022-06-27 Thread Nicolas Dufresne
Le lundi 27 juin 2022 à 16:06 +0200, Lucas Stach a écrit :
> Am Montag, dem 27.06.2022 um 09:54 -0400 schrieb Nicolas Dufresne:
> > Le jeudi 23 juin 2022 à 11:33 +0200, Lucas Stach a écrit :
> > > > 
> > > > See for example on AMD/Intel hardware most of the engines can perfectly 
> > > > deal with cache coherent memory accesses. Only the display engines 
> > > > can't.
> > > > 
> > > > So on import time we can't even say if the access can be coherent and 
> > > > snoop the CPU cache or not because we don't know how the imported 
> > > > DMA-buf will be used later on.
> > > > 
> > > So for those mixed use cases, wouldn't it help to have something
> > > similar to the dma_sync in the DMA-buf API, so your scanout usage can
> > > tell the exporter that it's going to do non-snoop access and any dirty
> > > cache lines must be cleaned? Signaling this to the exporter would allow
> > > to skip the cache maintenance if the buffer is in CPU uncached memory,
> > > which again is a default case for the ARM SoC world.
> > 
> > Telling the exporter for every scan is unneeded overhead. If that 
> > information is
> > made available "properly", then tracking it in attach/detach is sufficient 
> > and
> > lightweight.
> 
> That isn't sufficient. The AMD GPU is a single device, but internally
> has different engines that have different capabilities with regard to
> snooping the caches. So you will likely end up with needing the cache
> clean if the V4L2 buffer is going directly to scanout, which doesn't
> snoop, but if the usage changes to sampling you don't need any cache
> flushes.
> 
> Also I don't see a big overhead when comparing a kernel internal call
> that tells the exporter that the importer is going to access the buffer
> without snooping and thus needs the cache clean once every frame and
> the need to always clean the cache before DQBUF when a potentially non-
> snooping importer is attached.

Ack, thanks for the information.

> 
> Regards,
> Lucas
> 



Re: DMA-buf and uncached system memory

2022-06-27 Thread Nicolas Dufresne
Le jeudi 23 juin 2022 à 11:33 +0200, Lucas Stach a écrit :
> > 
> > See for example on AMD/Intel hardware most of the engines can perfectly 
> > deal with cache coherent memory accesses. Only the display engines can't.
> > 
> > So on import time we can't even say if the access can be coherent and 
> > snoop the CPU cache or not because we don't know how the imported 
> > DMA-buf will be used later on.
> > 
> So for those mixed use cases, wouldn't it help to have something
> similar to the dma_sync in the DMA-buf API, so your scanout usage can
> tell the exporter that it's going to do non-snoop access and any dirty
> cache lines must be cleaned? Signaling this to the exporter would allow
> to skip the cache maintenance if the buffer is in CPU uncached memory,
> which again is a default case for the ARM SoC world.

Telling the exporter for every scan is unneeded overhead. If that information is
made available "properly", then tracking it in attach/detach is sufficient and
lightweight.

Nicolas



Re: DMA-buf and uncached system memory

2022-06-27 Thread Nicolas Dufresne
Hi,

Le jeudi 23 juin 2022 à 10:58 +0200, Lucas Stach a écrit :
> > > In the DMA API keeping things mapped is also a valid use-case, but then
> > > you need to do explicit domain transfers via the dma_sync_* family,
> > > which DMA-buf has not inherited. Again those sync are no-ops on cache
> > > coherent architectures, but do any necessary cache maintenance on non
> > > coherent arches.
> > 
> > Correct, yes. Coherency is mandatory for DMA-buf, you can't use 
> > dma_sync_* on it when you are the importer.
> > 
> > The exporter could of course make use of that because he is the owner of 
> > the buffer.
> 
> In the example given here with UVC video, you don't know that the
> buffer will be exported and needs to be coherent without
> synchronization points, due to the mapping cache at the DRM side. So
> V4L2 naturally allocates the buffers from CPU cached memory. If the
> expectation is that those buffers are device coherent without relying
> on the map/unmap_attachment calls, then V4L2 needs to always
> synchronize caches on DQBUF when the  buffer is allocated from CPU
> cached memory and a single DMA-buf attachment exists. And while writing
> this I realize that this is probably exactly what V4L2 should do...

I'm not sure we are making any progress here. Doing so will just regress the
performance of coherent devices used to render UVC video feeds. In fact, they
are all coherent except the display controller (on Intel). What my colleague
suggested I try (with the expectation that some adaptation will be needed,
perhaps new signalling flags) is to read the dma_coherency_mask values on the
devices that call attach() and adapt the v4l2 exporter accordingly.

It's likely wrong as-is, and not intended to be used for that, but the value is
that it tries to fix the problem, unlike what I'm reading here.

Nicolas



Re: DMA-buf and uncached system memory

2022-06-22 Thread Nicolas Dufresne
Le mardi 16 février 2021 à 10:25 +0100, Daniel Vetter a écrit :
> So I think if AMD also guarantees to drop clean cachelines just do the
> same thing we do right now for intel integrated + discrete amd, but in
> reserve. It's fragile, but it does work.

Sorry to disrupt, but if you pass V4L2 vmalloc data to the Intel display driver,
you also get nice dirt on the screen. If you have a UVC webcam that produces a
pixel format compatible with your display, you can reproduce the issue quite
easily with:

  gst-launch-1.0 v4l2src device=/dev/video0 ! kmssink

p.s. some frame rates are less likely to exhibit the issue; make sure you create
movement to see it.

The only solution I could think of (not implemented) was to detect in the
attach() call what the importers can do (with dev->coherent_dma_mask, if I
recall), and otherwise flush the cache immediately and keep flushing the cache
from then on, signalling it for DQBUF (in the vb2 workqueue or the dqbuf ioctl,
I don't have an idea yet). I bet this idea is inapplicable where you have
fences; we don't have that in v4l2.

This idea was hinted at by Robert Becket (now in CC), but perhaps I picked it up
wrong, am explaining it wrong, etc. I'm no expert, just someone who noticed
there wasn't really a good plan for that, so one needs to make one up. I'm not
aware of how an importer could know how the memory was allocated by the
exporter, and worse, how an importer could figure out that the exporter is going
to produce buffers with a hot CPU cache (the UVC driver does memcpy from USB
chunks of variable size to produce a fixed-size image).

Nicolas


Re: DMA-buf and uncached system memory

2022-06-21 Thread Nicolas Dufresne
Hi Christian and Andy,

Le mardi 21 juin 2022 à 12:34 +0200, Christian König a écrit :
>  Hi Andy,
>  
>  Am 21.06.22 um 12:17 schrieb Andy.Hsieh:
>  
> > On 2/16/21 4:39 AM, Nicolas Dufresne wrote:
> > > Le lundi 15 février 2021 à 09:58 +0100, Christian König a écrit :
> > > > Hi guys,
> > > > 
> > > > we are currently working an Freesync and direct scan out from system 
> > > > memory on AMD APUs in A+A laptops.
> > > > 
> > > > On problem we stumbled over is that our display hardware needs to scan 
> > > > out from uncached system memory and we currently don't have a way to 
> > > > communicate that through DMA-buf.
> > > > 
> > > > For our specific use case at hand we are going to implement something 
> > > > driver specific, but the question is should we have something more 
> > > > generic for this?
> > > 
> > > Hopefully I'm getting this right, but this makes me think of a long
> > > standing
> > > issue I've met with Intel DRM and UVC driver. If I let the UVC driver
> > > allocate
> > > the buffer, and import the resulting DMABuf (cacheable memory written with
> > > a cpu
> > > copy in the kernel) into DRM, we can see cache artifact being displayed.
> > > While
> > > if I use the DRM driver memory (dumb buffer in that case) it's clean
> > > because
> > > there is a driver specific solution to that.
> > > 
> > > There is no obvious way for userspace application to know what's is
> > > right/wrong
> > > way and in fact it feels like the kernel could solve this somehow without
> > > having
> > > to inform userspace (perhaps).
> > > 
> > > > 
> > > > After all the system memory access pattern is a PCIe extension and as 
> > > > such something generic.
> > > > 
> > > > Regards,
> > > > Christian.
> > > 
> > > 
> > 
> > Hi All,
> > 
> > We also encountered the UVC cache issue on ARMv8 CPU in Mediatek SoC when
> > using UVC dmabuf-export and feeding the dmabuf to the DRM display by the
> > following GStreamer command:
> > 
> > # gst-launch-1.0 v4l2src device=/dev/video0 io-mode=dmabuf ! kmssink
> > 
> > UVC driver uses videobuf2-vmalloc to allocate buffers and is able to export
> > them as dmabuf. But UVC uses memcpy() to fill the frame buffer by CPU
> > without
> > flushing the cache. So if the display hardware directly uses the buffer, the
> > image shown on the screen will be dirty.
> > 
> > Here are some experiments:
> > 
> > 1. By doing some memory operations (e.g. devmem) when streaming the UVC,
> >    the issue is mitigated. I guess the cache is swapped rapidly.
> > 2. By replacing the memcpy() with memcpy_flushcache() in the UVC driver,
> >    the issue disappears.
> > 3. By adding .finish callback in videobuf2-vmalloc.c to flush the cache
> >    before returning the buffer, the issue disappears.
> > 
> > It seems to lack a cache flush stage in either UVC or Display. We may also
> > need communication between the producer and consumer. Then, they can decide
> > who is responsible for the flushing to avoid flushing cache unconditionally
> > leading to the performance impact.
>  
>  Well, that's not what this mail thread was all about.
>  
>  The issue you are facing is that somebody is forgetting to flush caches, but
> the issue discussed in this thread here is that we have hardware which
> bypasses caches altogether.
>  
>  As far as I can see in your case UVC just allocates normal cached system
> memory through videobuf2-vmalloc() and it is perfectly valid to fill that
> using memcpy().
>  
>  If some hardware then accesses those buffers bypassing CPU caches then it is
> the responsibility of the importing driver and/or DMA subsystem to flush the
> caches accordingly.

I've tracked this down to videobuf2-vmalloc.c failing to look for coherency
during attach(). It is also missing a begin_/end access implementation for the
case where it gets attached to a non-coherent device. It seems fixable, though
I'm far from an expert, more someone reading code and comments.

regards,
Nicolas

>  
>  Regards,
>  Christian.
>  
>  
> > 
> > Regards,
> > Andy Hsieh
> > 
>  
>  



Re: [PATCH v7, 04/15] media: mtk-vcodec: Read max resolution from dec_capability

2022-06-21 Thread Nicolas Dufresne
Le vendredi 17 juin 2022 à 14:46 +0800, Chen-Yu Tsai a écrit :
> Hi,
> 
> On Mon, Feb 28, 2022 at 04:29:15PM -0500, Nicolas Dufresne wrote:
> > Hi Yunfei,
> > 
> > this patch does not work unless userland calls enum_framesizes, which is
> > completely optional. See comment and suggestion below.
> > 
> > Le mercredi 23 février 2022 à 11:39 +0800, Yunfei Dong a écrit :
> > > Supported max resolution for different platforms are not the same: 2K
> > > or 4K, getting it according to dec_capability.
> > > 
> > > Signed-off-by: Yunfei Dong 
> > > Reviewed-by: Tzung-Bi Shih
> > > ---
> > >  .../platform/mtk-vcodec/mtk_vcodec_dec.c  | 29 +++
> > >  .../platform/mtk-vcodec/mtk_vcodec_drv.h  |  4 +++
> > >  2 files changed, 21 insertions(+), 12 deletions(-)
> > > 
> > > diff --git a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c 
> > > b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c
> > > index 130ecef2e766..304f5afbd419 100644
> > > --- a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c
> > > +++ b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c
> > > @@ -445,7 +447,7 @@ static int vidioc_vdec_s_fmt(struct file *file, void 
> > > *priv,
> > >   return -EINVAL;
> > >  
> > >   q_data->fmt = fmt;
> > > - vidioc_try_fmt(f, q_data->fmt);
> > > + vidioc_try_fmt(ctx, f, q_data->fmt);
> > >   if (f->type == V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE) {
> > >   q_data->sizeimage[0] = pix_mp->plane_fmt[0].sizeimage;
> > >   q_data->coded_width = pix_mp->width;
> > > @@ -545,6 +547,9 @@ static int vidioc_enum_framesizes(struct file *file, 
> > > void *priv,
> > >   fsize->stepwise.min_height,
> > >   fsize->stepwise.max_height,
> > >   fsize->stepwise.step_height);
> > > +
> > > + ctx->max_width = fsize->stepwise.max_width;
> > > + ctx->max_height = fsize->stepwise.max_height;
> > 
> > The spec does not require calling enum_fmt, so changing the maximum here is
> > incorrect (and fail with GStreamer). If userland never enum the framesizes, 
> > the
> > resolution get limited to 1080p.
> > 
> > As this only depends and the OUTPUT format and the device being open()
> > (condition being dev_capability being set and OUTPUT format being known / 
> > not
> > VP8), you could initialize the cxt max inside s_fmt(OUTPUT) instead, which 
> > is a
> > mandatory call. I have tested this change to verify this:
> > 
> > 
> > diff --git a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c 
> > b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c
> > index 044e3dfbdd8c..3e7c571526a4 100644
> > --- a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c
> > +++ b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c
> > @@ -484,6 +484,14 @@ static int vidioc_vdec_s_fmt(struct file *file, void 
> > *priv,
> > if (fmt == NULL)
> > return -EINVAL;
> >  
> > +   if (f->type == V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE &&
> > +   !(ctx->dev->dec_capability & VCODEC_CAPABILITY_4K_DISABLED) &&
> > +   fmt->fourcc != V4L2_PIX_FMT_VP8_FRAME) {
> > +   mtk_v4l2_debug(3, "4K is enabled");
> > +   ctx->max_width = VCODEC_DEC_4K_CODED_WIDTH;
> > +   ctx->max_height = VCODEC_DEC_4K_CODED_HEIGHT;
> > +   }
> > +
> > q_data->fmt = fmt;
> > vidioc_try_fmt(ctx, f, q_data->fmt);
> > if (f->type == V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE) {
> > @@ -574,15 +582,9 @@ static int vidioc_enum_framesizes(struct file *file, 
> > void *priv,
> >  
> > fsize->type = V4L2_FRMSIZE_TYPE_STEPWISE;
> > fsize->stepwise = dec_pdata->vdec_framesizes[i].stepwise;
> > -   if (!(ctx->dev->dec_capability &
> > -   VCODEC_CAPABILITY_4K_DISABLED) &&
> > -   fsize->pixel_format != V4L2_PIX_FMT_VP8_FRAME) {
> > -   mtk_v4l2_debug(3, "4K is enabled");
> > -   fsize->stepwise.max_width =
> > -   VCODEC_DEC_4K_CODED_WIDTH;
> > -   fsize->stepwise.max_height =
> > -   VCODEC_DEC_4K_CODED_HEIGHT;
> > -   }
> > +   fsize->stepwis

Re: [PATCH v4, 0/3] add h264 decoder driver for mt8186

2022-06-21 Thread Nicolas Dufresne
Le mercredi 15 juin 2022 à 19:37 +0800, yunfei.d...@mediatek.com a écrit :
> Hi Nicolas,
> 
> Thanks for your comments.
> On Mon, 2022-06-13 at 16:10 -0400, Nicolas Dufresne wrote:
> > > 
> 
> > Le jeudi 12 mai 2022 à 11:46 +0800, Yunfei Dong a écrit :
> > > Firstly, add mt8186 compatible and private data, then add document
> > > for
> > > compatible "mediatek,mt8186-vcodec-dec". For mt8186 is single core
> > > architecture, need to add new interface for h264 hardware decoder.
> > 
> > Would be nice to take the habit of sharing the fluster score for this new
> > HW. I would expect numbers no less than what you'd get from running over
> > MT8195 or 92; it remains nice to demonstrate that this was tested and to
> > document any oops along the way.
> Since we haven't set up a fluster test environment for mt8186, we did not
> run fluster on mt8186. According to our plan, we will do fluster testing
> for every project beginning from mt8188.
> 
> When I have time, we will continue setting up the fluster test environment
> for mt8186.

I may be able to help here if needed, just let me know. Meanwhile, it seems
a bit early to consider merging these patches, as they seem to lack the level
of testing we'd normally expect for non-staging driver changes.

regards,
Nicolas

> 
> Thanks,
> Yunfei Dong
> > > Patch 1 adds mt8186 compatible and private data.
> > > Patch 2 adds the mt8186 compatible document.
> > > Patch 3 adds the h264 single core driver.
> > > ---
> > > This patch depends on "support for MT8192 decoder"[1]
> > > 
> > > [1]  
> > > https://patchwork.kernel.org/project/linux-mediatek/cover/20220512021950.29087-1-yunfei.d...@mediatek.com/
> > > ---
> > > changed with v3:
> > > - fix __iomem not reasonable, align share memory to dram.
> > > changed with v2:
> > > - fix sparse and smatch check fail for patch 3
> > > changed with v1:
> > > - rebase driver to the latest media_stage.
> > > ---
> > > Yunfei Dong (3):
> > >   dt-bindings: media: mediatek: vcodec: Adds decoder dt-bindings
> > > for
> > > mt8186
> > >   media: mediatek: vcodec: Support MT8186
> > >   media: mediatek: vcodec: add h264 decoder driver for mt8186
> > > 
> > >  .../media/mediatek,vcodec-subdev-decoder.yaml |   4 +-
> > >  .../platform/mediatek/vcodec/mtk_vcodec_dec.h |   1 +
> > >  .../mediatek/vcodec/mtk_vcodec_dec_drv.c  |   4 +
> > >  .../vcodec/mtk_vcodec_dec_stateless.c |  19 ++
> > >  .../vcodec/vdec/vdec_h264_req_multi_if.c  | 177
> > > +-
> > >  5 files changed, 203 insertions(+), 2 deletions(-)
> > > 
> > 
> > 
> 



Re: [PATCH v4, 3/3] media: mediatek: vcodec: add h264 decoder driver for mt8186

2022-06-21 Thread Nicolas Dufresne
Le mercredi 15 juin 2022 à 19:33 +0800, yunfei.d...@mediatek.com a écrit :
> Hi Nicolas,
> 
> Thanks for your comments.
> On Mon, 2022-06-13 at 16:08 -0400, Nicolas Dufresne wrote:
> > Le jeudi 12 mai 2022 à 11:46 +0800, Yunfei Dong a écrit :
> > > Add h264 decode driver to support mt8186. For the architecture
> > > is single core, need to add new interface to decode.
> > > 
> > > Signed-off-by: Yunfei Dong 
> > > ---
> > >  .../vcodec/vdec/vdec_h264_req_multi_if.c  | 177
> > > +-
> > >  1 file changed, 176 insertions(+), 1 deletion(-)
> > > 
> > > diff --git
> > > a/drivers/media/platform/mediatek/vcodec/vdec/vdec_h264_req_multi_i
> > > f.c
> > > b/drivers/media/platform/mediatek/vcodec/vdec/vdec_h264_req_multi_i
> > > f.c
> > > index a96f203b5d54..1d9e753cf894 100644
> > > ---
> > > a/drivers/media/platform/mediatek/vcodec/vdec/vdec_h264_req_multi_i
> > > f.c
> > > +++
> > > b/drivers/media/platform/mediatek/vcodec/vdec/vdec_h264_req_multi_i
> > > f.c
> > > @@ -140,6 +140,9 @@ struct vdec_h264_slice_share_info {
> > >   * @vsi: vsi used for lat
> > >   * @vsi_core:vsi used for core
> > >   *
> > > + * @vsi_ctx: Local VSI data for this decoding
> > > context
> > > + * @h264_slice_param:the parameters that hardware use to
> > > decode
> > > + *
> > >   * @resolution_changed:resolution changed
> > >   * @realloc_mv_buf:  reallocate mv buffer
> > >   * @cap_num_planes:  number of capture queue plane
> > > @@ -157,6 +160,9 @@ struct vdec_h264_slice_inst {
> > >   struct vdec_h264_slice_vsi *vsi;
> > >   struct vdec_h264_slice_vsi *vsi_core;
> > >  
> > > + struct vdec_h264_slice_vsi vsi_ctx;
> > > + struct vdec_h264_slice_lat_dec_param h264_slice_param;
> > > +
> > >   unsigned int resolution_changed;
> > >   unsigned int realloc_mv_buf;
> > >   unsigned int cap_num_planes;
> > > @@ -208,6 +214,61 @@ static int
> > > vdec_h264_slice_fill_decode_parameters(struct vdec_h264_slice_inst
> > > *i
> > >   return 0;
> > >  }
> > >  
> > > +static int get_vdec_sig_decode_parameters(struct
> > > vdec_h264_slice_inst *inst)
> > > +{
> > > + const struct v4l2_ctrl_h264_decode_params *dec_params;
> > > + const struct v4l2_ctrl_h264_sps *sps;
> > > + const struct v4l2_ctrl_h264_pps *pps;
> > > + const struct v4l2_ctrl_h264_scaling_matrix *scaling_matrix;
> > > + struct vdec_h264_slice_lat_dec_param *slice_param = &inst->h264_slice_param;
> > > + struct v4l2_h264_reflist_builder reflist_builder;
> > > + u8 *p0_reflist = slice_param->decode_params.ref_pic_list_p0;
> > > + u8 *b0_reflist = slice_param->decode_params.ref_pic_list_b0;
> > > + u8 *b1_reflist = slice_param->decode_params.ref_pic_list_b1;
> > > +
> > > + dec_params =
> > > + mtk_vdec_h264_get_ctrl_ptr(inst->ctx,
> > > V4L2_CID_STATELESS_H264_DECODE_PARAMS);
> > > + if (IS_ERR(dec_params))
> > > + return PTR_ERR(dec_params);
> > > +
> > > + sps = mtk_vdec_h264_get_ctrl_ptr(inst->ctx,
> > > V4L2_CID_STATELESS_H264_SPS);
> > > + if (IS_ERR(sps))
> > > + return PTR_ERR(sps);
> > > +
> > > + pps = mtk_vdec_h264_get_ctrl_ptr(inst->ctx,
> > > V4L2_CID_STATELESS_H264_PPS);
> > > + if (IS_ERR(pps))
> > > + return PTR_ERR(pps);
> > > +
> > > + scaling_matrix =
> > > + mtk_vdec_h264_get_ctrl_ptr(inst->ctx,
> > > V4L2_CID_STATELESS_H264_SCALING_MATRIX);
> > > + if (IS_ERR(scaling_matrix))
> > > + return PTR_ERR(scaling_matrix);
> > > +
> > > + mtk_vdec_h264_update_dpb(dec_params, inst->dpb);
> > > +
> > > + mtk_vdec_h264_copy_sps_params(&slice_param->sps, sps);
> > > + mtk_vdec_h264_copy_pps_params(&slice_param->pps, pps);
> > > + mtk_vdec_h264_copy_scaling_matrix(&slice_param->scaling_matrix, scaling_matrix);
> > > +
> > > + mtk_vdec_h264_copy_decode_params(&slice_param->decode_params, dec_params, inst->dpb);
> > > + mtk_vdec_h264_fill_dpb_info(inst->ctx, &slice_param->decode_params,
> > > + slice_param->h264_dpb_info);
> > > +
> > > + /* Build the reference lists */
> > > + v4l2_h264_init_re

Re: [PATCH] media: mediatek: vcodec: Initialize decoder parameters after getting dec_capability

2022-06-21 Thread Nicolas Dufresne
Hi Yunfei,

Le samedi 18 juin 2022 à 15:29 +0800, Yunfei Dong a écrit :
> Need to get dec_capability from scp first, then initialize decoder
> supported format and other parameters according to dec_capability value.

Perhaps something to improve in the future: on top of describing the fix, it
could be useful to describe what issue is being fixed, and which platforms will
benefit.

> 
> Signed-off-by: Yunfei Dong 

To add to this, this looks like a bug fix, can you relate it to an original
commit and add a Fixes: tag here ?

regards,
Nicolas

> ---
>  drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec.c | 2 --
>  drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec_drv.c | 2 ++
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec.c 
> b/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec.c
> index 1465ddff1c6b..41589470da32 100644
> --- a/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec.c
> +++ b/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec.c
> @@ -112,8 +112,6 @@ void mtk_vcodec_dec_set_default_params(struct 
> mtk_vcodec_ctx *ctx)
>  {
>   struct mtk_q_data *q_data;
>  
> - ctx->dev->vdec_pdata->init_vdec_params(ctx);
> -
>   ctx->m2m_ctx->q_lock = &ctx->dev->dev_mutex;
>   ctx->fh.m2m_ctx = ctx->m2m_ctx;
>   ctx->fh.ctrl_handler = &ctx->ctrl_hdl;
> diff --git a/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec_drv.c 
> b/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec_drv.c
> index 4103d7c1b638..99d7b15f2b9d 100644
> --- a/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec_drv.c
> +++ b/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec_drv.c
> @@ -208,6 +208,8 @@ static int fops_vcodec_open(struct file *file)
>  
>   dev->dec_capability =
>   mtk_vcodec_fw_get_vdec_capa(dev->fw_handler);
> + ctx->dev->vdec_pdata->init_vdec_params(ctx);
> +
>   mtk_v4l2_debug(0, "decoder capability %x", dev->dec_capability);
>   }
>  



Re: [PATCH v4, 0/3] add h264 decoder driver for mt8186

2022-06-14 Thread Nicolas Dufresne
Le lundi 13 juin 2022 à 16:10 -0400, Nicolas Dufresne a écrit :
> Le jeudi 12 mai 2022 à 11:46 +0800, Yunfei Dong a écrit :
> > Firstly, add the mt8186 compatible and private data, then add documentation
> > for the compatible "mediatek,mt8186-vcodec-dec". As mt8186 is a single core
> > architecture, a new interface is needed for the h264 hardware decoder.
> 
> Would be nice to take the habit of sharing the fluster score for this new HW.
> I would expect numbers no less than what you'd get from running over MT8195
> or 92; it remains nice to demonstrate that this was tested and to document
> any oops along the way.
> > 
> > Patch 1 adds mt8186 compatible and private data.
> > Patch 2 adds the mt8186 compatible document.
> > Patch 3 adds the h264 single core driver.
> > ---
> > This patch depends on "support for MT8192 decoder"[1]
> > 
> > [1]  
> > https://patchwork.kernel.org/project/linux-mediatek/cover/20220512021950.29087-1-yunfei.d...@mediatek.com/

I forgot earlier, but I suppose this will also depends on an scp.img firmware ?
If so, any linux-firmware submission to link to ?

> > ---
> > changed with v3:
> > - fix __iomem not reasonable, align share memory to dram.
> > changed with v2:
> > - fix sparse and smatch check fail for patch 3
> > changed with v1:
> > - rebase driver to the latest media_stage.
> > ---
> > Yunfei Dong (3):
> >   dt-bindings: media: mediatek: vcodec: Adds decoder dt-bindings for
> > mt8186
> >   media: mediatek: vcodec: Support MT8186
> >   media: mediatek: vcodec: add h264 decoder driver for mt8186
> > 
> >  .../media/mediatek,vcodec-subdev-decoder.yaml |   4 +-
> >  .../platform/mediatek/vcodec/mtk_vcodec_dec.h |   1 +
> >  .../mediatek/vcodec/mtk_vcodec_dec_drv.c  |   4 +
> >  .../vcodec/mtk_vcodec_dec_stateless.c |  19 ++
> >  .../vcodec/vdec/vdec_h264_req_multi_if.c  | 177 +-
> >  5 files changed, 203 insertions(+), 2 deletions(-)
> > 
> 



Re: [PATCH v4, 0/3] add h264 decoder driver for mt8186

2022-06-13 Thread Nicolas Dufresne
Le jeudi 12 mai 2022 à 11:46 +0800, Yunfei Dong a écrit :
> Firstly, add the mt8186 compatible and private data, then add documentation
> for the compatible "mediatek,mt8186-vcodec-dec". As mt8186 is a single core
> architecture, a new interface is needed for the h264 hardware decoder.

Would be nice to take the habit of sharing the fluster score for this new HW. I
would expect numbers no less than what you'd get from running over MT8195 or
92; it remains nice to demonstrate that this was tested and to document any
oops along the way.
> 
> Patch 1 adds mt8186 compatible and private data.
> Patch 2 adds the mt8186 compatible document.
> Patch 3 adds the h264 single core driver.
> ---
> This patch depends on "support for MT8192 decoder"[1]
> 
> [1]  
> https://patchwork.kernel.org/project/linux-mediatek/cover/20220512021950.29087-1-yunfei.d...@mediatek.com/
> ---
> changed with v3:
> - fix __iomem not reasonable, align share memory to dram.
> changed with v2:
> - fix sparse and smatch check fail for patch 3
> changed with v1:
> - rebase driver to the latest media_stage.
> ---
> Yunfei Dong (3):
>   dt-bindings: media: mediatek: vcodec: Adds decoder dt-bindings for
> mt8186
>   media: mediatek: vcodec: Support MT8186
>   media: mediatek: vcodec: add h264 decoder driver for mt8186
> 
>  .../media/mediatek,vcodec-subdev-decoder.yaml |   4 +-
>  .../platform/mediatek/vcodec/mtk_vcodec_dec.h |   1 +
>  .../mediatek/vcodec/mtk_vcodec_dec_drv.c  |   4 +
>  .../vcodec/mtk_vcodec_dec_stateless.c |  19 ++
>  .../vcodec/vdec/vdec_h264_req_multi_if.c  | 177 +-
>  5 files changed, 203 insertions(+), 2 deletions(-)
> 



Re: [PATCH v4, 3/3] media: mediatek: vcodec: add h264 decoder driver for mt8186

2022-06-13 Thread Nicolas Dufresne
Le jeudi 12 mai 2022 à 11:46 +0800, Yunfei Dong a écrit :
> Add the h264 decode driver to support mt8186. As the architecture
> is single core, a new interface is needed to decode.
> 
> Signed-off-by: Yunfei Dong 
> ---
>  .../vcodec/vdec/vdec_h264_req_multi_if.c  | 177 +-
>  1 file changed, 176 insertions(+), 1 deletion(-)
> 
> diff --git 
> a/drivers/media/platform/mediatek/vcodec/vdec/vdec_h264_req_multi_if.c 
> b/drivers/media/platform/mediatek/vcodec/vdec/vdec_h264_req_multi_if.c
> index a96f203b5d54..1d9e753cf894 100644
> --- a/drivers/media/platform/mediatek/vcodec/vdec/vdec_h264_req_multi_if.c
> +++ b/drivers/media/platform/mediatek/vcodec/vdec/vdec_h264_req_multi_if.c
> @@ -140,6 +140,9 @@ struct vdec_h264_slice_share_info {
>   * @vsi: vsi used for lat
>   * @vsi_core:vsi used for core
>   *
> + * @vsi_ctx: Local VSI data for this decoding context
> + * @h264_slice_param:the parameters that hardware use to decode
> + *
>   * @resolution_changed:resolution changed
>   * @realloc_mv_buf:  reallocate mv buffer
>   * @cap_num_planes:  number of capture queue plane
> @@ -157,6 +160,9 @@ struct vdec_h264_slice_inst {
>   struct vdec_h264_slice_vsi *vsi;
>   struct vdec_h264_slice_vsi *vsi_core;
>  
> + struct vdec_h264_slice_vsi vsi_ctx;
> + struct vdec_h264_slice_lat_dec_param h264_slice_param;
> +
>   unsigned int resolution_changed;
>   unsigned int realloc_mv_buf;
>   unsigned int cap_num_planes;
> @@ -208,6 +214,61 @@ static int vdec_h264_slice_fill_decode_parameters(struct 
> vdec_h264_slice_inst *i
>   return 0;
>  }
>  
> +static int get_vdec_sig_decode_parameters(struct vdec_h264_slice_inst *inst)
> +{
> + const struct v4l2_ctrl_h264_decode_params *dec_params;
> + const struct v4l2_ctrl_h264_sps *sps;
> + const struct v4l2_ctrl_h264_pps *pps;
> + const struct v4l2_ctrl_h264_scaling_matrix *scaling_matrix;
> + struct vdec_h264_slice_lat_dec_param *slice_param = &inst->h264_slice_param;
> + struct v4l2_h264_reflist_builder reflist_builder;
> + u8 *p0_reflist = slice_param->decode_params.ref_pic_list_p0;
> + u8 *b0_reflist = slice_param->decode_params.ref_pic_list_b0;
> + u8 *b1_reflist = slice_param->decode_params.ref_pic_list_b1;
> +
> + dec_params =
> + mtk_vdec_h264_get_ctrl_ptr(inst->ctx, 
> V4L2_CID_STATELESS_H264_DECODE_PARAMS);
> + if (IS_ERR(dec_params))
> + return PTR_ERR(dec_params);
> +
> + sps = mtk_vdec_h264_get_ctrl_ptr(inst->ctx, 
> V4L2_CID_STATELESS_H264_SPS);
> + if (IS_ERR(sps))
> + return PTR_ERR(sps);
> +
> + pps = mtk_vdec_h264_get_ctrl_ptr(inst->ctx, 
> V4L2_CID_STATELESS_H264_PPS);
> + if (IS_ERR(pps))
> + return PTR_ERR(pps);
> +
> + scaling_matrix =
> + mtk_vdec_h264_get_ctrl_ptr(inst->ctx, 
> V4L2_CID_STATELESS_H264_SCALING_MATRIX);
> + if (IS_ERR(scaling_matrix))
> + return PTR_ERR(scaling_matrix);
> +
> + mtk_vdec_h264_update_dpb(dec_params, inst->dpb);
> +
> + mtk_vdec_h264_copy_sps_params(&slice_param->sps, sps);
> + mtk_vdec_h264_copy_pps_params(&slice_param->pps, pps);
> + mtk_vdec_h264_copy_scaling_matrix(&slice_param->scaling_matrix, scaling_matrix);
> +
> + mtk_vdec_h264_copy_decode_params(&slice_param->decode_params, dec_params, inst->dpb);
> + mtk_vdec_h264_fill_dpb_info(inst->ctx, &slice_param->decode_params,
> + slice_param->h264_dpb_info);
> +
> + /* Build the reference lists */
> + v4l2_h264_init_reflist_builder(&reflist_builder, dec_params, sps, inst->dpb);
> + v4l2_h264_build_p_ref_list(&reflist_builder, p0_reflist);
> +
> + v4l2_h264_build_b_ref_lists(&reflist_builder, b0_reflist, b1_reflist);
> + /* Adapt the built lists to the firmware's expectations */
> + mtk_vdec_h264_fixup_ref_list(p0_reflist, reflist_builder.num_valid);
> + mtk_vdec_h264_fixup_ref_list(b0_reflist, reflist_builder.num_valid);
> + mtk_vdec_h264_fixup_ref_list(b1_reflist, reflist_builder.num_valid);
> + memcpy(&inst->vsi_ctx.h264_slice_params, slice_param,
> +sizeof(inst->vsi_ctx.h264_slice_params));

This function looks very redundant across multiple variants, could you try and
make a helper to reduce the duplication ?

> +
> + return 0;
> +}
> +
>  static void vdec_h264_slice_fill_decode_reflist(struct vdec_h264_slice_inst 
> *inst,
>   struct 
> vdec_h264_slice_lat_dec_param *slice_param,
>   struct 
> vdec_h264_slice_share_info *share_info)
> @@ -596,6 +657,120 @@ static int vdec_h264_slice_lat_decode(void *h_vdec, 
> struct mtk_vcodec_mem *bs,
>   return err;
>  }
>  
> +static int vdec_h264_slice_single_decode(void *h_vdec, struct mtk_vcodec_mem 
> *bs,
> +  struct vdec_fb *unused, bool *res_chg)
> +{
> + struct vdec_h264_slice_inst *inst = h_vdec;
> + struct 

Re: [PATCH v7 0/6] Proposal for a GPU cgroup controller

2022-05-12 Thread Nicolas Dufresne
Le mercredi 11 mai 2022 à 13:31 -0700, T.J. Mercier a écrit :
> On Wed, May 11, 2022 at 6:21 AM Nicolas Dufresne  wrote:
> > 
> > Hi,
> > 
> > Le mardi 10 mai 2022 à 23:56 +, T.J. Mercier a écrit :
> > > This patch series revisits the proposal for a GPU cgroup controller to
> > > track and limit memory allocations by various device/allocator
> > > subsystems. The patch series also contains a simple prototype to
> > > illustrate how Android intends to implement DMA-BUF allocator
> > > attribution using the GPU cgroup controller. The prototype does not
> > > include resource limit enforcements.
> > 
> > I'm sorry, as I'm not technically involved in depth. But from reading the
> > topic I don't understand the link this creates between DMABuf Heaps and the
> > GPU. Is this an attempt to really track the DMABuf allocated by userland,
> > or just something for GPUs ? What about V4L2 devices ? Any way this can be
> > clarified, especially regarding what other subsystems would need in order
> > to have cgroup DMABuf allocation controller support ?
> > 
> Hi Nicolas,
> 
> The link between dmabufs, dmabuf heaps, and "GPU memory" is maybe
> somewhat of an Androidism. However this change aims to be usable for
> tracking all GPU related allocations. It's just that this initial
> series only adds support for tracking dmabufs allocated from dmabuf
> heaps.
> 
> In Android most graphics buffers are dma buffers allocated from a
> dmabuf heap, so that is why these dmabuf heap allocations are being
> tracked under the GPU cgroup. Other dmabuf exporters like V4L2 might
> also want to track their buffers, but would probably want to do so
> under a bucket name of something like "v4l2". Same goes for GEM
> dmabufs. The naming scheme for this is still yet to be decided. It
> would be cool to be able to attribute memory at the driver level, or
> even different types of memory at the driver level, but I imagine
> there is a point of diminishing returns for fine-grained
> naming/bucketing.
> 
> So far, I haven't tried to create a strict definition of what is and
> is not "GPU memory" for the purpose of this accounting, so I don't
> think we should be restricted to tracking just dmabufs. I don't see
> why this couldn't be anything a driver wants to consider as GPU memory
> as long as it is named/bucketed appropriately, such as both on-package
> graphics card memory use and CPU memory dedicated for graphics use
> like for host/device transfers.
> 
> Is that helpful?

I'm actually happy I've asked this question, it wasn't silly after all. I think
the problem here is a naming issue. What you are really monitoring is "video
memory", which consists of memory segments allocated to store data used to
render images (it's not always images of course, GPUs and VPUs have specialized
buffers for their purposes).

Whether this should be split between what is used specifically by the GPU
drivers, the display drivers, the VPU (CODEC and pre/post-processor) or camera
drivers is something that should be discussed. But in the current approach, you
really mean video memory as a superset of the above. Personally, I think that
generically (to de-Androidize your work), grouping all video memory together is
sufficient. What I fail to understand is how you will manage to distinguish
DMABuf Heap allocations (which are used outside of Android, btw) from video
allocations or other types of usage. I'm sure non-video usage will exist in the
future (think of machine learning, compute, other high-bandwidth streaming
thingies ...)
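To make the naming/bucketing point concrete, here is a minimal userspace sketch (names invented for illustration, this is not the proposed kernel API) of per-bucket accounting, where "system-heap", "v4l2" or "gem" charges are tracked separately and a limit turns unaccounted growth into a failed allocation:

```python
class Bucket:
    """Tracks outstanding memory charged against one named bucket
    (e.g. "system-heap", "v4l2", "gem"), with an optional limit."""

    def __init__(self, name, limit=None):
        self.name = name
        self.limit = limit
        self.usage = 0

    def charge(self, size):
        """Account an allocation; refuse it if the limit would be
        exceeded, which is what prevents an easy memory DoS."""
        if self.limit is not None and self.usage + size > self.limit:
            return False
        self.usage += size
        return True

    def uncharge(self, size):
        """Release accounting when the buffer is freed."""
        self.usage = max(0, self.usage - size)

heap = Bucket("system-heap", limit=64 << 20)   # 64 MiB limit
assert heap.charge(32 << 20)
assert not heap.charge(48 << 20)               # would exceed the limit
heap.uncharge(32 << 20)
assert heap.charge(48 << 20)
```

Whatever the final naming scheme ends up being, the essential property is the one sketched here: every allocation is charged to exactly one named bucket and uncharged on release, so a limit can be enforced per bucket.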

> 
> Best,
> T.J.
> 
> > > 
> > > Changelog:
> > > v7:
> > > Hide gpucg and gpucg_bucket struct definitions per Michal Koutný.
> > > This means gpucg_register_bucket now returns an internally allocated
> > > struct gpucg_bucket.
> > > 
> > > Move all public function documentation to the cgroup_gpu.h header.
> > > 
> > > Remove comment in documentation about duplicate name rejection which
> > > is not relevant to cgroups users per Michal Koutný.
> > > 
> > > v6:
> > > Move documentation into cgroup-v2.rst per Tejun Heo.
> > > 
> > > Rename BINDER_FD{A}_FLAG_SENDER_NO_NEED ->
> > > BINDER_FD{A}_FLAG_XFER_CHARGE per Carlos Llamas.
> > > 
> > > Return error on transfer failure per Carlos Llamas.
> > > 
> > > v5:
> > > Rebase on top of v5.18-rc3
> > > 
> > > Drop the global GPU cgroup "total" (sum of all device totals) portion
> > > of the design since there is no currently known use for this per
> > > Tejun Heo.
> > > 
> > > Fix commit message whic

Re: [PATCH v7 0/6] Proposal for a GPU cgroup controller

2022-05-11 Thread Nicolas Dufresne
Hi,

Le mardi 10 mai 2022 à 23:56 +, T.J. Mercier a écrit :
> This patch series revisits the proposal for a GPU cgroup controller to
> track and limit memory allocations by various device/allocator
> subsystems. The patch series also contains a simple prototype to
> illustrate how Android intends to implement DMA-BUF allocator
> attribution using the GPU cgroup controller. The prototype does not
> include resource limit enforcements.

I'm sorry, as I'm not technically involved in depth. But from reading the
topic I don't understand the link this creates between DMABuf Heaps and the
GPU. Is this an attempt to really track the DMABuf allocated by userland, or
just something for GPUs ? What about V4L2 devices ? Any way this can be
clarified, especially regarding what other subsystems would need in order to
have cgroup DMABuf allocation controller support ?

> 
> Changelog:
> v7:
> Hide gpucg and gpucg_bucket struct definitions per Michal Koutný.
> This means gpucg_register_bucket now returns an internally allocated
> struct gpucg_bucket.
> 
> Move all public function documentation to the cgroup_gpu.h header.
> 
> Remove comment in documentation about duplicate name rejection which
> is not relevant to cgroups users per Michal Koutný.
> 
> v6:
> Move documentation into cgroup-v2.rst per Tejun Heo.
> 
> Rename BINDER_FD{A}_FLAG_SENDER_NO_NEED ->
> BINDER_FD{A}_FLAG_XFER_CHARGE per Carlos Llamas.
> 
> Return error on transfer failure per Carlos Llamas.
> 
> v5:
> Rebase on top of v5.18-rc3
> 
> Drop the global GPU cgroup "total" (sum of all device totals) portion
> of the design since there is no currently known use for this per
> Tejun Heo.
> 
> Fix commit message which still contained the old name for
> dma_buf_transfer_charge per Michal Koutný.
> 
> Remove all GPU cgroup code except what's necessary to support charge transfer
> from dma_buf. Previously charging was done in export, but for non-Android
> graphics use-cases this is not ideal since there may be a delay between
> allocation and export, during which time there is no accounting.
> 
> Merge dmabuf: Use the GPU cgroup charge/uncharge APIs patch into
> dmabuf: heaps: export system_heap buffers with GPU cgroup charging as a
> result of above.
> 
> Put the charge and uncharge code in the same file (system_heap_allocate,
> system_heap_dma_buf_release) instead of splitting them between the heap and
> the dma_buf_release. This avoids asymmetric management of the gpucg charges.
> 
> Modify the dma_buf_transfer_charge API to accept a task_struct instead
> of a gpucg. This avoids requiring the caller to manage the refcount
> of the gpucg upon failure and confusing ownership transfer logic.
> 
> Support all strings for gpucg_register_bucket instead of just string
> literals.
> 
> Enforce globally unique gpucg_bucket names.
> 
> Constrain gpucg_bucket name lengths to 64 bytes.
> 
> Append "-heap" to gpucg_bucket names from dmabuf-heaps.
> 
> Drop patch 7 from the series, which changed the types of
> binder_transaction_data's sender_pid and sender_euid fields. This was
> done in another commit here:
> https://lore.kernel.org/all/20220210021129.3386083-4-masahi...@kernel.org/
> 
> Rename:
>   gpucg_try_charge -> gpucg_charge
>   find_cg_rpool_locked -> cg_rpool_find_locked
>   init_cg_rpool -> cg_rpool_init
>   get_cg_rpool_locked -> cg_rpool_get_locked
>   "gpu cgroup controller" -> "GPU controller"
>   gpucg_device -> gpucg_bucket
>   usage -> size
> 
> Tests:
>   Support both binder_fd_array_object and binder_fd_object. This is
>   necessary because new versions of Android will use binder_fd_object
>   instead of binder_fd_array_object, and we need to support both.
> 
>   Tests for both binder_fd_array_object and binder_fd_object.
> 
>   For binder_utils return error codes instead of
>   struct binder{fs}_ctx.
> 
>   Use ifdef __ANDROID__ to choose platform-dependent temp path instead
>   of a runtime fallback.
> 
>   Ensure binderfs_mntpt ends with a trailing '/' character instead of
>   prepending it where used.
> 
> v4:
> Skip test if not run as root per Shuah Khan
> 
> Add better test logging for abnormal child termination per Shuah Khan
> 
> Adjust ordering of charge/uncharge during transfer to avoid potentially
> hitting cgroup limit per Michal Koutný
> 
> Adjust gpucg_try_charge critical section for charge transfer functionality
> 
> Fix uninitialized return code error for dmabuf_try_charge error case
> 
> v3:
> Remove Upstreaming Plan from gpu-cgroup.rst per John Stultz
> 
> Use more common dual author commit message format per John Stultz
> 
> Remove android from binder changes title per Todd Kjos
> 
> Add a kselftest for this new behavior per Greg Kroah-Hartman
> 
> Include details on behavior for all combinations of kernel/userspace
> versions in changelog (thanks Suren Baghdasaryan) per Greg Kroah-Hartman.
> 
> Fix pid and uid types in binder UAPI header
> 
> v2:
> See the previous revision of this change submitted by Hridya Valsaraju
> at: 

Re: [PATCH v8, 00/15] media: mtk-vcodec: support for M8192 decoder

2022-04-13 Thread Nicolas Dufresne
Le mercredi 13 avril 2022 à 09:57 +0200, AngeloGioacchino Del Regno a écrit :
> Il 13/04/22 09:03, allen-kh.cheng ha scritto:
> > Hi Nicolas,
> > 
> > On Tue, 2022-04-12 at 10:48 -0400, Nicolas Dufresne wrote:
> > > Le lundi 11 avril 2022 à 11:41 +0800, yunfei.d...@mediatek.com a
> > > écrit :
> > > > Hi Nicolas,
> > > > 
> > > > On Thu, 2022-03-31 at 16:48 -0400, Nicolas Dufresne wrote:
> > > > > Hi Yunfei,
> > > > > 
> > > > > thanks for the update, I should be testing this really soon.
> > > > > 
> > > > > Le jeudi 31 mars 2022 à 10:47 +0800, Yunfei Dong a écrit :
> > > > > > This series adds support for mt8192 h264/vp8/vp9 decoder
> > > > > > drivers.
> > > > > > Firstly, refactor
> > > > > > power/clock/interrupt interfaces for mt8192 is lat and core
> > > > > > architecture.
> > > > > 
> > > > > Similarly to MT8173 and MT8183, a shared* firmware is needed for
> > > > > this
> > > > > CODEC to
> > > > > work (scp.img). I looked into linux-firmware[1] it has not been
> > > > > added
> > > > > for mt8192
> > > > > yet. As your patches are getting close to be ready, it would be
> > > > > important to
> > > > > look into this so the patchset does not get blocked due to that.
> > > > > 
> > > > > best regards,
> > > > > Nicolas
> > > > > 
> > > > > [1]
> > > > > 
> > https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/mediatek__;!!CTRNKA9wMg0ARbw!zy4N6JDroSXtumXXa7MuxAgYAPAink8uyW-978vpWct8S3vOjBqXirFE8uTEHopHCovbSl0FNP9LPgWCEBrZfMIcvQ$
> > > > >   
> > > > > * Shared at least between MDP3 and MTK VCODEC from my knowledge
> > > > > 
> > > > 
> > > > Thanks for your remind.
> > > > 
> > > > I have already sent mt8192 scp.img to github.
> > > > 
> > > > 
> > https://urldefense.com/v3/__https://github.com/yunfeidongmediatek/linux_fw_scp_8192/commit/3ac2fc85bc7dfcebdb92b5b5808b0268cdfb772d__;!!CTRNKA9wMg0ARbw!zy4N6JDroSXtumXXa7MuxAgYAPAink8uyW-978vpWct8S3vOjBqXirFE8uTEHopHCovbSl0FNP9LPgWCEBpf9F_nWA$
> > > >   
> > > > 
> > > > Waiting for to be merged.
> > > 
> > > On boards I have, the firmware is loaded from /lib/firmware/scp.img,
> > > but with this submission it will be in
> > > /lib/firmware/mediatek/mt8192/scp.img. I haven't found anything around:
> > > 
> > >   drivers/remoteproc/mtk_scp.c:812:   char *fw_name = "scp.img";
> > > 
> > > that would use the platform path. This seems like a problem to me: the
> > > upstreaming of the firmware isn't being aligned with where the firmware
> > > is picked up by the upstream driver. Correct me if I got this wrong, but
> > > I'd really like to clarify this.
> > > 
> > > Nicolas
> > > 
> > 
> > I am not sure why the accepted fw path for scp is /lib/firmware/scp.img in
> > mt8173/8183, while we upload scp.img to
> > /lib/firmware/mediatek/mt8173(mt8183)/scp.img in
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/mediatek
> > 
> > Currently, the scp driver will load the firmware from /lib/firmware/scp.img,
> > which means there is only one firmware for a specific platform.
> > I think we can send a PATCH to make the firmware name of scp more
> > flexible.
> > 
> > Maybe get firmware name from dts. e.g.,
> > &scp {
> > status = "okay";
> > firmware-name = "mediatek/mt81xx/scp.img";
> > };
> > 
> > Do you think it feasible?
> > If you have any concerns, please let us know.
> > 
> > Thanks,
> > Allen
> > 
> 
> Hello Allen,
> 
> what you proposed is exactly what has been done for other platforms because of
> both per-device firmware differences (different signatures) and per-SoC 
> (different
> firmware entirely), found on TI K3, iMX DSP, Qualcomm MSS/DSP remoteproc and
> others.
> 
> Of course this is an accepted way to resolve this situation: please go on!

Looks good to me! (don't forget to keep a fallback to /lib/firmware/scp.img to
maintain backward compatibility).

> 
> Cheers,
> Angelo
> 



Re: [PATCH v8, 00/15] media: mtk-vcodec: support for M8192 decoder

2022-04-12 Thread Nicolas Dufresne
Le lundi 11 avril 2022 à 11:41 +0800, yunfei.d...@mediatek.com a écrit :
> Hi Nicolas,
> 
> On Thu, 2022-03-31 at 16:48 -0400, Nicolas Dufresne wrote:
> > Hi Yunfei,
> > 
> > thanks for the update, I should be testing this really soon.
> > 
> > Le jeudi 31 mars 2022 à 10:47 +0800, Yunfei Dong a écrit :
> > > This series adds support for mt8192 h264/vp8/vp9 decoder drivers.
> > > Firstly, refactor
> > > power/clock/interrupt interfaces for mt8192 is lat and core
> > > architecture.
> > 
> > Similarly to MT8173 and MT8183, a shared* firmware is needed for this
> > CODEC to
> > work (scp.img). I looked into linux-firmware[1] it has not been added
> > for mt8192
> > yet. As your patches are getting close to be ready, it would be
> > important to
> > look into this so the patchset does not get blocked due to that.
> > 
> > best regards,
> > Nicolas
> > 
> > [1] 
> > https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/mediatek
> > * Shared at least between MDP3 and MTK VCODEC from my knowledge
> > 
> 
> Thanks for your remind.
> 
> I have already sent mt8192 scp.img to github.
> 
> https://github.com/yunfeidongmediatek/linux_fw_scp_8192/commit/3ac2fc85bc7dfcebdb92b5b5808b0268cdfb772d
> 
> Waiting for to be merged.

On boards I have, the firmware is loaded from /lib/firmware/scp.img, but with
this submission it will be in /lib/firmware/mediatek/mt8192/scp.img. I haven't
found anything around:

 drivers/remoteproc/mtk_scp.c:812:   char *fw_name = "scp.img";

that would use the platform path. This seems like a problem to me: the
upstreaming of the firmware isn't aligned with where the firmware is picked up
by the upstream driver. Correct me if I got this wrong, but I'd really like to
clarify this.

Nicolas

> 
> Best Regards,
> Yunfei Dong
> 
> > > 
> > > Secondly, add new functions to get frame buffer size and resolution
> > > according
> > > to decoder capability from scp side. Then add callback function to
> > > get/put
> > > capture buffer in order to enable lat and core decoder in parallel,
> > > need to
> > > adjust GStreamer at the same time. 
> > > 
> > > Then add to support MT21C compressed mode and fix v4l2-compliance
> > > fail.
> > > 
> > > Next, extract H264 request api driver to let mt8183 and mt8192 use
> > > the same
> > > code, and adds mt8192 frame based h264 driver for stateless
> > > decoder.
> > > 
> > > Lastly, add vp8 and vp9 stateless decoder drivers.
> > > 
> > > Patches 1 refactor power/clock/interrupt interface.
> > > Patches 2~4 get frame buffer size and resolution according to
> > > decoder capability.
> > > Patches 5 set capture queue bytesused.
> > > Patches 6 adjust GStreamer.
> > > Patch 7~11 add to support MT21C compressed mode and fix v4l2-
> > > compliance fail.
> > > patch 12 record capture queue format type.
> > > Patch 13~14 extract h264 driver and add mt8192 frame based driver
> > > for h264 decoder.
> > > Patch 15~16 add vp8 and vp9 stateless decoder drivers.
> > > Patch 17 prevent kernel crash when rmmod mtk-vcodec-dec.ko
> > > ---
> > > changes compared with v6:
> > > - adjust GStreamer, separate src buffer done with
> > > v4l2_ctrl_request_complete for patch 6.
> > > - remove v4l2_m2m_set_dst_buffered.
> > > - add new patch to set each plane bytesused in buf prepare for
> > > patch 5.
> > > - using upstream interface to update vp9 prob tables for patch 16.
> > > - fix maintainer comments.
> > > - test the driver with chrome VD and GStreamer(H264/VP9/VP8/AV1).
> > > changes compared with v6:
> > > - rebase to the latest media stage and fix conficts
> > > - fix memcpy to memcpy_fromio or memcpy_toio
> > > - fix h264 crash when test field bitstream
> > > changes compared with v5:
> > > - fix vp9 comments for patch 15
> > > - fix vp8 comments for patch 14.
> > > - fix comments for patch 12.
> > > - fix build errors.
> > > changes compared with v4:
> > > - fix checkpatch.pl fail.
> > > - fix kernel-doc fail.
> > > - rebase to the latest media codec driver.
> > > changes compared with v3:
> > > - remove enum mtk_chip for patch 2.
> > > - add vp8 stateless decoder drivers for patch 14.
> > > - add vp9 stateless decoder drivers for patch 15.
> > > changes compared with v2:
> > > - add new pat

Re: [PATCH v8, 16/17] media: mediatek: vcodec: support stateless VP9 decoding

2022-04-07 Thread Nicolas Dufresne
On Wednesday, 6 April 2022 at 15:23 -0400, Nicolas Dufresne wrote:
> Hi Yunfei,
> 
> On Thursday, 31 March 2022 at 10:48 +0800, Yunfei Dong wrote:
> > Add support for VP9 decoding using the stateless API,
> > as supported by MT8192. And the drivers is lat and core architecture.
> > 
> > Signed-off-by: George Sun 
> > Signed-off-by: Xiaoyong Lu 
> > Signed-off-by: Yunfei Dong 
> > Reviewed-by: AngeloGioacchino Del Regno 
> > 
> 
> Reviewed-by should be dropped when large rework happens. In this case, the
> probability update has been rewritten to use the common code (thanks for
> porting it). Unfortunately, running fluster tests shows a massive regression
> (was 275/303 before):
> 
>Ran 34/303 tests successfully
> 
> H.264 (91/135) and VP9 (59/61) are the same as before. Any idea? What were
> your test results?

Build warnings were badly fixed in my tree. I'll comment inline, but everything
was caught by the CI; a v9 will be needed to finish cleaning up the build and
doc warnings. Note that Xiaoyong Lu also had crop info reading; I don't know if
this is needed.

> 
> > ---
> > changed compare with v7:
> > Using upstream interface to update vp9 prob tables.
> > ---
> >  .../media/platform/mediatek/vcodec/Makefile   |1 +
> >  .../vcodec/mtk_vcodec_dec_stateless.c |   26 +-
> >  .../platform/mediatek/vcodec/mtk_vcodec_drv.h |1 +
> >  .../vcodec/vdec/vdec_vp9_req_lat_if.c | 2072 +
> >  .../platform/mediatek/vcodec/vdec_drv_if.c|4 +
> >  .../platform/mediatek/vcodec/vdec_drv_if.h|1 +
> >  6 files changed, 2102 insertions(+), 3 deletions(-)
> >  create mode 100644 
> > drivers/media/platform/mediatek/vcodec/vdec/vdec_vp9_req_lat_if.c
> > 
> > diff --git a/drivers/media/platform/mediatek/vcodec/Makefile 
> > b/drivers/media/platform/mediatek/vcodec/Makefile
> > index b457daf2d196..93e7a343b5b0 100644
> > --- a/drivers/media/platform/mediatek/vcodec/Makefile
> > +++ b/drivers/media/platform/mediatek/vcodec/Makefile
> > @@ -9,6 +9,7 @@ mtk-vcodec-dec-y := vdec/vdec_h264_if.o \
> > vdec/vdec_vp8_if.o \
> > vdec/vdec_vp8_req_if.o \
> > vdec/vdec_vp9_if.o \
> > +   vdec/vdec_vp9_req_lat_if.o \
> > vdec/vdec_h264_req_if.o \
> > vdec/vdec_h264_req_common.o \
> > vdec/vdec_h264_req_multi_if.o \
> > diff --git 
> > a/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec_stateless.c 
> > b/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec_stateless.c
> > index 3208f834ff80..a4735e67d39e 100644
> > --- a/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec_stateless.c
> > +++ b/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec_stateless.c
> > @@ -91,13 +91,28 @@ static const struct mtk_stateless_control 
> > mtk_stateless_controls[] = {
> > .max = V4L2_MPEG_VIDEO_VP8_PROFILE_3,
> > },
> > .codec_type = V4L2_PIX_FMT_VP8_FRAME,
> > -   }
> > +   },
> > +   {
> > +   .cfg = {
> > +   .id = V4L2_CID_STATELESS_VP9_FRAME,
> > +   },
> > +   .codec_type = V4L2_PIX_FMT_VP9_FRAME,
> > +   },
> > +   {
> > +   .cfg = {
> > +   .id = V4L2_CID_MPEG_VIDEO_VP9_PROFILE,
> > +   .min = V4L2_MPEG_VIDEO_VP9_PROFILE_0,
> > +   .def = V4L2_MPEG_VIDEO_VP9_PROFILE_0,
> > +   .max = V4L2_MPEG_VIDEO_VP9_PROFILE_3,
> > +   },
> > +   .codec_type = V4L2_PIX_FMT_VP9_FRAME,
> > +   },
> >  };
> >  
> >  #define NUM_CTRLS ARRAY_SIZE(mtk_stateless_controls)
> >  
> > -static struct mtk_video_fmt mtk_video_formats[4];
> > -static struct mtk_codec_framesizes mtk_vdec_framesizes[2];
> > +static struct mtk_video_fmt mtk_video_formats[5];
> > +static struct mtk_codec_framesizes mtk_vdec_framesizes[3];
> >  
> >  static struct mtk_video_fmt default_out_format;
> >  static struct mtk_video_fmt default_cap_format;
> > @@ -338,6 +353,7 @@ static void mtk_vcodec_add_formats(unsigned int fourcc,
> > switch (fourcc) {
> > case V4L2_PIX_FMT_H264_SLICE:
> > case V4L2_PIX_FMT_VP8_FRAME:
> > +   case V4L2_PIX_FMT_VP9_FRAME:
> > mtk_video_formats[count_formats].fourcc = fourcc;
> > mtk_video_formats[count_formats].type = MTK_FMT_DEC;
> > mtk_video_formats[count_formats].num_planes = 1;
> > @@ -385,6 +401,10 @@ static void mtk_vcodec_get_supported_formats(struct 
> >

Re: [PATCH v8, 16/17] media: mediatek: vcodec: support stateless VP9 decoding

2022-04-07 Thread Nicolas Dufresne
On Thursday, 31 March 2022 at 10:48 +0800, Yunfei Dong wrote:
> Add support for VP9 decoding using the stateless API,
> as supported by MT8192. And the drivers is lat and core architecture.
> 
> Signed-off-by: George Sun 
> Signed-off-by: Xiaoyong Lu 
> Signed-off-by: Yunfei Dong 
> Reviewed-by: AngeloGioacchino Del Regno 
> 
> ---
> changed compare with v7:
> Using upstream interface to update vp9 prob tables.
> ---
>  .../media/platform/mediatek/vcodec/Makefile   |1 +
>  .../vcodec/mtk_vcodec_dec_stateless.c |   26 +-
>  .../platform/mediatek/vcodec/mtk_vcodec_drv.h |1 +
>  .../vcodec/vdec/vdec_vp9_req_lat_if.c | 2072 +
>  .../platform/mediatek/vcodec/vdec_drv_if.c|4 +
>  .../platform/mediatek/vcodec/vdec_drv_if.h|1 +
>  6 files changed, 2102 insertions(+), 3 deletions(-)
>  create mode 100644 
> drivers/media/platform/mediatek/vcodec/vdec/vdec_vp9_req_lat_if.c
> 
> diff --git a/drivers/media/platform/mediatek/vcodec/Makefile 
> b/drivers/media/platform/mediatek/vcodec/Makefile
> index b457daf2d196..93e7a343b5b0 100644
> --- a/drivers/media/platform/mediatek/vcodec/Makefile
> +++ b/drivers/media/platform/mediatek/vcodec/Makefile
> @@ -9,6 +9,7 @@ mtk-vcodec-dec-y := vdec/vdec_h264_if.o \
>   vdec/vdec_vp8_if.o \
>   vdec/vdec_vp8_req_if.o \
>   vdec/vdec_vp9_if.o \
> + vdec/vdec_vp9_req_lat_if.o \
>   vdec/vdec_h264_req_if.o \
>   vdec/vdec_h264_req_common.o \
>   vdec/vdec_h264_req_multi_if.o \
> diff --git 
> a/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec_stateless.c 
> b/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec_stateless.c
> index 3208f834ff80..a4735e67d39e 100644
> --- a/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec_stateless.c
> +++ b/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec_stateless.c
> @@ -91,13 +91,28 @@ static const struct mtk_stateless_control 
> mtk_stateless_controls[] = {
>   .max = V4L2_MPEG_VIDEO_VP8_PROFILE_3,
>   },
>   .codec_type = V4L2_PIX_FMT_VP8_FRAME,
> - }
> + },
> + {
> + .cfg = {
> + .id = V4L2_CID_STATELESS_VP9_FRAME,
> + },
> + .codec_type = V4L2_PIX_FMT_VP9_FRAME,
> + },
> + {
> + .cfg = {
> + .id = V4L2_CID_MPEG_VIDEO_VP9_PROFILE,
> + .min = V4L2_MPEG_VIDEO_VP9_PROFILE_0,
> + .def = V4L2_MPEG_VIDEO_VP9_PROFILE_0,
> + .max = V4L2_MPEG_VIDEO_VP9_PROFILE_3,
> + },
> + .codec_type = V4L2_PIX_FMT_VP9_FRAME,
> + },
>  };
>  
>  #define NUM_CTRLS ARRAY_SIZE(mtk_stateless_controls)
>  
> -static struct mtk_video_fmt mtk_video_formats[4];
> -static struct mtk_codec_framesizes mtk_vdec_framesizes[2];
> +static struct mtk_video_fmt mtk_video_formats[5];
> +static struct mtk_codec_framesizes mtk_vdec_framesizes[3];
>  
>  static struct mtk_video_fmt default_out_format;
>  static struct mtk_video_fmt default_cap_format;
> @@ -338,6 +353,7 @@ static void mtk_vcodec_add_formats(unsigned int fourcc,
>   switch (fourcc) {
>   case V4L2_PIX_FMT_H264_SLICE:
>   case V4L2_PIX_FMT_VP8_FRAME:
> + case V4L2_PIX_FMT_VP9_FRAME:
>   mtk_video_formats[count_formats].fourcc = fourcc;
>   mtk_video_formats[count_formats].type = MTK_FMT_DEC;
>   mtk_video_formats[count_formats].num_planes = 1;
> @@ -385,6 +401,10 @@ static void mtk_vcodec_get_supported_formats(struct 
> mtk_vcodec_ctx *ctx)
>   mtk_vcodec_add_formats(V4L2_PIX_FMT_VP8_FRAME, ctx);
>   out_format_count++;
>   }
> + if (ctx->dev->dec_capability & MTK_VDEC_FORMAT_VP9_FRAME) {
> + mtk_vcodec_add_formats(V4L2_PIX_FMT_VP9_FRAME, ctx);
> + out_format_count++;
> + }
>  
>   if (cap_format_count)
>   default_cap_format = mtk_video_formats[cap_format_count - 1];
> diff --git a/drivers/media/platform/mediatek/vcodec/mtk_vcodec_drv.h 
> b/drivers/media/platform/mediatek/vcodec/mtk_vcodec_drv.h
> index 2ba1c19f07b6..a29041a0b7e0 100644
> --- a/drivers/media/platform/mediatek/vcodec/mtk_vcodec_drv.h
> +++ b/drivers/media/platform/mediatek/vcodec/mtk_vcodec_drv.h
> @@ -355,6 +355,7 @@ enum mtk_vdec_format_types {
>   MTK_VDEC_FORMAT_MT21C = 0x40,
>   MTK_VDEC_FORMAT_H264_SLICE = 0x100,
>   MTK_VDEC_FORMAT_VP8_FRAME = 0x200,
> + MTK_VDEC_FORMAT_VP9_FRAME = 0x400,
>  };
>  
>  /**
> diff --git 
> a/drivers/media/platform/mediatek/vcodec/vdec/vdec_vp9_req_lat_if.c 
> b/drivers/media/platform/mediatek/vcodec/vdec/vdec_vp9_req_lat_if.c
> new file mode 100644
> index ..d63399085b9b
> --- /dev/null
> +++ b/drivers/media/platform/mediatek/vcodec/vdec/vdec_vp9_req_lat_if.c
> @@ -0,0 +1,2072 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2021 MediaTek Inc.
> + * Author: 

Re: [PATCH v8, 16/17] media: mediatek: vcodec: support stateless VP9 decoding

2022-04-06 Thread Nicolas Dufresne
Hi Yunfei,

On Thursday, 31 March 2022 at 10:48 +0800, Yunfei Dong wrote:
> Add support for VP9 decoding using the stateless API,
> as supported by MT8192. And the drivers is lat and core architecture.
> 
> Signed-off-by: George Sun 
> Signed-off-by: Xiaoyong Lu 
> Signed-off-by: Yunfei Dong 
> Reviewed-by: AngeloGioacchino Del Regno 
> 

Reviewed-by should be dropped when large rework happens. In this case, the
probability update has been rewritten to use the common code (thanks for
porting it). Unfortunately, running fluster tests shows a massive regression
(was 275/303 before):

   Ran 34/303 tests successfully

H.264 (91/135) and VP9 (59/61) are the same as before. Any idea? What were your
test results?

> ---
> changed compare with v7:
> Using upstream interface to update vp9 prob tables.
> ---
>  .../media/platform/mediatek/vcodec/Makefile   |1 +
>  .../vcodec/mtk_vcodec_dec_stateless.c |   26 +-
>  .../platform/mediatek/vcodec/mtk_vcodec_drv.h |1 +
>  .../vcodec/vdec/vdec_vp9_req_lat_if.c | 2072 +
>  .../platform/mediatek/vcodec/vdec_drv_if.c|4 +
>  .../platform/mediatek/vcodec/vdec_drv_if.h|1 +
>  6 files changed, 2102 insertions(+), 3 deletions(-)
>  create mode 100644 
> drivers/media/platform/mediatek/vcodec/vdec/vdec_vp9_req_lat_if.c
> 
> diff --git a/drivers/media/platform/mediatek/vcodec/Makefile 
> b/drivers/media/platform/mediatek/vcodec/Makefile
> index b457daf2d196..93e7a343b5b0 100644
> --- a/drivers/media/platform/mediatek/vcodec/Makefile
> +++ b/drivers/media/platform/mediatek/vcodec/Makefile
> @@ -9,6 +9,7 @@ mtk-vcodec-dec-y := vdec/vdec_h264_if.o \
>   vdec/vdec_vp8_if.o \
>   vdec/vdec_vp8_req_if.o \
>   vdec/vdec_vp9_if.o \
> + vdec/vdec_vp9_req_lat_if.o \
>   vdec/vdec_h264_req_if.o \
>   vdec/vdec_h264_req_common.o \
>   vdec/vdec_h264_req_multi_if.o \
> diff --git 
> a/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec_stateless.c 
> b/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec_stateless.c
> index 3208f834ff80..a4735e67d39e 100644
> --- a/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec_stateless.c
> +++ b/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec_stateless.c
> @@ -91,13 +91,28 @@ static const struct mtk_stateless_control 
> mtk_stateless_controls[] = {
>   .max = V4L2_MPEG_VIDEO_VP8_PROFILE_3,
>   },
>   .codec_type = V4L2_PIX_FMT_VP8_FRAME,
> - }
> + },
> + {
> + .cfg = {
> + .id = V4L2_CID_STATELESS_VP9_FRAME,
> + },
> + .codec_type = V4L2_PIX_FMT_VP9_FRAME,
> + },
> + {
> + .cfg = {
> + .id = V4L2_CID_MPEG_VIDEO_VP9_PROFILE,
> + .min = V4L2_MPEG_VIDEO_VP9_PROFILE_0,
> + .def = V4L2_MPEG_VIDEO_VP9_PROFILE_0,
> + .max = V4L2_MPEG_VIDEO_VP9_PROFILE_3,
> + },
> + .codec_type = V4L2_PIX_FMT_VP9_FRAME,
> + },
>  };
>  
>  #define NUM_CTRLS ARRAY_SIZE(mtk_stateless_controls)
>  
> -static struct mtk_video_fmt mtk_video_formats[4];
> -static struct mtk_codec_framesizes mtk_vdec_framesizes[2];
> +static struct mtk_video_fmt mtk_video_formats[5];
> +static struct mtk_codec_framesizes mtk_vdec_framesizes[3];
>  
>  static struct mtk_video_fmt default_out_format;
>  static struct mtk_video_fmt default_cap_format;
> @@ -338,6 +353,7 @@ static void mtk_vcodec_add_formats(unsigned int fourcc,
>   switch (fourcc) {
>   case V4L2_PIX_FMT_H264_SLICE:
>   case V4L2_PIX_FMT_VP8_FRAME:
> + case V4L2_PIX_FMT_VP9_FRAME:
>   mtk_video_formats[count_formats].fourcc = fourcc;
>   mtk_video_formats[count_formats].type = MTK_FMT_DEC;
>   mtk_video_formats[count_formats].num_planes = 1;
> @@ -385,6 +401,10 @@ static void mtk_vcodec_get_supported_formats(struct 
> mtk_vcodec_ctx *ctx)
>   mtk_vcodec_add_formats(V4L2_PIX_FMT_VP8_FRAME, ctx);
>   out_format_count++;
>   }
> + if (ctx->dev->dec_capability & MTK_VDEC_FORMAT_VP9_FRAME) {
> + mtk_vcodec_add_formats(V4L2_PIX_FMT_VP9_FRAME, ctx);
> + out_format_count++;
> + }
>  
>   if (cap_format_count)
>   default_cap_format = mtk_video_formats[cap_format_count - 1];
> diff --git a/drivers/media/platform/mediatek/vcodec/mtk_vcodec_drv.h 
> b/drivers/media/platform/mediatek/vcodec/mtk_vcodec_drv.h
> index 2ba1c19f07b6..a29041a0b7e0 100644
> --- a/drivers/media/platform/mediatek/vcodec/mtk_vcodec_drv.h
> +++ b/drivers/media/platform/mediatek/vcodec/mtk_vcodec_drv.h
> @@ -355,6 +355,7 @@ enum mtk_vdec_format_types {
>   MTK_VDEC_FORMAT_MT21C = 0x40,
>   MTK_VDEC_FORMAT_H264_SLICE = 0x100,
>   MTK_VDEC_FORMAT_VP8_FRAME = 0x200,
> + MTK_VDEC_FORMAT_VP9_FRAME = 0x400,
>  };
>  
>  /**
> diff --git 
> 

Re: [PATCH] media: mediatek: vcodec: Fix v4l2 compliance decoder cmd test fail

2022-04-06 Thread Nicolas Dufresne
On Wednesday, 6 April 2022 at 09:20 +0800, Yunfei Dong wrote:
> Will return -EINVAL using standard framework api when test stateless
> decoder with cmd VIDIOC_(TRY)DECODER_CMD.
> 
> Using another return value to adjust v4l2 compliance test for user
> driver(GStreamer/Chrome) won't use decoder cmd.
> 
> Fixes: 8cdc3794b2e3 ("media: mtk-vcodec: vdec: support stateless API")
> Signed-off-by: Yunfei Dong 
> Reviewed-by: AngeloGioacchino Del Regno 
> 

Acked-by: Nicolas Dufresne 

> ---
> changes compared with v2:
> - add reviewed-by tag
> changes compared with v1:
> - add Fixes: tag
> ---
>  drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec.c 
> b/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec.c
> index 3859e4c651c6..69b0e797d342 100644
> --- a/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec.c
> +++ b/drivers/media/platform/mediatek/vcodec/mtk_vcodec_dec.c
> @@ -51,8 +51,7 @@ static int vidioc_try_decoder_cmd(struct file *file, void 
> *priv,
>  
>   /* Use M2M stateless helper if relevant */
>   if (ctx->dev->vdec_pdata->uses_stateless_api)
> - return v4l2_m2m_ioctl_stateless_try_decoder_cmd(file, priv,
> - cmd);
> + return -ENOTTY;
>   else
>   return v4l2_m2m_ioctl_try_decoder_cmd(file, priv, cmd);
>  }



Re: [PATCH v8, 00/15] media: mtk-vcodec: support for M8192 decoder

2022-03-31 Thread Nicolas Dufresne
Hi Yunfei,

thanks for the update, I should be testing this really soon.

On Thursday, 31 March 2022 at 10:47 +0800, Yunfei Dong wrote:
> This series adds support for mt8192 h264/vp8/vp9 decoder drivers. Firstly, 
> refactor
> power/clock/interrupt interfaces for mt8192 is lat and core architecture.

Similarly to MT8173 and MT8183, a shared* firmware is needed for this CODEC to
work (scp.img). I looked into linux-firmware[1] and it has not been added for
mt8192 yet. As your patches are getting close to being ready, it would be
important to look into this so the patchset does not get blocked due to that.

best regards,
Nicolas

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/mediatek
* Shared at least between MDP3 and MTK VCODEC from my knowledge

> 
> Secondly, add new functions to get frame buffer size and resolution according
> to decoder capability from scp side. Then add callback function to get/put
> capture buffer in order to enable lat and core decoder in parallel, need to
> adjust GStreamer at the same time. 
> 
> Then add to support MT21C compressed mode and fix v4l2-compliance fail.
> 
> Next, extract H264 request api driver to let mt8183 and mt8192 use the same
> code, and adds mt8192 frame based h264 driver for stateless decoder.
> 
> Lastly, add vp8 and vp9 stateless decoder drivers.
> 
> Patches 1 refactor power/clock/interrupt interface.
> Patches 2~4 get frame buffer size and resolution according to decoder 
> capability.
> Patches 5 set capture queue bytesused.
> Patches 6 adjust GStreamer.
> Patch 7~11 add to support MT21C compressed mode and fix v4l2-compliance fail.
> patch 12 record capture queue format type.
> Patch 13~14 extract h264 driver and add mt8192 frame based driver for h264 
> decoder.
> Patch 15~16 add vp8 and vp9 stateless decoder drivers.
> Patch 17 prevent kernel crash when rmmod mtk-vcodec-dec.ko
> ---
> changes compared with v6:
> - adjust GStreamer, separate src buffer done with v4l2_ctrl_request_complete 
> for patch 6.
> - remove v4l2_m2m_set_dst_buffered.
> - add new patch to set each plane bytesused in buf prepare for patch 5.
> - using upstream interface to update vp9 prob tables for patch 16.
> - fix maintainer comments.
> - test the driver with chrome VD and GStreamer(H264/VP9/VP8/AV1).
> changes compared with v6:
> - rebase to the latest media stage and fix conficts
> - fix memcpy to memcpy_fromio or memcpy_toio
> - fix h264 crash when test field bitstream
> changes compared with v5:
> - fix vp9 comments for patch 15
> - fix vp8 comments for patch 14.
> - fix comments for patch 12.
> - fix build errors.
> changes compared with v4:
> - fix checkpatch.pl fail.
> - fix kernel-doc fail.
> - rebase to the latest media codec driver.
> changes compared with v3:
> - remove enum mtk_chip for patch 2.
> - add vp8 stateless decoder drivers for patch 14.
> - add vp9 stateless decoder drivers for patch 15.
> changes compared with v2:
> - add new patch 11 to record capture queue format type.
> - separate patch 4 according to tzung-bi's suggestion.
> - re-write commit message for patch 5 according to tzung-bi's suggestion.
> changes compared with v1:
> - rewrite commit message for patch 12.
> - rewrite cover-letter message.
> ---
> Yunfei Dong (17):
>   media: mediatek: vcodec: Add vdec enable/disable hardware helpers
>   media: mediatek: vcodec: Using firmware type to separate different
> firmware architecture
>   media: mediatek: vcodec: get capture queue buffer size from scp
>   media: mediatek: vcodec: Read max resolution from dec_capability
>   media: mediatek: vcodec: set each plane bytesused in buf prepare
>   media: mediatek: vcodec: Refactor get and put capture buffer flow
>   media: mediatek: vcodec: Refactor supported vdec formats and
> framesizes
>   media: mediatek: vcodec: Getting supported decoder format types
>   media: mediatek: vcodec: Add format to support MT21C
>   media: mediatek: vcodec: disable vp8 4K capability
>   media: mediatek: vcodec: Fix v4l2-compliance fail
>   media: mediatek: vcodec: record capture queue format type
>   media: mediatek: vcodec: Extract H264 common code
>   media: mediatek: vcodec: support stateless H.264 decoding for mt8192
>   media: mediatek: vcodec: support stateless VP8 decoding
>   media: mediatek: vcodec: support stateless VP9 decoding
>   media: mediatek: vcodec: prevent kernel crash when rmmod
> mtk-vcodec-dec.ko
> 
>  .../media/platform/mediatek/vcodec/Makefile   |4 +
>  .../platform/mediatek/vcodec/mtk_vcodec_dec.c |   62 +-
>  .../mediatek/vcodec/mtk_vcodec_dec_drv.c  |8 +-
>  .../mediatek/vcodec/mtk_vcodec_dec_pm.c   |  166 +-
>  .../mediatek/vcodec/mtk_vcodec_dec_pm.h   |6 +-
>  .../mediatek/vcodec/mtk_vcodec_dec_stateful.c |   19 +-
>  .../vcodec/mtk_vcodec_dec_stateless.c |  257 +-
>  .../platform/mediatek/vcodec/mtk_vcodec_drv.h |   41 +-
>  .../mediatek/vcodec/mtk_vcodec_enc_drv.c  |5 -
>  

Re: [PATCH v7, 15/15] media: mtk-vcodec: support stateless VP9 decoding

2022-03-01 Thread Nicolas Dufresne
On Wednesday, 23 February 2022 at 11:40 +0800, Yunfei Dong wrote:
> Add support for VP9 decoding using the stateless API,
> as supported by MT8192. And the drivers is lat and core architecture.

You already have a Reviewed-by tag, but I'm under the impression that there is a
fair amount of duplication with the helper library v4l2-vp9:

  include/media/v4l2-vp9.h
  drivers/media/v4l2-core/v4l2-vp9.c

Can you at least give it a look and comment on why you can't use/adapt it for
this driver ?

> 
> Signed-off-by: Yunfei Dong 
> Signed-off-by: George Sun 
> Reviewed-by: AngeloGioacchino Del Regno 
> 
> ---
>  drivers/media/platform/mtk-vcodec/Makefile|1 +
>  .../mtk-vcodec/mtk_vcodec_dec_stateless.c |   26 +-
>  .../platform/mtk-vcodec/mtk_vcodec_drv.h  |1 +
>  .../mtk-vcodec/vdec/vdec_vp9_req_lat_if.c | 1971 +
>  .../media/platform/mtk-vcodec/vdec_drv_if.c   |4 +
>  .../media/platform/mtk-vcodec/vdec_drv_if.h   |1 +
>  6 files changed, 2001 insertions(+), 3 deletions(-)
>  create mode 100644 
> drivers/media/platform/mtk-vcodec/vdec/vdec_vp9_req_lat_if.c
> 
> diff --git a/drivers/media/platform/mtk-vcodec/Makefile 
> b/drivers/media/platform/mtk-vcodec/Makefile
> index b457daf2d196..93e7a343b5b0 100644
> --- a/drivers/media/platform/mtk-vcodec/Makefile
> +++ b/drivers/media/platform/mtk-vcodec/Makefile
> @@ -9,6 +9,7 @@ mtk-vcodec-dec-y := vdec/vdec_h264_if.o \
>   vdec/vdec_vp8_if.o \
>   vdec/vdec_vp8_req_if.o \
>   vdec/vdec_vp9_if.o \
> + vdec/vdec_vp9_req_lat_if.o \
>   vdec/vdec_h264_req_if.o \
>   vdec/vdec_h264_req_common.o \
>   vdec/vdec_h264_req_multi_if.o \
> diff --git a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateless.c 
> b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateless.c
> index 2a0164ddc708..3770e8117488 100644
> --- a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateless.c
> +++ b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateless.c
> @@ -91,13 +91,28 @@ static const struct mtk_stateless_control 
> mtk_stateless_controls[] = {
>   .max = V4L2_MPEG_VIDEO_VP8_PROFILE_3,
>   },
>   .codec_type = V4L2_PIX_FMT_VP8_FRAME,
> - }
> + },
> + {
> + .cfg = {
> + .id = V4L2_CID_STATELESS_VP9_FRAME,
> + },
> + .codec_type = V4L2_PIX_FMT_VP9_FRAME,
> + },
> + {
> + .cfg = {
> + .id = V4L2_CID_MPEG_VIDEO_VP9_PROFILE,
> + .min = V4L2_MPEG_VIDEO_VP9_PROFILE_0,
> + .def = V4L2_MPEG_VIDEO_VP9_PROFILE_0,
> + .max = V4L2_MPEG_VIDEO_VP9_PROFILE_3,
> + },
> + .codec_type = V4L2_PIX_FMT_VP9_FRAME,
> + },
>  };
>  
>  #define NUM_CTRLS ARRAY_SIZE(mtk_stateless_controls)
>  
> -static struct mtk_video_fmt mtk_video_formats[4];
> -static struct mtk_codec_framesizes mtk_vdec_framesizes[2];
> +static struct mtk_video_fmt mtk_video_formats[5];
> +static struct mtk_codec_framesizes mtk_vdec_framesizes[3];
>  
>  static struct mtk_video_fmt default_out_format;
>  static struct mtk_video_fmt default_cap_format;
> @@ -366,6 +381,7 @@ static void mtk_vcodec_add_formats(unsigned int fourcc,
>   switch (fourcc) {
>   case V4L2_PIX_FMT_H264_SLICE:
>   case V4L2_PIX_FMT_VP8_FRAME:
> + case V4L2_PIX_FMT_VP9_FRAME:
>   mtk_video_formats[count_formats].fourcc = fourcc;
>   mtk_video_formats[count_formats].type = MTK_FMT_DEC;
>   mtk_video_formats[count_formats].num_planes = 1;
> @@ -413,6 +429,10 @@ static void mtk_vcodec_get_supported_formats(struct 
> mtk_vcodec_ctx *ctx)
>   mtk_vcodec_add_formats(V4L2_PIX_FMT_VP8_FRAME, ctx);
>   out_format_count++;
>   }
> + if (ctx->dev->dec_capability & MTK_VDEC_FORMAT_VP9_FRAME) {
> + mtk_vcodec_add_formats(V4L2_PIX_FMT_VP9_FRAME, ctx);
> + out_format_count++;
> + }
>  
>   if (cap_format_count)
>   default_cap_format = mtk_video_formats[cap_format_count - 1];
> diff --git a/drivers/media/platform/mtk-vcodec/mtk_vcodec_drv.h 
> b/drivers/media/platform/mtk-vcodec/mtk_vcodec_drv.h
> index c68297db225e..ea58f11e7659 100644
> --- a/drivers/media/platform/mtk-vcodec/mtk_vcodec_drv.h
> +++ b/drivers/media/platform/mtk-vcodec/mtk_vcodec_drv.h
> @@ -355,6 +355,7 @@ enum mtk_vdec_format_types {
>   MTK_VDEC_FORMAT_MT21C = 0x40,
>   MTK_VDEC_FORMAT_H264_SLICE = 0x100,
>   MTK_VDEC_FORMAT_VP8_FRAME = 0x200,
> + MTK_VDEC_FORMAT_VP9_FRAME = 0x400,
>  };
>  
>  /**
> diff --git a/drivers/media/platform/mtk-vcodec/vdec/vdec_vp9_req_lat_if.c 
> b/drivers/media/platform/mtk-vcodec/vdec/vdec_vp9_req_lat_if.c
> new file mode 100644
> index ..c678170c7ca3
> --- /dev/null
> +++ b/drivers/media/platform/mtk-vcodec/vdec/vdec_vp9_req_lat_if.c
> @@ -0,0 +1,1971 @@
> 

Re: [PATCH v7, 14/15] media: mtk-vcodec: support stateless VP8 decoding

2022-03-01 Thread Nicolas Dufresne
Thanks for this work.

On Wednesday, 23 February 2022 at 11:40 +0800, Yunfei Dong wrote:
> Add support for VP8 decoding using the stateless API,
> as supported by MT8192.

With the struct member naming made consistent, and even though I would like
your patch better if it were not duplicating so much code, I'll give you my:

Reviewed-by: Nicolas Dufresne 

> 
> Signed-off-by: Yunfei Dong 
> ---
>  drivers/media/platform/mtk-vcodec/Makefile|   1 +
>  .../mtk-vcodec/mtk_vcodec_dec_stateless.c |  24 +-
>  .../platform/mtk-vcodec/mtk_vcodec_drv.h  |   1 +
>  .../mtk-vcodec/vdec/vdec_vp8_req_if.c | 445 ++
>  .../media/platform/mtk-vcodec/vdec_drv_if.c   |   4 +
>  .../media/platform/mtk-vcodec/vdec_drv_if.h   |   1 +
>  6 files changed, 474 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/media/platform/mtk-vcodec/vdec/vdec_vp8_req_if.c
> 
> diff --git a/drivers/media/platform/mtk-vcodec/Makefile 
> b/drivers/media/platform/mtk-vcodec/Makefile
> index 22edb1c86598..b457daf2d196 100644
> --- a/drivers/media/platform/mtk-vcodec/Makefile
> +++ b/drivers/media/platform/mtk-vcodec/Makefile
> @@ -7,6 +7,7 @@ obj-$(CONFIG_VIDEO_MEDIATEK_VCODEC) += mtk-vcodec-dec.o \
>  
>  mtk-vcodec-dec-y := vdec/vdec_h264_if.o \
>   vdec/vdec_vp8_if.o \
> + vdec/vdec_vp8_req_if.o \
>   vdec/vdec_vp9_if.o \
>   vdec/vdec_h264_req_if.o \
>   vdec/vdec_h264_req_common.o \
> diff --git a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateless.c 
> b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateless.c
> index 9333e3418b98..2a0164ddc708 100644
> --- a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateless.c
> +++ b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateless.c
> @@ -76,13 +76,28 @@ static const struct mtk_stateless_control 
> mtk_stateless_controls[] = {
>   .max = V4L2_STATELESS_H264_START_CODE_ANNEX_B,
>   },
>   .codec_type = V4L2_PIX_FMT_H264_SLICE,
> + },
> + {
> + .cfg = {
> + .id = V4L2_CID_STATELESS_VP8_FRAME,
> + },
> + .codec_type = V4L2_PIX_FMT_VP8_FRAME,
> + },
> + {
> + .cfg = {
> + .id = V4L2_CID_MPEG_VIDEO_VP8_PROFILE,
> + .min = V4L2_MPEG_VIDEO_VP8_PROFILE_0,
> + .def = V4L2_MPEG_VIDEO_VP8_PROFILE_0,
> + .max = V4L2_MPEG_VIDEO_VP8_PROFILE_3,
> + },
> + .codec_type = V4L2_PIX_FMT_VP8_FRAME,
>   }
>  };
>  
>  #define NUM_CTRLS ARRAY_SIZE(mtk_stateless_controls)
>  
> -static struct mtk_video_fmt mtk_video_formats[3];
> -static struct mtk_codec_framesizes mtk_vdec_framesizes[1];
> +static struct mtk_video_fmt mtk_video_formats[4];
> +static struct mtk_codec_framesizes mtk_vdec_framesizes[2];
>  
>  static struct mtk_video_fmt default_out_format;
>  static struct mtk_video_fmt default_cap_format;
> @@ -350,6 +365,7 @@ static void mtk_vcodec_add_formats(unsigned int fourcc,
>  
>   switch (fourcc) {
>   case V4L2_PIX_FMT_H264_SLICE:
> + case V4L2_PIX_FMT_VP8_FRAME:
>   mtk_video_formats[count_formats].fourcc = fourcc;
>   mtk_video_formats[count_formats].type = MTK_FMT_DEC;
>   mtk_video_formats[count_formats].num_planes = 1;
> @@ -393,6 +409,10 @@ static void mtk_vcodec_get_supported_formats(struct 
> mtk_vcodec_ctx *ctx)
>   mtk_vcodec_add_formats(V4L2_PIX_FMT_H264_SLICE, ctx);
>   out_format_count++;
>   }
> + if (ctx->dev->dec_capability & MTK_VDEC_FORMAT_VP8_FRAME) {
> + mtk_vcodec_add_formats(V4L2_PIX_FMT_VP8_FRAME, ctx);
> + out_format_count++;
> + }
>  
>   if (cap_format_count)
>   default_cap_format = mtk_video_formats[cap_format_count - 1];
> diff --git a/drivers/media/platform/mtk-vcodec/mtk_vcodec_drv.h 
> b/drivers/media/platform/mtk-vcodec/mtk_vcodec_drv.h
> index d60561065656..c68297db225e 100644
> --- a/drivers/media/platform/mtk-vcodec/mtk_vcodec_drv.h
> +++ b/drivers/media/platform/mtk-vcodec/mtk_vcodec_drv.h
> @@ -354,6 +354,7 @@ enum mtk_vdec_format_types {
>   MTK_VDEC_FORMAT_MM21 = 0x20,
>   MTK_VDEC_FORMAT_MT21C = 0x40,
>   MTK_VDEC_FORMAT_H264_SLICE = 0x100,
> + MTK_VDEC_FORMAT_VP8_FRAME = 0x200,
>  };
>  
>  /**
> diff --git a/drivers/media/platform/mtk-vcodec/vdec/vdec_vp8_req_if.c 
> b/drivers/media/platform/mtk-vcodec/vdec/vdec_vp8_req_if.c
> new file mode 100644
> index ..6bd4f2365826
> --- /dev/null
> +++ b/drivers/media/platform/mtk-vcodec/vdec/vdec_vp8_req

Re: [PATCH v7, 13/15] media: mtk-vcodec: support stateless H.264 decoding for mt8192

2022-03-01 Thread Nicolas Dufresne
On Wednesday, 23 February 2022 at 11:40 +0800, Yunfei Dong wrote:
> Adds h264 lat and core architecture driver for mt8192,
> and the decode mode is frame based for stateless decoder.
> 
> Signed-off-by: Yunfei Dong 
> ---
>  drivers/media/platform/mtk-vcodec/Makefile|   1 +
>  .../mtk-vcodec/vdec/vdec_h264_req_multi_if.c  | 621 ++
>  .../media/platform/mtk-vcodec/vdec_drv_if.c   |   8 +-
>  .../media/platform/mtk-vcodec/vdec_drv_if.h   |   1 +
>  include/linux/remoteproc/mtk_scp.h|   2 +
>  5 files changed, 632 insertions(+), 1 deletion(-)
>  create mode 100644 
> drivers/media/platform/mtk-vcodec/vdec/vdec_h264_req_multi_if.c
> 
> diff --git a/drivers/media/platform/mtk-vcodec/Makefile 
> b/drivers/media/platform/mtk-vcodec/Makefile
> index 3f41d748eee5..22edb1c86598 100644
> --- a/drivers/media/platform/mtk-vcodec/Makefile
> +++ b/drivers/media/platform/mtk-vcodec/Makefile
> @@ -10,6 +10,7 @@ mtk-vcodec-dec-y := vdec/vdec_h264_if.o \
>   vdec/vdec_vp9_if.o \
>   vdec/vdec_h264_req_if.o \
>   vdec/vdec_h264_req_common.o \
> + vdec/vdec_h264_req_multi_if.o \
>   mtk_vcodec_dec_drv.o \
>   vdec_drv_if.o \
>   vdec_vpu_if.o \
> diff --git a/drivers/media/platform/mtk-vcodec/vdec/vdec_h264_req_multi_if.c 
> b/drivers/media/platform/mtk-vcodec/vdec/vdec_h264_req_multi_if.c
> new file mode 100644
> index ..82a279f327c4
> --- /dev/null
> +++ b/drivers/media/platform/mtk-vcodec/vdec/vdec_h264_req_multi_if.c
> @@ -0,0 +1,621 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2021 MediaTek Inc.
> + * Author: Yunfei Dong 
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "../mtk_vcodec_util.h"
> +#include "../mtk_vcodec_dec.h"
> +#include "../mtk_vcodec_intr.h"
> +#include "../vdec_drv_base.h"
> +#include "../vdec_drv_if.h"
> +#include "../vdec_vpu_if.h"
> +#include "vdec_h264_req_common.h"
> +
> +/**
> + * enum vdec_h264_core_dec_err_type  - core decode error type

Similar to my comment on the other patch, I notice that an empty line is added here
in other doc comments. To be applied everywhere, of course.

> + * @TRANS_BUFFER_FULL : trans buffer is full
> + * @SLICE_HEADER_FULL : slice header buffer is full
> + */
> +enum vdec_h264_core_dec_err_type {
> + TRANS_BUFFER_FULL = 1,
> + SLICE_HEADER_FULL,
> +};
> +
> +/**
> + * struct vdec_h264_slice_lat_dec_param  - parameters for decode current 
> frame
> + * @sps : h264 sps syntax parameters
> + * @pps : h264 pps syntax parameters
> + * @slice_header: h264 slice header syntax parameters
> + * @scaling_matrix : h264 scaling list parameters
> + * @decode_params : decoder parameters of each frame used for hardware decode
> + * @h264_dpb_info : dpb reference list
> + */
> +struct vdec_h264_slice_lat_dec_param {
> + struct mtk_h264_sps_param sps;
> + struct mtk_h264_pps_param pps;
> + struct mtk_h264_slice_hd_param slice_header;
> + struct slice_api_h264_scaling_matrix scaling_matrix;
> + struct slice_api_h264_decode_param decode_params;
> + struct mtk_h264_dpb_info h264_dpb_info[V4L2_H264_NUM_DPB_ENTRIES];
> +};
> +
> +/**
> + * struct vdec_h264_slice_info - decode information
> + * @nal_info: nal info of current picture
> + * @timeout : Decode timeout: 1 timeout, 0 no timeount
> + * @bs_buf_size : bitstream size
> + * @bs_buf_addr : bitstream buffer dma address
> + * @y_fb_dma: Y frame buffer dma address
> + * @c_fb_dma: C frame buffer dma address
> + * @vdec_fb_va  : VDEC frame buffer struct virtual address
> + * @crc : Used to check whether hardware's status is right
> + */
> +struct vdec_h264_slice_info {
> + u16 nal_info;
> + u16 timeout;
> + u32 bs_buf_size;
> + u64 bs_buf_addr;
> + u64 y_fb_dma;
> + u64 c_fb_dma;
> + u64 vdec_fb_va;
> + u32 crc[8];
> +};
> +
> +/**
> + * struct vdec_h264_slice_vsi - shared memory for decode information exchange
> + *between VPU and Host. The memory is allocated by VPU then mapping 
> to
> + *Host in vdec_h264_slice_init() and freed in 
> vdec_h264_slice_deinit()
> + *by VPU. AP-W/R : AP is writer/reader on this item. VPU-W/R: VPU is
> + *write/reader on this item.

Long description goes below the member list.

> + * @wdma_err_addr   : wdma error dma address
> + * @wdma_start_addr : wdma start dma address
> + * @wdma_end_addr   : wdma end dma address
> + * @slice_bc_start_addr : slice bc start dma address
> + * @slice_bc_end_addr   : slice bc end dma address
> + * @row_info_start_addr : row info start dma address
> + * @row_info_end_addr   : row info end dma address
> + * @trans_start : trans start dma address
> + * @trans_end   : trans end dma address
> + * @wdma_end_addr_offset: wdma end address offset
> + *
> + * @mv_buf_dma  : HW working motion vector buffer
> + *dma 

Re: [PATCH v7, 12/15] media: mtk-vcodec: Extract H264 common code

2022-03-01 Thread Nicolas Dufresne
Le mercredi 23 février 2022 à 11:40 +0800, Yunfei Dong a écrit :
> Mt8192 can use some of common code with mt8183. Moves them to
> a new file in order to reuse.

With the documentation fixed as per my comments below, you can add:

Reviewed-by: Nicolas Dufresne 

> 
> Signed-off-by: Yunfei Dong 
> ---
>  drivers/media/platform/mtk-vcodec/Makefile|   1 +
>  .../mtk-vcodec/vdec/vdec_h264_req_common.c| 310 +
>  .../mtk-vcodec/vdec/vdec_h264_req_common.h| 253 +++
>  .../mtk-vcodec/vdec/vdec_h264_req_if.c| 424 ++
>  4 files changed, 606 insertions(+), 382 deletions(-)
>  create mode 100644 
> drivers/media/platform/mtk-vcodec/vdec/vdec_h264_req_common.c
>  create mode 100644 
> drivers/media/platform/mtk-vcodec/vdec/vdec_h264_req_common.h
> 
> diff --git a/drivers/media/platform/mtk-vcodec/Makefile 
> b/drivers/media/platform/mtk-vcodec/Makefile
> index 359619653a0e..3f41d748eee5 100644
> --- a/drivers/media/platform/mtk-vcodec/Makefile
> +++ b/drivers/media/platform/mtk-vcodec/Makefile
> @@ -9,6 +9,7 @@ mtk-vcodec-dec-y := vdec/vdec_h264_if.o \
>   vdec/vdec_vp8_if.o \
>   vdec/vdec_vp9_if.o \
>   vdec/vdec_h264_req_if.o \
> + vdec/vdec_h264_req_common.o \
>   mtk_vcodec_dec_drv.o \
>   vdec_drv_if.o \
>   vdec_vpu_if.o \
> diff --git a/drivers/media/platform/mtk-vcodec/vdec/vdec_h264_req_common.c 
> b/drivers/media/platform/mtk-vcodec/vdec/vdec_h264_req_common.c
> new file mode 100644
> index ..6c68bee632d6
> --- /dev/null
> +++ b/drivers/media/platform/mtk-vcodec/vdec/vdec_h264_req_common.c
> @@ -0,0 +1,310 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2021 MediaTek Inc.
> + * Author: Yunfei Dong 
> + */
> +
> +#include "vdec_h264_req_common.h"
> +
> +/* get used parameters for sps/pps */
> +#define GET_MTK_VDEC_FLAG(cond, flag) \
> + { dst_param->cond = ((src_param->flags & flag) ? (1) : (0)); }
> +#define GET_MTK_VDEC_PARAM(param) \
> + { dst_param->param = src_param->param; }
> +
> +/*
> + * The firmware expects unused reflist entries to have the value 0x20.
> + */
> +void mtk_vdec_h264_fixup_ref_list(u8 *ref_list, size_t num_valid)
> +{
> + memset_io(&ref_list[num_valid], 0x20, 32 - num_valid);
> +}
> +
> +void *mtk_vdec_h264_get_ctrl_ptr(struct mtk_vcodec_ctx *ctx, int id)
> +{
> + struct v4l2_ctrl *ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl, id);
> +
> + if (!ctrl)
> + return ERR_PTR(-EINVAL);
> +
> + return ctrl->p_cur.p;
> +}
> +
> +void mtk_vdec_h264_fill_dpb_info(struct mtk_vcodec_ctx *ctx,
> +  struct slice_api_h264_decode_param 
> *decode_params,
> +  struct mtk_h264_dpb_info *h264_dpb_info)
> +{
> + const struct slice_h264_dpb_entry *dpb;
> + struct vb2_queue *vq;
> + struct vb2_buffer *vb;
> + struct vb2_v4l2_buffer *vb2_v4l2;
> + int index, vb2_index;
> +
> + vq = v4l2_m2m_get_vq(ctx->m2m_ctx, V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE);
> +
> + for (index = 0; index < V4L2_H264_NUM_DPB_ENTRIES; index++) {
> + dpb = &decode_params->dpb[index];
> + if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE)) {
> + h264_dpb_info[index].reference_flag = 0;
> + continue;
> + }
> +
> + vb2_index = vb2_find_timestamp(vq, dpb->reference_ts, 0);
> + if (vb2_index < 0) {
> + dev_err(&ctx->dev->plat_dev->dev,
> + "Reference invalid: dpb_index(%d) 
> reference_ts(%lld)",
> + index, dpb->reference_ts);
> + continue;
> + }
> +
> + /* 1 for short term reference, 2 for long term reference */
> + if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM))
> + h264_dpb_info[index].reference_flag = 1;
> + else
> + h264_dpb_info[index].reference_flag = 2;
> +
> + vb = vq->bufs[vb2_index];
> + vb2_v4l2 = container_of(vb, struct vb2_v4l2_buffer, vb2_buf);
> + h264_dpb_info[index].field = vb2_v4l2->field;
> +
> + h264_dpb_info[index].y_dma_addr =
> + vb2_dma_contig_plane_dma_addr(vb, 0);
> + if (ctx->q_data[MTK_Q_DATA_DST].fmt->num_planes == 2)
> + h264_dpb_info[index].c_dma_addr =
> + vb2_dma_contig_plane_dma_addr(vb, 1);
> + else

Re: [PATCH v7, 09/15] media: mtk-vcodec: disable vp8 4K capability

2022-03-01 Thread Nicolas Dufresne
Le mercredi 23 février 2022 à 11:40 +0800, Yunfei Dong a écrit :
> For vp8 not support 4K, need to disable it.

This patch will need to be changed after you have moved this code into the
proper ioctl.

> 
> Signed-off-by: Yunfei Dong 
> ---
>  drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c 
> b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c
> index bae43938ee37..ba188d16f0fb 100644
> --- a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c
> +++ b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c
> @@ -532,7 +532,8 @@ static int vidioc_enum_framesizes(struct file *file, void 
> *priv,
>   fsize->type = V4L2_FRMSIZE_TYPE_STEPWISE;
>   fsize->stepwise = dec_pdata->vdec_framesizes[i].stepwise;
>   if (!(ctx->dev->dec_capability &
> - VCODEC_CAPABILITY_4K_DISABLED)) {
> + VCODEC_CAPABILITY_4K_DISABLED) &&
> + fsize->pixel_format != V4L2_PIX_FMT_VP8_FRAME) {
>   mtk_v4l2_debug(3, "4K is enabled");
>   fsize->stepwise.max_width =
>   VCODEC_DEC_4K_CODED_WIDTH;



Re: [PATCH v7, 06/15] media: mtk-vcodec: Refactor get and put capture buffer flow

2022-03-01 Thread Nicolas Dufresne
Le mercredi 23 février 2022 à 11:39 +0800, Yunfei Dong a écrit :
> For lat and core decode in parallel, need to get capture buffer
> when core start to decode and put capture buffer to display
> list when core decode done.
> 
> Signed-off-by: Yunfei Dong 
> ---
>  .../mtk-vcodec/mtk_vcodec_dec_stateless.c | 121 --
>  .../platform/mtk-vcodec/mtk_vcodec_drv.h  |   5 +-
>  .../mtk-vcodec/vdec/vdec_h264_req_if.c|  16 ++-
>  3 files changed, 102 insertions(+), 40 deletions(-)
> 
> diff --git a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateless.c 
> b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateless.c
> index 23a154c4e321..6d481410bf89 100644
> --- a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateless.c
> +++ b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateless.c
> @@ -108,37 +108,87 @@ static const struct mtk_codec_framesizes 
> mtk_vdec_framesizes[] = {
>  
>  #define NUM_SUPPORTED_FRAMESIZE ARRAY_SIZE(mtk_vdec_framesizes)
>  
> -static void mtk_vdec_stateless_set_dst_payload(struct mtk_vcodec_ctx *ctx,
> -struct vdec_fb *fb)
> +static void mtk_vdec_stateless_out_to_done(struct mtk_vcodec_ctx *ctx,
> +struct mtk_vcodec_mem *bs, int error)
>  {
> - struct mtk_video_dec_buf *vdec_frame_buf =
> - container_of(fb, struct mtk_video_dec_buf, frame_buffer);
> - struct vb2_v4l2_buffer *vb = &vdec_frame_buf->m2m_buf.vb;
> - unsigned int cap_y_size = ctx->q_data[MTK_Q_DATA_DST].sizeimage[0];
> + struct mtk_video_dec_buf *out_buf;
> + struct vb2_v4l2_buffer *vb;
>  
> - vb2_set_plane_payload(&vb->vb2_buf, 0, cap_y_size);
> - if (ctx->q_data[MTK_Q_DATA_DST].fmt->num_planes == 2) {
> - unsigned int cap_c_size =
> - ctx->q_data[MTK_Q_DATA_DST].sizeimage[1];
> + if (!bs) {
> + mtk_v4l2_err("Free bitstream buffer fail.");
> + return;
> + }
> + out_buf = container_of(bs, struct mtk_video_dec_buf, bs_buffer);
> + vb = &out_buf->m2m_buf.vb;
>  
> - vb2_set_plane_payload(&vb->vb2_buf, 1, cap_c_size);
> + mtk_v4l2_debug(2, "Free bitsteam buffer id = %d to done_list",
> +vb->vb2_buf.index);
> +
> + v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
> + if (error) {
> + v4l2_m2m_buf_done(vb, VB2_BUF_STATE_ERROR);
> + if (error == -EIO)
> + out_buf->error = true;
> + } else {
> + v4l2_m2m_buf_done(vb, VB2_BUF_STATE_DONE);
>   }
>  }
>  
> -static struct vdec_fb *vdec_get_cap_buffer(struct mtk_vcodec_ctx *ctx,
> -struct vb2_v4l2_buffer *vb2_v4l2)
> +static void mtk_vdec_stateless_cap_to_disp(struct mtk_vcodec_ctx *ctx,
> +struct vdec_fb *fb, int error)
>  {
> - struct mtk_video_dec_buf *framebuf =
> - container_of(vb2_v4l2, struct mtk_video_dec_buf, m2m_buf.vb);
> - struct vdec_fb *pfb = &framebuf->frame_buffer;
> - struct vb2_buffer *dst_buf = &vb2_v4l2->vb2_buf;
> + struct mtk_video_dec_buf *vdec_frame_buf;
> + struct vb2_v4l2_buffer *vb;
> + unsigned int cap_y_size, cap_c_size;
> +
> + if (!fb) {
> + mtk_v4l2_err("Free frame buffer fail.");
> + return;
> + }
> + vdec_frame_buf = container_of(fb, struct mtk_video_dec_buf,
> +   frame_buffer);
> + vb = &vdec_frame_buf->m2m_buf.vb;
> +
> + cap_y_size = ctx->q_data[MTK_Q_DATA_DST].sizeimage[0];
> + cap_c_size = ctx->q_data[MTK_Q_DATA_DST].sizeimage[1];
> +
> + v4l2_m2m_dst_buf_remove(ctx->m2m_ctx);
>  
> - pfb->base_y.va = NULL;
> + vb2_set_plane_payload(&vb->vb2_buf, 0, cap_y_size);
> + if (ctx->q_data[MTK_Q_DATA_DST].fmt->num_planes == 2)
> + vb2_set_plane_payload(&vb->vb2_buf, 1, cap_c_size);
> +
> + mtk_v4l2_debug(2, "Free frame buffer id = %d to done_list",
> +vb->vb2_buf.index);
> + if (error)
> + v4l2_m2m_buf_done(vb, VB2_BUF_STATE_ERROR);
> + else
> + v4l2_m2m_buf_done(vb, VB2_BUF_STATE_DONE);
> +}
> +
> +static struct vdec_fb *vdec_get_cap_buffer(struct mtk_vcodec_ctx *ctx)
> +{
> + struct mtk_video_dec_buf *framebuf;
> + struct vb2_v4l2_buffer *vb2_v4l2;
> + struct vb2_buffer *dst_buf;
> + struct vdec_fb *pfb;
> +
> + vb2_v4l2 = v4l2_m2m_next_dst_buf(ctx->m2m_ctx);
> + if (!vb2_v4l2) {
> + mtk_v4l2_debug(1, "[%d] dst_buf empty!!", ctx->id);
> + return NULL;
> + }
> +
> + dst_buf = &vb2_v4l2->vb2_buf;
> + framebuf = container_of(vb2_v4l2, struct mtk_video_dec_buf, m2m_buf.vb);
> +
> + pfb = &framebuf->frame_buffer;
> + pfb->base_y.va = vb2_plane_vaddr(dst_buf, 0);
>   pfb->base_y.dma_addr = vb2_dma_contig_plane_dma_addr(dst_buf, 0);
>   pfb->base_y.size = ctx->q_data[MTK_Q_DATA_DST].sizeimage[0];
>  
>   if 

Re: [PATCH v7, 05/15] media: mtk-vcodec: Call v4l2_m2m_set_dst_buffered() set capture buffer buffered

2022-03-01 Thread Nicolas Dufresne
Le mercredi 23 février 2022 à 11:39 +0800, Yunfei Dong a écrit :
> lat thread: output queue  \
>-> lat hardware -> lat trans buffer
> lat trans buffer  /
> 
> core thread: capture queue \
> ->core hardware -> capture queue
>  lat trans buffer  /
> 
> Lat and core work in different thread, setting capture buffer buffered.

... so that output queue buffers (bitstream) can be processed regardless of
whether capture buffers are available.

I have concerns about the usefulness of running a dedicated thread to drive the
lat and the core blocks. Having 3 threads (counting the m2m worker thread) here
increases the complexity. The hardware is asynchronous by definition. I think
this patch will go away after a proper rework of the driver thread model here.

> 
> Signed-off-by: Yunfei Dong 
> ---
>  drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateless.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateless.c 
> b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateless.c
> index 5aebf88f997b..23a154c4e321 100644
> --- a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateless.c
> +++ b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateless.c
> @@ -314,6 +314,9 @@ static void mtk_init_vdec_params(struct mtk_vcodec_ctx 
> *ctx)
>   src_vq = v4l2_m2m_get_vq(ctx->m2m_ctx,
>V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE);
>  
> + if (ctx->dev->vdec_pdata->hw_arch != MTK_VDEC_PURE_SINGLE_CORE)
> + v4l2_m2m_set_dst_buffered(ctx->m2m_ctx, 1);
> +
>   /* Support request api for output plane */
>   src_vq->supports_requests = true;
>   src_vq->requires_requests = true;



Re: [PATCH v7, 03/15] media: mtk-vcodec: get capture queue buffer size from scp

2022-03-01 Thread Nicolas Dufresne
Thanks for your patch, though perhaps it could be improved; see comment below.
Le mercredi 23 février 2022 à 11:39 +0800, Yunfei Dong a écrit :
> Different capture buffer format has different buffer size, need to get
> real buffer size according to buffer type from scp.
> 
> Signed-off-by: Yunfei Dong 
> ---
>  .../media/platform/mtk-vcodec/vdec_ipi_msg.h  | 36 ++
>  .../media/platform/mtk-vcodec/vdec_vpu_if.c   | 49 +++
>  .../media/platform/mtk-vcodec/vdec_vpu_if.h   | 15 ++
>  3 files changed, 100 insertions(+)
> 
> diff --git a/drivers/media/platform/mtk-vcodec/vdec_ipi_msg.h 
> b/drivers/media/platform/mtk-vcodec/vdec_ipi_msg.h
> index bf54d6d9a857..47070be2a991 100644
> --- a/drivers/media/platform/mtk-vcodec/vdec_ipi_msg.h
> +++ b/drivers/media/platform/mtk-vcodec/vdec_ipi_msg.h
> @@ -20,6 +20,7 @@ enum vdec_ipi_msgid {
>   AP_IPIMSG_DEC_RESET = 0xA004,
>   AP_IPIMSG_DEC_CORE = 0xA005,
>   AP_IPIMSG_DEC_CORE_END = 0xA006,
> + AP_IPIMSG_DEC_GET_PARAM = 0xA007,
>  
>   VPU_IPIMSG_DEC_INIT_ACK = 0xB000,
>   VPU_IPIMSG_DEC_START_ACK = 0xB001,
> @@ -28,6 +29,7 @@ enum vdec_ipi_msgid {
>   VPU_IPIMSG_DEC_RESET_ACK = 0xB004,
>   VPU_IPIMSG_DEC_CORE_ACK = 0xB005,
>   VPU_IPIMSG_DEC_CORE_END_ACK = 0xB006,
> + VPU_IPIMSG_DEC_GET_PARAM_ACK = 0xB007,
>  };
>  
>  /**
> @@ -114,4 +116,38 @@ struct vdec_vpu_ipi_init_ack {
>   uint32_t inst_id;
>  };
>  
> +/**
> + * struct vdec_ap_ipi_get_param - for AP_IPIMSG_DEC_GET_PARAM
> + * @msg_id   : AP_IPIMSG_DEC_GET_PARAM
> + * @inst_id : instance ID. Used if the ABI version >= 2.
> + * @data : picture information
> + * @param_type   : get param type
> + * @codec_type   : Codec fourcc
> + */
> +struct vdec_ap_ipi_get_param {
> + u32 msg_id;
> + u32 inst_id;
> + u32 data[4];
> + u32 param_type;
> + u32 codec_type;
> +};
> +
> +/**
> + * struct vdec_vpu_ipi_get_param_ack - for VPU_IPIMSG_DEC_GET_PARAM_ACK
> + * @msg_id   : VPU_IPIMSG_DEC_GET_PARAM_ACK
> + * @status   : VPU execution result
> + * @ap_inst_addr : AP vcodec_vpu_inst instance address
> + * @data : picture information from SCP.
> + * @param_type   : get param type
> + * @reserved : reserved param
> + */
> +struct vdec_vpu_ipi_get_param_ack {
> + u32 msg_id;
> + s32 status;
> + u64 ap_inst_addr;
> + u32 data[4];
> + u32 param_type;
> + u32 reserved;
> +};
> +
>  #endif
> diff --git a/drivers/media/platform/mtk-vcodec/vdec_vpu_if.c 
> b/drivers/media/platform/mtk-vcodec/vdec_vpu_if.c
> index 7210061c772f..35f4d5583084 100644
> --- a/drivers/media/platform/mtk-vcodec/vdec_vpu_if.c
> +++ b/drivers/media/platform/mtk-vcodec/vdec_vpu_if.c
> @@ -6,6 +6,7 @@
>  
>  #include "mtk_vcodec_drv.h"
>  #include "mtk_vcodec_util.h"
> +#include "vdec_drv_if.h"
>  #include "vdec_ipi_msg.h"
>  #include "vdec_vpu_if.h"
>  #include "mtk_vcodec_fw.h"
> @@ -54,6 +55,26 @@ static void handle_init_ack_msg(const struct 
> vdec_vpu_ipi_init_ack *msg)
>   }
>  }
>  
> +static void handle_get_param_msg_ack(const struct vdec_vpu_ipi_get_param_ack 
> *msg)
> +{
> + struct vdec_vpu_inst *vpu = (struct vdec_vpu_inst *)
> + (unsigned long)msg->ap_inst_addr;
> +
> + mtk_vcodec_debug(vpu, "+ ap_inst_addr = 0x%llx", msg->ap_inst_addr);
> +
> + /* param_type is enum vdec_get_param_type */
> + switch (msg->param_type) {
> + case GET_PARAM_PIC_INFO:
> + vpu->fb_sz[0] = msg->data[0];
> + vpu->fb_sz[1] = msg->data[1];
> + break;
> + default:
> + mtk_vcodec_err(vpu, "invalid get param type=%d", 
> msg->param_type);
> + vpu->failure = 1;
> + break;
> + }
> +}
> +
>  /*
>   * vpu_dec_ipi_handler - Handler for VPU ipi message.
>   *
> @@ -89,6 +110,9 @@ static void vpu_dec_ipi_handler(void *data, unsigned int 
> len, void *priv)
>   case VPU_IPIMSG_DEC_CORE_END_ACK:
>   break;
>  
> + case VPU_IPIMSG_DEC_GET_PARAM_ACK:
> + handle_get_param_msg_ack(data);
> + break;
>   default:
>   mtk_vcodec_err(vpu, "invalid msg=%X", msg->msg_id);
>   break;
> @@ -217,6 +241,31 @@ int vpu_dec_start(struct vdec_vpu_inst *vpu, uint32_t 
> *data, unsigned int len)
>   return err;
>  }
>  
> +int vpu_dec_get_param(struct vdec_vpu_inst *vpu, uint32_t *data,
> +   unsigned int len, unsigned int param_type)
> +{
> + struct vdec_ap_ipi_get_param msg;
> + int err;
> +
> + mtk_vcodec_debug_enter(vpu);
> +
> + if (len > ARRAY_SIZE(msg.data)) {
> + mtk_vcodec_err(vpu, "invalid len = %d\n", len);
> + return -EINVAL;
> + }
> +
> + memset(&msg, 0, sizeof(msg));
> + msg.msg_id = AP_IPIMSG_DEC_GET_PARAM;
> + msg.inst_id = vpu->inst_id;
> + memcpy(msg.data, data, sizeof(unsigned int) * len);
> + 

Re: [PATCH v7, 07/15] media: mtk-vcodec: Refactor supported vdec formats and framesizes

2022-03-01 Thread Nicolas Dufresne
Le mercredi 23 février 2022 à 11:40 +0800, Yunfei Dong a écrit :
> Supported output and capture format types for mt8192 are different
> with mt8183. Needs to get format types according to decoder capability.

This patch is both refactoring and changing the behaviour. Can you please split
the non-functional changes from the functional ones? This ensures we can proceed
with a good review of the functional changes.

regards,
Nicolas

> 
> Signed-off-by: Yunfei Dong 
> ---
>  .../platform/mtk-vcodec/mtk_vcodec_dec.c  |   8 +-
>  .../mtk-vcodec/mtk_vcodec_dec_stateful.c  |  13 +-
>  .../mtk-vcodec/mtk_vcodec_dec_stateless.c | 117 +-
>  .../platform/mtk-vcodec/mtk_vcodec_drv.h  |  13 +-
>  4 files changed, 107 insertions(+), 44 deletions(-)
> 
> diff --git a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c 
> b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c
> index 304f5afbd419..bae43938ee37 100644
> --- a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c
> +++ b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c
> @@ -26,7 +26,7 @@ mtk_vdec_find_format(struct v4l2_format *f,
>   const struct mtk_video_fmt *fmt;
>   unsigned int k;
>  
> - for (k = 0; k < dec_pdata->num_formats; k++) {
> + for (k = 0; k < *dec_pdata->num_formats; k++) {
>   fmt = &dec_pdata->vdec_formats[k];
>   if (fmt->fourcc == f->fmt.pix_mp.pixelformat)
>   return fmt;
> @@ -525,7 +525,7 @@ static int vidioc_enum_framesizes(struct file *file, void 
> *priv,
>   if (fsize->index != 0)
>   return -EINVAL;
>  
> - for (i = 0; i < dec_pdata->num_framesizes; ++i) {
> + for (i = 0; i < *dec_pdata->num_framesizes; ++i) {
>   if (fsize->pixel_format != dec_pdata->vdec_framesizes[i].fourcc)
>   continue;
>  
> @@ -564,7 +564,7 @@ static int vidioc_enum_fmt(struct v4l2_fmtdesc *f, void 
> *priv,
>   const struct mtk_video_fmt *fmt;
>   int i, j = 0;
>  
> - for (i = 0; i < dec_pdata->num_formats; i++) {
> + for (i = 0; i < *dec_pdata->num_formats; i++) {
>   if (output_queue &&
>   dec_pdata->vdec_formats[i].type != MTK_FMT_DEC)
>   continue;
> @@ -577,7 +577,7 @@ static int vidioc_enum_fmt(struct v4l2_fmtdesc *f, void 
> *priv,
>   ++j;
>   }
>  
> - if (i == dec_pdata->num_formats)
> + if (i == *dec_pdata->num_formats)
>   return -EINVAL;
>  
>   fmt = &dec_pdata->vdec_formats[i];
> diff --git a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateful.c 
> b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateful.c
> index 7966c132be8f..3f33beb9c551 100644
> --- a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateful.c
> +++ b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateful.c
> @@ -37,7 +37,9 @@ static const struct mtk_video_fmt mtk_video_formats[] = {
>   },
>  };
>  
> -#define NUM_FORMATS ARRAY_SIZE(mtk_video_formats)
> +static const unsigned int num_supported_formats =
> + ARRAY_SIZE(mtk_video_formats);
> +
>  #define DEFAULT_OUT_FMT_IDX 0
>  #define DEFAULT_CAP_FMT_IDX 3
>  
> @@ -59,7 +61,8 @@ static const struct mtk_codec_framesizes 
> mtk_vdec_framesizes[] = {
>   },
>  };
>  
> -#define NUM_SUPPORTED_FRAMESIZE ARRAY_SIZE(mtk_vdec_framesizes)
> +static const unsigned int num_supported_framesize =
> + ARRAY_SIZE(mtk_vdec_framesizes);
>  
>  /*
>   * This function tries to clean all display buffers, the buffers will return
> @@ -235,7 +238,7 @@ static void mtk_vdec_update_fmt(struct mtk_vcodec_ctx 
> *ctx,
>   unsigned int k;
>  
>   dst_q_data = &ctx->q_data[MTK_Q_DATA_DST];
> - for (k = 0; k < NUM_FORMATS; k++) {
> + for (k = 0; k < num_supported_formats; k++) {
>   fmt = &mtk_video_formats[k];
>   if (fmt->fourcc == pixelformat) {
>   mtk_v4l2_debug(1, "Update cap fourcc(%d -> %d)",
> @@ -617,11 +620,11 @@ const struct mtk_vcodec_dec_pdata mtk_vdec_8173_pdata = 
> {
>   .ctrls_setup = mtk_vcodec_dec_ctrls_setup,
>   .vdec_vb2_ops = &mtk_vdec_frame_vb2_ops,
>   .vdec_formats = mtk_video_formats,
> - .num_formats = NUM_FORMATS,
> + .num_formats = &num_supported_formats,
>   .default_out_fmt = &mtk_video_formats[DEFAULT_OUT_FMT_IDX],
>   .default_cap_fmt = &mtk_video_formats[DEFAULT_CAP_FMT_IDX],
>   .vdec_framesizes = mtk_vdec_framesizes,
> - .num_framesizes = NUM_SUPPORTED_FRAMESIZE,
> + .num_framesizes = &num_supported_framesize,
>   .worker = mtk_vdec_worker,
>   .flush_decoder = mtk_vdec_flush_decoder,
>   .is_subdev_supported = false,
> diff --git a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateless.c 
> b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateless.c
> index 6d481410bf89..e51d935bd21d 100644
> --- a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateless.c
> +++ b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateless.c
> @@ -81,33 +81,23 @@ static 

Re: [PATCH v7, 04/15] media: mtk-vcodec: Read max resolution from dec_capability

2022-02-28 Thread Nicolas Dufresne
Hi Yunfei,

this patch does not work unless userland calls enum_framesizes, which is
completely optional. See comment and suggestion below.

Le mercredi 23 février 2022 à 11:39 +0800, Yunfei Dong a écrit :
> Supported max resolution for different platforms are not the same: 2K
> or 4K, getting it according to dec_capability.
> 
> Signed-off-by: Yunfei Dong 
> Reviewed-by: Tzung-Bi Shih
> ---
>  .../platform/mtk-vcodec/mtk_vcodec_dec.c  | 29 +++
>  .../platform/mtk-vcodec/mtk_vcodec_drv.h  |  4 +++
>  2 files changed, 21 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c 
> b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c
> index 130ecef2e766..304f5afbd419 100644
> --- a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c
> +++ b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c
> @@ -152,13 +152,15 @@ void mtk_vcodec_dec_set_default_params(struct 
> mtk_vcodec_ctx *ctx)
>   q_data->coded_height = DFT_CFG_HEIGHT;
>   q_data->fmt = ctx->dev->vdec_pdata->default_cap_fmt;
>   q_data->field = V4L2_FIELD_NONE;
> + ctx->max_width = MTK_VDEC_MAX_W;
> + ctx->max_height = MTK_VDEC_MAX_H;
>  
>   v4l_bound_align_image(&q_data->coded_width,
>   MTK_VDEC_MIN_W,
> - MTK_VDEC_MAX_W, 4,
> + ctx->max_width, 4,
>   &q_data->coded_height,
>   MTK_VDEC_MIN_H,
> - MTK_VDEC_MAX_H, 5, 6);
> + ctx->max_height, 5, 6);
>  
>   q_data->sizeimage[0] = q_data->coded_width * q_data->coded_height;
>   q_data->bytesperline[0] = q_data->coded_width;
> @@ -217,7 +219,7 @@ static int vidioc_vdec_subscribe_evt(struct v4l2_fh *fh,
>   }
>  }
>  
> -static int vidioc_try_fmt(struct v4l2_format *f,
> +static int vidioc_try_fmt(struct mtk_vcodec_ctx *ctx, struct v4l2_format *f,
> const struct mtk_video_fmt *fmt)
>  {
>   struct v4l2_pix_format_mplane *pix_fmt_mp = >fmt.pix_mp;
> @@ -225,9 +227,9 @@ static int vidioc_try_fmt(struct v4l2_format *f,
>   pix_fmt_mp->field = V4L2_FIELD_NONE;
>  
>   pix_fmt_mp->width =
> - clamp(pix_fmt_mp->width, MTK_VDEC_MIN_W, MTK_VDEC_MAX_W);
> + clamp(pix_fmt_mp->width, MTK_VDEC_MIN_W, ctx->max_width);
>   pix_fmt_mp->height =
> - clamp(pix_fmt_mp->height, MTK_VDEC_MIN_H, MTK_VDEC_MAX_H);
> + clamp(pix_fmt_mp->height, MTK_VDEC_MIN_H, ctx->max_height);
>  
>   if (f->type == V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE) {
>   pix_fmt_mp->num_planes = 1;
> @@ -245,16 +247,16 @@ static int vidioc_try_fmt(struct v4l2_format *f,
>   tmp_h = pix_fmt_mp->height;
> + v4l_bound_align_image(&pix_fmt_mp->width,
>   MTK_VDEC_MIN_W,
> - MTK_VDEC_MAX_W, 6,
> + ctx->max_width, 6,
>   &pix_fmt_mp->height,
>   MTK_VDEC_MIN_H,
> - MTK_VDEC_MAX_H, 6, 9);
> + ctx->max_height, 6, 9);
>  
>   if (pix_fmt_mp->width < tmp_w &&
> - (pix_fmt_mp->width + 64) <= MTK_VDEC_MAX_W)
> + (pix_fmt_mp->width + 64) <= ctx->max_width)
>   pix_fmt_mp->width += 64;
>   if (pix_fmt_mp->height < tmp_h &&
> - (pix_fmt_mp->height + 64) <= MTK_VDEC_MAX_H)
> + (pix_fmt_mp->height + 64) <= ctx->max_height)
>   pix_fmt_mp->height += 64;
>  
>   mtk_v4l2_debug(0,
> @@ -294,7 +296,7 @@ static int vidioc_try_fmt_vid_cap_mplane(struct file 
> *file, void *priv,
>   fmt = mtk_vdec_find_format(f, dec_pdata);
>   }
>  
> - return vidioc_try_fmt(f, fmt);
> + return vidioc_try_fmt(ctx, f, fmt);
>  }
>  
>  static int vidioc_try_fmt_vid_out_mplane(struct file *file, void *priv,
> @@ -317,7 +319,7 @@ static int vidioc_try_fmt_vid_out_mplane(struct file 
> *file, void *priv,
>   return -EINVAL;
>   }
>  
> - return vidioc_try_fmt(f, fmt);
> + return vidioc_try_fmt(ctx, f, fmt);
>  }
>  
>  static int vidioc_vdec_g_selection(struct file *file, void *priv,
> @@ -445,7 +447,7 @@ static int vidioc_vdec_s_fmt(struct file *file, void 
> *priv,
>   return -EINVAL;
>  
>   q_data->fmt = fmt;
> - vidioc_try_fmt(f, q_data->fmt);
> + vidioc_try_fmt(ctx, f, q_data->fmt);
>   if (f->type == V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE) {
>   q_data->sizeimage[0] = pix_mp->plane_fmt[0].sizeimage;
>   q_data->coded_width = pix_mp->width;
> @@ -545,6 +547,9 @@ static int vidioc_enum_framesizes(struct file *file, void 
> *priv,
>   fsize->stepwise.min_height,
>  

Re: [PATCH v6, 06/15] media: mtk-vcodec: Refactor get and put capture buffer flow

2022-02-17 Thread Nicolas Dufresne
Le jeudi 17 février 2022 à 17:03 +0800, yunfei.d...@mediatek.com a écrit :
> > > - ret = vdec_if_decode(ctx, bs_src, dst_buf, &res_chg);
> > > + ret = vdec_if_decode(ctx, bs_src, NULL, &res_chg);
> > >   if (ret) {
> > >   mtk_v4l2_err(" <===[%d], src_buf[%d] sz=0x%zx pts=%llu
> > > vdec_if_decode() ret=%d res_chg=%d===>",
> > >    ctx->id, vb2_src->index, bs_src->size,
> > > @@ -220,12 +266,9 @@ static void mtk_vdec_worker(struct work_struct
> > > *work)
> > >   }
> > >   }
> > >   
> > > - mtk_vdec_stateless_set_dst_payload(ctx, dst_buf);
> > > -
> > > - v4l2_m2m_buf_done_and_job_finish(dev->m2m_dev_dec, ctx->m2m_ctx,
> > > -  ret ? VB2_BUF_STATE_ERROR :
> > > VB2_BUF_STATE_DONE);
> > > -
> > > + mtk_vdec_stateless_out_to_done(ctx, bs_src, ret);
> > 
> > v4l2_m2m_buf_done_and_job_finish() was specially crafted to prevent
> > developer
> > from implementing the signalling of the request at the wrong moment.
> > This patch
> > broke this strict ordering. The relevant comment in the helper
> > function:
> > 
> > 
> As we discussed in chat, please help to check whether it's possible to
> let lat and core decode in parallel.

Thanks, Benjamin is looking into that. For the mailing list here, here's some
prior art for a similar problem found by downstream RPi4 HEVC driver developer.
The general problem here is that we don't want to signal the request until the
decode have complete, yet we want to pick and run second (concurrent job) so
that parallel decoding is made possible. For RPi4 it is not multi-core, but the
decoding is split in 2 stages, and the decoder run both stages concurrently,
which basically means, we need to be able to run two jobs at the same time
whenever possible.

https://github.com/raspberrypi/linux/commit/964be1d20e2f1335915a6bf8c82a3199bfddf8ac

This introduces media_request_pin/unpin, but being able to pin a request and no
longer have it bound to any other object's lifetime seems a bit error-prone in
comparison to the current restrictions. Comments welcome!

> 
> I will continue to fix h264 issue.

Thanks.

> 
> Thanks for your help.
> 
> Best Regards,
> Yunfei Dong



Re: [PATCH v6, 06/15] media: mtk-vcodec: Refactor get and put capture buffer flow

2022-01-28 Thread Nicolas Dufresne
Hi Yunfei,

thanks for your work, see comments below...

Le samedi 22 janvier 2022 à 11:53 +0800, Yunfei Dong a écrit :
> For lat and core decode in parallel, need to get capture buffer
> when core start to decode and put capture buffer to display
> list when core decode done.
> 
> Signed-off-by: Yunfei Dong 
> ---
>  .../mtk-vcodec/mtk_vcodec_dec_stateless.c | 121 --
>  .../platform/mtk-vcodec/mtk_vcodec_drv.h  |   5 +-
>  .../mtk-vcodec/vdec/vdec_h264_req_if.c|  16 ++-
>  3 files changed, 102 insertions(+), 40 deletions(-)
> 
> diff --git a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateless.c 
> b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateless.c
> index 23a154c4e321..6d481410bf89 100644
> --- a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateless.c
> +++ b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateless.c
> @@ -108,37 +108,87 @@ static const struct mtk_codec_framesizes 
> mtk_vdec_framesizes[] = {
>  
>  #define NUM_SUPPORTED_FRAMESIZE ARRAY_SIZE(mtk_vdec_framesizes)
>  
> -static void mtk_vdec_stateless_set_dst_payload(struct mtk_vcodec_ctx *ctx,
> -struct vdec_fb *fb)
> +static void mtk_vdec_stateless_out_to_done(struct mtk_vcodec_ctx *ctx,
> +struct mtk_vcodec_mem *bs, int error)
>  {
> - struct mtk_video_dec_buf *vdec_frame_buf =
> - container_of(fb, struct mtk_video_dec_buf, frame_buffer);
> - struct vb2_v4l2_buffer *vb = &vdec_frame_buf->m2m_buf.vb;
> - unsigned int cap_y_size = ctx->q_data[MTK_Q_DATA_DST].sizeimage[0];
> + struct mtk_video_dec_buf *out_buf;
> + struct vb2_v4l2_buffer *vb;
>  
> - vb2_set_plane_payload(&vb->vb2_buf, 0, cap_y_size);
> - if (ctx->q_data[MTK_Q_DATA_DST].fmt->num_planes == 2) {
> - unsigned int cap_c_size =
> - ctx->q_data[MTK_Q_DATA_DST].sizeimage[1];
> + if (!bs) {
> + mtk_v4l2_err("Free bitstream buffer fail.");
> + return;
> + }
> + out_buf = container_of(bs, struct mtk_video_dec_buf, bs_buffer);
> + vb = &out_buf->m2m_buf.vb;
>  
> - vb2_set_plane_payload(&vb->vb2_buf, 1, cap_c_size);
> + mtk_v4l2_debug(2, "Free bitsteam buffer id = %d to done_list",
> +vb->vb2_buf.index);
> +
> + v4l2_m2m_src_buf_remove(ctx->m2m_ctx);
> + if (error) {
> + v4l2_m2m_buf_done(vb, VB2_BUF_STATE_ERROR);
> + if (error == -EIO)
> + out_buf->error = true;
> + } else {
> + v4l2_m2m_buf_done(vb, VB2_BUF_STATE_DONE);
>   }
>  }
>  
> -static struct vdec_fb *vdec_get_cap_buffer(struct mtk_vcodec_ctx *ctx,
> -struct vb2_v4l2_buffer *vb2_v4l2)
> +static void mtk_vdec_stateless_cap_to_disp(struct mtk_vcodec_ctx *ctx,
> +struct vdec_fb *fb, int error)
>  {
> - struct mtk_video_dec_buf *framebuf =
> - container_of(vb2_v4l2, struct mtk_video_dec_buf, m2m_buf.vb);
> - struct vdec_fb *pfb = &framebuf->frame_buffer;
> - struct vb2_buffer *dst_buf = &vb2_v4l2->vb2_buf;
> + struct mtk_video_dec_buf *vdec_frame_buf;
> + struct vb2_v4l2_buffer *vb;
> + unsigned int cap_y_size, cap_c_size;
> +
> + if (!fb) {
> + mtk_v4l2_err("Free frame buffer fail.");
> + return;
> + }
> + vdec_frame_buf = container_of(fb, struct mtk_video_dec_buf,
> +   frame_buffer);
> + vb = &vdec_frame_buf->m2m_buf.vb;
> +
> + cap_y_size = ctx->q_data[MTK_Q_DATA_DST].sizeimage[0];
> + cap_c_size = ctx->q_data[MTK_Q_DATA_DST].sizeimage[1];
> +
> + v4l2_m2m_dst_buf_remove(ctx->m2m_ctx);
>  
> - pfb->base_y.va = NULL;
> + vb2_set_plane_payload(&vb->vb2_buf, 0, cap_y_size);
> + if (ctx->q_data[MTK_Q_DATA_DST].fmt->num_planes == 2)
> + vb2_set_plane_payload(&vb->vb2_buf, 1, cap_c_size);
> +
> + mtk_v4l2_debug(2, "Free frame buffer id = %d to done_list",
> +vb->vb2_buf.index);
> + if (error)
> + v4l2_m2m_buf_done(vb, VB2_BUF_STATE_ERROR);
> + else
> + v4l2_m2m_buf_done(vb, VB2_BUF_STATE_DONE);
> +}
> +
> +static struct vdec_fb *vdec_get_cap_buffer(struct mtk_vcodec_ctx *ctx)
> +{
> + struct mtk_video_dec_buf *framebuf;
> + struct vb2_v4l2_buffer *vb2_v4l2;
> + struct vb2_buffer *dst_buf;
> + struct vdec_fb *pfb;
> +
> + vb2_v4l2 = v4l2_m2m_next_dst_buf(ctx->m2m_ctx);
> + if (!vb2_v4l2) {
> + mtk_v4l2_debug(1, "[%d] dst_buf empty!!", ctx->id);
> + return NULL;
> + }
> +
> + dst_buf = &vb2_v4l2->vb2_buf;
> + framebuf = container_of(vb2_v4l2, struct mtk_video_dec_buf, m2m_buf.vb);
> +
> + pfb = &framebuf->frame_buffer;
> + pfb->base_y.va = vb2_plane_vaddr(dst_buf, 0);
>   pfb->base_y.dma_addr = vb2_dma_contig_plane_dma_addr(dst_buf, 0);
>   pfb->base_y.size = 

Re: [PATCH v4, 00/15] media: mtk-vcodec: support for MT8192 decoder

2022-01-11 Thread Nicolas Dufresne
Hello Yunfei,

On Monday 10 January 2022 at 16:34 +0800, Yunfei Dong wrote:
> This series adds support for mt8192 h264/vp8/vp9 decoder drivers. Firstly, 
> refactor
> power/clock/interrupt interfaces for mt8192 is lat and core architecture.
> 
> Secondly, add new functions to get frame buffer size and resolution according
> to decoder capability from scp side. Then add callback function to get/put
> capture buffer in order to enable lat and core decoder in parallel. 
> 
> Then add to support MT21C compressed mode and fix v4l2-compliance fail.

Perhaps you wanted to append the referenced v4l2-compliance output (fixed)?

As we started doing with other codec driver submissions (we just did this last
month for NXP), can you state which software this driver was tested with? I have
started receiving feedback from third parties that MTK driver support is not
reproducible, and I would like to work with you to fix the situation.

regards,
Nicolas

> 
> Next, extract H264 request api driver to let mt8183 and mt8192 use the same
> code, and adds mt8192 frame based h264 driver for stateless decoder.
> 
> Lastly, add vp8 and vp9 stateless decoder drivers.
> 
> Patches 1 to refactor power/clock/interrupt interface.
> Patches 2~4 get frame buffer size and resolution according to decoder 
> capability.
> Patches 5~6 enable lat and core decode in parallel.
> Patch 7~10 add to support MT21C compressed mode and fix v4l2-compliance fail.
> patch 11 record capture queue format type.
> Patch 12~13 extract h264 driver and add mt8192 frame based driver for h264 
> decoder.
> Patch 14~15 add vp8 and vp9 stateless decoder drivers.
> 
> Dependents on "Support multi hardware decode using of_platform_populate"[1].
> 
> This patches are the second part used to add mt8192 h264 decoder. And the 
> base part is [1].
> 
> [1]https://patchwork.linuxtv.org/project/linux-media/cover/20211215061552.8523-1-yunfei.d...@mediatek.com/
> ---
> changes compared with v3:
> - remove enum mtk_chip for patch 2.
> - add vp8 stateless decoder drivers for patch 14.
> - add vp9 stateless decoder drivers for patch 15.
> changes compared with v2:
> - add new patch 11 to record capture queue format type.
> - separate patch 4 according to tzung-bi's suggestion.
> - re-write commit message for patch 5 according to tzung-bi's suggestion.
> changes compared with v1:
> - rewrite commit message for patch 12.
> - rewrite cover-letter message.
> ---
> Yunfei Dong (15):
>   media: mtk-vcodec: Add vdec enable/disable hardware helpers
>   media: mtk-vcodec: Using firmware type to separate different firmware
> architecture
>   media: mtk-vcodec: get capture queue buffer size from scp
>   media: mtk-vcodec: Read max resolution from dec_capability
>   media: mtk-vcodec: Call v4l2_m2m_set_dst_buffered() set capture buffer
> buffered
>   media: mtk-vcodec: Refactor get and put capture buffer flow
>   media: mtk-vcodec: Refactor supported vdec formats and framesizes
>   media: mtk-vcodec: Add format to support MT21C
>   media: mtk-vcodec: disable vp8 4K capability
>   media: mtk-vcodec: Fix v4l2-compliance fail
>   media: mtk-vcodec: record capture queue format type
>   media: mtk-vcodec: Extract H264 common code
>   media: mtk-vcodec: Add h264 decoder driver for mt8192
>   media: mtk-vcodec: Add vp8 decoder driver for mt8192
>   media: mtk-vcodec: Add vp9 decoder driver for mt8192
> 
>  drivers/media/platform/mtk-vcodec/Makefile|4 +
>  .../platform/mtk-vcodec/mtk_vcodec_dec.c  |   49 +-
>  .../platform/mtk-vcodec/mtk_vcodec_dec_drv.c  |5 -
>  .../platform/mtk-vcodec/mtk_vcodec_dec_pm.c   |  168 +-
>  .../platform/mtk-vcodec/mtk_vcodec_dec_pm.h   |6 +-
>  .../mtk-vcodec/mtk_vcodec_dec_stateful.c  |   14 +-
>  .../mtk-vcodec/mtk_vcodec_dec_stateless.c |  284 ++-
>  .../platform/mtk-vcodec/mtk_vcodec_drv.h  |   40 +-
>  .../platform/mtk-vcodec/mtk_vcodec_enc_drv.c  |5 -
>  .../media/platform/mtk-vcodec/mtk_vcodec_fw.c |6 +
>  .../media/platform/mtk-vcodec/mtk_vcodec_fw.h |1 +
>  .../mtk-vcodec/vdec/vdec_h264_req_common.c|  311 +++
>  .../mtk-vcodec/vdec/vdec_h264_req_common.h|  254 ++
>  .../mtk-vcodec/vdec/vdec_h264_req_if.c|  416 +---
>  .../mtk-vcodec/vdec/vdec_h264_req_multi_if.c  |  605 +
>  .../mtk-vcodec/vdec/vdec_vp8_req_if.c |  445 
>  .../mtk-vcodec/vdec/vdec_vp9_req_lat_if.c | 2066 +
>  .../media/platform/mtk-vcodec/vdec_drv_if.c   |   36 +-
>  .../media/platform/mtk-vcodec/vdec_drv_if.h   |3 +
>  .../media/platform/mtk-vcodec/vdec_ipi_msg.h  |   37 +
>  .../platform/mtk-vcodec/vdec_msg_queue.c  |2 +
>  .../media/platform/mtk-vcodec/vdec_vpu_if.c   |   54 +-
>  .../media/platform/mtk-vcodec/vdec_vpu_if.h   |   15 +
>  .../media/platform/mtk-vcodec/venc_vpu_if.c   |2 +-
>  include/linux/remoteproc/mtk_scp.h|2 +
>  25 files changed, 4248 insertions(+), 582 deletions(-)
>  create mode 100644 
> 

Re: [PATCH v1, 12/12] media: mtk-vcodec: Add h264 slice api driver for mt8192

2021-12-15 Thread Nicolas Dufresne
Hi Yunfei,

On Wednesday 15 December 2021 at 14:59 +0800, Yunfei Dong wrote:
> From: Yunfei Dong 
> 
> Adds h264 lat and core driver for mt8192.

This is purely a nit, but I first noticed the usage of "slice" in the
namespace and the title, which led me to think this new platform was
V4L2_STATELESS_H264_DECODE_MODE_SLICE_BASED. I think some structures which are
clearly frame-based should probably be renamed (it's the namespace that is
confusing).

p.s. Note that adding a slice-based mode would be amazing for streaming with
ultra-low latency (think remote video games)

regards,
Nicolas

> 
> Signed-off-by: Yunfei Dong 
> ---
>  drivers/media/platform/mtk-vcodec/Makefile|   1 +
>  .../mtk-vcodec/vdec/vdec_h264_req_lat_if.c| 620 ++
>  .../media/platform/mtk-vcodec/vdec_drv_if.c   |   8 +-
>  .../media/platform/mtk-vcodec/vdec_drv_if.h   |   1 +
>  include/linux/remoteproc/mtk_scp.h|   2 +
>  5 files changed, 631 insertions(+), 1 deletion(-)
>  create mode 100644 
> drivers/media/platform/mtk-vcodec/vdec/vdec_h264_req_lat_if.c
> 
> diff --git a/drivers/media/platform/mtk-vcodec/Makefile 
> b/drivers/media/platform/mtk-vcodec/Makefile
> index 3f41d748eee5..1777d7606f0d 100644
> --- a/drivers/media/platform/mtk-vcodec/Makefile
> +++ b/drivers/media/platform/mtk-vcodec/Makefile
> @@ -10,6 +10,7 @@ mtk-vcodec-dec-y := vdec/vdec_h264_if.o \
>   vdec/vdec_vp9_if.o \
>   vdec/vdec_h264_req_if.o \
>   vdec/vdec_h264_req_common.o \
> + vdec/vdec_h264_req_lat_if.o \
>   mtk_vcodec_dec_drv.o \
>   vdec_drv_if.o \
>   vdec_vpu_if.o \
> diff --git a/drivers/media/platform/mtk-vcodec/vdec/vdec_h264_req_lat_if.c 
> b/drivers/media/platform/mtk-vcodec/vdec/vdec_h264_req_lat_if.c
> new file mode 100644
> index ..403d7df00e1d
> --- /dev/null
> +++ b/drivers/media/platform/mtk-vcodec/vdec/vdec_h264_req_lat_if.c
> @@ -0,0 +1,620 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2021 MediaTek Inc.
> + * Author: Yunfei Dong 
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "../mtk_vcodec_util.h"
> +#include "../mtk_vcodec_dec.h"
> +#include "../mtk_vcodec_intr.h"
> +#include "../vdec_drv_base.h"
> +#include "../vdec_drv_if.h"
> +#include "../vdec_vpu_if.h"
> +#include "vdec_h264_req_common.h"
> +
> +/**
> + * enum vdec_h264_core_dec_err_type  - core decode error type
> + */
> +enum vdec_h264_core_dec_err_type {
> + TRANS_BUFFER_FULL = 1,
> + SLICE_HEADER_FULL,
> +};
> +
> +/**
> + * struct vdec_h264_slice_lat_dec_param  - parameters for decode current 
> frame
> + * @sps : h264 sps syntax parameters
> + * @pps : h264 pps syntax parameters
> + * @slice_header: h264 slice header syntax parameters
> + * @scaling_matrix : h264 scaling list parameters
> + * @decode_params : decoder parameters of each frame used for hardware decode
> + * @h264_dpb_info : dpb reference list
> + */
> +struct vdec_h264_slice_lat_dec_param {
> + struct mtk_h264_sps_param sps;
> + struct mtk_h264_pps_param pps;
> + struct mtk_h264_slice_hd_param slice_header;
> + struct slice_api_h264_scaling_matrix scaling_matrix;
> + struct slice_api_h264_decode_param decode_params;
> + struct mtk_h264_dpb_info h264_dpb_info[V4L2_H264_NUM_DPB_ENTRIES];
> +};
> +
> +/**
> + * struct vdec_h264_slice_info - decode information
> + * @nal_info: nal info of current picture
> + * @timeout : Decode timeout: 1 timeout, 0 no timeount
> + * @bs_buf_size : bitstream size
> + * @bs_buf_addr : bitstream buffer dma address
> + * @y_fb_dma: Y frame buffer dma address
> + * @c_fb_dma: C frame buffer dma address
> + * @vdec_fb_va  : VDEC frame buffer struct virtual address
> + * @crc : Used to check whether hardware's status is right
> + */
> +struct vdec_h264_slice_info {
> + uint16_t nal_info;
> + uint16_t timeout;
> + uint32_t bs_buf_size;
> + uint64_t bs_buf_addr;
> + uint64_t y_fb_dma;
> + uint64_t c_fb_dma;
> + uint64_t vdec_fb_va;
> + uint32_t crc[8];
> +};
> +
> +/**
> + * struct vdec_h264_slice_vsi - shared memory for decode information exchange
> + *between VPU and Host. The memory is allocated by VPU then mapping 
> to
> + *Host in vdec_h264_slice_init() and freed in 
> vdec_h264_slice_deinit()
> + *by VPU. AP-W/R : AP is writer/reader on this item. VPU-W/R: VPU is
> + *write/reader on this item.
> + * @wdma_err_addr   : wdma error dma address
> + * @wdma_start_addr : wdma start dma address
> + * @wdma_end_addr   : wdma end dma address
> + * @slice_bc_start_addr : slice bc start dma address
> + * @slice_bc_end_addr   : slice bc end dma address
> + * @row_info_start_addr : row info start dma address
> + * @row_info_end_addr   : row info end dma address
> + * @trans_start : trans start dma address
> + * @trans_end   : 

Re: [PATCH v2] media: mtk-vcodec: Align width and height to 64 bytes

2021-11-03 Thread Nicolas Dufresne
On Wednesday 03 November 2021 at 11:37 +0800, Yunfei Dong wrote:
> Width and height need to 64 bytes aligned when setting the format.
> Need to make sure all is 64 bytes align when use width and height to
> calculate buffer size.
> 
> Signed-off-by: Yunfei Dong 
> Change-Id: I39886b1a6b433c92565ddbf297eb193456eec1d2

Perhaps avoid this tag in the future? Also, there is a tag to indicate which
patch introduced that bug; if you add this tag, the patch will be automatically
backported into the relevant stable kernels. The format is:

> Fixes: <12-char commit sha> ("<commit subject>")

> ---
>  drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.h| 1 +
>  drivers/media/platform/mtk-vcodec/vdec/vdec_h264_req_if.c | 4 ++--
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.h 
> b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.h
> index e30806c1faea..66cd6d2242c3 100644
> --- a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.h
> +++ b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.h
> @@ -11,6 +11,7 @@
>  #include 
>  #include 
>  
> +#define VCODEC_DEC_ALIGNED_64 64
>  #define VCODEC_CAPABILITY_4K_DISABLED0x10
>  #define VCODEC_DEC_4K_CODED_WIDTH4096U
>  #define VCODEC_DEC_4K_CODED_HEIGHT   2304U
> diff --git a/drivers/media/platform/mtk-vcodec/vdec/vdec_h264_req_if.c 
> b/drivers/media/platform/mtk-vcodec/vdec/vdec_h264_req_if.c
> index d402fc4bda69..e1a3011772a9 100644
> --- a/drivers/media/platform/mtk-vcodec/vdec/vdec_h264_req_if.c
> +++ b/drivers/media/platform/mtk-vcodec/vdec/vdec_h264_req_if.c
> @@ -562,8 +562,8 @@ static void get_pic_info(struct vdec_h264_slice_inst 
> *inst,
>  {
>   struct mtk_vcodec_ctx *ctx = inst->ctx;
>  
> - ctx->picinfo.buf_w = (ctx->picinfo.pic_w + 15) & 0xFFF0;
> - ctx->picinfo.buf_h = (ctx->picinfo.pic_h + 31) & 0xFFE0;
> + ctx->picinfo.buf_w = ALIGN(ctx->picinfo.pic_w, VCODEC_DEC_ALIGNED_64);
> + ctx->picinfo.buf_h = ALIGN(ctx->picinfo.pic_h, VCODEC_DEC_ALIGNED_64);
>   ctx->picinfo.fb_sz[0] = ctx->picinfo.buf_w * ctx->picinfo.buf_h;
>   ctx->picinfo.fb_sz[1] = ctx->picinfo.fb_sz[0] >> 1;
>   inst->vsi_ctx.dec.cap_num_planes =



Re: DMA-buf and uncached system memory

2021-02-15 Thread Nicolas Dufresne
On Monday 15 February 2021 at 09:58 +0100, Christian König wrote:
> Hi guys,
> 
> we are currently working an Freesync and direct scan out from system 
> memory on AMD APUs in A+A laptops.
> 
> On problem we stumbled over is that our display hardware needs to scan 
> out from uncached system memory and we currently don't have a way to 
> communicate that through DMA-buf.
> 
> For our specific use case at hand we are going to implement something 
> driver specific, but the question is should we have something more 
> generic for this?

Hopefully I'm getting this right, but this makes me think of a long-standing
issue I've hit with the Intel DRM and UVC drivers. If I let the UVC driver
allocate the buffer, and import the resulting DMABuf (cacheable memory written
with a CPU copy in the kernel) into DRM, we can see cache artifacts being
displayed. Whereas if I use the DRM driver's memory (a dumb buffer in that
case), it's clean, because there is a driver-specific solution to that.

There is no obvious way for a userspace application to know which way is right
or wrong, and in fact it feels like the kernel could solve this somehow without
having to inform userspace (perhaps).

> 
> After all the system memory access pattern is a PCIe extension and as 
> such something generic.
> 
> Regards,
> Christian.


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: DMA-buf and uncached system memory

2021-02-15 Thread Nicolas Dufresne
On Monday 15 February 2021 at 13:10 +0100, Christian König wrote:
> 
> 
> On 15.02.21 at 13:00, Thomas Zimmermann wrote:
> > Hi
> > 
> > On 15.02.21 at 10:49, Thomas Zimmermann wrote:
> > > Hi
> > > 
> > > On 15.02.21 at 09:58, Christian König wrote:
> > > > Hi guys,
> > > > 
> > > > we are currently working an Freesync and direct scan out from system 
> > > > memory on AMD APUs in A+A laptops.
> > > > 
> > > > On problem we stumbled over is that our display hardware needs to 
> > > > scan out from uncached system memory and we currently don't have a 
> > > > way to communicate that through DMA-buf.
> > 
> > Re-reading this paragrah, it sounds more as if you want to let the 
> > exporter know where to move the buffer. Is this another case of the 
> > missing-pin-flag problem?
> 
> No, your original interpretation was correct. Maybe my writing is a bit 
> unspecific.
> 
> The real underlying issue is that our display hardware has a problem 
> with latency when accessing system memory.
> 
> So the question is if that also applies to for example Intel hardware or 
> other devices as well or if it is just something AMD specific?

I do believe that the answer is yes: Intel display hardware has a similar issue
with latency, hence requires uncached memory.

> 
> Regards,
> Christian.
> 
> > 
> > Best regards
> > Thomas
> > 
> > > > 
> > > > For our specific use case at hand we are going to implement 
> > > > something driver specific, but the question is should we have 
> > > > something more generic for this?
> > > 
> > > For vmap operations, we return the address as struct dma_buf_map, 
> > > which contains additional information about the memory buffer. In 
> > > vram helpers, we have the interface drm_gem_vram_offset() that 
> > > returns the offset of the GPU device memory.
> > > 
> > > Would it be feasible to combine both concepts into a dma-buf 
> > > interface that returns the device-memory offset plus the additional 
> > > caching flag?
> > > 
> > > There'd be a structure and a getter function returning the structure.
> > > 
> > > struct dma_buf_offset {
> > >  bool cached;
> > >  u64 address;
> > > };
> > > 
> > > // return offset in *off
> > > int dma_buf_offset(struct dma_buf *buf, struct dma_buf_off *off);
> > > 
> > > Whatever settings are returned by dma_buf_offset() are valid while 
> > > the dma_buf is pinned.
> > > 
> > > Best regards
> > > Thomas
> > > 
> > > > 
> > > > After all the system memory access pattern is a PCIe extension and 
> > > > as such something generic.
> > > > 
> > > > Regards,
> > > > Christian.
> > > 
> > > 
> > > 
> > 
> 




Re: [PATCH 2/9] misc: Add Xilinx AI engine device driver

2020-12-09 Thread Nicolas Dufresne
On Wednesday 18 November 2020 at 00:06 -0800, Wendy Liang wrote:
> Create AI engine device/partition hierarchical structure.
> 
> Each AI engine device can have multiple logical partitions(groups of AI
> engine tiles). Each partition is column based and has its own node ID
> in the system. AI engine device driver manages its partitions.
> 
> Applications can access AI engine partition through the AI engine
> partition driver instance. AI engine registers write is moved to kernel
> as there are registers in the AI engine array needs privilege
> permission.

Hi there, it's nice to see an effort to upstream an AI driver. I'm a little
worried that this driver is not obvious to use from its source code alone. Do
you have a reference to some open source code that demonstrates its usage?

> 
> Signed-off-by: Wendy Liang 
> Signed-off-by: Hyun Kwon 
> ---
>  MAINTAINERS    |   8 +
>  drivers/misc/Kconfig   |  12 +
>  drivers/misc/Makefile  |   1 +
>  drivers/misc/xilinx-ai-engine/Makefile |  11 +
>  drivers/misc/xilinx-ai-engine/ai-engine-aie.c  | 115 +
>  drivers/misc/xilinx-ai-engine/ai-engine-dev.c  | 448 ++
>  drivers/misc/xilinx-ai-engine/ai-engine-internal.h | 226 ++
>  drivers/misc/xilinx-ai-engine/ai-engine-part.c | 498
> +
>  drivers/misc/xilinx-ai-engine/ai-engine-res.c  | 114 +
>  include/uapi/linux/xlnx-ai-engine.h    | 107 +
>  10 files changed, 1540 insertions(+)
>  create mode 100644 drivers/misc/xilinx-ai-engine/Makefile
>  create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-aie.c
>  create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-dev.c
>  create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-internal.h
>  create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-part.c
>  create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-res.c
>  create mode 100644 include/uapi/linux/xlnx-ai-engine.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 5cc595a..40e3351 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -19283,6 +19283,14 @@ T: git https://github.com/Xilinx/linux-xlnx.git
>  F: Documentation/devicetree/bindings/phy/xlnx,zynqmp-psgtr.yaml
>  F: drivers/phy/xilinx/phy-zynqmp.c
>  
> +XILINX AI ENGINE DRIVER
> +M: Wendy Liang 
> +S: Supported
> +F: Documentation/devicetree/bindings/soc/xilinx/xlnx,ai-engine.yaml
> +F: drivers/misc/xilinx-ai-engine/
> +F: include/linux/xlnx-ai-engine.h
> +F: include/uapi/linux/xlnx-ai-engine.h
> +
>  XILLYBUS DRIVER
>  M: Eli Billauer 
>  L: linux-ker...@vger.kernel.org
> diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
> index fafa8b0..0b8ce4d 100644
> --- a/drivers/misc/Kconfig
> +++ b/drivers/misc/Kconfig
> @@ -444,6 +444,18 @@ config XILINX_SDFEC
>  
>   If unsure, say N.
>  
> +config XILINX_AIE
> +   tristate "Xilinx AI engine"
> +   depends on ARM64 || COMPILE_TEST
> +   help
> + This option enables support for the Xilinx AI engine driver.
> + One Xilinx AI engine device can have multiple partitions (groups of
> + AI engine tiles). Xilinx AI engine device driver instance manages
> + AI engine partitions. User application access its partitions through
> + AI engine partition instance file operations.
> +
> + If unsure, say N
> +
>  config MISC_RTSX
> tristate
> default MISC_RTSX_PCI || MISC_RTSX_USB
> diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
> index d23231e..2176b18 100644
> --- a/drivers/misc/Makefile
> +++ b/drivers/misc/Makefile
> @@ -57,3 +57,4 @@ obj-$(CONFIG_HABANA_AI)   += habanalabs/
>  obj-$(CONFIG_UACCE)+= uacce/
>  obj-$(CONFIG_XILINX_SDFEC) += xilinx_sdfec.o
>  obj-$(CONFIG_HISI_HIKEY_USB)   += hisi_hikey_usb.o
> +obj-$(CONFIG_XILINX_AIE)   += xilinx-ai-engine/
> diff --git a/drivers/misc/xilinx-ai-engine/Makefile b/drivers/misc/xilinx-ai-
> engine/Makefile
> new file mode 100644
> index 000..7827a0a
> --- /dev/null
> +++ b/drivers/misc/xilinx-ai-engine/Makefile
> @@ -0,0 +1,11 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +# Makefile for Xilinx AI engine device driver
> +#
> +
> +obj-$(CONFIG_XILINX_AIE)   += xilinx-aie.o
> +
> +xilinx-aie-$(CONFIG_XILINX_AIE) := ai-engine-aie.o \
> +  ai-engine-dev.o \
> +  ai-engine-part.o \
> +  ai-engine-res.o
> diff --git a/drivers/misc/xilinx-ai-engine/ai-engine-aie.c
> b/drivers/misc/xilinx-ai-engine/ai-engine-aie.c
> new file mode 100644
> index 000..319260f
> --- /dev/null
> +++ b/drivers/misc/xilinx-ai-engine/ai-engine-aie.c
> @@ -0,0 +1,115 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Xilinx AI Engine driver AIE device specific implementation
> + *
> + * Copyright (C) 2020 Xilinx, Inc.
> 

Re: [PATCH v5 0/7] dma-buf: Performance improvements for system heap & a system-uncached implementation

2020-12-08 Thread Nicolas Dufresne
On Friday 13 November 2020 at 21:39 +0100, Daniel Vetter wrote:
> On Thu, Nov 12, 2020 at 08:11:02PM -0800, John Stultz wrote:
> > On Thu, Nov 12, 2020 at 1:32 AM Daniel Vetter  wrote:
> > > On Thu, Nov 12, 2020 at 11:09:04AM +0530, Sumit Semwal wrote:
> > > > On Tue, 10 Nov 2020 at 09:19, John Stultz 
> > > > wrote:
> > > > > 
> > > > > Hey All,
> > > > >   So just wanted to send my last revision of my patch series
> > > > > of performance optimizations to the dma-buf system heap.
> > > > 
> > > > Thanks very much for your patches - I think the first 5 patches look
> > > > good to me.
> > > > 
> > > > I know there was a bit of discussion over adding a new system-uncached
> > > > heap v/s using a flag to identify that; I think I prefer the separate
> > > > heap idea, but lets ask one last time if any one else has any real
> > > > objections to it.
> > > > 
> > > > Daniel, Christian: any comments from your side on this?
> > > 
> > > I do wonder a bit where the userspace stack for this all is, since tuning
> > > allocators without a full stack is fairly pointless. dma-buf heaps is a
> > > bit in a limbo situation here it feels like.
> > 
> > As mentioned in the system-uncached patch:
> > Pending opensource users of this code include:
> > * AOSP HiKey960 gralloc:
> >   - https://android-review.googlesource.com/c/device/linaro/hikey/+/1399519
> >   - Visibly improves performance over the system heap
> > * AOSP Codec2 (possibly, needs more review):
> >   - 
> > https://android-review.googlesource.com/c/platform/frameworks/av/+/1360640/17/media/codec2/vndk/C2DmaBufAllocator.cpp#325
> > 
> > Additionally both the HiKey, HiKey960 grallocs  and Codec2 are already
> > able to use the current dmabuf heaps instead of ION.
> > 
> > So I'm not sure what you mean by limbo, other than it being in a
> > transition state where the interface is upstream and we're working on
> > moving vendors to it from ION (which is staged to be dropped in 5.11).
> > Part of that work is making sure we don't regress the performance
> > expectations.
> 
> The mesa thing below, since if we test this with some downstream kernel
> drivers or at least non-mesa userspace I'm somewhat worried we're just
> creating a nice split world between the android gfx world and the
> mesa/linux desktop gfx world.
> 
> But then that's kinda how android rolls, so *shrug*
> 
> > > Plus I'm vary of anything related to leaking this kind of stuff beyond the
> > > dma-api because dma api maintainers don't like us doing that. But
> > > personally no concern on that front really, gpus need this. It's just that
> > > we do need solid justification I think if we land this. Hence back to
> > > first point.
> > > 
> > > Ideally first point comes in the form of benchmarking on android together
> > > with a mesa driver (or mesa + some v4l driver or whatever it takes to
> > > actually show the benefits, I have no idea).
> > 
> > Tying it with mesa is a little tough as the grallocs for mesa devices
> > usually use gbm (gralloc.gbm or gralloc.minigbm). Swapping the
> > allocation path for dmabuf heaps there gets a little complex as last I
> > tried that (when trying to get HiKey working with Lima graphics, as
> > gbm wouldn't allocate the contiguous buffers required by the display),
> > I ran into issues with the drm_hwcomposer and mesa expecting the gbm
> > private handle metadata in the buffer when it was passed in.
> > 
> > But I might take a look at it again. I got a bit lost digging through
> > the mesa gbm allocation paths last time.
> > 
> > I'll also try to see if I can find a benchmark for the codec2 code
> > (using dmabuf heaps with and without the uncached heap) on on db845c
> > (w/ mesa), as that is already working and I suspect that might be
> > close to what you're looking for.
> 
> tbh I think trying to push for this long term is the best we can hope for.
> 
> Media is also a lot more *meh* since it's deeply fragmented and a lot less
> of it upstream than on the gles/display side.

Sorry to jump in, but I'd like to reset a bit. The Media APIs are a lot more
generic: most of the kernel API is usable without specific knowledge of the HW.
Pretty much all APIs are exercised through v4l2-ctl and v4l2-compliance on the
V4L2 side (including performance testing). It would be pretty straightforward
to demonstrate the use of DMABuf heaps (just do live resolution switching;
you'll beat the internal V4L2 allocator without even looking at DMA cache
optimization).

> 
> I think confirming that this at least doesn't horrible blow up on a
> gralloc/gbm+mesa stack would be useful I think.
> -Daniel




Re: [PATCH 00/49] DRM driver for Hikey 970

2020-08-27 Thread Nicolas Dufresne
On Tuesday 25 August 2020 at 13:30 +0200, Mauro Carvalho Chehab wrote:
> On Tue, 25 Aug 2020 05:29:29 +1000,
> Dave Airlie wrote:
> 
> > On Thu, 20 Aug 2020 at 20:02, Laurent Pinchart
> >  wrote:
> > > Hi Mauro,
> > > 
> > > On Thu, Aug 20, 2020 at 09:03:26AM +0200, Mauro Carvalho Chehab wrote:  
> > > > On Wed, 19 Aug 2020 12:52:06 -0700, John Stultz wrote:
> > > > > On Wed, Aug 19, 2020 at 8:31 AM Laurent Pinchart wrote:  
> > > > > > On Wed, Aug 19, 2020 at 05:21:20PM +0200, Sam Ravnborg wrote:  
> > > > > > > On Wed, Aug 19, 2020 at 01:45:28PM +0200, Mauro Carvalho Chehab 
> > > > > > > wrote:  
> > > > > > > > This patch series port the out-of-tree driver for Hikey 970 
> > > > > > > > (which
> > > > > > > > should also support Hikey 960) from the official 96boards tree:
> > > > > > > > 
> > > > > > > >https://github.com/96boards-hikey/linux/tree/hikey970-v4.9
> > > > > > > > 
> > > > > > > > Based on his history, this driver seems to be originally written
> > > > > > > > for Kernel 4.4, and was later ported to Kernel 4.9. The original
> > > > > > > > driver used to depend on ION (from Kernel 4.4) and had its own
> > > > > > > > implementation for FB dev API.
> > > > > > > > 
> > > > > > > > As I need to preserve the original history (with has patches 
> > > > > > > > from
> > > > > > > > both HiSilicon and from Linaro),  I'm starting from the original
> > > > > > > > patch applied there. The remaining patches are incremental,
> > > > > > > > and port this driver to work with upstream Kernel.
> > > > > > > >  
> > > > > ...  
> > > > > > > > - Due to legal reasons, I need to preserve the authorship of
> > > > > > > >   each one responsbile for each patch. So, I need to start from
> > > > > > > >   the original patch from Kernel 4.4;  
> > > > > ...  
> > > > > > > I do acknowledge you need to preserve history and all -
> > > > > > > but this patchset is not easy to review.  
> > > > > > 
> > > > > > Why do we need to preserve history ? Adding relevant Signed-off-by 
> > > > > > and
> > > > > > Co-developed-by should be enough, shouldn't it ? Having a public 
> > > > > > branch
> > > > > > that contains the history is useful if anyone is interested, but I 
> > > > > > don't
> > > > > > think it's required in mainline.  
> > > > > 
> > > > > Yea. I concur with Laurent here. I'm not sure what legal reasoning you
> > > > > have on this but preserving the "absolute" history here is actively
> > > > > detrimental for review and understanding of the patch set.
> > > > > 
> > > > > Preserving Authorship, Signed-off-by lines and adding Co-developed-by
> > > > > > lines should be sufficient to provide both attribution credit and DCO
> > > > > history.  
> > > > 
> > > > I'm not convinced that, from legal standpoint, folding things would
> > > > be enough. See, there are at least 3 legal systems involved here
> > > > among the different patch authors:
> > > > 
> > > >   - civil law;
> > > >   - common law;
> > > >   - customary law + common law.
> > > > 
> > > > Merging stuff altogether from different law systems can be problematic,
> > > > and trying to discuss this with experienced IP property lawyers will
> > > > for sure take a lot of time and efforts. I also bet that different
> > > > lawyers will have different opinions, because laws are subject to
> > > > interpretation. With that matter I'm not aware of any court rules
> > > > with regards to folded patches. So, it sounds to me that folding
> > > > patches is something that has yet to be proofed in courts around
> > > > the globe.
> > > > 
> > > > At least for US legal system, it sounds that the Country of
> > > > origin of a patch is relevant, as they have a concept of
> > > > "national technology" that can be subject to export regulations.
> > > > 
> > > > From my side, I really prefer to play safe and stay out of any such
> > > > legal discussions.  
> > > 
> > > Let's be serious for a moment. If you think there are legal issues in
> > > taking GPL-v2.0-only patches and squashing them while retaining
> > > > authorship information through tags, the Linux kernel is *full* of that.
> > > You also routinely modify patches that you commit to the media subsystem
> > > to fix "small issues".
> > > 
> > > The country of origin argument makes no sense either, the kernel code
> > > > base is full of code coming from pretty much every country on the planet.
> > > 
> > > Keeping the patches separate make this hard to review. Please squash
> > > them.  
> > 
> > I'm inclined to agree with Laurent here.
> > 
> > Patches submitted as GPL-v2 with DCO lines and author names/companies
> > should be fine to be squashed and rearranged,
> > as long as the DCO and Authorship is kept somewhere in the new patch
> > that is applied.
> > 
> > Review is more important here.
> 
> Sorry, but I can't agree that review is more important than being able
> to properly indicate copyrights in a valid way in the legal systems
> where it would apply ;-)

Regardless of the 

Re: [RFC] Experimental DMA-BUF Device Heaps

2020-08-26 Thread Nicolas Dufresne
Le jeudi 20 août 2020 à 18:54 +0300, Laurent Pinchart a écrit :
> Hi Ezequiel,
> 
> On Thu, Aug 20, 2020 at 05:36:40AM -0300, Ezequiel Garcia wrote:
> > Hi John,
> > 
> > Thanks a ton for taking the time
> > to go thru this.
> > 
> > On Mon, 2020-08-17 at 21:13 -0700, John Stultz wrote:
> > > On Sun, Aug 16, 2020 at 10:23 AM Ezequiel Garcia  
> > > wrote:
> > > > This heap is basically a wrapper around DMA-API dma_alloc_attrs,
> > > > which will allocate memory suitable for the given device.
> > > > 
> > > > The implementation is mostly a port of the Contiguous Videobuf2
> > > > memory allocator (see videobuf2/videobuf2-dma-contig.c)
> > > > over to the DMA-BUF Heap interface.
> > > > 
> > > > The intention of this allocator is to provide applications
> > > > with a more system-agnostic API: the only thing the application
> > > > needs to know is which device to get the buffer for.
> > > > 
> > > > Whether the buffer is backed by CMA, IOMMU or a DMA Pool
> > > > is unknown to the application.
> > > 
> > > My hesitancy here is that the main reason we have DMA BUF Heaps, and
> > > ION before it, was to expose different types of memory allocations to
> > > userspace. The main premise that often these buffers are shared with
> > > multiple devices, which have differing constraints and it is userspace
> > > that best understands the path a buffer will take through a series of
> > > devices. So userspace is best positioned to determine what type of
> > > memory should be allocated to satisfy the various devices constraints
> > > (there were some design attempts to allow DMA BUFs to use multiple
> > > attach with deferred alloc at map time to handle this constraint
> > > solving in-kernel, but it was never adopted in practice).
> > > 
> > > This however, requires some system specific policy - implemented in
> > > the Android userspace by gralloc which maps "usage" types (device
> > > pipeline flows) to heaps. I liken it to fstab, which helps map mount
> > > points to partitions - it's not going to be the same on every system.
> > > 
> > > What you seem to be proposing seems a bit contrary to this premise -
> > > Userland doesn't know what type of memory it needs, but given a device
> > > can somehow find the heap it should allocate for? This seems to assume
> > > the buffer is only to be used with a single device?
> > 
> > Single-device usage wasn't the intention. I see now that this patch
> > looks too naive and it's confusing. The idea is of course to get buffers
> > that can be shared.
> > 
> > I'm thinking you need to share your picture buffer with a decoder,
> > a renderer, possibly something else. Each with its own set
> > of constraints and limitations. 
> > 
> > Of course, a buffer that works for device A may be unsuitable for
> > device B and so this per-device heap is surely way too naive.
> > 
> > As you rightly mention, the main intention of this RFC is to
> > question exactly the current premise: "userspace knows".
> > I fail to see how will (generic and system-agnostic) applications
> > know which heap to use.
> > 
> > Just for completion, let me throw a simple example: i.MX 8M
> > and some Rockchip platforms share the same VPU block, except the
> > latter have an IOMMU.
> > 
> > So applications would need to query an iommu presence
> > to get buffer from CMA or not.
> 
> I don't think we can do this in a system-agnostic way. What I'd like to
> see is an API for the kernel to expose opaque constraints for each

Please take into consideration that constraints can also come from
userspace. These days there are things we may want to do on the CPU,
and the SIMD instructions and associated algorithms introduce
constraints too. If the constraints become fully opaque, you will also
potentially limit some valid CPU interaction with the HW in terms of
buffer access. CPU constraints today are fairly small, and one should
be able to express them, I believe. Of course, these are not media
agnostic: some constraints may depend on the media (like an image
buffer, a matrix buffer or an audio buffer) and the associated
algorithm to be used.

An example would be an image buffer produced or modified on the CPU but
encoded with HW.
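For reference, this is roughly what allocating from a DMA-BUF heap looks
like from userspace. This is only a sketch: the struct layout and ioctl
encoding below are re-derived from the linux/dma-heap.h uapi rather than
pulled from kernel headers, and the heap path and size in the usage
comment are illustrative.

```python
import fcntl
import os
import struct

# struct dma_heap_allocation_data from linux/dma-heap.h:
#   __u64 len; __u32 fd; __u32 fd_flags; __u64 heap_flags;
ALLOC_STRUCT = 'QIIQ'

def _IOWR(typ, nr, size):
    # Linux asm-generic ioctl encoding: dir(2 bits) | size(14) | type(8) | nr(8)
    _IOC_WRITE, _IOC_READ = 1, 2
    return ((_IOC_READ | _IOC_WRITE) << 30) | (size << 16) | (ord(typ) << 8) | nr

DMA_HEAP_IOCTL_ALLOC = _IOWR('H', 0x0, struct.calcsize(ALLOC_STRUCT))

def heap_alloc(heap_path, length):
    """Allocate a dma-buf of `length` bytes from a heap; returns the dma-buf fd."""
    heap_fd = os.open(heap_path, os.O_RDWR | os.O_CLOEXEC)
    try:
        buf = bytearray(struct.pack(ALLOC_STRUCT, length, 0,
                                    os.O_RDWR | os.O_CLOEXEC, 0))
        fcntl.ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, buf)
        _, dmabuf_fd, _, _ = struct.unpack(ALLOC_STRUCT, bytes(buf))
        return dmabuf_fd
    finally:
        os.close(heap_fd)

# Usage (requires a kernel with dma-heaps and access to the device node):
# fd = heap_alloc('/dev/dma_heap/system', 4096)
```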

> device, and a userspace library to reconcile constraints, selecting a
> suitable heap, and producing heap-specific parameters for the
> allocation.
> 
> The constraints need to be transported in a generic way, but the
> contents can be opaque for applications. Only the library would need to
> interpret them. This can be done with platform-specific code inside the
> library. A separate API for the library to interact with the kernel to
> further query or negotiate configuration parameters could be part of
> that picture.
> 
> This would offer standardized APIs to applications (to gather
> constraints, pass them to the constraint resolution library, and receive
> a heap ID and heap parameters), while still allowing platform-specific
> code in userspace.
> 
> > > There was at 

Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-04-06 Thread Nicolas Dufresne
Le samedi 04 avril 2020 à 15:55 +0200, Andreas Bergmeier a écrit :
> The problem of data transfer costs is not new in Cloud environments. At work 
> we usually just opt for paying for it since dev time is scarcer. For private 
> projects though, I opt for aggressive (remote) caching.
> So you can setup a global cache in Google Cloud Storage and more local caches 
> wherever your executors are (reduces egress as much as possible).
> This setup works great with Bazel and Pants among others. Note that these 
> systems are pretty hermetic in contrast to Meson.
> IIRC Eric by now works at Google. They internally use Blaze which AFAIK does 
> aggressive caching, too.
> So maybe using any of these systems would be a way of not having to sacrifice 
> any of the current functionality.
> Downside is that you have lower a bit of dev productivity since you cannot 
> eyeball your build definitions anymore.
> 
Did you mean Bazel [0] ? I'm not sure I follow your reasoning; why is
Meson vs Bazel relevant to this issue ?

Nicolas

[0] https://bazel.build/

> ym2c
> 
> 
> On Fri, 28 Feb 2020 at 20:34, Eric Anholt  wrote:
> > On Fri, Feb 28, 2020 at 12:48 AM Dave Airlie  wrote:
> > >
> > > On Fri, 28 Feb 2020 at 18:18, Daniel Stone  wrote:
> > > >
> > > > On Fri, 28 Feb 2020 at 03:38, Dave Airlie  wrote:
> > > > > b) we probably need to take a large step back here.
> > > > >
> > > > > Look at this from a sponsor POV, why would I give X.org/fd.o
> > > > > sponsorship money that they are just giving straight to google to pay
> > > > > for hosting credits? Google are profiting in some minor way from these
> > > > > hosting credits being bought by us, and I assume we aren't getting any
> > > > > sort of discounts here. Having google sponsor the credits costs google
> > > > > substantially less than having any other company give us money to do
> > > > > it.
> > > >
> > > > The last I looked, Google GCP / Amazon AWS / Azure were all pretty
> > > > comparable in terms of what you get and what you pay for them.
> > > > Obviously providers like Packet and Digital Ocean who offer bare-metal
> > > > services are cheaper, but then you need to find someone who is going
> > > > to properly administer the various machines, install decent
> > > > monitoring, make sure that more storage is provisioned when we need
> > > > more storage (which is basically all the time), make sure that the
> > > > hardware is maintained in decent shape (pretty sure one of the fd.o
> > > > machines has had a drive in imminent-failure state for the last few
> > > > months), etc.
> > > >
> > > > Given the size of our service, that's a much better plan (IMO) than
> > > > relying on someone who a) isn't an admin by trade, b) has a million
> > > > other things to do, and c) hasn't wanted to do it for the past several
> > > > years. But as long as that's the resources we have, then we're paying
> > > > the cloud tradeoff, where we pay more money in exchange for fewer
> > > > problems.
> > >
> > > Admin for gitlab and CI is a full time role anyways. The system is
> > > definitely not self sustaining without time being put in by you and
> > > anholt still. If we have $75k to burn on credits, and it was diverted
> > > to just pay an admin to admin the real hw + gitlab/CI would that not
> > > be a better use of the money? I didn't know if we can afford $75k for
> > > an admin, but suddenly we can afford it for gitlab credits?
> > 
> > As I think about the time that I've spent at google in less than a
> > year on trying to keep the lights on for CI and optimize our
> > infrastructure in the current cloud environment, that's more than the
> > entire yearly budget you're talking about here.  Saying "let's just
> > pay for people to do more work instead of paying for full-service
> > cloud" is not a cost optimization.
> > 
> > 
> > > > Yes, we could federate everything back out so everyone runs their own
> > > > builds and executes those. Tinderbox did something really similar to
> > > > that IIRC; not sure if Buildbot does as well. Probably rules out
> > > > pre-merge testing, mind.
> > >
> > > Why? does gitlab not support the model? having builds done in parallel
> > > on runners closer to the test runners seems like it should be a thing.
> > > I guess artifact transfer would cost less then as a result.
> > 
> > Let's do some napkin math.  The biggest artifacts cost we have in Mesa
> > is probably meson-arm64/meson-arm (60MB zipped from meson-arm64,
> > downloaded by 4 freedreno and 6ish lava, about 100 pipelines/day,
> > makes ~1.8TB/month ($180 or so).  We could build a local storage next
> > to the lava dispatcher so that the artifacts didn't have to contain
> > the rootfs that came from the container (~2/3 of the insides of the
> > zip file), but that's another service to build and maintain.  Building
> > the drivers once locally and storing it would save downloading the
> > other ~1/3 of the inside of the zip file, but that requires a big
> > enough system to do 
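Eric's napkin math above can be re-derived in a few lines. Two numbers
are assumptions not stated explicitly in the mail: ~10 downloads per
pipeline (4 freedreno + "6ish" lava) and a 30-day month, with egress
priced at roughly $0.10/GB.

```python
# Rough re-derivation of the artifact-egress estimate quoted above.
artifact_mb = 60            # zipped meson-arm64 artifact size
downloads_per_pipeline = 10  # assumption: 4 freedreno + ~6 lava jobs
pipelines_per_day = 100
days_per_month = 30          # assumption

monthly_tb = (artifact_mb * downloads_per_pipeline *
              pipelines_per_day * days_per_month) / 1e6
cost_usd = monthly_tb * 1000 * 0.10  # assumption: ~$0.10/GB egress

print(f"{monthly_tb:.1f} TB/month, ~${cost_usd:.0f}")  # 1.8 TB/month, ~$180
```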

Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-04-06 Thread Nicolas Dufresne
Le samedi 04 avril 2020 à 08:11 -0700, Rob Clark a écrit :
> On Fri, Apr 3, 2020 at 7:12 AM Michel Dänzer  wrote:
> > On 2020-03-01 6:46 a.m., Marek Olšák wrote:
> > > For Mesa, we could run CI only when Marge pushes, so that it's a strictly
> > > pre-merge CI.
> > 
> > Thanks for the suggestion! I implemented something like this for Mesa:
> > 
> > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4432
> > 
> 
> I wouldn't mind manually triggering pipelines, but unless there is
> some trick I'm not realizing, it is super cumbersome.  Ie. you have to
> click first the container jobs.. then wait.. then the build jobs..
> then wait some more.. and then finally the actual runners.  That would
> be a real step back in terms of usefulness of CI.. one might call it a
> regression :-(

On the GStreamer side we have moved some existing pipelines to manual
mode. As we use needs: between jobs, we could simply set the first job
to manual (in our case it's a single job called manifest; in your case
it would be the N container jobs). This way you can have a manual
pipeline that is triggered with a single click (or just a few). Here's
an example:

https://gitlab.freedesktop.org/gstreamer/gstreamer/pipelines/128292

Those are our post-merge pipelines; we only trigger them if we suspect
a problem.
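As an illustration, a minimal .gitlab-ci.yml sketch of this pattern
(the job names and scripts are hypothetical, not GStreamer's actual
configuration):

```yaml
# One manual entry job; everything downstream starts automatically
# via needs: once it finishes.
stages: [trigger, build, test]

manifest:
  stage: trigger
  when: manual            # the single click that starts the pipeline
  script: [./generate-manifest.sh]

build-job:
  stage: build
  needs: [manifest]       # runs automatically after manifest
  script: [./build.sh]

test-job:
  stage: test
  needs: [build-job]
  script: [./run-tests.sh]
```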

> 
> Is there a possible middle ground where pre-marge pipelines that touch
> a particular driver trigger that driver's CI jobs, but MRs that don't
> touch that driver but do touch shared code don't until triggered by
> marge?  Ie. if I have a MR that only touches nir, it's probably ok to
> not run freedreno jobs until marge triggers it.  But if I have a MR
> that is touching freedreno, I'd really rather not have to wait until
> marge triggers the freedreno CI jobs.
> 
> Btw, I was under the impression (from periodically skimming the logs
> in #freedesktop, so I could well be missing or misunderstanding
> something) that caching/etc had been improved and mesa's part of the
> egress wasn't the bigger issue at this point?
> 
> BR,
> -R



Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-19 Thread Nicolas Dufresne
Le mercredi 18 mars 2020 à 11:05 +0100, Michel Dänzer a écrit :
> On 2020-03-17 6:21 p.m., Lucas Stach wrote:
> > That's one of the issues with implicit sync that explicit may solve: 
> > a single client taking way too much time to render something can 
> > block the whole pipeline up until the display flip. With explicit 
> > sync the compositor can just decide to use the last client buffer if 
> > the latest buffer isn't ready by some deadline.
> 
> FWIW, the compositor can do this with implicit sync as well, by polling
> a dma-buf fd for the buffer. (Currently, it has to poll for writable,
> because waiting for the exclusive fence only isn't enough with amdgpu)

That is very interesting, thanks for sharing; it could allow fixing
some issues in userspace for backward compatibility.
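A minimal sketch of the polling approach Michel describes. Note the
hedge: a real compositor would pass a dma-buf fd here (where POLLOUT
waits on all attached fences); the demo below substitutes a pipe fd
only so the snippet is self-contained.

```python
import os
import select

def buffer_idle(fd, timeout_ms=0):
    """Return True if `fd` currently polls writable.

    For a dma-buf fd, polling for writable means waiting on both the
    shared and exclusive fences, as discussed above."""
    p = select.poll()
    p.register(fd, select.POLLOUT)
    return bool(p.poll(timeout_ms))

# Self-contained demo with a pipe: a fresh write end is immediately writable.
r, w = os.pipe()
print(buffer_idle(w))  # True
os.close(r)
os.close(w)
```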

thanks,
Nicolas



Re: Plumbing explicit synchronization through the Linux ecosystem

2020-03-17 Thread Nicolas Dufresne
Le mardi 17 mars 2020 à 11:27 -0500, Jason Ekstrand a écrit :
> On Tue, Mar 17, 2020 at 10:33 AM Nicolas Dufresne  
> wrote:
> > Le lundi 16 mars 2020 à 23:15 +0200, Laurent Pinchart a écrit :
> > > Hi Jason,
> > > 
> > > On Mon, Mar 16, 2020 at 10:06:07AM -0500, Jason Ekstrand wrote:
> > > > On Mon, Mar 16, 2020 at 5:20 AM Laurent Pinchart wrote:
> > > > > On Wed, Mar 11, 2020 at 04:18:55PM -0400, Nicolas Dufresne wrote:
> > > > > > (I know I'm going to be spammed by so many mailing list ...)
> > > > > > 
> > > > > > Le mercredi 11 mars 2020 à 14:21 -0500, Jason Ekstrand a écrit :
> > > > > > > On Wed, Mar 11, 2020 at 12:31 PM Jason Ekstrand 
> > > > > > >  wrote:
> > > > > > > > All,
> > > > > > > > 
> > > > > > > > Sorry for casting such a broad net with this one. I'm sure most 
> > > > > > > > people
> > > > > > > > who reply will get at least one mailing list rejection.  
> > > > > > > > However, this
> > > > > > > > is an issue that affects a LOT of components and that's why it's
> > > > > > > > thorny to begin with.  Please pardon the length of this e-mail 
> > > > > > > > as
> > > > > > > > well; I promise there's a concrete point/proposal at the end.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Explicit synchronization is the future of graphics and media.  
> > > > > > > > At
> > > > > > > > least, that seems to be the consensus among all the graphics 
> > > > > > > > people
> > > > > > > > I've talked to.  I had a chat with one of the lead Android 
> > > > > > > > graphics
> > > > > > > > engineers recently who told me that doing explicit sync from 
> > > > > > > > the start
> > > > > > > > was one of the best engineering decisions Android ever made.  
> > > > > > > > It's
> > > > > > > > also the direction being taken by more modern APIs such as 
> > > > > > > > Vulkan.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > ## What are implicit and explicit synchronization?
> > > > > > > > 
> > > > > > > > For those that aren't familiar with this space, GPUs, media 
> > > > > > > > encoders,
> > > > > > > > etc. are massively parallel and synchronization of some form is
> > > > > > > > required to ensure that everything happens in the right order 
> > > > > > > > and
> > > > > > > > avoid data races.  Implicit synchronization is when bits of 
> > > > > > > > work (3D,
> > > > > > > > compute, video encode, etc.) are implicitly based on the 
> > > > > > > > absolute
> > > > > > > > CPU-time order in which API calls occur.  Explicit 
> > > > > > > > synchronization is
> > > > > > > > when the client (whatever that means in any given context) 
> > > > > > > > provides
> > > > > > > > the dependency graph explicitly via some sort of synchronization
> > > > > > > > primitives.  If you're still confused, consider the following
> > > > > > > > examples:
> > > > > > > > 
> > > > > > > > With OpenGL and EGL, almost everything is implicit sync.  Say 
> > > > > > > > you have
> > > > > > > > two OpenGL contexts sharing an image where one writes to it and 
> > > > > > > > the
> > > > > > > > other textures from it.  The way the OpenGL spec works, the 
> > > > > > > > client has
> > > > > > > > to make the API calls to render to the image before (in CPU 
> > > > > > > > time) it
> > > > > > > > makes the API calls which texture from the image.  As long as 
> > > > > > > > it does
> > > > > > > > this (and maybe inserts a glFlush?), the driver will ensure 
> > > > > > > > that the
> > > > &

Re: Plumbing explicit synchronization through the Linux ecosystem

2020-03-17 Thread Nicolas Dufresne
Le lundi 16 mars 2020 à 23:15 +0200, Laurent Pinchart a écrit :
> Hi Jason,
> 
> On Mon, Mar 16, 2020 at 10:06:07AM -0500, Jason Ekstrand wrote:
> > On Mon, Mar 16, 2020 at 5:20 AM Laurent Pinchart wrote:
> > > On Wed, Mar 11, 2020 at 04:18:55PM -0400, Nicolas Dufresne wrote:
> > > > (I know I'm going to be spammed by so many mailing list ...)
> > > > 
> > > > Le mercredi 11 mars 2020 à 14:21 -0500, Jason Ekstrand a écrit :
> > > > > On Wed, Mar 11, 2020 at 12:31 PM Jason Ekstrand 
> > > > >  wrote:
> > > > > > All,
> > > > > > 
> > > > > > Sorry for casting such a broad net with this one. I'm sure most 
> > > > > > people
> > > > > > who reply will get at least one mailing list rejection.  However, 
> > > > > > this
> > > > > > is an issue that affects a LOT of components and that's why it's
> > > > > > thorny to begin with.  Please pardon the length of this e-mail as
> > > > > > well; I promise there's a concrete point/proposal at the end.
> > > > > > 
> > > > > > 
> > > > > > Explicit synchronization is the future of graphics and media.  At
> > > > > > least, that seems to be the consensus among all the graphics people
> > > > > > I've talked to.  I had a chat with one of the lead Android graphics
> > > > > > engineers recently who told me that doing explicit sync from the 
> > > > > > start
> > > > > > was one of the best engineering decisions Android ever made.  It's
> > > > > > also the direction being taken by more modern APIs such as Vulkan.
> > > > > > 
> > > > > > 
> > > > > > ## What are implicit and explicit synchronization?
> > > > > > 
> > > > > > For those that aren't familiar with this space, GPUs, media 
> > > > > > encoders,
> > > > > > etc. are massively parallel and synchronization of some form is
> > > > > > required to ensure that everything happens in the right order and
> > > > > > avoid data races.  Implicit synchronization is when bits of work 
> > > > > > (3D,
> > > > > > compute, video encode, etc.) are implicitly based on the absolute
> > > > > > CPU-time order in which API calls occur.  Explicit synchronization 
> > > > > > is
> > > > > > when the client (whatever that means in any given context) provides
> > > > > > the dependency graph explicitly via some sort of synchronization
> > > > > > primitives.  If you're still confused, consider the following
> > > > > > examples:
> > > > > > 
> > > > > > With OpenGL and EGL, almost everything is implicit sync.  Say you 
> > > > > > have
> > > > > > two OpenGL contexts sharing an image where one writes to it and the
> > > > > > other textures from it.  The way the OpenGL spec works, the client 
> > > > > > has
> > > > > > to make the API calls to render to the image before (in CPU time) it
> > > > > > makes the API calls which texture from the image.  As long as it 
> > > > > > does
> > > > > > this (and maybe inserts a glFlush?), the driver will ensure that the
> > > > > > rendering completes before the texturing happens and you get correct
> > > > > > contents.
> > > > > > 
> > > > > > Implicit synchronization can also happen across processes.  Wayland,
> > > > > > for instance, is currently built on implicit sync where the client
> > > > > > does their rendering and then does a hand-off (via 
> > > > > > wl_surface::commit)
> > > > > > to tell the compositor it's done at which point the compositor can 
> > > > > > now
> > > > > > texture from the surface.  The hand-off ensures that the client's
> > > > > > OpenGL API calls happen before the server's OpenGL API calls.
> > > > > > 
> > > > > > A good example of explicit synchronization is the Vulkan API.  
> > > > > > There,
> > > > > > a client (or multiple clients) can simultaneously build command
> > > > > > buffers in different threads where one of those command buffers
> > > > >

Re: Plumbing explicit synchronization through the Linux ecosystem

2020-03-12 Thread Nicolas Dufresne
(I know I'm going to be spammed by so many mailing list ...)

Le mercredi 11 mars 2020 à 14:21 -0500, Jason Ekstrand a écrit :
> On Wed, Mar 11, 2020 at 12:31 PM Jason Ekstrand  wrote:
> > All,
> > 
> > Sorry for casting such a broad net with this one. I'm sure most people
> > who reply will get at least one mailing list rejection.  However, this
> > is an issue that affects a LOT of components and that's why it's
> > thorny to begin with.  Please pardon the length of this e-mail as
> > well; I promise there's a concrete point/proposal at the end.
> > 
> > 
> > Explicit synchronization is the future of graphics and media.  At
> > least, that seems to be the consensus among all the graphics people
> > I've talked to.  I had a chat with one of the lead Android graphics
> > engineers recently who told me that doing explicit sync from the start
> > was one of the best engineering decisions Android ever made.  It's
> > also the direction being taken by more modern APIs such as Vulkan.
> > 
> > 
> > ## What are implicit and explicit synchronization?
> > 
> > For those that aren't familiar with this space, GPUs, media encoders,
> > etc. are massively parallel and synchronization of some form is
> > required to ensure that everything happens in the right order and
> > avoid data races.  Implicit synchronization is when bits of work (3D,
> > compute, video encode, etc.) are implicitly based on the absolute
> > CPU-time order in which API calls occur.  Explicit synchronization is
> > when the client (whatever that means in any given context) provides
> > the dependency graph explicitly via some sort of synchronization
> > primitives.  If you're still confused, consider the following
> > examples:
> > 
> > With OpenGL and EGL, almost everything is implicit sync.  Say you have
> > two OpenGL contexts sharing an image where one writes to it and the
> > other textures from it.  The way the OpenGL spec works, the client has
> > to make the API calls to render to the image before (in CPU time) it
> > makes the API calls which texture from the image.  As long as it does
> > this (and maybe inserts a glFlush?), the driver will ensure that the
> > rendering completes before the texturing happens and you get correct
> > contents.
> > 
> > Implicit synchronization can also happen across processes.  Wayland,
> > for instance, is currently built on implicit sync where the client
> > does their rendering and then does a hand-off (via wl_surface::commit)
> > to tell the compositor it's done at which point the compositor can now
> > texture from the surface.  The hand-off ensures that the client's
> > OpenGL API calls happen before the server's OpenGL API calls.
> > 
> > A good example of explicit synchronization is the Vulkan API.  There,
> > a client (or multiple clients) can simultaneously build command
> > buffers in different threads where one of those command buffers
> > renders to an image and the other textures from it and then submit
> > both of them at the same time with instructions to the driver for
> > which order to execute them in.  The execution order is described via
> > the VkSemaphore primitive.  With the new VK_KHR_timeline_semaphore
> > extension, you can even submit the work which does the texturing
> > BEFORE the work which does the rendering and the driver will sort it
> > out.
> > 
> > The #1 problem with implicit synchronization (which explicit solves)
> > is that it leads to a lot of over-synchronization both in client space
> > and in driver/device space.  The client has to synchronize a lot more
> > because it has to ensure that the API calls happen in a particular
> > order.  The driver/device have to synchronize a lot more because they
> > never know what is going to end up being a synchronization point as an
> > API call on another thread/process may occur at any time.  As we move
> > to more and more multi-threaded programming this synchronization (on
> > the client-side especially) becomes more and more painful.
> > 
> > 
> > ## Current status in Linux
> > 
> > Implicit synchronization in Linux works via a the kernel's internal
> > dma_buf and dma_fence data structures.  A dma_fence is a tiny object
> > which represents the "done" status for some bit of work.  Typically,
> > dma_fences are created as a by-product of someone submitting some bit
> > of work (say, 3D rendering) to the kernel.  The dma_buf object has a
> > set of dma_fences on it representing shared (read) and exclusive
> > (write) access to the object.  When work is submitted which, for
> > instance renders to the dma_buf, it's queued waiting on all the fences
> > on the dma_buf and and a dma_fence is created representing the end of
> > said rendering work and it's installed as the dma_buf's exclusive
> > fence.  This way, the kernel can manage all its internal queues (3D
> > rendering, display, video encode, etc.) and know which things to
> > submit in what order.
> > 
> > For the last few years, we've had sync_file in the kernel 

Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-03-02 Thread Nicolas Dufresne
Le dimanche 01 mars 2020 à 15:14 +0100, Michel Dänzer a écrit :
> On 2020-02-29 8:46 p.m., Nicolas Dufresne wrote:
> > Le samedi 29 février 2020 à 19:14 +0100, Timur Kristóf a écrit :
> > > 1. I think we should completely disable running the CI on MRs which are
> > > marked WIP. Speaking from personal experience, I usually make a lot of
> > > changes to my MRs before they are merged, so it is a waste of CI
> > > resources.
> 
> Interesting idea, do you want to create an MR implementing it?
> 
> 
> > In the mean time, you can help by taking the habit to use:
> > 
> >   git push -o ci.skip
> 
> That breaks Marge Bot.
> 
> 
> > Notably, we would like to get rid of the post merge CI, as in a rebase
> > flow like we have in GStreamer, it's a really minor risk.
> 
> That should be pretty easy, see Mesa and
> https://docs.gitlab.com/ce/ci/variables/predefined_variables.html.
> Something like this should work:
> 
>   rules:
> - if: '$CI_PROJECT_NAMESPACE != "gstreamer"'
>   when: never
> 
> This is another interesting idea we could consider for Mesa as well. It
> would however require (mostly) banning direct pushes to the main repository.

We already have this policy in the GStreamer group. We rely on
maintainers to make the right call though, as we have a few cases in
multi-repo usage where pushing manually is the only way to reduce the
breakage time (e.g. when we undo a new API in a development branch).
(We have implemented support so that CI runs across user repositories
with the same branch name, which allows doing CI with all the changes,
but the merge remains non-atomic.)

> 
> 
> > > 2. Maybe we could take this one step further and only allow the CI to
> > > be only triggered manually instead of automatically on every push.
> 
> That would again break Marge Bot.

Marge is just software; we can update it to trigger CI on rebases, or
when CI hasn't been run. There was a proposal to actually do that and
let Marge trigger CI on merges from maintainers. Though, from my point
of view, having a longer delay between submission and the author
becoming aware of a CI breakage has side effects. Authors are often
less available a week later, when someone reviews and tries to merge,
which makes merging patches take a lot longer.

> 
> 



Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-03-02 Thread Nicolas Dufresne
Le samedi 29 février 2020 à 15:54 -0600, Jason Ekstrand a écrit :
> On Sat, Feb 29, 2020 at 3:47 PM Timur Kristóf  wrote:
> > On Sat, 2020-02-29 at 14:46 -0500, Nicolas Dufresne wrote:
> > > > 1. I think we should completely disable running the CI on MRs which
> > > > are
> > > > marked WIP. Speaking from personal experience, I usually make a lot
> > > > of
> > > > changes to my MRs before they are merged, so it is a waste of CI
> > > > resources.
> > > 
> > > In the mean time, you can help by taking the habit to use:
> > > 
> > >   git push -o ci.skip
> > 
> > Thanks for the advice, I wasn't aware such an option exists. Does this
> > also work on the mesa gitlab or is this a GStreamer only thing?
> 
> Mesa is already set up so that it only runs on MRs and branches named
> ci-* (or maybe it's ci/*; I can't remember).
> 
> > How hard would it be to make this the default?
> 
> I strongly suggest looking at how Mesa does it and doing that in
> GStreamer if you can.  It seems to work pretty well in Mesa.

You are right, they added CI_MERGE_REQUEST_SOURCE_BRANCH_NAME in 11.6
(we started our CI a while ago). But there is something even better now, you
can do:

  only:
refs:
  - merge_requests

Thanks for the hint, I'll suggest that. I've looked at some of Mesa's CI
backend; I think it's really nice, though there are a lot of concepts
that won't work in a multi-repo CI. Again, I need to refresh on what
was moved from the enterprise to the community version in this regard.
> 
> --Jason
> 
> 
> > > That's a much more difficult goal then it looks like. Let each
> > > projects
> > > manage their CI graph and content, as each case is unique. Running
> > > more
> > > tests, or building more code isn't the main issue as the CPU time is
> > > mostly sponsored. The data transfers between the cloud of gitlab and
> > > the runners (which are external), along to sending OS image to Lava
> > > labs is what is likely the most expensive.
> > > 
> > > As it was already mention in the thread, what we are missing now, and
> > > being worked on, is per group/project statistics that give us the
> > > hotspot so we can better target the optimization work.
> > 
> > Yes, would be nice to know what the hotspot is, indeed.
> > 
> > As far as I understand, the problem is not CI itself, but the bandwidth
> > needed by the build artifacts, right? Would it be possible to not host
> > the build artifacts on the gitlab, but rather only the place where the
> > build actually happened? Or at least, only transfer the build artifacts
> > on-demand?
> > 
> > I'm not exactly familiar with how the system works, so sorry if this is
> > a silly question.
> > 
> > ___
> > mesa-dev mailing list
> > mesa-...@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev



Re: [Intel-gfx] [Mesa-dev] gitlab.fd.o financial situation and impact on services

2020-03-02 Thread Nicolas Dufresne
Hi Jason,

I personally think the suggestions are still relatively good
brainstorming data for those implicated. For those not implicated
in the CI scripting itself, I'd say just keep in mind that nothing is
black and white and every change ends up being time-consuming.

Le dimanche 01 mars 2020 à 14:18 -0600, Jason Ekstrand a écrit :
> I've seen a number of suggestions which will do one or both of those things 
> including:
> 
>  - Batching merge requests

Agreed. Or at least I foresee quite complicated code to handle the case
of one batched merge failing the tests, or worse, with flaky tests.

>  - Not running CI on the master branch

A small clarification: this depends on the chosen workflow. In
GStreamer, we use a rebase flow, so the "merge" button isn't really
merging. It means that to merge you need your branch to be rebased on
top of the latest. As it is multi-repo, there is always a tiny chance
of breakage due to a mid-air collision with changes in other repos. What we
see is that the post-"merge" CI cannot even catch them all (as we already
observed once). In fact, it usually does not catch anything, or each
time it caught something, we only noticed on the next MR. So we are
really considering dropping it, as for this specific workflow/project, we
found very little gain in having it.

With a real merge, the code being tested before and after the merge is
different, and there I agree with you.

>  - Shutting off CI

Of course :-), especially since we had CI before GitLab in GStreamer
(just not pre-commit); we don't want to regress that far into the past.

>  - Preventing CI on other non-MR branches

Another small nuance: Mesa does not prevent CI, it only makes it manual
on non-MR branches. Users can go click "run" to get CI results. We could
also have an option to trigger the CI (the opposite of ci.skip) from the
git command line.
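Roughly, that Mesa behaviour (automatic on MRs, manual elsewhere) could be sketched per job like this — hypothetical job name and script, not Mesa's actual CI:

```yaml
test:
  script:
    - ninja -C build test   # placeholder test step
  rules:
    # Run automatically in merge request pipelines
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
    # On other refs, create the job but require a manual click
    - when: manual
      allow_failure: true
```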

>  - Disabling CI on WIP MRs

That one I have mixed feelings about.

>  - I'm sure there are more...


regards,
Nicolas



Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-03-02 Thread Nicolas Dufresne
Le samedi 29 février 2020 à 19:14 +0100, Timur Kristóf a écrit :
> On Fri, 2020-02-28 at 10:43 +, Daniel Stone wrote:
> > On Fri, 28 Feb 2020 at 10:06, Erik Faye-Lund
> >  wrote:
> > > On Fri, 2020-02-28 at 11:40 +0200, Lionel Landwerlin wrote:
> > > > Yeah, changes on vulkan drivers or backend compilers should be
> > > > fairly
> > > > sandboxed.
> > > > 
> > > > We also have tools that only work for intel stuff, that should
> > > > never
> > > > trigger anything on other people's HW.
> > > > 
> > > > Could something be worked out using the tags?
> > > 
> > > I think so! We have the pre-defined environment variable
> > > CI_MERGE_REQUEST_LABELS, and we can do variable conditions:
> > > 
> > > https://docs.gitlab.com/ee/ci/yaml/#onlyvariablesexceptvariables
> > > 
> > > That sounds like a pretty neat middle-ground to me. I just hope
> > > that
> > > new pipelines are triggered if new labels are added, because not
> > > everyone is allowed to set labels, and sometimes people forget...
> > 
> > There's also this which is somewhat more robust:
> > https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2569
> 
> My 20 cents:
> 
> 1. I think we should completely disable running the CI on MRs which are
> marked WIP. Speaking from personal experience, I usually make a lot of
> changes to my MRs before they are merged, so it is a waste of CI
> resources.

In the mean time, you can help by taking the habit to use:

  git push -o ci.skip

CI is in fact run for all branches that you push. When we (the GStreamer
project) started our CI we wanted to limit this to MRs, but we haven't
found a good way yet (and GitLab is not helping much). The main issue
is that it's near impossible to use the GitLab web API from a runner
(it requires a private key, in an all-or-nothing manner). But with the
current situation we are revisiting this.

The truth is that probably every CI has a lot of room for optimization,
but it can be really time-consuming. So until we have a reason to, we
live with inefficiency: oversized artifacts, unused artifacts,
oversized Docker images, etc. Doing a new round of optimization is
obviously a clear short-term goal for projects, including the GStreamer
project. We have discussions going on and are trying to find solutions.
Notably, we would like to get rid of the post-merge CI, as in a rebase
flow like we have in GStreamer, it's a really minor risk.

> 
> 2. Maybe we could take this one step further and only allow the CI to
> be only triggered manually instead of automatically on every push.
> 
> 3. I completely agree with Pierre-Eric on MR 2569, let's not run the
> full CI pipeline on every change, only those parts which are affected
> by the change. It not only costs money, but is also frustrating when
> you submit a change and you get unrelated failures from a completely
> unrelated driver.

That's a much more difficult goal than it looks. Let each project
manage its CI graph and content, as each case is unique. Running more
tests, or building more code, isn't the main issue, as the CPU time is
mostly sponsored. The data transfers between the GitLab cloud and
the runners (which are external), along with sending OS images to LAVA
labs, are what is likely the most expensive.

As was already mentioned in the thread, what we are missing now, and
what is being worked on, is per-group/project statistics that give us the
hotspots so we can better target the optimization work.

> 
> Best regards,
> Timur
> 
> ___
> gstreamer-devel mailing list
> gstreamer-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/gstreamer-devel



Re: Overlay support in the i.MX7 display

2019-11-10 Thread Nicolas Dufresne
Le lundi 04 novembre 2019 à 14:58 +0200, Laurent Pinchart a écrit :
> Hello,
> 
> On Mon, Nov 04, 2019 at 10:09:47AM +0200, Pekka Paalanen wrote:
> > On Sun, 03 Nov 2019 19:15:49 +0100 Stefan Agner wrote:
> > > On 2019-11-01 09:43, Laurent Pinchart wrote:
> > > > Hello,
> > > > 
> > > > I'm looking at the available options to support overlays in the display
> > > > pipeline of the i.MX7. The LCDIF itself unfortunately doesn't support
> > > > overlays, the feature being implemented in the PXP. A driver for the PXP
> > > > is available but only supports older SoCs whose PXP doesn't support
> > > > overlays. This driver is implemented as a V4L2 mem2mem driver, which
> > > > makes support of additional input channels impossible.  
> > > 
> > > Thanks for bringing this up, it is a topic I have wondered too:
> > > Interaction between PXP and mxsfb.
> > > 
> > > I am not very familiar with the V4L2 subsystem so take my opinions with
> > > a grain of salt.
> > > 
> > > > Here are the options I can envision:
> > > > 
> > > > - Extend the existing PXP driver to support multiple channels. This is
> > > >   technically feasible, but will require moving away from the V4L2
> > > >   mem2mem framework, which would break userspace. I don't think this
> > > >   path could lead anywhere.
> > > > 
> > > > - Write a new PXP driver for the i.MX7, still using V4L2, but with
> > > >   multiple video nodes. This would allow blending multiple layers, but
> > > >   would require writing the output to memory, while the PXP has support
> > > >   for direct connections to the LCDIF (through small SRAM buffers).
> > > >   Performances would thus be suboptimal. The API would also be awkward,
> > > >   as using the PXP for display would require usage of V4L2 in
> > > >   applications.  
> > > 
> > > So the video nodes would be sinks? I would expect overlays to be usable
> > > through KMS, I guess that would then not work, correct?
> 
> There would be sink video nodes for the PXP inputs, and one source video
> node for the PXP output. The PXP can be used stand-alone, in
> memory-to-memory mode, and V4L2 is a good fit for that.
> 
> > > > - Extend the mxsfb driver with PXP support, and expose the PXP inputs as
> > > >   KMS planes. The PXP would only be used when available, and would be
> > > >   transparent to applications. This would however prevent using it
> > > >   separately from the display (to perform multi-pass alpha blending for
> > > >   instance).  
> > > 
> > > KMS planes are well defined and are well integrated with the KMS API, so
> > > I prefer this option. But is this compatible with the currently
> > > supported video use-case? E.g. could we make PXP available through V4L2
> > > and through DRM/mxsfb?
> 
> That's the issue, it's not easily doable. I think we could do so, but
> how to ensure mutual exclusion between the two APIs needs to be
> researched. I fear it will result in an awkward solution with fuzzy
> semantics. A module parameter could be an option, but wouldn't be very
> flexible.
> 
> > > Not sure what your use case is exactly, but when playing a video I
> > > wonder where is the higher value using PXP: Color conversion and scaling
> > > or compositing...? I would expect higher value in the former use case.
> 
> I think it's highly use-case-dependent.
> 
> > mind, with Wayland architecture, color conversion and scaling could be
> > at the same level/step as compositing, in the display server instead of
> > an application. Hence if the PXP capabilities were advertised as KMS
> > planes, there should be nothing to patch in Wayland-designed
> > applications to make use of them, assuming the applications did not
> > already rely on V4L2 M2M devices.
> > 
> > Would it not be possible to expose PXP through both uAPI interfaces? At
> > least KMS atomic's TEST_ONLY feature would make it easy to say "no" to
> > userspace if another bit of userspace already reserved the device via
> > e.g. V4L2.
> 
> We would also need to figure out how to do it the other way around,
> reporting properly through V4L2 that the device is busy. I think it's
> feasible, but I doubt it would result in anything usable for userspace.

We already have this need for decoders with a fixed number of streams.

> If the KMS device exposes multiple planes unconditionally and fails the
> atomic commit if the PXP is used through V4L2, I think it would be hard
> for Wayland to use this consistently. Given that I expect the PXP to be
> mostly used for display purpose I'm tempted to allocate it for display
> unconditionally, or, possibly, decide how to expose it through a module
> parameter.

It's a strange statement, "mostly used for display purpose", considering
that the upstream driver exists for video scaling and color conversion, and
no one has yet implemented the "display purpose" driver.

My impression is that the complication is kernel-specific (the fact that
we have two subsystems for the same IPs). Since, software-wise,
sharing and allocating resources 

Re: Overlay support in the i.MX7 display

2019-11-10 Thread Nicolas Dufresne
Le mardi 05 novembre 2019 à 10:17 +0100, Philipp Zabel a écrit :
> Hi Laurent,
> 
> On Fri, 2019-11-01 at 10:43 +0200, Laurent Pinchart wrote:
> > Hello,
> > 
> > I'm looking at the available options to support overlays in the display
> > pipeline of the i.MX7. The LCDIF itself unfortunately doesn't support
> > overlays, the feature being implemented in the PXP. A driver for the PXP
> > is available but only supports older SoCs whose PXP doesn't support
> > overlays. This driver is implemented as a V4L2 mem2mem driver, which
> > makes support of additional input channels impossible.
> > 
> > Here are the options I can envision:
> > 
> > - Extend the existing PXP driver to support multiple channels. This is
> >   technically feasible, but will require moving away from the V4L2
> >   mem2mem framework, which would break userspace. I don't think this
> >   path could lead anywhere.
> 
> I may be biased, but please don't break the V4L2 mem2mem usecase :)
> 
> > - Write a new PXP driver for the i.MX7, still using V4L2, but with
> >   multiple video nodes. This would allow blending multiple layers, but
> >   would require writing the output to memory, while the PXP has support
> >   for direct connections to the LCDIF (through small SRAM buffers).
> >   Performances would thus be suboptimal. The API would also be awkward,
> >   as using the PXP for display would require usage of V4L2 in
> >   applications.
> 
> I'm not sure V4L2 is the best API for multi-pass 2D composition,
> especially as the PXP is able to blit an overlay onto a background in
> place.

There was some userspace (a GStreamer element) doing exactly that with
v4l2 m2m using the i.MX6 driver. The API was fine, even though fences
would have made programming it easier.

https://gitlab.freedesktop.org/gstreamer/gst-plugins-good/issues/308

(not merged, as we don't have an agreement on the kernel side; notably we
don't even have a way to control the blend function, so the result is
likely dependent on the use case the driver was written for)

The real limitation was that these IPs usually support more than just
blit/blend over another surface and, as you said, support a background.
And to support this use case, we'd need an m2m driver with multiple queues per
direction. That was discussed at the last workshop at ELCE, and applies to
other m2m IPs like muxers and demuxers which exist on STB kinds of SoCs.

> 
> > - Extend the mxsfb driver with PXP support, and expose the PXP inputs as
> >   KMS planes. The PXP would only be used when available, and would be
> >   transparent to applications. This would however prevent using it
> >   separately from the display (to perform multi-pass alpha blending for
> >   instance).
> 
> For the SRAM block row buffer path to the LCDIF, I think the KMS plane
> abstraction is the way to go. The DRM and V4L2 drivers could be made to
> use a shared backend, such that only one of plane composition and V4L2
> scaling/CSC functions can work at the same time.
> 
> > What would be the best option going forward ? Would any of you, by any
> > chance, have already started work in this area ?
> 
> I have not worked on this.
> 
> regards
> Philipp
> 


Re: Overlay support in the i.MX7 display

2019-11-10 Thread Nicolas Dufresne
Le lundi 04 novembre 2019 à 10:09 +0200, Pekka Paalanen a écrit :
> On Sun, 03 Nov 2019 19:15:49 +0100
> Stefan Agner  wrote:
> 
> > Hi Laurent,
> > 
> > On 2019-11-01 09:43, Laurent Pinchart wrote:
> > > Hello,
> > > 
> > > I'm looking at the available options to support overlays in the display
> > > pipeline of the i.MX7. The LCDIF itself unfortunately doesn't support
> > > overlays, the feature being implemented in the PXP. A driver for the PXP
> > > is available but only supports older SoCs whose PXP doesn't support
> > > overlays. This driver is implemented as a V4L2 mem2mem driver, which
> > > makes support of additional input channels impossible.  
> > 
> > Thanks for bringing this up, it is a topic I have wondered too:
> > Interaction between PXP and mxsfb.
> > 
> > I am not very familiar with the V4L2 subsystem so take my opinions with
> > a grain of salt.
> > 
> > > Here are the options I can envision:
> > > 
> > > - Extend the existing PXP driver to support multiple channels. This is
> > >   technically feasible, but will require moving away from the V4L2
> > >   mem2mem framework, which would break userspace. I don't think this
> > >   path could lead anywhere.
> > > 
> > > - Write a new PXP driver for the i.MX7, still using V4L2, but with
> > >   multiple video nodes. This would allow blending multiple layers, but
> > >   would require writing the output to memory, while the PXP has support
> > >   for direct connections to the LCDIF (through small SRAM buffers).
> > >   Performances would thus be suboptimal. The API would also be awkward,
> > >   as using the PXP for display would require usage of V4L2 in
> > >   applications.  
> > 
> > So the video nodes would be sinks? I would expect overlays to be usable
> > through KMS, I guess that would then not work, correct?
> > 
> > > - Extend the mxsfb driver with PXP support, and expose the PXP inputs as
> > >   KMS planes. The PXP would only be used when available, and would be
> > >   transparent to applications. This would however prevent using it
> > >   separately from the display (to perform multi-pass alpha blending for
> > >   instance).  
> > 
> > KMS planes are well defined and are well integrated with the KMS API, so
> > I prefer this option. But is this compatible with the currently
> > supported video use-case? E.g. could we make PXP available through V4L2
> > and through DRM/mxsfb?
> > 
> > Not sure what your use case is exactly, but when playing a video I
> > wonder where is the higher value using PXP: Color conversion and scaling
> > or compositing...? I would expect higher value in the former use case.
> 
> Hi,
> 
> mind, with Wayland architecture, color conversion and scaling could be
> at the same level/step as compositing, in the display server instead of
> an application. Hence if the PXP capabilities were advertised as KMS
> planes, there should be nothing to patch in Wayland-designed
> applications to make use of them, assuming the applications did not
> already rely on V4L2 M2M devices.

The PXP can already be used with the GStreamer v4l2convert element, for
CSC and scaling.

> 
> Would it not be possible to expose PXP through both uAPI interfaces? At
> least KMS atomic's TEST_ONLY feature would make it easy to say "no" to
> userspace if another bit of userspace already reserved the device via
> e.g. V4L2.

The same exists for decoders with a fixed number of streams/instances, I think.

> 
> 
> Thanks,
> pq


Re: Overlay support in the i.MX7 display

2019-11-10 Thread Nicolas Dufresne
Le lundi 04 novembre 2019 à 19:24 +0100, Daniel Vetter a écrit :
> On Mon, Nov 04, 2019 at 02:58:29PM +0200, Laurent Pinchart wrote:
> > Hello,
> > 
> > On Mon, Nov 04, 2019 at 10:09:47AM +0200, Pekka Paalanen wrote:
> > > On Sun, 03 Nov 2019 19:15:49 +0100 Stefan Agner wrote:
> > > > On 2019-11-01 09:43, Laurent Pinchart wrote:
> > > > > Hello,
> > > > > 
> > > > > I'm looking at the available options to support overlays in the 
> > > > > display
> > > > > > pipeline of the i.MX7. The LCDIF itself unfortunately doesn't support
> > > > > overlays, the feature being implemented in the PXP. A driver for the 
> > > > > PXP
> > > > > is available but only supports older SoCs whose PXP doesn't support
> > > > > overlays. This driver is implemented as a V4L2 mem2mem driver, which
> > > > > makes support of additional input channels impossible.  
> > > > 
> > > > Thanks for bringing this up, it is a topic I have wondered too:
> > > > Interaction between PXP and mxsfb.
> > > > 
> > > > I am not very familiar with the V4L2 subsystem so take my opinions with
> > > > a grain of salt.
> > > > 
> > > > > Here are the options I can envision:
> > > > > 
> > > > > - Extend the existing PXP driver to support multiple channels. This is
> > > > >   technically feasible, but will require moving away from the V4L2
> > > > >   mem2mem framework, which would break userspace. I don't think this
> > > > >   path could lead anywhere.
> > > > > 
> > > > > - Write a new PXP driver for the i.MX7, still using V4L2, but with
> > > > >   multiple video nodes. This would allow blending multiple layers, but
> > > > >   would require writing the output to memory, while the PXP has 
> > > > > support
> > > > >   for direct connections to the LCDIF (through small SRAM buffers).
> > > > >   Performances would thus be suboptimal. The API would also be 
> > > > > awkward,
> > > > >   as using the PXP for display would require usage of V4L2 in
> > > > >   applications.  
> > > > 
> > > > So the video nodes would be sinks? I would expect overlays to be usable
> > > > through KMS, I guess that would then not work, correct?
> > 
> > There would be sink video nodes for the PXP inputs, and one source video
> > node for the PXP output. The PXP can be used stand-alone, in
> > memory-to-memory mode, and V4L2 is a good fit for that.
> > 
> > > > > - Extend the mxsfb driver with PXP support, and expose the PXP inputs 
> > > > > as
> > > > >   KMS planes. The PXP would only be used when available, and would be
> > > > >   transparent to applications. This would however prevent using it
> > > > >   separately from the display (to perform multi-pass alpha blending 
> > > > > for
> > > > >   instance).  
> > > > 
> > > > KMS planes are well defined and are well integrated with the KMS API, so
> > > > I prefer this option. But is this compatible with the currently
> > > > supported video use-case? E.g. could we make PXP available through V4L2
> > > > and through DRM/mxsfb?
> > 
> > That's the issue, it's not easily doable. I think we could do so, but
> > how to ensure mutual exclusion between the two APIs needs to be
> > researched. I fear it will result in an awkward solution with fuzzy
> > semantics. A module parameter could be an option, but wouldn't be very
> > flexible.
> > 
> > > > Not sure what your use case is exactly, but when playing a video I
> > > > wonder where is the higher value using PXP: Color conversion and scaling
> > > > or compositing...? I would expect higher value in the former use case.
> > 
> > I think it's highly use-case-dependent.
> > 
> > > mind, with Wayland architecture, color conversion and scaling could be
> > > at the same level/step as compositing, in the display server instead of
> > > an application. Hence if the PXP capabilities were advertised as KMS
> > > planes, there should be nothing to patch in Wayland-designed
> > > applications to make use of them, assuming the applications did not
> > > already rely on V4L2 M2M devices.
> > > 
> > > Would it not be possible to expose PXP through both uAPI interfaces? At
> > > least KMS atomic's TEST_ONLY feature would make it easy to say "no" to
> > > userspace if another bit of userspace already reserved the device via
> > > e.g. V4L2.
> > 
> > We would also need to figure out how to do it the other way around,
> > reporting properly through V4L2 that the device is busy. I think it's
> > feasible, but I doubt it would result in anything usable for userspace.
> > If the KMS device exposes multiple planes unconditionally and fails the
> > atomic commit if the PXP is used through V4L2, I think it would be hard
> > for Wayland to use this consistently. Given that I expect the PXP to be
> > mostly used for display purpose I'm tempted to allocate it for display
> > unconditionally, or, possibly, decide how to expose it through a module
> > parameter.
> 
> KMS should be fine if planes are missing, userspace is supposed to be able
> to cope with that. Not all userspace does, but welp.
>  

Re: [PATCH 6/7] misc: bcm-vk: add Broadcom Valkyrie driver

2019-08-27 Thread Nicolas Dufresne
Le mardi 27 août 2019 à 16:14 +0200, Arnd Bergmann a écrit :
> On Thu, Aug 22, 2019 at 9:25 PM Scott Branden
>  wrote:
> > Add Broadcom Valkyrie driver offload engine.
> > This driver interfaces to the Valkyrie PCIe offload engine to perform
> > should offload functions as video transcoding on multiple streams
> > in parallel.  Valkyrie device is booted from files loaded using
> > request_firmware_into_buf mechanism.  After booted card status is updated
> > and messages can then be sent to the card.
> > Such messages contain scatter gather list of addresses
> > to pull data from the host to perform operations on.
> > 
> > Signed-off-by: Scott Branden 
> > Signed-off-by: Desmond Yan 
> > Signed-off-by: James Hu 
> 
> Can you explain the decision to make this is a standalone misc driver
> rather than hooking into the existing framework in drivers/media?
> 
> There is an existing interface that looks like it could fit the hardware
> in include/media/v4l2-mem2mem.h. Have you considered using that?
> 
> There is also support for video transcoding using GPUs in
> driver/gpu/drm/, that could also be used in theory, though it sounds
> like a less optimal fit.

I believe that a major obstacle with this driver is usability. Even
though I have read through it, I believe it's just impossible for anyone
to actually write open source userspace for it. The commit message does
not even try to help in this regard.

Note that, depending on the features your transcoder has, there is also
the option to model it around the media controller. That is notably
useful for certain transcoders that will also do scaling and produce
multiple streams (for adaptive streaming use cases where you want to
share a single decoder).

A 1-to-1 transcoder modeled around m2m would eventually require
documentation so that other transcoders can be implemented in a way that
lets them share the same userspace. This is currently being worked on
for m2m encoders and decoders (including stateless variants).

regards,
Nicolas



