Re: [RFP] Which V4L2 ioctls could be replaced by better versions?

2018-11-12 Thread Hans Verkuil
On 11/12/2018 10:29 AM, Philipp Zabel wrote:
> Hi Tomasz,
> 
> On Sun, 2018-11-11 at 12:43 +0900, Tomasz Figa wrote:
>> On Sat, Nov 10, 2018 at 6:06 AM Nicolas Dufresne  
>> wrote:
>>>
>>> Le jeudi 08 novembre 2018 à 16:45 +0900, Tomasz Figa a écrit :
> In this patch we should consider a way to tell userspace that this has
> been opt in, otherwise existing userspace will have to remain using
> sub-optimal copy based reclaiming in order to ensure that renegotiation
> can work on older kernel tool. At worst someone could probably do trial
> and error (reqbufs(1)/mmap/reqbufs(0)) but on CMA with large buffers
> this introduces extra startup time.

 Would such REQBUFS dance be really needed? Couldn't one simply try
 reqbufs(0) when it's really needed and if it fails then do the copy,
 otherwise just proceed normally?
>>>
>>> In simple program, maybe, in modularized code, where the consumer of
>>> these buffer (the one that is forced to make a copy) does not know the
>>> origin of the DMABuf, it's a bit complicated.
>>>
>>> In GStreamer as an example, the producer is a plugin called
>>> libgstvideo4linux2.so, while the common consumer would be libgstkms.so.
>>> They don't know each other. The pipeline would be described as:
>>>
>>>   v4l2src ! kmssink
>>>
>>> GStreamer does not have an explicit reclaiming mechanism. No one knew
>>> about V4L2 restrictions when this was designed, DMABuf didn't exist and
>>> GStreamer didn't have OMX support.
>>>
>>> What we ended up crafting, as a plaster, is that when upstream element
>>> (v4l2src) query a new allocation from downstream (kmssink), we always
>>> copy and return any ancient buffers by copying. kmssink holds on a
>>> buffer because we can't remove the scannout buffer on the display. This
>>> is slow and inefficient, and also totally unneeded if the dmabuf
>>> originate from other kernel subsystems (like DRM).
>>>
>>> So what I'd like to be able to do, to support this in a more optimal
>>> and generic way, is to mark the buffers that needs reclaiming before
>>> letting them go. But for that, I would need a flag somewhere to tell me
>>> this kernel allow this.
>>
>> Okay, got it. Thanks for explaining it.
>>
>>>
>>> You got the context, maybe the conclusion is that I should simply do
>>> kernel version check, though I'm sure a lot of people will backport
>>> this, which means that check won't work so well.
>>>
>>> Let me know, I understand adding more API is not fun, but as nothing is
>>> ever versionned in the linux-media world, it's really hard to detect
>>> and use new behaviour while supporting what everyone currently run on
>>> their systems.
>>>
>>> I would probably try and find a way to implement your suggestion, and
>>> then introduce a flag in the query itself, but I would need to think
>>> about it a little more. It's not as simple as it look like
>>> unfortunately.
>>
>> It sounds like a good fit for a new capability in v4l2_requestbuffers
>> and v4l2_create_buffers structs [1]. Perhaps something like
>> V4L2_BUF_CAP_SUPPORTS_FREE_AFTER_EXPORT? Hans, what do you think?
> 
> Maybe V4L2_BUF_CAP_SUPPORTS_ORPHANS? With this patch, while the buffers
> are in use, reqbufs(0) doesn't free them, they are orphaned. Also, this
> patch allows reqbufs(0) not only after export, but also while mmapped.

Signaling this through a BUF_CAP makes sense. It's really what it is there
for.

If someone can make an updated patch of
https://lore.kernel.org/patchwork/patch/607853/, then it makes sense to
merge it.

How about: 'SUPPORTS_ORPHANED_BUFS'?

Regards,

Hans

> 
> regards
> Philipp
> 



Re: [RFP] Which V4L2 ioctls could be replaced by better versions?

2018-11-12 Thread Philipp Zabel
Hi Tomasz,

On Sun, 2018-11-11 at 12:43 +0900, Tomasz Figa wrote:
> On Sat, Nov 10, 2018 at 6:06 AM Nicolas Dufresne  wrote:
> > 
> > Le jeudi 08 novembre 2018 à 16:45 +0900, Tomasz Figa a écrit :
> > > > In this patch we should consider a way to tell userspace that this has
> > > > been opt in, otherwise existing userspace will have to remain using
> > > > sub-optimal copy based reclaiming in order to ensure that renegotiation
> > > > can work on older kernel tool. At worst someone could probably do trial
> > > > and error (reqbufs(1)/mmap/reqbufs(0)) but on CMA with large buffers
> > > > this introduces extra startup time.
> > > 
> > > Would such REQBUFS dance be really needed? Couldn't one simply try
> > > reqbufs(0) when it's really needed and if it fails then do the copy,
> > > otherwise just proceed normally?
> > 
> > In simple program, maybe, in modularized code, where the consumer of
> > these buffer (the one that is forced to make a copy) does not know the
> > origin of the DMABuf, it's a bit complicated.
> > 
> > In GStreamer as an example, the producer is a plugin called
> > libgstvideo4linux2.so, while the common consumer would be libgstkms.so.
> > They don't know each other. The pipeline would be described as:
> > 
> >   v4l2src ! kmssink
> > 
> > GStreamer does not have an explicit reclaiming mechanism. No one knew
> > about V4L2 restrictions when this was designed, DMABuf didn't exist and
> > GStreamer didn't have OMX support.
> > 
> > What we ended up crafting, as a plaster, is that when upstream element
> > (v4l2src) query a new allocation from downstream (kmssink), we always
> > copy and return any ancient buffers by copying. kmssink holds on a
> > buffer because we can't remove the scannout buffer on the display. This
> > is slow and inefficient, and also totally unneeded if the dmabuf
> > originate from other kernel subsystems (like DRM).
> > 
> > So what I'd like to be able to do, to support this in a more optimal
> > and generic way, is to mark the buffers that needs reclaiming before
> > letting them go. But for that, I would need a flag somewhere to tell me
> > this kernel allow this.
> 
> Okay, got it. Thanks for explaining it.
> 
> > 
> > You got the context, maybe the conclusion is that I should simply do
> > kernel version check, though I'm sure a lot of people will backport
> > this, which means that check won't work so well.
> > 
> > Let me know, I understand adding more API is not fun, but as nothing is
> > ever versionned in the linux-media world, it's really hard to detect
> > and use new behaviour while supporting what everyone currently run on
> > their systems.
> > 
> > I would probably try and find a way to implement your suggestion, and
> > then introduce a flag in the query itself, but I would need to think
> > about it a little more. It's not as simple as it look like
> > unfortunately.
> 
> It sounds like a good fit for a new capability in v4l2_requestbuffers
> and v4l2_create_buffers structs [1]. Perhaps something like
> V4L2_BUF_CAP_SUPPORTS_FREE_AFTER_EXPORT? Hans, what do you think?

Maybe V4L2_BUF_CAP_SUPPORTS_ORPHANS? With this patch, while the buffers
are in use, reqbufs(0) doesn't free them, they are orphaned. Also, this
patch allows reqbufs(0) not only after export, but also while mmapped.

regards
Philipp


Re: [RFP] Which V4L2 ioctls could be replaced by better versions?

2018-11-10 Thread Tomasz Figa
On Sat, Nov 10, 2018 at 6:06 AM Nicolas Dufresne  wrote:
>
> Le jeudi 08 novembre 2018 à 16:45 +0900, Tomasz Figa a écrit :
> > > In this patch we should consider a way to tell userspace that this has
> > > been opt in, otherwise existing userspace will have to remain using
> > > sub-optimal copy based reclaiming in order to ensure that renegotiation
> > > can work on older kernel tool. At worst someone could probably do trial
> > > and error (reqbufs(1)/mmap/reqbufs(0)) but on CMA with large buffers
> > > this introduces extra startup time.
> >
> > Would such REQBUFS dance be really needed? Couldn't one simply try
> > reqbufs(0) when it's really needed and if it fails then do the copy,
> > otherwise just proceed normally?
>
> In simple program, maybe, in modularized code, where the consumer of
> these buffer (the one that is forced to make a copy) does not know the
> origin of the DMABuf, it's a bit complicated.
>
> In GStreamer as an example, the producer is a plugin called
> libgstvideo4linux2.so, while the common consumer would be libgstkms.so.
> They don't know each other. The pipeline would be described as:
>
>   v4l2src ! kmssink
>
> GStreamer does not have an explicit reclaiming mechanism. No one knew
> about V4L2 restrictions when this was designed, DMABuf didn't exist and
> GStreamer didn't have OMX support.
>
> What we ended up crafting, as a plaster, is that when upstream element
> (v4l2src) query a new allocation from downstream (kmssink), we always
> copy and return any ancient buffers by copying. kmssink holds on a
> buffer because we can't remove the scannout buffer on the display. This
> is slow and inefficient, and also totally unneeded if the dmabuf
> originate from other kernel subsystems (like DRM).
>
> So what I'd like to be able to do, to support this in a more optimal
> and generic way, is to mark the buffers that needs reclaiming before
> letting them go. But for that, I would need a flag somewhere to tell me
> this kernel allow this.

Okay, got it. Thanks for explaining it.

>
> You got the context, maybe the conclusion is that I should simply do
> kernel version check, though I'm sure a lot of people will backport
> this, which means that check won't work so well.
>
> Let me know, I understand adding more API is not fun, but as nothing is
> ever versionned in the linux-media world, it's really hard to detect
> and use new behaviour while supporting what everyone currently run on
> their systems.
>
> I would probably try and find a way to implement your suggestion, and
> then introduce a flag in the query itself, but I would need to think
> about it a little more. It's not as simple as it look like
> unfortunately.

It sounds like a good fit for a new capability in v4l2_requestbuffers
and v4l2_create_buffers structs [1]. Perhaps something like
V4L2_BUF_CAP_SUPPORTS_FREE_AFTER_EXPORT? Hans, what do you think?

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/include/uapi/linux/videodev2.h?h=next-20181109#n866

Best regards,
Tomasz


Re: [RFP] Which V4L2 ioctls could be replaced by better versions?

2018-11-09 Thread Nicolas Dufresne
Le jeudi 08 novembre 2018 à 16:45 +0900, Tomasz Figa a écrit :
> > In this patch we should consider a way to tell userspace that this has
> > been opt in, otherwise existing userspace will have to remain using
> > sub-optimal copy based reclaiming in order to ensure that renegotiation
> > can work on older kernel tool. At worst someone could probably do trial
> > and error (reqbufs(1)/mmap/reqbufs(0)) but on CMA with large buffers
> > this introduces extra startup time.
> 
> Would such REQBUFS dance be really needed? Couldn't one simply try
> reqbufs(0) when it's really needed and if it fails then do the copy,
> otherwise just proceed normally?

In simple program, maybe, in modularized code, where the consumer of
these buffer (the one that is forced to make a copy) does not know the
origin of the DMABuf, it's a bit complicated.

In GStreamer as an example, the producer is a plugin called
libgstvideo4linux2.so, while the common consumer would be libgstkms.so.
They don't know each other. The pipeline would be described as:

  v4l2src ! kmssink

GStreamer does not have an explicit reclaiming mechanism. No one knew
about V4L2 restrictions when this was designed, DMABuf didn't exist and
GStreamer didn't have OMX support.

What we ended up crafting, as a plaster, is that when upstream element
(v4l2src) query a new allocation from downstream (kmssink), we always
copy and return any ancient buffers by copying. kmssink holds on a
buffer because we can't remove the scannout buffer on the display. This
is slow and inefficient, and also totally unneeded if the dmabuf
originate from other kernel subsystems (like DRM).

So what I'd like to be able to do, to support this in a more optimal
and generic way, is to mark the buffers that needs reclaiming before
letting them go. But for that, I would need a flag somewhere to tell me
this kernel allow this.

You got the context, maybe the conclusion is that I should simply do
kernel version check, though I'm sure a lot of people will backport
this, which means that check won't work so well.

Let me know, I understand adding more API is not fun, but as nothing is
ever versionned in the linux-media world, it's really hard to detect
and use new behaviour while supporting what everyone currently run on
their systems.

I would probably try and find a way to implement your suggestion, and
then introduce a flag in the query itself, but I would need to think
about it a little more. It's not as simple as it look like
unfortunately.

Nicolas


signature.asc
Description: This is a digitally signed message part


Re: [RFP] Which V4L2 ioctls could be replaced by better versions?

2018-11-07 Thread Tomasz Figa
Hi Nicolas,

On Sat, Oct 27, 2018 at 6:38 PM Nicolas Dufresne  wrote:
>
> Le lundi 22 octobre 2018 à 12:37 +0900, Tomasz Figa a écrit :
> > Hi Philipp,
> >
> > On Mon, Oct 22, 2018 at 1:28 AM Philipp Zabel  wrote:
> > >
> > > On Wed, Oct 03, 2018 at 05:24:39PM +0900, Tomasz Figa wrote:
> > > [...]
> > > > > Yes, but that would fall in a complete redesign I guess. The buffer
> > > > > allocation scheme is very inflexible. You can't have buffers of two
> > > > > dimensions allocated at the same time for the same queue. Worst, you
> > > > > cannot leave even 1 buffer as your scannout buffer while reallocating
> > > > > new buffers, this is not permitted by the framework (in software). As 
> > > > > a
> > > > > side effect, there is no way to optimize the resolution changes, you
> > > > > even have to copy your scannout buffer on the CPU, to free it in order
> > > > > to proceed. Resolution changes are thus painfully slow, by design.
> > >
> > > [...]
> > > > Also, I fail to understand the scanout issue. If one exports a vb2
> > > > buffer to a DMA-buf and import it to the scanout engine, it can keep
> > > > scanning out from it as long as it want, because the DMA-buf will hold
> > > > a reference on the buffer, even if it's removed from the vb2 queue.
> > >
> > > REQBUFS 0 fails if the vb2 buffer is still in use, including from dmabuf
> > > attachments: vb2_buffer_in_use checks the num_users memop. The refcount
> > > returned by num_users shared between the vmarea handler and dmabuf ops,
> > > so any dmabuf attachment counts towards in_use.
> >
> > Ah, right. I've managed to completely forget about it, since we have a
> > downstream patch that we attempted to upstream earlier [1], but didn't
> > have a chance to follow up on the comments and there wasn't much
> > interest in it in general.
> >
> > [1] https://lore.kernel.org/patchwork/patch/607853/
> >
> > Perhaps it would be worth reviving?
>
> In this patch we should consider a way to tell userspace that this has
> been opt in, otherwise existing userspace will have to remain using
> sub-optimal copy based reclaiming in order to ensure that renegotiation
> can work on older kernel tool. At worst someone could probably do trial
> and error (reqbufs(1)/mmap/reqbufs(0)) but on CMA with large buffers
> this introduces extra startup time.

Would such REQBUFS dance be really needed? Couldn't one simply try
reqbufs(0) when it's really needed and if it fails then do the copy,
otherwise just proceed normally?

Best regards,
Tomasz


Re: [RFP] Which V4L2 ioctls could be replaced by better versions?

2018-10-27 Thread Nicolas Dufresne
Le lundi 22 octobre 2018 à 12:37 +0900, Tomasz Figa a écrit :
> Hi Philipp,
> 
> On Mon, Oct 22, 2018 at 1:28 AM Philipp Zabel  wrote:
> > 
> > On Wed, Oct 03, 2018 at 05:24:39PM +0900, Tomasz Figa wrote:
> > [...]
> > > > Yes, but that would fall in a complete redesign I guess. The buffer
> > > > allocation scheme is very inflexible. You can't have buffers of two
> > > > dimensions allocated at the same time for the same queue. Worst, you
> > > > cannot leave even 1 buffer as your scannout buffer while reallocating
> > > > new buffers, this is not permitted by the framework (in software). As a
> > > > side effect, there is no way to optimize the resolution changes, you
> > > > even have to copy your scannout buffer on the CPU, to free it in order
> > > > to proceed. Resolution changes are thus painfully slow, by design.
> > 
> > [...]
> > > Also, I fail to understand the scanout issue. If one exports a vb2
> > > buffer to a DMA-buf and import it to the scanout engine, it can keep
> > > scanning out from it as long as it want, because the DMA-buf will hold
> > > a reference on the buffer, even if it's removed from the vb2 queue.
> > 
> > REQBUFS 0 fails if the vb2 buffer is still in use, including from dmabuf
> > attachments: vb2_buffer_in_use checks the num_users memop. The refcount
> > returned by num_users shared between the vmarea handler and dmabuf ops,
> > so any dmabuf attachment counts towards in_use.
> 
> Ah, right. I've managed to completely forget about it, since we have a
> downstream patch that we attempted to upstream earlier [1], but didn't
> have a chance to follow up on the comments and there wasn't much
> interest in it in general.
> 
> [1] https://lore.kernel.org/patchwork/patch/607853/
> 
> Perhaps it would be worth reviving?

In this patch we should consider a way to tell userspace that this has
been opt in, otherwise existing userspace will have to remain using
sub-optimal copy based reclaiming in order to ensure that renegotiation
can work on older kernel tool. At worst someone could probably do trial
and error (reqbufs(1)/mmap/reqbufs(0)) but on CMA with large buffers
this introduces extra startup time.

> 
> Best regards,
> Tomasz


signature.asc
Description: This is a digitally signed message part


Re: [RFP] Which V4L2 ioctls could be replaced by better versions?

2018-10-26 Thread Laurent Pinchart
Hi Tomasz,

On Friday, 26 October 2018 14:41:26 EEST Tomasz Figa wrote:
> On Thu, Sep 20, 2018 at 11:42 PM Hans Verkuil  wrote:
> > Some parts of the V4L2 API are awkward to use and I think it would be
> > a good idea to look at possible candidates for that.
> > 
> > Examples are the ioctls that use struct v4l2_buffer: the multiplanar
> > support is really horrible, and writing code to support both single and
> > multiplanar is hard. We are also running out of fields and the timeval
> > isn't y2038 compliant.
> > 
> > A proof-of-concept is here:
> > 
> > https://git.linuxtv.org/hverkuil/media_tree.git/commit/?h=v4l2-buffer=a
> > 95549df06d9900f3559afdbb9da06bd4b22d1f3
> > 
> > It's a bit old, but it gives a good impression of what I have in mind.
> 
> On a related, but slightly different note, I'm wondering how we should
> handle a case where we have an M format (e.g. NV12M with 2 memory
> planes), but only 1 DMA-buf with all planes to import. That generally
> means that we have to use the same DMA-buf FD with an offset for each
> plane. In theory, v4l2_plane::data_offset could be used for this, but
> the documentation says that it should be set by the application only
> for OUTPUT planes. Moreover, existing drivers tend to just ignore
> it...

The following patches may be of interest.

https://patchwork.linuxtv.org/patch/29177/
https://patchwork.linuxtv.org/patch/29178/

> There is also the opposite problem. Sometimes the application is given
> 3 different FDs but pointing to the same buffer. If it has to work
> with a video device that only supports non-M formats, it can either
> fail, making it unusable, or blindly assume that they all point to the
> same buffer and just give the first FD to the video device (we do it
> in Chromium, since our allocator is guaranteed to keep all planes of
> planar formats in one buffer, if to be used with V4L2).
> 
> Something that we could do is allowing the QBUF semantics of M formats
> for non-M formats, where the application would fill the planes[] array
> for all planes with all the FDs it has and the kernel could then
> figure out if they point to the same buffer (i.e. resolve to the same
> dma_buf struct) or fail if not.
> 
> [...]
> 
> > Do we have more ioctls that could use a refresh? S/G/TRY_FMT perhaps,
> > again in order to improve single vs multiplanar handling.
> 
> I'd definitely be more than happy to see the plane handling unified
> between non-M and M formats, in general. The list of problems with
> current interface:
> 
> 1) The userspace has to hardcode the computations of bytesperline for
> chroma planes of non-M formats (while they are reported for M
> formats).
> 
> 2) Similarly, offsets of the planes in the buffer for non-M formats
> must be explicitly calculated in the application,
> 
> 3) Drivers have to explicitly handle both non-M and M formats or
> otherwise they would suffer from issues with application compatibility
> or sharing buffers with other devices (one supporting only M and the
> other only non-M),
> 
> 4) Inconsistency in the meaning of planes[0].sizeimage for non-M
> formats and M formats, making it impossible to use planes[0].sizeimage
> to set the luma plane size in the hardware for non-M formats (since
> it's the total size of all planes).
> 
> I might have probably forgotten about something, but generally fixing
> the 4 above, would be a really big step forward.

-- 
Regards,

Laurent Pinchart





Re: [RFP] Which V4L2 ioctls could be replaced by better versions?

2018-10-26 Thread Tomasz Figa
On Thu, Sep 20, 2018 at 11:42 PM Hans Verkuil  wrote:
>
> Some parts of the V4L2 API are awkward to use and I think it would be
> a good idea to look at possible candidates for that.
>
> Examples are the ioctls that use struct v4l2_buffer: the multiplanar support 
> is
> really horrible, and writing code to support both single and multiplanar is 
> hard.
> We are also running out of fields and the timeval isn't y2038 compliant.
>
> A proof-of-concept is here:
>
> https://git.linuxtv.org/hverkuil/media_tree.git/commit/?h=v4l2-buffer=a95549df06d9900f3559afdbb9da06bd4b22d1f3
>
> It's a bit old, but it gives a good impression of what I have in mind.

On a related, but slightly different note, I'm wondering how we should
handle a case where we have an M format (e.g. NV12M with 2 memory
planes), but only 1 DMA-buf with all planes to import. That generally
means that we have to use the same DMA-buf FD with an offset for each
plane. In theory, v4l2_plane::data_offset could be used for this, but
the documentation says that it should be set by the application only
for OUTPUT planes. Moreover, existing drivers tend to just ignore
it...


There is also the opposite problem. Sometimes the application is given
3 different FDs but pointing to the same buffer. If it has to work
with a video device that only supports non-M formats, it can either
fail, making it unusable, or blindly assume that they all point to the
same buffer and just give the first FD to the video device (we do it
in Chromium, since our allocator is guaranteed to keep all planes of
planar formats in one buffer, if to be used with V4L2).

Something that we could do is allowing the QBUF semantics of M formats
for non-M formats, where the application would fill the planes[] array
for all planes with all the FDs it has and the kernel could then
figure out if they point to the same buffer (i.e. resolve to the same
dma_buf struct) or fail if not.

[...]

> Do we have more ioctls that could use a refresh? S/G/TRY_FMT perhaps, again in
> order to improve single vs multiplanar handling.

I'd definitely be more than happy to see the plane handling unified
between non-M and M formats, in general. The list of problems with
current interface:

1) The userspace has to hardcode the computations of bytesperline for
chroma planes of non-M formats (while they are reported for M
formats).

2) Similarly, offsets of the planes in the buffer for non-M formats
must be explicitly calculated in the application,

3) Drivers have to explicitly handle both non-M and M formats or
otherwise they would suffer from issues with application compatibility
or sharing buffers with other devices (one supporting only M and the
other only non-M),

4) Inconsistency in the meaning of planes[0].sizeimage for non-M
formats and M formats, making it impossible to use planes[0].sizeimage
to set the luma plane size in the hardware for non-M formats (since
it's the total size of all planes).

I might have probably forgotten about something, but generally fixing
the 4 above, would be a really big step forward.

Best regards,
Tomasz


Re: [RFP] Which V4L2 ioctls could be replaced by better versions?

2018-10-23 Thread Philipp Zabel
Hi Tomasz,

On Mon, Oct 22, 2018 at 12:37:57PM +0900, Tomasz Figa wrote:
[...]
> On Mon, Oct 22, 2018 at 1:28 AM Philipp Zabel  wrote:
[...]
> > REQBUFS 0 fails if the vb2 buffer is still in use, including from dmabuf
> > attachments: vb2_buffer_in_use checks the num_users memop. The refcount
> > returned by num_users shared between the vmarea handler and dmabuf ops,
> > so any dmabuf attachment counts towards in_use.
> 
> Ah, right. I've managed to completely forget about it, since we have a
> downstream patch that we attempted to upstream earlier [1], but didn't
> have a chance to follow up on the comments and there wasn't much
> interest in it in general.
> 
> [1] https://lore.kernel.org/patchwork/patch/607853/
> 
> Perhaps it would be worth reviving?

Yes, thanks for the pointer. I've completely missed that patch.

I was under the mistaken impression that there was some technical reason
to keep the queue around until after the last dmabuf attachment is gone,
but everything is properly refcounted.

regards
Philipp


Re: [RFP] Which V4L2 ioctls could be replaced by better versions?

2018-10-22 Thread Dave Stevenson
Hi Tomasz

On Mon, 22 Oct 2018 at 04:38, Tomasz Figa  wrote:
>
> Hi Philipp,
>
> On Mon, Oct 22, 2018 at 1:28 AM Philipp Zabel  wrote:
> >
> > On Wed, Oct 03, 2018 at 05:24:39PM +0900, Tomasz Figa wrote:
> > [...]
> > > > Yes, but that would fall in a complete redesign I guess. The buffer
> > > > allocation scheme is very inflexible. You can't have buffers of two
> > > > dimensions allocated at the same time for the same queue. Worst, you
> > > > cannot leave even 1 buffer as your scannout buffer while reallocating
> > > > new buffers, this is not permitted by the framework (in software). As a
> > > > side effect, there is no way to optimize the resolution changes, you
> > > > even have to copy your scannout buffer on the CPU, to free it in order
> > > > to proceed. Resolution changes are thus painfully slow, by design.
> > [...]
> > > Also, I fail to understand the scanout issue. If one exports a vb2
> > > buffer to a DMA-buf and import it to the scanout engine, it can keep
> > > scanning out from it as long as it want, because the DMA-buf will hold
> > > a reference on the buffer, even if it's removed from the vb2 queue.
> >
> > REQBUFS 0 fails if the vb2 buffer is still in use, including from dmabuf
> > attachments: vb2_buffer_in_use checks the num_users memop. The refcount
> > returned by num_users shared between the vmarea handler and dmabuf ops,
> > so any dmabuf attachment counts towards in_use.
>
> Ah, right. I've managed to completely forget about it, since we have a
> downstream patch that we attempted to upstream earlier [1], but didn't
> have a chance to follow up on the comments and there wasn't much
> interest in it in general.
>
> [1] https://lore.kernel.org/patchwork/patch/607853/
>
> Perhaps it would be worth reviving?

There's been recent interest in this from the LibreElec folk as they
are wanting to shift to using V4L2 M2M for codecs on as many platforms
as possible. They'll use it wrapped by FFmpeg, and they're working on
patches to get FFmpeg exporting dmabufs to be consumed by DRM.

Hans had pointed me at your patch, and I've been using it to get
around the issue.
I did start writing the requested docs changes [1], but I'm afraid
upstreaming stuff has been a low priority for me recently. If that
patch suffices, then feel free to pick it up. Otherwise I'll do so
when I get some time (likely to be a fair number of weeks away).

  Dave

[1] 
https://github.com/6by9/linux/commit/09619d9427d9d44ae5e72e0e85e64cb3ea9727da


Re: [RFP] Which V4L2 ioctls could be replaced by better versions?

2018-10-21 Thread Tomasz Figa
Hi Philipp,

On Mon, Oct 22, 2018 at 1:28 AM Philipp Zabel  wrote:
>
> On Wed, Oct 03, 2018 at 05:24:39PM +0900, Tomasz Figa wrote:
> [...]
> > > Yes, but that would fall in a complete redesign I guess. The buffer
> > > allocation scheme is very inflexible. You can't have buffers of two
> > > dimensions allocated at the same time for the same queue. Worst, you
> > > cannot leave even 1 buffer as your scannout buffer while reallocating
> > > new buffers, this is not permitted by the framework (in software). As a
> > > side effect, there is no way to optimize the resolution changes, you
> > > even have to copy your scannout buffer on the CPU, to free it in order
> > > to proceed. Resolution changes are thus painfully slow, by design.
> [...]
> > Also, I fail to understand the scanout issue. If one exports a vb2
> > buffer to a DMA-buf and import it to the scanout engine, it can keep
> > scanning out from it as long as it want, because the DMA-buf will hold
> > a reference on the buffer, even if it's removed from the vb2 queue.
>
> REQBUFS 0 fails if the vb2 buffer is still in use, including from dmabuf
> attachments: vb2_buffer_in_use checks the num_users memop. The refcount
> returned by num_users shared between the vmarea handler and dmabuf ops,
> so any dmabuf attachment counts towards in_use.

Ah, right. I've managed to completely forget about it, since we have a
downstream patch that we attempted to upstream earlier [1], but didn't
have a chance to follow up on the comments and there wasn't much
interest in it in general.

[1] https://lore.kernel.org/patchwork/patch/607853/

Perhaps it would be worth reviving?

Best regards,
Tomasz


Re: [RFP] Which V4L2 ioctls could be replaced by better versions?

2018-10-21 Thread Philipp Zabel
On Wed, Oct 03, 2018 at 05:24:39PM +0900, Tomasz Figa wrote:
[...]
> > Yes, but that would fall in a complete redesign I guess. The buffer
> > allocation scheme is very inflexible. You can't have buffers of two
> > dimensions allocated at the same time for the same queue. Worst, you
> > cannot leave even 1 buffer as your scannout buffer while reallocating
> > new buffers, this is not permitted by the framework (in software). As a
> > side effect, there is no way to optimize the resolution changes, you
> > even have to copy your scannout buffer on the CPU, to free it in order
> > to proceed. Resolution changes are thus painfully slow, by design.
[...]
> Also, I fail to understand the scanout issue. If one exports a vb2
> buffer to a DMA-buf and import it to the scanout engine, it can keep
> scanning out from it as long as it want, because the DMA-buf will hold
> a reference on the buffer, even if it's removed from the vb2 queue.

REQBUFS 0 fails if the vb2 buffer is still in use, including from dmabuf
attachments: vb2_buffer_in_use checks the num_users memop. The refcount
returned by num_users shared between the vmarea handler and dmabuf ops,
so any dmabuf attachment counts towards in_use.

regards
Philipp


Re: [RFP] Which V4L2 ioctls could be replaced by better versions?

2018-10-21 Thread Philipp Zabel
On Thu, Sep 20, 2018 at 02:14:07PM -0400, Nicolas Dufresne wrote:
> > Do we have more ioctls that could use a refresh? S/G/TRY_FMT perhaps, again 
> > in
> > order to improve single vs multiplanar handling.
> 
> Yes, but that would fall in a complete redesign I guess. The buffer
> allocation scheme is very inflexible. You can't have buffers of two
> dimensions allocated at the same time for the same queue. Worst, you
> cannot leave even 1 buffer as your scannout buffer while reallocating
> new buffers, this is not permitted by the framework (in software). As a
> side effect, there is no way to optimize the resolution changes, you
> even have to copy your scannout buffer on the CPU, to free it in order
> to proceed. Resolution changes are thus painfully slow, by design.

I've seen the same issue when exporting dmabufs from a V4L2 decoder and
importing them into OpenGL textures. Mesa caches state so aggressively,
even destroying all textures and flushing OpenGL is not enough to remove
all references to the imported resource. Only after another render step
the dmabuf fds are closed and thus make REQBUFS 0 possible on the
exporting capture queue.
This leads to a catch-22 situation during a resolution change, because
we'd already need the new buffers to do an OpenGL render without the old
buffers, so that the old buffers can be released back to V4L2, so that
V4L2 can allocate the new buffers...
It would be very helpful in this situation if exported dmabufs could
just be orphaned by REQBUFS 0.

regards
Philipp


Re: [RFP] Which V4L2 ioctls could be replaced by better versions?

2018-10-20 Thread Sakari Ailus
On Wed, Oct 17, 2018 at 11:46:54PM +0300, Laurent Pinchart wrote:
> Hi Hans,
> 
> On Wednesday, 17 October 2018 12:16:14 EEST Hans Verkuil wrote:
> > On 10/17/2018 10:57 AM, Laurent Pinchart wrote:
> > > On Thursday, 20 September 2018 17:42:15 EEST Hans Verkuil wrote:
> > >> Some parts of the V4L2 API are awkward to use and I think it would be
> > >> a good idea to look at possible candidates for that.
> > >> 
> > >> Examples are the ioctls that use struct v4l2_buffer: the multiplanar
> > >> support is really horrible, and writing code to support both single and
> > >> multiplanar is hard. We are also running out of fields and the timeval
> > >> isn't y2038 compliant.
> > >> 
> > >> A proof-of-concept is here:
> > >> 
> > >> https://git.linuxtv.org/hverkuil/media_tree.git/commit/?h=v4l2-buffer=
> > >> a95 549df06d9900f3559afdbb9da06bd4b22d1f3
> > >> 
> > >> It's a bit old, but it gives a good impression of what I have in mind.
> > >> 
> > >> Another candidate is
> > >> VIDIOC_SUBDEV_ENUM_FRAME_INTERVAL/VIDIOC_ENUM_FRAMEINTERVALS: expressing
> > >> frame intervals as a fraction is really awkward and so is the fact that
> > >> the subdev and 'normal' ioctls are not the same.
> > >> 
> > >> Would using nanoseconds or something along those lines for intervals be
> > >> better?
> > >> 
> > >> I have similar concerns with VIDIOC_SUBDEV_ENUM_FRAME_SIZE where there is
> > >> no stepwise option, making it different from VIDIOC_ENUM_FRAMESIZES. But
> > >> it should be possible to extend VIDIOC_SUBDEV_ENUM_FRAME_SIZE with
> > >> stepwise support, I think.
> > > 
> > > If we refresh the enumeration ioctls, I propose moving away from the one
> > > syscall per value model, and returning everything in one
> > > (userspace-allocated) buffer. The same could apply to all enumerations
> > > (such as controls for instance), even if we don't address them all in one
> > > go.
> > 
> > I'm not convinced about this, primarily because I think these enums are done
> > at configuration time, and rarely if ever while streaming. So does it
> > really make a difference in practice? Feedback on this would be welcome
> > during the summit meeting.
> 
> It's indeed not a hot path in most cases, but if you enumerate formats, frame 
> sizes and frame intervals, you end up with three nested loops with lots of 
> ioctl calls. This makes the code on the userspace side more complex than it 
> should be, and incurs an overhead. If we rework the enumeration ioctls for 
> other reasons, I believe we should fix this as wel.

Agreed; an interface that can tell you how many entries there are, and then
give those to you is more straightforward to use and more efficient. I
wouldn't change how things work just the sake of doing that, though; there
needs to be something else that needs to be changed and cannot be supported
using the existing API as well.

-- 
Sakari Ailus
e-mail: sakari.ai...@iki.fi


Re: [RFP] Which V4L2 ioctls could be replaced by better versions?

2018-10-17 Thread Laurent Pinchart
Hi Hans,

On Wednesday, 17 October 2018 12:16:14 EEST Hans Verkuil wrote:
> On 10/17/2018 10:57 AM, Laurent Pinchart wrote:
> > On Thursday, 20 September 2018 17:42:15 EEST Hans Verkuil wrote:
> >> Some parts of the V4L2 API are awkward to use and I think it would be
> >> a good idea to look at possible candidates for that.
> >> 
> >> Examples are the ioctls that use struct v4l2_buffer: the multiplanar
> >> support is really horrible, and writing code to support both single and
> >> multiplanar is hard. We are also running out of fields and the timeval
> >> isn't y2038 compliant.
> >> 
> >> A proof-of-concept is here:
> >> 
> >> https://git.linuxtv.org/hverkuil/media_tree.git/commit/?h=v4l2-buffer=
> >> a95 549df06d9900f3559afdbb9da06bd4b22d1f3
> >> 
> >> It's a bit old, but it gives a good impression of what I have in mind.
> >> 
> >> Another candidate is
> >> VIDIOC_SUBDEV_ENUM_FRAME_INTERVAL/VIDIOC_ENUM_FRAMEINTERVALS: expressing
> >> frame intervals as a fraction is really awkward and so is the fact that
> >> the subdev and 'normal' ioctls are not the same.
> >> 
> >> Would using nanoseconds or something along those lines for intervals be
> >> better?
> >> 
> >> I have similar concerns with VIDIOC_SUBDEV_ENUM_FRAME_SIZE where there is
> >> no stepwise option, making it different from VIDIOC_ENUM_FRAMESIZES. But
> >> it should be possible to extend VIDIOC_SUBDEV_ENUM_FRAME_SIZE with
> >> stepwise support, I think.
> > 
> > If we refresh the enumeration ioctls, I propose moving away from the one
> > syscall per value model, and returning everything in one
> > (userspace-allocated) buffer. The same could apply to all enumerations
> > (such as controls for instance), even if we don't address them all in one
> > go.
> 
> I'm not convinced about this, primarily because I think these enums are done
> at configuration time, and rarely if ever while streaming. So does it
> really make a difference in practice? Feedback on this would be welcome
> during the summit meeting.

It's indeed not a hot path in most cases, but if you enumerate formats, frame 
sizes and frame intervals, you end up with three nested loops with lots of 
ioctl calls. This makes the code on the userspace side more complex than it 
should be, and incurs an overhead. If we rework the enumeration ioctls for 
other reasons, I believe we should fix this as wel.

> >> Do we have more ioctls that could use a refresh? S/G/TRY_FMT perhaps,
> >> again in order to improve single vs multiplanar handling.
> > 
> > If we refresh the G/S/TRY format ioctls (and I think we should), I would
> > propose moving to a G/S model with ACTIVE and TRY formats, as for subdevs.
> > This should make it easier to construct full device states internally, in
> > order to implement proper request API support for formats. We should then
> > also document much better how formats and selection rectangles interact.
> 
> Interesting. I was planning a slide for this.
> 
> >> It is not the intention to come to a full design, it's more to test the
> >> waters so to speak.
> > 
> > Another item that we're missing is a way to delete buffers (the
> > counterpart of VIDIOC_CREATE_BUFS). As this will introduce holes in the
> > buffer indices, we might also need to revamp VIDIOC_CREATE_BUFS (which
> > would give us a chance to move away from the format structure passed to
> > that ioctl).
> 
> I'm just writing the slide for that :-)

Thanks :-)

-- 
Regards,

Laurent Pinchart





Re: [RFP] Which V4L2 ioctls could be replaced by better versions?

2018-10-17 Thread Hans Verkuil
On 10/17/2018 10:57 AM, Laurent Pinchart wrote:
> Hi Hans,
> 
> On Thursday, 20 September 2018 17:42:15 EEST Hans Verkuil wrote:
>> Some parts of the V4L2 API are awkward to use and I think it would be
>> a good idea to look at possible candidates for that.
>>
>> Examples are the ioctls that use struct v4l2_buffer: the multiplanar support
>> is really horrible, and writing code to support both single and multiplanar
>> is hard. We are also running out of fields and the timeval isn't y2038
>> compliant.
>>
>> A proof-of-concept is here:
>>
>> https://git.linuxtv.org/hverkuil/media_tree.git/commit/?h=v4l2-buffer=a95
>> 549df06d9900f3559afdbb9da06bd4b22d1f3
>>
>> It's a bit old, but it gives a good impression of what I have in mind.
>>
>> Another candidate is
>> VIDIOC_SUBDEV_ENUM_FRAME_INTERVAL/VIDIOC_ENUM_FRAMEINTERVALS: expressing
>> frame intervals as a fraction is really awkward and so is the fact that the
>> subdev and 'normal' ioctls are not the same.
>>
>> Would using nanoseconds or something along those lines for intervals be
>> better?
>>
>> I have similar concerns with VIDIOC_SUBDEV_ENUM_FRAME_SIZE where there is no
>> stepwise option, making it different from VIDIOC_ENUM_FRAMESIZES. But it
>> should be possible to extend VIDIOC_SUBDEV_ENUM_FRAME_SIZE with stepwise
>> support, I think.
> 
> If we refresh the enumeration ioctls, I propose moving away from the one 
> syscall per value model, and returning everything in one 
> (userspace-allocated) 
> buffer. The same could apply to all enumerations (such as controls for 
> instance), even if we don't address them all in one go.

I'm not convinced about this, primarily because I think these enums are done
at configuration time, and rarely if ever while streaming. So does it really
make a difference in practice? Feedback on this would be welcome during the
summit meeting.

> 
>> Do we have more ioctls that could use a refresh? S/G/TRY_FMT perhaps, again
>> in order to improve single vs multiplanar handling.
> 
> If we refresh the G/S/TRY format ioctls (and I think we should), I would 
> propose moving to a G/S model with ACTIVE and TRY formats, as for subdevs. 
> This should make it easier to construct full device states internally, in 
> order to implement proper request API support for formats. We should then 
> also 
> document much better how formats and selection rectangles interact.

Interesting. I was planning a slide for this.

>> It is not the intention to come to a full design, it's more to test the
>> waters so to speak.
> 
> Another item that we're missing is a way to delete buffers (the counterpart 
> of 
> VIDIOC_CREATE_BUFS). As this will introduce holes in the buffer indices, we 
> might also need to revamp VIDIOC_CREATE_BUFS (which would give us a chance to 
> move away from the format structure passed to that ioctl).
> 

I'm just writing the slide for that :-)

Regards,

Hans


Re: [RFP] Which V4L2 ioctls could be replaced by better versions?

2018-10-17 Thread Laurent Pinchart
Hi Hans,

On Thursday, 20 September 2018 17:42:15 EEST Hans Verkuil wrote:
> Some parts of the V4L2 API are awkward to use and I think it would be
> a good idea to look at possible candidates for that.
> 
> Examples are the ioctls that use struct v4l2_buffer: the multiplanar support
> is really horrible, and writing code to support both single and multiplanar
> is hard. We are also running out of fields and the timeval isn't y2038
> compliant.
> 
> A proof-of-concept is here:
> 
> https://git.linuxtv.org/hverkuil/media_tree.git/commit/?h=v4l2-buffer=a95
> 549df06d9900f3559afdbb9da06bd4b22d1f3
> 
> It's a bit old, but it gives a good impression of what I have in mind.
> 
> Another candidate is
> VIDIOC_SUBDEV_ENUM_FRAME_INTERVAL/VIDIOC_ENUM_FRAMEINTERVALS: expressing
> frame intervals as a fraction is really awkward and so is the fact that the
> subdev and 'normal' ioctls are not the same.
> 
> Would using nanoseconds or something along those lines for intervals be
> better?
> 
> I have similar concerns with VIDIOC_SUBDEV_ENUM_FRAME_SIZE where there is no
> stepwise option, making it different from VIDIOC_ENUM_FRAMESIZES. But it
> should be possible to extend VIDIOC_SUBDEV_ENUM_FRAME_SIZE with stepwise
> support, I think.

If we refresh the enumeration ioctls, I propose moving away from the one 
syscall per value model, and returning everything in one (userspace-allocated) 
buffer. The same could apply to all enumerations (such as controls for 
instance), even if we don't address them all in one go.

> Do we have more ioctls that could use a refresh? S/G/TRY_FMT perhaps, again
> in order to improve single vs multiplanar handling.

If we refresh the G/S/TRY format ioctls (and I think we should), I would 
propose moving to a G/S model with ACTIVE and TRY formats, as for subdevs. 
This should make it easier to construct full device states internally, in 
order to implement proper request API support for formats. We should then also 
document much better how formats and selection rectangles interact.

> It is not the intention to come to a full design, it's more to test the
> waters so to speak.

Another item that we're missing is a way to delete buffers (the counterpart of 
VIDIOC_CREATE_BUFS). As this will introduce holes in the buffer indices, we 
might also need to revamp VIDIOC_CREATE_BUFS (which would give us a chance to 
move away from the format structure passed to that ioctl).

-- 
Regards,

Laurent Pinchart





Re: [RFP] Which V4L2 ioctls could be replaced by better versions?

2018-10-16 Thread Hans Verkuil
On 10/03/18 10:24, Tomasz Figa wrote:
> On Fri, Sep 21, 2018 at 3:14 AM Nicolas Dufresne  wrote:
>>
>> Le jeudi 20 septembre 2018 à 16:42 +0200, Hans Verkuil a écrit :
>>> Some parts of the V4L2 API are awkward to use and I think it would be
>>> a good idea to look at possible candidates for that.
>>>
>>> Examples are the ioctls that use struct v4l2_buffer: the multiplanar 
>>> support is
>>> really horrible, and writing code to support both single and multiplanar is 
>>> hard.
>>> We are also running out of fields and the timeval isn't y2038 compliant.
>>>
>>> A proof-of-concept is here:
>>>
>>> https://git.linuxtv.org/hverkuil/media_tree.git/commit/?h=v4l2-buffer=a95549df06d9900f3559afdbb9da06bd4b22d1f3
>>>
>>> It's a bit old, but it gives a good impression of what I have in mind.
>>>
>>> Another candidate is 
>>> VIDIOC_SUBDEV_ENUM_FRAME_INTERVAL/VIDIOC_ENUM_FRAMEINTERVALS:
>>> expressing frame intervals as a fraction is really awkward and so is the 
>>> fact
>>> that the subdev and 'normal' ioctls are not the same.
>>>
>>> Would using nanoseconds or something along those lines for intervals be 
>>> better?
>>
>> This one is not a good idea, because you cannot represent well known
>> rates used a lot in the field. Like 6/1001 also known as 59.94 Hz.
>> You could endup with drift issues.
>>
>> For me, what is the most difficult with this API is the fact that it
>> uses frame internal (duration) instead of frame rate.
>>
>>>
>>> I have similar concerns with VIDIOC_SUBDEV_ENUM_FRAME_SIZE where there is no
>>> stepwise option, making it different from VIDIOC_ENUM_FRAMESIZES. But it 
>>> should
>>> be possible to extend VIDIOC_SUBDEV_ENUM_FRAME_SIZE with stepwise support, I
>>> think.
>>
>> One of the thing to fix, maybe it's doable now, is the differentiation
>> between allocation size and display size. Pretty much all video capture
>> code assumes this is display size and ignores the selection API. This
>> should be documented explicitly.
> 
> Technically, there is already a distinction between allocation and
> display size at format level - width and height could be interpreted
> as the rectangle containing the captured video, while bytesperline and
> sizeimage would match to the allocation size. The behavior between
> drivers seems to be extremely variable - some would just enforce
> bytesperline = bpp*width and sizeimage = bytesperline*height, while
> others would actually give control over them to the user space.
> 
> It's a bit more complicated with video decoders, because the stream,
> in addition to the 2 sizes mentioned before, is also characterized by
> coded size, which corresponds to the actual number of pixels encoded
> in the stream, even if not all contain pixels to be displayed. This is
> something that needs to be specified explicitly (and is, in my
> documentation patches) in the Codec Interface.
> 
>>
>> In fact, the display/allocation dimension isn't very nice, as both
>> information overlaps in structures. As an example, you call S_FMT with
>> a display size you want, and it will return you an allocation size
>> (which yes, can be smaller, because we always round to the middle).
>>
>>>
>>> Do we have more ioctls that could use a refresh? S/G/TRY_FMT perhaps, again 
>>> in
>>> order to improve single vs multiplanar handling.
>>
>> Yes, but that would fall in a complete redesign I guess. The buffer
>> allocation scheme is very inflexible. You can't have buffers of two
>> dimensions allocated at the same time for the same queue. Worst, you
>> cannot leave even 1 buffer as your scannout buffer while reallocating
>> new buffers, this is not permitted by the framework (in software). As a
>> side effect, there is no way to optimize the resolution changes, you
>> even have to copy your scannout buffer on the CPU, to free it in order
>> to proceed. Resolution changes are thus painfully slow, by design.
> 
> Hmm? There is VIDIOC_CREATEBUFS, which can allows you to allocate
> buffers for explicitly specified format, with other buffers already
> existing in the queue.

Of course, we are missing the VIDIOC_DELETEBUFS ioctl. Also, CREATEBUFS
is rather awful. Using v4l2_format in the struct was a major mistake.

> 
> Also, I fail to understand the scanout issue. If one exports a vb2
> buffer to a DMA-buf and import it to the scanout engine, it can keep
> scanning out from it as long as it want, because the DMA-buf will hold
> a reference on the buffer, even if it's removed from the vb2 queue.
> 
>>
>> You also cannot switch from internal buffers to importing buffers
>> easily (in some case you, like encoder, you cannot do that without
>> flushing the encoder state).
> 
> Hmm, technically VIDIOC_CREATEBUFS accepts the "memory" type value,
> but I'm not sure what happens if the queue already has buffers
> requested with different one.

It is not allowed to mix memory types, that will return -EINVAL.

I have to say that this is the first time I had this request.

It is probably doable, but the 

Re: [RFP] Which V4L2 ioctls could be replaced by better versions?

2018-10-03 Thread Tomasz Figa
On Fri, Sep 21, 2018 at 3:14 AM Nicolas Dufresne  wrote:
>
> Le jeudi 20 septembre 2018 à 16:42 +0200, Hans Verkuil a écrit :
> > Some parts of the V4L2 API are awkward to use and I think it would be
> > a good idea to look at possible candidates for that.
> >
> > Examples are the ioctls that use struct v4l2_buffer: the multiplanar 
> > support is
> > really horrible, and writing code to support both single and multiplanar is 
> > hard.
> > We are also running out of fields and the timeval isn't y2038 compliant.
> >
> > A proof-of-concept is here:
> >
> > https://git.linuxtv.org/hverkuil/media_tree.git/commit/?h=v4l2-buffer=a95549df06d9900f3559afdbb9da06bd4b22d1f3
> >
> > It's a bit old, but it gives a good impression of what I have in mind.
> >
> > Another candidate is 
> > VIDIOC_SUBDEV_ENUM_FRAME_INTERVAL/VIDIOC_ENUM_FRAMEINTERVALS:
> > expressing frame intervals as a fraction is really awkward and so is the 
> > fact
> > that the subdev and 'normal' ioctls are not the same.
> >
> > Would using nanoseconds or something along those lines for intervals be 
> > better?
>
> This one is not a good idea, because you cannot represent well known
> rates used a lot in the field. Like 6/1001 also known as 59.94 Hz.
> You could endup with drift issues.
>
> For me, what is the most difficult with this API is the fact that it
> uses frame internal (duration) instead of frame rate.
>
> >
> > I have similar concerns with VIDIOC_SUBDEV_ENUM_FRAME_SIZE where there is no
> > stepwise option, making it different from VIDIOC_ENUM_FRAMESIZES. But it 
> > should
> > be possible to extend VIDIOC_SUBDEV_ENUM_FRAME_SIZE with stepwise support, I
> > think.
>
> One of the thing to fix, maybe it's doable now, is the differentiation
> between allocation size and display size. Pretty much all video capture
> code assumes this is display size and ignores the selection API. This
> should be documented explicitly.

Technically, there is already a distinction between allocation and
display size at format level - width and height could be interpreted
as the rectangle containing the captured video, while bytesperline and
sizeimage would match to the allocation size. The behavior between
drivers seems to be extremely variable - some would just enforce
bytesperline = bpp*width and sizeimage = bytesperline*height, while
others would actually give control over them to the user space.

It's a bit more complicated with video decoders, because the stream,
in addition to the 2 sizes mentioned before, is also characterized by
coded size, which corresponds to the actual number of pixels encoded
in the stream, even if not all contain pixels to be displayed. This is
something that needs to be specified explicitly (and is, in my
documentation patches) in the Codec Interface.

>
> In fact, the display/allocation dimension isn't very nice, as both
> information overlaps in structures. As an example, you call S_FMT with
> a display size you want, and it will return you an allocation size
> (which yes, can be smaller, because we always round to the middle).
>
> >
> > Do we have more ioctls that could use a refresh? S/G/TRY_FMT perhaps, again 
> > in
> > order to improve single vs multiplanar handling.
>
> Yes, but that would fall in a complete redesign I guess. The buffer
> allocation scheme is very inflexible. You can't have buffers of two
> dimensions allocated at the same time for the same queue. Worst, you
> cannot leave even 1 buffer as your scannout buffer while reallocating
> new buffers, this is not permitted by the framework (in software). As a
> side effect, there is no way to optimize the resolution changes, you
> even have to copy your scannout buffer on the CPU, to free it in order
> to proceed. Resolution changes are thus painfully slow, by design.

Hmm? There is VIDIOC_CREATEBUFS, which can allows you to allocate
buffers for explicitly specified format, with other buffers already
existing in the queue.

Also, I fail to understand the scanout issue. If one exports a vb2
buffer to a DMA-buf and import it to the scanout engine, it can keep
scanning out from it as long as it want, because the DMA-buf will hold
a reference on the buffer, even if it's removed from the vb2 queue.

>
> You also cannot switch from internal buffers to importing buffers
> easily (in some case you, like encoder, you cannot do that without
> flushing the encoder state).

Hmm, technically VIDIOC_CREATEBUFS accepts the "memory" type value,
but I'm not sure what happens if the queue already has buffers
requested with different one.

Best regards,
Tomasz


Re: [RFP] Which V4L2 ioctls could be replaced by better versions?

2018-09-20 Thread Nicolas Dufresne
Le jeudi 20 septembre 2018 à 16:42 +0200, Hans Verkuil a écrit :
> Some parts of the V4L2 API are awkward to use and I think it would be
> a good idea to look at possible candidates for that.
> 
> Examples are the ioctls that use struct v4l2_buffer: the multiplanar support 
> is
> really horrible, and writing code to support both single and multiplanar is 
> hard.
> We are also running out of fields and the timeval isn't y2038 compliant.
> 
> A proof-of-concept is here:
> 
> https://git.linuxtv.org/hverkuil/media_tree.git/commit/?h=v4l2-buffer=a95549df06d9900f3559afdbb9da06bd4b22d1f3
> 
> It's a bit old, but it gives a good impression of what I have in mind.
> 
> Another candidate is 
> VIDIOC_SUBDEV_ENUM_FRAME_INTERVAL/VIDIOC_ENUM_FRAMEINTERVALS:
> expressing frame intervals as a fraction is really awkward and so is the fact
> that the subdev and 'normal' ioctls are not the same.
> 
> Would using nanoseconds or something along those lines for intervals be 
> better?

This one is not a good idea, because you cannot represent well known
rates used a lot in the field. Like 6/1001 also known as 59.94 Hz.
You could endup with drift issues.

For me, what is the most difficult with this API is the fact that it
uses frame internal (duration) instead of frame rate.

> 
> I have similar concerns with VIDIOC_SUBDEV_ENUM_FRAME_SIZE where there is no
> stepwise option, making it different from VIDIOC_ENUM_FRAMESIZES. But it 
> should
> be possible to extend VIDIOC_SUBDEV_ENUM_FRAME_SIZE with stepwise support, I
> think.

One of the thing to fix, maybe it's doable now, is the differentiation
between allocation size and display size. Pretty much all video capture
code assumes this is display size and ignores the selection API. This
should be documented explicitly.

In fact, the display/allocation dimension isn't very nice, as both
information overlaps in structures. As an example, you call S_FMT with
a display size you want, and it will return you an allocation size
(which yes, can be smaller, because we always round to the middle).

> 
> Do we have more ioctls that could use a refresh? S/G/TRY_FMT perhaps, again in
> order to improve single vs multiplanar handling.

Yes, but that would fall in a complete redesign I guess. The buffer
allocation scheme is very inflexible. You can't have buffers of two
dimensions allocated at the same time for the same queue. Worst, you
cannot leave even 1 buffer as your scannout buffer while reallocating
new buffers, this is not permitted by the framework (in software). As a
side effect, there is no way to optimize the resolution changes, you
even have to copy your scannout buffer on the CPU, to free it in order
to proceed. Resolution changes are thus painfully slow, by design.

You also cannot switch from internal buffers to importing buffers
easily (in some case you, like encoder, you cannot do that without
flushing the encoder state).

> 
> It is not the intention to come to a full design, it's more to test the waters
> so to speak.
> 
> Regards,
> 
>   Hans


signature.asc
Description: This is a digitally signed message part