Re: [RFC 00/10] V4L2 explicit synchronization support
On 06/09/2017 12:25 AM, Gustavo Padovan wrote: > 2017-06-08 Shuah Khan: > >> Hi Gustavo, >> >> On Thu, Jun 8, 2017 at 2:17 PM, Mauro Carvalho Chehab >> wrote: >>> Hi Gustavo, >>> >>> Em Wed, 24 May 2017 21:31:01 -0300 >>> Gustavo Padovan escreveu: >>> Hi all, I've been working on the v2 of this series, but I think I hit a blocker when trying to cover the case where the driver asks to requeue the buffer. It is related to the out-fence side. In the current implementation we return on QBUF an out-fence fd that is not tied to any buffer, because we don't know the queueing order until the buffer is queued to the driver. Then when the buffer is queued we use the BUF_QUEUED event to notify userspace of the index of the buffer, so now userspace knows the buffer associated to the out-fence fd received earlier. Userspace goes ahead and send a DRM Atomic Request to the kernel to display that buffer on the screen once the fence signals. If it is a nonblocking request the fence waiting is past the check phase, thus it isn't allowed to fail anymore. But now, what happens if the V4L2 driver calls buffer_done() asking to requeue the buffer. That means the operation failed and can't signal the fence, starving the DRM side. We need to fix that. The only way I can see is to guarantee ordering of buffers when out-fences are used. Ordering is something that HAL3 needs to so maybe there is more than one reason to do it like this. I'm not a V4L2 expert, so I don't know all the consequences of such a change. Any other ideas? The current patchset is at: https://git.kernel.org/pub/scm/linux/kernel/git/padovan/linux.git/log/?h=v4l2-fences >> >> Do you plan to send the v2 out? I did a quick review and have a few comments. >> >> [media] vb2: split out queueing from vb_core_qbuf() >> >> It changes the sequence a bit. >> >> /* Fill buffer information for the userspace */ >> if (pb) >> call_void_bufop(q, fill_user_buffer, vb, pb); >> >> With the changes - user information is filled before __enqueue_in_driver(vb); > > Without my changes it also fills it before __enqueue_in_driver() when > start_streaming wasn't called yet. So I don't think it really matters. Right, with this change, it fills the buffer before __enqueue_in_driver() when start_streaming is called. Is that an issue, I don't know for sure. It might not be necessary perhaps if buffer is filled in the path when stream is already called. > >> >> Anyway, it might be a good idea to send the v2 out for review and we can >> review >> patches in detail. I am hoping to test your patch series on odroid-xu4 >> next week. >> Could you please add me to the thread as well as include me when you send >> v2 and subsequent versions. > > I will send a v2 as soon as I can, but from Thursday next week until > the 25th I'll be on vacation. okay sounds good. thanks, -- Shuah
Re: [RFC 00/10] V4L2 explicit synchronization support
Le lundi 03 avril 2017 à 15:46 -0400, Javier Martinez Canillas a écrit : > > The problem is that adding implicit fences changed the behavior of > > the ioctls, causing gstreamer to wait forever for buffers to be ready. > > > > The problem was related to trying to make user-space unaware of the implicit > fences support, and so it tried to QBUF a buffer that had already a pending > fence. A workaround was to block the second QBUF ioctl if the buffer had a > pending fence, but this caused the mentioned deadlock since GStreamer wasn't > expecting the QBUF ioctl to block. That QBUF may block isn't a problem, but modern userspace application, not just GStreamer, need "cancellable" operations. This is achieved by avoiding blocking call that cannot be interrupted. What is usually done, is that we poll the device FD to determine when it is safe to call QBUF in a way that it will return in a finit amount of time. For the implicit fence, it could not work, since the driver is not yet aware of the fence, hence cannot use it to delay the poll operation. Though, it's not clear why it couldn't wait asynchronously like this RFC is doing with the explicit fences. In the current RFC, the fences are signalled through a callback, and QBUF is split in half. So waiting on the fence is done elsewhere, and the qbuf operation completes on the fence callback thread. Nicolas signature.asc Description: This is a digitally signed message part
Re: [RFC 00/10] V4L2 explicit synchronization support
2017-06-08 Shuah Khan: > Hi Gustavo, > > On Thu, Jun 8, 2017 at 2:17 PM, Mauro Carvalho Chehab > wrote: > > Hi Gustavo, > > > > Em Wed, 24 May 2017 21:31:01 -0300 > > Gustavo Padovan escreveu: > > > >> Hi all, > >> > >> I've been working on the v2 of this series, but I think I hit a blocker > >> when trying to cover the case where the driver asks to requeue the > >> buffer. It is related to the out-fence side. > >> > >> In the current implementation we return on QBUF an out-fence fd that is not > >> tied to any buffer, because we don't know the queueing order until the > >> buffer is queued to the driver. Then when the buffer is queued we use > >> the BUF_QUEUED event to notify userspace of the index of the buffer, > >> so now userspace knows the buffer associated to the out-fence fd > >> received earlier. > >> > >> Userspace goes ahead and send a DRM Atomic Request to the kernel to > >> display that buffer on the screen once the fence signals. If it is > >> a nonblocking request the fence waiting is past the check phase, thus > >> it isn't allowed to fail anymore. > >> > >> But now, what happens if the V4L2 driver calls buffer_done() asking > >> to requeue the buffer. That means the operation failed and can't > >> signal the fence, starving the DRM side. > >> > >> We need to fix that. The only way I can see is to guarantee ordering of > >> buffers when out-fences are used. Ordering is something that HAL3 needs > >> to so maybe there is more than one reason to do it like this. I'm not > >> a V4L2 expert, so I don't know all the consequences of such a change. > >> > >> Any other ideas? > >> > >> The current patchset is at: > >> > >> https://git.kernel.org/pub/scm/linux/kernel/git/padovan/linux.git/log/?h=v4l2-fences > > Do you plan to send the v2 out? I did a quick review and have a few comments. > > [media] vb2: split out queueing from vb_core_qbuf() > > It changes the sequence a bit. > > /* Fill buffer information for the userspace */ > if (pb) > call_void_bufop(q, fill_user_buffer, vb, pb); > > With the changes - user information is filled before __enqueue_in_driver(vb); Without my changes it also fills it before __enqueue_in_driver() when start_streaming wasn't called yet. So I don't think it really matters. > > Anyway, it might be a good idea to send the v2 out for review and we can > review > patches in detail. I am hoping to test your patch series on odroid-xu4 > next week. > Could you please add me to the thread as well as include me when you send > v2 and subsequent versions. I will send a v2 as soon as I can, but from Thursday next week until the 25th I'll be on vacation. Gustavo
Re: [RFC 00/10] V4L2 explicit synchronization support
Hi Mauro, 2017-06-08 Mauro Carvalho Chehab: > Hi Gustavo, > > Em Wed, 24 May 2017 21:31:01 -0300 > Gustavo Padovan escreveu: > > > Hi all, > > > > I've been working on the v2 of this series, but I think I hit a blocker > > when trying to cover the case where the driver asks to requeue the > > buffer. It is related to the out-fence side. > > > > In the current implementation we return on QBUF an out-fence fd that is not > > tied to any buffer, because we don't know the queueing order until the > > buffer is queued to the driver. Then when the buffer is queued we use > > the BUF_QUEUED event to notify userspace of the index of the buffer, > > so now userspace knows the buffer associated to the out-fence fd > > received earlier. > > > > Userspace goes ahead and send a DRM Atomic Request to the kernel to > > display that buffer on the screen once the fence signals. If it is > > a nonblocking request the fence waiting is past the check phase, thus > > it isn't allowed to fail anymore. > > > > But now, what happens if the V4L2 driver calls buffer_done() asking > > to requeue the buffer. That means the operation failed and can't > > signal the fence, starving the DRM side. > > > > We need to fix that. The only way I can see is to guarantee ordering of > > buffers when out-fences are used. Ordering is something that HAL3 needs > > to so maybe there is more than one reason to do it like this. I'm not > > a V4L2 expert, so I don't know all the consequences of such a change. > > > > Any other ideas? > > > > The current patchset is at: > > > > https://git.kernel.org/pub/scm/linux/kernel/git/padovan/linux.git/log/?h=v4l2-fences > > Currently, nothing warrants that buffers will be returned in order, > but that should be the case of most drivers. I guess the main > exception would be mem2mem codec drivers. On those, the driver > or the hardware may opt to reorder the buffers. > > If this is a mandatory requirement for explicit fences to work, then > we'll need to be able to explicitly enable it, per driver, and > clearly document that drivers using it *should* warrant that the > dequeued buffer will follow the queued order. It is mandatory in the sense it won't work properly and make DRM fail an atomic commit if a buffer is requeued. So it is fair to ask drivers to guarantee ordering is a good thing. Then later we can deal with the drivers that can't. > > We may need to modify VB2 in order to enforce it or return an > error if the order doesn't match. Yeah, I'll look into how to do this. Gustavo
Re: [RFC 00/10] V4L2 explicit synchronization support
Hi Gustavo, On Thu, Jun 8, 2017 at 2:17 PM, Mauro Carvalho Chehabwrote: > Hi Gustavo, > > Em Wed, 24 May 2017 21:31:01 -0300 > Gustavo Padovan escreveu: > >> Hi all, >> >> I've been working on the v2 of this series, but I think I hit a blocker >> when trying to cover the case where the driver asks to requeue the >> buffer. It is related to the out-fence side. >> >> In the current implementation we return on QBUF an out-fence fd that is not >> tied to any buffer, because we don't know the queueing order until the >> buffer is queued to the driver. Then when the buffer is queued we use >> the BUF_QUEUED event to notify userspace of the index of the buffer, >> so now userspace knows the buffer associated to the out-fence fd >> received earlier. >> >> Userspace goes ahead and send a DRM Atomic Request to the kernel to >> display that buffer on the screen once the fence signals. If it is >> a nonblocking request the fence waiting is past the check phase, thus >> it isn't allowed to fail anymore. >> >> But now, what happens if the V4L2 driver calls buffer_done() asking >> to requeue the buffer. That means the operation failed and can't >> signal the fence, starving the DRM side. >> >> We need to fix that. The only way I can see is to guarantee ordering of >> buffers when out-fences are used. Ordering is something that HAL3 needs >> to so maybe there is more than one reason to do it like this. I'm not >> a V4L2 expert, so I don't know all the consequences of such a change. >> >> Any other ideas? >> >> The current patchset is at: >> >> https://git.kernel.org/pub/scm/linux/kernel/git/padovan/linux.git/log/?h=v4l2-fences Do you plan to send the v2 out? I did a quick review and have a few comments. [media] vb2: split out queueing from vb_core_qbuf() It changes the sequence a bit. /* Fill buffer information for the userspace */ if (pb) call_void_bufop(q, fill_user_buffer, vb, pb); With the changes - user information is filled before __enqueue_in_driver(vb); Anyway, it might be a good idea to send the v2 out for review and we can review patches in detail. I am hoping to test your patch series on odroid-xu4 next week. Could you please add me to the thread as well as include me when you send v2 and subsequent versions. thanks, -- Shuah
Re: [RFC 00/10] V4L2 explicit synchronization support
Hi Gustavo, Em Wed, 24 May 2017 21:31:01 -0300 Gustavo Padovanescreveu: > Hi all, > > I've been working on the v2 of this series, but I think I hit a blocker > when trying to cover the case where the driver asks to requeue the > buffer. It is related to the out-fence side. > > In the current implementation we return on QBUF an out-fence fd that is not > tied to any buffer, because we don't know the queueing order until the > buffer is queued to the driver. Then when the buffer is queued we use > the BUF_QUEUED event to notify userspace of the index of the buffer, > so now userspace knows the buffer associated to the out-fence fd > received earlier. > > Userspace goes ahead and send a DRM Atomic Request to the kernel to > display that buffer on the screen once the fence signals. If it is > a nonblocking request the fence waiting is past the check phase, thus > it isn't allowed to fail anymore. > > But now, what happens if the V4L2 driver calls buffer_done() asking > to requeue the buffer. That means the operation failed and can't > signal the fence, starving the DRM side. > > We need to fix that. The only way I can see is to guarantee ordering of > buffers when out-fences are used. Ordering is something that HAL3 needs > to so maybe there is more than one reason to do it like this. I'm not > a V4L2 expert, so I don't know all the consequences of such a change. > > Any other ideas? > > The current patchset is at: > > https://git.kernel.org/pub/scm/linux/kernel/git/padovan/linux.git/log/?h=v4l2-fences Currently, nothing warrants that buffers will be returned in order, but that should be the case of most drivers. I guess the main exception would be mem2mem codec drivers. On those, the driver or the hardware may opt to reorder the buffers. If this is a mandatory requirement for explicit fences to work, then we'll need to be able to explicitly enable it, per driver, and clearly document that drivers using it *should* warrant that the dequeued buffer will follow the queued order. We may need to modify VB2 in order to enforce it or return an error if the order doesn't match. -- Thanks, Mauro
Re: [RFC 00/10] V4L2 explicit synchronization support
Hi all, I've been working on the v2 of this series, but I think I hit a blocker when trying to cover the case where the driver asks to requeue the buffer. It is related to the out-fence side. In the current implementation we return on QBUF an out-fence fd that is not tied to any buffer, because we don't know the queueing order until the buffer is queued to the driver. Then when the buffer is queued we use the BUF_QUEUED event to notify userspace of the index of the buffer, so now userspace knows the buffer associated to the out-fence fd received earlier. Userspace goes ahead and send a DRM Atomic Request to the kernel to display that buffer on the screen once the fence signals. If it is a nonblocking request the fence waiting is past the check phase, thus it isn't allowed to fail anymore. But now, what happens if the V4L2 driver calls buffer_done() asking to requeue the buffer. That means the operation failed and can't signal the fence, starving the DRM side. We need to fix that. The only way I can see is to guarantee ordering of buffers when out-fences are used. Ordering is something that HAL3 needs to so maybe there is more than one reason to do it like this. I'm not a V4L2 expert, so I don't know all the consequences of such a change. Any other ideas? The current patchset is at: https://git.kernel.org/pub/scm/linux/kernel/git/padovan/linux.git/log/?h=v4l2-fences Regards, Gustavo 2017-03-13 Gustavo Padovan: > From: Gustavo Padovan > > Hi, > > This RFC adds support for Explicit Synchronization of shared buffers in V4L2. > It uses the Sync File Framework[1] as vector to communicate the fences > between kernel and userspace. > > I'm sending this to start the discussion on the best approach to implement > Explicit Synchronization, please check the TODO/OPEN section below. > > Explicit Synchronization allows us to control the synchronization of > shared buffers from userspace by passing fences to the kernel and/or > receiving them from the the kernel. > > Fences passed to the kernel are named in-fences and the kernel should wait > them to signal before using the buffer. On the other side, the kernel creates > out-fences for every buffer it receives from userspace. This fence is sent > back > to userspace and it will signal when the capture, for example, has finished. > > Signalling an out-fence in V4L2 would mean that the job on the buffer is done > and the buffer can be used by other drivers. > > Current RFC implementation > -- > > The current implementation is not intended to be more than a PoC to start > the discussion on how Explicit Synchronization should be supported in V4L2. > > The first patch proposes an userspace API for fences, then on patch 2 > we prepare to the addition of in-fences in patch 3, by introducing the > infrastructure on vb2 to wait on an in-fence signal before queueing the buffer > in the driver. > > Patch 4 fix uvc v4l2 event handling and patch 5 configure q->dev for vivid > drivers to enable to subscribe and dequeue events on it. > > Patches 6-7 enables support to notify BUF_QUEUED events, i.e., let userspace > know that particular buffer was enqueued in the driver. This is needed, > because we return the out-fence fd as an out argument in QBUF, but at the time > it returns we don't know to which buffer the fence will be attached thus > the BUF_QUEUED event tells which buffer is associated to the fence received in > QBUF by userspace. > > Patches 8 and 9 add more fence infrastructure to support out-fences and > finally > patch 10 adds support to out-fences. > > TODO/OPEN: > -- > > * For this first implementation we will keep the ordering of the buffers > queued > in videobuf2, that means we will only enqueue buffer whose fence was signalled > if that buffer is the first one in the queue. Otherwise it has to wait until > it > is the first one. This is not implmented yet. Later we could create a flag to > allow unordered queing in the drivers from vb2 if needed. > > * Should we have out-fences per-buffer or per-plane? or both? In this RFC, for > simplicity they are per-buffer, but Mauro and Javier raised the option of > doing per-plane fences. That could benefit mem2mem and V4L2 <-> GPU operation > at least on cases when we have Capture hw that releases the Y frame before the > other frames for example. When using V4L2 per-plane out-fences to communicate > with KMS they would need to be merged together as currently the DRM Plane > interface only supports one fence per DRM Plane. > > In-fences should be per-buffer as the DRM only has per-buffer fences, but > in case of mem2mem operations per-plane fences might be useful? > > So should we have both ways, per-plane and per-buffer, or just one of them > for now? > > * other open topics are how to deal with hw-fences and Request API. > > Comments are welcome! > > Regards, > > Gustavo > > --- > Gustavo Padovan (9):
Re: [RFC 00/10] V4L2 explicit synchronization support
Hello Gustavo, On 04/06/2017 10:08 AM, Gustavo Padovan wrote: > Hi Javier, > > 2017-04-05 Javier Martinez Canillas: > >> Hello Gustavo, >> >> On 04/05/2017 11:09 AM, Gustavo Padovan wrote: >>> 2017-04-03 Javier Martinez Canillas : >>> Hello Mauro and Gustavo, On 04/03/2017 07:16 AM, Mauro Carvalho Chehab wrote: > Hi Gustavo, > > Em Mon, 13 Mar 2017 16:20:25 -0300 > Gustavo Padovan escreveu: > >> From: Gustavo Padovan >> >> Hi, >> >> This RFC adds support for Explicit Synchronization of shared buffers in >> V4L2. >> It uses the Sync File Framework[1] as vector to communicate the fences >> between kernel and userspace. > > Thanks for your work! > > I looked on your patchset, and I didn't notice anything really weird > there. So, instead on reviewing patch per patch, I prefer to discuss > about the requirements and API, as depending on it, the code base will > change a lot. > Agree that's better to first set on an uAPI and then implement based on that. > I'd like to do some tests with it on devices with mem2mem drivers. > My plan is to use an Exynos board for such thing, but I guess that > the DRM driver for it currently doesn't. I'm seeing internally if someone > could be sure that Exynos driver upstream will become ready for such > tests. > Not sure if you should try to do testing before agreeing on an uAPI and implementation. > Javier wrote some patches last year meant to implement implicit > fences support. What we noticed is that, while his mechanism worked > fine for pure capture and pure output devices, when we added a mem2mem > device, on a DMABUF+fences pipeline, e. g.: > > sensor -> [m2m] -> DRM > > End everything using fences/DMABUF, the fences mechanism caused dead > locks on existing userspace apps. > > A m2m device has both capture and output devnodes. Both should be > queued/dequeued. The capture queue is synchronized internally at the > driver with the output buffer[1]. > > [1] The names here are counter-intuitive: "capture" is a devnode > where userspace receives a video stream; "output" is a devnode where > userspace feeds a video stream. > > The problem is that adding implicit fences changed the behavior of > the ioctls, causing gstreamer to wait forever for buffers to be ready. > The problem was related to trying to make user-space unaware of the implicit fences support, and so it tried to QBUF a buffer that had already a pending fence. A workaround was to block the second QBUF ioctl if the buffer had a pending fence, but this caused the mentioned deadlock since GStreamer wasn't expecting the QBUF ioctl to block. > I suspect that, even with explicit fences, the behavior of Q/DQ > will be incompatible with the current behavior (or will require some > dirty hacks to make it identical). >>> >>> For QBUF the only difference is that we set flags for fences and pass >>> and receives in and out fences. For DQBUF the behavior is exactly the >>> same. What incompatibles or hacks do you see? >>> >>> I had the expectation that the flags would be for userspace to learn >>> about any different behavior. >>> > > So, IMHO, the best would be to use a new set of ioctls, when fences are > used (like VIDIOC_QFENCE/DQFENCE). > For explicit you can check if there's an input-fence so is different than implicit, but still I agree that it would be better to have specific ioctls. >>> >>> I'm pretty new to v4l2 so I don't know all use cases yet, but what I >>> thought was to just add extra flags to QBUF to mark when using fences >>> instead of having userspace to setup completely new ioctls for fences. >>> The burden for userspace should be smaller with flags. >>> >> >> Yes, you are right. So I guess its better indeed to just extend the current >> ioctls as you propose and only move to new ones if really needed. >> >> >> I'm sending this to start the discussion on the best approach to >> implement >> Explicit Synchronization, please check the TODO/OPEN section below. >> >> Explicit Synchronization allows us to control the synchronization of >> shared buffers from userspace by passing fences to the kernel and/or >> receiving them from the the kernel. >> >> Fences passed to the kernel are named in-fences and the kernel should >> wait >> them to signal before using the buffer. On the other side, the kernel >> creates >> out-fences for every buffer it receives from userspace. This fence is >> sent back >> to userspace and it will signal when the capture, for example, has >> finished. >>
Re: [RFC 00/10] V4L2 explicit synchronization support
Hi Javier, 2017-04-05 Javier Martinez Canillas: > Hello Gustavo, > > On 04/05/2017 11:09 AM, Gustavo Padovan wrote: > > 2017-04-03 Javier Martinez Canillas : > > > >> Hello Mauro and Gustavo, > >> > >> On 04/03/2017 07:16 AM, Mauro Carvalho Chehab wrote: > >>> Hi Gustavo, > >>> > >>> Em Mon, 13 Mar 2017 16:20:25 -0300 > >>> Gustavo Padovan escreveu: > >>> > From: Gustavo Padovan > > Hi, > > This RFC adds support for Explicit Synchronization of shared buffers in > V4L2. > It uses the Sync File Framework[1] as vector to communicate the fences > between kernel and userspace. > >>> > >>> Thanks for your work! > >>> > >>> I looked on your patchset, and I didn't notice anything really weird > >>> there. So, instead on reviewing patch per patch, I prefer to discuss > >>> about the requirements and API, as depending on it, the code base will > >>> change a lot. > >>> > >> > >> Agree that's better to first set on an uAPI and then implement based on > >> that. > >> > >>> I'd like to do some tests with it on devices with mem2mem drivers. > >>> My plan is to use an Exynos board for such thing, but I guess that > >>> the DRM driver for it currently doesn't. I'm seeing internally if someone > >>> could be sure that Exynos driver upstream will become ready for such > >>> tests. > >>> > >> > >> Not sure if you should try to do testing before agreeing on an uAPI and > >> implementation. > >> > >>> Javier wrote some patches last year meant to implement implicit > >>> fences support. What we noticed is that, while his mechanism worked > >>> fine for pure capture and pure output devices, when we added a mem2mem > >>> device, on a DMABUF+fences pipeline, e. g.: > >>> > >>> sensor -> [m2m] -> DRM > >>> > >>> End everything using fences/DMABUF, the fences mechanism caused dead > >>> locks on existing userspace apps. > >>> > >>> A m2m device has both capture and output devnodes. Both should be > >>> queued/dequeued. The capture queue is synchronized internally at the > >>> driver with the output buffer[1]. > >>> > >>> [1] The names here are counter-intuitive: "capture" is a devnode > >>> where userspace receives a video stream; "output" is a devnode where > >>> userspace feeds a video stream. > >>> > >>> The problem is that adding implicit fences changed the behavior of > >>> the ioctls, causing gstreamer to wait forever for buffers to be ready. > >>> > >> > >> The problem was related to trying to make user-space unaware of the > >> implicit > >> fences support, and so it tried to QBUF a buffer that had already a pending > >> fence. A workaround was to block the second QBUF ioctl if the buffer had a > >> pending fence, but this caused the mentioned deadlock since GStreamer > >> wasn't > >> expecting the QBUF ioctl to block. > >> > >>> I suspect that, even with explicit fences, the behavior of Q/DQ > >>> will be incompatible with the current behavior (or will require some > >>> dirty hacks to make it identical). > > > > For QBUF the only difference is that we set flags for fences and pass > > and receives in and out fences. For DQBUF the behavior is exactly the > > same. What incompatibles or hacks do you see? > > > > I had the expectation that the flags would be for userspace to learn > > about any different behavior. > > > >>> > >>> So, IMHO, the best would be to use a new set of ioctls, when fences are > >>> used (like VIDIOC_QFENCE/DQFENCE). > >>> > >> > >> For explicit you can check if there's an input-fence so is different than > >> implicit, but still I agree that it would be better to have specific > >> ioctls. > > > > I'm pretty new to v4l2 so I don't know all use cases yet, but what I > > thought was to just add extra flags to QBUF to mark when using fences > > instead of having userspace to setup completely new ioctls for fences. > > The burden for userspace should be smaller with flags. > > > > Yes, you are right. So I guess its better indeed to just extend the current > ioctls as you propose and only move to new ones if really needed. > > >> > > I'm sending this to start the discussion on the best approach to > implement > Explicit Synchronization, please check the TODO/OPEN section below. > > Explicit Synchronization allows us to control the synchronization of > shared buffers from userspace by passing fences to the kernel and/or > receiving them from the the kernel. > > Fences passed to the kernel are named in-fences and the kernel should > wait > them to signal before using the buffer. On the other side, the kernel > creates > out-fences for every buffer it receives from userspace. This fence is > sent back > to userspace and it will signal when the capture, for example, has > finished. > > Signalling an out-fence in V4L2 would mean that the job on the
Re: [RFC 00/10] V4L2 explicit synchronization support
Hi Gustavo, On Wed, Apr 05, 2017 at 05:24:57PM +0200, Gustavo Padovan wrote: > Hi Sakari, > > 2017-04-04 Sakari Ailus: > > > Hi Gustavo, > > > > Thank you for the patchset. Please see my comments below. > > > > On Mon, Mar 13, 2017 at 04:20:25PM -0300, Gustavo Padovan wrote: > > > From: Gustavo Padovan > > > > > > Hi, > > > > > > This RFC adds support for Explicit Synchronization of shared buffers in > > > V4L2. > > > It uses the Sync File Framework[1] as vector to communicate the fences > > > between kernel and userspace. > > > > > > I'm sending this to start the discussion on the best approach to implement > > > Explicit Synchronization, please check the TODO/OPEN section below. > > > > > > Explicit Synchronization allows us to control the synchronization of > > > shared buffers from userspace by passing fences to the kernel and/or > > > receiving them from the the kernel. > > > > > > Fences passed to the kernel are named in-fences and the kernel should wait > > > them to signal before using the buffer. On the other side, the kernel > > > creates > > > out-fences for every buffer it receives from userspace. This fence is > > > sent back > > > to userspace and it will signal when the capture, for example, has > > > finished. > > > > > > Signalling an out-fence in V4L2 would mean that the job on the buffer is > > > done > > > and the buffer can be used by other drivers. > > > > Shouldn't you be able to add two fences to the buffer, one in and one out? > > I.e. you'd have the buffer passed from another device to a V4L2 device and > > on to a third device. > > > > (Or, two fences per a plane, as you elaborated below.) > > The out one should be created by V4L2 in this case, sent to userspace > and then sent to third device. Another options is what we've been > calling future fences in DRM. Where we may have a syscall to create this > out-fence for us and then we could pass both in and out fence to the > device. But that can be supported later along with what this RFC > proposes. Please excuse my ignorance on fences. I just wanted to make sure that case was also considered. struct v4l2_buffer will run out of space soon so we'll need a replacement anyway. The timecode field is still available for re-use... > > > > > > > > > > Current RFC implementation > > > -- > > > > > > The current implementation is not intended to be more than a PoC to start > > > the discussion on how Explicit Synchronization should be supported in > > > V4L2. > > > > > > The first patch proposes an userspace API for fences, then on patch 2 > > > we prepare to the addition of in-fences in patch 3, by introducing the > > > infrastructure on vb2 to wait on an in-fence signal before queueing the > > > buffer > > > in the driver. > > > > > > Patch 4 fix uvc v4l2 event handling and patch 5 configure q->dev for vivid > > > drivers to enable to subscribe and dequeue events on it. > > > > > > Patches 6-7 enables support to notify BUF_QUEUED events, i.e., let > > > userspace > > > know that particular buffer was enqueued in the driver. This is needed, > > > because we return the out-fence fd as an out argument in QBUF, but at the > > > time > > > it returns we don't know to which buffer the fence will be attached thus > > > the BUF_QUEUED event tells which buffer is associated to the fence > > > received in > > > QBUF by userspace. > > > > > > Patches 8 and 9 add more fence infrastructure to support out-fences and > > > finally > > > patch 10 adds support to out-fences. > > > > > > TODO/OPEN: > > > -- > > > > > > * For this first implementation we will keep the ordering of the buffers > > > queued > > > in videobuf2, that means we will only enqueue buffer whose fence was > > > signalled > > > if that buffer is the first one in the queue. Otherwise it has to wait > > > until it > > > is the first one. This is not implmented yet. Later we could create a > > > flag to > > > allow unordered queing in the drivers from vb2 if needed. > > > > > > * Should we have out-fences per-buffer or per-plane? or both? In this > > > RFC, for > > > simplicity they are per-buffer, but Mauro and Javier raised the option of > > > doing per-plane fences. That could benefit mem2mem and V4L2 <-> GPU > > > operation > > > at least on cases when we have Capture hw that releases the Y frame > > > before the > > > other frames for example. When using V4L2 per-plane out-fences to > > > communicate > > > with KMS they would need to be merged together as currently the DRM Plane > > > interface only supports one fence per DRM Plane. > > > > > > In-fences should be per-buffer as the DRM only has per-buffer fences, but > > > in case of mem2mem operations per-plane fences might be useful? > > > > > > So should we have both ways, per-plane and per-buffer, or just one of them > > > for now? > > > > The data_offset field is only present in struct
Re: [RFC 00/10] V4L2 explicit synchronization support
Hello Gustavo, On 04/05/2017 11:09 AM, Gustavo Padovan wrote: > 2017-04-03 Javier Martinez Canillas: > >> Hello Mauro and Gustavo, >> >> On 04/03/2017 07:16 AM, Mauro Carvalho Chehab wrote: >>> Hi Gustavo, >>> >>> Em Mon, 13 Mar 2017 16:20:25 -0300 >>> Gustavo Padovan escreveu: >>> From: Gustavo Padovan Hi, This RFC adds support for Explicit Synchronization of shared buffers in V4L2. It uses the Sync File Framework[1] as vector to communicate the fences between kernel and userspace. >>> >>> Thanks for your work! >>> >>> I looked on your patchset, and I didn't notice anything really weird >>> there. So, instead on reviewing patch per patch, I prefer to discuss >>> about the requirements and API, as depending on it, the code base will >>> change a lot. >>> >> >> Agree that's better to first set on an uAPI and then implement based on that. >> >>> I'd like to do some tests with it on devices with mem2mem drivers. >>> My plan is to use an Exynos board for such thing, but I guess that >>> the DRM driver for it currently doesn't. I'm seeing internally if someone >>> could be sure that Exynos driver upstream will become ready for such >>> tests. >>> >> >> Not sure if you should try to do testing before agreeing on an uAPI and >> implementation. >> >>> Javier wrote some patches last year meant to implement implicit >>> fences support. What we noticed is that, while his mechanism worked >>> fine for pure capture and pure output devices, when we added a mem2mem >>> device, on a DMABUF+fences pipeline, e. g.: >>> >>> sensor -> [m2m] -> DRM >>> >>> End everything using fences/DMABUF, the fences mechanism caused dead >>> locks on existing userspace apps. >>> >>> A m2m device has both capture and output devnodes. Both should be >>> queued/dequeued. The capture queue is synchronized internally at the >>> driver with the output buffer[1]. >>> >>> [1] The names here are counter-intuitive: "capture" is a devnode >>> where userspace receives a video stream; "output" is a devnode where >>> userspace feeds a video stream. >>> >>> The problem is that adding implicit fences changed the behavior of >>> the ioctls, causing gstreamer to wait forever for buffers to be ready. >>> >> >> The problem was related to trying to make user-space unaware of the implicit >> fences support, and so it tried to QBUF a buffer that had already a pending >> fence. A workaround was to block the second QBUF ioctl if the buffer had a >> pending fence, but this caused the mentioned deadlock since GStreamer wasn't >> expecting the QBUF ioctl to block. >> >>> I suspect that, even with explicit fences, the behavior of Q/DQ >>> will be incompatible with the current behavior (or will require some >>> dirty hacks to make it identical). > > For QBUF the only difference is that we set flags for fences and pass > and receives in and out fences. For DQBUF the behavior is exactly the > same. What incompatibles or hacks do you see? > > I had the expectation that the flags would be for userspace to learn > about any different behavior. > >>> >>> So, IMHO, the best would be to use a new set of ioctls, when fences are >>> used (like VIDIOC_QFENCE/DQFENCE). >>> >> >> For explicit you can check if there's an input-fence so is different than >> implicit, but still I agree that it would be better to have specific ioctls. > > I'm pretty new to v4l2 so I don't know all use cases yet, but what I > thought was to just add extra flags to QBUF to mark when using fences > instead of having userspace to setup completely new ioctls for fences. > The burden for userspace should be smaller with flags. > Yes, you are right. So I guess its better indeed to just extend the current ioctls as you propose and only move to new ones if really needed. >> I'm sending this to start the discussion on the best approach to implement Explicit Synchronization, please check the TODO/OPEN section below. Explicit Synchronization allows us to control the synchronization of shared buffers from userspace by passing fences to the kernel and/or receiving them from the the kernel. Fences passed to the kernel are named in-fences and the kernel should wait them to signal before using the buffer. On the other side, the kernel creates out-fences for every buffer it receives from userspace. This fence is sent back to userspace and it will signal when the capture, for example, has finished. Signalling an out-fence in V4L2 would mean that the job on the buffer is done and the buffer can be used by other drivers. Current RFC implementation -- The current implementation is not intended to be more than a PoC to start the discussion on how Explicit Synchronization should be supported in V4L2. The first patch proposes an
Re: [RFC 00/10] V4L2 explicit synchronization support
Hi Sakari, 2017-04-04 Sakari Ailus: > Hi Gustavo, > > Thank you for the patchset. Please see my comments below. > > On Mon, Mar 13, 2017 at 04:20:25PM -0300, Gustavo Padovan wrote: > > From: Gustavo Padovan > > > > Hi, > > > > This RFC adds support for Explicit Synchronization of shared buffers in > > V4L2. > > It uses the Sync File Framework[1] as vector to communicate the fences > > between kernel and userspace. > > > > I'm sending this to start the discussion on the best approach to implement > > Explicit Synchronization, please check the TODO/OPEN section below. > > > > Explicit Synchronization allows us to control the synchronization of > > shared buffers from userspace by passing fences to the kernel and/or > > receiving them from the the kernel. > > > > Fences passed to the kernel are named in-fences and the kernel should wait > > them to signal before using the buffer. On the other side, the kernel > > creates > > out-fences for every buffer it receives from userspace. This fence is sent > > back > > to userspace and it will signal when the capture, for example, has finished. > > > > Signalling an out-fence in V4L2 would mean that the job on the buffer is > > done > > and the buffer can be used by other drivers. > > Shouldn't you be able to add two fences to the buffer, one in and one out? > I.e. you'd have the buffer passed from another device to a V4L2 device and > on to a third device. > > (Or, two fences per a plane, as you elaborated below.) The out one should be created by V4L2 in this case, sent to userspace and then sent to third device. Another options is what we've been calling future fences in DRM. Where we may have a syscall to create this out-fence for us and then we could pass both in and out fence to the device. But that can be supported later along with what this RFC proposes. > > > > > Current RFC implementation > > -- > > > > The current implementation is not intended to be more than a PoC to start > > the discussion on how Explicit Synchronization should be supported in V4L2. > > > > The first patch proposes an userspace API for fences, then on patch 2 > > we prepare to the addition of in-fences in patch 3, by introducing the > > infrastructure on vb2 to wait on an in-fence signal before queueing the > > buffer > > in the driver. > > > > Patch 4 fix uvc v4l2 event handling and patch 5 configure q->dev for vivid > > drivers to enable to subscribe and dequeue events on it. > > > > Patches 6-7 enables support to notify BUF_QUEUED events, i.e., let userspace > > know that particular buffer was enqueued in the driver. This is needed, > > because we return the out-fence fd as an out argument in QBUF, but at the > > time > > it returns we don't know to which buffer the fence will be attached thus > > the BUF_QUEUED event tells which buffer is associated to the fence received > > in > > QBUF by userspace. > > > > Patches 8 and 9 add more fence infrastructure to support out-fences and > > finally > > patch 10 adds support to out-fences. > > > > TODO/OPEN: > > -- > > > > * For this first implementation we will keep the ordering of the buffers > > queued > > in videobuf2, that means we will only enqueue buffer whose fence was > > signalled > > if that buffer is the first one in the queue. Otherwise it has to wait > > until it > > is the first one. This is not implmented yet. Later we could create a flag > > to > > allow unordered queing in the drivers from vb2 if needed. > > > > * Should we have out-fences per-buffer or per-plane? or both? In this RFC, > > for > > simplicity they are per-buffer, but Mauro and Javier raised the option of > > doing per-plane fences. That could benefit mem2mem and V4L2 <-> GPU > > operation > > at least on cases when we have Capture hw that releases the Y frame before > > the > > other frames for example. When using V4L2 per-plane out-fences to > > communicate > > with KMS they would need to be merged together as currently the DRM Plane > > interface only supports one fence per DRM Plane. > > > > In-fences should be per-buffer as the DRM only has per-buffer fences, but > > in case of mem2mem operations per-plane fences might be useful? > > > > So should we have both ways, per-plane and per-buffer, or just one of them > > for now? > > The data_offset field is only present in struct v4l2_plane, i.e. it is only > available through using the multi-planar API even if you just have a single > plane. I didn't get why you mentioned the data_offset field. :) Gustavo
Re: [RFC 00/10] V4L2 explicit synchronization support
2017-04-03 Javier Martinez Canillas: > Hello Mauro and Gustavo, > > On 04/03/2017 07:16 AM, Mauro Carvalho Chehab wrote: > > Hi Gustavo, > > > > Em Mon, 13 Mar 2017 16:20:25 -0300 > > Gustavo Padovan escreveu: > > > >> From: Gustavo Padovan > >> > >> Hi, > >> > >> This RFC adds support for Explicit Synchronization of shared buffers in > >> V4L2. > >> It uses the Sync File Framework[1] as vector to communicate the fences > >> between kernel and userspace. > > > > Thanks for your work! > > > > I looked on your patchset, and I didn't notice anything really weird > > there. So, instead on reviewing patch per patch, I prefer to discuss > > about the requirements and API, as depending on it, the code base will > > change a lot. > > > > Agree that's better to first set on an uAPI and then implement based on that. > > > I'd like to do some tests with it on devices with mem2mem drivers. > > My plan is to use an Exynos board for such thing, but I guess that > > the DRM driver for it currently doesn't. I'm seeing internally if someone > > could be sure that Exynos driver upstream will become ready for such > > tests. > > > > Not sure if you should try to do testing before agreeing on an uAPI and > implementation. > > > Javier wrote some patches last year meant to implement implicit > > fences support. What we noticed is that, while his mechanism worked > > fine for pure capture and pure output devices, when we added a mem2mem > > device, on a DMABUF+fences pipeline, e. g.: > > > > sensor -> [m2m] -> DRM > > > > End everything using fences/DMABUF, the fences mechanism caused dead > > locks on existing userspace apps. > > > > A m2m device has both capture and output devnodes. Both should be > > queued/dequeued. The capture queue is synchronized internally at the > > driver with the output buffer[1]. > > > > [1] The names here are counter-intuitive: "capture" is a devnode > > where userspace receives a video stream; "output" is a devnode where > > userspace feeds a video stream. > > > > The problem is that adding implicit fences changed the behavior of > > the ioctls, causing gstreamer to wait forever for buffers to be ready. > > > > The problem was related to trying to make user-space unaware of the implicit > fences support, and so it tried to QBUF a buffer that had already a pending > fence. A workaround was to block the second QBUF ioctl if the buffer had a > pending fence, but this caused the mentioned deadlock since GStreamer wasn't > expecting the QBUF ioctl to block. > > > I suspect that, even with explicit fences, the behavior of Q/DQ > > will be incompatible with the current behavior (or will require some > > dirty hacks to make it identical). For QBUF the only difference is that we set flags for fences and pass and receives in and out fences. For DQBUF the behavior is exactly the same. What incompatibles or hacks do you see? I had the expectation that the flags would be for userspace to learn about any different behavior. > > > > So, IMHO, the best would be to use a new set of ioctls, when fences are > > used (like VIDIOC_QFENCE/DQFENCE). > > > > For explicit you can check if there's an input-fence so is different than > implicit, but still I agree that it would be better to have specific ioctls. I'm pretty new to v4l2 so I don't know all use cases yet, but what I thought was to just add extra flags to QBUF to mark when using fences instead of having userspace to setup completely new ioctls for fences. The burden for userspace should be smaller with flags. > > >> > >> I'm sending this to start the discussion on the best approach to implement > >> Explicit Synchronization, please check the TODO/OPEN section below. > >> > >> Explicit Synchronization allows us to control the synchronization of > >> shared buffers from userspace by passing fences to the kernel and/or > >> receiving them from the the kernel. > >> > >> Fences passed to the kernel are named in-fences and the kernel should wait > >> them to signal before using the buffer. On the other side, the kernel > >> creates > >> out-fences for every buffer it receives from userspace. This fence is sent > >> back > >> to userspace and it will signal when the capture, for example, has > >> finished. > >> > >> Signalling an out-fence in V4L2 would mean that the job on the buffer is > >> done > >> and the buffer can be used by other drivers. > >> > >> Current RFC implementation > >> -- > >> > >> The current implementation is not intended to be more than a PoC to start > >> the discussion on how Explicit Synchronization should be supported in V4L2. > >> > >> The first patch proposes an userspace API for fences, then on patch 2 > >> we prepare to the addition of in-fences in patch 3, by introducing the > >> infrastructure on vb2 to wait on an in-fence signal before queueing the > >> buffer > >> in the driver. > >> > >> Patch 4
Re: [RFC 00/10] V4L2 explicit synchronization support
Hi Gustavo, Thank you for the patchset. Please see my comments below. On Mon, Mar 13, 2017 at 04:20:25PM -0300, Gustavo Padovan wrote: > From: Gustavo Padovan> > Hi, > > This RFC adds support for Explicit Synchronization of shared buffers in V4L2. > It uses the Sync File Framework[1] as vector to communicate the fences > between kernel and userspace. > > I'm sending this to start the discussion on the best approach to implement > Explicit Synchronization, please check the TODO/OPEN section below. > > Explicit Synchronization allows us to control the synchronization of > shared buffers from userspace by passing fences to the kernel and/or > receiving them from the the kernel. > > Fences passed to the kernel are named in-fences and the kernel should wait > them to signal before using the buffer. On the other side, the kernel creates > out-fences for every buffer it receives from userspace. This fence is sent > back > to userspace and it will signal when the capture, for example, has finished. > > Signalling an out-fence in V4L2 would mean that the job on the buffer is done > and the buffer can be used by other drivers. Shouldn't you be able to add two fences to the buffer, one in and one out? I.e. you'd have the buffer passed from another device to a V4L2 device and on to a third device. (Or, two fences per a plane, as you elaborated below.) > > Current RFC implementation > -- > > The current implementation is not intended to be more than a PoC to start > the discussion on how Explicit Synchronization should be supported in V4L2. > > The first patch proposes an userspace API for fences, then on patch 2 > we prepare to the addition of in-fences in patch 3, by introducing the > infrastructure on vb2 to wait on an in-fence signal before queueing the buffer > in the driver. > > Patch 4 fix uvc v4l2 event handling and patch 5 configure q->dev for vivid > drivers to enable to subscribe and dequeue events on it. > > Patches 6-7 enables support to notify BUF_QUEUED events, i.e., let userspace > know that particular buffer was enqueued in the driver. This is needed, > because we return the out-fence fd as an out argument in QBUF, but at the time > it returns we don't know to which buffer the fence will be attached thus > the BUF_QUEUED event tells which buffer is associated to the fence received in > QBUF by userspace. > > Patches 8 and 9 add more fence infrastructure to support out-fences and > finally > patch 10 adds support to out-fences. > > TODO/OPEN: > -- > > * For this first implementation we will keep the ordering of the buffers > queued > in videobuf2, that means we will only enqueue buffer whose fence was signalled > if that buffer is the first one in the queue. Otherwise it has to wait until > it > is the first one. This is not implmented yet. Later we could create a flag to > allow unordered queing in the drivers from vb2 if needed. > > * Should we have out-fences per-buffer or per-plane? or both? In this RFC, for > simplicity they are per-buffer, but Mauro and Javier raised the option of > doing per-plane fences. That could benefit mem2mem and V4L2 <-> GPU operation > at least on cases when we have Capture hw that releases the Y frame before the > other frames for example. When using V4L2 per-plane out-fences to communicate > with KMS they would need to be merged together as currently the DRM Plane > interface only supports one fence per DRM Plane. > > In-fences should be per-buffer as the DRM only has per-buffer fences, but > in case of mem2mem operations per-plane fences might be useful? > > So should we have both ways, per-plane and per-buffer, or just one of them > for now? The data_offset field is only present in struct v4l2_plane, i.e. it is only available through using the multi-planar API even if you just have a single plane. I'd say it'd be appropriate to have a fence per-plane, but I have no strong opinion either way at least at the moment. How otherwise could you make use of this in multi-planar OUTPUT queues? It may seem like a far-fetched use case but I still wouldn't ignore it in design. > > * other open topics are how to deal with hw-fences and Request API. > > Comments are welcome! > > Regards, > > Gustavo > > --- > Gustavo Padovan (9): > [media] vb2: add explicit fence user API > [media] vb2: split out queueing from vb_core_qbuf() > [media] vb2: add in-fence support to QBUF > [media] uvc: enable subscriptions to other events > [media] vivid: assign the specific device to the vb2_queue->dev > [media] v4l: add V4L2_EVENT_BUF_QUEUED event > [media] v4l: add support to BUF_QUEUED event > [media] vb2: add infrastructure to support out-fences > [media] vb2: add out-fence support to QBUF > > Javier Martinez Canillas (1): > [media] vb2: add videobuf2 dma-buf fence helpers > > drivers/media/Kconfig | 1 + > drivers/media/platform/vivid/vivid-core.c
Re: [RFC 00/10] V4L2 explicit synchronization support
Hi Gustavo, On Mon, Apr 3, 2017 at 1:46 PM, Javier Martinez Canillaswrote: > Hello Mauro and Gustavo, > > On 04/03/2017 07:16 AM, Mauro Carvalho Chehab wrote: >> Hi Gustavo, >> >> Em Mon, 13 Mar 2017 16:20:25 -0300 >> Gustavo Padovan escreveu: >> >>> From: Gustavo Padovan >>> >>> Hi, >>> >>> This RFC adds support for Explicit Synchronization of shared buffers in >>> V4L2. >>> It uses the Sync File Framework[1] as vector to communicate the fences >>> between kernel and userspace. >> >> Thanks for your work! >> >> I looked on your patchset, and I didn't notice anything really weird >> there. So, instead on reviewing patch per patch, I prefer to discuss >> about the requirements and API, as depending on it, the code base will >> change a lot. >> > > Agree that's better to first set on an uAPI and then implement based on that. Yes. Agreeing on uAPI first would work well. > >> I'd like to do some tests with it on devices with mem2mem drivers. >> My plan is to use an Exynos board for such thing, but I guess that >> the DRM driver for it currently doesn't. I'm seeing internally if someone >> could be sure that Exynos driver upstream will become ready for such >> tests. >> > > Not sure if you should try to do testing before agreeing on an uAPI and > implementation. Running some tests might give you a better feel for m2m - export buf - drm cases without fences on exynos. This could help understand the performance gains with fences. > >> Javier wrote some patches last year meant to implement implicit >> fences support. What we noticed is that, while his mechanism worked >> fine for pure capture and pure output devices, when we added a mem2mem >> device, on a DMABUF+fences pipeline, e. g.: >> >> sensor -> [m2m] -> DRM >> >> End everything using fences/DMABUF, the fences mechanism caused dead >> locks on existing userspace apps. >> >> A m2m device has both capture and output devnodes. Both should be >> queued/dequeued. The capture queue is synchronized internally at the >> driver with the output buffer[1]. >> >> [1] The names here are counter-intuitive: "capture" is a devnode >> where userspace receives a video stream; "output" is a devnode where >> userspace feeds a video stream. >> >> The problem is that adding implicit fences changed the behavior of >> the ioctls, causing gstreamer to wait forever for buffers to be ready. >> > > The problem was related to trying to make user-space unaware of the implicit > fences support, and so it tried to QBUF a buffer that had already a pending > fence. A workaround was to block the second QBUF ioctl if the buffer had a > pending fence, but this caused the mentioned deadlock since GStreamer wasn't > expecting the QBUF ioctl to block. > >> I suspect that, even with explicit fences, the behavior of Q/DQ >> will be incompatible with the current behavior (or will require some >> dirty hacks to make it identical). One big difference between implicit and explicit is that use-space is aware of fences in the case of explicit. Can that knowledge be helpful in avoiding the the problems we have seen with explicit? >> >> So, IMHO, the best would be to use a new set of ioctls, when fences are >> used (like VIDIOC_QFENCE/DQFENCE). >> > > For explicit you can check if there's an input-fence so is different than > implicit, but still I agree that it would be better to have specific ioctls. > It would be nice if we can avoid adding more ioctls. As I mentioned earlier, user-space is aware that fences are in use. Can we key off of that and make it a queue property to be used to change the user-space action for fence vs. no fence case? thanks, -- Shuah
Re: [RFC 00/10] V4L2 explicit synchronization support
Hello Mauro and Gustavo, On 04/03/2017 07:16 AM, Mauro Carvalho Chehab wrote: > Hi Gustavo, > > Em Mon, 13 Mar 2017 16:20:25 -0300 > Gustavo Padovanescreveu: > >> From: Gustavo Padovan >> >> Hi, >> >> This RFC adds support for Explicit Synchronization of shared buffers in V4L2. >> It uses the Sync File Framework[1] as vector to communicate the fences >> between kernel and userspace. > > Thanks for your work! > > I looked on your patchset, and I didn't notice anything really weird > there. So, instead on reviewing patch per patch, I prefer to discuss > about the requirements and API, as depending on it, the code base will > change a lot. > Agree that's better to first set on an uAPI and then implement based on that. > I'd like to do some tests with it on devices with mem2mem drivers. > My plan is to use an Exynos board for such thing, but I guess that > the DRM driver for it currently doesn't. I'm seeing internally if someone > could be sure that Exynos driver upstream will become ready for such > tests. > Not sure if you should try to do testing before agreeing on an uAPI and implementation. > Javier wrote some patches last year meant to implement implicit > fences support. What we noticed is that, while his mechanism worked > fine for pure capture and pure output devices, when we added a mem2mem > device, on a DMABUF+fences pipeline, e. g.: > > sensor -> [m2m] -> DRM > > End everything using fences/DMABUF, the fences mechanism caused dead > locks on existing userspace apps. > > A m2m device has both capture and output devnodes. Both should be > queued/dequeued. The capture queue is synchronized internally at the > driver with the output buffer[1]. > > [1] The names here are counter-intuitive: "capture" is a devnode > where userspace receives a video stream; "output" is a devnode where > userspace feeds a video stream. > > The problem is that adding implicit fences changed the behavior of > the ioctls, causing gstreamer to wait forever for buffers to be ready. > The problem was related to trying to make user-space unaware of the implicit fences support, and so it tried to QBUF a buffer that had already a pending fence. A workaround was to block the second QBUF ioctl if the buffer had a pending fence, but this caused the mentioned deadlock since GStreamer wasn't expecting the QBUF ioctl to block. > I suspect that, even with explicit fences, the behavior of Q/DQ > will be incompatible with the current behavior (or will require some > dirty hacks to make it identical). > > So, IMHO, the best would be to use a new set of ioctls, when fences are > used (like VIDIOC_QFENCE/DQFENCE). > For explicit you can check if there's an input-fence so is different than implicit, but still I agree that it would be better to have specific ioctls. >> >> I'm sending this to start the discussion on the best approach to implement >> Explicit Synchronization, please check the TODO/OPEN section below. >> >> Explicit Synchronization allows us to control the synchronization of >> shared buffers from userspace by passing fences to the kernel and/or >> receiving them from the the kernel. >> >> Fences passed to the kernel are named in-fences and the kernel should wait >> them to signal before using the buffer. On the other side, the kernel creates >> out-fences for every buffer it receives from userspace. This fence is sent >> back >> to userspace and it will signal when the capture, for example, has finished. >> >> Signalling an out-fence in V4L2 would mean that the job on the buffer is done >> and the buffer can be used by other drivers. >> >> Current RFC implementation >> -- >> >> The current implementation is not intended to be more than a PoC to start >> the discussion on how Explicit Synchronization should be supported in V4L2. >> >> The first patch proposes an userspace API for fences, then on patch 2 >> we prepare to the addition of in-fences in patch 3, by introducing the >> infrastructure on vb2 to wait on an in-fence signal before queueing the >> buffer >> in the driver. >> >> Patch 4 fix uvc v4l2 event handling and patch 5 configure q->dev for vivid >> drivers to enable to subscribe and dequeue events on it. >> >> Patches 6-7 enables support to notify BUF_QUEUED events, i.e., let userspace >> know that particular buffer was enqueued in the driver. This is needed, >> because we return the out-fence fd as an out argument in QBUF, but at the >> time >> it returns we don't know to which buffer the fence will be attached thus >> the BUF_QUEUED event tells which buffer is associated to the fence received >> in >> QBUF by userspace. >> >> Patches 8 and 9 add more fence infrastructure to support out-fences and >> finally >> patch 10 adds support to out-fences. >> >> TODO/OPEN: >> -- >> >> * For this first implementation we will keep the ordering of the buffers >> queued >> in videobuf2, that means we will only
Re: [RFC 00/10] V4L2 explicit synchronization support
Hi Gustavo, Em Mon, 13 Mar 2017 16:20:25 -0300 Gustavo Padovanescreveu: > From: Gustavo Padovan > > Hi, > > This RFC adds support for Explicit Synchronization of shared buffers in V4L2. > It uses the Sync File Framework[1] as vector to communicate the fences > between kernel and userspace. Thanks for your work! I looked on your patchset, and I didn't notice anything really weird there. So, instead on reviewing patch per patch, I prefer to discuss about the requirements and API, as depending on it, the code base will change a lot. I'd like to do some tests with it on devices with mem2mem drivers. My plan is to use an Exynos board for such thing, but I guess that the DRM driver for it currently doesn't. I'm seeing internally if someone could be sure that Exynos driver upstream will become ready for such tests. Javier wrote some patches last year meant to implement implicit fences support. What we noticed is that, while his mechanism worked fine for pure capture and pure output devices, when we added a mem2mem device, on a DMABUF+fences pipeline, e. g.: sensor -> [m2m] -> DRM End everything using fences/DMABUF, the fences mechanism caused dead locks on existing userspace apps. A m2m device has both capture and output devnodes. Both should be queued/dequeued. The capture queue is synchronized internally at the driver with the output buffer[1]. [1] The names here are counter-intuitive: "capture" is a devnode where userspace receives a video stream; "output" is a devnode where userspace feeds a video stream. The problem is that adding implicit fences changed the behavior of the ioctls, causing gstreamer to wait forever for buffers to be ready. I suspect that, even with explicit fences, the behavior of Q/DQ will be incompatible with the current behavior (or will require some dirty hacks to make it identical). So, IMHO, the best would be to use a new set of ioctls, when fences are used (like VIDIOC_QFENCE/DQFENCE). > > I'm sending this to start the discussion on the best approach to implement > Explicit Synchronization, please check the TODO/OPEN section below. > > Explicit Synchronization allows us to control the synchronization of > shared buffers from userspace by passing fences to the kernel and/or > receiving them from the the kernel. > > Fences passed to the kernel are named in-fences and the kernel should wait > them to signal before using the buffer. On the other side, the kernel creates > out-fences for every buffer it receives from userspace. This fence is sent > back > to userspace and it will signal when the capture, for example, has finished. > > Signalling an out-fence in V4L2 would mean that the job on the buffer is done > and the buffer can be used by other drivers. > > Current RFC implementation > -- > > The current implementation is not intended to be more than a PoC to start > the discussion on how Explicit Synchronization should be supported in V4L2. > > The first patch proposes an userspace API for fences, then on patch 2 > we prepare to the addition of in-fences in patch 3, by introducing the > infrastructure on vb2 to wait on an in-fence signal before queueing the buffer > in the driver. > > Patch 4 fix uvc v4l2 event handling and patch 5 configure q->dev for vivid > drivers to enable to subscribe and dequeue events on it. > > Patches 6-7 enables support to notify BUF_QUEUED events, i.e., let userspace > know that particular buffer was enqueued in the driver. This is needed, > because we return the out-fence fd as an out argument in QBUF, but at the time > it returns we don't know to which buffer the fence will be attached thus > the BUF_QUEUED event tells which buffer is associated to the fence received in > QBUF by userspace. > > Patches 8 and 9 add more fence infrastructure to support out-fences and > finally > patch 10 adds support to out-fences. > > TODO/OPEN: > -- > > * For this first implementation we will keep the ordering of the buffers > queued > in videobuf2, that means we will only enqueue buffer whose fence was signalled > if that buffer is the first one in the queue. Otherwise it has to wait until > it > is the first one. This is not implmented yet. Later we could create a flag to > allow unordered queing in the drivers from vb2 if needed. The V4L2 spec doesn't warrant that the buffers will be dequeued at the queue order. In practice, however, most drivers will not reorder. Yet, mem2mem codec drivers may reorder the buffers at the output, as the luminance information (Y) usually comes first on JPEG/MPEG-like formats. > * Should we have out-fences per-buffer or per-plane? or both? In this RFC, for > simplicity they are per-buffer, but Mauro and Javier raised the option of > doing per-plane fences. That could benefit mem2mem and V4L2 <-> GPU operation > at least on cases when we have Capture hw that releases the Y frame before the > other frames
[RFC 00/10] V4L2 explicit synchronization support
From: Gustavo PadovanHi, This RFC adds support for Explicit Synchronization of shared buffers in V4L2. It uses the Sync File Framework[1] as vector to communicate the fences between kernel and userspace. I'm sending this to start the discussion on the best approach to implement Explicit Synchronization, please check the TODO/OPEN section below. Explicit Synchronization allows us to control the synchronization of shared buffers from userspace by passing fences to the kernel and/or receiving them from the the kernel. Fences passed to the kernel are named in-fences and the kernel should wait them to signal before using the buffer. On the other side, the kernel creates out-fences for every buffer it receives from userspace. This fence is sent back to userspace and it will signal when the capture, for example, has finished. Signalling an out-fence in V4L2 would mean that the job on the buffer is done and the buffer can be used by other drivers. Current RFC implementation -- The current implementation is not intended to be more than a PoC to start the discussion on how Explicit Synchronization should be supported in V4L2. The first patch proposes an userspace API for fences, then on patch 2 we prepare to the addition of in-fences in patch 3, by introducing the infrastructure on vb2 to wait on an in-fence signal before queueing the buffer in the driver. Patch 4 fix uvc v4l2 event handling and patch 5 configure q->dev for vivid drivers to enable to subscribe and dequeue events on it. Patches 6-7 enables support to notify BUF_QUEUED events, i.e., let userspace know that particular buffer was enqueued in the driver. This is needed, because we return the out-fence fd as an out argument in QBUF, but at the time it returns we don't know to which buffer the fence will be attached thus the BUF_QUEUED event tells which buffer is associated to the fence received in QBUF by userspace. Patches 8 and 9 add more fence infrastructure to support out-fences and finally patch 10 adds support to out-fences. TODO/OPEN: -- * For this first implementation we will keep the ordering of the buffers queued in videobuf2, that means we will only enqueue buffer whose fence was signalled if that buffer is the first one in the queue. Otherwise it has to wait until it is the first one. This is not implmented yet. Later we could create a flag to allow unordered queing in the drivers from vb2 if needed. * Should we have out-fences per-buffer or per-plane? or both? In this RFC, for simplicity they are per-buffer, but Mauro and Javier raised the option of doing per-plane fences. That could benefit mem2mem and V4L2 <-> GPU operation at least on cases when we have Capture hw that releases the Y frame before the other frames for example. When using V4L2 per-plane out-fences to communicate with KMS they would need to be merged together as currently the DRM Plane interface only supports one fence per DRM Plane. In-fences should be per-buffer as the DRM only has per-buffer fences, but in case of mem2mem operations per-plane fences might be useful? So should we have both ways, per-plane and per-buffer, or just one of them for now? * other open topics are how to deal with hw-fences and Request API. Comments are welcome! Regards, Gustavo --- Gustavo Padovan (9): [media] vb2: add explicit fence user API [media] vb2: split out queueing from vb_core_qbuf() [media] vb2: add in-fence support to QBUF [media] uvc: enable subscriptions to other events [media] vivid: assign the specific device to the vb2_queue->dev [media] v4l: add V4L2_EVENT_BUF_QUEUED event [media] v4l: add support to BUF_QUEUED event [media] vb2: add infrastructure to support out-fences [media] vb2: add out-fence support to QBUF Javier Martinez Canillas (1): [media] vb2: add videobuf2 dma-buf fence helpers drivers/media/Kconfig | 1 + drivers/media/platform/vivid/vivid-core.c | 10 +- drivers/media/usb/uvc/uvc_v4l2.c | 2 +- drivers/media/v4l2-core/v4l2-compat-ioctl32.c | 4 +- drivers/media/v4l2-core/v4l2-ctrls.c | 6 +- drivers/media/v4l2-core/videobuf2-core.c | 139 -- drivers/media/v4l2-core/videobuf2-v4l2.c | 29 +- include/media/videobuf2-core.h| 12 ++- include/media/videobuf2-fence.h | 49 + include/uapi/linux/videodev2.h| 12 ++- 10 files changed, 218 insertions(+), 46 deletions(-) create mode 100644 include/media/videobuf2-fence.h -- 2.9.3