Re: Summary of the V4L2 discussions during LDS - was: Re: Embedded Linux memory management interest group list

2011-05-19 Thread Mauro Carvalho Chehab
On 18-05-2011 16:46, Sakari Ailus wrote:
 Hans Verkuil wrote:
 Note that many video receivers cannot stall. You can't tell them to wait until
 the last buffer finished processing. This is different from some/most?
 sensors.
 
 Not even image sensors. They just output the frame data; if the receiver
 runs out of buffers the data is just lost. And if any part of the frame
 is lost, there's no use for other parts of it either. But that's
 something the receiver must handle, i.e. discard the data and increment
 frame number (field_count in v4l2_buffer).
 
 The interfaces used by image sensors, be they parallel or serial, do not
 provide means to inform the sensor that the receiver has run out of
 buffer space. These interfaces are just unidirectional.

Well, it depends on how the hardware works, really. On most (all?) designs, the
IP block responsible for receiving data from a sensor (or for transmitting data,
on an output device) is capable of generating an IRQ to notify the OS that a
framebuffer has been filled. So, the V4L driver can mark that buffer as finished
and remove it from the list of queued buffers. Although the current APIs don't
allow creating a new buffer if the list is empty, it may actually make sense to
allow the kernel to dynamically create a new buffer, guaranteeing that the
sensor (or receiver) will never run out of buffers under normal usage.

Of course, the maximum number of buffers should be specified, to avoid
introducing an unacceptable delay. In that case, the frame will end up being
discarded. It makes sense to provide a way to report to userspace when this
happens.
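
Just to illustrate the idea, a minimal sketch of such an IRQ handler is below.
All the identifiers (camera_dev, camera_buffer, mark_buffer_done,
queue_new_buffer) are hypothetical, not taken from an existing driver:

#include <linux/interrupt.h>
#include <linux/list.h>

/* Sketch only: every identifier below is hypothetical */
static irqreturn_t camera_irq_handler(int irq, void *dev_id)
{
	struct camera_dev *dev = dev_id;
	struct camera_buffer *buf;

	/* The IP block signalled that it finished filling a framebuffer */
	buf = list_first_entry(&dev->queued, struct camera_buffer, list);
	list_del(&buf->list);
	mark_buffer_done(buf);		/* hand it over to the consumer(s) */

	/*
	 * If the queue became empty, dynamically create a new buffer (up to
	 * the configured maximum), so that the device never runs out of
	 * buffers under normal usage; past the maximum, the next frame gets
	 * discarded and the drop is reported to userspace.
	 */
	if (list_empty(&dev->queued) && dev->num_buffers < dev->max_buffers)
		queue_new_buffer(dev);

	return IRQ_HANDLED;
}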

Mauro.


Re: Summary of the V4L2 discussions during LDS - was: Re: Embedded Linux memory management interest group list

2011-05-18 Thread Sakari Ailus
Hans Verkuil wrote:
 Note that many video receivers cannot stall. You can't tell them to wait until
 the last buffer finished processing. This is different from some/most? 
 sensors.

Not even image sensors. They just output the frame data; if the receiver
runs out of buffers the data is just lost. And if any part of the frame
is lost, there's no use for other parts of it either. But that's
something the receiver must handle, i.e. discard the data and increment
frame number (field_count in v4l2_buffer).

The interfaces used by image sensors, be they parallel or serial, do not
provide means to inform the sensor that the receiver has run out of
buffer space. These interfaces are just unidirectional.

Regards,

-- 
Sakari Ailus
sakari.ai...@maxwell.research.nokia.com


Re: Summary of the V4L2 discussions during LDS - was: Re: Embedded Linux memory management interest group list

2011-05-17 Thread Mauro Carvalho Chehab
On 15-05-2011 18:10, Hans Verkuil wrote:
 On Saturday, May 14, 2011 13:46:03 Mauro Carvalho Chehab wrote:
 On 14-05-2011 13:02, Hans Verkuil wrote:
 On Saturday, May 14, 2011 12:19:18 Mauro Carvalho Chehab wrote:

 So, based on all I've seen, I'm pretty much convinced that the normal MMAP
 way of streaming (the VIDIOC_[REQBUF|STREAMON|STREAMOFF|QBUF|DQBUF] ioctls)
 is not the best way to share data with framebuffers.

 I agree with that, but it is a different story between two V4L2 devices. There
 you obviously want to use the streaming ioctls and still share buffers.

 I don't think so. The requirement for syncing the framebuffer between the two
 V4L2 devices is pretty much the same as we have between one V4L2 device and one
 GPU.
 
 In both cases, the requirement is to pass a framebuffer between two entities,
 and not a video stream.

 For example, imagine something like:

  V4L2 camera ==+==> V4L2 encoder ==> MPEG2
                |
                +==> GPU

For the sake of clarity in my next comments, I'm naming the V4L2 camera buffer
write endpoint the producer, and the two buffer read endpoints the consumers.

 Both GPU and the V4L2 encoder should use the same logic to be sure that they
 will use a buffer that has already been filled by the camera. Also, the V4L2
 camera driver can't re-use such a framebuffer before being sure that both
 consumers have already stopped using it.
 
 No. A camera whose output is sent to a resizer and then to a SW/FW/HW encoder
 is a typical example where you want to queue/dequeue buffers.

Why? With a framebuffer-oriented set of ioctls, some kernel-internal calls will
need to take care of the buffer usage, in order to be sure when a buffer can
be rewritten, as userspace has no way to know when a buffer needs to be
queued/dequeued.

In other words, the framebuffer kernel API will probably use a kernel
structure like:

struct v4l2_fb_handler {
	bool has_finished;	/* Marks when a handler finishes handling the buffer */
	bool is_producer;	/* Used by the handler that writes data into the buffer */

	struct list_head *handlers;	/* List with all handlers */

	void (*qbuf)(struct v4l2_fb_handler *handler);	/* qbuf-like callback,
							   called after a buffer is filled */

	v4l2_buffer_ID buf;	/* Buffer ID (or file handle?) - in practice, it will
				   probably be a list with the available buffers */

	void *priv;		/* handler private data */
};

While streaming is on, kernel logic will run a loop, basically doing the steps
below:

	1) Wait for the producer to raise the has_finished flag;

	2) Call qbuf() for all consumers. The qbuf() call shouldn't block; it
	   just calls per-handler logic to start using that buffer;

	3) When each fb handler finishes using its buffer, it raises its
	   has_finished flag;

	4) After all buffer handlers are marked as has_finished, clear the
	   has_finished flags and re-queue the buffer.

Step (2) is equivalent to VIDIOC_QBUF, and step (4) is equivalent to
VIDIOC_DQBUF.

PS: The above is just a simplified view of such a handler. We'll probably need
more steps. For example, between (1) and (2) we will probably need some logic to
check whether there is an available empty buffer and, if not, create a new one
and use it. A rough sketch of such a loop is below.
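
Just as an illustration, such a loop could look like the following. Everything
in it is hypothetical (the wait queue, the helper functions, and the assumption
that the handler struct also embeds a 'struct list_head node' for membership in
the handlers list); only wait_event() and the list helpers are real kernel
primitives:

#include <linux/list.h>
#include <linux/wait.h>

static DECLARE_WAIT_QUEUE_HEAD(fb_wq);	/* woken on any has_finished change */
static LIST_HEAD(free_buffers);		/* empty buffers ready for the producer */

/* Hypothetical helpers, sketch only */
extern void alloc_fb_buffer(struct list_head *free);
extern bool all_handlers_finished(struct list_head *handlers);
extern void requeue_buffer(struct v4l2_fb_handler *producer);

static void v4l2_fb_run_cycle(struct v4l2_fb_handler *producer,
			      struct list_head *handlers)
{
	struct v4l2_fb_handler *h;

	/* 1) wait for the producer to raise its has_finished flag */
	wait_event(fb_wq, producer->has_finished);

	/* (extra step from the PS) make sure an empty buffer is still
	 * available for the producer; if not, create a new one */
	if (list_empty(&free_buffers))
		alloc_fb_buffer(&free_buffers);

	/* 2) hand the filled buffer to every consumer; qbuf() must not block */
	list_for_each_entry(h, handlers, node)
		if (!h->is_producer)
			h->qbuf(h);

	/* 3) each consumer raises its has_finished flag from its own
	 * completion path; wait for all of them */
	wait_event(fb_wq, all_handlers_finished(handlers));

	/* 4) clear the has_finished flags and re-queue the buffer */
	list_for_each_entry(h, handlers, node)
		h->has_finished = false;
	requeue_buffer(producer);
}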

What happens with REQBUF/QBUF/DQBUF is that:
	- with those calls, there's just one buffer consumer and just one
	  buffer producer;
	- either the producer or the consumer is in userspace, and its
	  counterpart is in kernelspace;
	- buffers are allocated before the start of the process, via an
	  explicit call;
	- buffers need to be mmapped in order to be visible to userspace.

None of the above applies to a framebuffer-oriented API:
	- more than one buffer consumer is allowed;
	- consumers and producers are in kernelspace (an API for handling such
	  buffers in userspace might also be needed, although that doesn't
	  sound like a good idea to me, IMHO);
	- buffers can be dynamically allocated/de-allocated;
	- buffers don't need to be mmapped to userspace.

 Especially since the various parts of the pipeline may stall for a bit, so you
 don't want to lose frames. That's not what the overlay API is for, that's what
 our streaming API gives us.
 
 The use case above isn't even possible without copying. At least, I don't see a
 way, unless the GPU buffer is non-destructive. In that case you can give the
 frame to the GPU, and when the GPU is finished you can give it to the encoder.
 I suspect that might become quite complex though.

Well, if some fb consumers would also be rewriting the buffers, serializing them
is needed, as you can't allow another process to access memory that the CPU is
destroying at the same time; you'll get unpredictable

Re: Summary of the V4L2 discussions during LDS - was: Re: Embedded Linux memory management interest group list

2011-05-17 Thread Mauro Carvalho Chehab
On 17-05-2011 09:49, Mauro Carvalho Chehab wrote:
 On 15-05-2011 18:10, Hans Verkuil wrote:
 On Saturday, May 14, 2011 13:46:03 Mauro Carvalho Chehab wrote:
 On 14-05-2011 13:02, Hans Verkuil wrote:
 On Saturday, May 14, 2011 12:19:18 Mauro Carvalho Chehab wrote:

 So, based on all I've seen, I'm pretty much convinced that the normal MMAP
 way of streaming (the VIDIOC_[REQBUF|STREAMON|STREAMOFF|QBUF|DQBUF] ioctls)
 is not the best way to share data with framebuffers.

 I agree with that, but it is a different story between two V4L2 devices. There
 you obviously want to use the streaming ioctls and still share buffers.

 I don't think so. The requirement for syncing the framebuffer between the two
 V4L2 devices is pretty much the same as we have between one V4L2 device and
 one GPU.
 
 In both cases, the requirement is to pass a framebuffer between two entities,
 and not a video stream.

 For example, imagine something like:

 V4L2 camera ==+==> V4L2 encoder ==> MPEG2
               |
               +==> GPU
 
 For the sake of clarity in my next comments, I'm naming the V4L2 camera buffer
 write endpoint the producer, and the two buffer read endpoints the consumers.

 Both GPU and the V4L2 encoder should use the same logic to be sure that they
 will use a buffer that has already been filled by the camera. Also, the V4L2
 camera driver can't re-use such a framebuffer before being sure that both
 consumers have already stopped using it.

 No. A camera whose output is sent to a resizer and then to a SW/FW/HW encoder
 is a typical example where you want to queue/dequeue buffers.
 
 Why? With a framebuffer-oriented set of ioctls, some kernel-internal calls
 will need to take care of the buffer usage, in order to be sure when a buffer
 can be rewritten, as userspace has no way to know when a buffer needs to be
 queued/dequeued.
 
 In other words, the framebuffer kernel API will probably use a kernel
 structure like:
 
 struct v4l2_fb_handler {
 	bool has_finished;	/* Marks when a handler finishes handling the buffer */
 	bool is_producer;	/* Used by the handler that writes data into the buffer */
 
 	struct list_head *handlers;	/* List with all handlers */
 
 	void (*qbuf)(struct v4l2_fb_handler *handler);	/* qbuf-like callback,
 							   called after a buffer is filled */
 
 	v4l2_buffer_ID buf;	/* Buffer ID (or file handle?) - in practice, it will
 				   probably be a list with the available buffers */
 
 	void *priv;		/* handler private data */
 };
 
 While streaming is on, kernel logic will run a loop, basically doing the steps
 below:
 
 	1) Wait for the producer to raise the has_finished flag;
 
 	2) Call qbuf() for all consumers. The qbuf() call shouldn't block; it
 	   just calls per-handler logic to start using that buffer;
 
 	3) When each fb handler finishes using its buffer, it raises its
 	   has_finished flag;
 
 	4) After all buffer handlers are marked as has_finished, clear the
 	   has_finished flags and re-queue the buffer.
 
 Step (2) is equivalent to VIDIOC_QBUF, and step (4) is equivalent to
 VIDIOC_DQBUF.
 
 PS: The above is just a simplified view of such a handler. We'll probably need
 more steps. For example, between (1) and (2) we will probably need some logic
 to check whether there is an available empty buffer and, if not, create a new
 one and use it.
 
 What happens with REQBUF/QBUF/DQBUF is that:
 	- with those calls, there's just one buffer consumer and just one
 	  buffer producer;
 	- either the producer or the consumer is in userspace, and its
 	  counterpart is in kernelspace;
 	- buffers are allocated before the start of the process, via an
 	  explicit call;
 	- buffers need to be mmapped in order to be visible to userspace.
 
 None of the above applies to a framebuffer-oriented API:
 	- more than one buffer consumer is allowed;
 	- consumers and producers are in kernelspace (an API for handling such
 	  buffers in userspace might also be needed, although that doesn't
 	  sound like a good idea to me, IMHO);

A side note: in the specific case of the X server and display drivers, such a
kernelspace-userspace API for buffers already exists. I don't know DRI/GEM/KMS
well enough to tell exactly how this works, or whether it would require some
changes in order to work like the above, but it seems that the right approach
is to try to use or extend the existing APIs, instead of creating something
new.

The main point is: the DQBUF/QBUF API assumes that userspace has full control
over the buffer usage, and that the buffer is handled in userspace (so, it
should be mmapped there). This is not the general case where another IP block
on the chip is re-using the buffer, or where there is another DMA engine doing
direct transfers on it.

 	- buffers can be dynamically allocated/de-allocated;

Re: Summary of the V4L2 discussions during LDS - was: Re: Embedded Linux memory management interest group list

2011-05-17 Thread Mauro Carvalho Chehab
On 16-05-2011 17:45, Guennadi Liakhovetski wrote:
 On Sat, 14 May 2011, Mauro Carvalho Chehab wrote:
 
 On 18-04-2011 17:15, Jesse Barker wrote:
 One of the big issues we've been faced with at Linaro is around GPU
 and multimedia device integration, in particular the memory management
 requirements for supporting them on ARM.  This next cycle, we'll be
 focusing on driving consensus around a unified memory management
 solution for embedded systems that support multiple architectures and
 SoCs.  This is listed as part of our working set of requirements for
 the next six-month cycle (in spite of the URL, this is not being
 treated as a graphics-specific topic - we also have participation from
 multimedia and kernel working group folks):

   https://wiki.linaro.org/Cycles//TechnicalTopics/Graphics

 As part of the memory management needs, Linaro organized several discussions
 during Linaro Development Summit (LDS), at Budapest, and invited me and other
 members of the V4L and DRI community to discuss about the requirements.
 I wish to thank Linaro for its initiative.
 
 [snip]
 
 Btw, the need of managing buffers is currently being covered by the proposal
 for new ioctl()s to support multi-sized video-buffers [1].

 [1] http://www.spinics.net/lists/linux-media/msg30869.html

 It makes sense to me to discuss such proposal together with the above 
 discussions, 
 in order to keep the API consistent.
 
 The author of that RFC would have been thankful, if he had been put on 
 Cc: ;) 

If I had added everybody interested to this summary, probably most SMTP servers
would refuse to deliver the message, thinking it is SPAM ;) My intention was to
submit feedback about it when analysing your RFC patches, if you hadn't seen it
before.

 But anyway, yes, consistency is good, but is my understanding 
 correct, that functionally these two extensions - multi-size and 
 buffer-forwarding/reuse are independent?

Yes.

 We have to think about making the 
 APIs consistent, e.g., by reusing data structures. But it's also good to 
 make incremental smaller changes where possible, isn't it? So, yes, we 
 should think about consistency, but develop and apply those two extensions 
 separately?

True, but one discussion can benefit the other. IMO, we should not rush new
userspace API merges, to avoid merging code that wasn't reasonably discussed;
otherwise, the API will become too messy.

Thanks,
Mauro.


Re: Summary of the V4L2 discussions during LDS - was: Re: Embedded Linux memory management interest group list

2011-05-16 Thread Guennadi Liakhovetski
On Sat, 14 May 2011, Mauro Carvalho Chehab wrote:

 On 18-04-2011 17:15, Jesse Barker wrote:
  One of the big issues we've been faced with at Linaro is around GPU
  and multimedia device integration, in particular the memory management
  requirements for supporting them on ARM.  This next cycle, we'll be
  focusing on driving consensus around a unified memory management
  solution for embedded systems that support multiple architectures and
  SoCs.  This is listed as part of our working set of requirements for
  the next six-month cycle (in spite of the URL, this is not being
  treated as a graphics-specific topic - we also have participation from
  multimedia and kernel working group folks):
  
https://wiki.linaro.org/Cycles//TechnicalTopics/Graphics
 
 As part of the memory management needs, Linaro organized several discussions
 during Linaro Development Summit (LDS), at Budapest, and invited me and other
 members of the V4L and DRI community to discuss about the requirements.
 I wish to thank Linaro for its initiative.

[snip]

 Btw, the need of managing buffers is currently being covered by the proposal
 for new ioctl()s to support multi-sized video-buffers [1].
 
 [1] http://www.spinics.net/lists/linux-media/msg30869.html
 
 It makes sense to me to discuss such proposal together with the above 
 discussions, 
 in order to keep the API consistent.

The author of that RFC would have been thankful, if he had been put on 
Cc: ;) But anyway, yes, consistency is good, but is my understanding 
correct, that functionally these two extensions - multi-size and 
buffer-forwarding/reuse are independent? We have to think about making the 
APIs consistent, e.g., by reusing data structures. But it's also good to 
make incremental smaller changes where possible, isn't it? So, yes, we 
should think about consistency, but develop and apply those two extensions 
separately?

Thanks
Guennadi

 In my understanding, the SoC people that are driving those changes will
 be working on providing the API proposals for it. They should also be
 providing the needed patches, open source drivers and userspace application(s)
 that allow testing and validating the GPU <=> V4L transfers using the new API.
 
 Thanks,
 Mauro
 

---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer
http://www.open-technology.de/


Re: Summary of the V4L2 discussions during LDS - was: Re: Embedded Linux memory management interest group list

2011-05-15 Thread Hans Verkuil
On Saturday, May 14, 2011 13:46:03 Mauro Carvalho Chehab wrote:
 On 14-05-2011 13:02, Hans Verkuil wrote:
  On Saturday, May 14, 2011 12:19:18 Mauro Carvalho Chehab wrote:
 
  So, based on all I've seen, I'm pretty much convinced that the normal MMAP
  way of streaming (the VIDIOC_[REQBUF|STREAMON|STREAMOFF|QBUF|DQBUF] ioctls)
  is not the best way to share data with framebuffers.
  
  I agree with that, but it is a different story between two V4L2 devices. There
  you obviously want to use the streaming ioctls and still share buffers.
 
 I don't think so. The requirement for syncing the framebuffer between the two
 V4L2 devices is pretty much the same as we have between one V4L2 device and one
 GPU.
 
 In both cases, the requirement is to pass a framebuffer between two entities,
 and not a video stream.
 
 For example, imagine something like:
 
   V4L2 camera ==+==> V4L2 encoder ==> MPEG2
                 |
                 +==> GPU
 
 Both GPU and the V4L2 encoder should use the same logic to be sure that they
 will use a buffer that has already been filled by the camera. Also, the V4L2
 camera driver can't re-use such a framebuffer before being sure that both
 consumers have already stopped using it.

No. A camera whose output is sent to a resizer and then to a SW/FW/HW encoder
is a typical example where you want to queue/dequeue buffers. Especially since
the various parts of the pipeline may stall for a bit so you don't want to lose
frames. That's not what the overlay API is for, that's what our streaming API
gives us.

The use case above isn't even possible without copying. At least, I don't see a
way, unless the GPU buffer is non-destructive. In that case you can give the
frame to the GPU, and when the GPU is finished you can give it to the encoder.
I suspect that might become quite complex though.

Note that many video receivers cannot stall. You can't tell them to wait until
the last buffer finished processing. This is different from some/most? sensors.

So if you try to send the input of a video receiver to some device that requires
syncing which can cause stalls, then that will not work without losing frames.
Which especially for video encoding is not desirable.

Of course, it might be that we mean the same, but just use different words :-(

Regards,

Hans

 
 So, it is the same requirement as having four displays receiving such a
 framebuffer.
 
 Of course, a GPU endpoint may require some extra information for blending,
 but a V4L node may also require some other type of extra information.
 
  We probably need
  something that it will be an enhanced version of the 
  VIDIOC_FBUF/VIDIOC_OVERLAY
  ioctls. Unfortunately, we can't just add more stuff there, as there's no
  reserved space. So, we'll probably add some VIDIOC_FBUF2 series of ioctl's.
  
  That will be useful as well to add better support for blending and 
  Z-ordering
  between overlays. The old API for that is very limited in that respect.
 
 Agreed.
 
 Mauro.
 


Re: Summary of the V4L2 discussions during LDS - was: Re: Embedded Linux memory management interest group list

2011-05-15 Thread Alan Cox
  In both cases, the requirement is to pass a framebuffer between two entities,
  and not a video stream.

It may not even be a framebuffer. In many cases you'll pass a framebuffer
or some memory target (in DRI think probably a GEM handle), in fact in
theory you can do much of this now.

  use a buffer that has already been filled by the camera. Also, the V4L2 camera
  driver can't re-use such a framebuffer before being sure that both consumers
  have already stopped using it.

You also potentially need fences which complicates the interface
somewhat.

 The use case above isn't even possible without copying. At least, I don't see a
 way, unless the GPU buffer is non-destructive. In that case you can give the
 frame to the GPU, and when the GPU is finished you can give it to the encoder.
 I suspect that might become quite complex though.

It's actually no different to giving a buffer to the GPU some of the time
and the CPU other bits. In those cases you often need to ensure private
ownership each side and do fencing/cache flushing as needed.
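
To make that concrete, here is a hypothetical sketch of such a handoff. Nothing
in it is an existing kernel API; struct fence, fence_wait() and
cache_flush_range() are placeholders for whatever the real interface would be:

/* Sketch only: the fence object and both helpers are hypothetical */
struct shared_buf {
	void *vaddr;
	size_t size;
	struct fence *last_user;	/* signalled when the previous user is done */
};

static int hand_buffer_over(struct shared_buf *buf,
			    int (*consume)(struct shared_buf *))
{
	int ret;

	/* ensure exclusive ownership: wait until the previous user is done */
	ret = fence_wait(buf->last_user);
	if (ret)
		return ret;

	/* the previous user may have written via the CPU; make those writes
	 * visible to the next user's DMA engine */
	cache_flush_range(buf->vaddr, buf->size);

	/* only now is it safe to let the other side touch the bits */
	return consume(buf);
}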

 Note that many video receivers cannot stall. You can't tell them to wait until
 the last buffer finished processing. This is different from some/most? 
 sensors.

A lot of video receivers also keep the bits away from the CPU as part of
the general DRM delusion TV operators work under. That means you've got
an object that has a handle and has operations (alpha, fade, scale, etc.) but
you can never touch the bits. In the TV/video world, unsurprisingly,
that is often seen as the 'primary' framebuffer as well. You've got a
set of mappable framebuffers the CPU can touch, plus other video sources
that can be mixed and placed, but the CPU can only touch the mappable
objects that form part of the picture.

Alan


Re: Summary of the V4L2 discussions during LDS - was: Re: Embedded Linux memory management interest group list

2011-05-15 Thread Rob Clark
On Sun, May 15, 2011 at 4:27 PM, Alan Cox a...@lxorguk.ukuu.org.uk wrote:
  In both cases, the requirement is to pass a framebuffer between two entities,
  and not a video stream.

 It may not even be a framebuffer. In many cases you'll pass a framebuffer
 or some memory target (in DRI think probably a GEM handle), in fact in
 theory you can do much of this now.

  use a buffer that has already been filled by the camera. Also, the V4L2 camera
  driver can't re-use such a framebuffer before being sure that both consumers
  have already stopped using it.

 You also potentially need fences which complicates the interface
 somewhat.

Presumably this is going through something like DRI2, so the client
application, which is what is interacting w/ the V4L2 interface for the camera
and perhaps the video encoder, would call something that turns into a
ScheduleSwap() call on the xserver side, returning a frame count to wait
for, and then at some point later a ScheduleWaitMSC() to wait for that
frame count, to know the GPU is done with the buffer.  The fences would
be buried somewhere within DRM (kernel) and the xserver driver (userspace)
to keep the client app blocked until the GPU is done.

You probably don't want the V4L2 devices to be too deeply connected to
how the GPU does synchronization, or otherwise V4L2 would need to
support each different DRM+xserver driver and how it implements buffer
synchronization with the GPU.
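
As a rough client-side illustration of that flow (the two dri2_*() wrappers
below are hypothetical placeholders, not real libdrm/DRI2 entry points):

#include <stdint.h>

/* Sketch only: both prototypes are hypothetical */
extern uint64_t dri2_schedule_swap(int drawable);	/* returns frame count to wait for */
extern void dri2_wait_msc(int drawable, uint64_t msc);	/* blocks until that frame count */

static void display_then_requeue(int drawable, int v4l2_fd, int buf_index)
{
	/* hand the filled capture buffer to the X server for display */
	uint64_t target = dri2_schedule_swap(drawable);

	/* block until the GPU is done with the buffer */
	dri2_wait_msc(drawable, target);

	/* only now is it safe to give the buffer back to the capture device,
	 * i.e. VIDIOC_QBUF on v4l2_fd with buf_index (omitted here) */
	(void)v4l2_fd;
	(void)buf_index;
}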

BR,
-R

 The use case above isn't even possible without copying. At least, I don't
 see a way, unless the GPU buffer is non-destructive. In that case you can
 give the frame to the GPU, and when the GPU is finished you can give it to
 the encoder. I suspect that might become quite complex though.

 It's actually no different to giving a buffer to the GPU some of the time
 and the CPU other bits. In those cases you often need to ensure private
 ownership each side and do fencing/cache flushing as needed.

 Note that many video receivers cannot stall. You can't tell them to wait
 until the last buffer finished processing. This is different from some/most?
 sensors.

 A lot of video receivers also keep the bits away from the CPU as part of
 the general DRM delusion TV operators work under. That means you've got
 an object that has a handle and has operations (alpha, fade, scale, etc.) but
 you can never touch the bits. In the TV/video world, unsurprisingly,
 that is often seen as the 'primary' framebuffer as well. You've got a
 set of mappable framebuffers the CPU can touch, plus other video sources
 that can be mixed and placed, but the CPU can only touch the mappable
 objects that form part of the picture.

 Alan



Summary of the V4L2 discussions during LDS - was: Re: Embedded Linux memory management interest group list

2011-05-14 Thread Mauro Carvalho Chehab
On 18-04-2011 17:15, Jesse Barker wrote:
 One of the big issues we've been faced with at Linaro is around GPU
 and multimedia device integration, in particular the memory management
 requirements for supporting them on ARM.  This next cycle, we'll be
 focusing on driving consensus around a unified memory management
 solution for embedded systems that support multiple architectures and
 SoCs.  This is listed as part of our working set of requirements for
 the next six-month cycle (in spite of the URL, this is not being
 treated as a graphics-specific topic - we also have participation from
 multimedia and kernel working group folks):
 
   https://wiki.linaro.org/Cycles//TechnicalTopics/Graphics

As part of the memory management needs, Linaro organized several discussions
during the Linaro Development Summit (LDS) in Budapest, and invited me and other
members of the V4L and DRI communities to discuss the requirements.
I wish to thank Linaro for its initiative.

Basically, on several SoC designs, the GPU and the CPU are integrated into
the same chipset and can share the same memory for a framebuffer. Also,
they may have some IP blocks that allow processing the framebuffer internally,
to do things like enhancing the image and converting it into an MPEG stream.

The desire, from the SoC developers, is that those operations should be
done using zero-copy transfers.

This somewhat resembles the idea of the VIDIOC_OVERLAY/VIDIOC_FBUF API,
which was used in the old days when CPUs weren't fast enough to process
video without generating a huge load. The overlay mode was created
to allow direct PCI2PCI transfers from the video capture board into the
display adapter, using the XVideo extension, removing the load that a
video stream put on the CPU. It was designed as a kernel API plus a
userspace X11 driver that passes a framebuffer reference to the V4L driver,
where it is used to program the DMA transfers to happen inside the framebuffer.

At the LDS, we had three days of discussions about how buffer sharing should
be handled, and Linaro is producing a blueprint plan to address the needs.
We also had a discussion about V4L and KMS, allowing both communities to better
understand how things are supposed to work on the other side.

From the V4L2 perspective, what is needed is to create a way to somehow allow
passing a framebuffer between two V4L2 devices, and between a V4L2 device
and a GPU. The V4L2 device can be either an input or an output one.
The original idea was to add yet another mmap mode to the VIDIOC streaming
ioctls, and keep using QBUF/DQBUF to handle it. However, as I've pointed
out there, this would lead to sync issues on a shared buffer, causing flip
effects. Also, as the API is generic, it can also be used on generic computers,
like desktops, notebooks and tablets (even on ARM-based designs), and it
may end up actually being implemented as a PCI2PCI transfer.

So, based on all I've seen, I'm pretty much convinced that the normal MMAP
way of streaming (the VIDIOC_[REQBUF|STREAMON|STREAMOFF|QBUF|DQBUF] ioctls)
is not the best way to share data with framebuffers. We probably need
something that will be an enhanced version of the VIDIOC_FBUF/VIDIOC_OVERLAY
ioctls. Unfortunately, we can't just add more stuff there, as there's no
reserved space. So, we'll probably add some VIDIOC_FBUF2 series of ioctls.
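
Purely to illustrate why the reserved space matters, a hypothetical FBUF2
structure could be declared along the lines below. This is not a real proposal;
it just shows the usual V4L2 pattern of padding new structures so they can grow
later:

#include <linux/videodev2.h>

/* Hypothetical sketch, not a real proposal */
struct v4l2_framebuffer2 {
	__u32 capability;
	__u32 flags;
	__u32 z_order;			/* blending/Z-order data the old API lacks */
	__u32 global_alpha;
	struct v4l2_pix_format fmt;	/* pixel format of the shared framebuffer */
	__u32 reserved[32];		/* room to grow without breaking the ABI */
};

#define VIDIOC_G_FBUF2	_IOR('V', 0xf0, struct v4l2_framebuffer2)
#define VIDIOC_S_FBUF2	_IOW('V', 0xf1, struct v4l2_framebuffer2)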

It seems to me that the proper way to develop such an API is to start working
with the Xorg V4L driver, changing it to work with KMS and with the new API
(probably porting some parts of the Xorg driver to kernelspace).

One of the problems with a shared framebuffer is that an overlaid V4L stream
may, in the worst case, be sent to up to 4 different GPUs and/or displays.

Imagine a scenario like:

	+===========+===========+
	|           |           |
	|  D1    +--|--+   D2   |
	|        | V4L |        |
	+--------|-----|--------+
	|        |     |        |
	|  D3    +--+--+   D4   |
	|           |           |
	+===========+===========+


Where D1, D2, D3 and D4 are 4 different displays, and the same V4L framebuffer
is partially shared between them (the above is an example with a V4L input,
although the reverse scenario of having one framebuffer divided into 4 V4L
outputs also seems to be possible).

As the same image may be divided across 4 monitors, the buffer filling should be
synced with all of them, in order to avoid flipping effects. Also, the shared
buffer can't be re-used until all displays finish reading it. From what I
understood from the discussions with the DRI people, the display APIs currently
have similar issues of needing to wait for a buffer to be completely used before
allowing it to be re-used. According to them, this was solved there by
dynamically allocating buffers. We may need to do

Re: Summary of the V4L2 discussions during LDS - was: Re: Embedded Linux memory management interest group list

2011-05-14 Thread Hans Verkuil
On Saturday, May 14, 2011 12:19:18 Mauro Carvalho Chehab wrote:
 On 18-04-2011 17:15, Jesse Barker wrote:
  One of the big issues we've been faced with at Linaro is around GPU
  and multimedia device integration, in particular the memory management
  requirements for supporting them on ARM.  This next cycle, we'll be
  focusing on driving consensus around a unified memory management
  solution for embedded systems that support multiple architectures and
  SoCs.  This is listed as part of our working set of requirements for
  the next six-month cycle (in spite of the URL, this is not being
  treated as a graphics-specific topic - we also have participation from
  multimedia and kernel working group folks):
  
https://wiki.linaro.org/Cycles//TechnicalTopics/Graphics
 
 As part of the memory management needs, Linaro organized several discussions
 during the Linaro Development Summit (LDS) in Budapest, and invited me and
 other members of the V4L and DRI communities to discuss the requirements.
 I wish to thank Linaro for its initiative.
 
 Basically, on several SoC designs, the GPU and the CPU are integrated into
 the same chipset and can share the same memory for a framebuffer. Also,
 they may have some IP blocks that allow processing the framebuffer internally,
 to do things like enhancing the image and converting it into an MPEG stream.
 
 The desire, from the SoC developers, is that those operations should be
 done using zero-copy transfers.
 
 This somewhat resembles the idea of the VIDIOC_OVERLAY/VIDIOC_FBUF API,
 which was used in the old days when CPUs weren't fast enough to process
 video without generating a huge load. The overlay mode was created
 to allow direct PCI2PCI transfers from the video capture board into the
 display adapter, using the XVideo extension, removing the load that a
 video stream put on the CPU. It was designed as a kernel API plus a
 userspace X11 driver that passes a framebuffer reference to the V4L driver,
 where it is used to program the DMA transfers to happen inside the
 framebuffer.
 
 At the LDS, we had three days of discussions about how buffer sharing should
 be handled, and Linaro is producing a blueprint plan to address the needs.
 We also had a discussion about V4L and KMS, allowing both communities to
 better understand how things are supposed to work on the other side.
 
 From the V4L2 perspective, what is needed is to create a way to somehow allow
 passing a framebuffer between two V4L2 devices, and between a V4L2 device
 and a GPU. The V4L2 device can be either an input or an output one.
 The original idea was to add yet another mmap mode to the VIDIOC streaming
 ioctls, and keep using QBUF/DQBUF to handle it. However, as I've pointed
 out there, this would lead to sync issues on a shared buffer, causing flip
 effects. Also, as the API is generic, it can also be used on generic
 computers, like desktops, notebooks and tablets (even on ARM-based designs),
 and it may end up actually being implemented as a PCI2PCI transfer.
 
 So, based on all I've seen, I'm pretty much convinced that the normal MMAP
 way of streaming (the VIDIOC_[REQBUF|STREAMON|STREAMOFF|QBUF|DQBUF] ioctls)
 is not the best way to share data with framebuffers.

I agree with that, but it is a different story between two V4L2 devices. There
you obviously want to use the streaming ioctls and still share buffers.

 We probably need
 something that will be an enhanced version of the VIDIOC_FBUF/VIDIOC_OVERLAY
 ioctls. Unfortunately, we can't just add more stuff there, as there's no
 reserved space. So, we'll probably add some VIDIOC_FBUF2 series of ioctls.

That will be useful as well to add better support for blending and Z-ordering
between overlays. The old API for that is very limited in that respect.

Regards,

Hans

 It seems to me that the proper way to develop such an API is to start working
 with the Xorg V4L driver, changing it to work with KMS and with the new API
 (probably porting some parts of the Xorg driver to kernelspace).
 
 One of the problems with a shared framebuffer is that an overlaid V4L stream
 may, in the worst case, be sent to up to 4 different GPUs and/or displays.
 
 Imagine a scenario like:
 
 	+===========+===========+
 	|           |           |
 	|  D1    +--|--+   D2   |
 	|        | V4L |        |
 	+--------|-----|--------+
 	|        |     |        |
 	|  D3    +--+--+   D4   |
 	|           |           |
 	+===========+===========+
 
 
 Where D1, D2, D3 and D4 are 4 different displays, and the same V4L framebuffer
 is partially shared between them (the above is an example with a V4L input,
 although the reverse scenario of having one framebuffer divided into 4 V4L
 outputs also seems to be possible).
 
 As the same image may be divided across 4 monitors, the buffer filling should
 be synced with all of them, in order to avoid flipping effects.