RE: Mem2Mem V4L2 devices [RFC]

2009-10-14 Thread Marek Szyprowski
Hello,

On Thursday, October 08, 2009 11:26 PM Karicheri, Muralidharan wrote:

  Why not? In a typical camera scenario, an application can feed one frame and
  get two output frames (one for storing and another, at a lower resolution,
  for sending over email). I just gave an example.
 
  You gave an example of a Y-type pipeline which starts in a real streaming
  device (camera), which is a completely different thing. A Y-type CAPTURE
  pipeline is quite a common thing, which can simply be mapped to 2 different
  capture video nodes.
 
  In my previous mail I asked about a Y-type pipeline which starts in memory.
  I don't think there is any common use case for such a thing.
 
 Marek,
 
 You can't say that. This feature is currently supported in our internal
 release which is being used by our customers. So for feature parity it is
 required to be supported, as we can't determine how many customers are using
 this feature. Besides, in the above scenario that I have mentioned, the
 following happens.
 
 sensor -> CCDC -> Memory (video node)

 Memory -> Previewer -> Resizer1 -> Memory
                     |-> Resizer2 -> Memory
 
 Typically the application captures a full-resolution frame (Bayer RGB) to
 memory and then uses the Previewer and Resizer in memory-to-memory mode to
 convert it to UYVY format. The application uses the second resizer to get a
 lower-resolution frame simultaneously. We would like to expose this hardware
 capability to user applications through this memory-to-memory device.

OK, I understand that your current custom API exports such functionality. I
thought a bit about this issue and found a way this could be implemented using
the one-video-node approach. It would require an additional custom ioctl, but
IMHO there is no other way.

An application can open the /dev/videoX node twice. Then it can 'link' the two
instances with this special ioctl, so the driver knows which instances are
'linked' together. The application then queues the source buffer to both
instances, sets the destination format/size/colorspace/etc., queues output
buffers and calls STREAMON on both instances. If the driver detects that the 2
instances have been linked together and the source buffer is the same in both
of them, it can use this special feature of your hardware and run the 2
resizers simultaneously. This sounds a bit complicated (especially because the
driver would need to deal with synchronization and possible races...), but
currently I see no other way to implement it on top of the one-video-node
approach.
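
For illustration, a minimal user-space sketch of that flow in C; the
VIDIOC_LINK_INSTANCE ioctl and its number are hypothetical placeholders for the
custom 'link' ioctl discussed above, everything else is standard V4L2, and
REQBUFS/buffer setup plus error handling are omitted:

/* Sketch only: two opens of the same mem2mem node, linked by a custom ioctl. */
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

#define VIDIOC_LINK_INSTANCE _IOW('V', 192, int)   /* hypothetical custom ioctl */

int main(void)
{
        int fd1 = open("/dev/video0", O_RDWR);     /* resizer output 1 */
        int fd2 = open("/dev/video0", O_RDWR);     /* resizer output 2 */
        enum v4l2_buf_type out = V4L2_BUF_TYPE_VIDEO_OUTPUT;
        enum v4l2_buf_type cap = V4L2_BUF_TYPE_VIDEO_CAPTURE;

        /* let the driver know both instances belong to one transaction */
        ioctl(fd1, VIDIOC_LINK_INSTANCE, &fd2);

        /* per-instance destination format (different sizes on fd1 and fd2),
         * set with VIDIOC_S_FMT using type == V4L2_BUF_TYPE_VIDEO_CAPTURE */

        /* queue the same source buffer (OUTPUT type) on both instances and
         * a separate destination buffer (CAPTURE type) on each, then: */
        ioctl(fd1, VIDIOC_STREAMON, &out);
        ioctl(fd1, VIDIOC_STREAMON, &cap);
        ioctl(fd2, VIDIOC_STREAMON, &out);
        ioctl(fd2, VIDIOC_STREAMON, &cap);

        /* the driver detects the link plus the identical source buffer and
         * runs both hardware resizers in a single pass */
        return 0;
}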

  Since only one capture queue per IO instance is possible in this model
  (matched by buf type), I don't think we can scale it to the 2-outputs case.
  Or is it possible to queue 2 output buffers of two different sizes to the
  same queue?
 
  This can be hacked by introducing yet another 'type' (for example
  SECOND_CAPTURE), but I don't like such a solution. Anyway - would we really
  need a Y-type mem2mem device?
 
  Yes. No hacking please! We should be able to do S_FMT for the second Resizer
  output and dequeue the frame. I am not sure how we can handle this in this
  model.

Currently I see no clean way of adding support for more than one output in the
one-video-node approach.

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center




RE: Mem2Mem V4L2 devices [RFC] - Can we enhance the V4L2 API?

2009-10-08 Thread Marek Szyprowski
Hello,

On Wednesday, October 07, 2009 3:39 PM Karicheri, Muralidharan wrote:

  As we have seen in the discussion, this is not a streaming device, rather
  a transaction/conversion device which operates on a given frame to get a
  desired output frame. Each transaction may have its own set of configuration
  context which will be applied to the hardware before starting the operation.
  This is unlike a streaming device, where most of the configuration is done
  prior to starting the streaming.
 
 From the application point of view an instance of such a device still is a
 streaming device. The application should not even know if
 any other apps are using the device or not (well, it may only notice the
 lower throughput or higher device latency, but this cannot
 be avoided). Application can queue input and output buffers, stream on and
 wait for the result.
 
  In typical capture- or display-side streaming, AFAIK, there is only one
  device IO instance. While streaming is ON, if another application tries to
  do IO, the driver returns -EBUSY. I believe this is true for all drivers
  (correct me if this is not true). When you say the memory-to-memory device
  is able to allow multiple applications to call STREAMON, this model is
  broken (assuming what I said above is true).
 
  Maybe I am missing something here. Is the following true? I think in your
  model each application gets a device instance that has its own scaling
  factors and other parameters. So streaming status is maintained for each IO
  instance. Each IO instance has its own buffer queues. If this is true then
  you are right: the streaming model is not broken.

This is exactly what I mean. Typical capture or display devices are
single-instance by definition (I cannot imagine more than one application
streaming _directly_ from the camera interface). However, multi-instance
support for a mem2mem device makes perfect sense and greatly improves its
usability.

 So following scenario holds good concurrently (api call sequence).
 
 App1 - open() - S_FMT - STREAMON-QBUF/DQBUF(n times)-STREAMOFF-close()
 App2 - open() - S_FMT - STREAMON-QBUF/DQBUF(n times)-STREAMOFF-close()
 
 App3 - open() - S_FMT - STREAMON-QBUF/DQBUF(n times)-STREAMOFF-close()

Exactly.
 
 So internal to driver, if there are multiple concurrent streamon requests, 
 and hardware is busy,
 subsequent requests waits until the first one is complete and driver 
 schedules requests from multiple
 IO queues. So this is essentially what we have in our internal implementation 
 (discussed during the
 linux plumbers mini summit) converted to v4l2 model.

Right, this is what we also have in our custom v4l2-incompatible drivers.

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center





RE: Mem2Mem V4L2 devices [RFC]

2009-10-08 Thread Marek Szyprowski
Hello,

On Wednesday, October 07, 2009 4:03 PM Karicheri, Muralidharan wrote:

 How the hardware is actually designed? I see two possibilities:
 
  1.
  [input buffer] --[dma engine]--> [resizer1] --[dma]--> [mem output buffer1]
                                \--> [resizer2] --[dma]--> [mem output buffer2]
 
  This is the case.
  2.
  [input buffer] --[dma engine1]--> [resizer1] --[dma]--> [mem output buffer1]
                 \--[dma engine2]--> [resizer2] --[dma]--> [mem output buffer2]
 
  In the first case we would really have problems mapping it properly to
  video nodes. But we should think if there are any use cases of such design?
  (in terms of mem-2-mem device)
 
  Why not? In a typical camera scenario, an application can feed one frame
  and get two output frames (one for storing and another, at a lower
  resolution, for sending over email). I just gave an example.

You gave an example of a Y-type pipeline which starts in a real streaming
device (camera), which is a completely different thing. A Y-type CAPTURE
pipeline is quite a common thing, which can simply be mapped to 2 different
capture video nodes.

In my previous mail I asked about a Y-type pipeline which starts in memory. I
don't think there is any common use case for such a thing.

  I know that this Y-type design makes sense as a
 part of the pipeline from a sensor or decoder device. But I cannot find any
 useful use case for mem2mem version of it.
 
  The second case is much more trivial. One can just create two separate
  resizer devices (with their own nodes) or one resizer driver with two
  hardware resizers underneath it. In both cases the application would simply
  queue the input buffer 2 times for both transactions.
 
  I am assuming we are using the one-node implementation model suggested by
  Ivan.
 
  In hardware, streaming happens at the same time for both outputs (there is
  only one bit in the register). So if we have a second node for this, then
  the driver needs to match the IO instance of the second device with the
  corresponding request on the first node, and this takes us to the same
  complication as with the 2-video-nodes implementation.

Right.

 Since only one capture queue per IO instance is possible in this model 
 (matched by buf type), I don't
 think we can scale it for 2 outputs case. Or is it possible to queue 2 output 
 buffers of two different
 sizes to the same queue?

This can be hacked by introducing yet another 'type' (for example
SECOND_CAPTURE), but I don't like such a solution. Anyway - would we really
need a Y-type mem2mem device?

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center




RE: Mem2Mem V4L2 devices [RFC]

2009-10-07 Thread Marek Szyprowski
Hello,

On Tuesday, October 06, 2009 6:12 PM Hiremath, Vaibhav wrote:

  On Monday, October 05, 2009 8:27 PM Hiremath, Vaibhav wrote:
 
   [Hiremath, Vaibhav] IMO, this implementation is not a streaming
   model, we are trying to fit mem-to-mem forcefully to streaming.
   
   Why does this not fit streaming? I see no problems with streaming
   over a mem2mem device with only one video node. You just queue input
   and output buffers (they are distinguished by the 'type' parameter)
   on the same video node.
   
   [Hiremath, Vaibhav] Do we create separate queues of buffers based
   on type? I think we don't.
 
  Why not? I really see no problems implementing such a driver,
  especially if this heavily increases the number of use cases where
  such a device can be used.
 
 [Hiremath, Vaibhav] I thought of it and you are correct, it should be 
 possible. I was kind of biased
 and thinking in only one direction. Now I don't see any reason why we should 
 go for 2 device node
 approach. Earlier I was thinking of 2 device nodes for 2 queues, if it is 
 possible with one device
 node then I think we should align to single device node approach.
 
 Do you see any issues with it?

Currently it looks like all issues are resolved. However, something might
arise during the implementation. If so, I will post it here of course.

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center



RE: Mem2Mem V4L2 devices [RFC] - Can we enhance the V4L2 API?

2009-10-07 Thread Karicheri, Muralidharan
Marek,


 As we have seen in the discussion, this is not a streaming device, rather
 a transaction/conversion device which operates on a given frame to get a
 desired output frame. Each transaction may have its own set of configuration
 context which will be applied to the hardware before starting the operation.
 This is unlike a streaming device, where most of the configuration is done
 prior to starting the streaming.

From the application point of view an instance of such a device still is a
streaming device. The application should not even know if
any other apps are using the device or not (well, it may only notice the
lower throughput or higher device latency, but this cannot
be avoided). Application can queue input and output buffers, stream on and
wait for the result.

In typical capture- or display-side streaming, AFAIK, there is only one device
IO instance. While streaming is ON, if another application tries to do IO, the
driver returns -EBUSY. I believe this is true for all drivers (correct me if
this is not true). When you say the memory-to-memory device is able to allow
multiple applications to call STREAMON, this model is broken (assuming what I
said above is true).

Maybe I am missing something here. Is the following true? I think in your
model each application gets a device instance that has its own scaling factors
and other parameters. So streaming status is maintained for each IO instance.
Each IO instance has its own buffer queues. If this is true then you are
right: the streaming model is not broken.

So the following scenario holds good concurrently (API call sequence):

App1 - open() - S_FMT - STREAMON-QBUF/DQBUF(n times)-STREAMOFF-close()
App2 - open() - S_FMT - STREAMON-QBUF/DQBUF(n times)-STREAMOFF-close()

App3 - open() - S_FMT - STREAMON-QBUF/DQBUF(n times)-STREAMOFF-close()

So internal to the driver, if there are multiple concurrent STREAMON requests
and the hardware is busy, subsequent requests wait until the first one is
complete, and the driver schedules requests from multiple IO queues. So this
is essentially what we have in our internal implementation (discussed during
the Linux Plumbers mini-summit) converted to the v4l2 model.

 The changes done during streaming are controls like brightness, contrast,
 gain etc. The frames received by the application are either synchronized to
 an input source timing, or the application outputs frames based on a display
 timing. Also, a single IO instance is usually maintained at the driver,
 whereas in the case of a memory-to-memory device the hardware needs to switch
 contexts between operations. So we might need a different approach than for a
 capture/output device.

All this is internal to the device driver, which can hide it from the
application.

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center





RE: Mem2Mem V4L2 devices [RFC]

2009-10-07 Thread Karicheri, Muralidharan
Marek,


How the hardware is actually designed? I see two possibilities:

1.
[input buffer] --[dma engine]--> [resizer1] --[dma]--> [mem output buffer1]
                              \--> [resizer2] --[dma]--> [mem output buffer2]

This is the case.
2.
[input buffer] --[dma engine1]--> [resizer1] --[dma]--> [mem output buffer1]
               \--[dma engine2]--> [resizer2] --[dma]--> [mem output buffer2]

In the first case we would really have problems mapping it properly to video
nodes. But we should think if there are any use cases of such design? (in
terms of mem-2-mem device)

Why not? In a typical camera scenario, an application can feed one frame and
get two output frames (one for storing and another, at a lower resolution, for
sending over email). I just gave an example. You could say that this can be
done in two steps, but when the hardware is capable of doing this in parallel,
why should the driver not provide the support?

 I know that this Y-type design makes sense as a
part of the pipeline from a sensor or decoder device. But I cannot find any
useful use case for mem2mem version of it.

The second case is much more trivial. One can just create two separate
resizer devices (with their own nodes) or one resizer driver with two
hardware resizers underneath it. In both cases the application would simply
queue the input buffer 2 times for both transactions.

I am assuming we are using the one-node implementation model suggested by Ivan.

In hardware, streaming happens at the same time for both outputs (there is
only one bit in the register). So if we have a second node for this, then the
driver needs to match the IO instance of the second device with the
corresponding request on the first node, and this takes us to the same
complication as with the 2-video-nodes implementation. Since only one capture
queue per IO instance is possible in this model (matched by buf type), I don't
think we can scale it to the 2-outputs case. Or is it possible to queue 2
output buffers of two different sizes to the same queue?

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center





RE: Mem2Mem V4L2 devices [RFC]

2009-10-06 Thread Marek Szyprowski
Hello,

On Monday, October 05, 2009 8:07 PM Hiremath, Vaibhav wrote:

 -Original Message-
 From: Hiremath, Vaibhav [mailto:hvaib...@ti.com]
 Sent: Monday, October 05, 2009 8:07 PM
 To: Marek Szyprowski; linux-media@vger.kernel.org
 Cc: kyungmin.p...@samsung.com; Tomasz Fujak; Pawel Osciak
 Subject: RE: Mem2Mem V4L2 devices [RFC]
 
  -Original Message-
  From: Marek Szyprowski [mailto:m.szyprow...@samsung.com]
  Sent: Monday, October 05, 2009 7:26 PM
  To: Hiremath, Vaibhav; linux-media@vger.kernel.org
  Cc: kyungmin.p...@samsung.com; Tomasz Fujak; Pawel Osciak; Marek
  Szyprowski
  Subject: RE: Mem2Mem V4L2 devices [RFC]
 
  Hello,
 
  On Monday, October 05, 2009 7:43 AM Hiremath, Vaibhav wrote:
 
    In terms of the V4L2 framework such a device would be both a video sink
    and source at the same time. The main problem is how the video nodes
    (/dev/videoX) should be assigned to such a device.
   
    The simplest way of implementing a mem2mem device in the v4l2 framework
    would use two video nodes (one for input and one for output). Such an
    idea has been already suggested on the V4L2 mini-summit.
   [Hiremath, Vaibhav] We discussed 2 options during the summit,
  
   1) Only one video device node, and configuring parameters using
   V4L2_BUF_TYPE_VIDEO_CAPTURE for the input parameter and
   V4L2_BUF_TYPE_VIDEO_OUTPUT for the output parameter.
  
   2) 2 separate video device nodes, one with V4L2_BUF_TYPE_VIDEO_CAPTURE and
   another with V4L2_BUF_TYPE_VIDEO_OUTPUT, as mentioned by you.
  
   The obvious and preferred option would be 2, because with option 1 we
   could not achieve real streaming. And again we would have to put a
   constraint on the application for a fixed input buffer index.
 
  What do you mean by real streaming?
 
 [Hiremath, Vaibhav] I meant, after STREAMON, there will be just a sequence of
 queuing and de-queuing of buffers. With a single node of operation, how are
 we deciding which is the input buffer and which one is the output?

By the buffer-type parameter. The only difference is that you will queue both 
buffers into the same video node.

 We have to assume or put a constraint on the application that the 0th index
 will always be the input, irrespective of the number of buffers requested.

No. The input buffers will be distinguished by the type parameter.
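
As an illustration of this convention, a minimal hypothetical sketch of
queuing one source and one destination buffer on a single mem2mem node; these
are standard V4L2 calls, with buffer allocation via VIDIOC_REQBUFS/mmap and
error handling omitted:

/* Sketch: one video node, input and output distinguished only by buf type. */
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

static void queue_one_transaction(int fd)
{
        struct v4l2_buffer src, dst;

        memset(&src, 0, sizeof(src));
        src.type   = V4L2_BUF_TYPE_VIDEO_OUTPUT;   /* source frame (from memory) */
        src.memory = V4L2_MEMORY_MMAP;
        src.index  = 0;

        memset(&dst, 0, sizeof(dst));
        dst.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;  /* processed result */
        dst.memory = V4L2_MEMORY_MMAP;
        dst.index  = 0;

        ioctl(fd, VIDIOC_QBUF, &src);              /* both queued on the same fd */
        ioctl(fd, VIDIOC_QBUF, &dst);

        /* after STREAMON on both types, DQBUF with the CAPTURE type returns
         * the buffer that corresponds to the queued source */
}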

   [Hiremath, Vaibhav] Please note that we must put one limitation on the
   application: the buffers in both the video nodes are mapped one-to-one.
   This means that,
  
   Video0 (input)    Video1 (output)
   Index-0       ==  index-0
   Index-1       ==  index-1
   Index-2       ==  index-2
  
   Do you see any other option to this? I think this constraint is obvious
   from the application point of view during streaming.
 
  This is correct. Every application should queue a corresponding
  output buffer for each queued input buffer.
  NOTE that this whole discussion is about how to make it possible to have
  2 different applications running at the same time, each of them
  queuing their own input and output buffers. It will look somehow
  like this:
 
  Video0 (input)  Video1 (output)
  App1, Index-0   == App1, index-0
  App2, Index-0   == App2, index-0
  App1, Index-1   == App1, index-1
  App2, Index-1   == App2, index-1
  App1, Index-2   == App1, index-2
  App2, Index-2   == App2, index-2
 
  Note, that the absolute order of the queue/dequeue might be
  different, but each application should get the right output buffer,
  which corresponds to the queued input buffer.
 
 [Hiremath, Vaibhav] We have to create separate queues for every device open
 call. It would be difficult/complex for the driver to maintain a special
 queue for requests from a number of applications.

I know that this would be complex for every driver to maintain its own special
queues. But IMHO such a use case (multiple-instance support) is so important
(especially for embedded applications) that it is worth properly designing an
additional framework for mem2mem v4l2 devices, so all the buffer handling will
be hidden from the actual drivers.
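
For illustration only, a rough sketch of the shape such a shared framework
could take; the m2m_dev/m2m_ctx structures and m2m_try_schedule() are made-up
names, not an existing API. Each open() gets its own context, while one job
list shared by all instances is serviced one transaction at a time in the
driver context:

/* Illustration only: per-instance contexts feeding one shared job queue. */
#include <linux/list.h>
#include <linux/spinlock.h>

struct m2m_ctx;

struct m2m_dev {                        /* one per hardware block */
        struct list_head  job_queue;    /* instances that have work ready */
        spinlock_t        lock;
        struct m2m_ctx   *curr;         /* transaction currently on the hw */
};

struct m2m_ctx {                        /* one per open() call */
        struct m2m_dev   *dev;
        struct list_head  queue;        /* link in dev->job_queue */
        /* per-instance formats, scaling factors, src/dst buffer queues ... */
};

/* Called whenever an instance has both a source and a destination buffer
 * queued; picks the next transaction in driver context, so no extra context
 * switch is needed to keep the hardware busy. */
static void m2m_try_schedule(struct m2m_ctx *ctx)
{
        struct m2m_dev *dev = ctx->dev;
        unsigned long flags;

        spin_lock_irqsave(&dev->lock, flags);
        if (list_empty(&ctx->queue))
                list_add_tail(&ctx->queue, &dev->job_queue);
        if (!dev->curr && !list_empty(&dev->job_queue)) {
                dev->curr = list_first_entry(&dev->job_queue,
                                             struct m2m_ctx, queue);
                list_del_init(&dev->curr->queue);
                /* program the hardware from dev->curr here */
        }
        spin_unlock_irqrestore(&dev->lock, flags);
}

/* The transaction-done interrupt handler would return the buffers to
 * dev->curr, clear dev->curr and call m2m_try_schedule() again to start the
 * next waiting instance. */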

   [Hiremath, Vaibhav] Initially I thought of having a separate queue in
   the driver which tries to make maximum usage of the underlying hardware.
   The application will just queue the buffers and call STREAMON; the driver
   internally queues them in its own queue and issues a resize operation (in
   this case) for all the queued buffers, releasing them one by one to the
   application. We have a similar implementation internally, but not with
   the standard V4L2 framework; it uses custom IOCTLs for everything.
  
   This is similar to what we have currently, however we want to move
   all our custom drivers into the generic kernel frameworks.
 
   But when we decided to provide User Space library with media
  controller, I thought of moving this
   burden to application layer. Application library will create an
  interface

RE: Mem2Mem V4L2 devices [RFC] - Can we enhance the V4L2 API?

2009-10-06 Thread Marek Szyprowski
Hello,

On October 06, 2009 12:31 AM Karicheri, Muralidharan wrote:

 Are we constrained to use the QBUF/DQBUF/STREAMON/STREAMOFF model for this 
 specific device (memory to
 memory)? What about adding new IOCTLs that can be used for this specific 
 device type that possibly can
 simplify the implementation?

Don't forget about the simplest V4L2 io model based on read() and write()
calls. This io model fits a transaction/conversion-like device very well.
There is an issue with blocking calls, as applications would need to use
threads in order to do a simple image conversion, but this can easily be
avoided with non-blocking io and poll().
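
A hypothetical sketch of that io model, assuming a driver that implements
write(), read() and poll() for one-shot conversions; the device path is a
placeholder and error handling is mostly omitted:

/* Sketch: one-shot conversion over read()/write() with non-blocking io. */
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

static int convert(const void *src, size_t src_len, void *dst, size_t dst_len)
{
        int fd = open("/dev/video0", O_RDWR | O_NONBLOCK);
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        ssize_t ret;

        if (fd < 0)
                return -1;

        write(fd, src, src_len);          /* queue the source image */
        poll(&pfd, 1, -1);                /* wait until the result is ready */
        ret = read(fd, dst, dst_len);     /* fetch the processed image */

        close(fd);
        return ret < 0 ? -1 : 0;
}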

 As we have seen in the discussion, this is not a streaming device, rather
 a transaction/conversion device which operates on a given frame to get a
 desired output frame. Each transaction may have its own set of configuration
 context which will be applied to the hardware before starting the operation.
 This is unlike a streaming device, where most of the configuration is done
 prior to starting the streaming.

From the application point of view an instance of such a device is still a
streaming device. The application should not even know whether any other apps
are using the device (well, it may notice lower throughput or higher device
latency, but this cannot be avoided). The application can queue input and
output buffers, stream on and wait for the result.

 The changes done during streaming are controls like brightness, contrast,
 gain etc. The frames received by the application are either synchronized to
 an input source timing, or the application outputs frames based on a display
 timing. Also, a single IO instance is usually maintained at the driver,
 whereas in the case of a memory-to-memory device the hardware needs to switch
 contexts between operations. So we might need a different approach than for a
 capture/output device.

All this is internal to the device driver, which can hide it from the 
application.

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center




RE: Mem2Mem V4L2 devices [RFC]

2009-10-06 Thread Marek Szyprowski
Hello,

On Monday, October 05, 2009 8:27 PM Hiremath, Vaibhav wrote:

   [Hiremath, Vaibhav] IMO, this implementation is not a streaming model,
   we are trying to fit mem-to-mem forcefully to streaming.
 
  Why does this not fit streaming? I see no problems with streaming
  over a mem2mem device with only one video node. You just queue input
  and output buffers (they are distinguished by the 'type' parameter) on
  the same video node.
 
 [Hiremath, Vaibhav] Do we create separate queues of buffers based on type? I
 think we don't.

Why not? I really see no problems implementing such a driver, especially if
this heavily increases the number of use cases where such a device can be
used.

 App1    App2    App3   ...   AppN
   |       |       |            |
   ------------------------------
                 |
            /dev/video0
                 |
           Resizer Driver
 
 Everyone will be doing STREAMON, and in the normal use case every application
 must be getting buffers from another module (another driver, codecs, DSP,
 etc...) in multiple streams 0, 1, 2, 3, 4 ... N

Right.

 Every application will start streaming with (mostly) fixed scaling factor 
 which mostly never changes.

Right. The driver can store the scaling factors and other parameters in the 
private data of each opened instance of the /dev/video0
device.
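
For illustration, roughly how a driver could keep such per-open parameters
using the standard file->private_data idiom; the structure and field names
below are made up:

/* Sketch: every open() of /dev/video0 gets its own parameter block. */
#include <linux/fs.h>
#include <linux/slab.h>
#include <linux/types.h>

struct resizer_instance {
        unsigned int src_width, src_height;   /* per-instance scaling setup */
        unsigned int dst_width, dst_height;
        u32          pixelformat;
        /* per-instance input/output buffer queues would live here too */
};

static int resizer_open(struct inode *inode, struct file *file)
{
        struct resizer_instance *inst;

        inst = kzalloc(sizeof(*inst), GFP_KERNEL);
        if (!inst)
                return -ENOMEM;

        file->private_data = inst;    /* S_FMT, QBUF etc. operate on this */
        return 0;
}

static int resizer_release(struct inode *inode, struct file *file)
{
        kfree(file->private_data);
        return 0;
}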

 This one-video-node approach is possible only with the constraint that the
 application will always queue only 2 buffers, one with CAPTURE and one with
 OUTPUT type. It has to wait till the first/second gets finished; you can't
 queue multiple buffers (input and output) simultaneously.

Why do you think you cannot queue multiple buffers? IMHO you can perfectly
well queue more than one input buffer, then queue the same number of output
buffers, and the device will process all of them.
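
A small hypothetical sketch of that usage, assuming n buffers of each type
have already been set up with VIDIOC_REQBUFS and mmap:

/* Sketch: queue several input/output pairs, then collect the results. */
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

static void run_batch(int fd, unsigned int n)
{
        struct v4l2_buffer buf;
        enum v4l2_buf_type out = V4L2_BUF_TYPE_VIDEO_OUTPUT;
        enum v4l2_buf_type cap = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        unsigned int i;

        for (i = 0; i < n; i++) {               /* n source frames ... */
                memset(&buf, 0, sizeof(buf));
                buf.type = out;
                buf.memory = V4L2_MEMORY_MMAP;
                buf.index = i;
                ioctl(fd, VIDIOC_QBUF, &buf);
        }
        for (i = 0; i < n; i++) {               /* ... and n result buffers */
                memset(&buf, 0, sizeof(buf));
                buf.type = cap;
                buf.memory = V4L2_MEMORY_MMAP;
                buf.index = i;
                ioctl(fd, VIDIOC_QBUF, &buf);
        }

        ioctl(fd, VIDIOC_STREAMON, &out);
        ioctl(fd, VIDIOC_STREAMON, &cap);

        for (i = 0; i < n; i++) {               /* each DQBUF returns one result */
                memset(&buf, 0, sizeof(buf));
                buf.type = cap;
                buf.memory = V4L2_MEMORY_MMAP;
                ioctl(fd, VIDIOC_DQBUF, &buf);
        }
}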

 I do agree here with you that we need to investigate whether we really have
 such a use case. Does it make sense to put such a constraint on the
 application?

What constraint?

 What is the impact? Again, in the case of down-scaling, the application may
 want to use the same buffer as input, which is easily possible with the
 single-node approach.

Right. But take into account that down-scaling is the one special case in
which the operation can be performed in-place. Usually all other types of
operations (like color space conversion or rotation) require 2 buffers. Please
note that having only one video node would not mean that all operations must
be done in-place. As Ivan stated, you can perfectly well queue 2 separate
input and output buffers into the one video node and the driver can handle
this correctly.

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center



RE: Mem2Mem V4L2 devices [RFC]

2009-10-06 Thread Marek Szyprowski
Hello,

On Monday, October 05, 2009 10:02 PM Karicheri, Muralidharan wrote:

 There is another use case where there are two resizer hardware blocks working
 on the same input frame and giving two different output frames of different
 resolutions. How do we handle this using the one-video-device approach you
 just described here?

How is the hardware actually designed? I see two possibilities:

1.
[input buffer] --[dma engine]--> [resizer1] --[dma]--> [mem output buffer1]
                              \--> [resizer2] --[dma]--> [mem output buffer2]

2.
[input buffer] --[dma engine1]--> [resizer1] --[dma]--> [mem output buffer1]
               \--[dma engine2]--> [resizer2] --[dma]--> [mem output buffer2]

In the first case we would really have problems mapping it properly to video
nodes. But we should think about whether there are any use cases for such a
design (in terms of a mem2mem device). I know that this Y-type design makes
sense as a part of the pipeline from a sensor or decoder device, but I cannot
find any useful use case for a mem2mem version of it.

The second case is much more trivial. One can just create two separate resizer
devices (with their own nodes) or one resizer driver with two hardware
resizers underneath it. In both cases the application would simply queue the
input buffer 2 times for both transactions.

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center




RE: Mem2Mem V4L2 devices [RFC]

2009-10-06 Thread Ivan T. Ivanov
Hi, 

On Tue, 2009-10-06 at 08:23 +0200, Marek Szyprowski wrote:
 Hello,
 
 On Monday, October 05, 2009 8:27 PM Hiremath, Vaibhav wrote:
 
[Hiremath, Vaibhav] IMO, this implementation is not streaming
   model, we are trying to fit mem-to-mem
forcefully to streaming.
  
   Why this does not fit streaming? I see no problems with streaming
   over mem2mem device with only one video node. You just queue input
   and output buffers (they are distinguished by 'type' parameter) on
   the same video node.
  
  [Hiremath, Vaibhav] Do we create separate queue of buffers based on type? I 
  think we don't.
 
 Why not? I really see no problems implementing such driver, especially if 
 this heavily increases the number of use cases where such
 device can be used.
 
   App1    App2    App3   ...   AppN
     |       |       |            |
     ------------------------------
                   |
              /dev/video0
                   |
             Resizer Driver
  
  Everyone will be doing streamon, and in normal use case every application 
  must be getting buffers from
  another module (another driver, codecs, DSP, etc...) in multiple streams, 
  0, 1,2,3,4N
 
 Right.
 
  Every application will start streaming with (mostly) fixed scaling factor 
  which mostly never changes.
 
 Right. The driver can store the scaling factors and other parameters in the 
 private data of each opened instance of the /dev/video0
 device.
 
  This one video node approach is possible only with constraint that, the 
  application will always queue
  only 2 buffers with one CAPTURE and one with OUTPUT type. He has to wait 
  till first/second gets
  finished, you can't queue multiple buffers (input and output) 
  simultaneously.
 
 Why do you think you cannot queue multiple buffers? IMHO can perfectly queue 
 more than one input buffer, then queue the same number
 of output buffers and then the device will process all the buffers.
 
  I do agree here with you that we need to investigate on whether we really 
  have such use-case. Does it
  make sense to put such constraint on application?
 
 What constraint?
 
  What is the impact? Again in case of down-scaling,
  application may want to use same buffer as input, which is easily possible 
  with single node approach.
 
 Right. But take into account that down-scaling is the one special case in 
 which the operation can be performed in-place. Usually all
 other types of operations (like color space conversion or rotation) require 2 
 buffers. Please note that having only one video node
 would not mean that all operations must be done in-place. As Ivan stated you 
 can perfectly queue 2 separate input and output buffers
 into the one video node and the driver can handle this correctly.
 

 I agree with you, Marek.

 Can I make one suggestion? As we all know, some hardware can do in-place
 processing. I think it would not be too bad if the user puts the same buffer
 as input and output, or with some spare space between the start addresses of
 input and output. From the driver's point of view there is no difference; it
 will see 2 different buffers. In this case we can also save the time spent on
 mapping virtual to physical addresses.

 But in general, I think separate input and output buffers (even overlapped)
 and a single device node will simplify the design and implementation of such
 drivers. This will also be clearer and more easily manageable from the user
 space point of view.
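
A small sketch of what Ivan describes (hypothetical; it assumes the hardware
and driver allow in-place operation and USERPTR buffers): the same user memory
is queued once as the OUTPUT (source) buffer and once as the CAPTURE
(destination) buffer, so the driver simply sees two buffers that happen to
share memory.

/* Sketch: in-place processing by queuing the same user memory twice. */
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

static void queue_in_place(int fd, void *frame, size_t frame_size)
{
        struct v4l2_buffer buf;

        memset(&buf, 0, sizeof(buf));
        buf.memory    = V4L2_MEMORY_USERPTR;
        buf.m.userptr = (unsigned long)frame;
        buf.length    = frame_size;
        buf.index     = 0;

        buf.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;    /* source view of the frame */
        ioctl(fd, VIDIOC_QBUF, &buf);

        buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;   /* destination view, same memory */
        ioctl(fd, VIDIOC_QBUF, &buf);
}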

 iivanov



 Best regards
 --
 Marek Szyprowski
 Samsung Poland R&D Center
 


RE: Mem2Mem V4L2 devices [RFC]

2009-10-06 Thread Hiremath, Vaibhav

 -Original Message-
 From: Marek Szyprowski [mailto:m.szyprow...@samsung.com]
 Sent: Tuesday, October 06, 2009 11:53 AM
 To: Hiremath, Vaibhav; 'Ivan T. Ivanov'; linux-media@vger.kernel.org
 Cc: kyungmin.p...@samsung.com; Tomasz Fujak; Pawel Osciak; Marek
 Szyprowski
 Subject: RE: Mem2Mem V4L2 devices [RFC]
 
 Hello,
 
 On Monday, October 05, 2009 8:27 PM Hiremath, Vaibhav wrote:
 
   [Hiremath, Vaibhav] IMO, this implementation is not a streaming
   model, we are trying to fit mem-to-mem forcefully to streaming.
  
   Why does this not fit streaming? I see no problems with streaming
   over a mem2mem device with only one video node. You just queue input
   and output buffers (they are distinguished by the 'type' parameter)
   on the same video node.
  
  [Hiremath, Vaibhav] Do we create separate queues of buffers based
  on type? I think we don't.
 
 Why not? I really see no problems implementing such a driver,
 especially if this heavily increases the number of use cases where
 such a device can be used.
 
[Hiremath, Vaibhav] I thought about it and you are correct, it should be
possible. I was kind of biased and thinking in only one direction. Now I don't
see any reason why we should go for the 2-device-node approach. Earlier I was
thinking of 2 device nodes for the 2 queues; if it is possible with one device
node then I think we should align to the single-device-node approach.

Do you see any issues with it?

Thanks,
Vaibhav

  App1    App2    App3   ...   AppN
    |       |       |            |
    ------------------------------
                  |
             /dev/video0
                  |
            Resizer Driver
 
  Everyone will be doing streamon, and in normal use case every
 application must be getting buffers from
  another module (another driver, codecs, DSP, etc...) in multiple
 streams, 0, 1,2,3,4N
snip
 case in which the operation can be performed in-place. Usually all
 other types of operations (like color space conversion or rotation)
 require 2 buffers. Please note that having only one video node
 would not mean that all operations must be done in-place. As Ivan
 stated you can perfectly queue 2 separate input and output buffers
 into the one video node and the driver can handle this correctly.
 
 Best regards
 --
 Marek Szyprowski
 Samsung Poland R&D Center
 



RE: Mem2Mem V4L2 devices [RFC]

2009-10-05 Thread Hiremath, Vaibhav

 -Original Message-
 From: linux-media-ow...@vger.kernel.org [mailto:linux-media-
 ow...@vger.kernel.org] On Behalf Of Ivan T. Ivanov
 Sent: Friday, October 02, 2009 9:55 PM
 To: Marek Szyprowski
 Cc: linux-media@vger.kernel.org; kyungmin.p...@samsung.com; Tomasz
 Fujak; Pawel Osciak
 Subject: Re: Mem2Mem V4L2 devices [RFC]
 
 
 Hi Marek,
 
 
 On Fri, 2009-10-02 at 13:45 +0200, Marek Szyprowski wrote:
  Hello,
 
snip

  image format and size, while the existing v4l2 ioctls would only
 refer
  to the output buffer. Frankly speaking, we don't like this idea.
 
  I think it is not unusual for one video device to declare that it can
  support input and output operation at the same time.
  
  Let's take a resizer device as an example. It is always possible that it
  informs the user space application that
  
  struct v4l2_capability.capabilities ==
    (V4L2_CAP_VIDEO_CAPTURE | V4L2_CAP_VIDEO_OUTPUT)
  
  The user can issue an S_FMT ioctl supplying
  
  struct v4l2_format.type = V4L2_BUF_TYPE_VIDEO_CAPTURE
                     .pix = width x height
  
  which will instruct this device to prepare its output for this
  resolution. After that the user can issue an S_FMT ioctl supplying
  
  struct v4l2_format.type = V4L2_BUF_TYPE_VIDEO_OUTPUT
                     .pix = width x height
  
  Using only these ioctls should be enough for the device driver
  to know the down/up scale factor required.
  
  Regarding color space, struct v4l2_pix_format has the field 'pixelformat',
  which can be used to define the input and output buffer contents.
  So using only the existing ioctls the user can have a working resizer device.
  
  Also please note that there is VIDIOC_S_CROP, which can add the additional
  flexibility of cropping on the input or output.
 
[Hiremath, Vaibhav] I think this makes more sense in capture pipeline, for 
example,

Sensor/decoder -> previewer -> resizer -> /dev/videoX


 The last thing which should be done is to QBUF 2 buffers and call
 STREAMON.
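
Putting the sequence quoted above into code, a hypothetical sketch against a
single-node resizer (the resolutions and pixel format are arbitrary;
VIDIOC_REQBUFS/mmap of the two buffers and error handling are left out):

/* Sketch: configure both sides of a single-node resizer with S_FMT,
 * then queue one buffer of each type and start streaming. */
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

static void setup_resizer(int fd)
{
        struct v4l2_format fmt;
        enum v4l2_buf_type type;

        memset(&fmt, 0, sizeof(fmt));
        fmt.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;          /* source side */
        fmt.fmt.pix.width       = 1280;
        fmt.fmt.pix.height      = 960;
        fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_UYVY;
        ioctl(fd, VIDIOC_S_FMT, &fmt);

        fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;         /* scaled result */
        fmt.fmt.pix.width  = 640;
        fmt.fmt.pix.height = 480;
        ioctl(fd, VIDIOC_S_FMT, &fmt);                  /* driver derives the scale factor */

        /* ... VIDIOC_REQBUFS + VIDIOC_QBUF for one OUTPUT and one CAPTURE buffer ... */

        type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
        ioctl(fd, VIDIOC_STREAMON, &type);
        type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        ioctl(fd, VIDIOC_STREAMON, &type);
}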
 
[Hiremath, Vaibhav] IMO, this implementation is not a streaming model, we are
trying to fit mem-to-mem forcefully to streaming. We have to put some
constraints -

- Driver will treat index 0 as the input always, irrespective of the number of
buffers queued.
- Or, the application should not queue more than 2 buffers.
- Multi-channel use-case

I think we have to have 2 device nodes which are capable of streaming multiple
buffers, with both queuing the buffers. The constraint would be that the
buffers must be mapped one-to-one.

The user layer library would be important here and play a major role in
supporting the multi-channel feature. I think we need to do some more
investigation on this.

Thanks,
Vaibhav

 I think this will simplify buffer synchronization a lot.
 
 iivanov
 
 
 
   2. Input and output in the same video node would not be compatible with
   the upcoming media controller, with which we will get an ability to
   arrange devices into a custom pipeline. Piping together two separate
   input-output nodes to create a new mem2mem device would be difficult and
   unintuitive. And that is not even considering multi-output devices.
 
   My idea is to get back to the 2-video-nodes-per-device approach and
   introduce a new ioctl for matching input and output instances of the
   same device. When such an ioctl could be called is another question. I
   like the idea of restricting such a call to be issued after opening
   the video nodes and before using them. Using this ioctl, a user
   application would be able to match an output instance to an input one,
   by matching their corresponding file descriptors.
 
  What do you think of such a solution?
 
  Best regards
  --
  Marek Szyprowski
  Samsung Poland R&D Center
 
 


RE: Mem2Mem V4L2 devices [RFC]

2009-10-05 Thread Marek Szyprowski
Hello,

On Friday, October 02, 2009 6:25 PM Ivan T. Ivanov wrote:

 On Fri, 2009-10-02 at 13:45 +0200, Marek Szyprowski wrote:
  Hello,
 
  During the V4L2 mini-summit and the Media Controller RFC discussion on
  Linux Plumbers 2009 Conference a mem2mem video device has been mentioned
  a few times (usually in a context of a 'resizer device' which might be a
  part of Camera interface pipeline or work as a standalone device). We
  are doing a research how our custom video/multimedia drivers can fit
  into the V4L2 framework. Most of our multimedia devices work in mem2mem
  mode.
 
  I did a quick research and I found that currently in the V4L2 framework
  there is no device that processes video data in a memory-to-memory
  model. In terms of V4L2 framework such device would be both video sink
  and source at the same time. The main problem is how the video nodes
  (/dev/videoX) should be assigned to such a device.
 
  The simplest way of implementing mem2mem device in v4l2 framework would
  use two video nodes (one for input and one for output). Such an idea has
  been already suggested on V4L2 mini-summit. Each DMA engine (either
  input or output) that is available in the hardware should get its own
  video node. In this approach an application can write() source image to
  for example /dev/video0 and then read the processed output from for
  example /dev/video1. Source and destination format/params/other custom
  settings also can be easily set for either source or destination node.
  Besides a single image, user applications can also process video streams
  by calling stream_on(), qbuf() + dqbuf(), stream_off() simultaneously on
  both video nodes.
 
  This approach has a limitation however. As user applications would have
  to open 2 different file descriptors to perform the processing of a
  single image, the v4l2 driver would need to match read() calls done on
  one file descriptor with write() calls from the another. The same thing
  would happen with buffers enqueued with qbuf(). In practice, this would
  result in a driver that allows only one instance of /dev/video0 as well
  as /dev/video1 opened. Otherwise, it would not be possible to track
  which opened /dev/video0 instance matches which /dev/video1 one.
 
  The real limitation of this approach is the fact, that it is hardly
  possible to implement multi-instance support and application
  multiplexing on a video device. In a typical embedded system, in
  contrast to most video-source-only or video-sink-only devices, a mem2mem
  device is very often used by more than one application at a time. Be it
  either simple one-shot single video frame processing or stream
  processing. Just consider that the 'resizer' module might be used in
  many applications for scaling bitmaps (xserver video subsystem,
  gstreamer, jpeglib, etc) only.
 
  At the first glance one might think that implementing multi-instance
  support should be done in a userspace daemon instead of mem2mem drivers.
  However I have run into problems designing such a user space daemon.
  Usually, video buffers are passed to v4l2 device as a user pointer or
  are mmaped directly from the device. The main issue that cannot be
  easily resolved is passing video buffers from the client application to
  the daemon. The daemon would queue a request on the device and return
  results back to the client application after a transaction is finished.
  Passing userspace pointers between an application and the daemon cannot
  be done, as they are two different processes. Mmap-type buffers are
  similar in this aspect - at least 2 buffer copy operations are required
  (from client application to device input buffers mmaped in daemon's
  memory and then from device output buffers to client application).
  Buffer copying and process context switches add both latency and
  additional cpu workload. In our custom drivers for mem2mem multimedia
  devices we implemented a queue shared between all instances of an opened
  mem2mem device. Each instance is assigned to an open device file
  descriptor. The queue is serviced in the device context, thus maximizing
  the device throughput. This is achieved by scheduling the next
  transaction in the driver (kernel) context. This may not even require a
  context switch at all.
 
  Do you have any ideas how would this solution fit into the current v4l2
  design?
 
  Another solution that came into my mind that would not suffer from this
  limitation is to use the same video node for both writing input buffers
  and reading output buffers (or queuing both input and output buffers).
  Such a design causes more problems with the current v4l2 design however:
 
  1. How to set different color space or size for input and output buffer
  each? It could be solved by adding a set of ioctls to get/set source
  image format and size, while the existing v4l2 ioctls would only refer
  to the output buffer. Frankly speaking, we don't like this idea.
 
 I think that is not 

RE: Mem2Mem V4L2 devices [RFC]

2009-10-05 Thread Marek Szyprowski
Hello,

On Monday, October 05, 2009 7:43 AM Hiremath, Vaibhav wrote:

  In terms of the V4L2 framework such a device would be both a video sink
  and source at the same time. The main problem is how the video nodes
  (/dev/videoX) should be assigned to such a device.
 
  The simplest way of implementing a mem2mem device in the v4l2 framework
  would use two video nodes (one for input and one for output). Such an idea
  has been already suggested on the V4L2 mini-summit.
 [Hiremath, Vaibhav] We discussed 2 options during the summit,
 
 1) Only one video device node, and configuring parameters using
 V4L2_BUF_TYPE_VIDEO_CAPTURE for the input parameter and
 V4L2_BUF_TYPE_VIDEO_OUTPUT for the output parameter.
 
 2) 2 separate video device nodes, one with V4L2_BUF_TYPE_VIDEO_CAPTURE and
 another with V4L2_BUF_TYPE_VIDEO_OUTPUT, as mentioned by you.
 
 The obvious and preferred option would be 2, because with option 1 we could
 not achieve real streaming. And again we would have to put a constraint on
 the application for a fixed input buffer index.

What do you mean by real streaming?

 
  This approach has a limitation however. As user applications would have
  to open 2 different file descriptors to perform the processing of a
  single image, the v4l2 driver would need to match read() calls done on
  one file descriptor with write() calls from the other. The same thing
  would happen with buffers enqueued with qbuf(). In practice, this would
  result in a driver that allows only one instance of /dev/video0 as well
  as /dev/video1 opened. Otherwise, it would not be possible to track
  which opened /dev/video0 instance matches which /dev/video1 one.
 
 [Hiremath, Vaibhav] Please note that we must put one limitation on the
 application: the buffers in both the video nodes are mapped one-to-one.
 This means that,
 
 Video0 (input)    Video1 (output)
 Index-0       ==  index-0
 Index-1       ==  index-1
 Index-2       ==  index-2
 
 Do you see any other option to this? I think this constraint is obvious from
 the application point of view during streaming.

This is correct. Every application should queue a corresponding output buffer
for each queued input buffer.
NOTE that this whole discussion is about how to make it possible to have 2
different applications running at the same time, each of them queuing their
own input and output buffers. It will look somehow like this:

Video0 (input)  Video1 (output)
App1, Index-0   == App1, index-0
App2, Index-0   == App2, index-0
App1, Index-1   == App1, index-1
App2, Index-1   == App2, index-1
App1, Index-2   == App1, index-2
App2, Index-2   == App2, index-2

Note, that the absolute order of the queue/dequeue might be different, but each 
application should get the right output buffer,
which corresponds to the queued input buffer.

 [Hiremath, Vaibhav] Initially I thought of having a separate queue in the
 driver which tries to make maximum usage of the underlying hardware. The
 application will just queue the buffers and call STREAMON; the driver
 internally queues them in its own queue and issues a resize operation (in
 this case) for all the queued buffers, releasing them one by one to the
 application. We have a similar implementation internally, but not with the
 standard V4L2 framework; it uses custom IOCTLs for everything.

This is similar to what we have currently, however we want to move all our 
custom drivers into the generic kernel frameworks.

 But when we decided to provide a user-space library with the media
 controller, I thought of moving this burden to the application layer. The
 application library will create an interface, queue the buffers and call
 STREAMON for all the buffers queued.
 
 Do you see any loopholes here? Am I missing any use-case scenario?

How do you want to pass buffers from your client applications through the user 
space library to the video nodes?

  Such a design causes more problems with the current v4l2 design
  however:
 
  1. How to set a different color space or size for the input and output
  buffer each? It could be solved by adding a set of ioctls to get/set the
  source image format and size, while the existing v4l2 ioctls would only
  refer to the output buffer. Frankly speaking, we don't like this idea.
 
  2. Input and output in the same video node would not be compatible with
  the upcoming media controller, with which we will get an ability to
  arrange devices into a custom pipeline. Piping together two separate
  input-output nodes to create a new mem2mem device would be difficult and
  unintuitive. And that is not even considering multi-output devices.
 
 [Hiremath, Vaibhav] Irrespective of the 2 options I mentioned before, the
 media controller will come into the picture, either for custom parameter
 configuration or for creating/deleting links.
 
 We are only discussing buffer queue/de-queue and input/output parameter
 configuration, and this has to happen

RE: Mem2Mem V4L2 devices [RFC]

2009-10-05 Thread Marek Szyprowski
Hello,

On Monday, October 05, 2009 7:59 AM Hiremath, Vaibhav wrote:

 -Original Message-
 From: linux-media-ow...@vger.kernel.org 
 [mailto:linux-media-ow...@vger.kernel.org] On Behalf Of
 Hiremath, Vaibhav
 Sent: Monday, October 05, 2009 7:59 AM
 To: Ivan T. Ivanov; Marek Szyprowski
 Cc: linux-media@vger.kernel.org; kyungmin.p...@samsung.com; Tomasz Fujak; 
 Pawel Osciak
 Subject: RE: Mem2Mem V4L2 devices [RFC]
 
 
  -Original Message-
  From: linux-media-ow...@vger.kernel.org [mailto:linux-media-
  ow...@vger.kernel.org] On Behalf Of Ivan T. Ivanov
  Sent: Friday, October 02, 2009 9:55 PM
  To: Marek Szyprowski
  Cc: linux-media@vger.kernel.org; kyungmin.p...@samsung.com; Tomasz
  Fujak; Pawel Osciak
  Subject: Re: Mem2Mem V4L2 devices [RFC]
 
 
  Hi Marek,
 
 
  On Fri, 2009-10-02 at 13:45 +0200, Marek Szyprowski wrote:
   Hello,
  
 snip
 
   image format and size, while the existing v4l2 ioctls would only
  refer
   to the output buffer. Frankly speaking, we don't like this idea.
 
  I think that is not unusual one video device to define that it can
  support at the same time input and output operation.
 
  Lets take as example resizer device. it is always possible that it
  inform user space application that
 
  struct v4l2_capability.capabilities ==
  (V4L2_CAP_VIDEO_CAPTURE | V4L2_CAP_VIDEO_OUTPUT)
 
  User can issue S_FMT ioctl supplying
 
  struct v4l2_format.type = V4L2_BUF_TYPE_VIDEO_CAPTURE
.pix  = width x height
 
  which will instruct this device to prepare its output for this
  resolution. after that user can issue S_FMT ioctl supplying
 
  struct v4l2_format.type = V4L2_BUF_TYPE_VIDEO_OUTPUT
.pix  = width x height
 
  using only these ioctls should be enough to device driver
  to know down/up scale factor required.
 
  regarding color space struct v4l2_pix_format have field
  'pixelformat'
  which can be used to define input and output buffers content.
  so using only existing ioctl's user can have working resizer device.
 
  also please note that there is VIDIOC_S_CROP which can add
  additional
  flexibility of adding cropping on input or output.
 
 [Hiremath, Vaibhav] I think this makes more sense in capture pipeline, for 
 example,
 
 Sensor/decoder -> previewer -> resizer -> /dev/videoX
 

I don't get this. In a strictly capture pipeline we will get one video node
anyway.

However the question is how we should support a bit more complicated pipeline.

Just consider a resizer module and the pipeline:

sensor/decoder -[bus]-> previewer -> [memory] -> resizer -> [memory]

([bus] means some kind of internal bus that is completely independent of the
system memory)

Mapping to video nodes is not so trivial. In fact this pipeline consists of 2
independent (sub)pipelines connected by the user space application:

sensor/decoder -[bus]-> previewer -> [memory] -[user application]-> [memory]
-> resizer -> [memory]

For further analysis it should be cut into 2 separate pipelines:

a. sensor/decoder -[bus]-> previewer -> [memory]
b. [memory] -> resizer -> [memory]

Again, mapping the first subpipeline is trivial:

sensor/decoder -[bus]-> previewer -> /dev/video0

But the last one can be mapped either as:

/dev/video1 -> resizer -> /dev/video1
(one-video-node approach)

or

/dev/video1 -> resizer -> /dev/video2
(2-video-nodes approach).


So in the end the pipeline would look like this:

sensor/decoder -[bus]-> previewer -> /dev/video0 -[user application]->
/dev/video1 -> resizer -> /dev/video2

or

sensor/decoder -[bus]-> previewer -> /dev/video0 -[user application]->
/dev/video1 -> resizer -> /dev/video1

  last thing which should be done is to QBUF 2 buffers and call
  STREAMON.
 
 [Hiremath, Vaibhav] IMO, this implementation is not streaming model, we are 
 trying to fit mem-to-mem
 forcefully to streaming.

Why does this not fit streaming? I see no problems with streaming over a
mem2mem device with only one video node. You just queue input and output
buffers (they are distinguished by the 'type' parameter) on the same video
node.

 We have to put some constraints -
 
   - Driver will treat index 0 as input always, irrespective of number of 
 buffers queued.
   - Or, the application should not queue more than 2 buffers.
   - Multi-channel use-case
 
 I think we have to have 2 device nodes which are capable of streaming 
 multiple buffers, both are
 queuing the buffers.

In the one-video-node approach there can be 2 buffer queues in one video node,
for input and output respectively.

 The constraint would be the buffers must be mapped one-to-one.

Right, each queued input buffer must have a corresponding output buffer.

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center




RE: Mem2Mem V4L2 devices [RFC]

2009-10-05 Thread Ivan T. Ivanov

Hi, 

On Mon, 2009-10-05 at 15:54 +0200, Marek Szyprowski wrote:
 Hello,
 
 On Friday, October 02, 2009 6:25 PM Ivan T. Ivanov wrote:
 
  On Fri, 2009-10-02 at 13:45 +0200, Marek Szyprowski wrote:
   Hello,
  
   During the V4L2 mini-summit and the Media Controller RFC discussion on
   Linux Plumbers 2009 Conference a mem2mem video device has been mentioned
   a few times (usually in a context of a 'resizer device' which might be a
   part of Camera interface pipeline or work as a standalone device). We
   are doing a research how our custom video/multimedia drivers can fit
   into the V4L2 framework. Most of our multimedia devices work in mem2mem
   mode.
  
   I did a quick research and I found that currently in the V4L2 framework
   there is no device that processes video data in a memory-to-memory
   model. In terms of V4L2 framework such device would be both video sink
   and source at the same time. The main problem is how the video nodes
   (/dev/videoX) should be assigned to such a device.
  
   The simplest way of implementing mem2mem device in v4l2 framework would
   use two video nodes (one for input and one for output). Such an idea has
   been already suggested on V4L2 mini-summit. Each DMA engine (either
   input or output) that is available in the hardware should get its own
   video node. In this approach an application can write() source image to
   for example /dev/video0 and then read the processed output from for
   example /dev/video1. Source and destination format/params/other custom
   settings also can be easily set for either source or destination node.
   Besides a single image, user applications can also process video streams
   by calling stream_on(), qbuf() + dqbuf(), stream_off() simultaneously on
   both video nodes.
  
   This approach has a limitation however. As user applications would have
   to open 2 different file descriptors to perform the processing of a
   single image, the v4l2 driver would need to match read() calls done on
   one file descriptor with write() calls from the another. The same thing
   would happen with buffers enqueued with qbuf(). In practice, this would
   result in a driver that allows only one instance of /dev/video0 as well
   as /dev/video1 opened. Otherwise, it would not be possible to track
   which opened /dev/video0 instance matches which /dev/video1 one.
  
   The real limitation of this approach is the fact, that it is hardly
   possible to implement multi-instance support and application
   multiplexing on a video device. In a typical embedded system, in
   contrast to most video-source-only or video-sink-only devices, a mem2mem
   device is very often used by more than one application at a time. Be it
   either simple one-shot single video frame processing or stream
   processing. Just consider that the 'resizer' module might be used in
   many applications for scaling bitmaps (xserver video subsystem,
   gstreamer, jpeglib, etc) only.
  
   At the first glance one might think that implementing multi-instance
   support should be done in a userspace daemon instead of mem2mem drivers.
   However I have run into problems designing such a user space daemon.
   Usually, video buffers are passed to v4l2 device as a user pointer or
   are mmaped directly from the device. The main issue that cannot be
   easily resolved is passing video buffers from the client application to
   the daemon. The daemon would queue a request on the device and return
   results back to the client application after a transaction is finished.
   Passing userspace pointers between an application and the daemon cannot
   be done, as they are two different processes. Mmap-type buffers are
   similar in this aspect - at least 2 buffer copy operations are required
   (from client application to device input buffers mmaped in daemon's
   memory and then from device output buffers to client application).
   Buffer copying and process context switches add both latency and
   additional cpu workload. In our custom drivers for mem2mem multimedia
   devices we implemented a queue shared between all instances of an opened
   mem2mem device. Each instance is assigned to an open device file
   descriptor. The queue is serviced in the device context, thus maximizing
   the device throughput. This is achieved by scheduling the next
   transaction in the driver (kernel) context. This may not even require a
   context switch at all.
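
Purely to illustrate that shared-queue idea (this is not actual driver code;
every name below is made up), transactions submitted by all open instances
could land in one device-wide FIFO that is then serviced in device order:

#include <stdio.h>

/* One job per requested transaction, tagged with the instance
 * (open file descriptor) that queued it. */
struct m2m_job {
    int instance_id;
    int src_buf, dst_buf;
    struct m2m_job *next;
};

/* One queue per device, shared by all instances. */
struct m2m_device {
    struct m2m_job *head, *tail;
};

static void job_queue(struct m2m_device *dev, struct m2m_job *job)
{
    job->next = NULL;
    if (dev->tail)
        dev->tail->next = job;
    else
        dev->head = job;
    dev->tail = job;
}

/* Would be called from the driver context (e.g. on transaction-done),
 * picking the next transaction regardless of which instance queued it. */
static struct m2m_job *job_dequeue(struct m2m_device *dev)
{
    struct m2m_job *job = dev->head;

    if (job) {
        dev->head = job->next;
        if (!dev->head)
            dev->tail = NULL;
    }
    return job;
}

int main(void)
{
    struct m2m_device dev = { 0 };
    struct m2m_job jobs[3] = {
        { .instance_id = 1, .src_buf = 0, .dst_buf = 0 },
        { .instance_id = 2, .src_buf = 0, .dst_buf = 0 },
        { .instance_id = 1, .src_buf = 1, .dst_buf = 1 },
    };
    struct m2m_job *job;

    for (int i = 0; i < 3; i++)
        job_queue(&dev, &jobs[i]);

    /* Transactions from both instances are interleaved on one queue. */
    while ((job = job_dequeue(&dev)))
        printf("run: instance %d, src %d -> dst %d\n",
               job->instance_id, job->src_buf, job->dst_buf);
    return 0;
}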
  
   Do you have any ideas how would this solution fit into the current v4l2
   design?
  
   Another solution that came into my mind that would not suffer from this
   limitation is to use the same video node for both writing input buffers
   and reading output buffers (or queuing both input and output buffers).
   Such a design causes more problems with the current v4l2 design however:
  
   1. How to set different color space or size for input and output buffer
   each? It could be solved by adding a set of ioctls to get/set source
   image format 

RE: Mem2Mem V4L2 devices [RFC]

2009-10-05 Thread Hiremath, Vaibhav

 -Original Message-
 From: Marek Szyprowski [mailto:m.szyprow...@samsung.com]
 Sent: Monday, October 05, 2009 7:26 PM
 To: Hiremath, Vaibhav; linux-media@vger.kernel.org
 Cc: kyungmin.p...@samsung.com; Tomasz Fujak; Pawel Osciak; Marek
 Szyprowski
 Subject: RE: Mem2Mem V4L2 devices [RFC]
 
 Hello,
 
 On Monday, October 05, 2009 7:43 AM Hiremath, Vaibhav wrote:
 
  In terms of V4L2 framework such device would be both video
   sink
   and source at the same time. The main problem is how the video
 nodes
   (/dev/videoX) should be assigned to such a device.
  
   The simplest way of implementing mem2mem device in v4l2
 framework
   would
   use two video nodes (one for input and one for output). Such an
 idea
   has
   been already suggested on V4L2 mini-summit.
  [Hiremath, Vaibhav] We discussed 2 options during summit,
 
  1) Only one video device node, and configuring parameters using
 V4L2_BUF_TYPE_VIDEO_CAPTURE for input
  parameter and V4L2_BUF_TYPE_VIDEO_OUTPUT for output parameter.
 
  2) 2 separate video device node, one with
 V4L2_BUF_TYPE_VIDEO_CAPTURE and another with
  V4L2_BUF_TYPE_VIDEO_OUTPUT, as mentioned by you.
 
   The obvious and preferred option would be 2, because with option 1
  we would not be able to achieve real
   streaming. And again we would have to put a constraint on the application for
  a fixed input buffer index.
 
 What do you mean by real streaming?
 
[Hiremath, Vaibhav] I meant that after streamon there will be just a sequence of 
queuing and de-queuing of buffers. With single-node operation, how do we 
decide which is the input buffer and which one is the output? We would have to 
assume, or put a constraint on the application, that the 0th index is always the 
input, irrespective of the number of buffers requested. 

In a normal scenario (for example codecs), the application will open the 
device once and start pumping buffers; the driver should queue the buffers as 
and when they arrive.

 
   This approach has a limitation however. As user applications
 would
   have
   to open 2 different file descriptors to perform the processing
 of a
   single image, the v4l2 driver would need to match read() calls
 done
   on
   one file descriptor with write() calls from the another. The
 same
   thing
   would happen with buffers enqueued with qbuf(). In practice,
 this
   would
   result in a driver that allows only one instance of /dev/video0
 as
   well
   as /dev/video1 opened. Otherwise, it would not be possible to
 track
   which opened /dev/video0 instance matches which /dev/video1 one.
  
   [Hiremath, Vaibhav] Please note that we must put one limitation on the
  application: the buffers on
   both the video nodes are mapped one-to-one. This means that
 
  Video0 (input)  Video1 (output)
  Index-0 == index-0
  Index-1 == index-1
  Index-2 == index-2
 
   Do you see any other option to this? I think this constraint is
  obvious from the application's point of view
   during streaming.
 
 This is correct. Every application should queue a corresponding
 output buffer for each queued input buffer.
  NOTE that this whole discussion is about how to make it possible to have
  2 different applications running at the same time, each of them
  queuing their own input and output buffers. It will look somewhat
  like this:
  
  Video0 (input)        Video1 (output)
 App1, Index-0 == App1, index-0
 App2, Index-0 == App2, index-0
 App1, Index-1 == App1, index-1
 App2, Index-1 == App2, index-1
 App1, Index-2 == App1, index-2
 App2, Index-2 == App2, index-2
 
 Note, that the absolute order of the queue/dequeue might be
 different, but each application should get the right output buffer,
 which corresponds to the queued input buffer.
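
A rough userspace illustration of that pairing, assuming mmap buffers and
two separate nodes; the REQBUFS/mmap/STREAMON setup is omitted and none of
this is an existing driver's API, just a sketch:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Queue source buffer 'index' on the input node and the matching
 * destination buffer 'index' on the output node, then wait for the
 * processed frame. The driver must hand back the capture buffer that
 * corresponds to the input queued by this particular instance. */
static int process_frame(int in_fd, int out_fd, int index)
{
    struct v4l2_buffer buf;

    memset(&buf, 0, sizeof(buf));
    buf.type   = V4L2_BUF_TYPE_VIDEO_OUTPUT;   /* app writes into this node */
    buf.memory = V4L2_MEMORY_MMAP;
    buf.index  = index;
    if (ioctl(in_fd, VIDIOC_QBUF, &buf) < 0)
        return -1;

    memset(&buf, 0, sizeof(buf));
    buf.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;  /* app reads from this node */
    buf.memory = V4L2_MEMORY_MMAP;
    buf.index  = index;                        /* one-to-one index mapping */
    if (ioctl(out_fd, VIDIOC_QBUF, &buf) < 0)
        return -1;

    memset(&buf, 0, sizeof(buf));
    buf.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf.memory = V4L2_MEMORY_MMAP;
    return ioctl(out_fd, VIDIOC_DQBUF, &buf);
}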
 
[Hiremath, Vaibhav] We have to create separate queues for every device open 
call. It would be difficult/complex for the driver to maintain a special queue 
for requests from a number of applications.

   [Hiremath, Vaibhav] Initially I thought of having a separate queue
   in the driver which tries to make maximum usage of the underlying
   hardware. The application will just queue the buffers and call
   streamon; the driver internally queues them in its own queue and
   issues a resize operation (in this case) for all the queued buffers,
   releasing them one-by-one to the application. We have a similar
   implementation internally, but not with the standard V4L2 framework;
   it uses custom IOCTLs for everything.
 
 This is similar to what we have currently, however we want to move
 all our custom drivers into the generic kernel frameworks.
 
   But when we decided to provide a user space library with the media
   controller, I thought of moving this burden to the application layer.
   The application library will create an interface and a queue, and call
   streamon for all the buffers queued.
 
  Do you see any loopholes here? Am I missing any use-case scenario?
 
 How do you want to pass buffers from your client

RE: Mem2Mem V4L2 devices [RFC]

2009-10-05 Thread Hiremath, Vaibhav
 -Original Message-
 From: Marek Szyprowski [mailto:m.szyprow...@samsung.com]
 Sent: Monday, October 05, 2009 7:26 PM
 To: Hiremath, Vaibhav; 'Ivan T. Ivanov'; linux-media@vger.kernel.org
 Cc: kyungmin.p...@samsung.com; Tomasz Fujak; Pawel Osciak; Marek
 Szyprowski
 Subject: RE: Mem2Mem V4L2 devices [RFC]
 
 Hello,
 
 On Monday, October 05, 2009 7:59 AM Hiremath, Vaibhav wrote:
 
  -Original Message-
  From: linux-media-ow...@vger.kernel.org [mailto:linux-media-
 ow...@vger.kernel.org] On Behalf Of
  Hiremath, Vaibhav
  Sent: Monday, October 05, 2009 7:59 AM
  To: Ivan T. Ivanov; Marek Szyprowski
  Cc: linux-media@vger.kernel.org; kyungmin.p...@samsung.com; Tomasz
 Fujak; Pawel Osciak
  Subject: RE: Mem2Mem V4L2 devices [RFC]
 
 
   -Original Message-
   From: linux-media-ow...@vger.kernel.org [mailto:linux-media-
   ow...@vger.kernel.org] On Behalf Of Ivan T. Ivanov
   Sent: Friday, October 02, 2009 9:55 PM
   To: Marek Szyprowski
   Cc: linux-media@vger.kernel.org; kyungmin.p...@samsung.com;
 Tomasz
   Fujak; Pawel Osciak
   Subject: Re: Mem2Mem V4L2 devices [RFC]
  
  
   Hi Marek,
  
  
   On Fri, 2009-10-02 at 13:45 +0200, Marek Szyprowski wrote:
Hello,
   
  snip
 
image format and size, while the existing v4l2 ioctls would
 only
   refer
to the output buffer. Frankly speaking, we don't like this
 idea.
  
   I think that is not unusual one video device to define that it
 can
   support at the same time input and output operation.
  
   Lets take as example resizer device. it is always possible that
 it
   inform user space application that
  
   struct v4l2_capability.capabilities ==
 (V4L2_CAP_VIDEO_CAPTURE | V4L2_CAP_VIDEO_OUTPUT)
  
   User can issue S_FMT ioctl supplying
  
   struct v4l2_format.type = V4L2_BUF_TYPE_VIDEO_CAPTURE
   .pix  = width x height
  
   which will instruct this device to prepare its output for this
   resolution. after that user can issue S_FMT ioctl supplying
  
   struct v4l2_format.type = V4L2_BUF_TYPE_VIDEO_OUTPUT
   .pix  = width x height
  
   using only these ioctls should be enough to device driver
   to know down/up scale factor required.
  
   regarding color space struct v4l2_pix_format have field
   'pixelformat'
   which can be used to define input and output buffers content.
   so using only existing ioctl's user can have working resizer
 device.
  
   also please note that there is VIDIOC_S_CROP which can add
   additional
   flexibility of adding cropping on input or output.
  
  [Hiremath, Vaibhav] I think this makes more sense in capture
 pipeline, for example,
 
  Sensor/decoder - previewer - resizer - /dev/videoX
 
 
 I don't get this. In strictly capture pipeline we will get one video
 node anyway.
 
 However the question is how we should support a bit more complicated
 pipeline.
 
 Just consider a resizer module and the pipeline:
 
 sensor/decoder -[bus]- previewer - [memory] - resizer - [memory]
 
[Hiremath, Vaibhav] For me this is not a single pipeline; it has two separate 
links - 

1) sensor/decoder -[bus]- previewer - [memory]

2) [memory] - resizer - [memory]


  ([bus] means some kind of internal bus that is completely
  independent of the system memory)
 
  Mapping to video nodes is not so trivial. In fact this pipeline
  consists of 2 independent (sub)pipelines connected by a user space
  application:
 
 sensor/decoder -[bus]- previewer - [memory] -[user application]-
 [memory] - resizer - [memory]
 
 For further analysis it should be cut into 2 separate pipelines:
 
 a. sensor/decoder -[bus]- previewer - [memory]
 b. [memory] - resizer - [memory]
 
[Hiremath, Vaibhav] Correct, though I wouldn't call them sub-pipelines. The application 
is linking them, so from the driver's point of view they are completely separate.

 Again, mapping the first subpipeline is trivial:
 
 sensor/decoder -[bus]- previewer - /dev/video0
 
[Hiremath, Vaibhav] Correct, it is separate streaming device.

 But the last, can be mapped either as:
 
 /dev/video1 - resizer - /dev/video1
 (one video node approach)
 
[Hiremath, Vaibhav] Please go through my last response where I have mentioned 
about buffer queuing constraints with this approach.

 or
 
 /dev/video1 - resizer - /dev/video2
 (2 video nodes approach).
 
 
 So at the end the pipeline would look like this:
 
 sensor/decoder -[bus]- previewer - /dev/video0 -[user
 application]- /dev/video1 - resizer - /dev/video2
 
 or
 
 sensor/decoder -[bus]- previewer - /dev/video0 -[user
 application]- /dev/video1 - resizer - /dev/video1
 
   last thing which should be done is to QBUF 2 buffers and call
   STREAMON.
  
   [Hiremath, Vaibhav] IMO, this implementation is not a streaming
  model; we are trying to force mem-to-mem
   into streaming.
 
  Why does this not fit streaming? I see no problems with streaming
  over a mem2mem device with only one video node. You just queue input
  and output buffers (they are distinguished by the 'type' parameter) on
  the same

RE: Mem2Mem V4L2 devices [RFC]

2009-10-05 Thread Ivan T. Ivanov

Hi Vaibhav,

On Mon, 2009-10-05 at 23:57 +0530, Hiremath, Vaibhav wrote:
  -Original Message-
  From: Marek Szyprowski [mailto:m.szyprow...@samsung.com]
  Sent: Monday, October 05, 2009 7:26 PM
  To: Hiremath, Vaibhav; 'Ivan T. Ivanov'; linux-media@vger.kernel.org
  Cc: kyungmin.p...@samsung.com; Tomasz Fujak; Pawel Osciak; Marek
  Szyprowski
  Subject: RE: Mem2Mem V4L2 devices [RFC]
  
  Hello,
  
  On Monday, October 05, 2009 7:59 AM Hiremath, Vaibhav wrote:
  
   -Original Message-
   From: linux-media-ow...@vger.kernel.org [mailto:linux-media-
  ow...@vger.kernel.org] On Behalf Of
   Hiremath, Vaibhav
   Sent: Monday, October 05, 2009 7:59 AM
   To: Ivan T. Ivanov; Marek Szyprowski
   Cc: linux-media@vger.kernel.org; kyungmin.p...@samsung.com; Tomasz
  Fujak; Pawel Osciak
   Subject: RE: Mem2Mem V4L2 devices [RFC]
  
  
-Original Message-
From: linux-media-ow...@vger.kernel.org [mailto:linux-media-
ow...@vger.kernel.org] On Behalf Of Ivan T. Ivanov
Sent: Friday, October 02, 2009 9:55 PM
To: Marek Szyprowski
Cc: linux-media@vger.kernel.org; kyungmin.p...@samsung.com;
  Tomasz
Fujak; Pawel Osciak
Subject: Re: Mem2Mem V4L2 devices [RFC]
   
   
Hi Marek,
   
   
On Fri, 2009-10-02 at 13:45 +0200, Marek Szyprowski wrote:
 Hello,

   snip
  
 image format and size, while the existing v4l2 ioctls would
  only
refer
 to the output buffer. Frankly speaking, we don't like this
  idea.
   
I think that is not unusual one video device to define that it
  can
support at the same time input and output operation.
   
Lets take as example resizer device. it is always possible that
  it
inform user space application that
   
struct v4l2_capability.capabilities ==
(V4L2_CAP_VIDEO_CAPTURE | V4L2_CAP_VIDEO_OUTPUT)
   
User can issue S_FMT ioctl supplying
   
struct v4l2_format.type = V4L2_BUF_TYPE_VIDEO_CAPTURE
  .pix  = width x height
   
which will instruct this device to prepare its output for this
resolution. after that user can issue S_FMT ioctl supplying
   
struct v4l2_format.type = V4L2_BUF_TYPE_VIDEO_OUTPUT
  .pix  = width x height
   
using only these ioctls should be enough to device driver
to know down/up scale factor required.
   
regarding color space struct v4l2_pix_format have field
'pixelformat'
which can be used to define input and output buffers content.
so using only existing ioctl's user can have working resizer
  device.
   
also please note that there is VIDIOC_S_CROP which can add
additional
flexibility of adding cropping on input or output.
   
   [Hiremath, Vaibhav] I think this makes more sense in capture
  pipeline, for example,
  
   Sensor/decoder - previewer - resizer - /dev/videoX
  
  
  I don't get this. In strictly capture pipeline we will get one video
  node anyway.
  
  However the question is how we should support a bit more complicated
  pipeline.
  
  Just consider a resizer module and the pipeline:
  
  sensor/decoder -[bus]- previewer - [memory] - resizer - [memory]
  
 [Hiremath, Vaibhav] For me this is not single pipeline, it has two separate 
 links - 
 
 1) sensor/decoder -[bus]- previewer - [memory]
 
 2) [memory] - resizer - [memory]
 
 
   ([bus] means some kind of internal bus that is completely
   independent of the system memory)
  
  Mapping to video nodes is not so trivial. In fact this pipeline
  consist of 2 independent (sub)pipelines connected by user space
  application:
  
  sensor/decoder -[bus]- previewer - [memory] -[user application]-
  [memory] - resizer - [memory]
  
  For further analysis it should be cut into 2 separate pipelines:
  
  a. sensor/decoder -[bus]- previewer - [memory]
  b. [memory] - resizer - [memory]
  
 [Hiremath, Vaibhav] Correct, I wouldn't call them as sub-pipeline. 
 Application is linking them, so from driver point of view they are completely 
 separate.
 
  Again, mapping the first subpipeline is trivial:
  
  sensor/decoder -[bus]- previewer - /dev/video0
  
 [Hiremath, Vaibhav] Correct, it is separate streaming device.
 
  But the last, can be mapped either as:
  
  /dev/video1 - resizer - /dev/video1
  (one video node approach)
  
 [Hiremath, Vaibhav] Please go through my last response where I have mentioned 
 about buffer queuing constraints with this approach.
 
  or
  
  /dev/video1 - resizer - /dev/video2
  (2 video nodes approach).
  
  
  So at the end the pipeline would look like this:
  
  sensor/decoder -[bus]- previewer - /dev/video0 -[user
  application]- /dev/video1 - resizer - /dev/video2
  
  or
  
  sensor/decoder -[bus]- previewer - /dev/video0 -[user
  application]- /dev/video1 - resizer - /dev/video1
  
last thing which should be done is to QBUF 2 buffers and call
STREAMON.
   
   [Hiremath, Vaibhav] IMO, this implementation is not streaming
  model, we are trying to fit mem-to-mem

RE: Mem2Mem V4L2 devices [RFC]

2009-10-05 Thread Hiremath, Vaibhav

 -Original Message-
 From: Ivan T. Ivanov [mailto:iiva...@mm-sol.com]
 Sent: Tuesday, October 06, 2009 12:27 AM
 To: Hiremath, Vaibhav
 Cc: Marek Szyprowski; linux-media@vger.kernel.org;
 kyungmin.p...@samsung.com; Tomasz Fujak; Pawel Osciak
 Subject: RE: Mem2Mem V4L2 devices [RFC]
 
 
snip
 last thing which should be done is to QBUF 2 buffers and
 call
 STREAMON.

[Hiremath, Vaibhav] IMO, this implementation is not streaming
   model, we are trying to fit mem-to-mem
forcefully to streaming.
  
   Why does this not fit streaming? I see no problems with streaming
   over a mem2mem device with only one video node. You just queue input
   and output buffers (they are distinguished by the 'type' parameter) on
   the same video node.
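
A minimal sketch of that single-node usage from userspace; the buffer
indices, the mmap memory type and the omitted REQBUFS/STREAMON setup are
all assumptions made for illustration:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* One open file descriptor; source and destination buffers are queued on
 * it and told apart only by the buffer type. */
static int resize_one_frame(int fd)
{
    struct v4l2_buffer buf;

    /* Source frame: queued with the OUTPUT type. */
    memset(&buf, 0, sizeof(buf));
    buf.type   = V4L2_BUF_TYPE_VIDEO_OUTPUT;
    buf.memory = V4L2_MEMORY_MMAP;
    buf.index  = 0;
    if (ioctl(fd, VIDIOC_QBUF, &buf) < 0)
        return -1;

    /* Destination frame: queued with the CAPTURE type on the same fd. */
    memset(&buf, 0, sizeof(buf));
    buf.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf.memory = V4L2_MEMORY_MMAP;
    buf.index  = 0;
    if (ioctl(fd, VIDIOC_QBUF, &buf) < 0)
        return -1;

    /* Dequeue the processed result as a CAPTURE buffer on the same fd. */
    memset(&buf, 0, sizeof(buf));
    buf.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf.memory = V4L2_MEMORY_MMAP;
    return ioctl(fd, VIDIOC_DQBUF, &buf);
}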
  
   [Hiremath, Vaibhav] Do we create separate queues of buffers based
  on type? I think we don't.
  
   App1    App2    App3   ...   AppN
     |       |       |            |
     ------------------------------
                   |
              /dev/video0
                   |
             Resizer Driver
 
   Why not? They can be per-file-handle input/output queues, and we
   can do time-shared use of the resizer driver like Marek suggests.
 
[Hiremath, Vaibhav] Ivan,
File-handle-based queues and buffer-type-based queues are two different things. 

Yes, definitely we have to create separate queues for each file handle to 
support multiple channels. But my question was about the buffer types, CAPTURE and 
OUTPUT.

Thanks,
Vaibhav

 
 
   Everyone will be doing streamon, and in the normal use case every
  application must be getting buffers from another module (another
  driver, codecs, DSP, etc.) in multiple streams: 0, 1, 2, 3, 4 ... N.
  
   Every application will start streaming with a (mostly) fixed scaling
  factor which rarely changes. This one-video-node approach is
  possible only with the constraint that the application always
  queues only 2 buffers, one with CAPTURE and one with OUTPUT type.
 
  I don't see how the 2-device-node approach can help with this case.
  Even with a normal video capture device you should stop streaming
  when changing buffer sizes.
 
  He has to wait till first/second gets finished, you can't queue
 multiple buffers (input and output) simultaneously.
 
 actually this should be possible.
 
 iivanov
 
 
   I do agree with you that we need to investigate whether we
  really have such a use case. Does it make sense to put such a constraint
  on the application? What is the impact? Again, in the case of down-scaling,
  the application may want to use the same buffer as input, which is easily
  possible with the single-node approach.
 
  Thanks,
  Vaibhav
 
We have to put some constraints -
   
- Driver will treat index 0 as input always,
 irrespective of
   number of buffers queued.
- Or, application should not queue more that 2 buffers.
- Multi-channel use-case
   
I think we have to have 2 device nodes which are capable of
   streaming multiple buffers, both are
queuing the buffers.
  
   In one video node approach there can be 2 buffer queues in one
 video
   node, for input and output respectively.
  
The constraint would be the buffers must be mapped one-to-one.
  
   Right, each queued input buffer must have corresponding output
   buffer.
  
   Best regards
   --
   Marek Szyprowski
   Samsung Poland RD Center
  
  
 
 



RE: Mem2Mem V4L2 devices [RFC]

2009-10-05 Thread Ivan T. Ivanov
On Tue, 2009-10-06 at 00:31 +0530, Hiremath, Vaibhav wrote:
  -Original Message-
  From: Ivan T. Ivanov [mailto:iiva...@mm-sol.com]
  Sent: Tuesday, October 06, 2009 12:27 AM
  To: Hiremath, Vaibhav
  Cc: Marek Szyprowski; linux-media@vger.kernel.org;
  kyungmin.p...@samsung.com; Tomasz Fujak; Pawel Osciak
  Subject: RE: Mem2Mem V4L2 devices [RFC]
  
  
 snip
  last thing which should be done is to QBUF 2 buffers and
  call
  STREAMON.
 
 [Hiremath, Vaibhav] IMO, this implementation is not streaming
model, we are trying to fit mem-to-mem
 forcefully to streaming.
   
Why this does not fit streaming? I see no problems with
  streaming
over mem2mem device with only one video node. You just queue
  input
and output buffers (they are distinguished by 'type' parameter)
  on
the same video node.
   
   [Hiremath, Vaibhav] Do we create separate queue of buffers based
  on type? I think we don't.
  
   App1  App2App3... AppN
 ||  |   | |
  ---
 |
 /dev/video0
 |
 Resizer Driver
  
   why not? they can be per file handler input/output queue. and we
   can do time sharing use of resizer driver like Marek suggests.
  
 [Hiremath, Vaibhav] Ivan,
 File handle based queue and buffer type based queue are two different terms. 

really? ;)

 
 Yes, definitely we have to create separate queues for each file handle to 
 support multiple channels. But my question was for buffer type, CAPTURE and 
 OUTPUT.
 

Let me see. Your concern is that for very big frames (1X Mpix), managing
separate buffers for input and output will be a waste of space
for operations like down-scaling. I know that such operations can be
done in-place ;). But what about up-scaling? This should also
be possible, but with some very dirty hacks.

iivanov

 Thanks,
 Vaibhav
 
  
  
   Everyone will be doing streamon, and in normal use case every
  application must be getting buffers from another module (another
  driver, codecs, DSP, etc...) in multiple streams, 0, 1,2,3,4N
  
   Every application will start streaming with (mostly) fixed scaling
  factor which mostly never changes. This one video node approach is
  possible only with constraint that, the application will always
  queue only 2 buffers with one CAPTURE and one with OUTPUT type.
  
  i don't see how 2 device node approach can help with this case.
  even in normal video capture device you should stop streaming
  when change buffer sizes.
  
   He has to wait till first/second gets finished, you can't queue
  multiple buffers (input and output) simultaneously.
  
  actually this should be possible.
  
  iivanov
  
  
   I do agree here with you that we need to investigate on whether we
  really have such use-case. Does it make sense to put such constraint
  on application? What is the impact? Again in case of down-scaling,
  application may want to use same buffer as input, which is easily
  possible with single node approach.
  
   Thanks,
   Vaibhav
  
 We have to put some constraints -

   - Driver will treat index 0 as input always,
  irrespective of
number of buffers queued.
   - Or, application should not queue more that 2 buffers.
   - Multi-channel use-case

 I think we have to have 2 device nodes which are capable of
streaming multiple buffers, both are
 queuing the buffers.
   
In one video node approach there can be 2 buffer queues in one
  video
node, for input and output respectively.
   
 The constraint would be the buffers must be mapped one-to-one.
   
Right, each queued input buffer must have corresponding output
buffer.
   
Best regards
--
Marek Szyprowski
Samsung Poland RD Center
   
   
  
  
 



RE: Mem2Mem V4L2 devices [RFC]

2009-10-05 Thread Hiremath, Vaibhav


Thanks,
Vaibhav Hiremath
Platform Support Products
Texas Instruments Inc
Ph: +91-80-25099927

 -Original Message-
 From: Ivan T. Ivanov [mailto:iiva...@mm-sol.com]
 Sent: Tuesday, October 06, 2009 12:39 AM
 To: Hiremath, Vaibhav
 Cc: Marek Szyprowski; linux-media@vger.kernel.org;
 kyungmin.p...@samsung.com; Tomasz Fujak; Pawel Osciak
 Subject: RE: Mem2Mem V4L2 devices [RFC]
 
 On Tue, 2009-10-06 at 00:31 +0530, Hiremath, Vaibhav wrote:
   -Original Message-
   From: Ivan T. Ivanov [mailto:iiva...@mm-sol.com]
   Sent: Tuesday, October 06, 2009 12:27 AM
   To: Hiremath, Vaibhav
   Cc: Marek Szyprowski; linux-media@vger.kernel.org;
   kyungmin.p...@samsung.com; Tomasz Fujak; Pawel Osciak
   Subject: RE: Mem2Mem V4L2 devices [RFC]
  
  
  snip
   last thing which should be done is to QBUF 2 buffers and
   call
   STREAMON.
  
  [Hiremath, Vaibhav] IMO, this implementation is not
 streaming
 model, we are trying to fit mem-to-mem
  forcefully to streaming.

 Why this does not fit streaming? I see no problems with
   streaming
 over mem2mem device with only one video node. You just queue
   input
 and output buffers (they are distinguished by 'type'
 parameter)
   on
 the same video node.

[Hiremath, Vaibhav] Do we create separate queue of buffers
 based
   on type? I think we don't.
   
App1App2App3... AppN
  |  |  |   | |
   ---
|
/dev/video0
|
Resizer Driver
  
why not? they can be per file handler input/output queue. and
 we
can do time sharing use of resizer driver like Marek suggests.
  
  [Hiremath, Vaibhav] Ivan,
  File handle based queue and buffer type based queue are two
 different terms.
 
 really? ;)
 
 
  Yes, definitely we have to create separate queues for each file
 handle to support multiple channels. But my question was for buffer
 type, CAPTURE and OUTPUT.
 
 
  Let me see. Your concern is that for very big frames (1X Mpix),
  managing separate buffers for input and output will be a waste of
  space for operations like down-scaling. I know that such operations
  can be done in-place ;). But what about up-scaling? This should also
  be possible, but with some very dirty hacks.
 
[Hiremath, Vaibhav] Dirty hacks??? 
I think for upscaling we have to have 2 separate buffers; I do not see any 
options here.

Thanks,
Vaibhav

 iivanov
 
  Thanks,
  Vaibhav
 
  
   
Everyone will be doing streamon, and in normal use case every
   application must be getting buffers from another module (another
   driver, codecs, DSP, etc...) in multiple streams, 0,
 1,2,3,4N
   
Every application will start streaming with (mostly) fixed
 scaling
   factor which mostly never changes. This one video node approach
 is
   possible only with constraint that, the application will always
   queue only 2 buffers with one CAPTURE and one with OUTPUT type.
  
   i don't see how 2 device node approach can help with this case.
   even in normal video capture device you should stop streaming
   when change buffer sizes.
  
He has to wait till first/second gets finished, you can't
 queue
   multiple buffers (input and output) simultaneously.
  
   actually this should be possible.
  
   iivanov
  
   
I do agree here with you that we need to investigate on
 whether we
   really have such use-case. Does it make sense to put such
 constraint
   on application? What is the impact? Again in case of down-
 scaling,
   application may want to use same buffer as input, which is
 easily
   possible with single node approach.
   
Thanks,
Vaibhav
   
  We have to put some constraints -
 
  - Driver will treat index 0 as input always,
   irrespective of
 number of buffers queued.
  - Or, application should not queue more that 2 buffers.
  - Multi-channel use-case
 
  I think we have to have 2 device nodes which are capable
 of
 streaming multiple buffers, both are
  queuing the buffers.

 In one video node approach there can be 2 buffer queues in
 one
   video
 node, for input and output respectively.

  The constraint would be the buffers must be mapped one-to-
 one.

 Right, each queued input buffer must have corresponding
 output
 buffer.

 Best regards
 --
 Marek Szyprowski
 Samsung Poland RD Center


   
  
 
 



RE: Mem2Mem V4L2 devices [RFC]

2009-10-05 Thread Karicheri, Muralidharan



 1. How to set different color space or size for input and output buffer
 each? It could be solved by adding a set of ioctls to get/set source
 image format and size, while the existing v4l2 ioctls would only refer
 to the output buffer. Frankly speaking, we don't like this idea.

I think that it is not unusual for one video device to define that it can
support input and output operation at the same time.

Let's take a resizer device as an example. It is always possible for it to
inform the user space application that

struct v4l2_capability.capabilities ==
   (V4L2_CAP_VIDEO_CAPTURE | V4L2_CAP_VIDEO_OUTPUT)

The user can issue an S_FMT ioctl supplying

struct v4l2_format.type = V4L2_BUF_TYPE_VIDEO_CAPTURE
                  .pix  = width x height

which will instruct this device to prepare its output for this
resolution. After that the user can issue an S_FMT ioctl supplying

struct v4l2_format.type = V4L2_BUF_TYPE_VIDEO_OUTPUT
                  .pix  = width x height

Using only these ioctls should be enough for the device driver
to know the down/up-scale factor required.

Regarding color space, struct v4l2_pix_format has the field 'pixelformat',
which can be used to define the input and output buffer contents.
So using only existing ioctls the user can have a working resizer device.

Also please note that there is VIDIOC_S_CROP, which can add the additional
flexibility of cropping on the input or output.

The last thing which should be done is to QBUF 2 buffers and call STREAMON.

I think this will simplify buffer synchronization a lot.
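
A minimal userspace sketch of that S_FMT sequence; the device path,
resolutions and pixel format below are made-up example values:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

int main(void)
{
    int fd = open("/dev/video0", O_RDWR);
    struct v4l2_format fmt;

    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Source side: the full-size frame the application will queue,
     * described with the OUTPUT buffer type. */
    memset(&fmt, 0, sizeof(fmt));
    fmt.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
    fmt.fmt.pix.width       = 1280;
    fmt.fmt.pix.height      = 960;
    fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_UYVY;
    if (ioctl(fd, VIDIOC_S_FMT, &fmt) < 0)
        perror("VIDIOC_S_FMT (output)");

    /* Destination side: the down-scaled frame, described with the
     * CAPTURE buffer type. The two width/height pairs together give
     * the driver the scaling factor. */
    memset(&fmt, 0, sizeof(fmt));
    fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    fmt.fmt.pix.width       = 640;
    fmt.fmt.pix.height      = 480;
    fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_UYVY;
    if (ioctl(fd, VIDIOC_S_FMT, &fmt) < 0)
        perror("VIDIOC_S_FMT (capture)");

    /* From here on: REQBUFS, QBUF one OUTPUT and one CAPTURE buffer,
     * then VIDIOC_STREAMON, as described above. */
    close(fd);
    return 0;
}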


Ivan,

There is another use case where two Resizer hardware blocks work on the 
same input frame and give two different output frames of different resolutions. 
How do we handle this using the one-video-device approach you
just described here?

Murali


RE: Mem2Mem V4L2 devices [RFC]

2009-10-05 Thread Ivan T. Ivanov
Hi, 


On Mon, 2009-10-05 at 15:02 -0500, Karicheri, Muralidharan wrote:
 
 
  1. How to set different color space or size for input and output buffer
  each? It could be solved by adding a set of ioctls to get/set source
  image format and size, while the existing v4l2 ioctls would only refer
  to the output buffer. Frankly speaking, we don't like this idea.
 
 I think that is not unusual one video device to define that it can
 support at the same time input and output operation.
 
 Lets take as example resizer device. it is always possible that it
 inform user space application that
 
 struct v4l2_capability.capabilities ==
  (V4L2_CAP_VIDEO_CAPTURE | V4L2_CAP_VIDEO_OUTPUT)
 
 User can issue S_FMT ioctl supplying
 
 struct v4l2_format.type = V4L2_BUF_TYPE_VIDEO_CAPTURE
.pix  = width x height
 
 which will instruct this device to prepare its output for this
 resolution. after that user can issue S_FMT ioctl supplying
 
 struct v4l2_format.type = V4L2_BUF_TYPE_VIDEO_OUTPUT
.pix  = width x height
 
 using only these ioctls should be enough to device driver
 to know down/up scale factor required.
 
 regarding color space struct v4l2_pix_format have field 'pixelformat'
 which can be used to define input and output buffers content.
 so using only existing ioctl's user can have working resizer device.
 
 also please note that there is VIDIOC_S_CROP which can add additional
 flexibility of adding cropping on input or output.
 
 last thing which should be done is to QBUF 2 buffers and call STREAMON.
 
 i think this will simplify a lot buffer synchronization.
 
 
 Ivan,
 
 There is another use case where there are two Resizer hardware working on the 
 same input frame and give two different output frames of different 
 resolution. How do we handle this using the one video device approach you
 just described here?

 What is the difference?
 
- You can have only one resizer device driver which hides the fact that 
  there are actually 2 hardware resizers; operations will just be
  faster ;).

- Or they are two device drivers (nodes) with similar characteristics.

In both cases the input buffer can be the same. 

iivanov



 
 Murali



RE: Mem2Mem V4L2 devices [RFC]

2009-10-05 Thread Karicheri, Muralidharan


 Ivan,

 There is another use case where there are two Resizer hardware working on
the same input frame and give two different output frames of different
resolution. How do we handle this using the one video device approach you
 just described here?

 what is the difference?

- you can have only one resizer device driver which will hide that
  they are actually 2 hardware resizers. just operations will be
  faster ;).


In your implementation as mentioned above, there will be one queue for the 
OUTPUT buffer type and another queue for the CAPTURE buffer type, right?
So if we have two Resizer outputs, then we would need two queues of the CAPTURE 
buffer type. When the application calls QBUF on the node, which queue will be used 
for the buffer? This makes me believe we need two capture nodes and one 
output node for this driver. 

- they are two device drivers (nodes) with similar characteristics.

in both cases input buffer can be the same.

iivanov




 Murali




RE: Mem2Mem V4L2 devices [RFC]

2009-10-04 Thread Hiremath, Vaibhav

 -Original Message-
 From: linux-media-ow...@vger.kernel.org [mailto:linux-media-
 ow...@vger.kernel.org] On Behalf Of Marek Szyprowski
 Sent: Friday, October 02, 2009 5:15 PM
 To: linux-media@vger.kernel.org
 Cc: kyungmin.p...@samsung.com; Tomasz Fujak; Pawel Osciak; Marek
 Szyprowski
 Subject: Mem2Mem V4L2 devices [RFC]
 
 Hello,
 
 During the V4L2 mini-summit and the Media Controller RFC discussion
 on
 Linux Plumbers 2009 Conference a mem2mem video device has been
 mentioned
 a few times (usually in a context of a 'resizer device' which might
 be a
 part of Camera interface pipeline or work as a standalone device).
 We
  are doing research on how our custom video/multimedia drivers can fit
 into the V4L2 framework. Most of our multimedia devices work in
 mem2mem
 mode.
 
 I did a quick research and I found that currently in the V4L2
 framework
 there is no device that processes video data in a memory-to-memory
 model. 
[Hiremath, Vaibhav] Yes, you are right; we do not have ready support available 
in the V4L2 framework.

In terms of V4L2 framework such device would be both video
 sink
 and source at the same time. The main problem is how the video nodes
 (/dev/videoX) should be assigned to such a device.
 
 The simplest way of implementing mem2mem device in v4l2 framework
 would
 use two video nodes (one for input and one for output). Such an idea
 has
 been already suggested on V4L2 mini-summit. 
[Hiremath, Vaibhav] We discussed 2 options during summit, 

1) Only one video device node, and configuring parameters using 
V4L2_BUF_TYPE_VIDEO_CAPTURE for input parameter and V4L2_BUF_TYPE_VIDEO_OUTPUT 
for output parameter.

2) 2 separate video device node, one with V4L2_BUF_TYPE_VIDEO_CAPTURE and 
another with V4L2_BUF_TYPE_VIDEO_OUTPUT, as mentioned by you.

The obvious and preferred option would be 2, because with option 1 we would not 
be able to achieve real streaming. And again we would have to put a constraint on 
the application for a fixed input buffer index.

 Each DMA engine (either
 input or output) that is available in the hardware should get its
 own
 video node. In this approach an application can write() source image
 to
 for example /dev/video0 and then read the processed output from for
 example /dev/video1. Source and destination format/params/other
 custom
 settings also can be easily set for either source or destination
 node.
 Besides a single image, user applications can also process video
 streams
 by calling stream_on(), qbuf() + dqbuf(), stream_off()
 simultaneously on
 both video nodes.
 
[Hiremath, Vaibhav] Correct.

 This approach has a limitation however. As user applications would
 have
 to open 2 different file descriptors to perform the processing of a
 single image, the v4l2 driver would need to match read() calls done
 on
 one file descriptor with write() calls from the another. The same
 thing
 would happen with buffers enqueued with qbuf(). In practice, this
 would
 result in a driver that allows only one instance of /dev/video0 as
 well
 as /dev/video1 opened. Otherwise, it would not be possible to track
 which opened /dev/video0 instance matches which /dev/video1 one.
 
[Hiremath, Vaibhav] Please note that we must put one limitation on the application: 
the buffers on both video nodes are mapped one-to-one. This means 
that 

Video0 (input)  Video1 (output)
Index-0 == index-0
Index-1 == index-1
Index-2 == index-2

Do you see any other option to this? I think this constraint is obvious from 
the application's point of view during streaming.

  The real limitation of this approach is the fact that it is hardly
 possible to implement multi-instance support and application
 multiplexing on a video device. In a typical embedded system, in
 contrast to most video-source-only or video-sink-only devices, a
 mem2mem
 device is very often used by more than one application at a time. Be
 it
 either simple one-shot single video frame processing or stream
 processing. Just consider that the 'resizer' module might be used in
 many applications for scaling bitmaps (xserver video subsystem,
 gstreamer, jpeglib, etc) only.
 
[Hiremath, Vaibhav] Correct.

 At the first glance one might think that implementing multi-instance
 support should be done in a userspace daemon instead of mem2mem
 drivers.
 However I have run into problems designing such a user space daemon.
 Usually, video buffers are passed to v4l2 device as a user pointer
 or
 are mmaped directly from the device. The main issue that cannot be
 easily resolved is passing video buffers from the client application
 to
 the daemon. The daemon would queue a request on the device and
 return
 results back to the client application after a transaction is
 finished.
 Passing userspace pointers between an application and the daemon
 cannot
 be done, as they are two different processes. Mmap-type buffers are
 similar in this aspect - at least 2 buffer copy operations are
 required
 (from client 

Re: Mem2Mem V4L2 devices [RFC]

2009-10-02 Thread Ivan T. Ivanov

Hi Marek, 


On Fri, 2009-10-02 at 13:45 +0200, Marek Szyprowski wrote:
 Hello,
 
 During the V4L2 mini-summit and the Media Controller RFC discussion on 
 Linux Plumbers 2009 Conference a mem2mem video device has been mentioned 
 a few times (usually in a context of a 'resizer device' which might be a 
 part of Camera interface pipeline or work as a standalone device). We 
 are doing research on how our custom video/multimedia drivers can fit 
 into the V4L2 framework. Most of our multimedia devices work in mem2mem 
 mode. 
 
 I did a quick research and I found that currently in the V4L2 framework 
 there is no device that processes video data in a memory-to-memory 
 model. In terms of V4L2 framework such device would be both video sink 
 and source at the same time. The main problem is how the video nodes 
 (/dev/videoX) should be assigned to such a device. 
 
 The simplest way of implementing mem2mem device in v4l2 framework would 
 use two video nodes (one for input and one for output). Such an idea has 
 been already suggested on V4L2 mini-summit. Each DMA engine (either 
 input or output) that is available in the hardware should get its own 
 video node. In this approach an application can write() source image to 
 for example /dev/video0 and then read the processed output from for 
 example /dev/video1. Source and destination format/params/other custom 
 settings also can be easily set for either source or destination node. 
 Besides a single image, user applications can also process video streams 
 by calling stream_on(), qbuf() + dqbuf(), stream_off() simultaneously on 
 both video nodes. 
 
 This approach has a limitation however. As user applications would have 
 to open 2 different file descriptors to perform the processing of a 
 single image, the v4l2 driver would need to match read() calls done on 
 one file descriptor with write() calls from the another. The same thing 
 would happen with buffers enqueued with qbuf(). In practice, this would 
 result in a driver that allows only one instance of /dev/video0 as well 
 as /dev/video1 opened. Otherwise, it would not be possible to track 
 which opened /dev/video0 instance matches which /dev/video1 one. 
 
 The real limitation of this approach is the fact that it is hardly 
 possible to implement multi-instance support and application 
 multiplexing on a video device. In a typical embedded system, in 
 contrast to most video-source-only or video-sink-only devices, a mem2mem 
 device is very often used by more than one application at a time. Be it 
 either simple one-shot single video frame processing or stream 
 processing. Just consider that the 'resizer' module might be used in 
 many applications for scaling bitmaps (xserver video subsystem, 
 gstreamer, jpeglib, etc) only. 
 
 At the first glance one might think that implementing multi-instance 
 support should be done in a userspace daemon instead of mem2mem drivers. 
 However I have run into problems designing such a user space daemon. 
 Usually, video buffers are passed to v4l2 device as a user pointer or 
 are mmaped directly from the device. The main issue that cannot be 
 easily resolved is passing video buffers from the client application to 
 the daemon. The daemon would queue a request on the device and return 
 results back to the client application after a transaction is finished. 
 Passing userspace pointers between an application and the daemon cannot 
 be done, as they are two different processes. Mmap-type buffers are 
 similar in this aspect - at least 2 buffer copy operations are required 
 (from client application to device input buffers mmaped in daemon's 
 memory and then from device output buffers to client application). 
 Buffer copying and process context switches add both latency and 
 additional cpu workload. In our custom drivers for mem2mem multimedia 
 devices we implemented a queue shared between all instances of an opened 
 mem2mem device. Each instance is assigned to an open device file 
 descriptor. The queue is serviced in the device context, thus maximizing 
 the device throughput. This is achieved by scheduling the next 
 transaction in the driver (kernel) context. This may not even require a 
 context switch at all. 
 
 Do you have any ideas how would this solution fit into the current v4l2 
 design? 
 
 Another solution that came into my mind that would not suffer from this 
 limitation is to use the same video node for both writing input buffers 
 and reading output buffers (or queuing both input and output buffers). 
 Such a design causes more problems with the current v4l2 design however: 
 
 1. How to set different color space or size for input and output buffer 
 each? It could be solved by adding a set of ioctls to get/set source 
 image format and size, while the existing v4l2 ioctls would only refer 
 to the output buffer. Frankly speaking, we don't like this idea. 

I think that is not unusual one video device to define that it can
support at