Re: [Linaro-mm-sig] [RFC] Synchronizing access to buffers shared with dma-buf between drivers/devices

2012-06-09 Thread Clark, Rob
On Fri, Jun 8, 2012 at 3:56 PM, Erik Gilling konk...@android.com wrote:
 I guess my other thought is that implicit vs explicit is not
 mutually exclusive, though I'd guess there'd be interesting
 deadlocks to have to debug if both were in use _at the same
 time_. :-)

 I think this is an approach worth investigating.  I'd like a way to
 either opt out of implicit sync or have a way to check if a dma-buf
 has an attached fence and detach it.  Actually, that could work really
 well. Consider:

 * Each dma_buf has a single fence slot
 * on submission
   * the driver will extract the fence from the dma_buf and queue a wait on it.
   * the driver will replace that fence with its own completion
 fence before the job submission ioctl returns.
 * dma_buf will have two userspace ioctls:
   * DETACH: will return the fence as an FD to userspace and clear the
 fence slot in the dma_buf
   * ATTACH: takes a fence FD from userspace and attaches it to the
 dma_buf fence slot.  Returns an error if the fence slot is non-empty.

 In the android case, we can do a detach after every submission and an
 attach right before.

btw, I like this idea for implicit and explicit sync to coexist

BR,
-R
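
A rough sketch of the proposed per-buffer fence slot, from userspace's point of view. Nothing here is an existing kernel ABI -- the ioctl numbers, the struct and the helper are invented purely to illustrate the detach-before-submit / attach-after-submit flow described above:

#include <stdint.h>
#include <sys/ioctl.h>

struct dma_buf_fence_arg {
	int32_t fence_fd;	/* fence passed around as a file descriptor */
	uint32_t flags;		/* unused in this sketch */
};

/* invented ioctl numbers, for illustration only */
#define DMA_BUF_IOCTL_DETACH_FENCE _IOR('b', 0x40, struct dma_buf_fence_arg)
#define DMA_BUF_IOCTL_ATTACH_FENCE _IOW('b', 0x41, struct dma_buf_fence_arg)

/* Android-style explicit sync wrapped around one job submission */
static int submit_with_explicit_sync(int dmabuf_fd, int (*submit)(int in_fence))
{
	struct dma_buf_fence_arg arg = { .fence_fd = -1 };
	int out_fence;

	/* take over the implicit fence, emptying the dma_buf's fence slot */
	if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_DETACH_FENCE, &arg) < 0)
		return -1;

	/* queue the job; the driver waits on arg.fence_fd internally */
	out_fence = submit(arg.fence_fd);

	/* put our completion fence back so implicit-sync users still work */
	arg.fence_fd = out_fence;
	return ioctl(dmabuf_fd, DMA_BUF_IOCTL_ATTACH_FENCE, &arg);
}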


Re: [Linaro-mm-sig] [PATCH 3/3] dma_buf: Add documentation for the new cpu access support

2012-03-05 Thread Clark, Rob
On Fri, Mar 2, 2012 at 6:23 PM, Sakari Ailus sakari.ai...@iki.fi wrote:
 Hi Daniel,

 Thanks for the patch.

 On Thu, Mar 01, 2012 at 04:36:01PM +0100, Daniel Vetter wrote:
 Signed-off-by: Daniel Vetter daniel.vet...@ffwll.ch
 ---
  Documentation/dma-buf-sharing.txt |  102 +++-
  1 files changed, 99 insertions(+), 3 deletions(-)

 diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt
 index 225f96d..f12542b 100644
 --- a/Documentation/dma-buf-sharing.txt
 +++ b/Documentation/dma-buf-sharing.txt
 @@ -32,8 +32,12 @@ The buffer-user
  *IMPORTANT*: [see https://lkml.org/lkml/2011/12/20/211 for more details]
  For this first version, A buffer shared using the dma_buf sharing API:
 - *may* be exported to user space using mmap *ONLY* by exporter, outside of
 -   this framework.
 -- may be used *ONLY* by importers that do not need CPU access to the buffer.
 +  this framework.
 +- with this new iteration of the dma-buf api cpu access from the kernel has been
 +  enabled, see below for the details.
 +
 +dma-buf operations for device dma only
 +--

  The dma_buf buffer sharing API usage contains the following steps:

 @@ -219,7 +223,99 @@ NOTES:
     If the exporter chooses not to allow an attach() operation once a
     map_dma_buf() API has been called, it simply returns an error.

 -Miscellaneous notes:
 +Kernel cpu access to a dma-buf buffer object
 +
 +
 +The motivation to allow cpu access from the kernel to a dma-buf object from the
 +importer's side are:
 +- fallback operations, e.g. if the device is connected to a usb bus and the
 +  kernel needs to shuffle the data around first before sending it away.
 +- full transparency for existing users on the importer side, i.e. userspace
 +  should not notice the difference between a normal object from that subsystem
 +  and an imported one backed by a dma-buf. This is really important for drm
 +  opengl drivers that expect to still use all the existing upload/download
 +  paths.
 +
 +Access to a dma_buf from the kernel context involves three steps:
 +
 +1. Prepare access, which invalidates any necessary caches and makes the object
 +   available for cpu access.
 +2. Access the object page-by-page with the dma_buf map apis
 +3. Finish access, which will flush any necessary cpu caches and free reserved
 +   resources.

 Where should it be decided which operations are being done to the buffer
 when it is passed to user space and back to kernel space?

 How about splitting these operations into those done the first time the
 buffer is passed to user space (mapping to kernel address space, for
 example) and those required every time the buffer is passed from kernel to user
 and back (cache flushing)?

 I'm asking since any unnecessary time-consuming operations, especially as
 heavy as mapping the buffer, should be avoidable in subsystems dealing
 with streaming video, cameras etc., i.e. non-GPU users.


Well, this is really something for the buffer exporter to deal with..
since there is no way for an importer to create a userspace mmap'ing
of the buffer.  A lot of these expensive operations go away if you
don't even create a userspace virtual mapping in the first place ;-)

BR,
-R


 +1. Prepare access
 +
 +   Before an importer can access a dma_buf object with the cpu from the kernel
 +   context, it needs to notify the exporter of the access that is about to
 +   happen.
 +
 +   Interface:
 +      int dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
 +                                size_t start, size_t len,
 +                                enum dma_data_direction direction)
 +
 +   This allows the exporter to ensure that the memory is actually available for
 +   cpu access - the exporter might need to allocate or swap-in and pin the
 +   backing storage. The exporter also needs to ensure that cpu access is
 +   coherent for the given range and access direction. The range and access
 +   direction can be used by the exporter to optimize the cache flushing, i.e.
 +   access outside of the range or with a different direction (read instead of
 +   write) might return stale or even bogus data (e.g. when the exporter needs to
 +   copy the data to temporary storage).
 +
 +   This step might fail, e.g. in oom conditions.
 +
 +2. Accessing the buffer
 +
 +   To support dma_buf objects residing in highmem cpu access is page-based using
 +   an api similar to kmap. Accessing a dma_buf is done in aligned chunks of
 +   PAGE_SIZE size. Before accessing a chunk it needs to be mapped, which returns
 +   a pointer in kernel virtual address space. Afterwards the chunk needs to be
 +   unmapped again. There is no limit on how often a given chunk can be mapped
 +   and unmapped, i.e. the importer does not need to call begin_cpu_access again
 +   before mapping the same chunk again.
 +
 +   Interfaces:
 +      void 
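
(The interface listing is cut off in the archive here.) A minimal sketch of the three-step importer flow described above, spelled with the interfaces of this early version of the API (dma_buf_begin_cpu_access() with a range plus kmap-style per-page helpers); the signatures changed in later kernels, so treat this purely as an illustration of the steps:

#include <linux/dma-buf.h>
#include <linux/highmem.h>
#include <linux/string.h>

static int read_buffer_with_cpu(struct dma_buf *dmabuf, char *dst,
				unsigned long npages)
{
	unsigned long i;
	int ret;

	/* 1. prepare access: exporter pins backing storage, makes it coherent */
	ret = dma_buf_begin_cpu_access(dmabuf, 0, npages * PAGE_SIZE,
				       DMA_FROM_DEVICE);
	if (ret)
		return ret;	/* may fail, e.g. under oom */

	/* 2. access the object page by page with the kmap-style helpers */
	for (i = 0; i < npages; i++) {
		void *vaddr = dma_buf_kmap(dmabuf, i);

		memcpy(dst + i * PAGE_SIZE, vaddr, PAGE_SIZE);
		dma_buf_kunmap(dmabuf, i, vaddr);
	}

	/* 3. finish access: exporter flushes cpu caches, frees reserved resources */
	dma_buf_end_cpu_access(dmabuf, 0, npages * PAGE_SIZE, DMA_FROM_DEVICE);
	return 0;
}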

Re: Kernel Display and Video API Consolidation mini-summit at ELC 2012 - Notes

2012-02-22 Thread Clark, Rob
On Wed, Feb 22, 2012 at 10:36 AM, Chris Wilson ch...@chris-wilson.co.uk wrote:
 On Wed, 22 Feb 2012 17:24:24 +0100, Daniel Vetter dan...@ffwll.ch wrote:
 On Wed, Feb 22, 2012 at 04:03:21PM +, James Simmons wrote:
  Fbcon scrolling can be painful at HD or better modes. Fbcon needs 3
  possible accels; copyarea, imageblit, and fillrect. The first two could be
  hooked from the TTM layer. It's something I plan to experiment with to see if
  it's worth it.

 Let's bite into this ;-) I know that fbcon scrolling totally sucks on big
 screens, but I also think it's a total waste of time to fix this. Imo
 fbcon has 2 use-cases:
  - display an OOPS.
  - allow me to run fsck (or any other disaster-recovery stuff).
 3. Show panics.

 Ensuring that nothing prevents the switch to fbcon and displaying the
 panic message is the reason why we haven't felt inclined to accelerate
 fbcon - it just gets messy for no real gain.

and when doing 2d accel on a 3d core..  it basically amounts to
putting a shader compiler in the kernel.   Wh!

 For example: https://bugs.freedesktop.org/attachment.cgi?id=48933
 which doesn't handle flushing of pending updates via the GPU when
 writing with the CPU during interrupts (i.e. a panic).
 -Chris

 --
 Chris Wilson, Intel Open Source Technology Centre


Re: Kernel Display and Video API Consolidation mini-summit at ELC 2012 - Notes

2012-02-18 Thread Clark, Rob
On Fri, Feb 17, 2012 at 1:42 PM, Adam Jackson a...@redhat.com wrote:
 On 2/16/12 6:25 PM, Laurent Pinchart wrote:

   Helper functions will be implemented in the subsystems to convert
 between
   that generic structure and the various subsystem-specific structures.


 I guess.  I don't really see a reason not to unify the structs too, but then
 I don't have binary blobs to pretend to be ABI-compatible with.


this is just for where the timing struct is exposed to userspace

BR,
-R


Re: [RFCv1 2/4] v4l:vb2: add support for shared buffer (dma_buf)

2012-02-05 Thread Clark, Rob
On Sat, Feb 4, 2012 at 5:43 AM, Sakari Ailus sakari.ai...@iki.fi wrote:
 Hi Rob,

 Clark, Rob wrote:
 On Mon, Jan 30, 2012 at 4:01 PM, Sakari Ailus sakari.ai...@iki.fi wrote:

 So to summarize I understand your constraints - gpu drivers have worked
 like v4l a few years ago. The thing I'm trying to achieve with this
 constant yelling is just to raise awareness for these issues so that
 people aren't surprised when drm starts pulling tricks on dma_bufs.

 I think we should be able to mark dma_bufs non-relocatable so also DRM can
 work with these buffers. Or alternatively, as Laurent proposed, V4L2 be
 prepared for moving the buffers around. Are there other reasons to do so
 than paging them out of system memory to make room for something else?

 fwiw, from GPU perspective, the DRM device wouldn't be actively
 relocating buffers just for the fun of it.  I think it is more that we
 want to give the GPU driver the flexibility to relocate when it really
 needs to.  For example, maybe user has camera app running, then puts
 it in the background and opens firefox which tries to allocate a big
 set of pixmaps putting pressure on GPU memory..

 I guess the root issue is who is doing the IOMMU programming for the
 camera driver.  I guess if this is something built in to the camera
 driver then when it calls dma_buf_map() it probably wants some hint
 that the backing pages haven't moved so in the common case (ie. buffer
 hasn't moved) it doesn't have to do anything expensive.

 On omap4 v4l2+drm example I have running, it is actually the DRM
 driver doing the IOMMU programming.. so v4l2 camera really doesn't
 need to care about it.  (And the IOMMU programming here is pretty

 This part sounds odd to me. Well, I guess it _could_ be done that way,
 but the ISP IOMMU could very well be different from the one in DRM. That's
 the case on OMAP 3, for example.

Yes, this is a difference between OMAP4 and OMAP3..  although I think
the intention is that in OMAP3 type scenarios, if the IOMMU mapping was
done through the dma mapping API, then it could still be done (and
cached) by the exporter.

 fast.)  But I suppose this maybe doesn't represent all cases.  I
 suppose a camera that didn't really sit behind an IOMMU but uses
 something more like a DMA descriptor list would want to know if it
 needed to regenerate its descriptor list.  Or likewise if camera has
 an IOMMU that isn't really using the IOMMU framework (although maybe
 that is easier to solve).  But I think a hint returned from
 dma_buf_map() would do the job?

 An alternative to IOMMU I think in practice would mean CMA-allocated
 buffers.

 I need to think about this a bit and understand how this would really
 work to properly comment this.

 For example, how does one mlock() something that isn't mapped to process
 memory --- think of a dma buffer not mapped to the user space process
 address space?

The scatter list that the exporter gives you should be locked/pinned
already, so the importer should not need to call mlock()

BR,
-R
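
A minimal importer-side sketch of that point: the sg_table handed back by dma_buf_map_attachment() is already pinned by the exporter for as long as the mapping exists, so the importer just attaches, maps, programs its DMA and unmaps -- no mlock()-style call is involved (error handling trimmed for brevity):

#include <linux/dma-buf.h>
#include <linux/err.h>
#include <linux/scatterlist.h>

static int import_and_run_dma(struct device *dev, struct dma_buf *dmabuf)
{
	struct dma_buf_attachment *attach;
	struct sg_table *sgt;

	attach = dma_buf_attach(dmabuf, dev);
	if (IS_ERR(attach))
		return PTR_ERR(attach);

	/* pages stay pinned by the exporter while this mapping exists */
	sgt = dma_buf_map_attachment(attach, DMA_FROM_DEVICE);
	if (IS_ERR(sgt)) {
		dma_buf_detach(dmabuf, attach);
		return PTR_ERR(sgt);
	}

	/* ... program the device's DMA engine from sgt here ... */

	dma_buf_unmap_attachment(attach, sgt, DMA_FROM_DEVICE);
	dma_buf_detach(dmabuf, attach);
	return 0;
}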

 Cheers,

 --
 Sakari Ailus
 sakari.ai...@iki.fi


Re: [RFCv1 2/4] v4l:vb2: add support for shared buffer (dma_buf)

2012-02-02 Thread Clark, Rob
On Thu, Feb 2, 2012 at 4:19 AM, Laurent Pinchart
laurent.pinch...@ideasonboard.com wrote:
 Hi Rob,

 On Tuesday 31 January 2012 16:38:35 Clark, Rob wrote:
 On Mon, Jan 30, 2012 at 4:01 PM, Sakari Ailus sakari.ai...@iki.fi wrote:
  So to summarize I understand your constraints - gpu drivers have worked
  like v4l a few years ago. The thing I'm trying to achieve with this
  constant yelling is just to raise awareness for these issues so that
  people aren't surprised when drm starts pulling tricks on dma_bufs.
 
  I think we should be able to mark dma_bufs non-relocatable so also DRM
  can work with these buffers. Or alternatively, as Laurent proposed, V4L2
  be prepared for moving the buffers around. Are there other reasons to do
  so than paging them out of system memory to make room for something
  else?

 fwiw, from GPU perspective, the DRM device wouldn't be actively
 relocating buffers just for the fun of it.  I think it is more that we
 want to give the GPU driver the flexibility to relocate when it really
 needs to.  For example, maybe user has camera app running, then puts
 it in the background and opens firefox which tries to allocate a big
 set of pixmaps putting pressure on GPU memory..

 On an embedded system putting the camera application in the background will
 usually stop streaming, so buffers will be unmapped. On other systems, or even
 on some embedded systems, that will not be the case though.

 I'm perfectly fine with relocating buffers when needed. What I want is to
 avoid unmapping and remapping them for every frame if they haven't moved. I'm
 sure we can come up with an API to handle that.

 I guess the root issue is who is doing the IOMMU programming for the camera
 driver. I guess if this is something built in to the camera driver then when
 it calls dma_buf_map() it probably wants some hint that the backing pages
 haven't moved so in the common case (ie. buffer hasn't moved) it doesn't
 have to do anything expensive.

 It will likely depend on the camera hardware. For the OMAP3 ISP, the driver
 calls the IOMMU API explicitly, but if I understand it correctly there's a plan
 to move IOMMU support to the DMA API.

 On omap4 v4l2+drm example I have running, it is actually the DRM driver
 doing the IOMMU programming.. so v4l2 camera really doesn't need to care
 about it.  (And the IOMMU programming here is pretty fast.)  But I suppose
 this maybe doesn't represent all cases. I suppose a camera that didn't really
 sit behind an IOMMU but uses something more like a DMA descriptor list would
 want to know if it needed to regenerate its descriptor list. Or likewise if
 camera has an IOMMU that isn't really using the IOMMU framework (although
 maybe that is easier to solve).  But I think a hint returned from
 dma_buf_map() would do the job?

 I see at least three possible solutions to this problem.

 1. At dma_buf_unmap() time, the exporter will tell the importer that the
 buffer will move, and that it should be unmapped from whatever the importer
 mapped it to. That's probably the easiest solution to implement on the
 importer's side, but I expect it to be difficult for the exporter to know at
 dma_buf_unmap() time if the buffer will need to be moved or not.

 2. Adding a callback to request the importer to unmap the buffer. This might
 be racy, and locking might be difficult to handle.

 3. At dma_buf_unmap() time, keep importer's mappings around. The exporter is
 then free to move the buffer if needed, in which case the mappings will be
 invalid. This shouldn't be a problem in theory, as the buffer isn't being used
 by the importer at that time, but can cause stability issues when dealing with
 rogue hardware as this would punch holes in the IOMMU fence. At dma_buf_map()
 time the exporter would tell the importer whether the buffer moved or not. If
 it moved, the importer will tear down the mappings it kept, and create new
 ones.

I was leaning towards door #3.. rogue hw is a good point, but I think
that would be an issue in general if hw kept accessing the buffer when
it wasn't supposed to.

BR,
-R
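
A rough sketch of "door #3" as an importer might implement it. None of this is an existing dma-buf interface: the DMA_BUF_MOVED flag, my_dma_buf_map() and the two iommu helpers are invented only to show the idea of keeping the device mapping across map/unmap cycles and rebuilding it only when the exporter reports that the backing storage actually moved:

#include <linux/errno.h>
#include <linux/scatterlist.h>

#define DMA_BUF_MOVED	(1 << 0)	/* hypothetical "backing pages moved" hint */

struct my_dma_buf;			/* stand-in for the real struct dma_buf */

/* hypothetical exporter entry point returning hint flags */
extern unsigned int my_dma_buf_map(struct my_dma_buf *buf);
/* hypothetical importer helpers managing its own IOMMU entries */
extern struct sg_table *importer_build_iommu(struct my_dma_buf *buf);
extern void importer_teardown_iommu(struct sg_table *sgt);

struct importer_buf {
	struct sg_table *sgt;	/* cached device mapping, possibly stale */
};

static int importer_remap_if_moved(struct importer_buf *ibuf,
				   struct my_dma_buf *buf)
{
	unsigned int hint = my_dma_buf_map(buf);

	/* common case: buffer did not move, keep the cached mapping */
	if (ibuf->sgt && !(hint & DMA_BUF_MOVED))
		return 0;

	if (ibuf->sgt)
		importer_teardown_iommu(ibuf->sgt);

	ibuf->sgt = importer_build_iommu(buf);
	return ibuf->sgt ? 0 : -ENOMEM;
}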

 Variations around those 3 possible solutions are possible.

 --
 Regards,

 Laurent Pinchart


Re: [RFCv1 2/4] v4l:vb2: add support for shared buffer (dma_buf)

2012-02-02 Thread Clark, Rob
On Thu, Feb 2, 2012 at 2:23 PM, Daniel Vetter dan...@ffwll.ch wrote:
 On Thu, Feb 2, 2012 at 11:19, Laurent Pinchart 
 laurent.pinch...@ideasonboard.com wrote:
 On omap4 v4l2+drm example I have running, it is actually the DRM driver
 doing the IOMMU programming.. so v4l2 camera really doesn't need to care
 about it.  (And the IOMMU programming here is pretty fast.)  But I suppose
  this maybe doesn't represent all cases. I suppose a camera that didn't really
  sit behind an IOMMU but uses something more like a DMA descriptor list would
  want to know if it needed to regenerate its descriptor list. Or likewise if
 camera has an IOMMU that isn't really using the IOMMU framework (although
 maybe that is easier to solve).  But I think a hint returned from
 dma_buf_map() would do the job?

 I see at least three possible solutions to this problem.

 1. At dma_buf_unmap() time, the exporter will tell the importer that the
 buffer will move, and that it should be unmapped from whatever the importer
 mapped it to. That's probably the easiest solution to implement on the
 importer's side, but I expect it to be difficult for the exporter to know at
 dma_buf_unmap() time if the buffer will need to be moved or not.

 2. Adding a callback to request the importer to unmap the buffer. This might
 be racy, and locking might be difficult to handle.

 3. At dma_buf_unmap() time, keep importer's mappings around. The exporter is
 then free to move the buffer if needed, in which case the mappings will be
 invalid. This shouldn't be a problem in theory, as the buffer isn't being 
 used
 by the importer at that time, but can cause stability issues when dealing 
 with
 rogue hardware as this would punch holes in the IOMMU fence. At dma_buf_map()
 time the exporter would tell the importer whether the buffer moved or not. If
 it moved, the importer will tear down the mappings it kept, and create new
 ones.

 Variations around those 3 possible solutions are possible.

 While preparing my fosdem presentation about dma_buf I've thought quite a
 bit what we still need for forceful unmap support/persistent
 mappings/dynamic dma_buf/whatever you want to call it. And it's a lot, and
 we have quite a few lower hanging fruits to reap (like cpu access and mmap
 support for importer). So I propose instead:

 4. Just hang onto the device mappings for as long as it's convenient and/or
 necessary and feel guilty about it.

for v4l2/vb2, I'd like to at least request some sort of
BUF_PREPARE_IS_EXPENSIVE flag, so we don't penalize devices where
remapping is not expensive.  I.e. the camera driver could set this flag
so vb2 core knows not to unmap()/re-map() between frames.

In my case, for v4l2 + encoder, I really need the unmapping/remapping
between frames, at least if there is anything else going on competing
for buffers.  But in my case, the exporter remaps to a contiguous
(sorta) virtual address that the camera can see, so there is no
expensive mapping on the importer side of things.


BR,
-R
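
A sketch of that flag idea: neither the flag nor the helpers below exist in videobuf2, they are invented only to illustrate vb2 unmapping between frames unless the driver has declared that re-preparing the buffer is expensive:

#define VB2_BUF_PREPARE_IS_EXPENSIVE	(1u << 0)	/* hypothetical queue flag */

struct sketch_queue {
	unsigned int flags;		/* would live in struct vb2_queue */
};

/* hypothetical stand-in for vb2's dma-buf memory ops */
extern void sketch_dmabuf_unmap(void *buf_priv);

/* called when a buffer is handed back towards userspace */
static void sketch_buffer_done(struct sketch_queue *q, void *buf_priv)
{
	/*
	 * Drivers whose map step is costly (descriptor list rebuild, slow
	 * IOMMU) set the flag and keep the mapping across frames; everyone
	 * else drops it so the exporter is free to move the buffer.
	 */
	if (!(q->flags & VB2_BUF_PREPARE_IS_EXPENSIVE))
		sketch_dmabuf_unmap(buf_priv);
}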


 The reason is that going fully static isn't worse than a half-baked
 dynamic version of dma_buf, but the half-baked dynamic one has the
 downside that we can ignore the issue and feel good about things ;-)

 Cheers, Daniel
 --
 Daniel Vetter
 daniel.vet...@ffwll.ch - +41 (0) 79 364 57 48 - http://blog.ffwll.ch



Re: [RFCv1 2/4] v4l:vb2: add support for shared buffer (dma_buf)

2012-01-31 Thread Clark, Rob
On Mon, Jan 30, 2012 at 4:01 PM, Sakari Ailus sakari.ai...@iki.fi wrote:

 So to summarize I understand your constraints - gpu drivers have worked
 like v4l a few years ago. The thing I'm trying to achieve with this
 constant yelling is just to raise awareness for these issues so that
 people aren't surprised when drm starts pulling tricks on dma_bufs.

 I think we should be able to mark dma_bufs non-relocatable so also DRM can
 work with these buffers. Or alternatively, as Laurent proposed, V4L2 be
 prepared for moving the buffers around. Are there other reasons to do so
 than paging them out of system memory to make room for something else?

fwiw, from GPU perspective, the DRM device wouldn't be actively
relocating buffers just for the fun of it.  I think it is more that we
want to give the GPU driver the flexibility to relocate when it really
needs to.  For example, maybe user has camera app running, then puts
it in the background and opens firefox which tries to allocate a big
set of pixmaps putting pressure on GPU memory..

I guess the root issue is who is doing the IOMMU programming for the
camera driver.  I guess if this is something built in to the camera
driver then when it calls dma_buf_map() it probably wants some hint
that the backing pages haven't moved so in the common case (ie. buffer
hasn't moved) it doesn't have to do anything expensive.

On omap4 v4l2+drm example I have running, it is actually the DRM
driver doing the IOMMU programming.. so v4l2 camera really doesn't
need to care about it.  (And the IOMMU programming here is pretty
fast.)  But I suppose this maybe doesn't represent all cases.  I
suppose a camera that didn't really sit behind an IOMMU but uses
something more like a DMA descriptor list would want to know if it
needed to regenerate its descriptor list.  Or likewise if the camera has
an IOMMU that isn't really using the IOMMU framework (although maybe
that is easier to solve).  But I think a hint returned from
dma_buf_map() would do the job?

BR,
-R


Re: [Linaro-mm-sig] [PATCH 12/15] drivers: add Contiguous Memory Allocator

2012-01-27 Thread Clark, Rob
2012/1/27 Marek Szyprowski m.szyprow...@samsung.com:
 Hi Ohad,

 On Friday, January 27, 2012 10:44 AM Ohad Ben-Cohen wrote:

 With v19, I can't seem to allocate big regions anymore (e.g. 101MiB).
 In particular, this seems to fail:

 On Thu, Jan 26, 2012 at 11:00 AM, Marek Szyprowski
 m.szyprow...@samsung.com wrote:
  +static int cma_activate_area(unsigned long base_pfn, unsigned long count)
  +{
  +       unsigned long pfn = base_pfn;
  +       unsigned i = count >> pageblock_order;
  +       struct zone *zone;
  +
  +       WARN_ON_ONCE(!pfn_valid(pfn));
  +       zone = page_zone(pfn_to_page(pfn));
  +
  +       do {
  +               unsigned j;
  +               base_pfn = pfn;
  +               for (j = pageblock_nr_pages; j; --j, pfn++) {
  +                       WARN_ON_ONCE(!pfn_valid(pfn));
  +                       if (page_zone(pfn_to_page(pfn)) != zone)
  +                               return -EINVAL;

 The above WARN_ON_ONCE is triggered, and then the conditional is
 asserted (page_zone() returns a Movable zone, whereas zone is
 Normal) and the function fails.

 This happens to me on OMAP4 with your 3.3-rc1-cma-v19 branch (and a
 bunch of remoteproc/rpmsg patches).

 Do big allocations work for you ?

 I've tested it with 256MiB on Exynos4 platform. Could you check if the
 problem also appears on 3.2-cma-v19 branch (I've uploaded it a few hours
 ago) and 3.2-cma-v18? Both are available on our public repo:
 git://git.infradead.org/users/kmpark/linux-samsung/

 The above code has not been changed since v16, so I'm really surprised
 that it causes problems. Maybe the memory configuration or layout has
 been changed in 3.3-rc1 for OMAP4?

is highmem still an issue?  I remember hitting this WARN_ON_ONCE() but it
went away after I switched to a 2g/2g vm split (which avoids highmem)

BR,
-R

 Best regards
 --
 Marek Szyprowski
 Samsung Poland R&D Center






Re: V4L2 Overlay mode replacement by dma-buf - was: Re: [PATCH 05/10] v4l: add buffer exporting via dmabuf

2012-01-23 Thread Clark, Rob
On Mon, Jan 23, 2012 at 10:57 AM, Mauro Carvalho Chehab
mche...@redhat.com wrote:

 2) The userspace API changes to properly support for dma buffers.

 If you're not ready to discuss (2), that's ok, but I'd like to follow
 the discussions for it with care, not only for reviewing the actual
 patches, but also since I want to be sure that it will address the
 needs for xawtv and for the Xorg v4l driver.


 The support of dmabuf could be easily added to framebuffer API.
 I expect that it would not be difficult to add it to Xv.

You might want to have a look at my dri2video proposal a while back.
I plan some minor changes to make the api for multi-planar formats
look a bit more like how addfb2 ended up (ie. array of handles,
offsets, and pitches), but you could get the basic idea from:

http://patchwork.freedesktop.org/patch/7939/

 A texture based API is likely needed, at least for it to work with
 modern PC GPUs.

I suspect we will end up w/ an eglImage extension to go dmabuf fd ->
eglImage, and perhaps handle barriers and userspace mappings.  That
should, I think, be the best approach to best hide/abstract all the
GPU crazy games from the rest of the world.

BR,
-R
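
For reference, the dmabuf fd -> eglImage idea later materialized as the EGL_EXT_image_dma_buf_import extension, which did not exist yet when this was written; a rough userspace sketch of what it ended up looking like (single-plane XRGB8888, no error handling):

#include <EGL/egl.h>
#define EGL_EGLEXT_PROTOTYPES
#include <EGL/eglext.h>
#include <drm_fourcc.h>		/* DRM_FORMAT_* fourcc codes from libdrm */

static EGLImageKHR import_dmabuf(EGLDisplay dpy, int fd,
				 EGLint width, EGLint height, EGLint pitch)
{
	const EGLint attribs[] = {
		EGL_WIDTH, width,
		EGL_HEIGHT, height,
		EGL_LINUX_DRM_FOURCC_EXT, DRM_FORMAT_XRGB8888,
		EGL_DMA_BUF_PLANE0_FD_EXT, fd,
		EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
		EGL_DMA_BUF_PLANE0_PITCH_EXT, pitch,
		EGL_NONE
	};

	/* the dma-buf import target takes no client buffer, only attribs */
	return eglCreateImageKHR(dpy, EGL_NO_CONTEXT, EGL_LINUX_DMA_BUF_EXT,
				 (EGLClientBuffer)NULL, attribs);
}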


Re: [RFCv1 2/4] v4l:vb2: add support for shared buffer (dma_buf)

2012-01-23 Thread Clark, Rob
On Mon, Jan 23, 2012 at 4:54 AM, Laurent Pinchart
laurent.pinch...@ideasonboard.com wrote:
 Hi Daniel,

 On Monday 23 January 2012 11:35:01 Daniel Vetter wrote:
 On Mon, Jan 23, 2012 at 10:48, Laurent Pinchart wrote:
  On Monday 23 January 2012 10:06:57 Marek Szyprowski wrote:
  On Friday, January 20, 2012 5:29 PM Laurent Pinchart wrote:
   On Friday 20 January 2012 17:20:22 Tomasz Stanislawski wrote:
 IMO, One way to do this is adding field 'struct device *dev' to
 struct vb2_queue. This field should be filled by a driver prior
 to calling vb2_queue_init.

 I haven't looked into the details, but that sounds good to me. Do
 we have use cases where a queue is allocated before knowing which
 physical device it will be used for ?
   
I don't think so. In case of S5P drivers, vb2_queue_init is called
while opening /dev/videoX.
   
BTW. This struct device may help vb2 to produce logs with more
descriptive client annotation.
   
What happens if such a device is NULL. It would happen for vmalloc
allocator used by VIVI?
  
   Good question. Should dma-buf accept NULL devices ? Or should vivi
   pass its V4L2 device to vb2 ?
 
   I assume you suggested using struct video_device->dev entry in such
  case. It will not work. DMA-mapping API requires some parameters to be
  set for the client device, like for example dma mask. struct
  video_device contains only an artificial struct device entry, which has
  no relation to any physical device and cannot be used for calling
  DMA-mapping functions.
 
  Performing dma_map_* operations with such artificial struct device
  doesn't make any sense. It also slows down things significantly due to
  cache flushing (forced by dma-mapping) which should be avoided if the
  buffer is accessed only with CPU (like it is done by vb2-vmalloc style
  drivers).
 
  I agree that mapping the buffer to the physical device doesn't make any
  sense, as there's simple no physical device to map the buffer to. In
  that case we could simply skip the dma_map/dma_unmap calls.

 See my other mail, dma_buf v1 does not support cpu access.

 v1 is in the kernel now, let's start discussing v2 ;-)

 So if you don't have a device around, you can't use it in its current form.

  Note, however, that dma-buf v1 explicitly does not support CPU access by
  the importer.
 
   IMHO this case perfectly shows the design mistakes that have been made.
  The current version simply tries to do too much.
 
  Each client of dma_buf should 'map' the provided sgtable/scatterlist on
  its own. Only the client device driver has all knowledge to make a
  proper 'mapping'. Real physical devices usually will use dma_map_sg()
  for such operation, while some virtual ones will only create a kernel
  mapping for the provided scatterlist (like vivi with vmalloc memory
  module).
 
  I tend to agree with that. Depending on the importer device, drivers
  could then map/unmap the buffer around each DMA access, or keep a
  mapping and sync the buffer.

 Again we've discussed adding a syncing op to the interface that would allow
 keeping around mappings. The thing is that this also requires an unmap
 callback or something similar, so that the exporter can inform the importer
 that the memory just moved around. And the exporter _needs_ to be able to do
 that, hence also the language in the doc that importers need to bracket all
 uses with a map/unmap and can't sit forever on a dma_buf mapping.

 Not all exporters need to be able to move buffers around. If I'm not mistaken,
 only DRM exporters need such a feature (which obviously makes it an important
 feature). Does the exporter need to be able to do so at any time ? Buffers
 can't obviously be moved around when they're used by an active DMA, so I
 expect the exporter to be able to wait. How long can it wait ?

Offhand I think it would usually be a request from userspace (in some
cases page faults (although I think only if there is hw de-tiling?),
or command submission to gpu involving some buffer(s) that are not
currently mapped) that would trigger the exporter to want to be able
to evict something.  So could be blocked or something else
evicted/moved instead.  Although perhaps not ideal for performance.
(app/toolkit writers seem to have a love of temporary pixmaps, so
x11/ddx driver can chew thru a huge number of new buffer allocations
in very short amount of time)

 I'm not sure I would like a callback approach. If we add a sync operation, the
 exporter could signal to the importer that it must unmap the buffer by
 returning an appropriate value from the sync operation. Would that be usable
 for DRM ?

It does seem a bit over-complicated..  and deadlock prone.  Is there a
reason the importer couldn't just unmap when DMA is completed, and the
exporter give some hint on next map() that the buffer hasn't actually
moved?

BR,
-R

 Another option would be to keep the mapping around, and check in the importer
 if the buffer has moved. If 

Re: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-11-08 Thread Clark, Rob
On Thu, Nov 3, 2011 at 3:04 AM, Marek Szyprowski
m.szyprow...@samsung.com wrote:
 Hello,

 I'm sorry for a late reply, but after Kernel Summit/ELC I have some comments.

 On Friday, October 14, 2011 5:35 PM Daniel Vetter wrote:

 On Fri, Oct 14, 2011 at 12:00:58PM +0200, Tomasz Stanislawski wrote:
  +/**
  + * struct dma_buf_ops - operations possible on struct dma_buf
  + * @create: creates a struct dma_buf of a fixed size. Actual allocation
  + *            does not happen here.
 
  The 'create' ops is not present in dma_buf_ops.
 
  + * @attach: allows different devices to 'attach' themselves to the given
   + *            buffer. It might return -EBUSY to signal that backing storage
   + *            is already allocated and incompatible with the requirements
   + *            of requesting device. [optional]
   + * @detach: detach a given device from this buffer. [optional]
   + * @get_scatterlist: returns list of scatter pages allocated, increases
   + *                     usecount of the buffer. Requires atleast one attach to be
   + *                     called before. Returned sg list should already be mapped
   + *                     into _device_ address space.
 
  You must add a comment that this call 'may sleep'.
 
   I like the get_scatterlist idea. It allows the exporter to create a
   valid scatterlist for a client in an elegant way.
 
  I do not like this whole attachment idea. The problem is that
  currently there is no support in DMA framework for allocation for
  multiple devices. As long as no such a support exists, there is no
  generic way to handle attribute negotiations and buffer allocations
  that involve multiple devices. So the exporter drivers would have to
  implement more or less hacky solutions to handle memory requirements
  and choosing the device that allocated memory.
 
  Currently, AFAIK there is even no generic way for a driver to
  acquire its own DMA memory requirements.
 
  Therefore all logic hidden beneath 'attachment' is pointless. I
  think that support for attach/detach (and related stuff) should be
  postponed until support for multi-device allocation is added to DMA
  framework.

 Imo we clearly need this to make the multi-device-driver with insane dma
 requirements work on arm. And rewriting the buffer handling in
 participating subsystem twice isn't really a great plan. I envision that
 on platforms where we need this madness, the driver must call back to the
 dma subsystem to create a dma_buf. The dma subsystem should be already aware
 of all the requirements and hence should be able to handle them..

  I don't say the attachment list idea is wrong but adding attachment
  stuff creates an illusion that problem of multi-device allocations
  is somehow magically solved. We should not force the developers of
  exporter drivers to solve the problem that is not solvable yet.

 Well, this is why we need to create a decent support infrastructure for
 platforms (= arm madness) that needs this, so that device drivers and
 subsystem don't need to invent that wheel on their own. Which as you point
 out, they actually can't.

 The real question is whether it is possible to create any generic support
 infrastructure. I really doubt it. IMHO this is something that will be hacked for
 each 'product release' and will never reach the mainline...

  The other problem are the APIs. For example, the V4L2 subsystem
  assumes that memory is allocated after successful VIDIOC_REQBUFS
  with V4L2_MEMORY_MMAP memory type. Therefore attach would be
  automatically followed by get_scatterlist, blocking possibility of
  any buffer migrations in future.

 Well, pardon to break the news, but v4l needs to rework the buffer
 handling. If you want to share buffers with a gpu driver, you _have_ to
 live with the fact that gpus do fully dynamic buffer management, meaning:
 - buffers get allocated and destroyed on the fly, meaning static reqbuf
   just went out the window (we obviously cache buffer objects and reuse
   them for performance, as long as the processing pipeline doesn't really
   change).
 - buffers get moved around in memory, meaning you either need full-blown
   sync-objects with a callback to drivers to tear-down mappings on-demand,
    or every driver needs to guarantee to call put_scatterlist in a
    reasonably short time. The latter is probably the more natural thing for
   v4l devices.

 I'm really not convinced if it is possible to go for the completely dynamic
 buffer management, especially if we are implementing a proof-of-concept
 solution. Please notice the following facts:

 1. all v4l2 drivers do the 'static' buffer management - memory is being
 allocated on REQBUF() call and then mapped permanently into both userspace
 and dma (io) address space.

Is this strictly true if we are introducing a new 'enum v4l2_memory'
for dmabufs?  Shouldn't that give us some flexibility, especially if
the v4l2 device is only the importer, not the allocator, of the
memory?
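
For reference, this is roughly where the new memory-type idea ended up: V4L2_MEMORY_DMABUF (which did not exist yet when this mail was written) lets the queued v4l2_buffer carry just the dma-buf file descriptor, with the V4L2 device acting purely as an importer of the memory:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

static int queue_dmabuf(int video_fd, unsigned int index, int dmabuf_fd)
{
	struct v4l2_buffer buf;

	memset(&buf, 0, sizeof(buf));
	buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
	buf.memory = V4L2_MEMORY_DMABUF;	/* importer, not allocator */
	buf.index = index;
	buf.m.fd = dmabuf_fd;			/* buffer allocated elsewhere */

	return ioctl(video_fd, VIDIOC_QBUF, &buf);
}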

and a couple 

Re: [Linaro-mm-sig] [PATCHv16 0/9] Contiguous Memory Allocator

2011-10-10 Thread Clark, Rob
On Mon, Oct 10, 2011 at 1:58 AM, Ohad Ben-Cohen o...@wizery.com wrote:
 On Fri, Oct 7, 2011 at 6:27 PM, Arnd Bergmann a...@arndb.de wrote:
 IMHO it would be good to merge the entire series into 3.2, since
 the ARM portion fixes an important bug (double mapping of memory
 ranges with conflicting attributes) that we've lived with for far
 too long, but it really depends on how everyone sees the risk
 for regressions here. If something breaks in unfixable ways before
 the 3.2 release, we can always revert the patches and have another
 try later.

 I didn't thoroughly review the patches, but I did try them out (to be
 precise, I tried v15) on an OMAP4 PandaBoard, and really liked the
 result.

 The interfaces seem clean and convenient and things seem to work (I
 used a private CMA pool with rpmsg and remoteproc, but also noticed
 that several other drivers were utilizing the global pool). And with
 this in hand we can finally ditch the old reserve+ioremap approach.

 So from a user perspective, I sure do hope this patch set gets into
 3.2; hopefully we can just fix anything that would show up during the
 3.2 cycle.

 Marek, Michal (and everyone involved!), thanks so much for pushing
 this! Judging from the history of this patch set and the areas that it
 touches (and from the number of LWN articles ;) it looks like a
 considerable feat.

 FWIW, feel free to add my

 Tested-by: Ohad Ben-Cohen o...@wizery.com

Marek, I guess I forgot to mention earlier, but I've been using CMA
for a couple of weeks now with omapdrm driver, so you can also add my:

Tested-by: Rob Clark r...@ti.com

BR,
-R

 (small and optional comment: I think it'd be nice if
 dma_declare_contiguous would fail if called too late, otherwise users
 of that misconfigured device will end up using the global pool without
 easily knowing that something went wrong)

 Thanks,
 Ohad.




Re: [Linaro-mm-sig] Buffer sharing proof-of-concept

2011-08-04 Thread Clark, Rob
On Thu, Aug 4, 2011 at 3:58 AM, Daniel Vetter daniel.vet...@ffwll.ch wrote:
 On Wed, Aug 3, 2011 at 17:12, Jordan Crouse jcro...@codeaurora.org wrote:
 On 08/03/2011 03:33 AM, Tom Cooksey wrote:
 Passing buffer meta-data around was also discussed yesterday. Again, the
 general consensus seemed to be that this data should be kept out of the
 kernel. The userspace application should know what the buffer format
 etc. is and can provide that information to the relevant device APIs
 when is passes in the buffer.

 True, but APIs change slowly. Some APIs *cough* OpenMAX *cough* are damn
 near immutable over the lifetime of an average software release. A blob of
 data attached to a buffer can evolve far more rapidly and be far more
 extensible and much more vendor specific. This isn't an new idea, I think
 the DRM/GEM guys have tossed it around too.

 Erh, no. For sharing gem buffers between processes (i.e. between direct
 rendering clients and the compositor, whatever that is) we just hand
 around the gem id in the kernel. Some more stuff gets passed around in
 userspace in a generic way (e.g. DRI2 passes the buffer type (depth,
 stencil, color, ...) and the stride), but that's it.

 Everything else is driver specific and mostly not even passed around
 explicitly and just agreed upon implicitly. E.g. running the wrong
 XvMC decoder lib for your Xorg Intel driver will result in garbage on
 the screen. There's a bit more leeway between Mesa and the Xorg driver
 because they're released independently, but it's very ad-hoc (i.e.
 oops, that buffer doesn't fit the requirements of the new code, must
 be an old Xorg driver, so switch to the compat paths in Mesa).

 But my main fear with the blob attached to the buffer idea is that
 sooner or later it'll be part of the kernel/userspace interface of the
 buffer sharing api (hey, it's there, why not use it?). And the
 timeframe for deprecating the kernel abi is 5-10 years and yes I've
 tried to dodge that and got shot at.

hmm, there would be a dmabuf->private ptr in struct dmabuf.  Normally
that should be for private data of the buffer allocator, but I guess
it could be (ab)used for under the hood communication between drivers
in a platform specific way.  It does seem a bit hacky, but at least it
does not need to be exposed to userspace.

(Or maybe a better option is just 'rm -rf omx' ;-))

BR,
-R

 Imo a better approach is to spec
 (_after_ the kernel buffer sharing works) a low-level userspace api
 that drivers need to implement (like the EGL Mesa extensions used to
 make Wayland work on gem drivers).
 -Daniel
 --
 Daniel Vetter
 daniel.vet...@ffwll.ch - +41 (0) 79 365 57 48 - http://blog.ffwll.ch




Re: [Linaro-mm-sig] Buffer sharing proof-of-concept

2011-08-04 Thread Clark, Rob
On Thu, Aug 4, 2011 at 7:34 AM, Daniel Vetter daniel.vet...@ffwll.ch wrote:
 On Thu, Aug 4, 2011 at 13:14, Clark, Rob r...@ti.com wrote:
 hmm, there would be a dmabuf->private ptr in struct dmabuf.  Normally
 that should be for private data of the buffer allocator, but I guess
 it could be (ab)used for under the hood communication between drivers
 in a platform specific way.  It does seem a bit hacky, but at least it
 does not need to be exposed to userspace.

 An idea that just crossed my mind: I think we should separate two
 kinds of meta-data about a shared piece of data (dmabuf):
 - logical metadata about its contents, like strides, number of
 dimensions, pixel format/vbo layout, ... Imo that stuff doesn't belong
 into the buffer sharing simply because it's an a) awful mess and b)
 gem doesn't know it. To recap: only userspace knows this stuff and has
 to make sense of the data in the buffer by either setting up correct
 gpu command streams or telling kms what format this thing it needs to
 scan out has.


for sure, I think we've ruled out putting this sort of stuff in
'struct dmabuf'.. (notwithstanding any data stuffed away in a 'void *
priv' on some platform or another)

 - metadata about the physical layout: tiling layout, memory bank
 interleaving, page size for the iommu/contiguous buffer. As far as I
 can tell (i.e. please correct) for embedded systems this just depends
 on the (in)saneness of the iommu/bus/memory controller sitting between
 the ic block and its data. So it would be great if we could
 completely hide this from drivers (and userspace) and shovel it into
 the dma subsystem (as private data). Unfortunately at least on Intel
 tiling needs to be known by the iommu code, the core gem kernel driver
 code and the userspace drivers. Otoh using tiled buffers for sharing
 is maybe a bit ambitious for the first cut. So maybe we can just
 ignore tiling which largely just leaves handling iommus restrictions
 (or their complete lack) which looks doable.

btw, on intel (or desktop platforms in general), could another device
(say a USB webcam) DMA directly to a tiled buffer via the GART... ie.
assuming you had some way to pre-fault some pages into the GART before
the DMA happened.

I was sort of expecting 'struct dmabuf' to basically just be a
scatterlist and some fxn ptrs, nothing about TILING.. not sure if we
need an fxn ptr to ask the buffer allocator to generate some
pages/addresses that some other DMA engine could write to (so you
could do something like pre-faulting the buffer into some sort of
GART) and again release the pages/addresses when DMA completes.

BR,
-R

 (Or maybe a better option is just 'rm -rf omx' ;-))

 Yeah ;-)
 -Daniel
 --
 Daniel Vetter
 daniel.vet...@ffwll.ch - +41 (0) 79 365 57 48 - http://blog.ffwll.ch



Re: [Linaro-mm-sig] [PATCH 1/6] drivers: base: add shared buffer framework

2011-08-02 Thread Clark, Rob
On Tue, Aug 2, 2011 at 4:49 AM, Marek Szyprowski
m.szyprow...@samsung.com wrote:
 From: Tomasz Stanislawski t.stanisl...@samsung.com


 +/**
 + * shrbuf_import() - obtain shrbuf structure from a file descriptor
 + * @fd:        file descriptor
 + *
  + * The function obtains an instance of a shared buffer from a file descriptor
  + * Call sb->put when the imported buffer is no longer needed
 + *
 + * Returns pointer to a shared buffer or error pointer on failure
 + */
 +struct shrbuf *shrbuf_import(int fd)
 +{
 +    struct file *file;
 +    struct shrbuf *sb;
 +
 +    /* obtain a file, assure that it will not be released */
 +    file = fget(fd);
 +    /* check if descriptor is incorrect */
 +    if (!file)
 +        return ERR_PTR(-EBADF);
 +    /* check if dealing with shrbuf-file */
  +    if (file->f_op != &shrbuf_fops) {


Hmm.. I was liking the idea of letting the buffer allocator provide
the fops, so it could deal w/ mmap'ing and that sort of thing.
Although this reminds me that we would need a sane way to detect if
someone tries to pass in a non-umm/dmabuf/shrbuf/whatever fd.


 +        fput(file);
 +        return ERR_PTR(-EINVAL);
 +    }
 +    /* add user of shared buffer */
  +    sb = file->private_data;
  +    sb->get(sb);
 +    /* release the file */
 +    fput(file);
 +
 +    return sb;
 +}


 +/**
 + * struct shrbuf - shared buffer instance
 + * @get:    increase number of a buffer's users
 + * @put:    decrease number of a buffer's user, release resources if needed
 + * @dma_addr:    start address of a contiguous buffer
 + * @size:    size of a contiguous buffer
 + *
 + * Both get/put methods are required. The structure is dedicated for
 + * embedding. The fields dma_addr and size are used for proof-of-concept
  + * purpose. They will be substituted by scatter-gather lists.
 + */
 +struct shrbuf {
 +    void (*get)(struct shrbuf *);
 +    void (*put)(struct shrbuf *);

Hmm, are fput()/fget() and fops->release() not enough?

Ie. the original buffer allocator provides fops, incl the fops->release(),
which may in turn be decrementing an internal ref cnt used by the
allocating driver..  so if your allocating driver was the GPU, its
release fxn might be calling drm_gem_object_unreference_unlocked()..
and I guess there must be something similar for videobuf2.

(Previous comment about letting the allocating driver implement fops
notwithstanding.. but I guess there must be some good way to deal with
that.)

BR,
-R


Re: [RFC] drm: add overlays as first class KMS objects

2011-05-14 Thread Clark, Rob
On Fri, May 13, 2011 at 8:02 PM, Jesse Barnes jbar...@virtuousgeek.org wrote:
 On Fri, 13 May 2011 18:16:30 +0200
 Daniel Vetter daniel.vet...@ffwll.ch wrote:

 Hi Jesse,

 Discussion here in Budapest with v4l and embedded graphics folks was
 extremely fruitful. A few quick things to take away - I'll try to dig
 through all
 the stuff I've learned more in-depth later (probably in a blog post or two):

Hi Daniel, thanks for writing this up

 - embedded graphics is insane. The output routing/blending/whatever
   currently shipping hw can do is crazy and kms as-is is nowhere near up
   to snuff to support this. We've discussed omap4 and a ti chip targeted at
   video surveillance as use cases. I'll post block diagrams and explanations
   some when later.

 Yeah I expected that; even just TVs can have really funky restrictions
 about z order and blend capability.

 - we should immediately stop calling anything an overlay. It's a confusing
   concept that has a different meaning in every subsystem and for every hw
   manufacturer. More sensible names are dma fifo engines for things that 
 slurp
   in planes and make them available to the display subsystem. Blend engines
   for blocks that take multiple input pipes and overlay/underlay/blend them
   together. Display subsytem/controller for the aggregate thing including
   encoders/resizers/outputs and especially the crazy routing network that
   connects everything.

 How about just display plane then?  Specifically in the context of
 display output hardware...

display plane could be a good name.. actually in omap4 case it is a
single dma engine that is multiplexing fetches for however many
attached video pipes.. that is perhaps an implementation detail, but
it makes display plane sound nicer as a name



 1) Splitting the crtc object into two objects: crtc with associated output mode
 (pixel clock, encoders/connectors) and dma engines (possibly multiple) that
 feed it. omap 4 has essentially just 4 dma engines that can be freely assigned
 to the available outputs, so a distinction between normal crtcs and overlay
 engines just does not make sense. There's the major open question of where
 to put the various attributes to set up the output pipeline. Also some of these
 attributes might need to be changed atomically together with pageflips on
 a bunch of dma engines all associated with the same crtc on the next vsync,
 e.g. output position of an overlaid video buffer.

 Yeah, that's a good goal, and pretty much what I had in mind here.
 However, breaking the existing interface is a non-starter, so either we
 need a new CRTC object altogether, or we preserve the idea of a
 primary plane (whatever that means for a given platform) that's tied
 to each CRTC, with each additional plane described in a separate
 structure.  Z order and blend restrictions will have to be communicated
 separately I think...

In the cases I can think of, you'll always have a primary plane, so
userspace need not explicitly specify it.  But I think you want the
driver to pick which display plane to be automatically hooked between
the primary fb and crtc, or at least this should be the case if some
new bit is set in driver_features to indicate the driver supports
multiple display planes per crtc.

BR,
-R
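
For context, the "display plane" object discussed here eventually became the KMS plane API; a present-day libdrm call looks roughly like this (it post-dates this mail and is shown only to illustrate where the discussion ended up; source coordinates are 16.16 fixed point):

#include <stdint.h>
#include <xf86drmMode.h>

static int show_plane_fullscreen(int fd, uint32_t plane_id, uint32_t crtc_id,
				 uint32_t fb_id, uint32_t w, uint32_t h)
{
	/* scan out fb_id on an overlay/video plane, covering the whole crtc */
	return drmModeSetPlane(fd, plane_id, crtc_id, fb_id, 0 /* flags */,
			       0, 0, w, h,		/* crtc destination rect */
			       0, 0, w << 16, h << 16);	/* source rect, 16.16 */
}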

 Thanks,
 --
 Jesse Barnes, Intel Open Source Technology Center



Re: Yet another memory provider: can linaro organize a meeting?

2011-03-25 Thread Clark, Rob
On Wed, Mar 16, 2011 at 3:14 AM, Kyungmin Park kmp...@infradead.org wrote:

 Rough schedules.

 1. Warsaw meetings (3/16~3/18): mostly v4l2 people and some SoC vendors
  Reach a consensus among media developers and share the information.
  Please note that it's a v4l2 brainstorming meeting, so memory
 management is not the main issue.
 2. ELC (4/11~4/13): DRM, DRI and v4l2 person.

Fyi, I should be at ELC, at least for a day or two.. it would be nice,
as Andy suggested on other thread, to carve out a timeslot to discuss
in advance, because I'm not sure that I'll be able to be there the
entire time..

BR,
-R

   Discuss whether GEM/TTM is acceptable for non-X86 systems and find out
 which modules are acceptable.
   We studied GEM for our environment, but it's too huge and not
 much benefit for us since current frameworks are enough.
   What's missing is a generic memory passing mechanism. We need a
 generic memory passing interface, that's all.
 3. Linaro (5/9~5/13): ARM, SoC vendors and v4l2 persons.
   I hope several people will participate and we can take a small step toward the final goal.


Re: [st-ericsson] v4l2 vs omx for camera

2011-02-24 Thread Clark, Rob
On Thu, Feb 24, 2011 at 7:10 AM, Laurent Pinchart
laurent.pinch...@ideasonboard.com wrote:
 On Thursday 24 February 2011 14:04:19 Hans Verkuil wrote:
 On Thursday, February 24, 2011 13:29:56 Linus Walleij wrote:
  2011/2/23 Sachin Gupta sachin.gu...@linaro.org:
    The imaging coprocessor in today's platforms has a general purpose DSP
    attached to it. I have seen some work being done to use this DSP for
    graphics/audio processing in case the camera use case is not being
    tried, or also if the camera use cases do not consume the full
    bandwidth of this dsp. I am not sure how v4l2 would fit in such an
    architecture,
 
  Earlier in this thread I discussed TI:s DSPbridge.
 
  In drivers/staging/tidspbridge
  http://omappedia.org/wiki/DSPBridge_Project
  you find the TI hackers happy at work with providing a DSP accelerator
  subsystem.
 
  Isn't it possible for a V4L2 component to use this interface (or
  something more evolved, generic) as backend for assorted DSP offloading?
 
  So using one kernel framework does not exclude using another one
  at the same time. Whereas something like DSPbridge will load firmware
  into DSP accelerators and provide control/datapath for that, this can
  in turn be used by some camera or codec which in turn presents a
  V4L2 or ALSA interface.

 Yes, something along those lines can be done.

 While normally V4L2 talks to hardware it is perfectly fine to talk to a DSP
 instead.

 The hardest part will be to identify the missing V4L2 API pieces and design
 and add them. I don't think the actual driver code will be particularly
 hard. It should be nothing more than a thin front-end for the DSP. Of
 course, that's just theory at the moment :-)

 The problem is that someone has to do the actual work for the initial
 driver. And I expect that it will be a substantial amount of work. Future
 drivers should be *much* easier, though.

 A good argument for doing this work is that this API can hide which parts
 of the video subsystem are hardware and which are software. The
 application really doesn't care how it is organized. What is done in
 hardware on one SoC might be done on a DSP instead on another SoC. But the
 end result is pretty much the same.

 I think the biggest issue we will have here is that part of the inter-
 processors communication stack lives in userspace in most recent SoCs (OMAP4
 comes to mind for instance). This will make implementing a V4L2 driver that
 relies on IPC difficult.

 It's probably time to start seriously thinking about userspace
 drivers/librairies/middlewares/frameworks/whatever, at least to clearly tell
 chip vendors what the Linux community expects.


I suspect more of the IPC framework needs to move down to the kernel..
this is the only way I can see to move the virt-phys address
translation to a trusted layer.  I'm not sure how others would feel
about pushing more of the IPC stack down to the kernel, but at least
it would make it easier for a v4l2 driver to leverage the
coprocessors..

BR,
-R

 --
 Regards,

 Laurent Pinchart



Re: [st-ericsson] v4l2 vs omx for camera

2011-02-24 Thread Clark, Rob
On Thu, Feb 24, 2011 at 2:19 PM, Edward Hervey bilb...@gmail.com wrote:

  What *needs* to be solved is an API for data allocation/passing at the
 kernel level which v4l2,omx,X,GL,vdpau,vaapi,... can use and that
 userspace (like GStreamer) can pass around, monitor and know about.

yes yes yes yes!!

vaapi/vdpau is half way there, as they cover sharing buffers with
X/GL..  but sadly they ignore camera.  There are a few other
inconveniences with vaapi and possibly vdpau.. at least we'd prefer to
have an API that covered decoding config data like SPS/PPS and not just
slice data since config data NALU's are already decoded by our
accelerators..

  That is a *massive* challenge on its own. The choice of using
 GStreamer or not ... is what you want to do once that challenge is
 solved.

  Regards,

    Edward

 P.S. GStreamer for Android already works:
 http://www.elinux.org/images/a/a4/Android_and_Gstreamer.ppt


yeah, I'm aware of that.. someone please convince google to pick it up
and drop stagefright so we can only worry about a single framework
between android and linux  (and then I look forward to playing with
pitivi on an android phone :-))

BR,
-R

 ___
 gstreamer-devel mailing list
 gstreamer-de...@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/gstreamer-devel



Re: [st-ericsson] v4l2 vs omx for camera

2011-02-24 Thread Clark, Rob
On Thu, Feb 24, 2011 at 7:17 AM, Hans Verkuil hverk...@xs4all.nl wrote:
 There are two parts to this: first of all you need a way to allocate large
 buffers. The CMA patch series that does this is available (but not yet merged);
 I'm not sure of the latest status of this series.

 The other part is that everyone can use and share these buffers. There isn't
 anything for this yet. We have discussed this in the past and we need something
 generic for this that all subsystems can use. It's not a good idea to tie this
 to any specific framework like GEM. Instead any subsystem should be able to use
 the same subsystem-independent buffer pool API.

yeah, doesn't need to be GEM.. but should at least inter-operate so we
can share buffers with the display/gpu..

[snip]
 But maybe it would be nice to have a way to have the sensor driver on the
 linux side, pipelined with hw and imaging drivers on a co-processor
 for various algorithms and filters, with configuration all exposed to
 userspace thru MCF.. I'm not immediately sure how this would work, but
 it sounds nice at least ;-)

 MCF? What does that stand for?


sorry, v4l2 media controller framework

BR,
-R


Re: [st-ericsson] v4l2 vs omx for camera

2011-02-21 Thread Clark, Rob
On Fri, Feb 18, 2011 at 10:39 AM, Robert Fekete
robert.fek...@linaro.org wrote:
 Hi,

 In order to expand this knowledge outside of Linaro I took the liberty of
 inviting both linux-media@vger.kernel.org and
 gstreamer-de...@lists.freedesktop.org. For any newcomer I really recommend
 doing some catch-up reading on
 http://lists.linaro.org/pipermail/linaro-dev/2011-February/thread.html
 (v4l2 vs omx for camera thread) before making any comments. And sign up
 for Linaro-dev while you are at it :-)

 To make a long story short:
 Different vendors provide custom OpenMAX solutions for, say, Camera/ISP. In
 the Linux ecosystem there is V4L2 already doing much of this work, and it is
 evolving with the media controller as well. Then there is the integration in
 GStreamer... Which solution is the best way forward? Current discussions so
 far greatly favor V4L2 over OMX.
 Please keep in mind that OpenMAX as a concept is more like GStreamer in many
 senses. The question is whether camera drivers should have OMX or V4L2 as
 the driver front end. This may perhaps apply to video codecs as well. Then
 there is the question of how best to make use of this in GStreamer in order
 to achieve highly efficient, zero-copy multimedia pipelines. Is gst-omx the
 way forward?

just fwiw, there were some patches to make v4l2src work with userptr
buffers in case the camera has an mmu and can handle any random
non-physically-contiguous buffer..  so there is in theory no reason
why a gst capture pipeline could not be zero copy and capture directly
into buffers allocated from the display
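
As a strawman, that zero-copy capture into externally allocated buffers more
or less works today with the userptr path. Roughly (display_alloc_mapped_buffer()
is a stand-in for however the display side would hand out a CPU-mapped buffer,
and it assumes the capture device can reach those pages, e.g. through an mmu):

  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/videodev2.h>

  /* stand-in for the display/GPU allocator */
  extern void *display_alloc_mapped_buffer(size_t size);

  static int setup_userptr_capture(int v4l2_fd, size_t buf_size,
                                   void **bufs, unsigned int count)
  {
          struct v4l2_requestbuffers req;
          unsigned int i;

          memset(&req, 0, sizeof(req));
          req.count  = count;
          req.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
          req.memory = V4L2_MEMORY_USERPTR;
          if (ioctl(v4l2_fd, VIDIOC_REQBUFS, &req) < 0)
                  return -1;

          for (i = 0; i < count; i++) {
                  struct v4l2_buffer buf;

                  bufs[i] = display_alloc_mapped_buffer(buf_size);
                  if (!bufs[i])
                          return -1;

                  memset(&buf, 0, sizeof(buf));
                  buf.type      = V4L2_BUF_TYPE_VIDEO_CAPTURE;
                  buf.memory    = V4L2_MEMORY_USERPTR;
                  buf.index     = i;
                  buf.m.userptr = (unsigned long)bufs[i];
                  buf.length    = buf_size;
                  if (ioctl(v4l2_fd, VIDIOC_QBUF, &buf) < 0)
                          return -1;
          }
          return 0;   /* VIDIOC_STREAMON / DQBUF as usual from here */
  }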

Certainly a more general way to allocate buffers that any of the hw
blocks (display, imaging, video encoders/decoders, 3d/2d hw, etc) could
use, and possibly share across processes for some zero-copy DRI-style
rendering, would be nice.  Perhaps V4L2_MEMORY_GEM?


 Let the discussion continue...


 On 17 February 2011 14:48, Laurent Pinchart
 laurent.pinch...@ideasonboard.com wrote:

 On Thursday 10 February 2011 08:47:15 Hans Verkuil wrote:
  On Thursday, February 10, 2011 08:17:31 Linus Walleij wrote:
   On Wed, Feb 9, 2011 at 8:44 PM, Harald Gustafsson wrote:
OMX's main purpose is to handle multimedia hardware and offer an
interface to that HW that looks identical independent of the vendor
delivering that hardware, much like the v4l2 or USB subsystems try to
do. And yes, optimally it should be implemented in drivers/omx in Linux
and a user space library on top of that.
  
   Thanks for clarifying this part, it was unclear to me. The reason being
   that it seems OMX does not imply userspace/kernelspace separation, and
   I was thinking more of it as a userspace lib. Now my understanding is
   that if e.g. OpenMAX defines a certain data structure, say for a PCM
   frame or whatever, then that exact struct is supposed to be used by the
   kernelspace/userspace interface, and defined in the include file exported
   by the kernel.
  
It might be that some alignment also needs to be made between v4l2 and
other OSes' implementations, to ease developing drivers for many OSs
(sorry I don't know these details, but you ST-E guys should know).
  
   The basic conflict I would say is that Linux has its own API+ABI, which
   is defined by V4L and ALSA through a community process without much
   thought about any existing standard APIs. (In some cases also predating
   them.)
  
By the way, IL is about to finalize version 1.2 of OpenMAX IL, which is
more than a year's work of aligning all vendors and fixing unclear and
buggy parts.
  
   I suspect that the basic problem with Khronos OpenMAX right now is
   how to handle communities - for example the X consortium had
   something like the same problem a while back: only member companies
   could partake in the standards process, and they of course needed to pay
   an upfront fee for that, and the majority of these companies didn't
   exactly send Linux community members to the meetings.
  
   And now all the companies who took part in OpenMAX somehow
   end up having to do a lot of upfront community work if they want
   to drive the APIs in a certain direction, discuss it again with the V4L
   and ALSA maintainers, and so on. Which takes a lot of time and
   patience with an uncertain outcome, since this process is autonomous
   from Khronos. Nobody seems to be doing this; I haven't seen a single
   patch aimed at trying to unify the APIs so far. I don't know if it'd be
   welcome.
  
   This, coupled with strict delivery deadlines and a marketing desire
   to state conformance to OpenMAX, of course leads companies into
   solutions that break the Linux kernelspace API in order to be able
   to present this.

 From my experience with OMX, one of the issues is that companies usually
 extend the API to fulfill their platform's needs, without going through any
 standardization process. Coupled with the lack of an open and free reference
 implementation and test tools, this