Re: [Linaro-mm-sig] [RFC] Synchronizing access to buffers shared with dma-buf between drivers/devices
On Fri, Jun 8, 2012 at 3:56 PM, Erik Gilling konk...@android.com wrote: I guess my other thought is that implicit vs explicit is not mutually exclusive, though I'd guess there'd be interesting deadlocks to have to debug if both were in use _at the same time_. :-) I think this is an approach worth investigating. I'd like a way to either opt out of implicit sync or have a way to check if a dma-buf has an attached fence and detach it. Actually, that could work really well. Consider:

* Each dma_buf has a single fence slot
* On submission:
  * the driver will extract the fence from the dma_buf and queue a wait on it.
  * the driver will replace that fence with its own completion fence before the job submission ioctl returns.
* dma_buf will have two userspace ioctls:
  * DETACH: will return the fence as an FD to userspace and clear the fence slot in the dma_buf
  * ATTACH: takes a fence FD from userspace and attaches it to the dma_buf fence slot. Returns an error if the fence slot is non-empty.

In the android case, we can do a detach after every submission and an attach right before.

btw, I like this idea for implicit and explicit sync to coexist

BR, -R
--
To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
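The single-fence-slot protocol sketched above can be pinned down in a few lines. This is a hypothetical userspace-style simulation of the semantics only; the names (dmabuf_attach_fence, dmabuf_detach_fence, submit_job) are invented for illustration and are not the proposed ioctl interface:

```c
#include <assert.h>
#include <stddef.h>

struct fence { int seqno; };

/* One implicit-sync fence slot per buffer, as in the scheme above. */
struct dma_buf_sim { struct fence *fence; };

/* DETACH: hand the current fence back (NULL if empty) and clear the slot. */
static struct fence *dmabuf_detach_fence(struct dma_buf_sim *buf)
{
	struct fence *f = buf->fence;

	buf->fence = NULL;
	return f;
}

/* ATTACH: fails (like an -EBUSY ioctl would) if the slot is occupied. */
static int dmabuf_attach_fence(struct dma_buf_sim *buf, struct fence *f)
{
	if (buf->fence)
		return -1;
	buf->fence = f;
	return 0;
}

/* Driver job submission per the proposal: extract and wait on the old
 * fence, then publish our own completion fence before the ioctl returns. */
static void submit_job(struct dma_buf_sim *buf, struct fence *completion)
{
	struct fence *old = dmabuf_detach_fence(buf);

	(void)old;	/* a real driver would queue a hw wait on this */
	dmabuf_attach_fence(buf, completion);
}
```

Note how the single slot makes the attach-after-detach android pattern work: userspace DETACHes the fence (getting an FD for explicit sync), and later ATTACHes one back, with the non-empty-slot error catching races.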
Re: [Linaro-mm-sig] [PATCH 3/3] dma_buf: Add documentation for the new cpu access support
On Fri, Mar 2, 2012 at 6:23 PM, Sakari Ailus sakari.ai...@iki.fi wrote: Hi Daniel, Thanks for the patch. On Thu, Mar 01, 2012 at 04:36:01PM +0100, Daniel Vetter wrote:

Signed-off-by: Daniel Vetter daniel.vet...@ffwll.ch
---
 Documentation/dma-buf-sharing.txt | 102 +++-
 1 files changed, 99 insertions(+), 3 deletions(-)

diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt
index 225f96d..f12542b 100644
--- a/Documentation/dma-buf-sharing.txt
+++ b/Documentation/dma-buf-sharing.txt
@@ -32,8 +32,12 @@ The buffer-user

 *IMPORTANT*: [see https://lkml.org/lkml/2011/12/20/211 for more details] For
 this first version, A buffer shared using the dma_buf sharing API:
 - *may* be exported to user space using mmap *ONLY* by exporter, outside of
-  this framework.
-- may be used *ONLY* by importers that do not need CPU access to the buffer.
+  this framework.
+- with this new iteration of the dma-buf api cpu access from the kernel has been
+  enabled, see below for the details.
+
+dma-buf operations for device dma only
+--------------------------------------

 The dma_buf buffer sharing API usage contains the following steps:

@@ -219,7 +223,99 @@ NOTES:
    If the exporter chooses not to allow an attach() operation once a
    map_dma_buf() API has been called, it simply returns an error.

-Miscellaneous notes:
+Kernel cpu access to a dma-buf buffer object
+--------------------------------------------
+
+The motivations to allow cpu access from the kernel to a dma-buf object from
+the importer's side are:
+- fallback operations, e.g. if the device is connected to a usb bus and the
+  kernel needs to shuffle the data around first before sending it away.
+- full transparency for existing users on the importer side, i.e. userspace
+  should not notice the difference between a normal object from that subsystem
+  and an imported one backed by a dma-buf. This is really important for drm
+  opengl drivers that expect to still use all the existing upload/download
+  paths.
+
+Access to a dma_buf from the kernel context involves three steps:
+
+1. Prepare access, which invalidates any necessary caches and makes the object
+   available for cpu access.
+2. Access the object page-by-page with the dma_buf map apis.
+3. Finish access, which will flush any necessary cpu caches and free reserved
+   resources.

Where should it be decided which operations are done to the buffer when it is passed to user space and back to kernel space? How about splitting these operations into those done the first time the buffer is passed to user space (mapping to kernel address space, for example) and those required every time the buffer is passed from kernel to user and back (cache flushing)? I'm asking since any unnecessary time-consuming operations, especially ones as heavy as mapping the buffer, should be avoidable in subsystems dealing with streaming video, cameras etc., i.e. non-GPU users.

Well, this is really something for the buffer exporter to deal with, since there is no way for an importer to create a userspace mmap'ing of the buffer. A lot of these expensive operations go away if you don't even create a userspace virtual mapping in the first place ;-)

BR, -R

+1. Prepare access
+
+   Before an importer can access a dma_buf object with the cpu from the kernel
+   context, it needs to notify the exporter of the access that is about to
+   happen.
+
+   Interface:
+      int dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
+                                   size_t start, size_t len,
+                                   enum dma_data_direction direction)
+
+   This allows the exporter to ensure that the memory is actually available for
+   cpu access - the exporter might need to allocate or swap-in and pin the
+   backing storage. The exporter also needs to ensure that cpu access is
+   coherent for the given range and access direction. The range and access
+   direction can be used by the exporter to optimize the cache flushing, i.e.
+   access outside of the range or with a different direction (read instead of
+   write) might return stale or even bogus data (e.g. when the exporter needs
+   to copy the data to temporary storage).
+
+   This step might fail, e.g. in oom conditions.
+
+2. Accessing the buffer
+
+   To support dma_buf objects residing in highmem, cpu access is page-based
+   using an api similar to kmap. Accessing a dma_buf is done in aligned chunks
+   of PAGE_SIZE size. Before accessing a chunk it needs to be mapped, which
+   returns a pointer in kernel virtual address space. Afterwards the chunk
+   needs to be unmapped again. There is no limit on how often a given chunk
+   can be mapped and unmapped, i.e. the importer does not need to call
+   begin_cpu_access again before mapping the same chunk again.
+
+   Interfaces:
+      void
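The three-step access pattern described in the quoted patch (prepare, page-by-page map, finish) can be sketched as a userspace simulation. The dma_buf_* stubs below only model the call sequence, not the kernel implementation, and the fill_buffer helper and the simulated backing array are invented for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096UL

/* Stand-in for the exporter's backing storage: a 3-page buffer. */
static char backing[3 * PAGE_SIZE];
static int cpu_access_active;

/* Step 1: the exporter pins/swaps in backing storage and makes the range
 * coherent for the given direction; this step may fail, e.g. under oom. */
static int dma_buf_begin_cpu_access_sim(size_t start, size_t len)
{
	(void)start; (void)len;
	cpu_access_active = 1;
	return 0;
}

/* Step 2: page-based, kmap-style mapping of one PAGE_SIZE chunk. */
static void *dma_buf_kmap_sim(unsigned long page)
{
	return cpu_access_active ? backing + page * PAGE_SIZE : NULL;
}

static void dma_buf_kunmap_sim(unsigned long page, void *vaddr)
{
	(void)page; (void)vaddr;	/* nothing to tear down in the simulation */
}

/* Step 3: the exporter flushes cpu caches and frees reserved resources. */
static void dma_buf_end_cpu_access_sim(void)
{
	cpu_access_active = 0;
}

/* Importer-side pattern from the doc: begin, map chunk-by-chunk, end. */
static int fill_buffer(size_t len, char val)
{
	unsigned long page;

	if (dma_buf_begin_cpu_access_sim(0, len))
		return -1;	/* e.g. oom */
	for (page = 0; page * PAGE_SIZE < len; page++) {
		size_t n = len - page * PAGE_SIZE;
		char *vaddr = dma_buf_kmap_sim(page);

		memset(vaddr, val, n > PAGE_SIZE ? PAGE_SIZE : n);
		dma_buf_kunmap_sim(page, vaddr);
	}
	dma_buf_end_cpu_access_sim();
	return 0;
}
```

The per-page mapping is what lets the real interface work with highmem objects that have no permanent kernel virtual address.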
Re: Kernel Display and Video API Consolidation mini-summit at ELC 2012 - Notes
On Wed, Feb 22, 2012 at 10:36 AM, Chris Wilson ch...@chris-wilson.co.uk wrote: On Wed, 22 Feb 2012 17:24:24 +0100, Daniel Vetter dan...@ffwll.ch wrote: On Wed, Feb 22, 2012 at 04:03:21PM +0000, James Simmons wrote: Fbcon scrolling can be painful at HD or better modes. Fbcon needs 3 possible accels: copyarea, imageblit, and fillrect. The first two could be hooked from the TTM layer. It's something I plan to experiment with to see if it's worth it. Let's bite into this ;-) I know that fbcon scrolling totally sucks on big screens, but I also think it's a total waste of time to fix this. Imo fbcon has 2 use-cases: - display an OOPS. - allow me to run fsck (or any other disaster-recovery stuff). 3. Show panics. Ensuring that nothing prevents the switch to fbcon and displaying the panic message is the reason why we haven't felt inclined to accelerate fbcon - it just gets messy for no real gain. and when doing 2d accel on a 3d core.. it basically amounts to putting a shader compiler in the kernel. Whee! For example: https://bugs.freedesktop.org/attachment.cgi?id=48933 which doesn't handle flushing of pending updates via the GPU when writing with the CPU during interrupts (i.e. a panic). -Chris -- Chris Wilson, Intel Open Source Technology Centre
Re: Kernel Display and Video API Consolidation mini-summit at ELC 2012 - Notes
On Fri, Feb 17, 2012 at 1:42 PM, Adam Jackson a...@redhat.com wrote: On 2/16/12 6:25 PM, Laurent Pinchart wrote: Helper functions will be implemented in the subsystems to convert between that generic structure and the various subsystem-specific structures. I guess. I don't really see a reason not to unify the structs too, but then I don't have binary blobs to pretend to be ABI-compatible with. this is just for where timing struct is exposed to userspace BR, -R
Re: [RFCv1 2/4] v4l:vb2: add support for shared buffer (dma_buf)
On Sat, Feb 4, 2012 at 5:43 AM, Sakari Ailus sakari.ai...@iki.fi wrote: Hi Rob, Clark, Rob wrote: On Mon, Jan 30, 2012 at 4:01 PM, Sakari Ailus sakari.ai...@iki.fi wrote: So to summarize I understand your constraints - gpu drivers have worked like v4l a few years ago. The thing I'm trying to achieve with this constant yelling is just to raise awareness for these issues so that people aren't surprised when drm starts pulling tricks on dma_bufs. I think we should be able to mark dma_bufs non-relocatable so also DRM can work with these buffers. Or alternatively, as Laurent proposed, V4L2 be prepared for moving the buffers around. Are there other reasons to do so than paging them out of system memory to make room for something else? fwiw, from GPU perspective, the DRM device wouldn't be actively relocating buffers just for the fun of it. I think it is more that we want to give the GPU driver the flexibility to relocate when it really needs to. For example, maybe user has camera app running, then puts it in the background and opens firefox which tries to allocate a big set of pixmaps putting pressure on GPU memory.. I guess the root issue is who is doing the IOMMU programming for the camera driver. I guess if this is something built in to the camera driver then when it calls dma_buf_map() it probably wants some hint that the backing pages haven't moved so in the common case (ie. buffer hasn't moved) it doesn't have to do anything expensive. On omap4 v4l2+drm example I have running, it is actually the DRM driver doing the IOMMU programming.. so v4l2 camera really doesn't need to care about it. (And the IOMMU programming here is pretty This part sounds odd to me. Well, I guess it _could_ be done that way, but the ISP IOMMU could as well be different from the one in DRM. That's the case on OMAP 3, for example. Yes, this is a difference between OMAP4 and OMAP3..
although I think the intention is that for OMAP3 type scenarios, if the IOMMU mapping was done through the dma mapping API, then it could still be done (and cached) by the exporter. fast.) But I suppose this maybe doesn't represent all cases. I suppose if a camera didn't really sit behind an IOMMU but uses something more like a DMA descriptor list it would want to know if it needed to regenerate its descriptor list. Or likewise if camera has an IOMMU that isn't really using the IOMMU framework (although maybe that is easier to solve). But I think a hint returned from dma_buf_map() would do the job? An alternative to IOMMU I think in practice would mean CMA-allocated buffers. I need to think about this a bit and understand how this would really work to properly comment on it. For example, how does one mlock() something that isn't mapped to process memory --- think of a dma buffer not mapped to the user space process address space? The scatter list that the exporter gives you should be locked/pinned already, so the importer should not need to call mlock() BR, -R Cheers, -- Sakari Ailus sakari.ai...@iki.fi
Re: [RFCv1 2/4] v4l:vb2: add support for shared buffer (dma_buf)
On Thu, Feb 2, 2012 at 4:19 AM, Laurent Pinchart laurent.pinch...@ideasonboard.com wrote: Hi Rob, On Tuesday 31 January 2012 16:38:35 Clark, Rob wrote: On Mon, Jan 30, 2012 at 4:01 PM, Sakari Ailus sakari.ai...@iki.fi wrote: So to summarize I understand your constraints - gpu drivers have worked like v4l a few years ago. The thing I'm trying to achieve with this constant yelling is just to raise awareness for these issues so that people aren't surprised when drm starts pulling tricks on dma_bufs. I think we should be able to mark dma_bufs non-relocatable so also DRM can work with these buffers. Or alternatively, as Laurent proposed, V4L2 be prepared for moving the buffers around. Are there other reasons to do so than paging them out of system memory to make room for something else? fwiw, from GPU perspective, the DRM device wouldn't be actively relocating buffers just for the fun of it. I think it is more that we want to give the GPU driver the flexibility to relocate when it really needs to. For example, maybe user has camera app running, then puts it in the background and opens firefox which tries to allocate a big set of pixmaps putting pressure on GPU memory.. On an embedded system putting the camera application in the background will usually stop streaming, so buffers will be unmapped. On other systems, or even on some embedded systems, that will not be the case though. I'm perfectly fine with relocating buffers when needed. What I want is to avoid unmapping and remapping them for every frame if they haven't moved. I'm sure we can come up with an API to handle that. I guess the root issue is who is doing the IOMMU programming for the camera driver. I guess if this is something built in to the camera driver then when it calls dma_buf_map() it probably wants some hint that the backing pages haven't moved so in the common case (ie. buffer hasn't moved) it doesn't have to do anything expensive. It will likely depend on the camera hardware.
For the OMAP3 ISP, the driver calls the IOMMU API explicitly, but if I understand it correctly there's a plan to move IOMMU support to the DMA API. On omap4 v4l2+drm example I have running, it is actually the DRM driver doing the IOMMU programming.. so v4l2 camera really doesn't need to care about it. (And the IOMMU programming here is pretty fast.) But I suppose this maybe doesn't represent all cases. I suppose if a camera didn't really sit behind an IOMMU but uses something more like a DMA descriptor list it would want to know if it needed to regenerate its descriptor list. Or likewise if camera has an IOMMU that isn't really using the IOMMU framework (although maybe that is easier to solve). But I think a hint returned from dma_buf_map() would do the job? I see at least three possible solutions to this problem.

1. At dma_buf_unmap() time, the exporter will tell the importer that the buffer will move, and that it should be unmapped from whatever the importer mapped it to. That's probably the easiest solution to implement on the importer's side, but I expect it to be difficult for the exporter to know at dma_buf_unmap() time if the buffer will need to be moved or not.

2. Adding a callback to request the importer to unmap the buffer. This might be racy, and locking might be difficult to handle.

3. At dma_buf_unmap() time, keep the importer's mappings around. The exporter is then free to move the buffer if needed, in which case the mappings will be invalid. This shouldn't be a problem in theory, as the buffer isn't being used by the importer at that time, but can cause stability issues when dealing with rogue hardware as this would punch holes in the IOMMU fence. At dma_buf_map() time the exporter would tell the importer whether the buffer moved or not. If it moved, the importer will tear down the mappings it kept, and create new ones.

I was leaning towards door #3..
rogue hw is a good point, but I think that would be an issue in general if hw kept accessing the buffer when it wasn't supposed to. BR, -R Variations around those 3 possible solutions are possible. -- Regards, Laurent Pinchart
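Door #3 (and the "hint on next map()" variant Rob keeps coming back to) boils down to a generation check. Here is a minimal sketch of that idea with entirely invented names (exporter_buf, importer_map, the generation counter): the importer keeps its device mapping across unmap and only pays for a rebuild when the exporter reports the buffer moved.

```c
#include <assert.h>
#include <stddef.h>

/* Exporter side: a counter that bumps every time backing pages move. */
struct exporter_buf {
	unsigned long generation;
};

/* Importer side: a cached device mapping tagged with the generation it saw. */
struct importer_map {
	int valid;
	unsigned long generation;
	int rebuild_count;	/* how often we paid for a full remap */
};

static void exporter_move_buffer(struct exporter_buf *b)
{
	b->generation++;	/* eviction/compaction relocated the pages */
}

/* dma_buf_map()-style call: reuse the cached mapping unless it went stale. */
static void importer_map_buffer(struct importer_map *m,
				const struct exporter_buf *b)
{
	if (m->valid && m->generation == b->generation)
		return;		/* common case: buffer did not move, cheap */

	/* expensive path: reprogram IOMMU / regenerate descriptor list */
	m->generation = b->generation;
	m->valid = 1;
	m->rebuild_count++;
}
```

The per-frame map/unmap that worries Laurent then costs a comparison in the steady state, while still letting the exporter relocate between frames when memory pressure demands it.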
Re: [RFCv1 2/4] v4l:vb2: add support for shared buffer (dma_buf)
On Thu, Feb 2, 2012 at 2:23 PM, Daniel Vetter dan...@ffwll.ch wrote: On Thu, Feb 2, 2012 at 11:19, Laurent Pinchart laurent.pinch...@ideasonboard.com wrote: On omap4 v4l2+drm example I have running, it is actually the DRM driver doing the IOMMU programming.. so v4l2 camera really doesn't need to care about it. (And the IOMMU programming here is pretty fast.) But I suppose this maybe doesn't represent all cases. I suppose if a camera didn't really sit behind an IOMMU but uses something more like a DMA descriptor list would want to know if it needed to regenerate it's descriptor list. Or likewise if camera has an IOMMU that isn't really using the IOMMU framework (although maybe that is easier to solve). But I think a hint returned from dma_buf_map() would do the job? I see at least three possible solutions to this problem. 1. At dma_buf_unmap() time, the exporter will tell the importer that the buffer will move, and that it should be unmapped from whatever the importer mapped it to. That's probably the easiest solution to implement on the importer's side, but I expect it to be difficult for the exporter to know at dma_buf_unmap() time if the buffer will need to be moved or not. 2. Adding a callback to request the importer to unmap the buffer. This might be racy, and locking might be difficult to handle. 3. At dma_buf_unmap() time, keep importer's mappings around. The exporter is then free to move the buffer if needed, in which case the mappings will be invalid. This shouldn't be a problem in theory, as the buffer isn't being used by the importer at that time, but can cause stability issues when dealing with rogue hardware as this would punch holes in the IOMMU fence. At dma_buf_map() time the exporter would tell the importer whether the buffer moved or not. If it moved, the importer will tear down the mappings it kept, and create new ones. Variations around those 3 possible solutions are possible. 
While preparing my fosdem presentation about dma_buf I've thought quite a bit about what we still need for forceful unmap support/persistent mappings/dynamic dma_buf/whatever you want to call it. And it's a lot, and we have quite a few lower hanging fruits to reap (like cpu access and mmap support for importers). So I propose instead: 4. Just hang onto the device mappings for as long as it's convenient and/or necessary and feel guilty about it. for v4l2/vb2, I'd like to at least request some sort of BUF_PREPARE_IS_EXPENSIVE flag, so we don't penalize devices where remapping is not expensive. Ie. the camera driver could set this flag so vb2 core knows not to unmap()/re-map() between frames. In my case, for v4l2 + encoder, I really need the unmapping/remapping between frames, at least if there is anything else going on competing for buffers. But in my case, the exporter remaps to a contiguous (sorta) virtual address that the camera can see, so there is no expensive mapping on the importer side of things. BR, -R The reason is that going fully static isn't worse than a half-baked dynamic version of dma_buf, but the half-baked dynamic one has the downside that we can ignore the issue and feel good about things ;-) Cheers, Daniel -- Daniel Vetter daniel.vet...@ffwll.ch - +41 (0) 79 364 57 48 - http://blog.ffwll.ch
Re: [RFCv1 2/4] v4l:vb2: add support for shared buffer (dma_buf)
On Mon, Jan 30, 2012 at 4:01 PM, Sakari Ailus sakari.ai...@iki.fi wrote: So to summarize I understand your constraints - gpu drivers have worked like v4l a few years ago. The thing I'm trying to achieve with this constant yelling is just to raise awareness for these issues so that people aren't surprised when drm starts pulling tricks on dma_bufs. I think we should be able to mark dma_bufs non-relocatable so also DRM can work with these buffers. Or alternatively, as Laurent proposed, V4L2 be prepared for moving the buffers around. Are there other reasons to do so than paging them out of system memory to make room for something else? fwiw, from GPU perspective, the DRM device wouldn't be actively relocating buffers just for the fun of it. I think it is more that we want to give the GPU driver the flexibility to relocate when it really needs to. For example, maybe user has camera app running, then puts it in the background and opens firefox which tries to allocate a big set of pixmaps putting pressure on GPU memory.. I guess the root issue is who is doing the IOMMU programming for the camera driver. I guess if this is something built in to the camera driver then when it calls dma_buf_map() it probably wants some hint that the backing pages haven't moved so in the common case (ie. buffer hasn't moved) it doesn't have to do anything expensive. On omap4 v4l2+drm example I have running, it is actually the DRM driver doing the IOMMU programming.. so v4l2 camera really doesn't need to care about it. (And the IOMMU programming here is pretty fast.) But I suppose this maybe doesn't represent all cases. I suppose if a camera didn't really sit behind an IOMMU but uses something more like a DMA descriptor list it would want to know if it needed to regenerate its descriptor list. Or likewise if camera has an IOMMU that isn't really using the IOMMU framework (although maybe that is easier to solve). But I think a hint returned from dma_buf_map() would do the job?
BR, -R
Re: [Linaro-mm-sig] [PATCH 12/15] drivers: add Contiguous Memory Allocator
2012/1/27 Marek Szyprowski m.szyprow...@samsung.com: Hi Ohad, On Friday, January 27, 2012 10:44 AM Ohad Ben-Cohen wrote: With v19, I can't seem to allocate big regions anymore (e.g. 101MiB). In particular, this seems to fail: On Thu, Jan 26, 2012 at 11:00 AM, Marek Szyprowski m.szyprow...@samsung.com wrote:

+static int cma_activate_area(unsigned long base_pfn, unsigned long count)
+{
+	unsigned long pfn = base_pfn;
+	unsigned i = count >> pageblock_order;
+	struct zone *zone;
+
+	WARN_ON_ONCE(!pfn_valid(pfn));
+	zone = page_zone(pfn_to_page(pfn));
+
+	do {
+		unsigned j;
+		base_pfn = pfn;
+		for (j = pageblock_nr_pages; j; --j, pfn++) {
+			WARN_ON_ONCE(!pfn_valid(pfn));
+			if (page_zone(pfn_to_page(pfn)) != zone)
+				return -EINVAL;

The above WARN_ON_ONCE is triggered, and then the conditional is asserted (page_zone() returns a Movable zone, whereas zone is Normal) and the function fails. This happens to me on OMAP4 with your 3.3-rc1-cma-v19 branch (and a bunch of remoteproc/rpmsg patches). Do big allocations work for you? I've tested it with 256MiB on Exynos4 platform. Could you check if the problem also appears on 3.2-cma-v19 branch (I've uploaded it a few hours ago) and 3.2-cma-v18? Both are available on our public repo: git://git.infradead.org/users/kmpark/linux-samsung/ The above code has not been changed since v16, so I'm really surprised that it causes problems. Maybe the memory configuration or layout has been changed in 3.3-rc1 for OMAP4? is highmem still an issue? I remember hitting this WARN_ON_ONCE() but it went away after I switched to a 2g/2g vm split (which avoids highmem) BR, -R Best regards -- Marek Szyprowski Samsung Poland R&D Center ___ Linaro-mm-sig mailing list linaro-mm-...@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-mm-sig
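The check that trips in cma_activate_area() is simply "every pfn in the CMA region must sit in the same zone". A toy model of why a region straddling the Normal/Movable (highmem) boundary fails it (the zone ids and boundary pfn here are made up, and the real code iterates pageblock-by-pageblock rather than pfn-by-pfn):

```c
#include <assert.h>

/* Toy zone map: pfns below the boundary are Normal, above are Movable. */
#define ZONE_NORMAL	0
#define ZONE_MOVABLE	1
#define BOUNDARY_PFN	1000UL

static int page_zone_sim(unsigned long pfn)
{
	return pfn < BOUNDARY_PFN ? ZONE_NORMAL : ZONE_MOVABLE;
}

/* Mirrors the quoted loop: take the zone of the first pfn, then fail
 * (-EINVAL in the real code) if any later pfn lands in a different zone. */
static int cma_region_ok(unsigned long base_pfn, unsigned long count)
{
	int zone = page_zone_sim(base_pfn);
	unsigned long pfn;

	for (pfn = base_pfn; pfn < base_pfn + count; pfn++)
		if (page_zone_sim(pfn) != zone)
			return -1;
	return 0;
}
```

This matches Rob's observation: with a 2g/2g vm split there is no highmem, so there is no zone boundary for a large reserved region to straddle, and the failure disappears.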
Re: V4L2 Overlay mode replacement by dma-buf - was: Re: [PATCH 05/10] v4l: add buffer exporting via dmabuf
On Mon, Jan 23, 2012 at 10:57 AM, Mauro Carvalho Chehab mche...@redhat.com wrote: 2) The userspace API changes to properly support dma buffers. If you're not ready to discuss (2), that's ok, but I'd like to follow the discussions for it with care, not only for reviewing the actual patches, but also since I want to be sure that it will address the needs for xawtv and for the Xorg v4l driver. The support of dmabuf could be easily added to the framebuffer API. I expect that it would not be difficult to add it to Xv. You might want to have a look at my dri2video proposal a while back. I plan some minor changes to make the api for multi-planar formats look a bit more like how addfb2 ended up (ie. array of handles, offsets, and pitches), but you could get the basic idea from: http://patchwork.freedesktop.org/patch/7939/ A texture based API is likely needed, at least for it to work with modern PC GPUs. I suspect we will end up w/ an eglImage extension to go dmabuf fd -> eglImage, and perhaps handle barriers and userspace mappings. That should, I think, be the best approach to best hide/abstract all the GPU crazy games from the rest of the world. BR, -R
Re: [RFCv1 2/4] v4l:vb2: add support for shared buffer (dma_buf)
On Mon, Jan 23, 2012 at 4:54 AM, Laurent Pinchart laurent.pinch...@ideasonboard.com wrote: Hi Daniel, On Monday 23 January 2012 11:35:01 Daniel Vetter wrote: On Mon, Jan 23, 2012 at 10:48, Laurent Pinchart wrote: On Monday 23 January 2012 10:06:57 Marek Szyprowski wrote: On Friday, January 20, 2012 5:29 PM Laurent Pinchart wrote: On Friday 20 January 2012 17:20:22 Tomasz Stanislawski wrote: IMO, One way to do this is adding field 'struct device *dev' to struct vb2_queue. This field should be filled by a driver prior to calling vb2_queue_init. I haven't looked into the details, but that sounds good to me. Do we have use cases where a queue is allocated before knowing which physical device it will be used for? I don't think so. In case of S5P drivers, vb2_queue_init is called while opening /dev/videoX. BTW. This struct device may help vb2 to produce logs with more descriptive client annotation. What happens if such a device is NULL? That would happen for the vmalloc allocator used by VIVI. Good question. Should dma-buf accept NULL devices? Or should vivi pass its V4L2 device to vb2? I assume you suggested using the struct video_device->dev entry in such a case. It will not work. The DMA-mapping API requires some parameters to be set for the client device, like for example the dma mask. struct video_device contains only an artificial struct device entry, which has no relation to any physical device and cannot be used for calling DMA-mapping functions. Performing dma_map_* operations with such an artificial struct device doesn't make any sense. It also slows down things significantly due to cache flushing (forced by dma-mapping) which should be avoided if the buffer is accessed only with the CPU (like it is done by vb2-vmalloc style drivers). I agree that mapping the buffer to the physical device doesn't make any sense, as there's simply no physical device to map the buffer to. In that case we could simply skip the dma_map/dma_unmap calls.
See my other mail, dma_buf v1 does not support cpu access. v1 is in the kernel now, let's start discussing v2 ;-) So if you don't have a device around, you can't use it in its current form. Note, however, that dma-buf v1 explicitly does not support CPU access by the importer. IMHO this case perfectly shows the design mistake that has been made. The current version simply tries to do too much. Each client of dma_buf should 'map' the provided sgtable/scatterlist on its own. Only the client device driver has all the knowledge to make a proper 'mapping'. Real physical devices will usually use dma_map_sg() for such an operation, while some virtual ones will only create a kernel mapping for the provided scatterlist (like vivi with the vmalloc memory module). I tend to agree with that. Depending on the importer device, drivers could then map/unmap the buffer around each DMA access, or keep a mapping and sync the buffer. Again we've discussed adding a syncing op to the interface that would allow keeping around mappings. The thing is that this also requires an unmap callback or something similar, so that the exporter can inform the importer that the memory just moved around. And the exporter _needs_ to be able to do that, hence also the language in the doc that importers need to bracket all uses with a map/unmap and can't sit forever on a dma_buf mapping. Not all exporters need to be able to move buffers around. If I'm not mistaken, only DRM exporters need such a feature (which obviously makes it an important feature). Does the exporter need to be able to do so at any time? Buffers obviously can't be moved around when they're used by an active DMA, so I expect the exporter to be able to wait. How long can it wait?
Offhand I think it would usually be a request from userspace (in some cases page faults (although I think only if there is hw de-tiling?), or command submission to gpu involving some buffer(s) that are not currently mapped) that would trigger the exporter to want to be able to evict something. So it could be blocked, or something else evicted/moved instead. Although perhaps not ideal for performance. (app/toolkit writers seem to have a love of temporary pixmaps, so the x11/ddx driver can chew thru a huge number of new buffer allocations in a very short amount of time) I'm not sure I would like a callback approach. If we add a sync operation, the exporter could signal to the importer that it must unmap the buffer by returning an appropriate value from the sync operation. Would that be usable for DRM? It does seem a bit over-complicated.. and deadlock prone. Is there a reason the importer couldn't just unmap when DMA is completed, and the exporter give some hint on the next map() that the buffer hasn't actually moved? BR, -R Another option would be to keep the mapping around, and check in the importer if the buffer has moved. If
Re: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism
On Thu, Nov 3, 2011 at 3:04 AM, Marek Szyprowski m.szyprow...@samsung.com wrote: Hello, I'm sorry for a late reply, but after Kernel Summit/ELC I have some comments. On Friday, October 14, 2011 5:35 PM Daniel Vetter wrote: On Fri, Oct 14, 2011 at 12:00:58PM +0200, Tomasz Stanislawski wrote:

+/**
+ * struct dma_buf_ops - operations possible on struct dma_buf
+ * @create: creates a struct dma_buf of a fixed size. Actual allocation
+ *          does not happen here.

The 'create' ops is not present in dma_buf_ops.

+ * @attach: allows different devices to 'attach' themselves to the given
+ *          buffer. It might return -EBUSY to signal that backing storage
+ *          is already allocated and incompatible with the requirements
+ *          of requesting device. [optional]
+ * @detach: detach a given device from this buffer. [optional]
+ * @get_scatterlist: returns list of scatter pages allocated, increases
+ *          usecount of the buffer. Requires at least one attach to be
+ *          called before. Returned sg list should already be mapped
+ *          into _device_ address space.

You must add a comment that this call 'may sleep'. I like the get_scatterlist idea. It allows the exporter to create a valid scatterlist for a client in an elegant way. I do not like this whole attachment idea. The problem is that currently there is no support in the DMA framework for allocation for multiple devices. As long as no such support exists, there is no generic way to handle attribute negotiations and buffer allocations that involve multiple devices. So the exporter drivers would have to implement more or less hacky solutions to handle memory requirements and choosing the device that allocated memory. Currently, AFAIK there is even no generic way for a driver to acquire its own DMA memory requirements. Therefore all logic hidden beneath 'attachment' is pointless. I think that support for attach/detach (and related stuff) should be postponed until support for multi-device allocation is added to the DMA framework.
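For concreteness, the attach/get_scatterlist split being debated amounts to deferring the real allocation until the first mapping request, once every attached device's constraints are known. A speculative model of just that control flow (the single dma-mask "constraint", and the names dmabuf_attach_sim/dmabuf_get_scatterlist_sim, are invented; real multi-device constraint negotiation is exactly the unsolved part Marek points at):

```c
#include <assert.h>
#include <stddef.h>

struct dma_buf_sim {
	unsigned long long dma_mask_limit; /* most restrictive attached mask */
	int allocated;			   /* backing storage pinned yet? */
	int nr_attachments;
};

/* attach: only record constraints; refuse (-EBUSY in the quoted docs) once
 * storage is pinned and the new device's constraints can't be honored. */
static int dmabuf_attach_sim(struct dma_buf_sim *b, unsigned long long dma_mask)
{
	if (b->allocated && dma_mask < b->dma_mask_limit)
		return -1;
	if (dma_mask < b->dma_mask_limit)
		b->dma_mask_limit = dma_mask;
	b->nr_attachments++;
	return 0;
}

/* get_scatterlist: the first call actually allocates backing storage,
 * honoring the constraints collected from all prior attaches. */
static int dmabuf_get_scatterlist_sim(struct dma_buf_sim *b)
{
	if (!b->nr_attachments)
		return -1;	/* at least one attach required first */
	b->allocated = 1;	/* from here on, the pages are committed */
	return 0;
}
```

The point of the model is Marek's V4L2 objection: if REQBUFS forces attach to be immediately followed by get_scatterlist, the allocation is committed before other devices attach, and later, stricter devices can only get -EBUSY.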
Imo we clearly need this to make the multi-device-driver with insane dma requirements work on arm. And rewriting the buffer handling in participating subsystems twice isn't really a great plan. I envision that on platforms where we need this madness, the driver must call back to the dma subsystem to create a dma_buf. The dma subsystem should already be aware of all the requirements and hence should be able to handle them.. I don't say the attachment list idea is wrong, but adding attachment stuff creates an illusion that the problem of multi-device allocations is somehow magically solved. We should not force the developers of exporter drivers to solve a problem that is not solvable yet. Well, this is why we need to create a decent support infrastructure for platforms (= arm madness) that need this, so that device drivers and subsystems don't need to invent that wheel on their own. Which, as you point out, they actually can't. The real question is whether it is possible to create any generic support infrastructure. I really doubt it. IMHO this is something that will be hacked for each 'product release' and will never reach the mainline... The other problem is the APIs. For example, the V4L2 subsystem assumes that memory is allocated after a successful VIDIOC_REQBUFS with the V4L2_MEMORY_MMAP memory type. Therefore attach would be automatically followed by get_scatterlist, blocking the possibility of any buffer migrations in the future. Well, pardon me for breaking the news, but v4l needs to rework its buffer handling. If you want to share buffers with a gpu driver, you _have_ to live with the fact that gpus do fully dynamic buffer management, meaning: - buffers get allocated and destroyed on the fly, meaning static reqbuf just went out the window (we obviously cache buffer objects and reuse them for performance, as long as the processing pipeline doesn't really change).
- buffers get moved around in memory, meaning you either need full-blown sync-objects with a callback to drivers to tear down mappings on demand, or every driver needs to guarantee to call put_scatterlist in a reasonably short time. The latter is probably the more natural thing for v4l devices. I'm really not convinced it is possible to go for completely dynamic buffer management, especially if we are implementing a proof-of-concept solution. Please notice the following facts: 1. all v4l2 drivers do the 'static' buffer management - memory is being allocated on the REQBUF() call and then mapped permanently into both userspace and dma (io) address space. Is this strictly true if we are introducing a new 'enum v4l2_memory' for dmabuf's? Shouldn't that give us some flexibility, especially if the v4l2 device is only the importer, not the allocator, of the memory. and a couple
Re: [Linaro-mm-sig] [PATCHv16 0/9] Contiguous Memory Allocator
On Mon, Oct 10, 2011 at 1:58 AM, Ohad Ben-Cohen o...@wizery.com wrote: On Fri, Oct 7, 2011 at 6:27 PM, Arnd Bergmann a...@arndb.de wrote: IMHO it would be good to merge the entire series into 3.2, since the ARM portion fixes an important bug (double mapping of memory ranges with conflicting attributes) that we've lived with for far too long, but it really depends on how everyone sees the risk for regressions here. If something breaks in unfixable ways before the 3.2 release, we can always revert the patches and have another try later. I didn't thoroughly review the patches, but I did try them out (to be precise, I tried v15) on an OMAP4 PandaBoard, and really liked the result. The interfaces seem clean and convenient and things seem to work (I used a private CMA pool with rpmsg and remoteproc, but also noticed that several other drivers were utilizing the global pool). And with this in hand we can finally ditch the old reserve+ioremap approach. So from a user perspective, I sure do hope this patch set gets into 3.2; hopefully we can just fix anything that would show up during the 3.2 cycle. Marek, Michal (and everyone involved!), thanks so much for pushing this! Judging from the history of this patch set and the areas that it touches (and from the number of LWN articles ;) it looks like a considerable feat. FWIW, feel free to add my Tested-by: Ohad Ben-Cohen o...@wizery.com Marek, I guess I forgot to mention earlier, but I've been using CMA for a couple of weeks now with omapdrm driver, so you can also add my: Tested-by: Rob Clark r...@ti.com BR, -R (small and optional comment: I think it'd be nice if dma_declare_contiguous would fail if called too late, otherwise users of that misconfigured device will end up using the global pool without easily knowing that something went wrong) Thanks, Ohad. 
___ Linaro-mm-sig mailing list linaro-mm-...@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-mm-sig -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Linaro-mm-sig] Buffer sharing proof-of-concept
On Thu, Aug 4, 2011 at 3:58 AM, Daniel Vetter daniel.vet...@ffwll.ch wrote: On Wed, Aug 3, 2011 at 17:12, Jordan Crouse jcro...@codeaurora.org wrote: On 08/03/2011 03:33 AM, Tom Cooksey wrote: Passing buffer meta-data around was also discussed yesterday. Again, the general consensus seemed to be that this data should be kept out of the kernel. The userspace application should know what the buffer format etc. is and can provide that information to the relevant device APIs when it passes in the buffer. True, but APIs change slowly. Some APIs *cough* OpenMAX *cough* are damn near immutable over the lifetime of an average software release. A blob of data attached to a buffer can evolve far more rapidly and be far more extensible and much more vendor specific. This isn't a new idea, I think the DRM/GEM guys have tossed it around too. Erh, no. For sharing gem buffers between processes (i.e. between direct rendering clients and the compositor, whatever that is) we just hand around the gem id in the kernel. Some more stuff gets passed around in userspace in a generic way (e.g. DRI2 passes the buffer type (depth, stencil, color, ...) and the stride), but that's it. Everything else is driver specific and mostly not even passed around explicitly and just agreed upon implicitly. E.g. running the wrong XvMC decoder lib for your Xorg Intel driver will result in garbage on the screen. There's a bit more leeway between Mesa and the Xorg driver because they're released independently, but it's very ad-hoc (i.e. oops, that buffer doesn't fit the requirements of the new code, must be an old Xorg driver, so switch to the compat paths in Mesa). But my main fear with the blob attached to the buffer idea is that sooner or later it'll be part of the kernel/userspace interface of the buffer sharing api (hey, it's there, why not use it?). And the timeframe for deprecating the kernel abi is 5-10 years and yes I've tried to dodge that and got shot at. 
hmm, there would be a dmabuf-private ptr in struct dmabuf. Normally that should be for private data of the buffer allocator, but I guess it could be (ab)used for under-the-hood communication between drivers in a platform-specific way. It does seem a bit hacky, but at least it does not need to be exposed to userspace. (Or maybe a better option is just 'rm -rf omx' ;-)) BR, -R Imo a better approach is to spec (_after_ the kernel buffer sharing works) a low-level userspace api that drivers need to implement (like the EGL Mesa extensions used to make Wayland work on gem drivers). -Daniel -- Daniel Vetter daniel.vet...@ffwll.ch - +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Re: [Linaro-mm-sig] Buffer sharing proof-of-concept
On Thu, Aug 4, 2011 at 7:34 AM, Daniel Vetter daniel.vet...@ffwll.ch wrote: On Thu, Aug 4, 2011 at 13:14, Clark, Rob r...@ti.com wrote: hmm, there would be a dmabuf-private ptr in struct dmabuf. Normally that should be for private data of the buffer allocator, but I guess it could be (ab)used for under-the-hood communication between drivers in a platform-specific way. It does seem a bit hacky, but at least it does not need to be exposed to userspace. An idea that just crossed my mind: I think we should separate two kinds of meta-data about a shared piece of data (dmabuf): - logical metadata about its contents, like strides, number of dimensions, pixel format/vbo layout, ... Imo that stuff doesn't belong into the buffer sharing simply because it's a) an awful mess and b) gem doesn't know it. To recap: only userspace knows this stuff and has to make sense of the data in the buffer by either setting up correct gpu command streams or telling kms what format the thing it needs to scan out has. for sure, I think we've ruled out putting this sort of stuff in 'struct dmabuf'.. (notwithstanding any data stuffed away in a 'void * priv' on some platform or another) - metadata about the physical layout: tiling layout, memory bank interleaving, page size for the iommu/contiguous buffer. As far as I can tell (i.e. please correct) for embedded systems this just depends on the (in)saneness of the iommu/bus/memory controller sitting between the IP block and its data. So it would be great if we could completely hide this from drivers (and userspace) and shovel it into the dma subsystem (as private data). Unfortunately at least on Intel tiling needs to be known by the iommu code, the core gem kernel driver code and the userspace drivers. Otoh using tiled buffers for sharing is maybe a bit ambitious for the first cut. So maybe we can just ignore tiling which largely just leaves handling iommus restrictions (or their complete lack) which looks doable. 
btw, on intel (or desktop platforms in general), could another device (say a USB webcam) DMA directly to a tiled buffer via the GART... ie. assuming you had some way to pre-fault some pages into the GART before the DMA happened. I was sort of expecting 'struct dmabuf' to basically just be a scatterlist and some fxn ptrs, nothing about TILING.. not sure if we need an fxn ptr to ask the buffer allocator to generate some pages/addresses that some other DMA engine could write to (so you could do something like pre-faulting the buffer into some sort of GART) and again release the pages/addresses when DMA completes. BR, -R (Or maybe a better option is just 'rm -rf omx' ;-)) Yeah ;-) -Daniel -- Daniel Vetter daniel.vet...@ffwll.ch - +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Re: [Linaro-mm-sig] [PATCH 1/6] drivers: base: add shared buffer framework
On Tue, Aug 2, 2011 at 4:49 AM, Marek Szyprowski m.szyprow...@samsung.com wrote: From: Tomasz Stanislawski t.stanisl...@samsung.com +/** + * shrbuf_import() - obtain shrbuf structure from a file descriptor + * @fd: file descriptor + * + * The function obtains an instance of a shared buffer from a file descriptor + * Call sb->put when the imported buffer is no longer needed + * + * Returns pointer to a shared buffer or error pointer on failure + */ +struct shrbuf *shrbuf_import(int fd) +{ + struct file *file; + struct shrbuf *sb; + + /* obtain a file, assure that it will not be released */ + file = fget(fd); + /* check if descriptor is incorrect */ + if (!file) + return ERR_PTR(-EBADF); + /* check if dealing with a shrbuf file */ + if (file->f_op != &shrbuf_fops) { Hmm.. I was liking the idea of letting the buffer allocator provide the fops, so it could deal w/ mmap'ing and that sort of thing. Although this reminds me that we would need a sane way to detect if someone tries to pass in a non-umm/dmabuf/shrbuf/whatever fd. + fput(file); + return ERR_PTR(-EINVAL); + } + /* add user of shared buffer */ + sb = file->private_data; + sb->get(sb); + /* release the file */ + fput(file); + + return sb; +} +/** + * struct shrbuf - shared buffer instance + * @get: increase number of a buffer's users + * @put: decrease number of a buffer's users, release resources if needed + * @dma_addr: start address of a contiguous buffer + * @size: size of a contiguous buffer + * + * Both get/put methods are required. The structure is dedicated for + * embedding. The fields dma_addr and size are used for proof-of-concept + * purpose. They will be substituted by scatter-gather lists. + */ +struct shrbuf { + void (*get)(struct shrbuf *); + void (*put)(struct shrbuf *); Hmm, is fput()/fget() and fops->release() not enough? Ie. the original buffer allocator provides fops, incl the fops->release(), which may in turn be decrementing an internal ref cnt used by the allocating driver.. 
so if your allocating driver was the GPU, its release fxn might be calling drm_gem_object_unreference_unlocked().. and I guess there must be something similar for videobuf2. (Previous comment about letting the allocating driver implement fops notwithstanding.. but I guess there must be some good way to deal with that.) BR, -R
Re: [RFC] drm: add overlays as first class KMS objects
On Fri, May 13, 2011 at 8:02 PM, Jesse Barnes jbar...@virtuousgeek.org wrote: On Fri, 13 May 2011 18:16:30 +0200 Daniel Vetter daniel.vet...@ffwll.ch wrote: Hi Jesse, Discussion here in Budapest with v4l and embedded graphics folks was extremely fruitful. A few quick things to take away - I'll try to dig through all the stuff I've learned more in-depth later (probably in a blog post or two): Hi Daniel, thanks for writing this up - embedded graphics is insane. The output routing/blending/whatever currently shipping hw can do is crazy and kms as-is is nowhere near up to snuff to support this. We've discussed omap4 and a ti chip targeted at video surveillance as use cases. I'll post block diagrams and explanations sometime later. Yeah I expected that; even just TVs can have really funky restrictions about z order and blend capability. - we should immediately stop calling anything an overlay. It's a confusing concept that has a different meaning in every subsystem and for every hw manufacturer. More sensible names are dma fifo engines for things that slurp in planes and make them available to the display subsystem. Blend engines for blocks that take multiple input pipes and overlay/underlay/blend them together. Display subsystem/controller for the aggregate thing including encoders/resizers/outputs and especially the crazy routing network that connects everything. How about just display plane then? Specifically in the context of display output hardware... display plane could be a good name.. actually in omap4 case it is a single dma engine that is multiplexing fetches for however many attached video pipes.. that is perhaps an implementation detail, but it makes display plane sound nicer as a name 1) Splitting the crtc object into two objects: crtc with associated output mode (pixel clock, encoders/connectors) and dma engines (possibly multiple) that feed it. 
omap 4 has essentially just 4 dma engines that can be freely assigned to the available outputs, so a distinction between normal crtcs and overlay engines just does not make sense. There's the major open question of where to put the various attributes to set up the output pipeline. Also some of these attributes might need to be changed atomically together with pageflips on a bunch of dma engines all associated with the same crtc on the next vsync, e.g. output position of an overlaid video buffer. Yeah, that's a good goal, and pretty much what I had in mind here. However, breaking the existing interface is a non-starter, so either we need a new CRTC object altogether, or we preserve the idea of a primary plane (whatever that means for a given platform) that's tied to each CRTC, with each additional plane described in a separate structure. Z order and blend restrictions will have to be communicated separately I think... In the cases I can think of, you'll always have a primary plane, so userspace need not explicitly specify it. But I think you want the driver to pick which display plane to be automatically hooked between the primary fb and crtc, or at least this should be the case if some new bit is set in driver_features to indicate the driver supports multiple display planes per crtc. BR, -R Thanks, -- Jesse Barnes, Intel Open Source Technology Center
Re: Yet another memory provider: can linaro organize a meeting?
On Wed, Mar 16, 2011 at 3:14 AM, Kyungmin Park kmp...@infradead.org wrote: Rough schedule. 1. Warsaw meetings (3/16~3/18): mostly v4l2 people and some SoC vendors. Build consensus among media developers and share information. Please note that it's a v4l2 brainstorming meeting, so memory management is not the main issue. 2. ELC (4/11~4/13): DRM, DRI and v4l2 people. Fyi, I should be at ELC, at least for a day or two.. it would be nice, as Andy suggested on the other thread, to carve out a timeslot to discuss in advance, because I'm not sure that I'll be able to be there the entire time.. BR, -R Discuss whether GEM/TTM is acceptable for non-X86 systems and find out which modules are acceptable. We studied GEM for our environment, but it's too huge and not much benefit for us since the current frameworks are enough. What's missing is a generic memory passing mechanism. We need a generic memory passing interface; that's all. 3. Linaro (5/9~5/13): ARM, SoC vendors and v4l2 people. I hope several people participate and we make a small step toward the final goal.
Re: [st-ericsson] v4l2 vs omx for camera
On Thu, Feb 24, 2011 at 7:10 AM, Laurent Pinchart laurent.pinch...@ideasonboard.com wrote: On Thursday 24 February 2011 14:04:19 Hans Verkuil wrote: On Thursday, February 24, 2011 13:29:56 Linus Walleij wrote: 2011/2/23 Sachin Gupta sachin.gu...@linaro.org: The imaging coprocessor in today's platforms has a general purpose DSP attached to it. I have seen some work being done to use this DSP for graphics/audio processing in case the camera use case is not being tried, or if the camera use cases do not consume the full bandwidth of this DSP. I am not sure how v4l2 would fit in such an architecture, Earlier in this thread I discussed TI's DSPbridge. In drivers/staging/tidspbridge http://omappedia.org/wiki/DSPBridge_Project you find the TI hackers happy at work with providing a DSP accelerator subsystem. Isn't it possible for a V4L2 component to use this interface (or something more evolved, generic) as backend for assorted DSP offloading? So using one kernel framework does not exclude using another one at the same time. Whereas something like DSPbridge will load firmware into DSP accelerators and provide control/datapath for that, this can in turn be used by some camera or codec which in turn presents a V4L2 or ALSA interface. Yes, something along those lines can be done. While normally V4L2 talks to hardware it is perfectly fine to talk to a DSP instead. The hardest part will be to identify the missing V4L2 API pieces and design and add them. I don't think the actual driver code will be particularly hard. It should be nothing more than a thin front-end for the DSP. Of course, that's just theory at the moment :-) The problem is that someone has to do the actual work for the initial driver. And I expect that it will be a substantial amount of work. Future drivers should be *much* easier, though. A good argument for doing this work is that this API can hide which parts of the video subsystem are hardware and which are software. 
The application really doesn't care how it is organized. What is done in hardware on one SoC might be done on a DSP instead on another SoC. But the end result is pretty much the same. I think the biggest issue we will have here is that part of the inter-processor communication stack lives in userspace in most recent SoCs (OMAP4 comes to mind for instance). This will make implementing a V4L2 driver that relies on IPC difficult. It's probably time to start seriously thinking about userspace drivers/libraries/middlewares/frameworks/whatever, at least to clearly tell chip vendors what the Linux community expects. I suspect more of the IPC framework needs to move down to the kernel.. this is the only way I can see to move the virt-phys address translation to a trusted layer. I'm not sure how others would feel about pushing more of the IPC stack down to the kernel, but at least it would make it easier for a v4l2 driver to leverage the coprocessors.. BR, -R -- Regards, Laurent Pinchart
Re: [st-ericsson] v4l2 vs omx for camera
On Thu, Feb 24, 2011 at 2:19 PM, Edward Hervey bilb...@gmail.com wrote: What *needs* to be solved is an API for data allocation/passing at the kernel level which v4l2,omx,X,GL,vdpau,vaapi,... can use and that userspace (like GStreamer) can pass around, monitor and know about. yes yes yes yes!! vaapi/vdpau is halfway there, as they cover sharing buffers with X/GL.. but sadly they ignore camera. There are a few other inconveniences with vaapi and possibly vdpau.. at least we'd prefer to have an API that covered decoding config data like SPS/PPS and not just slice data, since config data NALUs are already decoded by our accelerators.. That is a *massive* challenge on its own. The choice of using GStreamer or not ... is what you want to do once that challenge is solved. Regards, Edward P.S. GStreamer for Android already works : http://www.elinux.org/images/a/a4/Android_and_Gstreamer.ppt yeah, I'm aware of that.. someone please convince google to pick it up and drop stagefright so we can only worry about a single framework between android and linux (and then I look forward to playing with pitivi on an android phone :-)) BR, -R
Re: [st-ericsson] v4l2 vs omx for camera
On Thu, Feb 24, 2011 at 7:17 AM, Hans Verkuil hverk...@xs4all.nl wrote: There are two parts to this: first of all you need a way to allocate large buffers. The CMA patch series is available (but not yet merged) that does this. I'm not sure of the latest status of this series. The other part is that everyone can use and share these buffers. There isn't anything for this yet. We have discussed this in the past and we need something generic for this that all subsystems can use. It's not a good idea to tie this to any specific framework like GEM. Instead any subsystem should be able to use the same subsystem-independent buffer pool API. yeah, doesn't need to be GEM.. but should at least interoperate so we can share buffers with the display/gpu.. [snip] But maybe it would be nice to have a way to have the sensor driver on the linux side, pipelined with hw and imaging drivers on a co-processor for various algorithms and filters with configuration all exposed to userspace thru MCF.. I'm not immediately sure how this would work, but it sounds nice at least ;-) MCF? What does that stand for? sorry, v4l2 media controller framework BR, -R
Re: [st-ericsson] v4l2 vs omx for camera
On Fri, Feb 18, 2011 at 10:39 AM, Robert Fekete robert.fek...@linaro.org wrote: Hi, In order to expand this knowledge outside of Linaro I took the liberty of inviting both linux-media@vger.kernel.org and gstreamer-de...@lists.freedesktop.org. For any newcomer I really recommend doing some catch-up reading on http://lists.linaro.org/pipermail/linaro-dev/2011-February/thread.html (v4l2 vs omx for camera thread) before making any comments. And sign up for Linaro-dev while you are at it :-) To make a long story short: different vendors provide custom OpenMax solutions for, say, Camera/ISP. In the Linux eco-system there is V4L2 doing much of this work already, and it is evolving with the media controller as well. Then there is the integration in GStreamer... Which solution is the best way forward? Current discussions so far put V4L2 strongly ahead of OMX. Please keep in mind that OpenMAX as a concept is more like GStreamer in many senses. The question is whether camera drivers should have OMX or V4L2 as the driver front end? This may perhaps apply to video codecs as well. Then there is how to best make use of this in GStreamer in order to achieve no-copy, highly efficient multimedia pipelines. Is gst-omx the way forward? just fwiw, there were some patches to make v4l2src work with userptr buffers in case the camera has an mmu and can handle any random non-physically-contiguous buffer.. so there is in theory no reason why a gst capture pipeline could not be zero copy and capture directly into buffers allocated from the display Certainly a more general way to allocate buffers that any of the hw blocks (display, imaging, video encoders/decoders, 3d/2d hw, etc) could use, and possibly share across-process for some zero copy DRI style rendering, would be nice. Perhaps V4L2_MEMORY_GEM? Let the discussion continue... 
On 17 February 2011 14:48, Laurent Pinchart laurent.pinch...@ideasonboard.com wrote: On Thursday 10 February 2011 08:47:15 Hans Verkuil wrote: On Thursday, February 10, 2011 08:17:31 Linus Walleij wrote: On Wed, Feb 9, 2011 at 8:44 PM, Harald Gustafsson wrote: OMX's main purpose is to handle multimedia hardware and offer an interface to that HW that looks identical independent of the vendor delivering that hardware, much like the v4l2 or USB subsystems try to do. And yes, optimally it should be implemented in drivers/omx in Linux and a user space library on top of that. Thanks for clarifying this part, it was unclear to me. The reason being that it seems OMX does not imply userspace/kernelspace separation, and I was thinking more of it as a userspace lib. Now my understanding is that if e.g. OpenMAX defines a certain data structure, say for a PCM frame or whatever, then that exact struct is supposed to be used by the kernelspace/userspace interface, and defined in the include file exported by the kernel. It might be that some alignment also needs to be made between v4l2 and other OS's implementations, to ease developing drivers for many OSs (sorry I don't know these details, but you ST-E guys should know). The basic conflict I would say is that Linux has its own API+ABI, which is defined by V4L and ALSA through a community process without much thought about any existing standard APIs. (In some cases also predating them.) By the way IL is about to finalize version 1.2 of OpenMAX IL which is more than a year's work of aligning all vendors and fixing unclear and buggy parts. I suspect that the basic problem with Khronos OpenMAX right now is how to handle communities - for example the X consortium had something like the same problem a while back, only member companies could partake in the standard process, and they need of course to pay an upfront fee for that, and the majority of these companies didn't exactly send Linux community members to the meetings. 
And now all the companies who took part in OpenMAX somehow end up having to do a lot of upfront community work if they want to drive the APIs in a certain direction, discuss it again with the V4L and ALSA maintainers and so on. Which takes a lot of time and patience with uncertain outcome, since this process is autonomous from Khronos. Nobody seems to be doing this, I haven't seen a single patch aimed at trying to unify the APIs so far. I don't know if it'd be welcome. This coupled with strict delivery deadlines and a marketing will to state conformance to OpenMAX of course leads companies into solutions breaking the Linux kernelspace API to be able to present this. From my experience with OMX, one of the issues is that companies usually extend the API to fulfill their platform's needs, without going through any standardization process. Coupled with the lack of open and free reference implementation and test tools, this