RE: [RFC] Global video buffers pool / Samsung SoC's
Hello,

On Wednesday, November 11, 2009 8:13 AM Harald Welte wrote:

> Hi Guennadi and others,
>
> first of all sorry for breaking the thread, but I am new to this list and
> could not find the message-id of the original mails nor a .mbox format
> archive for the list :( As I was one of the people giving comments to
> Guennadi's talk at ELCE, let me give some feedback here, too.
>
> I'm currently helping the Samsung System LSI Linux kernel team with
> bringing their various ports for their ARM SoCs mainline. So far we have
> excluded much of the multimedia related parts due to the complexity and
> lack of kernel infrastructure. Let me briefly describe the SoCs in
> question: they have an ARM9, ARM11 or Cortex-A8 core and multiple video
> input and output paths, such as
>
> * camera interface
> * 2D acceleration engine
> * 3D acceleration engine
> * post-processor (colorspace conversion, scaling, rotation)
> * LCM output for classic digital RGB+sync interfaces
> * TV scaler
> * TV encoder
> * HDMI interface (simple serial-HDMI with DMA from/to system memory)
> * Transport Stream interface (MPEG transport stream input with a PID
>   filter which can DMA to system memory)
> * MIPI-HSI LCM output device
> * Multi-Function Codec for H.264 and other formats
> * hardware JPEG codec
>
> plus even some more that I might have missed.
>
> One of the issues is that, at least in many current and upcoming products,
> all those integrated peripherals can only use physically contiguous
> memory. For the classic output path (e.g. Xorg+EXA+XAA+3D), that is fine.
> The framebuffer driver can simply allocate some large chunk of physical
> system memory at boot time, map that into userspace and be happy. This
> includes things like Xvideo support in the X server. Also, HDMI output and
> TV output can be handled inside X or switched to a new KMS model. However,
> the input side looks quite different: the camera driver, and possibly HDMI
> input and transport stream input, are less easy to handle.
> also, given the plethora of such subsystems in a device, you definitely
> don't want to have one static big boot-time allocation for each of those
> devices. You don't want to waste that much memory all the time just in
> case at some point you start an application that actually needs it. Also,
> it is unlikely that all of the subsystems will operate at the same time.
>
> So an in-kernel allocator for physically contiguous memory is something
> that is needed to properly support this hardware. At boot time you
> allocate one big pool, from which you then on-demand allocate and free
> physically contiguous buffers, even at a much later time.
>
> Furthermore, think of something like the JPEG codec acceleration, which
> you also want to use zero-copy from userspace. So userspace (like libjpeg
> for decode, or a camera application for encode) would also need to be able
> to allocate such a buffer inside the kernel for the input and output data
> of the codec, mmap it, put its JPEG data into it and then run the actual
> codec.
>
> How would that relate to the proposed global video buffers pool? Well, I
> think before thinking strictly about video buffers for camera chips, we
> have to think much more generically!

We have been working on multimedia drivers for Samsung SoCs for a while and
we have already run into all these problems. The current version of our
drivers uses private ioctls, zero-copy userspace memory access and our
custom memory manager. Our solution is described in the following thread:
http://article.gmane.org/gmane.linux.ports.arm.kernel/66463 We posted it as
a base (or reference) for the discussion on the Global Video Buffers Pool.

We also found that most of the multimedia devices (FIMC, JPEG,
Rotator/Scaler, MFC, Post Processor, TVOUT, maybe others) can be
successfully wrapped into the V4L2 framework. We only need to extend the
framework a bit, but this is doable and has been discussed at the V4L2 mini
summit at LPC 2009.
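The pool scheme Harald describes above (reserve one big contiguous region at boot, then carve buffers out of it on demand, freeing and re-allocating at any later time) can be sketched as a minimal first-fit allocator. This is an editorial illustration only: all names are hypothetical, and a real kernel implementation would track struct page ranges, take locks, and coalesce free neighbours.

```c
#include <stddef.h>
#include <string.h>

/* One large region reserved at boot; buffers carved out on demand.
 * First-fit bookkeeping over a sorted array of regions. */

#define POOL_SIZE   (16 * 1024 * 1024)  /* 16 MiB reserved at boot */
#define MAX_REGIONS 64

struct region {
    size_t offset;  /* offset into the pool */
    size_t size;
    int    used;
};

static struct region regions[MAX_REGIONS];
static int nregions;

static void pool_init(void)
{
    memset(regions, 0, sizeof(regions));
    regions[0].size = POOL_SIZE;
    nregions = 1;
}

/* First-fit allocation; returns an offset into the pool, or (size_t)-1. */
static size_t pool_alloc(size_t size)
{
    for (int i = 0; i < nregions; i++) {
        if (regions[i].used || regions[i].size < size)
            continue;
        if (regions[i].size > size && nregions < MAX_REGIONS) {
            /* Split the region: shift the tail of the array up by one. */
            memmove(&regions[i + 1], &regions[i],
                    (size_t)(nregions - i) * sizeof(struct region));
            nregions++;
            regions[i + 1].offset = regions[i].offset + size;
            regions[i + 1].size   = regions[i].size - size;
            regions[i + 1].used   = 0;
            regions[i].size = size;
        }
        regions[i].used = 1;
        return regions[i].offset;
    }
    return (size_t)-1;
}

static void pool_free(size_t offset)
{
    for (int i = 0; i < nregions; i++)
        if (regions[i].offset == offset)
            regions[i].used = 0;
    /* Coalescing of adjacent free regions omitted for brevity. */
}
```

Because every buffer comes from one physically contiguous reservation, each `pool_alloc()` result is itself physically contiguous, which is exactly what the DMA engines listed above require.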
The most important issue is how a device that only processes multimedia data
from one buffer in system memory to another should be implemented in the
V4L2 framework. A quite long but successful discussion can be found here:
http://article.gmane.org/gmane.linux.drivers.video-input-infrastructure/10668/
We are currently implementing a reference test driver for a V4L2 mem2mem
device.

The other important issue that came up while preparing multimedia drivers
for the V4L2 framework is proper support for multi-plane buffers (like those
required by the MFC on newer Samsung SoCs). Here are more details:
http://article.gmane.org/gmane.linux.drivers.video-input-infrastructure/11212/

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center

--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] Global video buffers pool / Samsung SoC's
On Wed, 11 Nov 2009, Harald Welte wrote:

> Hi Guennadi and others,
>
> first of all sorry for breaking the thread, but I am new to this list and
> could not find the message-id of the original mails nor a .mbox format
> archive for the list :( As I was one of the people giving comments to
> Guennadi's talk at ELCE, let me give some feedback here, too.

Adding the author of the RFC to CC.

> I'm currently helping the Samsung System LSI Linux kernel team with
> bringing their various ports for their ARM SoCs mainline. So far we have
> excluded much of the multimedia related parts due to the complexity and
> lack of kernel infrastructure. Let me briefly describe the SoCs in
> question: they have an ARM9, ARM11 or Cortex-A8 core and multiple video
> input and output paths, such as
>
> * camera interface
> * 2D acceleration engine
> * 3D acceleration engine
> * post-processor (colorspace conversion, scaling, rotation)
> * LCM output for classic digital RGB+sync interfaces
> * TV scaler
> * TV encoder
> * HDMI interface (simple serial-HDMI with DMA from/to system memory)
> * Transport Stream interface (MPEG transport stream input with a PID
>   filter which can DMA to system memory)
> * MIPI-HSI LCM output device
> * Multi-Function Codec for H.264 and other formats
> * hardware JPEG codec
>
> plus even some more that I might have missed.
>
> One of the issues is that, at least in many current and upcoming products,
> all those integrated peripherals can only use physically contiguous
> memory. For the classic output path (e.g. Xorg+EXA+XAA+3D), that is fine.
> The framebuffer driver can simply allocate some large chunk of physical
> system memory at boot time, map that into userspace and be happy. This
> includes things like Xvideo support in the X server. Also, HDMI output and
> TV output can be handled inside X or switched to a new KMS model. However,
> the input side looks quite different: the camera driver, and possibly HDMI
> input and transport stream input, are less easy to handle.
> also, given the plethora of such subsystems in a device, you definitely
> don't want to have one static big boot-time allocation for each of those
> devices. You don't want to waste that much memory all the time just in
> case at some point you start an application that actually needs it. Also,
> it is unlikely that all of the subsystems will operate at the same time.
>
> So an in-kernel allocator for physically contiguous memory is something
> that is needed to properly support this hardware. At boot time you
> allocate one big pool, from which you then on-demand allocate and free
> physically contiguous buffers, even at a much later time.
>
> Furthermore, think of something like the JPEG codec acceleration, which
> you also want to use zero-copy from userspace. So userspace (like libjpeg
> for decode, or a camera application for encode) would also need to be able
> to allocate such a buffer inside the kernel for the input and output data
> of the codec, mmap it, put its JPEG data into it and then run the actual
> codec.
>
> How would that relate to the proposed global video buffers pool? Well, I
> think before thinking strictly about video buffers for camera chips, we
> have to think much more generically!
>
> Also, has anyone investigated whether GEM or TTM could be used, in
> unmodified or modified form, for this? After all, they are intended to
> allocate (and possibly map) video buffers...

I don't think I can contribute much to the actual matter of the discussion:
yes, there is a problem, the RFC is trying to address it, there have been
attempts to implement similar things before (as you write above), so it
just has to eventually be done.

One question about your SoCs though - do they have SRAM? Usable and
sufficient for graphics buffers? In any case, any such implementation will
have to be able to handle RAM other than main system memory too, including
card memory, NUMA, sparse RAM, etc., which is probably obvious anyway.

Thanks
Guennadi
---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer http://www.open-technology.de/
Re: [RFC] Global video buffers pool / Samsung SoC's
Hi Guennadi and others,

first of all sorry for breaking the thread, but I am new to this list and
could not find the message-id of the original mails nor a .mbox format
archive for the list :( As I was one of the people giving comments to
Guennadi's talk at ELCE, let me give some feedback here, too.

I'm currently helping the Samsung System LSI Linux kernel team with
bringing their various ports for their ARM SoCs mainline. So far we have
excluded much of the multimedia related parts due to the complexity and
lack of kernel infrastructure. Let me briefly describe the SoCs in
question: they have an ARM9, ARM11 or Cortex-A8 core and multiple video
input and output paths, such as

* camera interface
* 2D acceleration engine
* 3D acceleration engine
* post-processor (colorspace conversion, scaling, rotation)
* LCM output for classic digital RGB+sync interfaces
* TV scaler
* TV encoder
* HDMI interface (simple serial-HDMI with DMA from/to system memory)
* Transport Stream interface (MPEG transport stream input with a PID filter
  which can DMA to system memory)
* MIPI-HSI LCM output device
* Multi-Function Codec for H.264 and other formats
* hardware JPEG codec

plus even some more that I might have missed.

One of the issues is that, at least in many current and upcoming products,
all those integrated peripherals can only use physically contiguous memory.
For the classic output path (e.g. Xorg+EXA+XAA+3D), that is fine. The
framebuffer driver can simply allocate some large chunk of physical system
memory at boot time, map that into userspace and be happy. This includes
things like Xvideo support in the X server. Also, HDMI output and TV output
can be handled inside X or switched to a new KMS model. However, the input
side looks quite different: the camera driver, and possibly HDMI input and
transport stream input, are less easy to handle.
also, given the plethora of such subsystems in a device, you definitely
don't want to have one static big boot-time allocation for each of those
devices. You don't want to waste that much memory all the time just in case
at some point you start an application that actually needs it. Also, it is
unlikely that all of the subsystems will operate at the same time.

So an in-kernel allocator for physically contiguous memory is something
that is needed to properly support this hardware. At boot time you allocate
one big pool, from which you then on-demand allocate and free physically
contiguous buffers, even at a much later time.

Furthermore, think of something like the JPEG codec acceleration, which you
also want to use zero-copy from userspace. So userspace (like libjpeg for
decode, or a camera application for encode) would also need to be able to
allocate such a buffer inside the kernel for the input and output data of
the codec, mmap it, put its JPEG data into it and then run the actual
codec.

How would that relate to the proposed global video buffers pool? Well, I
think before thinking strictly about video buffers for camera chips, we
have to think much more generically!

Also, has anyone investigated whether GEM or TTM could be used, in
unmodified or modified form, for this? After all, they are intended to
allocate (and possibly map) video buffers...

Regards,
Harald
--
- Harald Welte lafo...@gnumonks.org http://laforge.gnumonks.org/
"Privacy in residential applications is a desirable marketing option."
(ETSI EN 300 175-7 Ch. A6)
Re: [RFC] Global video buffers pool
Hi Guennadi,

On Tuesday 27 October 2009 08:49:15 Guennadi Liakhovetski wrote:

> Hi
>
> This is a general comment on the whole (contiguous) video buffer work:
> having given a talk at ELC-E in Grenoble on soc-camera, I briefly
> mentioned a few related RFCs, including this one. I got a couple of
> comments back, including the following ones (which is to say, the
> opinions are not mine and may or may not be relevant, I'm just fulfilling
> my promise to pass them on ;)):
>
> 1) it has been requested to move this discussion to a generic mailing
> list like LKML.
>
> 2) the reason for (1) was, obviously, to consider making such a buffer
> pool also available to other subsystems, of which video / framebuffer
> drivers have been mentioned as likely interested parties.

Those are good ideas. The global video buffers pool will sooner or later
(and my guess is sooner) need to interact with X buffers (either for Xv
rendering or OpenGL textures). This needs to be discussed globally on the
LKML.

> (btw, not sure if this has also been mentioned among those wishes - what
> about DVB? Can they also use such buffers?)

If I'm not mistaken, DVB uses read/write syscalls to transfer data from/to
the driver. A video buffers pool wouldn't fit well in that scheme.

--
Regards,
Laurent Pinchart
Re: [RFC] Global video buffers pool
Laurent Pinchart <laurent.pinchart at ideasonboard.com> writes:

> Hi Stefan,
>
> On Monday 28 September 2009 16:04:58 Stefan.Kost at nokia.com wrote:
> > hi,
> >
> > -----Original Message-----
> > From: ext Laurent Pinchart [mailto:laurent.pinchart at ideasonboard.com]
> > Sent: 16 September, 2009 18:47
> > To: linux-media at vger.kernel.org; Hans Verkuil; Sakari Ailus; Cohen
> > David.A (Nokia-D/Helsinki); Koskipaa Antti (Nokia-D/Helsinki); Zutshi
> > Vimarsh (Nokia-D/Helsinki); Kost Stefan (Nokia-D/Helsinki)
> > Subject: [RFC] Global video buffers pool
> >
> > > Hi everybody,
> > >
> > > I didn't want to miss this year's pretty flourishing RFC season, so
> > > here's another one about a global video buffers pool.
> >
> > Sorry for the very late reply.
>
> No worries, better late than never.
>
> > I have been thinking about the problem on a bit broader scale and see
> > the need for something more kernel-wide. E.g. there is some work done
> > by Intel for graphics: http://keithp.com/blogs/gem_update/ and this is
> > not even so much embedded. If the buffer pools are v4l2-specific then
> > we need to make all those other subsystems like xvideo, opengl and
> > dsp-bridges become v4l2 media controllers.
>
> The global video buffers pool topic has been discussed during the v4l2
> mini-summit at Portland last week, and we all agreed that it needs more
> research. The idea of having pools at the media controller level has been
> dropped in favor of a kernel-wide video buffers pool. Whether we can make
> the buffers pool not v4l2-specific still needs to be tested.
>
> As you have pointed out, we currently have a GPU memory manager in the
> kernel, and being able to interact with it would be very interesting if
> we want to DMA video data to OpenGL texture buffers for instance. I'm not
> sure if that would be possible though, as the GPU and the video
> acquisition hardware might have different memory requirements, at least
> in the general case. I will contact the GEM guys at Intel to discuss the
> topic.
> If we can't share the buffers between the GPU and the rest of the system,
> we could at least create a V4L2 wrapper on top of the DSP bridge core
> (which will require a major cleanup/restructuring), making it possible to
> share video buffers between the ISP and the DSP.

TI has been providing this sort of contiguous buffer support for quite a
few years now. TI provides a SW package named LinuxUtils, which contains a
module named CMEM (which stands for Contiguous MEMory manager). The latest
LinuxUtils release contains cdocs for CMEM:
http://software-dl.ti.com/dsps/dsps_public_sw/sdo_sb/targetcontent/linuxutils/2_24_03/exports/linuxutils_2_24_03.tar.gz
and the background/usage article is here:
http://tiexpressdsp.com/index.php/CMEM_Overview

CMEM solves many of the same sorts of problems that the driver described in
this thread does. However, it doesn't integrate with other drivers, and
it's accessed through the CMEM user interface. Also, CMEM alleviates some
of the issues raised in this thread since it uses memory not known to the
kernel (the user carves out a chunk by reducing kernel memory through the
U-Boot mem= parameter), which IMO can be both good and bad (good - it
alleviates locking/unavailable-memory issues; bad - it doesn't cooperate
with the kernel in getting memory, requiring user intervention).

Regards,
Robert Tivy
MGTS Systems Software
Texas Instruments, Santa Barbara
RE: [RFC] Global video buffers pool
hi,

> -----Original Message-----
> From: ext Laurent Pinchart [mailto:laurent.pinch...@ideasonboard.com]
> Sent: 16 September, 2009 18:47
> To: linux-media@vger.kernel.org; Hans Verkuil; Sakari Ailus; Cohen
> David.A (Nokia-D/Helsinki); Koskipaa Antti (Nokia-D/Helsinki); Zutshi
> Vimarsh (Nokia-D/Helsinki); Kost Stefan (Nokia-D/Helsinki)
> Subject: [RFC] Global video buffers pool
>
> Hi everybody,
>
> I didn't want to miss this year's pretty flourishing RFC season, so
> here's another one about a global video buffers pool.

Sorry for the very late reply. I have been thinking about the problem on a
bit broader scale and see the need for something more kernel-wide. E.g.
there is some work done by Intel for graphics:
http://keithp.com/blogs/gem_update/ and this is not even so much embedded.
If the buffer pools are v4l2-specific then we need to make all those other
subsystems like xvideo, opengl and dsp-bridges become v4l2 media
controllers.

Stefan

> All comments are welcome, but please don't trash this proposal too fast.
> It's a first shot at real problems encountered in real situations with
> real hardware (namely high resolution still image capture on OMAP3). It's
> far from perfect, and I'm open to completely different solutions if
> someone thinks of one.
>
> Introduction
> ============
>
> The V4L2 video buffers handling API makes use of a queue of video buffers
> to exchange data between video devices and userspace applications (the
> read method doesn't expose the buffer objects directly but uses them
> underneath). Although quite efficient for simple video capture and output
> use cases, the current implementation doesn't scale well when used with
> complex hardware and large video resolutions. This RFC will list the
> current limitations of the API and propose a possible solution. The
> document is at this stage a work in progress. Its main purpose is to be
> used as support material for discussions at the Linux Plumbers Conference.
>
> Limitations
> ===========
>
> Large buffers allocation
> ------------------------
>
> Many video devices still require physically contiguous memory.
> The introduction of IOMMUs on high-end systems will probably make that a
> distant nightmare in the future, but we have to deal with this situation
> for the moment (I'm not sure if the most recent PCI devices support
> scatter-gather lists, but many embedded systems still require physically
> contiguous memory).
>
> Allocating large amounts of physically contiguous memory needs to be done
> as soon as possible after (or even during) system bootup, otherwise
> memory fragmentation will cause the allocation to fail. As the amount of
> required video memory depends on the frame size and the number of
> buffers, the driver can't pre-allocate the buffers beforehand. A few
> drivers allocate a large chunk of memory when they are loaded and then
> use it when a userspace application requests video buffers to be
> allocated. However, that method requires guessing how much memory will be
> needed, and can lead to waste of system memory (if the guess was too
> large) or allocation failures (if the guess was too low).
>
> Buffer queuing latency
> ----------------------
>
> VIDIOC_QBUF is becoming a performance bottleneck when capturing large
> images on some systems (especially in the embedded world). When capturing
> high resolution still pictures, the VIDIOC_QBUF delay adds to the shot
> latency, making the camera appear slow to the user. The delay is caused
> by several operations required by DMA transfers that all happen when
> queuing buffers.
>
> - Cache coherency management
>
> When the processor has a non-coherent cache (which is the case with most
> embedded devices, especially ARM-based ones) the device driver needs to
> invalidate (for video capture) or flush (for video output) the cache
> (either a range, or the whole cache) every time a buffer is queued. This
> ensures that stale data in the cache will not be written back to memory
> during or after DMA and that all data written by the CPU is visible to
> the device. Invalidating the cache for large resolutions takes a
> considerable amount of time.
> Preliminary tests showed that cache invalidation for a 5MP buffer
> requires several hundred milliseconds on an OMAP3 platform for range
> invalidation, or several tens of milliseconds when invalidating the whole
> D-cache.
>
> When video buffers are passed between two devices (for instance when
> passing the same USERPTR buffer to a video capture device and a hardware
> codec) without any userspace access to the memory, CPU cache
> invalidation/flushing isn't required on either side (video capture and
> hardware codec) and could be skipped.
>
> - Memory locking and IOMMU
>
> Drivers need to lock the video buffer pages in memory to make sure that
> the physical pages will not be freed while DMA is in progress under
> low-memory conditions. This requires looping over all pages (typically
> 4kB long) that back the video buffer (10MB for a 5MP YUV image) and takes
> a considerable amount
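To put the per-page locking cost quoted in the RFC in perspective, the number of iterations behind that loop is easy to work out. This is an editorial back-of-the-envelope sketch, assuming a 5MP 2548x1938 frame in a YUV format at 2 bytes per pixel and 4 KiB pages, matching the figures in the text:

```c
#include <stddef.h>

/* Number of pages that must be walked when locking a frame buffer. */
#define PAGE_SIZE 4096

static size_t pages_to_lock(size_t width, size_t height,
                            size_t bytes_per_pixel)
{
    size_t bytes = width * height * bytes_per_pixel;
    return (bytes + PAGE_SIZE - 1) / PAGE_SIZE;  /* round up */
}
```

A 2548x1938 frame at 2 bytes per pixel is 9876048 bytes, i.e. roughly the 10MB the RFC mentions, so queuing a single buffer means touching on the order of 2400 pages.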
Re: [RFC] Global video buffers pool
Hi Stefan,

On Monday 28 September 2009 16:04:58 stefan.k...@nokia.com wrote:
> hi,
>
> -----Original Message-----
> From: ext Laurent Pinchart [mailto:laurent.pinch...@ideasonboard.com]
> Sent: 16 September, 2009 18:47
> To: linux-media@vger.kernel.org; Hans Verkuil; Sakari Ailus; Cohen
> David.A (Nokia-D/Helsinki); Koskipaa Antti (Nokia-D/Helsinki); Zutshi
> Vimarsh (Nokia-D/Helsinki); Kost Stefan (Nokia-D/Helsinki)
> Subject: [RFC] Global video buffers pool
>
> > Hi everybody,
> >
> > I didn't want to miss this year's pretty flourishing RFC season, so
> > here's another one about a global video buffers pool.
>
> Sorry for the very late reply.

No worries, better late than never.

> I have been thinking about the problem on a bit broader scale and see the
> need for something more kernel-wide. E.g. there is some work done by
> Intel for graphics: http://keithp.com/blogs/gem_update/ and this is not
> even so much embedded. If the buffer pools are v4l2-specific then we need
> to make all those other subsystems like xvideo, opengl and dsp-bridges
> become v4l2 media controllers.

The global video buffers pool topic has been discussed during the v4l2
mini-summit at Portland last week, and we all agreed that it needs more
research. The idea of having pools at the media controller level has been
dropped in favor of a kernel-wide video buffers pool. Whether we can make
the buffers pool not v4l2-specific still needs to be tested.

As you have pointed out, we currently have a GPU memory manager in the
kernel, and being able to interact with it would be very interesting if we
want to DMA video data to OpenGL texture buffers for instance. I'm not sure
if that would be possible though, as the GPU and the video acquisition
hardware might have different memory requirements, at least in the general
case. I will contact the GEM guys at Intel to discuss the topic.
If we can't share the buffers between the GPU and the rest of the system,
we could at least create a V4L2 wrapper on top of the DSP bridge core
(which will require a major cleanup/restructuring), making it possible to
share video buffers between the ISP and the DSP.

--
Regards,
Laurent Pinchart
Re: [RFC] Global video buffers pool
Hello Laurent,

We have been developing a solution for Samsung SoC multimedia drivers quite
similar to the one mentioned in this RFC. Our solution is based on a global
buffer manager that provides buffers (contiguous in physical memory) to
user applications. The application can then pass the buffers (as input or
output) to different multimedia device drivers. Please note that our
solution is aimed at UMA systems, where all multimedia devices can access
system memory directly.

We decided not to use any special buffer identifiers. In our solution
applications must mmap the buffer (even if they don't plan to read/write it
directly) and pass the buffer's user pointer to the multimedia driver. To
get access to a specified buffer we prepared a special layer that checks
whether the passed user pointer points to a buffer that is contiguous in
physical memory, properly locks the buffer memory and returns the buffer's
physical address. More details on this solution can be found here:
http://thread.gmane.org/gmane.linux.ports.arm.kernel/56879

Using the user pointer access type gave us the possibility to transfer
multimedia data directly to frame buffer memory and to create a SysV SHM
area from it (by some additional hacks in kernel mm). This gave us real
power, especially in hardware acceleration of the X server - with the XSHM
extension we were able to blit frames directly from the user application's
buffer to the frame buffer memory. Our multimedia devices do not currently
use the V4L framework, but moving towards V4L2 is possible.

Now let's get back to the RFC thesis. The idea behind the global memory
pool is really good and especially needed in embedded-like systems. One of
the important features of the buffer manager is cache coherency control.
The user who allocated a buffer can request that the buffer be mapped as a
cacheable area or not, depending on the intended use case.
Queueing non-cacheable buffers is faster of course (no cache flush is
required), but CPU read access is much slower (note the write-combining
here). A global memory pool should also reduce system memory requirements,
however it should be kept in mind that some use cases might cause memory
fragmentation issues. A pluggable memory manager should also be considered.
With some standard allocation methods (like all buffers of the same size,
first fit, best fit, etc.) in the buffer manager, most of the typical use
cases can be covered. Statistics on buffer allocation/deallocation and
usage can also be easily gathered with a buffer manager.

However, one should consider whether introducing a new V4L2 buffer access
method (V4L2_MEMORY_POOL) is really required. One of the key features of
the introduced pool buffer identifiers is much quicker buffer locking, as
no per-page locking needs to be done. However, a similar effect can be
achieved with the USERPTR access method. A user application can allocate
the buffer from the buffer manager (global pool), mmap it and pass it to
the driver with the USERPTR method. The driver can quite easily check
whether the passed user pointer is a pointer to a buffer from the pool and
then lock it quickly, with a method similar to the one we used in our
drivers for SoC multimedia hardware.

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center
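The fast-path check Marek describes (recognize that a USERPTR buffer lies entirely inside the pool's single contiguous mapping, skip the per-page walk, and compute the physical address directly) can be sketched as a small range check. This is an editorial illustration; the structure and function names are hypothetical, and a real driver would also verify the mapping per process:

```c
#include <stdint.h>
#include <stddef.h>

/* One mmap()ed window onto the physically contiguous pool. */
struct pool_mapping {
    uintptr_t user_base;  /* start of the pool's userspace mapping */
    uintptr_t phys_base;  /* physical address of the pool */
    size_t    size;       /* pool size in bytes */
};

/* Returns the physical address for a pool-backed user pointer, or 0 if
 * the range is not pool-backed and the slow per-page path is needed.
 * The comparisons are ordered to avoid unsigned overflow. */
static uintptr_t pool_user_to_phys(const struct pool_mapping *m,
                                   uintptr_t uptr, size_t len)
{
    if (uptr < m->user_base || len > m->size ||
        uptr - m->user_base > m->size - len)
        return 0;
    return m->phys_base + (uptr - m->user_base);
}
```

Because the pool is one contiguous physical region, a single bounds check replaces the loop over every 4 KiB page, which is exactly why locking pool-backed USERPTR buffers can be fast.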
RE: [RFC] Global video buffers pool
> -----Original Message-----
> From: linux-media-ow...@vger.kernel.org
> [mailto:linux-media-ow...@vger.kernel.org] On Behalf Of Laurent Pinchart
> Sent: Wednesday, September 16, 2009 9:17 PM
> To: linux-media@vger.kernel.org; Hans Verkuil; Sakari Ailus; Cohen David
> Abraham; Koskipää Antti Jussi Petteri; Zutshi Vimarsh
> (Nokia-D-MSW/Helsinki); stefan.k...@nokia.com
> Subject: [RFC] Global video buffers pool
>
> Hi everybody,
>
> I didn't want to miss this year's pretty flourishing RFC season, so
> here's another one about a global video buffers pool.
>
> All comments are welcome, but please don't trash this proposal too fast.
> It's a first shot at real problems encountered in real situations with
> real hardware (namely high resolution still image capture on OMAP3). It's
> far from perfect, and I'm open to completely different solutions if
> someone thinks of one.

[Hiremath, Vaibhav] Thanks Laurent for putting this together. I believe
memory fragmentation is a critical issue for most of the new drivers, and
we need some sort of solution to address it. Please find some
observations/issues/questions below.

> Introduction
> ============
>
> The V4L2 video buffers handling API makes use of a queue of video buffers
> to exchange data between video devices and userspace applications (the
> read method doesn't expose the buffer objects directly but uses them
> underneath). Although quite efficient for simple video capture and output
> use cases, the current implementation doesn't scale well when used with
> complex hardware and large video resolutions. This RFC will list the
> current limitations of the API and propose a possible solution. The
> document is at this stage a work in progress. Its main purpose is to be
> used as support material for discussions at the Linux Plumbers Conference.
>
> Limitations
> ===========
>
> Large buffers allocation
> ------------------------
>
> Many video devices still require physically contiguous memory.
> The introduction of IOMMUs on high-end systems will probably make that a
> distant nightmare in the future, but we have to deal with this situation
> for the moment (I'm not sure if the most recent PCI devices support
> scatter-gather lists, but many embedded systems still require physically
> contiguous memory).
>
> Allocating large amounts of physically contiguous memory needs to be done
> as soon as possible after (or even during) system bootup, otherwise
> memory fragmentation will cause the allocation to fail. As the amount of
> required video memory depends on the frame size and the number of
> buffers, the driver can't pre-allocate the buffers beforehand. A few
> drivers allocate a large chunk of memory when they are loaded and then
> use it when a userspace application requests video buffers to be
> allocated. However, that method requires guessing how much memory will be
> needed, and can lead to waste of system memory (if the guess was too
> large) or allocation failures (if the guess was too low).

[Hiremath, Vaibhav] Could it be possible to fine-tune this based on the use
case? At least on the OMAP display driver we have boot arguments to control
the number of buffers and the size of buffers, which the user can pass at
boot time. The default setting is 3 buffers at the maximum resolution
(720p). With this it won't be guessing any more, right?

> Buffer queuing latency
> ----------------------
>
> VIDIOC_QBUF is becoming a performance bottleneck when capturing large
> images on some systems (especially in the embedded world). When capturing
> high resolution still pictures, the VIDIOC_QBUF delay adds to the shot
> latency, making the camera appear slow to the user. The delay is caused
> by several operations required by DMA transfers that all happen when
> queuing buffers.
>
> - Cache coherency management

[Hiremath, Vaibhav] Agreed.
> When the processor has a non-coherent cache (which is the case with most
> embedded devices, especially ARM-based ones) the device driver needs to
> invalidate (for video capture) or flush (for video output) the cache
> (either a range, or the whole cache) every time a buffer is queued. This
> ensures that stale data in the cache will not be written back to memory
> during or after DMA and that all data written by the CPU is visible to
> the device. Invalidating the cache for large resolutions takes a
> considerable amount of time. Preliminary tests showed that cache
> invalidation for a 5MP buffer requires several hundred milliseconds on an
> OMAP3 platform for range invalidation, or several tens of milliseconds
> when invalidating the whole D-cache.
>
> When video buffers are passed between two devices (for instance when
> passing the same USERPTR buffer to a video capture device and a hardware
> codec) without any userspace access to the memory, CPU cache
> invalidation/flushing isn't required on either side (video capture and
> hardware codec) and could be skipped.
>
> - Memory locking and IOMMU
>
> Drivers need to lock the video buffer pages in memory to make sure
Re: [RFC] Global video buffers pool
Hi Hans, On Thursday 17 September 2009 23:19:24 Hans Verkuil wrote: On Thursday 17 September 2009 20:49:49 Mauro Carvalho Chehab wrote: Em Wed, 16 Sep 2009 17:46:39 +0200 Laurent Pinchart laurent.pinch...@ideasonboard.com escreveu: Hi everybody, I didn't want to miss this year's pretty flourishing RFC season, so here's another one about a global video buffers pool. All comments are welcome, but please don't trash this proposal too fast. It's a first shot at real problems encountered in real situations with real hardware (namely high resolution still image capture on OMAP3). It's far from perfect, and I'm open to completely different solutions if someone thinks of one. First of all, thank you Laurent for working on this! Much appreciated. Some comments about your proposal: 1) For embedded systems, the best approach is probably to create it at boot time, instead of controlling it via userspace, since the earlier it is done, the better. I agree with Mauro here. The only way you can allocate the required memory is in general to do it early in the boot sequence. I agree with you there as well, but there's one obvious problem with that approach: the Linux kernel doesn't know how much memory you will need. Let me take the OMAP3 camera as an example. The sensor has a native 5MP resolution (2548x1938). When taking a still picture, we want to display live video on the device's screen in a lower resolution (840x400) and, when the user presses the camera button, switch to the 5MP resolution and capture 3 images. For this we need a few (let's say 5) 840x400 buffers (672000 bytes each in YUV) and 3 2548x1938 buffers (9876048 bytes each). Those requirements come from the product specifications, and the device driver has no way to know about them. Allocating several huge buffers at boot time, big enough for all use cases, would here use 75MB of memory instead of 31.5MB. That's why I was thinking about allowing a userspace application to allocate those buffers very early after boot.
One other possible solution would be to use a kernel command line parameter set to something like 5x672000,3x9876048. Another reason to allow applications to allocate buffers in the pool was to be able to pre-queue buffers to avoid cache invalidation and memory pinning delays at VIDIOC_QBUF. This is a very important topic that I might not have stressed enough in the RFC. VIDIOC_QBUF currently hurts performance. For the camera use case I've explained above, we need a way to pre-queue the 3 5MP buffers while still streaming video in 840x400. 2) As I've posted at the media controller RFC, we should take care not to abuse its usage. The media controller has two specific objectives: topology enumeration/change and subdev parameter passing. True, but perhaps it can also be used for other purposes. I'm not saying we should, but neither should we stop thinking about it. Someone may come up with a great idea for which a mc is ideally suited. We are still in the brainstorming stage, so any idea is welcome. Agreed. The media controller RFC described its intended purpose, but I don't see why it couldn't be extended if we find a use case for which the media controller is ideally suited. For the last, as I've explained there, the proper solution is to create devices for each v4l subdev that requires control from userspace. The proper solution *in your opinion*. I'm still on the fence on that one. In the case of a video buffer memory pool, it is not one of the use cases of the media controller. So we need to think harder about where to implement it. Why couldn't it be one of the use cases? Again, it is your opinion, not a fact. Note that I share this opinion, but try to avoid presenting opinions as facts. 3) I don't think that having a buffer pool per media controller will be so useful. A media controller groups /dev/video with their audio, IR, I2C... resources.
On systems with more than one different board (for example a cellular phone with a camera and a DVB-H receiver), you'll likely have more than one media controller. So, controlling video buffer pools at /dev/video or at media controller level will give the same results in several environments; I don't follow the logic here, sorry. 4) As you've mentioned, a global set of buffers seems to be the better alternative. This means that the V4L2 core will take care of controlling the pool, instead of leaving this task to the drivers. This makes it easier to have a boot-time parameter specifying the size of the memory pool and will optimize memory usage. We may even have a Kconfig var specifying the default size of the memory pool (although this is not really needed, since new kernels allow specifying default command line parameters). Different devices may have quite different buffer requirements (size, number of buffers). Would it be safe to have them all allocated from a global pool?
Re: [RFC] Global video buffers pool
Hi Mauro, thanks for the review. A few comments. On Friday 18 September 2009 00:45:42 Mauro Carvalho Chehab wrote: Em Thu, 17 Sep 2009 23:19:24 +0200 Hans Verkuil hverk...@xs4all.nl escreveu: [snip] 4) As you've mentioned, a global set of buffers seems to be the better alternative. This means that the V4L2 core will take care of controlling the pool, instead of leaving this task to the drivers. This makes it easier to have a boot-time parameter specifying the size of the memory pool and will optimize memory usage. We may even have a Kconfig var specifying the default size of the memory pool (although this is not really needed, since new kernels allow specifying default command line parameters). Different devices may have quite different buffer requirements (size, number of buffers). Would it be safe to have them all allocated from a global pool? I do not feel confident myself that I understand all the implications of a global pool or whether you actually always want that. This is a problem with the pool concept. Even having the same driver, you'll still be needing different resolutions, frame rates, formats and bits per pixel on each /dev/video interface. That's right (the frame rate doesn't matter though), but not different memory type (low-mem, non-cacheable, contiguous, ...) requirements. The only thing that matters in the end is the number of buffers and their size. The pool doesn't care about the formats and resolutions separately. I'm not sure how to deal with this. My idea was to have several groups of video buffers. You could allocate one large group of low-resolution buffers for video preview, and a small group of high-resolution buffers for still image capture. Video devices could then pick buffers from one of those groups depending on their needs. Maybe we'll need to allocate the buffers considering the worst case that can be passed to the driver.
For example, in the case of a kernel parameter, it could be something like: videobuf=buffers=32,size=256K to allocate 32 buffers of 256K each. This way, even if the application asks for a smaller buffer, it will keep reserving 256K for each buffer. If badly specified, memory will be wasted, but the memory will be there. Eventually, after allocating that memory, some API could be provided, for example, to rearrange the allocated space into 64 x 128K. We still need separate groups, otherwise we will waste too much memory. 5MP sensors are common today, and the size will probably grow in the years to come. We can't allocate 32 5MP buffers on an embedded system. 5) The step to have a global-wide video buffers pool allocation, as you mentioned at the RFC, is to make sure that all drivers will use the v4l2 framework to allocate memory. So, this means porting a few drivers (ivtv, uvcvideo, cx18 and gspca) to use videobuf. As videobuf already supports all sorts of different memory types and configs (contiguous and scatter/gather DMA, vmalloc'ed buffers, mmap, userptr, read, overlay modes), it should fit the needs well. Why would I want to change ivtv for this? In fact, I see no reason to modify any of the existing drivers. A mc-wide or global memory pool is only of interest for very complex devices where you want to pass buffers around between various sub-devices (and possibly to other media devices or DSPs). And yes, they probably will have to use the framework in order to be able to coordinate these pools properly. The issue here is not necessarily related to device complexity. It can be motivated by other factors, for example: - archs with non-coherent caches; - devices that aren't capable of doing DMA scatter/gather; - high memory fragmentation. Just as an example, I used an old laptop with only 256 MB of RAM, running a new distro, when I started developing the tm6000 drivers. On that hardware, I needed buffers of about 600 KB each.
It was very common to be unable to allocate such buffers there, due to high memory fragmentation, since the USB driver was trying to allocate a contiguous buffer on that hardware. So, the same argument we used with the EMBEDDED Kconfig option also applies here: not everything is black or white. For example, surveillance systems need to be very reliable. So, the possibility of allocating memory during boot will help them. Just to take a random real use case, David Liontooth mentioned recently on the ML his intention of maybe using ivtv hardware to capture TV signals at remote locations, with the hardware minimally assisted. He mentioned the need to capture data continuously for 15 hours. That means that the machine will likely close and reopen devices once a day, for years. In such an application, a video buffer pool will surely reduce the risk of memory fragmentation on such systems, giving more reliability to the system, especially if the hardware it will
Re: [RFC] Global video buffers pool
I'm joining your comments to Vaibhav with your comments to me, in order to avoid duplicating comments. Em Fri, 18 Sep 2009 10:39:17 +0200 Laurent Pinchart laurent.pinch...@ideasonboard.com escreveu: Different devices may have quite different buffer requirements (size, number of buffers). Would it be safe to have them all allocated from a global pool? I do not feel confident myself that I understand all the implications of a global pool or whether you actually always want that. This is a problem with the pool concept. Even having the same driver, you'll still be needing different resolutions, frame rates, formats and bits per pixel on each /dev/video interface. That's right (the frame rate doesn't matter though), but not different memory type (low-mem, non-cacheable, contiguous, ...) requirements. The only thing that matters in the end is the number of buffers and their size. The pool doesn't care about the formats and resolutions separately. For raw formats, that's right. However, with some compressed formats, there are other parameters that affect the size of a frame buffer. For example, just knowing the resolution is not enough for H.264/MPEG/JPEG formats. Even the frame rate can affect some of them, since it affects the temporal estimation. For compressed formats, maybe the right approach would be to allocate buffers based on the maximum allowed bandwidth. I'm not sure how to deal with this. My idea was to have several groups of video buffers. You could allocate one large group of low-resolution buffers for video preview, and a small group of high-resolution buffers for still image capture. Video devices could then pick buffers from one of those groups depending on their needs. Maybe we'll need to allocate the buffers considering the worst case that can be passed to the driver. For example, in the case of a kernel parameter, it could be something like: videobuf=buffers=32,size=256K to allocate 32 buffers of 256K each.
This way, even if the application asks for a smaller buffer, it will keep reserving 256K for each buffer. If badly specified, memory will be wasted, but the memory will be there. Eventually, after allocating that memory, some API could be provided, for example, to rearrange the allocated space into 64 x 128K. We still need separate groups, otherwise we will waste too much memory. 5MP sensors are common today, and the size will probably grow in the years to come. We can't allocate 32 5MP buffers on an embedded system. I agree with Mauro here. The only way you can allocate the required memory is in general to do it early in the boot sequence. I agree with you there as well, but there's one obvious problem with that approach: the Linux kernel doesn't know how much memory you will need. Let me take the OMAP3 camera as an example. The sensor has a native 5MP resolution (2548x1938). When taking a still picture, we want to display live video on the device's screen in a lower resolution (840x400) and, when the user presses the camera button, switch to the 5MP resolution and capture 3 images. For this we need a few (let's say 5) 840x400 buffers (672000 bytes each in YUV) and 3 2548x1938 buffers (9876048 bytes each). Those requirements come from the product specifications, and the device driver has no way to know about them. Allocating several huge buffers at boot time, big enough for all use cases, would here use 75MB of memory instead of 31.5MB. That's why I was thinking about allowing a userspace application to allocate those buffers very early after boot. One other possible solution would be to use a kernel command line parameter set to something like 5x672000,3x9876048. Interesting approach. Another alternative would be to allocate a flat memory block during boot time, and provide a set of controls to control how the memory will be divided.
So, while I agree that it is not a mandatory requirement to port the existing drivers to benefit from the memory pool, by not doing it, those drivers will be less reliable than the other drivers in professional usage. Good point. No need to be too clever though. I think that the memory pool concept can be restricted to use cases where the user knows in advance what's going to happen with the hardware. A video monitoring system is one of them, a digital camera is another one. Agreed. In those cases the system designer knows what resolutions will be streamed at, and how many buffers will be needed. This information can come from userspace or the kernel command line, and the memory pool won't need to become a complete memory management system. An application that wants to use buffers from the pool will then explicitly select which set of buffers it wants to use. I'm not sure that reserving memory from userspace would be good enough, even if done very early. On the other hand, a complex command line won't be good enough either.
Re: [RFC] Global video buffers pool
Hi, On 09/16/2009 05:46 PM, Laurent Pinchart wrote: Hi everybody, I didn't want to miss this year's pretty flourishing RFC season, so here's another one about a global video buffers pool. All comments are welcome, but please don't trash this proposal too fast. It's a first shot at real problems encountered in real situations with real hardware (namely high resolution still image capture on OMAP3). It's far from perfect, and I'm open to completely different solutions if someone thinks of one. Sounds like a reasonable and useful proposal to me. Not much to add other than that. Regards, Hans -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] Global video buffers pool
Em Wed, 16 Sep 2009 17:46:39 +0200 Laurent Pinchart laurent.pinch...@ideasonboard.com escreveu: Hi everybody, I didn't want to miss this year's pretty flourishing RFC season, so here's another one about a global video buffers pool. All comments are welcome, but please don't trash this proposal too fast. It's a first shot at real problems encountered in real situations with real hardware (namely high resolution still image capture on OMAP3). It's far from perfect, and I'm open to completely different solutions if someone thinks of one. Some comments about your proposal: 1) For embedded systems, the best approach is probably to create it at boot time, instead of controlling it via userspace, since the earlier it is done, the better. 2) As I've posted at the media controller RFC, we should take care not to abuse its usage. The media controller has two specific objectives: topology enumeration/change and subdev parameter passing. For the last, as I've explained there, the proper solution is to create devices for each v4l subdev that requires control from userspace. In the case of a video buffer memory pool, it is not one of the use cases of the media controller. So we need to think harder about where to implement it. 3) I don't think that having a buffer pool per media controller will be so useful. A media controller groups /dev/video with their audio, IR, I2C... resources. On systems with more than one different board (for example a cellular phone with a camera and a DVB-H receiver), you'll likely have more than one media controller. So, controlling video buffer pools at /dev/video or at media controller level will give the same results in several environments; 4) As you've mentioned, a global set of buffers seems to be the better alternative. This means that the V4L2 core will take care of controlling the pool, instead of leaving this task to the drivers. This makes it easier to have a boot-time parameter specifying the size of the memory pool and will optimize memory usage.
We may even have a Kconfig var specifying the default size of the memory pool (although this is not really needed, since new kernels allow specifying default command line parameters). 5) The step to have a global-wide video buffers pool allocation, as you mentioned at the RFC, is to make sure that all drivers will use the v4l2 framework to allocate memory. So, this means porting a few drivers (ivtv, uvcvideo, cx18 and gspca) to use videobuf. As videobuf already supports all sorts of different memory types and configs (contiguous and scatter/gather DMA, vmalloc'ed buffers, mmap, userptr, read, overlay modes), it should fit the needs well. 6) As videobuf uses a common method of allocating memory, and all memory requests pass via videobuf-core (the videobuf_alloc function), the implementation of a global-wide set of video buffers means touching just one function there, at the abstraction layer, and double-checking that videobuf-dma-sg/videobuf-vmalloc/videobuf-contig don't call their own allocation methods directly. If they do, a simple change would be needed. 7) IMO, the better interface for it is to add some sysfs attributes to the media class, providing there the means to control the video buffer pools. If the size of a video buffer pool is set to zero, it will use normal memory allocation. Otherwise, it will work in pool mode. 8) By using videobuf, we can also export usage statistics via debugfs, providing runtime statistics about how much memory is being used by which drivers and /dev devices. Cheers, Mauro
RE: [RFC] Global video buffers pool
Laurent, Thanks for working on this. I might need some more time to review this as there are many RFCs up for review (including one from myself). From TI's point of view we need something like a global buffer allocator for video drivers. Two ideas that came up while discussing this internally are given below, which I thought I would share with you. 1) Add a common contiguous buffer allocator/deallocator to the video buffer layer, which will pre-allocate the buffers at bootup; the contiguous buffer layer then uses it for allocating buffers. When it runs out, it will fall back to its current scheme. 2) Similar to 1), except that the allocator uses the bootargs mem variable to calculate the available memory on the board and uses this memory for buffer allocation. This way the user can customize it based on a system design goal. We might want to have user applications request buffers from the same pool through an API. User space applications would use this API to allocate contiguous buffers as needed and would use USERPTR IO in all drivers, or use MMAP IO in one driver and USERPTR IO in other drivers using these buffer pointers. Murali Karicheri Software Design Engineer Texas Instruments Inc. Germantown, MD 20874 new phone: 301-407-9583 Old Phone: 301-515-3736 (will be deprecated) email: m-kariche...@ti.com -Original Message- From: linux-media-ow...@vger.kernel.org [mailto:linux-media-ow...@vger.kernel.org] On Behalf Of Laurent Pinchart Sent: Wednesday, September 16, 2009 11:47 AM To: linux-media@vger.kernel.org; Hans Verkuil; Sakari Ailus; Cohen David Abraham; Koskipää Antti Jussi Petteri; Zutshi Vimarsh (Nokia-D-MSW/Helsinki); stefan.k...@nokia.com Subject: [RFC] Global video buffers pool Hi everybody, I didn't want to miss this year's pretty flourishing RFC season, so here's another one about a global video buffers pool. All comments are welcome, but please don't trash this proposal too fast.
It's a first shot at real problems encountered in real situations with real hardware (namely high resolution still image capture on OMAP3). It's far from perfect, and I'm open to completely different solutions if someone thinks of one.

Introduction

The V4L2 video buffers handling API makes use of a queue of video buffers to exchange data between video devices and userspace applications (the read method doesn't expose the buffer objects directly but uses them underneath). Although quite efficient for simple video capture and output use cases, the current implementation doesn't scale well when used with complex hardware and large video resolutions. This RFC will list the current limitations of the API and propose a possible solution. The document is at this stage a work in progress. Its main purpose is to be used as support material for discussions at the Linux Plumbers Conference.

Limitations
===

Large buffers allocation

Many video devices still require physically contiguous memory. The introduction of IOMMUs on high-end systems will probably make that a distant nightmare in the future, but we have to deal with this situation for the moment (I'm not sure if the most recent PCI devices support scatter-gather lists, but many embedded systems still require physically contiguous memory). Allocating large amounts of physically contiguous memory needs to be done as soon as possible after (or even during) system bootup, otherwise memory fragmentation will cause the allocation to fail. As the amount of required video memory depends on the frame size and the number of buffers, the driver can't pre-allocate the buffers beforehand. A few drivers allocate a large chunk of memory when they are loaded and then use it when a userspace application requests video buffers to be allocated. However, that method requires guessing how much memory will be needed, and can lead to waste of system memory (if the guess was too large) or allocation failures (if the guess was too low).
Buffer queuing latency
---

VIDIOC_QBUF is becoming a performance bottleneck when capturing large images on some systems (especially in the embedded world). When capturing high resolution still pictures, the VIDIOC_QBUF delay adds to the shot latency, making the camera appear slow to the user. The delay is caused by several operations required by DMA transfers that all happen when queuing buffers.

- Cache coherency management

When the processor has a non-coherent cache (which is the case with most embedded devices, especially ARM-based) the device driver needs to invalidate (for video capture) or flush (for video output) the cache (either a range, or the whole cache) every time a buffer is queued. This ensures that stale data in the cache will not be written back to memory during or after DMA and that all data written by the CPU is visible to the device. Invalidating the cache for large resolutions takes a considerable amount of time. Preliminary tests showed that cache invalidation for a 5MP buffer requires several hundreds of milliseconds on an OMAP3 platform for range invalidation, or several tens of milliseconds when invalidating the whole D-cache.
Re: [RFC] Global video buffers pool
On Thursday 17 September 2009 20:49:49 Mauro Carvalho Chehab wrote: Em Wed, 16 Sep 2009 17:46:39 +0200 Laurent Pinchart laurent.pinch...@ideasonboard.com escreveu: Hi everybody, I didn't want to miss this year's pretty flourishing RFC season, so here's another one about a global video buffers pool. All comments are welcome, but please don't trash this proposal too fast. It's a first shot at real problems encountered in real situations with real hardware (namely high resolution still image capture on OMAP3). It's far from perfect, and I'm open to completely different solutions if someone thinks of one. First of all, thank you Laurent for working on this! Much appreciated. Some comments about your proposal: 1) For embedded systems, the best approach is probably to create it at boot time, instead of controlling it via userspace, since the earlier it is done, the better. I agree with Mauro here. The only way you can allocate the required memory is in general to do it early in the boot sequence. 2) As I've posted at the media controller RFC, we should take care not to abuse its usage. The media controller has two specific objectives: topology enumeration/change and subdev parameter passing. True, but perhaps it can also be used for other purposes. I'm not saying we should, but neither should we stop thinking about it. Someone may come up with a great idea for which a mc is ideally suited. We are still in the brainstorming stage, so any idea is welcome. For the last, as I've explained there, the proper solution is to create devices for each v4l subdev that requires control from userspace. The proper solution *in your opinion*. I'm still on the fence on that one. In the case of a video buffer memory pool, it is not one of the use cases of the media controller. So we need to think harder about where to implement it. Why couldn't it be one of the use cases? Again, it is your opinion, not a fact. Note that I share this opinion, but try to avoid presenting opinions as facts.
3) I don't think that having a buffer pool per media controller will be so useful. A media controller groups /dev/video with their audio, IR, I2C... resources. On systems with more than one different board (for example a cellular phone with a camera and a DVB-H receiver), you'll likely have more than one media controller. So, controlling video buffer pools at /dev/video or at media controller level will give the same results in several environments; I don't follow the logic here, sorry. 4) As you've mentioned, a global set of buffers seems to be the better alternative. This means that the V4L2 core will take care of controlling the pool, instead of leaving this task to the drivers. This makes it easier to have a boot-time parameter specifying the size of the memory pool and will optimize memory usage. We may even have a Kconfig var specifying the default size of the memory pool (although this is not really needed, since new kernels allow specifying default command line parameters). Different devices may have quite different buffer requirements (size, number of buffers). Would it be safe to have them all allocated from a global pool? I do not feel confident myself that I understand all the implications of a global pool or whether you actually always want that. 5) The step to have a global-wide video buffers pool allocation, as you mentioned at the RFC, is to make sure that all drivers will use the v4l2 framework to allocate memory. So, this means porting a few drivers (ivtv, uvcvideo, cx18 and gspca) to use videobuf. As videobuf already supports all sorts of different memory types and configs (contiguous and scatter/gather DMA, vmalloc'ed buffers, mmap, userptr, read, overlay modes), it should fit the needs well. Why would I want to change ivtv for this? In fact, I see no reason to modify any of the existing drivers.
A mc-wide or global memory pool is only of interest for very complex devices where you want to pass buffers around between various sub-devices (and possibly to other media devices or DSPs). And yes, they probably will have to use the framework in order to be able to coordinate these pools properly. 6) As videobuf uses a common method of allocating memory, and all memory requests pass via videobuf-core (the videobuf_alloc function), the implementation of a global-wide set of video buffers means touching just one function there, at the abstraction layer, and double-checking that videobuf-dma-sg/videobuf-vmalloc/videobuf-contig don't call their own allocation methods directly. If they do, a simple change would be needed. 7) IMO, the better interface for it is to add some sysfs attributes to the media class, providing there the means to control the video buffer pools. If the size of a video buffer pool is set to zero, it will use normal memory allocation. Otherwise, it will work in pool mode. Or you use the existing
[RFC] Global video buffers pool
Hi everybody, I didn't want to miss this year's pretty flourishing RFC season, so here's another one about a global video buffers pool. All comments are welcome, but please don't trash this proposal too fast. It's a first shot at real problems encountered in real situations with real hardware (namely high resolution still image capture on OMAP3). It's far from perfect, and I'm open to completely different solutions if someone thinks of one.

Introduction

The V4L2 video buffers handling API makes use of a queue of video buffers to exchange data between video devices and userspace applications (the read method doesn't expose the buffer objects directly but uses them underneath). Although quite efficient for simple video capture and output use cases, the current implementation doesn't scale well when used with complex hardware and large video resolutions. This RFC will list the current limitations of the API and propose a possible solution. The document is at this stage a work in progress. Its main purpose is to be used as support material for discussions at the Linux Plumbers Conference.

Limitations
===

Large buffers allocation

Many video devices still require physically contiguous memory. The introduction of IOMMUs on high-end systems will probably make that a distant nightmare in the future, but we have to deal with this situation for the moment (I'm not sure if the most recent PCI devices support scatter-gather lists, but many embedded systems still require physically contiguous memory). Allocating large amounts of physically contiguous memory needs to be done as soon as possible after (or even during) system bootup, otherwise memory fragmentation will cause the allocation to fail. As the amount of required video memory depends on the frame size and the number of buffers, the driver can't pre-allocate the buffers beforehand.
A few drivers allocate a large chunk of memory when they are loaded and then use it when a userspace application requests video buffers to be allocated. However, that method requires guessing how much memory will be needed, and can lead to waste of system memory (if the guess was too large) or allocation failures (if the guess was too low).

Buffer queuing latency
---

VIDIOC_QBUF is becoming a performance bottleneck when capturing large images on some systems (especially in the embedded world). When capturing high resolution still pictures, the VIDIOC_QBUF delay adds to the shot latency, making the camera appear slow to the user. The delay is caused by several operations required by DMA transfers that all happen when queuing buffers.

- Cache coherency management

When the processor has a non-coherent cache (which is the case with most embedded devices, especially ARM-based) the device driver needs to invalidate (for video capture) or flush (for video output) the cache (either a range, or the whole cache) every time a buffer is queued. This ensures that stale data in the cache will not be written back to memory during or after DMA and that all data written by the CPU is visible to the device. Invalidating the cache for large resolutions takes a considerable amount of time. Preliminary tests showed that cache invalidation for a 5MP buffer requires several hundreds of milliseconds on an OMAP3 platform for range invalidation, or several tens of milliseconds when invalidating the whole D-cache. When video buffers are passed between two devices (for instance when passing the same USERPTR buffer to a video capture device and a hardware codec) without any userspace access to the memory, CPU cache invalidation/flushing isn't required on either side (video capture and hardware codec) and could be skipped.
- Memory locking and IOMMU

Drivers need to lock the video buffer pages in memory to make sure that the physical pages will not be freed while DMA is in progress under low-memory conditions. This requires looping over all pages (typically 4kB long) that back the video buffer (10MB for a 5MP YUV image) and takes a considerable amount of time. When using the MMAP streaming method, the buffers can be locked in memory when allocated (VIDIOC_REQBUFS). However, when using the USERPTR streaming method, the buffers can only be locked the first time they are queued, adding to the VIDIOC_QBUF latency. A similar issue arises when using IOMMUs. The IOMMU needs to be programmed to translate physically scattered pages into a contiguous memory range on the bus. This operation is done the first time buffers are queued for USERPTR buffers.

Sharing buffers between devices
---

Video buffer memory can be shared between several devices when at most one of them uses the MMAP method, and the others the USERPTR method. This avoids memcpy() operations when transferring video data from one device to another through memory (video acquisition - hardware