Re: Optimization idea: soft XvPutImage
--- On Mon, 9/22/08, Michel Dänzer [EMAIL PROTECTED] wrote:
> On Sun, 2008-09-21 at 13:58 +0200, Soeren Sandmann wrote:
> > As other people pointed out, XRender does allow arbitrary 3x3 transformations of source images, but you are right that the XRender protocol would require putting the data in a drawable first.
>
> There could be a generic XVideo adaptor which uses RENDER internally. The Xgl code might already have something like that.

Wow! Right you are. xorg-server-1.5/src/hw/xgl/xglxv.c. At first glance, at least, it looks like it should be readily portable from Xgl to fb, as the Xgl-specific stuff appears to be contained mostly in well-labelled macros. Thank you very much for pointing this out.

Adam Richter
___
xorg mailing list
xorg@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/xorg
Re: Optimization idea: soft XvPutImage
On Fri, 9/19/08, Soeren Sandmann [EMAIL PROTECTED] wrote:
> Adam Richter [EMAIL PROTECTED] writes:
> > I want to suggest a way we could eliminate a substantial amount of data copying [...]
>
> [...] Pixman, the software implementation of XRender, already has support for YUV formats, so all that is really required is to just export YUV picture formats through the XRender protocol. [...]

Thank you for pointing out that pixman has some limited YUV reading support already.

The biggest problem that I see with using X Render is that it lacks stretch and shrink, at least if I understand correctly from looking at the protocol specification here:

http://gitweb.freedesktop.org/?p=xorg/proto/renderproto.git;a=blob_plain;f=renderproto.txt

See lines 758-766:

    Composite
        op:             PICTOP
        src:            PICTURE
        mask:           PICTURE or None
        dst:            PICTURE
        src-x, src-y:   INT16
        mask-x, mask-y: INT16
        dst-x, dst-y:   INT16
        width, height:  CARD16

The last two parameters (width and height) presumably apply to both source and destination, rather than there being separate parameters for the source and destination rectangles. This also appears to be the case when I look in the header for the pixman library (pixman-0.12/pixman/pixman.h) at the declaration of pixman_blt. It also has only a width and height, which presumably apply to both source and destination.

Even if you do not want to do stretch, I believe that the X Render extension would require first copying the YUV data to a drawable and then doing a drawable-to-drawable block transfer operation to do the YUV transformation. In comparison, XvPutImage is a single call that takes an XImage, which can be in shared memory and would normally be in YUV, and specifies the YUV-RGB conversion and stretch in a single operation.

Thanks for your input, especially the tip about some YUV support already existing in libpixman.

Adam Richter
Re: Optimization idea: soft XvPutImage
On Sun, Sep 21, 2008 at 11:10 AM, Adam Richter [EMAIL PROTECTED] wrote:
> [...] The biggest problem that I see with using the X Render is that it lacks stretch and shrink [...] The last two parameters (width and height) presumably apply to both source and destination rather than having separate parameters for the source and destination rectangles. [...]

Src and Mask pictures have a transform, which can translate and rotate coordinates as you please.
Re: Optimization idea: soft XvPutImage
On Sun, Sep 21, 2008 at 02:10:07AM -0700, Adam Richter wrote:
> Thank you for pointing out that pixman has some limited YUV reading support already. The biggest problem that I see with using the X Render is that it lacks stretch and shrink, at least if I understand correctly from looking at the protocol specification here:

Render also allows you to apply a transformation matrix to pictures, so you can scale with that.

Cheers,
Daniel
Re: Optimization idea: soft XvPutImage
Adam Richter [EMAIL PROTECTED] writes:
> Even if you do not want to do stretch, I believe that the X Render extension would require first copying the YUV data to a drawable and then doing a drawable-drawable block transfer operation to do the YUV transformation. In comparison, XvPutImage is a single call that takes an XImage, which can be in shared memory, and would normally be in YUV, and specifies the YUV-RGB conversion and stretch in a single operation.

As other people pointed out, XRender does allow arbitrary 3x3 transformations of source images, but you are right that the XRender protocol would require putting the data in a drawable first.

A shared memory pixmap would be a possibility, perhaps, though the shared memory extension should eventually be replaced with something based on the DRM memory manager.

Soren
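For readers wondering how a 3x3 transform gives stretch and shrink, here is a minimal sketch. It assumes, as I understand the Render model, that the picture transform maps destination coordinates back to source coordinates, so stretching a 720x480 source to 1920x1200 uses scale factors smaller than one. The function names are hypothetical, and exact fractions stand in for the fixed-point values the real protocol uses.

```python
from fractions import Fraction


def scale_transform(src_w, src_h, dst_w, dst_h):
    """Build a 3x3 matrix that maps destination coordinates to source coordinates."""
    sx = Fraction(src_w, dst_w)
    sy = Fraction(src_h, dst_h)
    return [[sx, 0, 0],
            [0, sy, 0],
            [0, 0, 1]]


def apply_transform(m, x, y):
    """Apply a projective 3x3 transform to the point (x, y)."""
    xs = m[0][0] * x + m[0][1] * y + m[0][2]
    ys = m[1][0] * x + m[1][1] * y + m[1][2]
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return xs / w, ys / w


# DVD-sized source stretched to a 1920x1200 destination (sizes from this thread)
m = scale_transform(720, 480, 1920, 1200)
corner = apply_transform(m, 1920, 1200)   # far corner of the destination
centre = apply_transform(m, 960, 600)     # centre of the destination
```

With such a transform set on the source picture, a Composite request with a single width and height of 1920x1200 would sample the whole 720x480 source, which is why the single width/height pair does not rule out scaling.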
Optimization idea: soft XvPutImage
I want to suggest a way we could eliminate a substantial amount of data copying when playing video on X servers that do not provide hardware video windows, including servers that offer the X shared memory extension. In common situations, I suspect that this could reduce memory bus utilization for playing video by more than a factor of two.

I do not know if I have time to implement this optimization right now, but I think it is potentially a big enough benefit that I really ought to describe it here in case someone else wants to implement it or can relieve me of thinking about it by showing me why it will not work.

The copy operation that I want to eliminate occurs when the X server reads data from XPutImage (usually via a shared memory area) and copies it to the frame buffer. The amount of data copied is particularly large because the image is often stretched from its native dimensions (720x480 for DVD, for example) to the dimensions of the display area (for example, 1920x1200 for full screen video on a 24" panel).

To eliminate this copy, I want the X server to receive the unstretched YUV image via XvPutImage, provided by the Xvideo-2.2 (Xv) extension, as is done for video display hardware that provides video windows, which typically does the YUV-RGB conversion and stretch in the display hardware. In this proposed Xv driver, which I will refer to as "soft XvPutImage", the YUV-RGB and stretch operations would have to be done in software by the X server, just as they are currently done in software by video playing programs. The difference is that by combining this operation with the X server receiving the image, a big copy operation is eliminated that might plausibly account for more than half of the memory bus utilization in some common video playing scenarios.
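To make the combined operation concrete, here is a rough sketch of a single pass that reads packed YUV and writes stretched RGB directly, with no intermediate RGB image. Everything in it is an assumption for illustration: the function names are hypothetical, the source is taken to be packed YUY2 (Y0 U Y1 V per pixel pair), the stretch is nearest-neighbour, and the coefficients are full-range BT.601. A real server implementation would be written in C and would write into the frame buffer rather than build a list.

```python
def clamp(value):
    """Round to the nearest integer and clamp into the 0..255 byte range."""
    return max(0, min(255, int(round(value))))


def yuv_to_rgb(y, u, v):
    # Full-range BT.601 coefficients (an assumption; video-range data
    # would need different offsets and scale factors).
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    return clamp(r), clamp(g), clamp(b)


def soft_put_image(yuy2, src_w, src_h, dst_w, dst_h):
    """Convert and stretch a packed YUY2 image to rows of RGB tuples in one pass."""
    out = []
    for dy in range(dst_h):
        sy = dy * src_h // dst_h                  # nearest source row
        row = []
        for dx in range(dst_w):
            sx = dx * src_w // dst_w              # nearest source column
            pair = (sy * src_w + (sx & ~1)) * 2   # start of the Y0 U Y1 V group
            y = yuy2[(sy * src_w + sx) * 2]
            u = yuy2[pair + 1]
            v = yuy2[pair + 3]
            row.append(yuv_to_rgb(y, u, v))
        out.append(row)
    return out


# 2x1 source: one black and one white pixel with neutral chroma,
# stretched to 4x2.
src = bytes([0, 128, 255, 128])
frame = soft_put_image(src, 2, 1, 4, 2)
```

The point of the sketch is only that each destination pixel is produced from the YUV source in one step, so the stretched RGB data is written once instead of being written by the player and then copied again by the server.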
I realize that most modern video hardware has YUV/stretch video window capabilities or other hardware acceleration for this operation (for example, in hardware 3D operations), but there are common cases in practice where this optimization should be useful:

1) Improving the capabilities of the weakest systems would allow video to be used more ubiquitously (for example, adding video-based tutorials to larger application suites might become more common).

2) Many open source drivers lack this YUV/stretch capability even if the hardware has it, due to lack of public documentation or slow development in comparison to the life cycle of the hardware, even though efforts to address these problems are definitely helping.

3) The following scenarios may fall under #1 or #2, but are worth separate mention:

   a) On systems with more processor cores (typically ones which have YUV/stretch hardware but lack drivers), memory bus utilization will be especially important.

   b) Fake X servers, such as for VNC or when running on a virtualized computer, are less likely to have access to acceleration hardware (although it is possible).

   c) There are those who believe that 3D acceleration hardware will be traded off for more CPU cores in typical systems of the future. So, at least for the case of playing video through a 3D effect, this optimization may help. See, for example, the "Twilight of the GPU" interview on slashdot yesterday at http://tech.slashdot.org/tech/08/09/15/2116240.shtml

4) There are also a couple of cases of small benefit that I will note for completeness:

   a) For video with a slow frame rate playing on a monitor with a high refresh rate, where the frame buffer and video window are part of system memory (i.e., no video RAM) and pixels in the frame buffer under the video window are still fetched for chroma key comparison, soft XvPutImage might actually use less bandwidth than a YUV/stretch video window.

   b) Not part of this proposal, but a similar idea for systems that have Xv but lack XvMC would be a soft XvMC, which would eliminate a verbatim copy of the incoming YUV data in Xv, but the bandwidth savings would be more modest.

To understand the possible bandwidth savings, here is a calculation based on the scenario mentioned earlier: playing standard DVD (720x480 yuv422) stretched to 1920x1200 (a popular full screen resolution).

To start, here is a list of data transfers that occur in the early stages of video decoding, regardless of whether this soft XvPutImage optimization is used. (I believe yuv422 is 2 bytes per pixel.) In the descriptions I
Re: Optimization idea: soft XvPutImage
On 17.09.2008 14:22, Adam Richter wrote:
> 2) Many open source drivers lack this YUV/stretch capability even if the hardware has it, due to lack of public documentation or slow development in comparison to the life cycle of the hardware, even though efforts to address these problems are definitely helping.

Note that for this case, you could just implement this in the driver itself (might be easiest for figuring out if it really helps in practice).

> The memory bus utilization would also be reduced (but never more than that factor 3) as the ratio of the size of the unstretched video to stretched video increases, such as when playing a 720x480 video on a newer 2560x1600 display.

It works the other way around too, however: think Full HD playback on a (mobile) device with a (say) 800x480 screen (not that I'd say this is very common, but it's a case which could definitely happen). Of course, normal Xv needs to transfer the full resolution image too (albeit only as packed or, more commonly, planar YUV, which is a bit smaller).

I could see this being faster and providing some benefits, but I don't know if it's worth the effort.

Roland
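The upscaling and downscaling cases above can be compared with a back-of-the-envelope model. This is only a sketch under stated assumptions, not the calculation from the original post: it assumes 2 bytes per pixel for YUV (as the original post suggests for yuv422), a 4 bytes per pixel RGB frame buffer, and it counts only the convert/stretch step and the server-side copy, ignoring decoding traffic and caches. The function names are hypothetical.

```python
def frame_bytes(width, height, bytes_per_pixel):
    """Size in bytes of one frame at the given dimensions."""
    return width * height * bytes_per_pixel


def bus_traffic(src, dst, yuv_bpp=2, rgb_bpp=4):
    """Return (current, soft) bytes moved per frame under the assumed model."""
    yuv = frame_bytes(*src, yuv_bpp)
    rgb = frame_bytes(*dst, rgb_bpp)
    # Current path: the player reads YUV and writes stretched RGB,
    # then the server reads that RGB and writes it to the frame buffer.
    current = yuv + 3 * rgb
    # Soft XvPutImage: the server reads YUV and writes stretched RGB
    # straight to the frame buffer.
    soft = yuv + rgb
    return current, soft


# Scenarios taken from this thread: (source size, destination size)
scenarios = {
    "DVD on 1920x1200": ((720, 480), (1920, 1200)),
    "DVD on 2560x1600": ((720, 480), (2560, 1600)),
    "Full HD on 800x480": ((1920, 1080), (800, 480)),
}

for name, (src, dst) in scenarios.items():
    current, soft = bus_traffic(src, dst)
    print(f"{name}: {current} -> {soft} bytes per frame, "
          f"ratio {current / soft:.2f}")
```

Under this model the ratio is (yuv + 3 * rgb) / (yuv + rgb), which approaches but never reaches 3 as the destination grows relative to the source, and shrinks toward 1 when the source is larger than the destination, as in the Full HD on 800x480 case.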