Re: [Dri-devel] Streaming video through glTexSubImage2D
If a texture is locked in DRI while it is being rendered, the
straightforward solution is to allocate a new texture for every frame, or
to reuse an old one from a ring. The new texture will be free, so one
application thread can be dedicated to pushing textures to AGP and then
into the card while other threads decode the video, eliminating the stall.

The method you described is just async I/O; its only advantage is that
there is no copy from application memory to AGP memory. So multi-buffering
is not an advantage over the multithreaded approach (apart from scheduling
overhead), but direct use of DMA-able memory is. Did I understand that
correctly?

Perhaps you could advise MPlayer HQ to use multiple textures in a ring to
speed up MPlayer on DRI (in case the original poster's system is not CPU
bound)?

arkadi.

---
This SF.NET email is sponsored by:
SourceForge Enterprise Edition + IBM + LinuxWorld = Something 2 See!
http://www.vasoftware.com
___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [Dri-devel] Streaming video through glTexSubImage2D
Arkadi Shishlov wrote:

> On Fri, Jan 31, 2003 at 10:26:13AM -0800, Ian Romanick wrote:
> > There are two typical ways to go about improving texture upload
> > performance in OpenGL applications. One is through the use of OpenGL
> > extensions. There are several extensions available (or available any
>
> You are talking about extensions here, but my P3 600MHz Radeon8500 box
> with ATI binary drivers is able to push normal frame rates in MPlayer
> with 720x480 movies with the OpenGL output driver at 80% CPU load.
> 30% with XVideo.
> It uses regular glTexSubImage2D, so it is either R100 or DRI being slow
> in this case (if the CPU is powerful enough).

I'm 99% sure that the ATI driver multi-buffers textures. This was the
second technique that I mentioned in my post to improve texture upload
performance. There are probably other ways to pipeline texture uploads,
but the DRI doesn't use any of them. My guess is that if you profiled it
you would see that most of the wall clock time is spent waiting for the
rendering pipe to flush.

I believe that this problem is the reason the guys at Tungsten implemented
NV_vertex_array_range and the simplified version of APPLE_client_storage.
The "real" fix is going to be a LOT of work. The current solution is a
good stop-gap method, though. I would suggest modifying MPlayer to use the
var+client_storage work-around, and then help us implement the long-term
fix. :)

> I don't know much about the extensions you mentioned, but how much will
> you save with MPlayer? One memcpy() (assuming it doesn't wait for
> texture upload)?

It depends. Using a "real" implementation of APPLE_client_storage, your
main loop would look something like the following, and there would be
little or no waiting and no copies. This loop would actually require
APPLE_fence, but that would be fairly trivial. The trick is that when you
use a texture, the driver uses pages from your memory space as pages for
the AGP aperture. I don't know exactly what they've done, but I know that
Apple has gone to some great lengths to optimize this path.

    struct {
        GLuint  texture_id;
        GLuint  fence_id;
        void   *buffer;
    } texture_ring[ MAX_TEXTURES ];

    foo( ... )
    {
        /* Allocate memory, texture IDs, and fence IDs for the ring. */

        i = 0;
        while ( ! done ) {
            /* Wait until rendering that used this slot has finished. */
            glFinishFenceAPPLE( texture_ring[i].fence_id );

            decode_video_frame( texture_ring[i].buffer );
            glBindTexture( GL_TEXTURE_2D, texture_ring[i].texture_id );

            /* Render with the texture, then mark the end of that work. */
            glSetFenceAPPLE( texture_ring[i].fence_id );

            i = (i + 1) % MAX_TEXTURES;
        }

        /* Destroy textures, free memory, etc. */
    }
Re: [Dri-devel] Streaming video through glTexSubImage2D
On Fri, 31 Jan 2003, Arkadi Shishlov wrote:

> On Fri, Jan 31, 2003 at 04:33:36PM -0500, Leif Delgass wrote:
> > Actually, IIRC, all the drivers implement glTexSubImage2D the same
> > way as glTexImage2D. They always upload the entire texture image --
> > there was a comment I remember seeing about the subimage index
> > calculations being wrong. Fixing this to only upload the subimage
> > would help the performance of glTexSubImage2D.
>
> I think it doesn't make any difference with MPlayer; it replaces the
> whole texture for every frame (there is draw_slice() in libvo/vo_gl.c,
> but I doubt it is used much; possible source of low performance with
> DRI?). It always uploads in RGB format, so probably much of the CPU
> time is spent in yuv2rgb().

You're probably right; in most apps it likely wouldn't have a large
impact. The extensions that Ian described are going to have more of an
effect.

> How can glTexSubImage2D upload the full texture? The original source
> is gone; does it keep a copy internally?

Yes, the Mesa drivers currently keep a copy of all textures in system
memory, but this is one of the things that could change with a new
AGP/texture management scheme.

--
Leif Delgass
http://www.retinalburn.net
Re: [Dri-devel] Streaming video through glTexSubImage2D
On Fri, Jan 31, 2003 at 04:33:36PM -0500, Leif Delgass wrote:

> Actually, IIRC, all the drivers implement glTexSubImage2D the same way
> as glTexImage2D. They always upload the entire texture image -- there
> was a comment I remember seeing about the subimage index calculations
> being wrong. Fixing this to only upload the subimage would help the
> performance of glTexSubImage2D.

I think it doesn't make any difference with MPlayer; it replaces the whole
texture for every frame (there is draw_slice() in libvo/vo_gl.c, but I
doubt it is used much; possible source of low performance with DRI?). It
always uploads in RGB format, so probably much of the CPU time is spent in
yuv2rgb().

How can glTexSubImage2D upload the full texture? The original source is
gone; does it keep a copy internally?

arkadi.
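To give a rough idea of the per-pixel work behind the yuv2rgb() remark
above, here is a minimal sketch of a Y'CbCr-to-RGB conversion in C. It
assumes video-range BT.601 input and uses the widely published fixed-point
approximation; it is not MPlayer's actual yuv2rgb() code, and the names
clamp_u8() and ycbcr_to_rgb() are made up for illustration.

```c
#include <assert.h>

/* Clamp an intermediate result to the 0..255 range of an 8-bit channel. */
static unsigned char clamp_u8(int v)
{
    return (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Convert one video-range BT.601 Y'CbCr sample to 8-bit RGB.
 * Coefficients are the usual ones scaled by 256; +128 rounds.
 * Hypothetical helper, not taken from MPlayer. */
static void ycbcr_to_rgb(int y, int cb, int cr, unsigned char rgb[3])
{
    assert(y >= 0 && y <= 255);
    assert(cb >= 0 && cb <= 255);
    assert(cr >= 0 && cr <= 255);

    const int c = y - 16;    /* luma with the black-level offset removed */
    const int d = cb - 128;  /* blue-difference chroma, centred on zero */
    const int e = cr - 128;  /* red-difference chroma, centred on zero */

    rgb[0] = clamp_u8((298 * c + 409 * e + 128) / 256);
    rgb[1] = clamp_u8((298 * c - 100 * d - 208 * e + 128) / 256);
    rgb[2] = clamp_u8((298 * c + 516 * d + 128) / 256);
}
```

A 720x576 frame needs this done 414,720 times, which is why a player that
always uploads RGB can spend a noticeable share of its CPU time here
before the texture upload even starts.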
Re: [Dri-devel] Streaming video through glTexSubImage2D
On Fri, 31 Jan 2003, Arkadi Shishlov wrote:

> On Fri, Jan 31, 2003 at 10:26:13AM -0800, Ian Romanick wrote:
> > There are two typical ways to go about improving texture upload
> > performance in OpenGL applications. One is through the use of OpenGL
> > extensions. There are several extensions available (or available any
>
> You are talking about extensions here, but my P3 600MHz Radeon8500 box
> with ATI binary drivers is able to push normal frame rates in MPlayer
> with 720x480 movies with the OpenGL output driver at 80% CPU load.
> 30% with XVideo.
> It uses regular glTexSubImage2D, so it is either R100 or DRI being slow
> in this case (if the CPU is powerful enough).
> I don't know much about the extensions you mentioned, but how much will
> you save with MPlayer? One memcpy() (assuming it doesn't wait for
> texture upload)?

Actually, IIRC, all the drivers implement glTexSubImage2D the same way as
glTexImage2D. They always upload the entire texture image -- there was a
comment I remember seeing about the subimage index calculations being
wrong. Fixing this to only upload the subimage would help the performance
of glTexSubImage2D.

--
Leif Delgass
http://www.retinalburn.net
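For readers wondering what the "subimage index calculations" amount to,
the following is a minimal sketch in C of copying only a sub-rectangle
into a larger, tightly packed texture image. It is an illustration under
stated assumptions, not the Mesa or DRI driver code: copy_subimage() and
all of its parameters are hypothetical, and real drivers also have to deal
with row alignment, tiling and format conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a w x h block of tightly packed source texels into a larger
 * texture image at (xoffset, yoffset). Only the affected rows are
 * written; the rest of the texture is left untouched.
 * Hypothetical helper, not a Mesa function. */
static void copy_subimage(unsigned char *tex, int tex_width, int tex_height,
                          int bytes_per_texel,
                          int xoffset, int yoffset, int w, int h,
                          const unsigned char *src)
{
    assert(xoffset >= 0 && yoffset >= 0 && w >= 0 && h >= 0);
    assert(xoffset + w <= tex_width && yoffset + h <= tex_height);

    const size_t tex_stride = (size_t)tex_width * bytes_per_texel;
    const size_t src_stride = (size_t)w * bytes_per_texel;

    for (int row = 0; row < h; row++) {
        /* Start of the destination row, then skip xoffset texels. */
        unsigned char *dst = tex
            + (size_t)(yoffset + row) * tex_stride
            + (size_t)xoffset * bytes_per_texel;
        memcpy(dst, src + (size_t)row * src_stride, src_stride);
    }
}
```

The point of the sketch is that the destination stride comes from the full
texture width while the source stride comes from the sub-rectangle width;
mixing the two up is the kind of index error that makes "upload
everything" the safe fallback.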
Re: [Dri-devel] Streaming video through glTexSubImage2D
On Fri, Jan 31, 2003 at 10:26:13AM -0800, Ian Romanick wrote:

> There are two typical ways to go about improving texture upload
> performance in OpenGL applications. One is through the use of OpenGL
> extensions. There are several extensions available (or available any

You are talking about extensions here, but my P3 600MHz Radeon8500 box
with ATI binary drivers is able to push normal frame rates in MPlayer with
720x480 movies with the OpenGL output driver at 80% CPU load. 30% with
XVideo.
It uses regular glTexSubImage2D, so it is either R100 or DRI being slow in
this case (if the CPU is powerful enough).
I don't know much about the extensions you mentioned, but how much will
you save with MPlayer? One memcpy() (assuming it doesn't wait for texture
upload)?

arkadi.
Re: [Dri-devel] Streaming video through glTexSubImage2D
Morten Hustveit wrote:

> Currently, the performance when streaming video through glTexSubImage2D
> is very low. In my test program and with mplayer, I get approximately
> 8 fps in 720x576 on my Radeon 7500 with texmem-branch from a couple of
> weeks ago. glDrawPixels is equally slow.
>
> I assume glTexSubImage2D is supposed to be able to process realtime
> video, since it handles extensions like EXT_422_pixels (for 4:2:2
> Y'CbCr) and EXT_interlace. Using OpenGL for streaming video is useful
> for creating nonlinear video editing applications (I think Apple's
> Shake uses OpenGL), because you will be able to preview many of the
> most common effects in realtime.
>
> Is there any work in progress to make texture sub-image uploading
> faster? Which changes need to be done?

There are two typical ways to go about improving texture upload
performance in OpenGL applications. One is through the use of OpenGL
extensions. There are several extensions available (or available any day
now) to help this process. NV_pixel_data_range and APPLE_client_storage
are the two most directly applicable. Neither of these two is /generally/
available in DRI. There is a version of NV_vertex_array_range in the R200
(and R100?) driver that can be used with APPLE_client_storage for texture
data.

http://oss.sgi.com/projects/ogl-sample/registry/NV/pixel_data_range.txt
http://oss.sgi.com/projects/ogl-sample/registry/APPLE/client_storage.txt

Jeff Hartmann and I are in the process of designing a COMPLETE replacement
of the memory management system for DRI. This re-work should allow for a
full, proper implementation of APPLE_client_storage. It's going to take a
lot of work, though. The way that APPLE_client_storage is implemented in
MacOS X is that the application mallocs memory for textures and the system
dynamically maps those pages into the AGP aperture. This would be very
difficult on x86, but I think Jeff has thought of a different way to get
the same effect.

There is another extension from the ARB that should be available,
literally, any day now to accelerate the process of uploading vertex data
(it's a replacement for NV_vertex_array_range & ATI_vertex_array_object).
John Carmack made brief mention of it in his recent plan update. As a
follow-on, there will likely be a version for texture data very soon. I
plan to have both of these extensions implemented in DRI as part of the
memory management re-write.

My personal opinion is that NV_*_range will universally go away after
ARB_vertex_buffer_object gains ground. There are too many pitfalls with
them for general use, especially WRT software fallbacks. The slow software
path becomes even slower if the application "optimizes" by putting data in
AGP or on-card memory. :P

The other way to speed up texture upload performance is to double-buffer
the textures inside the driver. The straightforward way to implement
texture updates is to wait for rendering that may be using the texture to
finish, then modify the texture data in place. If I'm not mistaken, this
is how DRI works. The optimization is to allocate a new texture buffer if
the texture has in-flight rendering.

This should be doable in the current implementation, but the
implementation would be non-trivial. Basically, you'd have to add a way to
track whether a texture has in-flight rendering. In the TexSubImage
functions for each driver you'd need to add code to detect this case. In
this case the "old" driTextureObject would need to be added to a list of
"dead" texture objects (to be released when their rendering is done), and
a new driTextureObject would need to be allocated. Periodically, objects
in the dead list would need to be checked and, if their rendering is
complete, freed.

That's the 10,000-mile overview. There are probably some other cases I'm
missing. It might also be possible to implement most of this in a
device-independent way, but I would do it in a single driver first. I
think the tough part will be getting the fencing right.

If you (or anyone else!!!) would be interested in working on this, we can
talk about it more in next Monday's #dri-devel meeting.
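The dead-list scheme described above can be sketched in a few dozen lines
of C. This is only an illustration under simplifying assumptions: the
hardware is modelled by two counters ("emitted" and "retired" fence
numbers), counter wrap-around is ignored, and struct tex_obj, struct
tex_ctx and every function name are hypothetical stand-ins rather than the
real driTextureObject code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a driver texture object. */
struct tex_obj {
    unsigned last_fence;    /* fence emitted after the last draw using it */
    struct tex_obj *next;   /* link in the dead list */
};

/* Hypothetical per-context state. */
struct tex_ctx {
    unsigned emitted;       /* last fence number queued to the hardware */
    unsigned retired;       /* last fence number the hardware has passed */
    struct tex_obj *dead;   /* replaced objects awaiting retirement */
    int live_objects;       /* allocation count, for illustration only */
};

static struct tex_obj *tex_alloc(struct tex_ctx *ctx)
{
    struct tex_obj *t = calloc(1, sizeof *t);
    assert(t != NULL);
    ctx->live_objects++;
    return t;
}

/* Record that a draw call using t has been queued. */
static void tex_mark_used(struct tex_ctx *ctx, struct tex_obj *t)
{
    t->last_fence = ++ctx->emitted;
}

static int tex_in_flight(const struct tex_ctx *ctx, const struct tex_obj *t)
{
    return t->last_fence > ctx->retired;
}

/* TexSubImage path: returns the object that may be written right now. */
static struct tex_obj *tex_begin_update(struct tex_ctx *ctx, struct tex_obj *t)
{
    if (!tex_in_flight(ctx, t))
        return t;           /* idle: modify in place */
    t->next = ctx->dead;    /* busy: park the old object on the dead list */
    ctx->dead = t;
    return tex_alloc(ctx);  /* and hand back a fresh one */
}

/* Called periodically: free dead objects whose rendering has finished. */
static void tex_reap(struct tex_ctx *ctx)
{
    struct tex_obj **link = &ctx->dead;
    while (*link != NULL) {
        struct tex_obj *t = *link;
        if (tex_in_flight(ctx, t)) {
            link = &t->next;
        } else {
            *link = t->next;
            free(t);
            ctx->live_objects--;
        }
    }
}
```

One case the sketch leaves out: when only part of the image is replaced,
the fresh object would also need the untouched texels from the old one, so
as written it is only correct for whole-image updates such as video
frames.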
Re: [Dri-devel] Streaming video through glTexSubImage2D
Morten Hustveit wrote:

> Currently, the performance when streaming video through glTexSubImage2D
> is very low. In my test program and with mplayer, I get approximately
> 8 fps in 720x576 on my Radeon 7500 with texmem-branch from a couple of
> weeks ago. glDrawPixels is equally slow.
>
> I assume glTexSubImage2D is supposed to be able to process realtime
> video, since it handles extensions like EXT_422_pixels (for 4:2:2
> Y'CbCr) and EXT_interlace. Using OpenGL for streaming video is useful
> for creating nonlinear video editing applications (I think Apple's
> Shake uses OpenGL), because you will be able to preview many of the
> most common effects in realtime.
>
> Is there any work in progress to make texture sub-image uploading
> faster? Which changes need to be done?

Morten,

The R200 driver supports an AGP allocator, but that's for the Radeon 8500
and 9000. You would need to port the allocator (APPLE_client_storage) to
the Radeon driver if you wanted to use it on the Radeon 7500.

Regards,
Jens

--
Jens Owen
[EMAIL PROTECTED]
Steamboat Springs, Colorado