Re: [osg-users] RTT slave views and multi-threading
Hi Robert, Wojciech,

My initial guess was that the lengthy draw dispatch of the master view and the failing cull/draw parallelism were the result of the same problem. However, they actually seem to be different problems, so I'll focus first on the draw dispatch.

The master camera draws only a screen-aligned quad and nothing else (the scene with shadows is rendered by the slave camera). There is no dynamic geometry either. But I do indeed have a read-back operation: a glGetTexImage call in the post-draw callback of the master camera. This call takes ~12 ms. It reads back a small texture that is rendered by a camera in the current frame. That camera uses FRAME_BUFFER_OBJECT as its render target implementation.

It looks like using glReadPixels to read directly from the FBO is the advised method for getting data back to system memory. How do I get the FBO that the camera is rendering to? Or is there a better method to get the texture data back to sysmem?

cheers,
tugkan

On Wed, Jan 13, 2010 at 11:34 AM, Tugkan Calapoglu tug...@vires.com wrote:

> Hi All,
>
> I am using a slave view for rendering the scene to a texture. Initially I
> tried a camera node, but this did not work well due to a problem with
> LiSPSM shadows, and I was advised to use RTT slave views instead.
>
> My setup is as follows: there is a single main view and I attach a slave
> view to it. The slave view is attached with addSlave( slave, false ) so
> that it does *not* automatically use the master scene. I attach a texture
> to the slave view and make my scene a child of this view. I attach a
> screen-aligned quad to the main view; this quad visualizes the RTT
> texture from the slave view.
>
> Now I have a threading problem, which can be seen in the snapshot I
> attached. There are two issues:
>
> 1- The main view (cam1) has a very large draw time even though it only
> renders the screen-aligned quad. I double checked whether it also renders
> the actual scene, but this is not the case.
>
> 2- The slave view does not run cull and draw in parallel. Cull and draw
> do run in parallel if the scene is not rendered with the slave view.
> Moreover, if I change the render order of the slave camera from
> PRE_RENDER to POST_RENDER it is OK. I could simply use POST_RENDER, but I
> am afraid it introduces an extra frame of latency: if I render the
> screen-aligned quad first and the scene later, then what I see on the
> quad is the texture from the previous frame (right?).
>
> Any ideas?
>
> cheers,
> tugkan
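[Editor's note: for illustration, a minimal sketch of the slave-view RTT setup described in the quoted post. This is a reconstruction under assumptions, not the poster's actual code; texture size, file name and quad geometry are placeholders.]

#include <osg/Camera>
#include <osg/Geode>
#include <osg/Geometry>
#include <osg/Texture2D>
#include <osgDB/ReadFile>
#include <osgViewer/Viewer>

int main()
{
    osgViewer::Viewer viewer;

    // Texture the slave camera renders into.
    osg::ref_ptr<osg::Texture2D> tex = new osg::Texture2D;
    tex->setTextureSize(1024, 1024);
    tex->setInternalFormat(GL_RGBA);

    // RTT slave camera: PRE_RENDER into an FBO, added with
    // useMasterSceneData = false so it keeps its own scene.
    osg::ref_ptr<osg::Camera> rtt = new osg::Camera;
    rtt->setRenderOrder(osg::Camera::PRE_RENDER);
    rtt->setRenderTargetImplementation(osg::Camera::FRAME_BUFFER_OBJECT);
    rtt->setViewport(0, 0, 1024, 1024);
    rtt->attach(osg::Camera::COLOR_BUFFER, tex.get());

    osg::ref_ptr<osg::Node> scene = osgDB::readNodeFile("scene.osg"); // placeholder scene
    if (scene.valid()) rtt->addChild(scene.get());

    viewer.addSlave(rtt.get(), false);

    // Master scene: a screen-aligned quad showing the RTT texture.
    osg::ref_ptr<osg::Geode> quad = new osg::Geode;
    quad->addDrawable(osg::createTexturedQuadGeometry(
        osg::Vec3(0.0f, 0.0f, 0.0f), osg::Vec3(1.0f, 0.0f, 0.0f),
        osg::Vec3(0.0f, 1.0f, 0.0f)));
    quad->getOrCreateStateSet()->setTextureAttributeAndModes(0, tex.get());
    viewer.setSceneData(quad.get());

    return viewer.run();
}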
Re: [osg-users] RTT slave views and multi-threading
Hi Tugkan,

Tugkan Calapoglu wrote:
> How do I get the FBO that the camera is rendering to? Or is there a
> better method to get the texture data back to sysmem?

Simplest is to just attach an osg::Image to the RTT (FBO) camera. See the attach method of osg::Camera. I think there is an example in osgprerender. Also see here:

http://thread.gmane.org/gmane.comp.graphics.openscenegraph.user/52651
and
http://thread.gmane.org/gmane.comp.graphics.openscenegraph.user/53432

rgds
jp
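[Editor's note: jp's suggestion in code form; a minimal sketch assuming a GL_RGBA colour attachment, with illustrative names.]

#include <osg/Camera>
#include <osg/Image>

// Attach an osg::Image to an FBO camera; OSG then reads the colour
// buffer back into the image (via glReadPixels) each time the camera
// renders, so image->data() holds the pixels afterwards.
osg::ref_ptr<osg::Image> attachReadbackImage(osg::Camera* rttCamera,
                                             int width, int height)
{
    osg::ref_ptr<osg::Image> image = new osg::Image;
    image->allocateImage(width, height, 1, GL_RGBA, GL_UNSIGNED_BYTE);
    rttCamera->attach(osg::Camera::COLOR_BUFFER, image.get());
    return image;
}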
Re: [osg-users] RTT slave views and multi-threading
Hi Jp,

Unfortunately that method is easy but very slow. I think it also uses glGetTexImage.

cheers,
tugkan
Re: [osg-users] RTT slave views and multi-threading
Hi Tugkan,

On Thu, Jan 14, 2010 at 12:00 PM, Tugkan Calapoglu tug...@vires.com wrote:
> unfortunately that method is easy but very slow. I think it also uses
> glGetTexImage.

Operations like glReadPixels and glGetTexImage involve the FIFO being flushed and the data being copied back into main memory. These two things together make them slow, and there isn't much you can do about it directly.

The best way to deal with the high cost of these operations is to avoid them completely. Try to use algorithms that render to texture via FBOs and read those textures directly in other shaders. Avoid copying the results back to the CPU/main memory. This forces you to do more work on the GPU and rely on more complex shaders, but in the end it means you don't force a round trip between the GPU and the CPU.

Robert.
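[Editor's note: a sketch of what "read those textures directly in other shaders" looks like in scene-graph terms; the texture unit and uniform name are made up for the example.]

#include <osg/Node>
#include <osg/StateSet>
#include <osg/Texture2D>
#include <osg/Uniform>

// Bind the RTT result as an input texture of the next pass instead of
// reading it back; the pass's fragment shader samples it as a sampler2D.
void consumeOnGpu(osg::Node* secondPass, osg::Texture2D* rttTexture)
{
    osg::StateSet* ss = secondPass->getOrCreateStateSet();
    ss->setTextureAttributeAndModes(1, rttTexture);   // illustrative unit 1
    ss->addUniform(new osg::Uniform("rttResult", 1)); // sampler2D uniform
}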
Re: [osg-users] RTT slave views and multi-threading
Hi,

Tugkan Calapoglu wrote:
> unfortunately that method is easy but very slow. I think it also uses
> glGetTexImage.

You might be surprised. Have you read the threads I linked to? Attach uses glReadPixels (while doing the FBO rendering, so you don't have to bind anything yourself later), and in many cases this is the fastest. If you want something more elaborate, such as async PBO use, see the osgscreencapture example.

Also, test whatever you use on your own setup; all sorts of things can change the efficiency of reading data back to the CPU. YMMV. Like Robert said though, not reading anything back to the CPU, if you can help it, is best.

rgds
jp
Re: [osg-users] RTT slave views and multi-threading
Hi Robert,

I am working on an HDR implementation which has to work across multiple channels. The method I use requires the average luminance of the scene. If I use different average luminances for different channels, the colors will simply not match. E.g. in a tunnel, the front channel will see the tunnel exit and have a higher average luminance than the side channels, which only see the dark tunnel walls. So I do need a way to collect the current average luminances of all channels and compute a single average that can be used by all (by channel I mean separate computers connected to separate projectors).

I know that getting data back from the GPU is slow, but 12 ms for a 4x4 texture seems extreme. glReadPixels seems to be faster: we are able to make full-screen grabs (800x600) and still keep 60 Hz (even without PBOs). Some GPGPU people suggest using glReadPixels to read directly from an FBO rather than glGetTexImage, so I was wondering if there is a way to obtain the osg::FBO pointer from the camera?

cheers,
tugkan
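[Editor's note: OSG does not expose the FBO on osg::Camera itself; it lives on the osgUtil::RenderStage that the viewer's Renderer builds for the camera. The following is a heavily hedged sketch of digging it out, relying on osgViewer internals of that era; the FBO only exists once the stage has rendered at least once, so this is fragile.]

#include <osg/Camera>
#include <osg/FrameBufferObject>
#include <osgUtil/SceneView>
#include <osgViewer/Renderer>

osg::FrameBufferObject* getCameraFbo(osg::Camera* camera)
{
    // In osgViewer setups the camera's renderer is an osgViewer::Renderer.
    osgViewer::Renderer* renderer =
        dynamic_cast<osgViewer::Renderer*>(camera->getRenderer());
    if (!renderer) return 0;

    osgUtil::SceneView* sceneView = renderer->getSceneView(0);
    if (!sceneView || !sceneView->getRenderStage()) return 0;

    // May still be NULL before the first frame has been drawn.
    return sceneView->getRenderStage()->getFrameBufferObject();
}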
Re: [osg-users] RTT slave views and multi-threading
Hi Tugkan,

On Thu, Jan 14, 2010 at 12:31 PM, Tugkan Calapoglu tug...@vires.com wrote:
> I know that getting data back from GPU is slow but 12ms for a 4x4
> texture seems extreme.

It's the flushing of the FIFO that is the problem; that's why it's so slow, not the data transfer itself. Once you flush the FIFO you lose the parallelism between the CPU and GPU.

The only way to hide this is to use PBOs to do the read back, and to do the actual read back on the next frame rather than in the current frame. In your case you might be able to get away with this: a frame's latency might not be a big issue if you can keep to a solid 60 Hz and the values you are reading back aren't changing drastically between frames.

Robert.
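[Editor's note: a sketch of the double-buffered PBO read-back Robert describes, written as a camera draw callback: frame N starts an asynchronous glReadPixels into one PBO and maps the other, which was filled during frame N-1. It assumes the PBO entry points (GL 2.1 / ARB_pixel_buffer_object) are resolved, e.g. via GLEW, and would be installed with camera->setFinalDrawCallback(...) so it runs while the camera's read buffer is current.]

#include <osg/Camera>
#include <osg/RenderInfo>
#include <GL/glew.h> // or however your project resolves GL entry points

class AsyncReadback : public osg::Camera::DrawCallback
{
public:
    AsyncReadback(int w, int h) : _w(w), _h(h), _frame(0)
    { _pbo[0] = _pbo[1] = 0; }

    virtual void operator()(osg::RenderInfo&) const
    {
        const GLsizeiptr size = _w * _h * 4;
        if (_pbo[0] == 0)
        {
            glGenBuffers(2, _pbo);
            for (int i = 0; i < 2; ++i)
            {
                glBindBuffer(GL_PIXEL_PACK_BUFFER, _pbo[i]);
                glBufferData(GL_PIXEL_PACK_BUFFER, size, 0, GL_STREAM_READ);
            }
        }

        const int write = _frame % 2;       // PBO receiving this frame
        const int read  = (_frame + 1) % 2; // PBO filled last frame

        // Asynchronous: returns without flushing the FIFO.
        glBindBuffer(GL_PIXEL_PACK_BUFFER, _pbo[write]);
        glReadPixels(0, 0, _w, _h, GL_RGBA, GL_UNSIGNED_BYTE, 0);

        // By now last frame's copy has usually completed, so mapping
        // does not stall; the data is one frame old.
        glBindBuffer(GL_PIXEL_PACK_BUFFER, _pbo[read]);
        if (void* pixels = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY))
        {
            // ... consume 'pixels' here ...
            glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
        }
        glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
        ++_frame;
    }

private:
    int _w, _h;
    mutable int _frame;
    mutable GLuint _pbo[2];
};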
Re: [osg-users] RTT slave views and multi-threading
Hi Jp,

My initial implementation used an osg::Image attached to a camera and it was just as slow. I will see what I can do with PBOs.

regards,
tugkan
Re: [osg-users] RTT slave views and multi-threading
Hi Robert,

Yes, one frame of latency is OK. Is there an example of the PBO usage? osgscreencapture seems to be about getting the data from the frame buffer, not from an RTT texture.

tugkan
Re: [osg-users] RTT slave views and multi-threading
Hi,

Tugkan Calapoglu wrote:
> My initial implementation used an osg::Image attached to a camera and
> it was just as slow.

OK.

> I will see what I can do with PBOs.

There is some code in the threads I linked to earlier that shows how to get data into a PBO using osg's PixelDataBufferObject. It does not do the async reading, but see here for more details:

http://www.songho.ca/opengl/gl_pbo.html

regards
jp
Re: [osg-users] RTT slave views and multi-threading
Hi Tugkan,

On Thu, Jan 14, 2010 at 12:51 PM, Tugkan Calapoglu tug...@vires.com wrote:
> yes one frame latency is OK. Is there an example about the PBO usage?
> osgscreencapture seems to be about getting the data from frame buffer
> not from an RTT texture.

osgscreencapture uses a frame of latency when it double buffers the PBOs. It doesn't matter whether the source is the frame buffer or an FBO; the PBO is only concerned with the memory management of the read back.

Robert.
Re: [osg-users] RTT slave views and multi-threading
Hi Tugkan,

The osgdistortion example works a bit like what you are describing; could you try it to see what performance it gets?

As for general notes about threading: if you are working with a single graphics context, as you are, then all the draw dispatch and the GPU draw can only be done by a single graphics thread, so there is little opportunity to make it more parallel without using another graphics card/graphics context and interleaving frames.

As for why the second camera is very expensive on draw dispatch, this suggests to me that it's blocking, either because the OpenGL FIFO is full or because it contains a GL read-back operation of some kind.

Robert.
Re: [osg-users] RTT slave views and multi-threading
Hi Robert,

> The osgdistortion example works a bit like what you are describing,
> could you try it to see what performance it gets?

osgdistortion's threading model is set to SingleThreaded in the code. I changed it to DrawThreadPerContext and now I can see that draw starts after cull, i.e. they do not run in parallel there either.

> As for general notes about threading: if you are working with a single
> graphics context [...] there is little opportunity to make it more
> parallel without using another graphics card/graphics context.

Sure. I do not expect two cameras to render in parallel onto a single window, but cull and draw of a given camera should run in parallel. Indeed they normally do so with the exact same scene and application; it breaks only if the second camera (the slave) has the PRE_RENDER render order.

tugkan
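[Editor's note: for completeness, the change Tugkan describes is a one-liner on the viewer; illustrative sketch.]

#include <osgViewer/Viewer>

void enableThreadedDraw(osgViewer::Viewer& viewer)
{
    // DrawThreadPerContext moves draw dispatch to a per-context thread,
    // allowing it to overlap the next frame's update and cull.
    viewer.setThreadingModel(osgViewer::ViewerBase::DrawThreadPerContext);
}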
Re: [osg-users] RTT slave views and multi-threading
Hi Tugkan,

On Wed, Jan 13, 2010 at 12:20 PM, Tugkan Calapoglu tug...@vires.com wrote:
> Sure. I do not expect that two cameras render in parallel onto a single
> window, but cull and draw of a certain camera should run in parallel.

Cull and draw can only run in parallel once all the dynamic geometry has been dispatched; otherwise draw would be dispatching data that is being modified by the next frame's update and cull traversals. Perhaps you have some dynamic geometry or StateSets that are holding back the next frame.

Regardless of the threading of cull, your problem is draw dispatch, not cull. You need to look into why the draw dispatch of the second camera is taking so long; please look at my last email.

Robert.
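[Editor's note: what "dynamic geometry or StateSets holding back the next frame" means in code. OSG only lets the draw thread run ahead once every DYNAMIC object has been dispatched; a hedged sketch of marking truly immutable data STATIC follows.]

#include <osg/Drawable>
#include <osg/StateSet>

void markStatic(osg::Drawable* drawable)
{
    // Keep DYNAMIC only for objects genuinely modified during
    // update/cull; everything else can be STATIC.
    drawable->setDataVariance(osg::Object::STATIC);
    if (osg::StateSet* ss = drawable->getStateSet())
        ss->setDataVariance(osg::Object::STATIC);
}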
Re: [osg-users] RTT slave views and multi-threading
Hi Tugkan,

Robert mentioned a lengthy read operation. It may be related to the read-buffer operation that's used to compute the shadow volume in LightSpacePerspectiveShadowMapDB. If your slave view uses osgShadow::LightSpacePerspectiveShadowMapDB, then you may check whether osgShadow::LightSpacePerspectiveShadowMapCB (the cull-bounds flavour) has the same problem.

I am aware of the LightSpacePerspectiveShadowMapDB glReadBuffer limitation, but I could not find a quick and easy-to-implement workaround that avoids scanning the image on the CPU. I allocate a small 64x64 texture and render the scene there, then read it into CPU memory and use the CPU to scan the pixels, optimizing the shadow volume from the depths and pixel locations stored in this pre-render image.

Wojtek
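[Editor's note: a loose illustration, not osgShadow's actual code, of the CPU scan Wojtek describes: read the small pre-render image back and derive depth bounds for the shadow volume from it. It assumes the image was allocated as GL_DEPTH_COMPONENT / GL_FLOAT.]

#include <osg/Image>
#include <algorithm>

void depthBoundsFromImage(const osg::Image* depthImage,
                          float& minDepth, float& maxDepth)
{
    const float* depths =
        reinterpret_cast<const float*>(depthImage->data());
    const int count = depthImage->s() * depthImage->t();

    minDepth = 1.0f;
    maxDepth = 0.0f;
    for (int i = 0; i < count; ++i)
    {
        if (depths[i] >= 1.0f) continue; // background: nothing rendered
        minDepth = std::min(minDepth, depths[i]);
        maxDepth = std::max(maxDepth, depths[i]);
    }
}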