[osg-users] Parallel draw on a multi-core, single GPU architecture
Hi all,

we have a performance problem where the draw thread seems to be the bottleneck. The target hardware has 8 cores and a single GPU. We already run cull and draw in parallel, but parallelizing draw itself would also help. So I tried the following setup:

2 windows
2 cameras (each on one window)
threading model: CullThreadPerCameraDrawThreadPerContext

In this mode I can see that I have two cull and two draw threads and that they are assigned to separate cores. However, the performance is half that of my original setup. Setting OSG_SERIALIZE_DRAW_DISPATCH to ON or OFF does not make any difference.

Is there anything else I have to do to get good performance from the above setup (e.g. some more environment variables to set)? Or should I try something completely different?

thanks,
tugkan
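For reference, a minimal sketch of the kind of setup described above (my own reconstruction, not code from the thread; the scene file and window geometry are placeholders):

    #include <osgViewer/Viewer>
    #include <osgDB/ReadFile>

    int main(int, char**)
    {
        osgViewer::Viewer viewer;
        viewer.setSceneData(osgDB::readNodeFile("town.ive")); // placeholder scene

        for (unsigned int i = 0; i < 2; ++i)
        {
            osg::ref_ptr<osg::GraphicsContext::Traits> traits =
                new osg::GraphicsContext::Traits;
            traits->x = 100 + int(i) * 820;   // placeholder window positions
            traits->y = 100;
            traits->width = 800;
            traits->height = 600;
            traits->windowDecoration = true;
            traits->doubleBuffer = true;

            osg::ref_ptr<osg::GraphicsContext> gc =
                osg::GraphicsContext::createGraphicsContext(traits.get());

            osg::ref_ptr<osg::Camera> camera = new osg::Camera;
            camera->setGraphicsContext(gc.get());
            camera->setViewport(new osg::Viewport(0, 0, traits->width, traits->height));

            // each slave camera renders one half of the master frustum
            viewer.addSlave(camera.get(),
                            osg::Matrixd::translate(i == 0 ? 1.0 : -1.0, 0.0, 0.0),
                            osg::Matrixd());
        }

        // two cull threads (one per camera) plus one draw thread per context
        viewer.setThreadingModel(
            osgViewer::ViewerBase::CullThreadPerCameraDrawThreadPerContext);

        return viewer.run();
    }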
Re: [osg-users] Parallel draw on a multi-core, single GPU architecture
Hi Robert,

thanks for the answer. Our scene graph is heavily optimized and there don't seem to be any serious optimization possibilities left; maybe a few percent improvement here and there.

We observe that, as our databases grow, the CPU time required by the draw thread grows faster than that of cull. In the long run, draw will be a serious limiting factor for us, so I wanted to see if we can take advantage of the hardware to solve the problem.

tugkan

> Hi Tugkan,
>
> It only ever makes sense to parallelize draw dispatch when you have multiple graphics cards and one context per graphics card. Trying to add more threads than OpenGL/the graphics card can handle will just stall things and result in lower performance.
>
> If your draw dispatch is overloaded then consider optimizing your scene graph. There are many ways of doing this; which might be appropriate will depend entirely on the needs of your scenes and your application. Since I don't know anything about these specifics I can't provide any direction.
>
> Robert.
>
> On Wed, Aug 18, 2010 at 9:14 AM, Tugkan Calapoglu tug...@vires.com wrote:
>> [...]
Re: [osg-users] Parallel draw on a multi-core, single GPU architecture
Hi Robert,

our databases are for driving simulation and they are usually city streets. We have been employing batching and occlusion culling with very good results so far. Extensive use of LODs is problematic because:

1- they actually reduce performance in most of our cases;
2- the switching effect (even the slightest) is found very disturbing by the customers.

So we are somewhat conservative with LODs (but we use them nevertheless).

Actually, draw has always required more time than cull; if you look at the OSG examples, draw usually takes more time than cull. Now that draw is reaching the 16 ms limit, this has started to hurt us.

I should retract a sentence from my previous email: draw is not slowing down faster than cull; I did not really make such an observation. My actual observation is that both cull and draw require more CPU time with larger scenes, but since draw is generally more performance hungry, it becomes the bottleneck earlier.

tugkan

> Hi Tugkan,
>
> On Wed, Aug 18, 2010 at 9:49 AM, Tugkan Calapoglu tug...@vires.com wrote:
>> We observe that, as our databases grow, the performance requirement for the draw thread grows faster than cull. In the long run, draw will be a serious limiting factor for us. So I wanted to see if we can take advantage of hardware to solve the problem.
>
> Does this not point out that perhaps the scene graph is becoming less well balanced over time? I'd look closely at what in the draw dispatch is becoming the bottleneck and how to manage this more efficiently. Typically the best way to reduce draw dispatch is to batch the geometry and state in a more coarse-grained way. One can also use different approaches to achieve the same/similar results. For instance perhaps you can cull more effectively, or perhaps introduce LOD more effectively.
>
> Robert.
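As an aside, a minimal sketch of the coarse-grained batching Robert mentions, using osgUtil::Optimizer (my own illustration; this flag combination is an assumption, not something prescribed in the thread):

    #include <osgUtil/Optimizer>
    #include <osg/Node>

    // Merge geometry and share duplicated state so fewer, larger draw
    // calls are dispatched per frame.
    void batchScene(osg::Node* root)
    {
        osgUtil::Optimizer optimizer;
        optimizer.optimize(root,
            osgUtil::Optimizer::MERGE_GEODES |
            osgUtil::Optimizer::MERGE_GEOMETRY |
            osgUtil::Optimizer::SHARE_DUPLICATE_STATE);
    }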
[osg-users] OcclusionQueryNode and multithreading
Hi,

I tried the osgocclusionquery example with a city model we have and observed that it can reduce the number of rendered objects significantly.

However, the osgocclusionquery example cannot render in parallel. I tried all threading modes and saw that draw always starts *after* cull is finished. As a result, the performance gain from the occlusion tests is lost due to the serialized rendering pipeline. With osgviewer, cull and draw overlap perfectly.

I looked into the code and saw that there are a few setDataVariance( osg::Object::DYNAMIC ) calls in OcclusionQueryNode.cpp. I guess this is the reason why the rendering pipeline is not parallel with occlusion queries.

Is DYNAMIC data variance a necessity for OcclusionQueryNode? If so, is there a way to use occlusion queries and still run cull and draw in parallel?

tugkan
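For context, this is roughly how an OcclusionQueryNode is inserted into a scene (a sketch assuming the stock osg::OcclusionQueryNode API; the subgraph name, threshold and query interval are placeholders):

    #include <osg/OcclusionQueryNode>

    osg::ref_ptr<osg::OcclusionQueryNode> oqn = new osg::OcclusionQueryNode;
    oqn->addChild(buildingSubgraph.get());   // hypothetical subgraph
    oqn->setVisibilityThreshold(32);         // draw only if >32 pixels pass the query
    oqn->setQueryFrameCount(5);              // re-issue the query every 5th frame
    root->addChild(oqn.get());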
[osg-users] Explosion effect under an osg::MatrixTransform
Hi,

in one of our old applications we use an osgParticle::ExplosionEffect which is made a child of an osg::MatrixTransform. This code is relatively old and was tested with OSG 0.99.

The problem is that with a new OSG (2.9.6) the explosion effect is always at the center of the scene (at 0,0,0), even though the matrix transform has a translation matrix in it. It looks like the explosion effect is not affected by its parent transforms anymore. Is there a way to change this?

Regards,
tugkan
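A minimal sketch of the setup being described (names and the offset are placeholders). One knob worth checking, as an assumption on my part rather than a confirmed fix: osgParticle::ParticleEffect has a setUseLocalParticleSystem() switch that controls whether the particle system lives under the effect node or must be added to the scene elsewhere, which changes how parent transforms affect it.

    #include <osg/MatrixTransform>
    #include <osgParticle/ExplosionEffect>

    osg::ref_ptr<osg::MatrixTransform> xform =
        new osg::MatrixTransform(osg::Matrix::translate(100.0, 0.0, 0.0)); // placeholder offset

    osg::ref_ptr<osgParticle::ExplosionEffect> explosion =
        new osgParticle::ExplosionEffect(osg::Vec3(0.0f, 0.0f, 0.0f), 10.0f /*scale*/);

    // expectation: the effect should follow the parent transform
    xform->addChild(explosion.get());
    root->addChild(xform.get());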
Re: [osg-users] HW swap synching with swap groups/barriers
Hi Roland,

thanks for the info; in this case we'll also go with our own implementation. Do you use Linux? I'd like to know whether the hardware/drivers really support this feature cleanly under Linux.

regards,
tugkan

> Hi Tugkan,
>
> the submission was indeed not accepted. We are working with a custom OSG version that includes this modification.
>
> kind regards,
> Roland Smeenk
>
> Read this topic online here:
> http://forum.openscenegraph.org/viewtopic.php?p=22800#22800
[osg-users] HW swap synching with swap groups/barriers
Hi All,

I have to implement swap synchronization on Linux with NVidia hardware. I saw in the mailing list that there was some discussion on this subject sometime in 2008 (topic: "Feedback sought on osgViewer swap ready support for clusters"). There was even a code submission. However, I can't find any of the required GLX functions in the OSG code (e.g. glXJoinSwapGroupNV) now, so I think the submission didn't make it into the core. Or am I missing something? Does OSG support swap ready for clusters in a different way now?

tugkan
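For anyone landing here: a sketch of how the GLX_NV_swap_group entry points could be wired into an osgViewer application via a realize operation. This is my own reconstruction, assuming the context is an osgViewer::GraphicsWindowX11 exposing getDisplay()/getWindow(); group/barrier ids are placeholders and error handling is omitted.

    #include <osg/GraphicsThread>
    #include <osgViewer/api/X11/GraphicsWindowX11>
    #include <GL/glx.h>

    typedef Bool (*PFNGLXJOINSWAPGROUPNVPROC)(Display*, GLXDrawable, GLuint);
    typedef Bool (*PFNGLXBINDSWAPBARRIERNVPROC)(Display*, GLuint, GLuint);

    struct JoinSwapGroup : public osg::GraphicsOperation
    {
        JoinSwapGroup() : osg::GraphicsOperation("JoinSwapGroup", false) {}

        virtual void operator()(osg::GraphicsContext* gc)
        {
            osgViewer::GraphicsWindowX11* win =
                dynamic_cast<osgViewer::GraphicsWindowX11*>(gc);
            if (!win) return;

            PFNGLXJOINSWAPGROUPNVPROC joinSwapGroup =
                (PFNGLXJOINSWAPGROUPNVPROC)glXGetProcAddress(
                    (const GLubyte*)"glXJoinSwapGroupNV");
            PFNGLXBINDSWAPBARRIERNVPROC bindSwapBarrier =
                (PFNGLXBINDSWAPBARRIERNVPROC)glXGetProcAddress(
                    (const GLubyte*)"glXBindSwapBarrierNV");
            if (!joinSwapGroup || !bindSwapBarrier) return; // extension missing

            // all windows that must swap together join group 1; group 1 is
            // synchronized across machines on barrier 1 (placeholder ids)
            joinSwapGroup(win->getDisplay(), win->getWindow(), 1);
            bindSwapBarrier(win->getDisplay(), 1, 1);
        }
    };

    // usage, before viewer.realize():
    //   viewer.setRealizeOperation(new JoinSwapGroup);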
Re: [osg-users] RTT slave views and multi-threading
Hi Robert, Wojciech,

my initial guess was that the lengthy draw dispatch of the master view and the failing cull/draw parallelism were results of the same problem. However, they actually seem to be different problems, and I'll focus first on the draw dispatch.

The master camera draws only a screen-aligned quad and nothing else (the scene with shadows is rendered by the slave camera). There is also no dynamic geometry. But I do indeed have a read buffer operation: a glGetTexImage call in the post-draw callback of the master camera. This call takes ~12 ms. I read back a small texture that is rendered by a camera in the current frame. The camera uses FRAME_BUFFER_OBJECT as its render target implementation.

It looks like using glReadPixels to read directly from the FBO is the advised method for getting data back to system memory. How do I get the FBO that the camera is rendering to? Or is there a better method to get the texture data back to system memory?

cheers,
tugkan

> Hi Tugkan,
>
> Robert mentioned a lengthy read operation. It may be related to the read buffer operation that's used to compute the shadow volume in LightSpacePerspectiveShadowMapDB. If your slave view uses osgShadow::LightSpacePerspectiveShadowMapDB then you may check whether osgShadow::LightSpacePerspectiveShadowMapCB (the cull-bounds flavour) has the same problem.
>
> I am aware of the LightSpacePerspectiveShadowMapDB glReadBuffer limitation but I could not find a quick and easy-to-implement workaround that would do this without scanning the image on the CPU. I allocate a small 64x64 texture and render the scene there, then read it into CPU memory and use the CPU to scan the pixels to optimize the shadow volume from the depths and pixel locations stored in this pre-render image.
>
> Wojtek
>
> ----- Original Message -----
> From: Robert Osfield robert.osfi...@gmail.com
> To: OpenSceneGraph Users osg-users@lists.openscenegraph.org
> Sent: Wednesday, January 13, 2010 1:04 PM
> Subject: Re: [osg-users] RTT slave views and multi-threading
>
> Hi Tugkan,
>
> The osgdistortion example works a bit like what you are describing; could you try this to see what performance it gets?
>
> As for general notes about threading: if you are working on a single graphics context, then all the draw dispatch and the draw GPU work can only be done by a single graphics thread, so there is little opportunity to make it more parallel without using another graphics card/graphics context and interleaving of frames.
>
> As for why the second camera is very expensive on draw dispatch, this suggests to me that it's blocking, either due to the OpenGL fifo being full or because it contains a GL read-back operation of some kind.
>
> Robert.
>
> On Wed, Jan 13, 2010 at 11:34 AM, Tugkan Calapoglu tug...@vires.com wrote:
>> Hi All,
>>
>> I am using a slave view for rendering the scene to a texture. Initially I tried with a camera node; however, this did not work well due to a problem in LiSPSM shadows and I was advised to use RTT slave views.
>>
>> My setup is as follows: there is a single main view and I attach a slave view to it. The slave view is attached with addSlave( slave, false ); so that it does *not* automatically use the master scene. I attach a texture to the slave view and make my scene a child of this view. I attach a screen-aligned quad to the main view; this quad visualizes the RTT texture from the slave view.
>>
>> Now I have a threading problem, which can be seen on the snapshot I attached. There are two issues:
>>
>> 1- The main view (cam1) has a very large draw time even though it only renders the screen-aligned quad. I double checked whether it also renders the actual scene, but this is not the case.
>>
>> 2- The slave view does not run cull and draw in parallel. Cull and draw do run in parallel if they are not rendered with the slave view. Moreover, if I change the render order of the slave camera from PRE_RENDER to POST_RENDER it is ok. I could simply use POST_RENDER, but I am afraid it introduces an extra frame of latency: if I render the screen-aligned quad first and the scene later, then what I see on the quad is the texture from the previous frame (right?).
>>
>> Any ideas?
>>
>> cheers,
>> tugkan
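As later replies in this thread point out, the simplest readback path is to attach an osg::Image to the RTT camera; OSG then reads the color buffer back from the FBO into the image during the camera's draw, while the FBO is still bound. A minimal sketch (the 4x4 size matches the small texture discussed later; 'rttCamera' is the FBO camera):

    osg::ref_ptr<osg::Image> image = new osg::Image;
    image->allocateImage(4, 4, 1, GL_RGBA, GL_UNSIGNED_BYTE); // tiny readback target

    // OSG issues the readback into 'image' as part of the camera's draw
    rttCamera->attach(osg::Camera::COLOR_BUFFER, image.get());

    // after the draw traversal of each frame, image->data() holds the pixels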
Re: [osg-users] RTT slave views and multi-threading
hi Jp,

unfortunately that method is easy but very slow. I think it also uses glGetTexImage.

cheers,
tugkan

> Hi Tugkan,
>
> Simplest is to just attach an osg::Image to the RTT (FBO) camera. See the attach method of osg::Camera; I think there is an example in osgprerender. Also see here:
>
> http://thread.gmane.org/gmane.comp.graphics.openscenegraph.user/52651
> and
> http://thread.gmane.org/gmane.comp.graphics.openscenegraph.user/53432
>
> rgds
> jp
>
> [...]
Re: [osg-users] RTT slave views and multi-threading
Hi Robert,

I am working on an HDR implementation which should work on multiple channels. The method I use requires the average luminance of the scene. If I use different average luminances for different channels, the colors will simply not match. E.g. in a tunnel, the front channel will see the tunnel exit and have a higher average luminance than the side channels, which only see the dark tunnel walls. So I do need a way to collect the current average luminances of all channels and compute a single average that can be used for all (by "channel" I mean separate computers that are connected to separate projectors).

I know that getting data back from the GPU is slow, but 12 ms for a 4x4 texture seems extreme. glReadPixels seems to be faster, because we are able to make full screen grabs (800x600) and still keep 60 Hz (even without PBOs). Some GPGPU people suggest using glReadPixels to read directly from an FBO rather than glGetTexImage, so I was wondering if there is a way to obtain the osg::FBO pointer from the camera?

cheers,
tugkan

> Hi Tugkan,
>
> On Thu, Jan 14, 2010 at 12:00 PM, Tugkan Calapoglu tug...@vires.com wrote:
>> unfortunately that method is easy but very slow. I think it also uses glGetTexImage.
>
> An operation like glReadPixels or glGetTexImage involves the fifo being flushed and the data copied back into main memory. These two things together make it slow and there isn't much you can do about it directly.
>
> The best way to deal with the high cost of these operations is to avoid them completely. Try to use algorithms that render to texture using FBOs and read these textures directly in other shaders. Never try to copy the results back to the CPU/main memory. This does force you to do more work on the GPU and rely on more complex shaders, but in the end it means that you don't have to force a round trip to the GPU.
>
> Robert.
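To the specific question of getting at the camera's FBO: osg::Camera has no public accessor for it, but it can be dug out of the viewer's render stage. A sketch, with the caveat that it leans on osgViewer internals and is therefore fragile across OSG versions (and with double-buffered SceneViews there may be more than one stage):

    #include <osgViewer/Renderer>
    #include <osgUtil/SceneView>

    osg::FrameBufferObject* getCameraFBO(osg::Camera* camera)
    {
        osgViewer::Renderer* renderer =
            dynamic_cast<osgViewer::Renderer*>(camera->getRenderer());
        if (!renderer) return 0;

        osgUtil::SceneView* sceneView = renderer->getSceneView(0);
        return sceneView ? sceneView->getRenderStage()->getFrameBufferObject() : 0;
    }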
Re: [osg-users] RTT slave views and multi-threading
Hi Jp,

my initial implementation used an osg::Image attached to a camera and it was just as slow. I will see what I can do with PBOs.

regards,
tugkan

> Hi,
>
> You might be surprised. Have you read the threads I linked to? Attach uses glReadPixels (while doing the FBO rendering, so you don't have to bind anything yourself later) and in many cases this is the fastest. If you want something more elaborate, such as async PBO use, see the osgscreencapture example. Also, test whatever you use for your setup; all sorts of things can change the efficiency of reading data back to the CPU. YMMV.
>
> Like Robert said though, not reading anything back to the CPU if you can help it is the best.
>
> rgds
> jp
>
> [...]
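A sketch of the asynchronous PBO readback under discussion: two pixel buffer objects used in ping-pong from a final draw callback, so mapping last frame's buffer no longer stalls the pipeline. This is my own illustration, not code from the thread; it uses raw GL (assuming the buffer-object entry points are resolved, e.g. via GLEW) and trades one frame of latency for the stall, as Robert suggests in the next message.

    #include <GL/glew.h>   // assumption: GLEW provides the PBO entry points
    #include <osg/Camera>

    struct AsyncReadback : public osg::Camera::DrawCallback
    {
        mutable GLuint pbos[2];
        mutable unsigned int frame;
        int width, height;

        AsyncReadback(int w, int h) : frame(0), width(w), height(h)
        { pbos[0] = pbos[1] = 0; }

        virtual void operator()(osg::RenderInfo&) const
        {
            const GLsizeiptr bytes = GLsizeiptr(width) * height * 4;
            if (pbos[0] == 0)
            {
                glGenBuffers(2, pbos);
                for (int i = 0; i < 2; ++i)
                {
                    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbos[i]);
                    glBufferData(GL_PIXEL_PACK_BUFFER, bytes, 0, GL_STREAM_READ);
                }
            }

            const unsigned int writeIdx = frame % 2;
            const unsigned int readIdx  = (frame + 1) % 2;

            // start this frame's transfer; returns immediately because the
            // target is a PBO, not client memory
            glBindBuffer(GL_PIXEL_PACK_BUFFER, pbos[writeIdx]);
            glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, 0);

            // map last frame's buffer, which should have finished by now
            if (frame > 0)
            {
                glBindBuffer(GL_PIXEL_PACK_BUFFER, pbos[readIdx]);
                if (void* data = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY))
                {
                    // ... consume 'data', e.g. accumulate average luminance ...
                    glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
                }
            }

            glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
            ++frame;
        }
    };

    // usage: rttCamera->setFinalDrawCallback(new AsyncReadback(4, 4));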
Re: [osg-users] RTT slave views and multi-threading
Hi Robert,

yes, one frame of latency is OK. Is there an example of the PBO usage? osgscreencapture seems to be about getting the data from the frame buffer, not from an RTT texture.

tugkan

> Hi Tugkan,
>
> On Thu, Jan 14, 2010 at 12:31 PM, Tugkan Calapoglu tug...@vires.com wrote:
>> I know that getting data back from the GPU is slow, but 12 ms for a 4x4 texture seems extreme.
>
> It's the flushing of the fifo that is the problem; that's why it's so slow, not the data transfer itself. Once you flush the fifo you lose the parallelism between the CPU and GPU.
>
> The only way to hide this is to use PBOs to do the read back, and to do the actual read back on the next frame rather than in the current frame. In your case you might be able to get away with this: a frame's latency might not be a big issue if you can keep a solid 60 Hz and the values you are reading back aren't changing drastically between frames.
>
> Robert.
Re: [osg-users] RTT slave views and multi-threading
Hi Robert,

> The osgdistortion example works a bit like what you are describing; could you try this to see what performance it gets?

osgdistortion's threading model is set to SingleThreaded in the code. I changed it to DrawThreadPerContext and now I can see that draw starts after cull, i.e. they do not run in parallel.

> As for general notes about threading: if you are working on a single graphics context, then all the draw dispatch and the draw GPU work can only be done by a single graphics thread, so there is little opportunity to make it more parallel without using another graphics card/graphics context and interleaving of frames.

Sure. I do not expect two cameras to render in parallel onto a single window, but cull and draw of a given camera should run in parallel. Indeed they normally do, with the exact same scene and application. It breaks only if the second camera (the slave) has the PRE_RENDER render order.

tugkan

> As for why the second camera is very expensive on draw dispatch, this suggests to me that it's blocking, either due to the OpenGL fifo being full or because it contains a GL read-back operation of some kind.
>
> Robert.
>
> [...]
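For reference, the two render orders being compared (illustrative snippet; 'slaveCamera' is the RTT slave's camera):

    // PRE_RENDER: the RTT scene is drawn before the master camera, so the
    // quad shows this frame's texture -- but, as observed above, it can
    // serialize cull and draw.
    slaveCamera->setRenderOrder(osg::Camera::PRE_RENDER);

    // POST_RENDER: drawn after the master camera; cull and draw overlap
    // again, at the cost of the quad showing the previous frame's texture.
    // slaveCamera->setRenderOrder(osg::Camera::POST_RENDER);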
Re: [osg-users] Render to texture and Shadows
Hi,

thanks for the suggestion; with RTT slave views it works.

tugkan

> Hi Tugkan,
>
> Somebody recently also mentioned that nested cams do not work with LiSPSM. Frankly, I have no time to investigate this. I would recommend using RTT slave views instead of a nested RTT cam; I think this will have more chance of working as intended. I apologize for the trouble.
>
> Wojtek Lewandowski
>
> ----- Original Message -----
> From: Tugkan Calapoglu tug...@vires.com
> To: OpenSceneGraph Users osg-users@lists.openscenegraph.org
> Sent: Tuesday, November 03, 2009 9:10 AM
> Subject: [osg-users] Render to texture and Shadows
>
> [...]
[osg-users] Render to texture and Shadows
Hi All,

I am using Light Space Perspective Shadow Maps for shadowing. I would like to render the shadowed scene to a texture, so I have another camera on top of the shadowed scene. It looks like the following:

root
 |
RTTCamera
 |
ShadowedScene

I also render a small screen-aligned quad which visualizes the texture that RTTCamera renders to.

When there is no RTT, shadows work as expected. However, when I am rendering onto the texture there are no shadows: I can see the scene on the screen-aligned quad, but it simply has no shadows.

When I debugged the application I found out that StandardShadowMap::ViewData::selectLight does not find the light. The following line should normally give a list of matrix/attribute pairs:

rs->getPositionalStateContainer()->getAttrMatrixList();

however, it returns an empty list. I am not familiar with the implementation of shadows in OSG, so I got stuck here. Does anybody have an idea?

Thanks,
Tugkan
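A sketch of the nested-camera layout described above (the variant that fails with LiSPSM; 'texture', 'sceneRoot' and 'createTexturedQuad' are hypothetical names):

    #include <osg/Camera>
    #include <osgShadow/ShadowedScene>
    #include <osgShadow/LightSpacePerspectiveShadowMap>

    osg::ref_ptr<osg::Group> root = new osg::Group;

    osg::ref_ptr<osgShadow::ShadowedScene> shadowedScene =
        new osgShadow::ShadowedScene;
    shadowedScene->setShadowTechnique(
        new osgShadow::LightSpacePerspectiveShadowMapDB);
    shadowedScene->addChild(sceneRoot.get());

    osg::ref_ptr<osg::Camera> rttCamera = new osg::Camera;
    rttCamera->setRenderTargetImplementation(osg::Camera::FRAME_BUFFER_OBJECT);
    rttCamera->setRenderOrder(osg::Camera::PRE_RENDER);
    rttCamera->attach(osg::Camera::COLOR_BUFFER, texture.get());
    rttCamera->addChild(shadowedScene.get());

    root->addChild(rttCamera.get());
    root->addChild(createTexturedQuad(texture.get())); // hypothetical helper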
Re: [osg-users] sRGB frame buffer
I found out that one can simply call

stateset->setMode( GL_FRAMEBUFFER_SRGB_EXT, osg::StateAttribute::ON );

to enable sRGB mode; explicit OSG support is not necessary. I haven't tried anything like this with sRGB texture formats yet.

tugkan

> Tugkan Calapoglu wrote:
>> Hi, I couldn't find sRGB-related GLX tokens in the source code, so it looks like sRGB color space is not supported by OSG at the moment. Am I right or did I miss something?
>
> I don't have any experience with sRGB framebuffers (apart from hearing about them at SIGGRAPH one year), but I agree that it looks like OSG doesn't support them yet.
>
> --J
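A minimal sketch of that approach (my own illustration): enabling sRGB conversion for the whole scene via the root StateSet. The define is just a guard for older GL headers, and this assumes the default framebuffer is actually sRGB-capable.

    #include <osg/StateSet>

    #ifndef GL_FRAMEBUFFER_SRGB_EXT
    #define GL_FRAMEBUFFER_SRGB_EXT 0x8DB9
    #endif

    osg::StateSet* ss = root->getOrCreateStateSet();
    ss->setMode(GL_FRAMEBUFFER_SRGB_EXT, osg::StateAttribute::ON);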
[osg-users] sRGB frame buffer
Hi,

I couldn't find sRGB-related GLX tokens in the source code, so it looks like sRGB color space is not supported by OSG at the moment. Am I right or did I miss something? Does anybody here have experience with sRGB frame buffers under Linux with NVidia hardware?

tugkan
[osg-users] Using GLObjectsVisitor with DrawThreadPerContext
Hi,

in our application we create new objects at runtime. Most of the time newly created objects are not in the frustum at the beginning, hence they do not cause a stutter. But when an object comes into the frustum, the texture download and display list creation result in stutters. It is acceptable for us to have a stutter when the object is created, so whenever we create an object we use GLObjectsVisitor to compile its GL objects. We do this in a FinalDrawCallback.

The problem we are facing is that this results in crashes. The crash is very rare: sometimes more than an hour passes with hundreds of objects created and destroyed. There are some computers, however, where the problem is far more frequent. What is different about these computers is that they have Quadro cards and more powerful CPUs.

Our application uses DrawThreadPerContext and all test computers have at least two CPUs. When we run in SingleThreaded mode we do not see any problems.

So the question is: is it safe to use GLObjectsVisitor in a FinalDrawCallback? If not, how should it be used?

General info:
OS: Suse 10.3
CPU: several different CPUs (at least two per machine)
Graphics: several different NVidia cards
OSG: 2.9.5 (SVN revision 10258)

Tugkan
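A hedged reconstruction of the pattern being described, to make the question concrete (names are illustrative; the hand-off of 'pending' between threads is exactly the suspect part). As an aside, later OSG versions offer osgUtil::IncrementalCompileOperation for this job, which may be worth a look.

    #include <osg/Camera>
    #include <osgUtil/GLObjectsVisitor>

    struct CompileNewObjects : public osg::Camera::DrawCallback
    {
        // set from the update thread when a new subgraph has been added
        osg::ref_ptr<osg::Node> pending;

        virtual void operator()(osg::RenderInfo& renderInfo) const
        {
            if (!pending) return;

            osgUtil::GLObjectsVisitor compiler(
                osgUtil::GLObjectsVisitor::COMPILE_DISPLAY_LISTS |
                osgUtil::GLObjectsVisitor::COMPILE_STATE_ATTRIBUTES);
            compiler.setState(renderInfo.getState()); // needs the draw thread's State

            pending->accept(compiler);
            // clearing 'pending' and synchronizing with the update thread is
            // omitted -- that synchronization is precisely what is at issue
        }
    };

    // usage: camera->setFinalDrawCallback(new CompileNewObjects);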
Re: [osg-users] OpenSceneGraph-2.8.1 release candidate four tagged
Hi,

The following method is from the ive plugin, ReaderWriterIVE.cpp. Is the "#if 1" forgotten here? In some other methods ive::Exception is caught within the plugin; in this method, however, the exception goes out of the plugin and has to be handled by the application.

virtual ReadResult readNode(std::istream& fin, const Options* options) const
{
#if 1
    ive::DataInputStream in(fin, options);
    return in.readNode();
#else
    try {
        // Create datainputstream.
        ive::DataInputStream in(fin, options);
        return in.readNode();
    }
    catch(ive::Exception e)
    {
        return e.getError();
    }
#endif
}

tugkan

> Hi All,
>
> After a long delay - health, a submissions backlog and other work all got in the way of a quick turnaround on rc4 - I've finally got around to tagging OpenSceneGraph-2.8.1-rc4. You can download it from:
>
> http://www.openscenegraph.org/projects/osg/wiki/Downloads
>
> Please test. If things go smoothly with testing today and over the weekend I'll go for the final 2.8.1 release early next week. Please post your successes/failures to this thread. Thanks in advance for your help with testing on as many machine and OS combinations as you can lay your hands on.
>
> Robert.
>
> --
> Change Log since 2.8.0
>
> 2009-05-14 17:26 robert * NEWS.txt, README.txt: Updated dates
>
> 2009-05-14 17:25 robert * AUTHORS.txt, ChangeLog: Updated ChangeLog and AUTHOR file
>
> 2009-05-14 17:18 robert * applications/osgversion/CMakeLists.txt, applications/osgversion/Contributors.cpp, applications/osgversion/osgversion.cpp: Moved the Contributors generation code out into a separate source file that is only compiled when OSG_MAINTAINER is enabled via ccmake. This has been done to prevent build issues on some machines with compilers that choose a different locale to the one that the contributors' names are encoded in.
>
> 2009-05-13 08:35 robert * src/OpenThreads/win32, src/OpenThreads/win32/Win32Thread.cpp: From Thibault Genessay: On Windows, when a process tries to spawn one too many threads, _beginthreadex() fails but OpenThreads still waits on the startup Block before returning to the caller of OpenThreads::Thread::start(). This causes a deadlock. The return value of _beginthreadex() is actually checked, but after the call to OpenThreads::Block::block(), so it is basically useless. Attached is a fix to move the check for the return value of _beginthreadex() before the call to block(), so that start() can return to the caller with a non-zero error code. This solves the problem for me. Merged from svn trunk using: svn merge -r 10190:10191 http://www.openscenegraph.org/svn/osg/OpenSceneGraph/trunk/src/OpenThreads/win32
>
> 2009-05-12 11:15 robert * ChangeLog, NEWS.txt, README.txt: Updated ChangeLog and NEWS/README for release candidate
>
> 2009-05-12 11:12 robert * src/osgWrappers/osgUtil/IntersectVisitor.cpp: Updated wrappers
>
> 2009-05-12 10:49 robert * src/osgWidget, src/osgWidget/WindowManager.cpp: From Fajran Iman Rusadi: fixed the handling of widget indices in insert and remove methods. Merged from svn/trunk using: svn merge -r 10181:10182 http://www.openscenegraph.org/svn/osg/OpenSceneGraph/trunk/src/osgWidget
>
> 2009-05-08 12:38 robert * src/osgViewer/GraphicsWindowWin32.cpp: Added initializer for _mouseCursor
>
> 2009-05-08 07:49 robert * src/osgViewer/GraphicsWindowWin32.cpp: From Neil Groves, fixed uninitialized variable
>
> 2009-05-07 15:59 robert * CMakeLists.txt: Updated release candidate to 4.
>
> 2009-05-07 15:14 robert * src/osgViewer/GraphicsWindowWin32.cpp: From Frederic Bouvier, workaround for setCursor problems under Windows.
>
> 2009-05-07 14:58 robert * src/osg/GraphicsContext.cpp: Fixed ABSOLUTE_RF slave camera resize policy, merged from svn trunk using: http://www.openscenegraph.org/svn/osg/OpenSceneGraph/trunk/src/
>
> 2009-05-07 13:24 robert * src/osgDB, src/osgDB/CMakeLists.txt, src/osgDB/DatabasePager.cpp, src/osgDB/Registry.cpp: From Stephan Huber: while debugging a problem in finding plugins on OS X I discovered that the conditional directives for setting the prepend-string in createLibraryNameForExtension were not in effect, because of the mixture of different #ifdef styles. I removed the conditional part for __APPLE__ completely to be more standard-conform with other platforms (plugins should be located in osgPlugins-X.X.X/). Because of the wrong syntax of the conditional compile the old code was not
Re: [osg-users] Performance comparison with older OSG versions
Hi Robert,

I changed the cmake files to ignore the automatic selection and choose _OPENTHREADS_ATOMIC_USE_GCC_BUILTINS. After a recompile I do not see a difference in performance. I don't have any experience with GCC builtins, so I don't know if this can be normal.

In my tests I observed that, for this model and camera position, enabling/disabling thread-safe ref/unref makes around a 20% performance difference. I thought the GCC builtins should reduce this.

Tugkan

Robert Osfield wrote:
> [...]
Re: [osg-users] Performance comparison with older OSG versions
Hi Robert,

> Hi Tugkan,
>
> I have certainly measured a performance improvement with the move to using builtins, but it was only a couple of percent. Your finding of a 20% difference when using thread-safe ref/unref is way beyond my own findings, so perhaps the models you're testing have a different composition than the ones I use for benchmarking. Does your scene have lots of separate StateSets or Drawables, or is it heavily loaded with Transforms? The other possibility is that your build just isn't coming together to provide efficient libs/binaries.

Yes, the scene has many state sets and drawables. We heavily optimize our scenes to reduce the number of state sets and drawables, but going beyond a certain point is not possible. If I remember correctly, you also have the city model we are using for benchmarking; I think we gave it to you last year.

> I asked, but didn't get an answer, what architecture and gcc you are building on. This is important as I wish to see what is particular about your build environment which might help show why the builtins aren't being detected, and why you're getting such bad performance.

gcc version 4.2.1, 32 bit
Suse 10.3
Intel Core2 2.66 GHz

tugkan

> Robert.
>
> On Wed, Dec 10, 2008 at 12:43 PM, Tugkan Calapoglu [EMAIL PROTECTED] wrote:
>> [...]
Re: [osg-users] Performance comparison with older OSG versions
Hi Robert,

> Hi Tugkan,
>
> My best guess would be that your StateSets and Drawables aren't taking advantage of STATIC DataVariance. If the DataVariance isn't STATIC then the DrawThreadPerContext and CullThreadPerCameraDrawThreadPerContext threading models won't provide any frame overlap and won't provide any improvement in performance.

I am loading an IVE file where I explicitly set STATIC data variance during file generation. I will double check this to be sure.

> Another thing to check up on is compiler optimization - make sure it's enabled.

I used a standard configure with cmake . -DCMAKE_BUILD_TYPE=Release .

> Finally, have a look at what type of ref counting has been implemented; ideally cmake should have chosen atomic ref counting. Have a look at the include/OpenThreads/Config file for what type of ref counting has been selected. It does sound like you might be stuck using mutex-based ref counting.

include/OpenThreads/Config has _OPENTHREADS_ATOMIC_USE_MUTEX instead of _OPENTHREADS_ATOMIC_USE_GCC_BUILTINS.

I tried to debug this. Here are my observations:

1- In CheckAtomicOps.cmake some small programs are compiled and run to see whether the architecture supports GCC_BUILTINS. I copied and pasted the one for GCC_BUILTINS to a file and tried to compile it. If I compile with "g++ test.cpp" I get the following errors:

test.cpp:(.text+0x43): undefined reference to `__sync_bool_compare_and_swap_4'
test.cpp:(.text+0x6e): undefined reference to `__sync_bool_compare_and_swap_4'

If I explicitly define an architecture, as in "g++ -march=i686 test.cpp", then the compile succeeds and the resulting program runs and returns successfully.

2- I changed CheckAtomicOps.cmake so that the results of the test programs are completely ignored and SET(_OPENTHREADS_ATOMIC_USE_GCC_BUILTINS 1) is called in any case. This successfully changes include/OpenThreads/Config to use _OPENTHREADS_ATOMIC_USE_GCC_BUILTINS. However, when I try to compile OSG I get linker errors similar to the above ones.

3- It can be seen in CMakeFiles/CMakeSystem.cmake that the architecture is correctly recognized as i686 ( the SET(CMAKE_SYSTEM_PROCESSOR i686) line is present ).

4- I saw on a different Suse 10.3 computer that it also has _OPENTHREADS_ATOMIC_USE_MUTEX.

I don't have much experience with the cmake system, so I don't know how I should continue with debugging. Can you give me some pointers?

Tugkan

> Robert.
>
> On Thu, Dec 4, 2008 at 12:13 PM, Tugkan Calapoglu [EMAIL PROTECTED] wrote:
>> Hi All,
>>
>> I have been working on performance issues, because after we ported our engine to OSG 2.7.6 we lost some performance. In my tests I am using a complex town database which is pretty representative of our applications. I use osgviewer for the tests. I tested with several versions of OSG starting from 0.99 and I observe a gradual degradation of performance from 0.99 to 2.7.6.
>>
>> Most notably, the frame rate I get in 2.7.6 with CullThreadPerCameraDrawThreadPerContext is no better than the one I get from OSG 0.99. Other threading modes give even worse results. It is expected that thread synchronization adds some overhead, but I was hoping to get overall better performance from the new multithreading models.
>>
>> Here are some of my observations:
>>
>> - From SVN revision 7327 to 7328 a significant loss of cull performance is observed. I found out that this is due to the addition of #define OSGUTIL_RENDERBACKEND_USE_REF_PTR. I made some tests and measured the following for the SingleThreaded case:
>>
>> in 7327 cull time is ~1.15 ms
>> in 7328 cull time is ~1.65 ms
>> in 7328 with thread-safe ref/unref turned off cull time is ~1.35 ms
>>
>> - Cull time gets even worse from 7328 onward. In the latest SVN version I get 1.95 ms in SingleThreaded mode. I couldn't find yet where the performance loss occurs.
>>
>> - A similar performance loss is observable in draw times, but I didn't have time yet to look closer at it.
>>
>> Did anybody else compare old OSG versions to newer ones? What were your results?
>>
>> Tugkan
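For reference, the CheckAtomicOps.cmake probe for the GCC builtins looks roughly like the following (a reconstruction of the pattern, not the exact file). On 32-bit x86 it only links when gcc targets an instruction set with cmpxchg, e.g. via -march=i686, which matches the behaviour observed above:

    /* roughly the kind of probe CMake compiles and runs; if it links and
       returns 0, _OPENTHREADS_ATOMIC_USE_GCC_BUILTINS is selected */
    #include <cstdlib>

    int main()
    {
        unsigned int value = 0;
        __sync_add_and_fetch(&value, 1);   /* atomic increment */
        __sync_sub_and_fetch(&value, 1);   /* atomic decrement */
        if (!__sync_bool_compare_and_swap(&value, 0, 1)) return EXIT_FAILURE;
        return EXIT_SUCCESS;
    }

    /* build check: "g++ test.cpp" fails to link on plain i386, while
       "g++ -march=i686 test.cpp" succeeds */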
Re: [osg-users] Performance comparison with older OSG versions
Hi Robert, Check the CMAKE_CXX_FLAGS_RELEASE, it should read something like -O3, CMAKE_CXX_FLAGS_RELEASE is set to -O3 -NDEBUG so thats ok I think. Potentially we could make the use of ref_ptr in the rendering back optional offering the faster but less robust using C pointers - perhaps a cmake build option, but it'll complicate the code a bit and require a little know knowledge from the application developer about the restrictions that such a change introduces. In my comparison tests I observed that using ref_ptr even though thread safe ref/unref is *turned off* results in a significant increase in cull time. With printf's I ensured that the mutex code was not called in ref() and unref(). Here are the results (removed printfs before measurement :) ): No ref_ptr : ~1.15 ms ref_ptr but thread safety off : ~1.35 ms ref_ptr with thread safety on : ~1.65 ms I was focused on cull so I didn't write down what happens with draw. I made these tests with SVN revisions 7327 and 7328 so things may be different now. But using pointers instead of ref_ptr seems to be better for performance. What kind of restrictions would using c pointers require? I'll have to defer to Mathias Froehlich on how best to detect the support for atomic ref counting as he's the author of the detection code. Ok, thanks. What gcc version do you have on your system? Are all your systems 32 or 64bit? gcc version 4.2.1 , 32 bit Suse 10.3 thanks tugkan HI Tugkan, I used standard configure with cmake . -DCMAKE_BUILD_TYPE=Release . Check the CMAKE_CXX_FLAGS_RELEASE, it should read something like -O3, I mention this as I had found that CMake 2.6.0 pulled down from the Ubuntu 8.10 repositories erroneously left this field blank. include/OpenThreads/Config has _OPENTHREADS_ATOMIC_USE_MUTEX instead of _OPENTHREADS_ATOMIC_USE_GCC_BUILTINS This is likely to be a large part of why you aren't getting the performance that isn't as good as it should be. When I did introduce the ref_ptr into the rendering backend I this was when we still just had mutex based ref_ptr and while there was a small performance hit, was at most only a couple of percent with vsync off. The change was introduced for robustness reasons - as the code required to prevent crashes when users deleted objects during the DrawThreadPerContext/CullThreadPerCameraDrawThreadPerContext modes with rather awkward and prone to bugs. Potentially we could make the use of ref_ptr in the rendering back optional offering the faster but less robust using C pointers - perhaps a cmake build option, but it'll complicate the code a bit and require a little know knowledge from the application developer about the restrictions that such a change introduces. I tried to debug this. Here are my observations: 1- In CheckAtomicOps.cmake some small programs are compiled and run to see whether the architecture supports GCC_BUILTINS. I copied and pasted the one for GCC_BUILTINS to a file and tried to compile it. If I compile with g++ test.cpp I get following errors: test.cpp:(.text+0x43): undefined reference to `__sync_bool_compare_and_swap_4' test.cpp:(.text+0x6e): undefined reference to `__sync_bool_compare_and_swap_4' If I explicitly define an architecture as follows, g++ -march=i686 test.cpp, then compile succeeds and the resulting program runs and returns successfully. 2- I changed the CheckAtomicOps.cmake so that the results of the test programs are completely ignored and SET(_OPENTHREADS_ATOMIC_USE_GCC_BUILTINS 1) is called in any case. 
This successfully changes include/OpenThreads/Config to use _OPENTHREADS_ATOMIC_USE_GCC_BUILTINS. However, when I try to compile OSG I get linker errors similar to the ones above. 3- It can be seen in CMakeFiles/CMakeSystem.cmake that the architecture is correctly recognized as i686 (the SET(CMAKE_SYSTEM_PROCESSOR i686) line is present). 4- I saw that a different Suse 10.3 computer also has _OPENTHREADS_ATOMIC_USE_MUTEX. I don't have much experience with the CMake system, so I don't know how I should continue debugging. Can you give me some pointers? I'll have to defer to Mathias Froehlich on how best to detect the support for atomic ref counting as he's the author of the detection code. What gcc version do you have on your system? Are all your systems 32 or 64bit? Robert. ___ osg-users mailing list osg-users@lists.openscenegraph.org http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org
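[Editor's note: for reference, here is a small self-contained program along the lines of what the CMake check compiles and runs - a reconstruction from the description above, not the verbatim CheckAtomicOps.cmake source. On 32-bit x86 it links only when an architecture of i486 or later is selected (e.g. g++ -march=i686 test.cpp), which matches the linker errors shown.]

    // test.cpp - probes for GCC atomic builtins, as CheckAtomicOps.cmake does.
    int main()
    {
        volatile long value = 1;

        // Atomically increment, then try to compare-and-swap 2 back to 0.
        __sync_add_and_fetch(&value, 1);
        long expected = 2;
        if (!__sync_bool_compare_and_swap(&value, expected, 0))
            return 1; // builtins linked but behaved unexpectedly

        return 0; // success: atomic builtins usable on this target
    }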
Re: [osg-users] Performance comparison with older OSG versions
Robert Osfield wrote: Hi Tugkan, On Mon, Dec 8, 2008 at 11:04 AM, Tugkan Calapoglu [EMAIL PROTECTED] wrote: With printf's I ensured that the mutex code was not called in ref() and unref(). Here are the results (I removed the printfs before measuring :) ):

No ref_ptr                 : ~1.15 ms
ref_ptr, thread safety off : ~1.35 ms
ref_ptr, thread safety on  : ~1.65 ms

I was focused on cull, so I didn't write down what happens with draw. What model are you measuring for the above results? These seem like very short cull times, hardly worth optimizing, so I presume this is a very small test case. If the performance looks like it's going to break frame then worry about it; in that vein, could you try throwing a model at the system that breaks frame, or nearly breaks frame, due to cull? The model in question is a small-sized city. What worries me is, of course, not the 1.65 ms but the change from 1.15 to 1.65. We use this model for benchmarking, but we have delivered much larger databases to customers where a performance loss of this size would bring us well under 60 Hz. I made these tests with SVN revisions 7327 and 7328, so things may be different now. But using plain pointers instead of ref_ptr seems to be better for performance. What kind of restrictions would using C pointers introduce? The danger in using C pointers comes when you are running the DrawThreadPerContext/CullThreadPerCameraDrawThreadPerContext threading models and dynamically removing StateSets and Drawables from the scene graph. These threading models allow the draw thread to overlap with the update, event and cull traversals of the next frame, so it's possible to modify the scene graph in a way that deletes objects that are still being drawn, which results in a crash. A follow-up problem can occur once you exit the frame loop, as you may delete the scene graph before the draw threads have completed the last draw traversals. Since the OSG is used in so many different types of applications we need to make sure the defaults are robust across a wide range of usage models, so in this case the ref_ptr in the rendering backend is essential. We intend to use CullThreadPerCameraDrawThreadPerContext or DrawThreadPerContext on multi-core machines. I guess for this type of application the ref_ptr version would be required anyway. Robert. ___ osg-users mailing list osg-users@lists.openscenegraph.org http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org
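[Editor's note: to make the hazard Robert describes concrete, a minimal sketch with a hypothetical helper name, not code from the thread. With one of the overlapping threading models active, removing a subgraph can drop the last scene-graph reference while the draw thread is still dispatching it; the ref_ptr rendering backend keeps the object alive until dispatch completes, whereas with raw C pointers the same removal could delete the Drawable mid-draw.]

    #include <osg/Group>
    #include <osg/Geode>

    // Hypothetical helper: remove a subgraph while DrawThreadPerContext may
    // still be drawing it. Safe with the ref_ptr rendering backend, because
    // the draw traversal holds its own reference to the Geode's drawables;
    // a crash waiting to happen if the backend held only raw C pointers.
    void removeSubgraph(osg::Group* parent, osg::Geode* geode)
    {
        parent->removeChild(geode); // last scene-graph ref gone; the draw
                                    // thread's reference keeps it alive
    }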
[osg-users] Performance comparison with older OSG versions
Hi All, I have been working on performance issues, because after we ported our engine to OSG 2.7.6 we lost some performance. In my tests I am using a complex town database which is pretty representative of our applications. I use osgviewer for the tests. I tested with several versions of OSG starting from 0.99, and I observe a gradual degradation of performance from 0.99 to 2.7.6. Most notably, the frame rate I get in 2.7.6 with CullThreadPerCameraDrawThreadPerContext is not any better than the one I get from OSG 0.99. Other threading modes give even worse results. It is expected that thread synchronization adds some overhead, but I was hoping to get overall better performance from the new multithreading models. Here are some of my observations: - From SVN revision 7327 to 7328 a significant loss of cull performance is observed. I found out that this is due to the addition of #define OSGUTIL_RENDERBACKEND_USE_REF_PTR. I ran some tests and measured the following for the SingleThreaded case: in 7327 the cull time is ~1.15 ms; in 7328 it is ~1.65 ms; in 7328 with thread-safe ref/unref turned off it is ~1.35 ms. - Cull time gets even worse from 7328 onward. In the latest SVN version I get 1.95 ms in SingleThreaded mode. I couldn't yet find where the performance loss occurs. - A similar performance loss is observable in draw times, but I haven't had time to look closer at it yet. Has anybody else compared old OSG versions to newer ones? What were your results? Tugkan ___ osg-users mailing list osg-users@lists.openscenegraph.org http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org
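[Editor's note: for anyone reproducing these comparisons, the threading model can also be forced from the application side so the same scene is timed under each mode. A minimal sketch - the "town.ive" path is a hypothetical stand-in for the database above:]

    #include <osgDB/ReadFile>
    #include <osgViewer/Viewer>

    int main()
    {
        osgViewer::Viewer viewer;
        // Pick the mode to benchmark; swap in SingleThreaded,
        // DrawThreadPerContext, etc. for the other measurements.
        viewer.setThreadingModel(
            osgViewer::Viewer::CullThreadPerCameraDrawThreadPerContext);
        viewer.setSceneData(osgDB::readNodeFile("town.ive")); // hypothetical database
        return viewer.run();
    }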
Re: [osg-users] Bug in viewer
Hi Robert, Hi Tugkan, I've just tried the same thing on my system and it worked fine. What window manager are you using? I use KDE 3.5.7 on Suse 10.3. I tested with Gnome (buggy) and FVWM (seems ok). I also made a test on a different computer with Suse 10.3 and the same KDE version. There it works. An important difference is that my machine (where the bug is observed) has a dual-monitor configuration. A couple of things you could try: Enable vsync - Didn't help. Disable any 3D desktop effects if they are on - They are off. Tugkan. Robert. On Tue, Nov 25, 2008 at 8:02 AM, Tugkan Calapoglu [EMAIL PROTECTED] wrote: Hi all, I observe a problem in my own application and also in osgviewer. There are two images attached to the email that show the problem. I use the following command:

osgviewer --window 0 0 1280 1024 cow.osg

After the 3D window appears I press 's' to turn on the statistics. The result is shot1.jpg. You can see that the frame rate and threading model are not visible. After I minimize and maximize the window, shot2.jpg appears. Sometimes osgviewer starts correctly, but most of the time it does not. Here are details regarding the test environment:

Suse 10.3
OpenGL renderer string: GeForce 7900 GTX/PCI/SSE2
OpenGL version string: 2.1.2 NVIDIA 173.13
OSG version: latest SVN version

Regards, Tugkan Calapoglu, VIRES Simulationstechnologie GmbH ___ osg-users mailing list osg-users@lists.openscenegraph.org http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org
Re: [osg-users] Bug in viewer
Hi Robert, This is not a Gnome-only problem. KDE and Gnome both have it. Only FVWM worked (I didn't try any other WM). It seems to be about the dual-monitor configuration, because a single-monitor computer (same OS and KDE) does not have the problem. Tugkan Hi Tugkan, On Wed, Nov 26, 2008 at 10:14 AM, Tugkan Calapoglu [EMAIL PROTECTED] wrote: I use KDE 3.5.7 on Suse 10.3. I tested with Gnome (buggy) and FVWM (seems ok). I also made a test on a different computer with Suse 10.3 and the same KDE version. There it works. An important difference is that my machine (where the bug is observed) has a dual-monitor configuration. My guess is that Gnome isn't sending the same X11 window events to osgViewer::GraphicsWindowX11, so it isn't repositioning things. Or it could just be a bug in Gnome. I'm not a Gnome user, nor a Gnome/X11 expert, so I'll have to defer to others with more knowledge/experience in this area as to what might be going amiss. Robert. ___ osg-users mailing list osg-users@lists.openscenegraph.org http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org
Re: [osg-users] Bug in viewer
Then everything looks ok. Actually, any kind of interaction with the window (for example moving it) seems to solve the problem. Hi, what happens when you make it fullscreen and then windowed again? jp Tugkan Calapoglu wrote: Hi Robert, This is not a Gnome-only problem. KDE and Gnome both have it. Only FVWM worked (I didn't try any other WM). It seems to be about the dual-monitor configuration, because a single-monitor computer (same OS and KDE) does not have the problem. Tugkan Hi Tugkan, On Wed, Nov 26, 2008 at 10:14 AM, Tugkan Calapoglu [EMAIL PROTECTED] wrote: I use KDE 3.5.7 on Suse 10.3. I tested with Gnome (buggy) and FVWM (seems ok). I also made a test on a different computer with Suse 10.3 and the same KDE version. There it works. An important difference is that my machine (where the bug is observed) has a dual-monitor configuration. My guess is that Gnome isn't sending the same X11 window events to osgViewer::GraphicsWindowX11, so it isn't repositioning things. Or it could just be a bug in Gnome. I'm not a Gnome user, nor a Gnome/X11 expert, so I'll have to defer to others with more knowledge/experience in this area as to what might be going amiss. Robert. ___ osg-users mailing list osg-users@lists.openscenegraph.org http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org
Re: [osg-users] Bug in viewer
Robert Osfield wrote: Hi Tugkan, On Wed, Nov 26, 2008 at 11:05 AM, Tugkan Calapoglu [EMAIL PROTECTED] wrote: This is not a Gnome-only problem. KDE and Gnome both have it. Only FVWM worked (I didn't try any other WM). Um.. just re-read your post and it kinda looked like Gnome had the buggy label attached to it. Could you try to be more precise as it's easy to get the wrong end of the stick. Hmm, as I remember I mentioned that I use KDE and also tested on Gnome and FVWM. Here is a list:

Computer  OS         WM     Monitors  Result
A         Suse 10.3  KDE    2         buggy
A         Suse 10.3  Gnome  2         buggy
A         Suse 10.3  FVWM   2         buggy
B         Suse 10.3  KDE    1         OK

Note: the difference between computers A and B is not only the monitor configuration. It seems to be about the dual-monitor configuration, because a single-monitor computer (same OS and KDE) does not have the problem. I've been using dual-monitor setups with KDE without problems, so I don't believe this is specific to dual monitors. My guess is it's a window manager/GraphicsWindowX11 issue of some kind. As I wrote above, the monitor configuration is not the only difference between the test computers, so I can't claim that it must be the problem. It only seemed a useful hint because the window initialization code might follow different paths for these configurations. Robert. ___ osg-users mailing list osg-users@lists.openscenegraph.org http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org
Re: [osg-users] scene graph access in multithreaded mode
Hi Robert, Hi Tugkan, On Wed, Sep 3, 2008 at 9:56 AM, Tugkan Calapoglu [EMAIL PROTECTED] wrote: I am porting an application from OSG 0.99 to the latest SVN version. Wow, climb aboard the time machine :-) I know :). We never had the time slot for the upgrade until now. I have implemented the changes you suggested and it seems to work, no random crashes anymore in CullThreadPerCameraDrawThreadPerContext mode. Thanks for the help, tugkan I have two questions (both for CullThreadPerCameraDrawThreadPerContext mode): 1- Is it safe to make changes to the scene graph outside the frame call (without using update etc. callbacks)? Or is a callback the only place where we have safe scene graph access? Mailing list and source code reading made me think that outside frame() should be safe, but I am getting some crashes. Before delving into my own code I'd like someone to confirm this. Modifying the scene graph outside of the frame call is safe in the SingleThreaded and CullDrawThreadPerContext threading models, as they don't leave any threads active after the end of the renderingTraversals() method (called from frame()). With DrawThreadPerContext and CullThreadPerCameraDrawThreadPerContext the draw threads will still be active on completion of renderingTraversals(), so if you modify drawables and state that the draw thread is still reading from, you will end up with problems - and potential crashes. There is a standard mechanism to deal with this issue - the renderingTraversals() method blocks until all dynamic objects in the draw traversals have been dispatched. The way you tell the draw traversal that a drawable or stateset will be modified dynamically is to set its data variance to DYNAMIC:

drawable->setDataVariance(osg::Object::DYNAMIC);
stateset->setDataVariance(osg::Object::DYNAMIC);

This is mentioned in the Quick Start Guide book, as well as many times on the osg-users mailing list, so have a look through the archives if you want more background reading. 2- Is it ok to change the global state set during rendering, outside the frame call? I have the following code running before frame() is called:

osg::Camera* cam = getViewer()->getView(i)->getCamera();
cam->getOrCreateStateSet()->clear();
cam->getOrCreateStateSet()->setGlobalDefaults();
// ... and some other changes ...

Same requirement - if you are modifying the stateset then you'll need to set its data variance to DYNAMIC. For a StateSet that decorates the whole scene graph you'll end up holding back the frame until the whole scene graph has been drawn, so it won't have any performance advantage over CullDrawThreadPerContext. You can double-buffer objects, allowing you to retain the STATIC data variance and keep the threads overlapping; to do this you do:

osg::Camera* cam = getViewer()->getView(i)->getCamera();
cam->setStateSet(new osg::StateSet); // this is where we just use a new StateSet rather than modify the previous one
cam->getOrCreateStateSet()->setGlobalDefaults();
// ... and some other changes ...

The draw traversal takes a reference to the StateSets and Drawables, so it's safe to go and remove them from the scene graph outside the frame() call; this isn't something that makes them dynamic, so you won't need to set their data variance to DYNAMIC. Robert.
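[Editor's note: to round this out, a minimal self-contained sketch of the double-buffering pattern Robert describes - an illustration with a hypothetical applyFrameState() helper, not code from the thread. A fresh StateSet is built each frame and swapped in, so the camera's state keeps its STATIC data variance and the draw thread keeps overlapping; the previous StateSet stays alive via the draw traversal's reference until it has been dispatched.]

    #include <osg/Camera>
    #include <osg/StateSet>
    #include <osg/ref_ptr>

    // Hypothetical per-frame helper: replace the camera's StateSet instead of
    // mutating it in place, keeping STATIC data variance and full thread overlap.
    void applyFrameState(osg::Camera* cam)
    {
        osg::ref_ptr<osg::StateSet> ss = new osg::StateSet; // new object every frame
        ss->setGlobalDefaults();
        // ... set this frame's modes/attributes on ss instead of the old StateSet ...
        cam->setStateSet(ss.get()); // old StateSet survives until the draw thread is done with it
    }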
___ osg-users mailing list osg-users@lists.openscenegraph.org http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org