[osg-users] Parallel draw on a multi-core, single GPU architecture
Hi all,

we have a performance problem where the draw thread seems to be the bottleneck. The target hardware has 8 cores and a single GPU. We already run cull and draw in parallel, but parallelizing draw itself would also help.

So I tried the following setup:

2 windows
2 cameras (one per window)
threading model: CullThreadPerCameraDrawThreadPerContext

In this mode I can see two cull threads and two draw threads, each assigned to a separate core. However, performance is half that of my original setup. Setting OSG_SERIALIZE_DRAW_DISPATCH to ON or OFF makes no difference.

Is there anything else I have to do to get good performance from the above setup (e.g. more environment variables to set)? Or should I try something completely different?

thanks,
tugkan
___
osg-users mailing list
osg-users@lists.openscenegraph.org
http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org
Re: [osg-users] Parallel draw on a multi-core, single GPU architecture
Hi Tugkan,

It only ever makes sense to parallelize draw dispatch when you have multiple graphics cards and one context per graphics card. Trying to add more threads than the OpenGL driver/graphics card can handle will just stall things and result in lower performance.

If your draw dispatch is overloaded then consider optimizing your scene graph. There are many ways of doing this; which one is appropriate will depend entirely on the needs of your scenes and your application. Since I don't know anything about these specifics I can't provide any direction.

Robert.

On Wed, Aug 18, 2010 at 9:14 AM, Tugkan Calapoglu tug...@vires.com wrote:
> Hi all, we have a performance problem where the draw thread seems to be
> the bottleneck. [...]
Re: [osg-users] Parallel draw on a multi-core, single GPU architecture
Hi Robert,

thanks for the answer. Our scene graph is heavily optimized and there don't seem to be any serious optimization possibilities left; maybe a few percent improvement here and there.

We observe that, as our databases grow, the performance requirement for the draw thread grows faster than for cull. In the long run, draw will be a serious limiting factor for us, so I wanted to see if we can take advantage of the hardware to solve the problem.

tugkan

--
Tugkan Calapoglu
VIRES Simulationstechnologie GmbH
Oberaustrasse 34
83026 Rosenheim
Germany
phone +49.8031.463641
fax +49.8031.463645
email tug...@vires.com
internet www.vires.com
Registered office: Rosenheim
Commercial register: Traunstein HRB 10410
Managing directors: Marius Dupuis Wunibald Karl
Re: [osg-users] Parallel draw on a multi-core, single GPU architecture
Hi Tugkan,

On Wed, Aug 18, 2010 at 9:49 AM, Tugkan Calapoglu tug...@vires.com wrote:
> We observe that, as our databases grow, the performance requirement for
> draw thread grows faster than cull. In the long run, draw will be a
> serious limiting factor for us. So I wanted to see if we can take
> advantage of hardware to solve the problem.

Does this not suggest that the scene graph is becoming less well balanced over time? I'd look closely at what in the draw dispatch is becoming the bottleneck and how to manage it more efficiently.

Typically the best way to reduce draw dispatch cost is to batch the geometry and state in a more coarse-grained way. One can also use different approaches to achieve the same or similar results. For instance, perhaps you can cull more effectively, or introduce LOD more effectively.

Robert.
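To make the coarse-grained batching idea concrete, here is a minimal sketch in plain C++. It does not use any OSG API: `Drawable`, `Batch` and `batchByState` are hypothetical names invented for illustration, and the model assumes that drawables with the same `stateId` also share a transform, so that their geometry can legally be merged.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for a drawable: a render-state identifier plus
// the amount of geometry it carries. Not an OSG type.
struct Drawable {
    int stateId;           // drawables with equal stateId share all render state
    std::size_t vertices;  // size of the geometry
};

// One merged batch per distinct state.
struct Batch {
    int stateId;
    std::size_t vertices;         // total vertices after merging
    std::size_t mergedDrawables;  // how many drawables were folded into this batch
};

// Merge all drawables that share a state into a single batch, so that
// dispatch issues one state change and one draw per state instead of
// one per drawable.
std::vector<Batch> batchByState(const std::vector<Drawable>& drawables) {
    std::map<int, Batch> byState;
    for (const Drawable& d : drawables) {
        assert(d.vertices > 0);  // empty drawables should be dropped earlier
        auto it = byState.find(d.stateId);
        if (it == byState.end()) {
            byState.emplace(d.stateId, Batch{d.stateId, d.vertices, 1});
        } else {
            it->second.vertices += d.vertices;
            it->second.mergedDrawables += 1;
        }
    }
    std::vector<Batch> batches;
    batches.reserve(byState.size());
    for (const auto& entry : byState) {
        batches.push_back(entry.second);  // ordered by ascending stateId
    }
    return batches;
}
```

Five drawables spread over three states collapse to three batches here. The vertex count sent to the card is unchanged; what shrinks is the number of separate dispatches the draw thread has to issue.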
Re: [osg-users] Parallel draw on a multi-core, single GPU architecture
Hi Robert,

our databases are for driving simulation and they are usually city streets. We have been employing batching and occlusion culling with very good results so far. Extensive use of LODs is problematic because:

1- They actually reduce performance in most cases
2- The switching effect (even the slightest) is found very disturbing by the customers.

So we are somewhat conservative with LODs (but we use them nevertheless).

Actually, draw has always required more time than cull. If you look at the OSG examples, draw usually takes more time than cull. Now that draw is reaching the 16 ms border, this has started to hurt us.

I think I should retract my statement from the previous email. Draw is not slowing down faster than cull; I did not really make such an observation. My actual observation is that both cull and draw require more CPU time with larger scenes, but since draw is generally more performance hungry, it becomes the bottleneck earlier.

tugkan
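The "16 ms border" can be spelled out with a small sketch. It assumes a 60 Hz target (the thread only gives the 16 ms figure) and a simplified model in which cull and draw run in parallel threads, so the slower stage sets the frame period. The function names are hypothetical and nothing here calls into OSG.

```cpp
#include <algorithm>
#include <cassert>

// Frame budget in milliseconds for a given refresh rate.
double frameBudgetMs(double refreshHz) {
    assert(refreshHz > 0.0);
    return 1000.0 / refreshHz;
}

// Simplified model: with cull and draw pipelined in separate threads,
// the frame period is bounded by the slower of the two stages.
double pipelinedFrameMs(double cullMs, double drawMs) {
    return std::max(cullMs, drawMs);
}

// True when the slower stage no longer fits into one refresh interval.
bool breaksFrame(double cullMs, double drawMs, double refreshHz) {
    return pipelinedFrameMs(cullMs, drawMs) > frameBudgetMs(refreshHz);
}
```

At 60 Hz the budget is 1000 / 60, roughly 16.67 ms. Under this model a cull of 8 ms with a draw of 16 ms still holds frame rate, while the same cull with a draw of 17 ms does not, regardless of how many cores sit idle.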
Re: [osg-users] Parallel draw on a multi-core, single GPU architecture
Hi Tugkan,

On Wed, Aug 18, 2010 at 10:39 AM, Tugkan Calapoglu tug...@vires.com wrote:
> Actually, draw always required more time than cull. If you look OSG
> examples, draw usually takes more time than cull. Now that draw reaches
> 16 ms border, this started to hurt us.

It is typical for cull to be shorter than draw dispatch; whichever traversal is breaking frame is the one you need to address. Threading draw dispatch is not going to make it faster, though; it will do more harm than good. Draw dispatch is really just dispatching data to the OpenGL fifo, and filling it from two threads will just cause lots of issues in the OpenGL driver, which has to manage two contexts and the card, as well as CPU cache coherency.

I must add that draw dispatch is a little complex to profile, as it depends upon being able to fill the OpenGL fifo, and if the fifo fills up it will block the draw dispatch. This effect can make the draw dispatch look like it's breaking frame when it's actually the OpenGL driver/graphics card that is the bottleneck.

> My actual observation is that, both cull and draw require more CPU time
> with larger scenes, but since draw is generally more performance hungry,
> it becomes bottleneck earlier.

This is something one would expect. Ideally you'd have LOD'ing to keep the large scenes well balanced. For instance, it's possible to LOD the whole earth and use paging and still get 60Hz, so while you might have terabytes of data, the app only has in memory what it needs, with the complexity kept relatively even throughout the database. A high-fidelity town database will of course push things more than the usual terrain paging app, but the same principles can apply, although one might need to get more creative with the LOD generation and management.

For yourselves LOD'ing may not be an appropriate solution, or even required. I believe the key question you need to ask is why draw dispatch is getting so expensive. Is it the OpenGL fifo blocking? Is it that you have too much separate state? Too many transforms? Too many separate drawables? Solutions will vary depending upon what the bottleneck is, so it's important to work to pinpoint it.

Robert.
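One way to start answering the last three questions is to count what is actually being dispatched. The sketch below is plain C++ over a hypothetical flattened draw list (`DrawItem`, `DispatchStats` and `collectStats` are invented for illustration and are not OSG types). It tallies drawables, distinct states, distinct transforms, and how often the state changes between consecutive items in dispatch order. It says nothing about fifo blocking, which has to be measured on the driver side.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical entry of a flattened draw list, in dispatch order.
struct DrawItem {
    int stateId;      // identifies the render state applied to this item
    int transformId;  // identifies the modelview transform applied to this item
};

struct DispatchStats {
    std::size_t drawables;           // total items dispatched
    std::size_t distinctStates;      // number of different stateId values
    std::size_t distinctTransforms;  // number of different transformId values
    std::size_t stateSwitches;       // stateId changes between consecutive items
};

// Walk the draw list once and collect the counts above.
DispatchStats collectStats(const std::vector<DrawItem>& items) {
    std::set<int> states;
    std::set<int> transforms;
    std::size_t switches = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        states.insert(items[i].stateId);
        transforms.insert(items[i].transformId);
        if (i > 0 && items[i].stateId != items[i - 1].stateId) {
            ++switches;
        }
    }
    DispatchStats stats{items.size(), states.size(), transforms.size(), switches};
    assert(stats.distinctStates <= stats.drawables);  // sanity check
    return stats;
}
```

If `stateSwitches` is well above `distinctStates`, the draw order is interleaving states and sorting would help. If `drawables` dwarfs `distinctStates`, many items share state and are candidates for merging.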