[osg-users] Parallel draw on a multi-core, single GPU architecture

2010-08-18 Thread Tugkan Calapoglu
Hi all,

we have a performance problem where the draw thread seems to be the
bottleneck. The target hw has 8 cores and a single GPU.

We already run cull and draw in parallel but parallelizing draw itself
would also help.

So I tried the following setup:

2 windows
2 cameras (each on one window)
threading model: CullThreadPerCameraDrawThreadPerContext

In this mode, I can see that I have two cull and two draw threads and
they are assigned to separate cores. However, the performance is 1/2 of
my original setup. Setting OSG_SERIALIZE_DRAW_DISPATCH to ON or OFF does
not make any difference.

Is there anything else I have to do to get good performance from the
above setup (e.g. more environment variables to set)? Or should I try
something completely different?
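For reference, the threading model from the setup above can also be
selected programmatically rather than via environment variables; a
minimal sketch against the osgViewer API (the model file is just a
stock OSG sample):

```cpp
#include <osgViewer/Viewer>
#include <osgDB/ReadFile>

int main(int, char**)
{
    osgViewer::Viewer viewer;

    // Must be set before realize(); the same model can also be forced
    // with the OSG_THREADING environment variable.
    viewer.setThreadingModel(
        osgViewer::ViewerBase::CullThreadPerCameraDrawThreadPerContext);

    viewer.setSceneData(osgDB::readNodeFile("cow.osg"));
    return viewer.run();
}
```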


thanks,
tugkan
___
osg-users mailing list
osg-users@lists.openscenegraph.org
http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org


Re: [osg-users] Parallel draw on a multi-core, single GPU architecture

2010-08-18 Thread Robert Osfield
Hi Tugkan,

It only ever makes sense to parallelize draw dispatch when you have
multiple graphics cards and one context per graphics card.  Trying to
add more threads than the OpenGL driver/graphics card can handle will
just stall things and result in lower performance.

If your draw dispatch is overloaded then consider optimizing your
scene graph.  There are many ways of doing this; which approach is
appropriate will depend entirely on the needs of your scenes and your
application.  Since I don't know anything about these specifics I
can't provide any direction.
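As a starting point for that kind of optimization, osgUtil::Optimizer
bundles the usual passes (state sharing, static-object detection,
geometry merging); a minimal sketch:

```cpp
#include <osgUtil/Optimizer>
#include <osg/Node>

// Run the stock optimization passes over a loaded scene before the
// first frame; DEFAULT_OPTIMIZATIONS covers state sorting/sharing and
// geometry merging, which directly reduce draw dispatch cost.
void optimizeScene(osg::Node* root)
{
    osgUtil::Optimizer optimizer;
    optimizer.optimize(root, osgUtil::Optimizer::DEFAULT_OPTIMIZATIONS);
}
```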

Robert.

On Wed, Aug 18, 2010 at 9:14 AM, Tugkan Calapoglu tug...@vires.com wrote:


Re: [osg-users] Parallel draw on a multi-core, single GPU architecture

2010-08-18 Thread Tugkan Calapoglu
Hi Robert,

thanks for the answer. Our scene graph is heavily optimized and there
don't seem to be any serious optimization possibilities left - maybe a
few percent improvement here and there.

We observe that, as our databases grow, the performance requirement for
the draw thread grows faster than for cull. In the long run, draw will
be a serious limiting factor for us. So I wanted to see if we can take
advantage of the hardware to solve the problem.

tugkan



-- 
Tugkan Calapoglu

-
VIRES Simulationstechnologie GmbH
Oberaustrasse 34
83026 Rosenheim
Germany
phone+49.8031.463641
fax  +49.8031.463645
emailtug...@vires.com
internet www.vires.com
-
Sitz der Gesellschaft: Rosenheim
Handelsregister Traunstein  HRB 10410
Geschaeftsfuehrer: Marius Dupuis
   Wunibald Karl
-


Re: [osg-users] Parallel draw on a multi-core, single GPU architecture

2010-08-18 Thread Robert Osfield
Hi Tugkan,

On Wed, Aug 18, 2010 at 9:49 AM, Tugkan Calapoglu tug...@vires.com wrote:
 We observe that, as our databases grow, the performance requirement for
 the draw thread grows faster than for cull. In the long run, draw will
 be a serious limiting factor for us. So I wanted to see if we can take
 advantage of the hardware to solve the problem.

Doesn't this suggest that the scene graph is becoming less well
balanced over time?

I'd look closely at what in the draw dispatch is becoming the
bottleneck and at how to manage it more efficiently.  Typically the
best way to reduce draw dispatch cost is to batch geometry and state
in a more coarse-grained way.  One can also use different approaches
to achieve the same or similar results - for instance, perhaps you can
cull more effectively, or introduce LODs more effectively.
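The coarse-grained batching described here can be approximated with
the Optimizer's merge/share passes; a sketch (the flag combination is
illustrative, not a recommendation for every scene):

```cpp
#include <osgUtil/Optimizer>
#include <osg/Node>

// Coarser-grained batches: share duplicate StateSets and merge small
// drawables/geodes so fewer, larger chunks reach the OpenGL dispatch.
void batchScene(osg::Node* root)
{
    osgUtil::Optimizer optimizer;
    optimizer.optimize(root,
        osgUtil::Optimizer::SHARE_DUPLICATE_STATE |
        osgUtil::Optimizer::MERGE_GEODES |
        osgUtil::Optimizer::MERGE_GEOMETRY);
}
```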

Robert.


Re: [osg-users] Parallel draw on a multi-core, single GPU architecture

2010-08-18 Thread Tugkan Calapoglu
Hi Robert,

our databases are for driving simulation and they are usually city
streets. We have been employing batching and occlusion culling with
very good results so far.

Extensive use of LODs is problematic because:

1- They actually reduce performance in most cases.
2- The switching effect (even the slightest) is found very disturbing
by our customers.

So we are somewhat conservative with LODs (but we use them nevertheless).

Actually, draw has always required more time than cull. If you look at
the OSG examples, draw usually takes more time than cull. Now that draw
is reaching the 16 ms border, this has started to hurt us.

I should retract the sentence from my previous email: draw is not
slowing down faster than cull; I did not really make such an observation.

My actual observation is that both cull and draw require more CPU time
with larger scenes, but since draw is generally more performance hungry,
it becomes the bottleneck earlier.



tugkan



Re: [osg-users] Parallel draw on a multi-core, single GPU architecture

2010-08-18 Thread Robert Osfield
Hi Tugkan,

On Wed, Aug 18, 2010 at 10:39 AM, Tugkan Calapoglu tug...@vires.com wrote:
 Actually, draw has always required more time than cull. If you look at
 the OSG examples, draw usually takes more time than cull. Now that draw
 is reaching the 16 ms border, this has started to hurt us.

It is typical for cull to be shorter than draw dispatch; whichever
traversal is breaking frame will be the one you need to address.
Threading draw dispatch to make it faster is not going to help,
though; it will do more harm than good.  Draw dispatch is really just
feeding data into the OpenGL fifo, and filling it from two threads
will just cause lots of issues in an OpenGL driver that has to manage
two contexts and the card, as well as CPU cache coherency.

I must add that draw dispatch is a little complex to profile, as it
depends upon being able to fill the OpenGL fifo; if the fifo fills up
it will block the draw dispatch.  This effect can make the draw
dispatch look like it's breaking frame when it's actually the OpenGL
driver/graphics card that is the bottleneck.

 My actual observation is that both cull and draw require more CPU time
 with larger scenes, but since draw is generally more performance hungry,
 it becomes the bottleneck earlier.

This is something one would expect.  Ideally you'd have LODs to keep
large scenes well balanced - for instance, it's possible to LOD the
whole earth and use paging and still get 60Hz, so while you might have
terabytes of data the app only holds in memory what it needs, with the
complexity kept relatively even throughout the database.  A
high-fidelity town database will of course push things more than the
usual terrain paging app, but the same principles can apply, although
one might need to get more creative with LOD generation and
management.
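A range-switched LOD of the kind described above is a few lines with
osg::LOD; a minimal sketch (the 100-unit switch distance is an
arbitrary placeholder):

```cpp
#include <osg/LOD>
#include <cfloat>

// Distance-switched detail: the high-detail child is drawn within 100
// units of the eye, the simplified child beyond that.
osg::ref_ptr<osg::LOD> makeLOD(osg::Node* highDetail, osg::Node* lowDetail)
{
    osg::ref_ptr<osg::LOD> lod = new osg::LOD;
    lod->addChild(highDetail, 0.0f, 100.0f);
    lod->addChild(lowDetail, 100.0f, FLT_MAX);
    return lod;
}
```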

For yourselves LOD'ing may not be an appropriate solution, or even
required.  I believe the key question you need to ask is why draw
dispatch is getting so expensive.  Is it the OpenGL fifo blocking?  Is
it that you have too much separate state?  Too many transforms?  Too
many separate drawables?

Solutions will vary depending upon what the bottleneck is, so it's
important to pinpoint it.
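For pinpointing it, osgViewer's built-in stats handler separates cull
time, draw dispatch time, and GPU time on screen; a minimal sketch:

```cpp
#include <osgViewer/Viewer>
#include <osgViewer/ViewerEventHandlers>

// Pressing 's' at runtime cycles through frame statistics that show
// cull, draw dispatch and GPU time per frame - useful for telling
// CPU-side dispatch cost apart from driver/GPU stalls.
void addStats(osgViewer::Viewer& viewer)
{
    viewer.addEventHandler(new osgViewer::StatsHandler);
}
```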

Robert.