Hi All,

Yesterday I did testing on my new Intel quad core + dual 7800GT system
and found that single-threaded performance sometimes exceeded the
multi-threaded threading models, which is seriously screwy.  So I
investigated.

Looking at the draw stats it was clear that the draw dispatch by two
threads/cores to two graphics contexts wasn't scaling well at all,
with the single-threaded total draw dispatch time being lower than a
single camera's draw dispatch when run multi-threaded.  The per draw
dispatch stats for each camera showed that running multi-threaded was
more than twice as slow.   Clearly something in the system between the
CPU cores and the GPUs is scaling very, very poorly.  Is it the CPU
front side bus?  Is it the OpenGL driver not properly managing
multiple graphics contexts?  Is the chipset not properly dispatching
data in parallel to two cards?   I don't know the answer.

As a test this morning I stuck a static Mutex into the
osgViewer::Renderer's draw dispatch code to prevent the draw dispatch
for the cameras from running in parallel - the cull can still run in
parallel, but not the draw dispatch.   The result was startling - an
overall performance (i.e. fps) boost of 50-77% on the test models I've
thrown at it, with CullThreadPerCameraDrawThreadPerContext benefiting
the most.  This also allows the multi-threaded models to outperform
single-threaded, as one would expect.

I haven't tried these changes on my Athlon dual core system yet - I
can't do so right away as I've taken its Gfx cards out for the new
system - but perhaps others can run similar tests.  My expectation is
that different systems will exhibit different performance
characteristics when running multi-threaded - a well balanced system
should work best without serialization of the draw dispatch, but how
many of our modern systems are well balanced??

I have checked in my changes to osgViewer::Renderer and
osg::DisplaySettings to support the new serializer, so an svn update
will get these.  Since the performance difference is so colossal on my
system, and I expect my system is close to the common setup of modern
multi-GPU systems, I've made serialization of the draw dispatch
default to on.

To toggle the serializer:

  osgviewer mymodel.ive --serialize-draw ON
  osgviewer mymodel.ive --serialize-draw OFF

Or

  export OSG_SERIALIZE_DRAW_DISPATCH=ON
  export OSG_SERIALIZE_DRAW_DISPATCH=OFF
  osgviewer cow.osg

Replace export with setenv or set according to your native platform.

It would be interesting to hear from others with multi-CPU, multi-GPU
systems to see how they fare.

I am also curious about systems like AMD's 4x4, which has two CPU
sockets, each with a chipset connecting to its Gfx slots.  Does anyone
have one?  It could be that such a system scales better than my Intel
quad core.  My system does actually have two chipsets too - they
provide 2 x 16x PCIExpress + 1 x 8x PCIExpress bandwidth to the three
Gfx slots, but it kinda looks like this isn't working properly, or
perhaps it's the OpenGL driver that sucks at multi-thread,
multi-GPU...

Robert.
_______________________________________________
osg-users mailing list
[email protected]
http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org
