Hi All,

Yesterday I did some testing on my new Intel quad-core + dual 7800GT system and found that single-threaded performance sometimes exceeded that of the multi-threaded models, which is seriously screwy. So I investigated.
Looking at the draw stats it was clear that the draw dispatch by two threads/cores to two graphics contexts wasn't scaling well at all, with the single-threaded draw dispatch total time being lower than a single draw dispatch when run multi-threaded. The per-draw-dispatch stats for each camera showed that running multi-threaded was more than twice as slow. Clearly something in the system between the CPU cores and the GPUs is scaling very, very poorly. Is it the CPU front-side bus? Is it the OpenGL driver not properly managing multiple graphics contexts? Is the chipset not properly dispatching data in parallel to the two cards? I don't know the answer.

As a test this morning I stuck a static Mutex into osgViewer::Renderer's draw dispatch code to prevent the draw dispatch for the cameras from running in parallel - the cull can still run in parallel, but not the draw dispatch. The result was startling: an overall performance (i.e. fps) boost of 50-77% on the test models I've thrown at it, with the CullThreadPerCameraDrawThreadPerContext threading model benefiting the most. This also allows the multi-threaded models to outperform single-threaded, as one would expect.

I haven't tried out these changes on my Athlon dual-core system, and can't do so right away as I've taken the Gfx cards out of it for this system, but perhaps others can do similar tests. My expectation is that different systems will exhibit different performance characteristics when running multi-threaded - a well-balanced system should work best without serialization of the draw dispatch, but how many of our modern systems are well balanced?

I have checked in my changes to osgViewer::Renderer and osg::DisplaySettings to support the new serializer, so an svn update will get these. Since the performance difference is so colossal on my system, and I expect my system is close to the common set-up of modern multi-GPU systems, I've made serialization the default (ON).
To toggle the serializer:

  osgviewer mymodel.ive --serialize-draw ON
  osgviewer mymodel.ive --serialize-draw OFF

Or:

  export OSG_SERIALIZE_DRAW_DISPATCH=ON
  export OSG_SERIALIZE_DRAW_DISPATCH=OFF
  osgviewer cow.osg

Replace export with setenv or set according to your native platform.

It would be interesting to hear from others with multi-CPU, multi-GPU systems to see how they fare. I am also curious about systems like AMD's 4x4, which has two CPU sockets, each with a chipset connecting to its own Gfx slots. Does anyone have one? It could be that such a system scales better than my Intel quad-core one. On my system I do actually have two chipsets too - they provide 2x 16x PCI Express + 1x 8x PCI Express bandwidth to the three Gfx slots, but it kinda looks like this isn't working properly, or perhaps it's the OpenGL driver that sucks at multi-thread, multi-GPU...

Robert.

_______________________________________________
osg-users mailing list
[email protected]
http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org

