On Wed, Jan 25, 2012 at 10:59 PM, Jason Daly <[email protected]> wrote:
>
> Hi, all,
>
> This is a general request to the community for some advice and expertise.
> This is a bit lengthy, but if you can spare a few minutes to look over this
> and send along your thoughts, we would really appreciate it.
>
> I've been working on a project for NIST recently. For background, you might
> find this osg-users thread from December 2010 useful:
>
> http://thread.gmane.org/gmane.comp.graphics.openscenegraph.user/63954/focus=64014
>
> Briefly, an OSG-based test application loads a scene and displays it in a
> window on one or more screens (single Viewer, multiple slave cameras, one
> GPU and context per screen). The problem was that a single screen would
> draw the scene at a given frame rate, but as additional screens were added,
> the frame rate would drop significantly (427 fps on one screen, 396 fps on
> two, 291 on four).

...

> At this point, I started looking for something else to blame. Examining the
> data set itself, I discovered that it was composed of about 5500 triangle
> strips, none of which were longer than 112 vertices (the data set had about
> 600,000 vertices total). There were only about 10 different StateSets in
> the scene, so state changes aren't a problem. After some digging, I found
> the MeshOptimizers portion of osgUtil::Optimizer, and based on a message I
> found from Jean-Sebastian, I tried a pass of VERTEX_PRETRANSFORM |
> INDEX_MESH | VERTEX_POSTTRANSFORM, followed by another pass of MERGE_GEODES
> | MERGE_GEOMETRY. This reduced the number of draw calls from around 5500 to
> 9, and completely eliminated the scalability problem for both the OSG test
> program and the pure OpenGL program. This leads me to believe that bus
> contention was causing the lack of scalability. As more screens were added,
> the thousands of draw calls required by the unoptimized data set couldn't
> fit within the bus bandwidth, effectively causing the draw calls to take
> longer. The unoptimized data, only requiring 9 draw calls per screen, could
> easily fit.

That is what I found at the time of that thread, testing with some models
supplied by NIST. Newer hardware is highly optimized for massive geometry.
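In case it's useful to others hitting the same wall, here is a minimal sketch
of the two optimizer passes described above. The flag names are the
osgUtil::Optimizer ones from the quote; the model path is just a placeholder:

#include <osg/Node>
#include <osg/ref_ptr>
#include <osgDB/ReadFile>
#include <osgUtil/Optimizer>

int main()
{
    // Placeholder model path; substitute whatever scene you are testing with.
    osg::ref_ptr<osg::Node> scene = osgDB::readNodeFile("scene.ive");
    if (!scene) return 1;

    // Pass 1: turn the many short triangle strips into indexed meshes.
    osgUtil::Optimizer indexPass;
    indexPass.optimize(scene.get(),
                       osgUtil::Optimizer::VERTEX_PRETRANSFORM |
                       osgUtil::Optimizer::INDEX_MESH |
                       osgUtil::Optimizer::VERTEX_POSTTRANSFORM);

    // Pass 2: merge Geodes and Geometry so the whole scene draws in a
    // handful of calls instead of thousands.
    osgUtil::Optimizer mergePass;
    mergePass.optimize(scene.get(),
                       osgUtil::Optimizer::MERGE_GEODES |
                       osgUtil::Optimizer::MERGE_GEOMETRY);
    return 0;
}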
For the record, I don't think the problem is bus contention, but lock
contention in the NVidia driver. The amount of bus traffic involved in the
OpenGL command stream is tiny.

> So, here are our questions. Does it make sense that bus contention would be
> causing the lack of scalability? Are the mesh optimizations mentioned above
> the most effective way to solve the problem? Are there any cases where the
> mesh optimizations wouldn't be sufficient, and additional steps would need
> to be taken (I briefly mentioned state changes above, which could be
> problematic, anything else)? Why doesn't the 64x data set seem to scale as
> well as the 1x and 8x data sets (does this indicate that the bottleneck has
> moved from the bus to somewhere else)?
>
> Any thoughts on these issues or other thoughts you could provide would be
> very valuable.

The key to OpenGL performance in the 21st century is reducing the number of
OpenGL function calls to a minimum... so says Captain Obvious. I'm glad the
mesh optimizers turned out to be useful!

Btw, the Linux tool "oprofile" is mentioned later in the thread, but I find
the newer tool "perf" to be more useful.

Tim

_______________________________________________
osg-users mailing list
[email protected]
http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org
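As an illustrative aside on the call-count point: the fragment below is not
code from either test program (the Batch struct and function names are
invented for the example), but it shows the shape of the difference between
issuing one glDrawElements per short strip and drawing a merged mesh in a
single call, assuming the vertex and index buffers are already bound:

#include <GL/gl.h>
#include <vector>

// Hypothetical bookkeeping for one short triangle strip in the unmerged data.
struct Batch
{
    GLsizei       indexCount;
    const GLvoid* indexOffset;  // byte offset into the bound index buffer
};

// Unoptimized path: one glDrawElements call per strip, thousands per screen.
void drawUnmerged(const std::vector<Batch>& batches)
{
    for (const Batch& b : batches)
        glDrawElements(GL_TRIANGLE_STRIP, b.indexCount, GL_UNSIGNED_INT,
                       b.indexOffset);
}

// After indexing and merging: one merged Geometry goes out in a single call.
void drawMerged(GLsizei totalIndexCount)
{
    glDrawElements(GL_TRIANGLES, totalIndexCount, GL_UNSIGNED_INT, 0);
}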

