On Wed, Jan 25, 2012 at 10:59 PM, Jason Daly <[email protected]> wrote:
>
> Hi, all,
>
> This is a general request to the community for some advice and expertise.
>  This is a bit lengthy, but if you can spare a few minutes to look over this
> and send along your thoughts, we would really appreciate it.
>
> I've been working on a project for NIST recently.  For background, you might
> find this osg-users thread from December 2010 useful:
>
> http://thread.gmane.org/gmane.comp.graphics.openscenegraph.user/63954/focus=64014
>
>
> Briefly, an OSG-based test application loads a scene and displays it in a
> window on one or more screens (Single Viewer, multiple slave cameras, one
> GPU and context per screen).  The problem was that a single screen would
> draw the scene at a given frame rate, but as additional screens were added,
> the frame rate would drop significantly (427 fps on one screen, 396 fps on
> two, 291 on four).
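
(Side note for anyone trying to reproduce that setup: a single Viewer
with one slave camera and one graphics context per screen is roughly
what setUpViewAcrossAllScreens() gives you; a minimal sketch follows,
where the file name is only a placeholder.)

    #include <osgDB/ReadFile>
    #include <osgViewer/Viewer>

    int main()
    {
        osgViewer::Viewer viewer;
        viewer.setSceneData(osgDB::readNodeFile("scene.ive"));
        // Creates one slave camera and one graphics context per screen.
        viewer.setUpViewAcrossAllScreens();
        return viewer.run();
    }
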
...
> At this point, I started looking for something else to blame.  Examining the
> data set itself, I discovered that it was composed of about 5500 triangle
> strips, none of which were longer than 112 vertices (the data set had about
> 600,000 vertices total).  There were only about 10 different StateSets in
> the scene, so state changes aren't a problem.  After some digging, I found
> the MeshOptimizers portion of osgUtil::Optimizer, and based on a message I
> found from Jean-Sebastian, I tried a pass of VERTEX_PRETRANSFORM |
> INDEX_MESH | VERTEX_POSTTRANSFORM, followed by another pass of MERGE_GEODES
> | MERGE_GEOMETRY.  This reduced the number of draw calls from around 5500 to
> 9, and completely eliminated the scalability problem for both the OSG test
> program and the pure OpenGL program.  This leads me to believe that bus
> contention was causing the lack of scalability.  As more screens were added,
> the thousands of draw calls required by the unoptimized data set couldn't
> fit within the bus bandwidth, effectively causing the draw calls to take
> longer.  The optimized data, only requiring 9 draw calls per screen, could
> easily fit.
>
That is what I found at the time of that thread, testing with some
models supplied by NIST. Newer hardware is highly optimized for
rendering massive amounts of geometry.

For the record, I don't think the problem is bus contention, but rather lock
contention in the NVidia driver. The amount of bus traffic involved
in the OpenGL command stream is tiny.

> So, here are our questions.  Does it make sense that bus contention would be
> causing the lack of scalability?  Are the mesh optimizations mentioned above
> the most effective way to solve the problem?  Are there any cases where the
> mesh optimizations wouldn't be sufficient, and additional steps would need
> to be taken (I briefly mentioned state changes above, which could be
> problematic, anything else)?  Why doesn't the 64x data set seem to scale as
> well as the 1x and 8x data sets (does this indicate that the bottleneck has
> moved from the bus to somewhere else)?
>
> Any thoughts on these issues, or any other insight you could provide,
> would be very valuable.

The key to OpenGL performance in the 21st century is reducing the
number of OpenGL function calls to a minimum... so says Captain
Obvious.

I'm glad the mesh optimizers turned out to be useful!
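
For anyone else who hits this, the two passes described above come down
to roughly the following (the file name is just a placeholder):

    #include <osgDB/ReadFile>
    #include <osgUtil/Optimizer>

    osg::ref_ptr<osg::Node> scene = osgDB::readNodeFile("scene.ive");

    // Pass 1: rebuild the many small strips as indexed meshes with
    // cache-friendly vertex ordering.
    osgUtil::Optimizer indexPass;
    indexPass.optimize(scene.get(),
                       osgUtil::Optimizer::VERTEX_PRETRANSFORM |
                       osgUtil::Optimizer::INDEX_MESH |
                       osgUtil::Optimizer::VERTEX_POSTTRANSFORM);

    // Pass 2: merge Geodes and Geometry so the scene renders in a
    // handful of draw calls.
    osgUtil::Optimizer mergePass;
    mergePass.optimize(scene.get(),
                       osgUtil::Optimizer::MERGE_GEODES |
                       osgUtil::Optimizer::MERGE_GEOMETRY);
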

Btw, the Linux tool "oprofile" is mentioned later in the thread, but I
find the newer tool "perf" to be more useful.
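
For example (the binary name and argument here are just placeholders):

    perf record -g ./osgtest scene.ive
    perf report
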

Tim
_______________________________________________
osg-users mailing list
[email protected]
http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org
