Re: [osg-users] OSG thread profiling results are in!!

Robert Osfield Sat, 28 Jun 2008 08:26:53 -0700

Hi Rick,

Sharing state is essential to good performance, and even more critical
when you start approaching memory limits.  You'll need to share
osg::Texture(s) rather than just osg::Image(s) to get the benefit.
Sharing complete osg::StateSets is the most efficient option, for cull,
for draw dispatch into the OpenGL FIFO (the OSG's draw traversal), and
for drawing on the GPU.
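A minimal sketch of the sharing idea, written in standard C++ only so it compiles without OSG: one state object is created per ship type and handed to every instance of that type. The ShipState and ShipStateCache names are hypothetical. In real OSG code the cached object would presumably be an osg::ref_ptr<osg::StateSet> holding the osg::Texture2D, attached to each instance with osg::Node::setStateSet().

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for osg::StateSet. In OSG this would hold the
// osg::Texture2D (which wraps the osg::Image) plus other per-type state.
struct ShipState {
    std::string textureFile;  // image the texture was built from
};

// Hypothetical per-ship-type cache: every instance of the same type gets
// the *same* state object, so cull/draw see one state instead of N copies.
class ShipStateCache {
public:
    std::shared_ptr<ShipState> get(const std::string& shipType,
                                   const std::string& textureFile) {
        assert(!shipType.empty());
        auto it = cache_.find(shipType);
        if (it != cache_.end()) {
            return it->second;  // reuse the existing shared state
        }
        // First request for this type: build the state once and remember it.
        auto state = std::make_shared<ShipState>(ShipState{textureFile});
        cache_.emplace(shipType, state);
        return state;
    }

    std::size_t size() const { return cache_.size(); }

private:
    std::map<std::string, std::shared_ptr<ShipState>> cache_;
};
```

Because two frigates receive the same pointer, the texture object behind it exists once rather than once per instance. Sharing only the image still leaves each instance with its own texture object, which is the distinction drawn above between sharing osg::Image and sharing osg::Texture.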


If you are hitting memory limits on the graphics card, then beyond
sharing Textures/StateSets you could also look at using non power of two
textures, and at compressed texture formats, as these can stay
compressed on the graphics card.  You can also scale your texture sizes
to fit your hardware limits.
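To put rough numbers on the memory argument, here is a small estimator. It assumes 4 bytes per texel for uncompressed RGBA8 and the standard S3TC block sizes (8 bytes per 4x4 block for DXT1, 16 for DXT5), and counts a full mipmap chain when asked; driver padding and alignment are ignored, so treat the results as approximations. In OSG the corresponding knobs are, I believe, osg::Texture::setInternalFormatMode() (e.g. USE_S3TC_DXT1_COMPRESSION) and osg::Texture::setResizeNonPowerOfTwoHint(), but check them against the headers of your OSG version.

```cpp
#include <cassert>
#include <cstdint>

// Texture storage formats considered by this rough estimate.
enum class Format { RGBA8, DXT1, DXT5 };

// Bytes for a single mip level of the given size and format.
std::uint64_t levelBytes(std::uint32_t w, std::uint32_t h, Format f) {
    switch (f) {
    case Format::RGBA8:
        return std::uint64_t(w) * h * 4;  // 4 bytes per texel
    case Format::DXT1:
    case Format::DXT5: {
        // Compressed formats store whole 4x4 blocks, so round sizes up.
        std::uint64_t blocks = std::uint64_t((w + 3) / 4) * ((h + 3) / 4);
        return blocks * (f == Format::DXT1 ? 8 : 16);
    }
    }
    return 0;
}

// Total bytes for the texture, optionally including every mip level
// down to 1x1.
std::uint64_t textureBytes(std::uint32_t w, std::uint32_t h, Format f,
                           bool mipmapped) {
    assert(w > 0 && h > 0);
    std::uint64_t total = levelBytes(w, h, f);
    while (mipmapped && (w > 1 || h > 1)) {
        w = w > 1 ? w / 2 : 1;
        h = h > 1 ? h / 2 : 1;
        total += levelBytes(w, h, f);
    }
    return total;
}
```

For a 1024x1024 texture without mipmaps this gives 4,194,304 bytes as RGBA8 versus 524,288 bytes as DXT1, an 8:1 saving, and a full mipmap chain adds roughly a third on top. Likewise, a 640x480 RGBA8 image stored as-is needs 1,228,800 bytes, whereas the same image scaled up to 1024x512 would need 2,097,152.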

My general guide would be to get your app running at a solid frame
rate equal to your monitor's refresh rate, typically something like
75Hz on modern displays.  To hit this you might need to be more
conservative about just how much eye candy you are throwing at the
system, i.e. texture sizes, effects etc.  Once you've got your solid
frame rate on a given hardware, then look at what you can add without
breaking frame.  These days I see little excuse for not hitting a
solid 60+Hz for modern graphics apps, unless you have an app doing
something hard for the graphics hardware like volume rendering, or a
CAD app with millions of polygons in the scene.  If you aren't hitting
a solid frame rate then something's up and you need to address it.
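The frame-rate target translates directly into a per-frame time budget: 1000 ms divided by the refresh rate, so about 13.3 ms at 75Hz and 16.7 ms at 60Hz. The sketch below computes that budget and names the slowest stage from per-stage timings, such as those shown by the osgViewer on-screen stats. The FrameTimes struct and the helper functions are hypothetical, and the budget check is a rough model that assumes the CPU stages run back to back while the GPU draws in parallel.

```cpp
#include <cassert>
#include <string>

// Hypothetical per-frame timings in milliseconds (not an OSG type).
struct FrameTimes {
    double update;
    double cull;
    double drawDispatch;
    double gpu;
};

// Milliseconds available per frame to stay locked to the refresh rate.
double frameBudgetMs(double refreshHz) {
    assert(refreshHz > 0.0);
    return 1000.0 / refreshHz;
}

// Name of the stage taking the most time: the first place to ask what
// the scene graph is demanding of that stage.
std::string slowestStage(const FrameTimes& t) {
    std::string name = "update";
    double worst = t.update;
    if (t.cull > worst) { worst = t.cull; name = "cull"; }
    if (t.drawDispatch > worst) { worst = t.drawDispatch; name = "draw dispatch"; }
    if (t.gpu > worst) { name = "GPU draw"; }
    return name;
}

// Rough check, assuming update, cull and draw dispatch run one after
// another on the CPU while the GPU draws in parallel; each side must fit.
bool fitsBudgetRough(const FrameTimes& t, double refreshHz) {
    const double budget = frameBudgetMs(refreshHz);
    const double cpu = t.update + t.cull + t.drawDispatch;
    return cpu <= budget && t.gpu <= budget;
}
```

With timings of 1, 3, 6 and 11 ms this reports the GPU as the slowest stage, and the frame fits a 75Hz budget but not a 100Hz one.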

There is a *huge* number of things you can do to make graphics run more
efficiently; one can only scratch the surface in a couple of emails.

Robert.

On Sat, Jun 28, 2008 at 4:05 PM,  <[EMAIL PROTECTED]> wrote:
> Thanks Robert,
>
> (James and I are working on this together.  He has decided to focus his
> attention on understanding OpenSceneGraph and optimizing where he can,
> while most of the client game code dealing with the scene graph is mine.)
> I was planning on asking some of these questions eventually anyway, but
> now seems like a good time ;)
>
> I think a huge part of what we are seeing is that we are relying very
> heavily on image maps for everything, and that most graphics cards have to
> swap memory.  That said, I know that I need to work on my dependence on
> these image maps and look harder at resizing them where possible.  Are there
> any other recommendations out there for more effective image map
> utilization?  One thing I know I want to apply is LOD, which I have not done
> yet.  Looking over the example code, LOD seems pretty straightforward.
>
> All our ships use UV mapping, and I was trying to make it so that the image
> was only loaded once for the ship type, rather than for each instance.  It
> did not seem that this made much of a difference, however.  I have started
> to look at the osgimpostor example for help in how I might better handle
> this.  Am I going in the right direction?
>
> I have lots more questions, but I figure I will ask them as I get to them
> and I am able to dig in myself.
> Thanks again for all the great support,
> -- Rick
>
> On Sat, Jun 28, 2008 at 5:22 AM, Robert Osfield <[EMAIL PROTECTED]>
> wrote:
>>
>> Hi James,
>>
>> I've read your emails but I'm afraid the stats mean absolutely nothing
>> to me.  One will really need to find out what parts of the OSG, i.e.
>> what function calls, are the current bottleneck.
>>
>> As a general note, performance optimization with scene graphs is almost
>> always an issue of improving the balance of the scene graph; be it
>> update, cull, draw dispatch or draw GPU, it's almost always a poor
>> scene graph that is at fault.  You can often improve performance by
>> 10x or more by simply fixing the scene graph.  Doing low level code
>> optimization will rarely get you anything like the performance
>> improvement that you'd get by just fixing the scene graph.
>>
>> Given this, diving into low level profiling could well be a case of
>> not seeing the wood for the trees.  So if you want your app to go
>> faster, I'd recommend starting with the basics: are you CPU or GPU
>> limited?  Then, are you update, cull or draw dispatch limited?  Then,
>> depending upon what results you get, consider why the scene graph
>> itself is making things so slow.  This process will typically lead you
>> to things you can do to your scene graph to fix the performance
>> bottleneck, and all this without touching the actual code.
>> Performance optimization is a huge topic, but hopefully I'll have given
>> you a little pointer to the priorities I'd apply.
>>
>> Robert.
>>
>> On Sat, Jun 28, 2008 at 4:12 AM, James Killian
>> <[EMAIL PROTECTED]> wrote:
>> >
>> > Here are some interesting profile results from the threaded profiler.
>> > First, here is the groundwork: OSG SVN 8482 using VS 7.1 with threading
>> > enabled (interlocked config).  The actual client code tested that puts
>> > some stress on OSG is our game, which anyone can download here:
>> > http://www.fringe-online.com/.  I run this and measure the thread
>> > performance using Intel's thread compiler.  So far, our client code's
>> > main loop is very similar to how it is in the osg viewer (no fancy
>> > optimizations).
>> >
>> > There are 2 machines I have tested now... I'll post a copy of a
>> > different message I sent a few days ago here (to keep all info in this
>> > thread).
>> >
>> > -----------snip----------------
>> > Robert,
>> > This proposal you mention for 2.6: will it help balance the CPU
>> > workload against the GPU I/O bottleneck?
>> >
>> > I've been doing some OSG performance benchmark research on thread
>> > synchronization using the Intel Threaded compiler, and so far the
>> > results are looking really good except for a 26% over-utilization due
>> > to sleeping.  I do want to say awesome job to those responsible for
>> > threading; the amount of critical section use looked very good!  All
>> > the worker threads also had good profiling results.
>> >
>> > The ultimate test I want to try today deals with an intentional GPU
>> > bottleneck, where I have a quad core that pipes graphics out through a
>> > PCI graphics card.  If anyone is interested I'll post these test
>> > results.  I know now that on a quad core there is a lack of
>> > parallelization (e.g. 25%, 85%, 15%, 15%), but that is a different
>> > battle for a different time.
>> >
>> > I do want to get to the bottom of the profiling and determine how well
>> > the workload is balanced against the GPU I/O, and see if there is some
>> > opportunity for optimization here.
>> > -----------------snip------------------------
>> >
>> > Today I have the numbers from the souped-up machine with a poor, poor
>> > PCI graphics card.  The first thing to note is that the game never
>> > exceeded 18% CPU usage!!  When I profiled, 65% of the main thread's time
>> > was devoted to "serial" time, and the bulk of the CPU time was on *this
>> > thread* and the PrintSchedulingInfo [20] thread.  Thread 20 showed 21%
>> > contributed to blocking, but the rest of it was active.  The rest of the
>> > threads (like on my machine) looked really good!  It is just too bad
>> > they don't do much work.
>> >
>> >
>> > Realistically, my machine at work is not typical due to the PCI
>> > graphics, but it did put on good stresses to show where the I/O
>> > bottleneck is (on the main thread).  My machine at home is a dual p 2.4
>> > with an NVidia GeForce 5900XT.  When testing other games on my home
>> > machine I get a great frame rate, so my goal will be to bring OSG's
>> > performance to something comparable.
>> >
>> > Aside from the threading profiler, I have tested AMD CodeAnalyst to
>> > find the most frequently called code, and for OSG 1.2 it turned out to
>> > be the Matrix Multiply.  Aside from that, OSG itself took a significant
>> > bulk of the CPU workload.  This AMD profiler works differently in that
>> > it does not count sleeping or I/O time, but rather keeps note of the
>> > most frequently called code.  At some point I'll retest for code
>> > optimizations, but not yet... the real gain now is to balance the CPU
>> > rendering against sending to the GPU.  It would be great if I can find
>> > a solution that can benefit the whole OSG community (all platforms).
>> >
>> > If anyone has an interest in faster performance using the new
>> > osgViewer, please share some ideas with me, thanks.  I can track where
>> > bottlenecks are, but working out a good solution will take some time to
>> > learn.  I'll need to profile with VS 9 and OpenMP to see if this helps.
>> >
>> >
>> > _______________________________________________
>> > osg-users mailing list
>> > [email protected]
>> >
>> > http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org
>> >
>
>
>
> --
>>> Rick
> Check us out at http://fringe-online.com/