Hi,

On Friday, January 25, 2013 11:29:22 Glenn Waldron wrote:
> This is a great topic and I am very interested to learn the facts about
> this.
> 
> In osgEarth using display lists (on my NVIDIA GTX275) yields a sizable
> performance boost over VBOs in many scenarios. In some of my tests I've
> seen a 50%+ reduction in DRAW and GPU times.
> 
> But there are strange artifacts that seem to be related to the use of
> display lists with shaders that prevent me from making DL's the default.
> Driver bugs perhaps? All I can find is speculation and guessing when it
> comes to using shaders with DLs.
> 
> Same goes for performance comparisons. Some swear you can always get the
> same or better performance using a VBO, but it's hard to track down the
> "best practices" for doing so. Some say NVIDIA drivers have a special code
> path that speeds up DL's compared to VBOs. Some say it's all about the
> number or size of your primitive sets.
> 
> Obviously VBOs are the future since DLs are deprecated. So assembling some
> best practices for their application is critical!

Hmm, it depends on the use case, I think. I have seen everything from massive 
improvements to massive drops.

Well, I think the (in my opinion) well-known optimizations still apply:
use medium-sized draws that do not rely on extremely large VBOs.

Avoid polygons and strips unless you have primitive restart, as each polygon or 
strip otherwise ends up being a separate draw.
If you have primitive restart, prefer strips for the good old reasons that 
were already a valid optimization on an SGI.
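To illustrate the primitive restart point, something along these lines (just a
sketch, assuming an OpenGL 3.1+ context with the vertex and index buffers
already bound; the loader header and names are my own):

// Several triangle strips drawn with a single glDrawElements call via
// primitive restart, instead of one draw per strip.
#include <GL/glew.h>

const GLushort kRestartIndex = 0xFFFF; // reserved value, never a real vertex index

void drawBatchedStrips(GLsizei indexCount)
{
    glEnable(GL_PRIMITIVE_RESTART);
    glPrimitiveRestartIndex(kRestartIndex);

    // The bound GL_ELEMENT_ARRAY_BUFFER holds all strips back to back,
    // separated by kRestartIndex, e.g. 0,1,2,3, 0xFFFF, 4,5,6,7, ...
    glDrawElements(GL_TRIANGLE_STRIP, indexCount, GL_UNSIGNED_SHORT, nullptr);
}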

Now you can start a long series of arguments about what "medium" means.
There are at least two competing requirements for the buffer size. For the draw 
setup in the driver, the bigger the draw, the better it is for the driver. As a 
rule of thumb, each gl*Draw* call needs a more or less fixed amount of CPU time 
to push the draw command into the FIFO interpreted by the GPU. That means fewer 
draw calls mean less CPU consumption.
But there is memory management involved, which makes this much more 
unpredictable. There you need to distinguish between GPUs that have virtual 
memory addressing and older ones that do not. Those without suffer a lot from 
memory fragmentation, and the bigger the buffers are, the more this happens. But 
even if you have a newer GPU with a virtual memory space, I think all of you are 
aware that getting memory management right in every use case is hard to 
impossible. With that kind of algorithm you are also back to CPU time wasted on 
puzzling the memory blobs into limited GPU-accessible memory. So in the end you 
really help the driver if you do not just max out the buffer size.
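Just to make the "medium" idea concrete, a purely hypothetical sketch in osg
terms; the 60000-vertex cap is an arbitrary number for illustration, not a
tuned recommendation:

#include <osg/Geode>
#include <osg/Geometry>
#include <algorithm>

osg::ref_ptr<osg::Geode> buildBatches(const osg::Vec3Array* allVerts)
{
    const unsigned int kMaxVertsPerBatch = 60000; // assumption for the example

    osg::ref_ptr<osg::Geode> geode = new osg::Geode;
    unsigned int total = static_cast<unsigned int>(allVerts->size());

    for (unsigned int start = 0; start < total; start += kMaxVertsPerBatch)
    {
        unsigned int count = std::min(kMaxVertsPerBatch, total - start);

        // Each batch gets its own medium-sized vertex array and one draw.
        osg::ref_ptr<osg::Vec3Array> verts =
            new osg::Vec3Array(count, &(*allVerts)[start]);

        osg::ref_ptr<osg::Geometry> geom = new osg::Geometry;
        geom->setVertexArray(verts.get());
        geom->addPrimitiveSet(new osg::DrawArrays(GL_TRIANGLES, 0, count));
        geom->setUseVertexBufferObjects(true);
        geode->addDrawable(geom.get());
    }
    return geode;
}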

In the end this really depends on the application in combination with the GPU.

So, for sensible models that are there to display things at just the level of 
detail that the display/eye can resolve, I still tend to stick with the 
less-than-65536-vertices limit per draw. That uses less memory for the indices, 
as you can use shorts there. And if you have a model that is just at the border 
between looking fine and looking edgy, you already have very large geometries to 
show, which produce a lot of fragments, which in turn makes you more likely to 
be fragment bound than bound by anything at the top of the pipeline, such as 
the CPU time to set up the draw.
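In code that choice is basically just picking the index type per primitive set
(illustrative names, nothing osg prescribes):

#include <osg/Geometry>
#include <vector>

osg::ref_ptr<osg::PrimitiveSet> makeIndices(const std::vector<unsigned int>& indices,
                                            unsigned int numVertices)
{
    if (numVertices <= 65536)
    {
        // All indices fit into a GLushort (0..65535): half the index memory.
        osg::ref_ptr<osg::DrawElementsUShort> de =
            new osg::DrawElementsUShort(GL_TRIANGLES);
        de->reserve(indices.size());
        for (size_t i = 0; i < indices.size(); ++i)
            de->push_back(static_cast<GLushort>(indices[i]));
        return de;
    }

    // Larger draws need 32-bit indices.
    return new osg::DrawElementsUInt(GL_TRIANGLES,
                                     static_cast<unsigned int>(indices.size()),
                                     indices.data());
}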

But this could be different for CAD-like models that you see in engineering 
applications. There you can easily have millions of vertices for a small part. 
If you do this with small buffers you can already hit the draw-setup 
bottleneck, so it might make sense to use bigger batches there.
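One knob in osg that goes in that direction, if it fits the scene, is the
osgUtil::Optimizer's geometry merging pass; whether it actually helps is of
course scene and driver dependent:

#include <osgUtil/Optimizer>
#include <osg/Node>

void mergeSmallParts(osg::Node* cadModel)
{
    // MERGE_GEOMETRY combines compatible small geometries into larger ones,
    // trading per-draw CPU overhead for bigger buffers.
    osgUtil::Optimizer optimizer;
    optimizer.optimize(cadModel, osgUtil::Optimizer::MERGE_GEOMETRY);
}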

So, nothing hard and fast here, just effects that I believe I have observed over 
time across different drivers and hardware...

Regarding VBOs in osg, I think that neither using VBOs nor display lists is 
definitely better; it's more that they are different.
Where on one display list compiler you hit a code path that ends up with a lot 
of draws in the driver, which your application code could have anticipated much 
better, you will also find a display list compiler on a different 
driver/GPU/application combination where you cannot see how to issue the draws 
yourself to reach the performance of the display list. The same goes for the 
memory management stuff.
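For anyone who wants to compare the two paths on their own driver/GPU, the
switch in osg is per drawable; a trivial sketch:

#include <osg/Geometry>

void useVbo(osg::Geometry& geom, bool vbo)
{
    // Explicitly turn one path on and the other off so there is no
    // ambiguity about which one the drawable uses.
    geom.setUseDisplayList(!vbo);
    geom.setUseVertexBufferObjects(vbo);
}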

I assume that most applications are currently optimized for the default. And 
if this default changes you will see optimizations move over to the new 
default over time.

Greetings

Mathias