Hi,

On Friday, January 25, 2013 11:29:22 Glenn Waldron wrote:
> This is a great topic and I am very interested to learn the facts about
> this.
>
> In osgEarth, using display lists (on my NVIDIA GTX275) yields a sizable
> performance boost over VBOs in many scenarios. In some of my tests I've
> seen a 50%+ reduction in DRAW and GPU times.
>
> But there are strange artifacts that seem to be related to the use of
> display lists with shaders that prevent me from making DLs the default.
> Driver bugs perhaps? All I can find is speculation and guessing when it
> comes to using shaders with DLs.
>
> Same goes for performance comparisons. Some swear you can always get the
> same or better performance using a VBO, but it's hard to track down the
> "best practices" for doing so. Some say NVIDIA drivers have a special code
> path that speeds up DLs compared to VBOs. Some say it's all about the
> number or size of your primitive sets.
>
> Obviously VBOs are the future since DLs are deprecated. So assembling some
> best practices for their application is critical!
Hmm, that depends on the use, I think. I have seen everything from massive
improvements to massive drops.

Well, the IMO well-known optimizations still apply. Use medium-sized draws
that do not use extremely large vbos. Avoid polygons and strips unless you
have primitive restart, as each polygon or strip otherwise ends up being a
separate draw. If you do have primitive restart, prefer strips for the good
old reasons that were already a valid optimization on an SGI (a minimal GL
sketch is at the end of this mail).

Now you can start a long series of arguments about what "medium" means.
There are at least two competing needs for the buffer size. For the draw
setup in the driver, the bigger the draw, the better: as a rule of thumb,
each gl*Draw* call needs a more or less fixed amount of cpu time to push the
draw command into the fifo interpreted by the gpu. So fewer draw calls means
less cpu consumption.

But there is memory management involved, which makes this much more
unpredictable. You need to distinguish between gpus that have virtual memory
addresses and older ones that do not. Those without suffer a lot from memory
fragmentation, and the bigger the buffers are, the more this happens. But
even with a newer gpu that has a virtual memory space, I think all of you
are aware that getting memory management right in every use case is hard to
impossible. With that kind of algorithm you are again wasting cpu time on
puzzling the memory blobs into limited gpu-accessible memory. So in the end
you really help the driver if you do not just max out the buffer size.

In the end this really depends on the application in combination with the
gpu. For sensible models that display things at just the level of detail the
display/eye can resolve, I still tend to stick with a limit of less than
about 65,000 vertices per draw. That uses less memory for the indices, since
you can use unsigned shorts there (also sketched at the end of this mail).
And if you have a model that is right on the border between looking fine and
looking edgy, you already have very large geometries to show, which produce
a lot of fragments, so you are more likely fragment bound than bound by
anything at the top of the pipeline like the cpu time to set up the draw.

This could be different for CAD-like models as you see them in engineering
applications, where you can easily have millions of vertices for a small
part. If you do this with small buffers you can already hit the draw-setup
bottleneck, so there it might make sense to use bigger chunks.

So, no hard numbers to offer, just effects that I believe I have observed
over time across different drivers and hardware...

Regarding vbos in osg, I think that neither vbos nor display lists is
definitely better; it's more that they are different. Where one dlist
compiler hits a code path that ends up with a lot of draws in the driver,
which your application code could have anticipated much better, you will
also see a dlist compiler on a different driver/gpu/application combination
where you cannot see how you would need to issue the draws to reach the
performance of the dlist. The same goes for the memory management stuff.

I assume that most applications are currently optimized for the default, and
if that default changes, you will see the optimizations move over to the new
default over time.
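To try both code paths on your own scenes, a minimal sketch; the helper
function and its name are mine, but the two setters are the stock
osg::Drawable API:

    #include <osg/Geometry>

    // Minimal sketch: flip one osg::Geometry between the display list
    // and the VBO code path, so both can be benchmarked on a given
    // driver/gpu combination with everything else unchanged.
    void useVBOPath(osg::Geometry* geom, bool useVBO)
    {
        geom->setUseDisplayList(!useVBO);        // dlist path on/off
        geom->setUseVertexBufferObjects(useVBO); // vbo path on/off
    }

Setting both flags explicitly avoids depending on which one the drawable's
draw implementation checks first.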
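For the primitive restart remark above, a minimal GL sketch. It assumes a
GL 3.1+ context and an already bound element array buffer of unsigned short
indices; the function and 'indexCount' are just for illustration:

    #include <GL/glew.h> // any loader exposing GL 3.1 entry points will do

    // Draw many triangle strips with a single glDrawElements call by
    // separating the strips with a restart sentinel in the index buffer.
    // 'indexCount' is the total index count including the sentinels.
    void drawStripsWithRestart(GLsizei indexCount)
    {
        glEnable(GL_PRIMITIVE_RESTART);
        glPrimitiveRestartIndex(0xFFFF); // sentinel ends one strip, starts the next
        glDrawElements(GL_TRIANGLE_STRIP, indexCount, GL_UNSIGNED_SHORT, 0);
        glDisable(GL_PRIMITIVE_RESTART);
    }

So what used to be one draw call per strip collapses into a single draw.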
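And for the ~65,000 vertex limit: the point is that a chunk staying below
65536 vertices can use 16 bit indices. A sketch in osg terms, with the
surrounding vertex array and geometry setup omitted:

    #include <osg/Geometry>
    #include <osg/PrimitiveSet>
    #include <vector>

    // Sketch: build the index set for one mesh chunk as a
    // DrawElementsUShort (2 bytes per index) instead of a
    // DrawElementsUInt. The narrowing cast is only valid because
    // the chunk is kept below 65536 vertices.
    osg::ref_ptr<osg::DrawElementsUShort>
    makeTriangleIndices(const std::vector<unsigned int>& indices)
    {
        osg::ref_ptr<osg::DrawElementsUShort> de =
            new osg::DrawElementsUShort(GL_TRIANGLES);
        de->reserve(indices.size());
        for (unsigned int i = 0; i < indices.size(); ++i)
            de->push_back(static_cast<unsigned short>(indices[i]));
        return de;
    }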
Greetings

Mathias
