Hi all,
I was curious and tried the modifications with the trunk version of osg with
our viewer on a few of our biggest scenes, and there was no measurable
performance difference.
By chance I profiled our viewer with CodeXL a few days ago on a build in
RelWithDebInfo mode. This was done on an almost static scene without moving the
camera. Because I have an Intel CPU I could only do "time-based sampling"
profiling, and I don't know how accurate this is, but for your information, the
hottest functions, together accounting for 25% of CPU time, were as follows
(this only shows the time spent in each function itself, not in the functions
that are called (and not inlined) from it):
Code:
Function, Samples, % of Hotspot Samples, Module
osg::Group::traverse(class osg::NodeVisitor &), 3437, 6.19692%, osg112-osgrd.dll
OpenThreads::Atomic::operator--(void), 2847, 5.13315%, ot20-OpenThreadsrd.dll
OpenThreads::Atomic::operator++(void), 2837, 5.11512%, ot20-OpenThreadsrd.dll
osg::Plane::transformProvidingInverse(class osg::Matrixd const &), 2595, 4.67879%, osg112-osgrd.dll
osgUtil::StateGraph::find_or_insert(class osg::StateSet const *), 1918, 3.45816%, osg112-osgUtilrd.dll
The line-by-line breakdown for find_or_insert:
Code:
Line, Address, Source Code, Code Bytes, Hotspot Samples, % of Hotspot Samples, Timer
175, , inline StateGraph* find_or_insert(const osg::StateSet* stateset), , , , ,
176, 0x7feed81afb0, {, , 214, 11.1575, 214,
177, , // search for the appropriate state group, return it if found., , , , ,
178, 0x7feed81afcb, ChildList::iterator itr = _children.find(stateset);, , 1658, 86.4442, 1658,
179, 0x7feed81b016, if (itr!=_children.end()) return itr->second.get();, , 42, 2.18978, 42,
180, , , , , , ,
181, , // create a state group and insert it into the children list, , , , ,
182, , // then return the state group., , , , ,
183, 0x7feed81b021, StateGraph* sg = new StateGraph(this,stateset);, , , , ,
184, 0x7feed81b04c, _children[stateset] = sg;, , , , ,
185, 0x7feed81b081, return sg;, , , , ,
186, 0x7feed81b084, }, , 4, 0.208551, 4,
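To make the shape of that hotspot easier to see outside of osg, here is a minimal sketch of the same find-then-insert pattern on an ordered map keyed by pointer. The StateSet and StateGraph types below are hypothetical stand-ins, not the real osg classes, and std::unique_ptr replaces whatever ownership the real code uses. The sketch also inserts with emplace instead of the operator[] seen at line 184. It only illustrates that on a hit the whole cost is the map lookup, which would be consistent with line 178 collecting about 86% of the samples.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>

// Hypothetical stand-in for osg::StateSet (assumption, not the real class).
struct StateSet {};

// Hypothetical stand-in for osgUtil::StateGraph, reduced to the lookup logic.
class StateGraph {
public:
    using ChildList = std::map<const StateSet*, std::unique_ptr<StateGraph>>;

    StateGraph(StateGraph* parent, const StateSet* stateset)
        : _parent(parent), _stateset(stateset) {}

    StateGraph* find_or_insert(const StateSet* stateset) {
        // Ordered-map lookup: O(log n) pointer comparisons, each one
        // following a tree node that may not be in cache.
        ChildList::iterator itr = _children.find(stateset);
        if (itr != _children.end()) return itr->second.get();

        // Miss: create the child and store it under the same key.
        auto result = _children.emplace(
            stateset, std::make_unique<StateGraph>(this, stateset));
        return result.first->second.get();
    }

    std::size_t childCount() const { return _children.size(); }
    StateGraph* parent() const { return _parent; }
    const StateSet* stateSet() const { return _stateset; }

private:
    StateGraph* _parent;
    const StateSet* _stateset;
    ChildList _children;
};
```

In an almost static scene nearly every call should take the early-return path, so the insert branch (lines 183 to 185 in the listing) showing no samples is what one would expect.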
osgUtil::StateGraph::prune and moveStateGraph are far down
in the list, with only 0.2% of samples measured.
Note that in the osg trunk a Geode is a Group, and we have a lot of Geodes with
one drawable each in our scene (we need to be able to move all the objects in
real time). This is not yet optimized out because we use the stable version of
osg for our clients. That is why the Group::traverse function is at the top.
I think that the ++ and -- atomic operators come from all the ref matrices that
are pushed and popped during traversal, but I'm not sure. CodeXL will not
show the call stack for those.
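If that guess is right, the mechanism would look roughly like the sketch below: every reference-counted handle that is pushed costs one atomic increment, and every pop costs one atomic decrement. Counted, Handle and simulateTraversal are hypothetical names made up for this illustration. They are not osg::Referenced, osg::ref_ptr or OpenThreads::Atomic, and the sketch never deletes the object when the count reaches zero. It only counts the atomic operations for a given number of push/pop pairs.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference-counted base (assumption, not osg::Referenced).
class Counted {
public:
    void ref() const { ++_refCount; ++_ops; }    // atomic increment
    void unref() const { --_refCount; ++_ops; }  // atomic decrement
    int refCount() const { return _refCount.load(); }
    long atomicOps() const { return _ops; }

private:
    mutable std::atomic<int> _refCount{0};
    mutable long _ops = 0;  // bookkeeping for the demo only, not thread-safe
};

// Hypothetical smart handle (assumption, not osg::ref_ptr). It does not
// delete the object; the caller keeps ownership.
template <typename T>
class Handle {
public:
    explicit Handle(T* ptr) : _ptr(ptr) { if (_ptr) _ptr->ref(); }
    Handle(const Handle& other) : _ptr(other._ptr) { if (_ptr) _ptr->ref(); }
    Handle& operator=(const Handle&) = delete;
    ~Handle() { if (_ptr) _ptr->unref(); }
    T* get() const { return _ptr; }

private:
    T* _ptr;
};

// Pushes `depth` handles to the same object and pops them again, returning
// how many atomic operations that caused.
long simulateTraversal(const Counted& matrix, std::size_t depth) {
    const long before = matrix.atomicOps();
    std::vector<Handle<const Counted>> stack;
    stack.reserve(depth);  // avoid reallocation copies skewing the count
    for (std::size_t i = 0; i < depth; ++i) stack.emplace_back(&matrix);
    while (!stack.empty()) stack.pop_back();
    return matrix.atomicOps() - before;
}
```

Under these assumptions the number of increments and decrements is the same, which would fit the nearly equal sample counts for operator++ (2837) and operator-- (2847) in the table above.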
Cheers,
Pjotr
------------------
Read this topic online here:
http://forum.openscenegraph.org/viewtopic.php?p=60098#60098