This isn't directly OSG, but since almost all of OSG's platforms have at least an X86
variant, I thought it might be of interest.
I've been reading AMD's optimization guide (which addresses 32-bit and 64-bit
optimizations, as well as optimizations that are generally applicable to all X86 CPUs,
not just AMD):
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF
Chapter 5 deals with memory and cache.
5.2 recommends aligning data members on their natural alignments. Dynamic memory
allocations are typically already aligned on a known boundary -- accounts seem to disagree
on whether this is 32-bit (long/float) or 64-bit (long long/double) alignment. It may be
OS-dependent.
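Rather than rely on conflicting accounts, the boundary can simply be measured on the platform in question. Here is a minimal sketch of how I'd check it; the helper names (isAligned, largestAlignment, reportMallocAlignment) are my own and not from the AMD guide or OSG. As far as I know, the C++ standard only promises that malloc returns memory suitably aligned for any fundamental type, so whatever this prints is specific to the allocator and OS it runs on.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// True if 'address' is a multiple of 'alignment' (alignment must be nonzero).
constexpr bool isAligned(std::uintptr_t address, std::size_t alignment)
{
    return address % alignment == 0;
}

// Largest power-of-two boundary, capped at 'limit', that 'address' sits on.
inline std::size_t largestAlignment(std::uintptr_t address, std::size_t limit = 4096)
{
    std::size_t boundary = 1;
    while (boundary < limit && isAligned(address, boundary * 2))
        boundary *= 2;
    return boundary;
}

// Allocates a few oddly sized blocks and prints the boundary each one landed on.
// A single run proves nothing about guarantees; it only shows what this allocator
// happened to do.
inline void reportMallocAlignment()
{
    const std::size_t sizes[] = {1, 3, 7, 13, 24, 100};
    for (std::size_t size : sizes) {
        void* block = std::malloc(size);
        if (!block)
            continue;
        std::printf("malloc(%lu) landed on a %lu-byte boundary\n",
                    static_cast<unsigned long>(size),
                    static_cast<unsigned long>(
                        largestAlignment(reinterpret_cast<std::uintptr_t>(block))));
        std::free(block);
    }
}
```

Calling reportMallocAlignment() from main() on each target OS would show whether the smallest boundary observed is 4, 8 or something larger there.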
5.5 suggests that misalignment can defeat the Store-to-Load forwarding mechanism, and
that mechanism is one of the main cures for the X86-32 CPU's terrible shortage of
registers.
5.11 suggests reordering the members of structs/classes by the size of their base
types, largest first -- doubles, then floats/longs, shorts, bytes -- to avert this
misalignment (using padding where necessary).
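To make the reordering concrete, here is a small sketch with two made-up structs holding the same members (Mixed and Sorted are hypothetical, not taken from OSG or the AMD guide). The padding comments describe what I'd expect on common x86 compilers; the exact numbers depend on the ABI, so the code prints the sizes instead of assuming them.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Members in "logical" declaration order. The compiler usually has to insert
// padding before 'value' and 'weight' to keep them on their natural boundaries.
struct Mixed {
    std::uint8_t  flag;
    double        value;
    std::uint16_t id;
    float         weight;
    std::uint8_t  kind;
};

// Same members sorted largest first: double, float, short, then bytes.
// Each member now falls on its natural boundary without interior padding.
struct Sorted {
    double        value;
    float         weight;
    std::uint16_t id;
    std::uint8_t  flag;
    std::uint8_t  kind;
};

// Prints the size of both layouts and where 'value' ended up in each.
inline void printLayouts()
{
    std::printf("Mixed:  size %lu, value at offset %lu\n",
                static_cast<unsigned long>(sizeof(Mixed)),
                static_cast<unsigned long>(offsetof(Mixed, value)));
    std::printf("Sorted: size %lu, value at offset %lu\n",
                static_cast<unsigned long>(sizeof(Sorted)),
                static_cast<unsigned long>(offsetof(Sorted, value)));
}
```

Sorting does not change what the struct holds, only how much padding the compiler needs to add, so an array of Sorted should never take more memory than an array of Mixed.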
Has anyone gone this route? Using AMD's CodeAnalyst tool for Windows, one would come to
the conclusion that a lot of time is spent in CPU pipeline stalls. Is this an effective
code optimization, or is it a lot of work for very little benefit on a codebase the size
of OSG?
--
Chris 'Xenon' Hanson aka Eric Hammil | http://www.3DNature.com/ eric at logrus
"I set the wheels in motion, turn up all the machines, activate the programs,
and run behind the scenes. I set the clouds in motion, turn up light and sound,
activate the window, and watch the world go 'round." -Prime Mover, Rush.