This isn't directly about OSG, but since almost all of OSG's platforms at least have an X86 variant, I thought it might be of interest.

I've been reading AMD's optimization guide (which addresses 32-bit and 64-bit optimizations, as well as optimizations that are generally applicable to all X86 CPUs, not just AMD):

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF

  Chapter 5 deals with memory and cache.

5.2 recommends aligning data members on their natural alignments. Dynamic memory allocations are typically already aligned on a known boundary -- accounts seem to disagree on whether this is 32-bit (long/float) or 64-bit (long long/double) alignment. It may be OS-dependent.
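
Since the allocation boundary seems to vary, one way to settle it on a given platform is simply to measure it. Below is a minimal sketch, assuming a C++11 compiler; the helper names isAligned and observedAlignment are made up for illustration and are not part of OSG or the AMD guide. The standard only promises that malloc returns memory suitably aligned for any fundamental type (alignof(std::max_align_t)); the actual boundary beyond that is up to the C runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: true if p sits on a multiple of 'alignment' bytes.
bool isAligned(const void* p, std::size_t alignment) {
    assert(alignment != 0);
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical helper: largest power of two that divides the address,
// i.e. the alignment this particular pointer happens to have.
std::size_t observedAlignment(const void* p) {
    const std::uintptr_t a = reinterpret_cast<std::uintptr_t>(p);
    return static_cast<std::size_t>(a & (~a + 1)); // isolate lowest set bit
}

// Allocates a few blocks and prints the alignment each one received.
// The result is specific to the runtime it is run on.
void reportMallocAlignment() {
    const std::size_t sizes[] = {8, 24, 64, 1000};
    for (std::size_t n : sizes) {
        void* p = std::malloc(n);
        if (p == nullptr) continue;
        std::printf("malloc(%lu): %lu-byte boundary, ok for double: %s\n",
                    static_cast<unsigned long>(n),
                    static_cast<unsigned long>(observedAlignment(p)),
                    isAligned(p, alignof(double)) ? "yes" : "no");
        std::free(p);
    }
}
```

Calling reportMallocAlignment() once on each target OS would show whether the 32-bit or 64-bit claim holds there. Note that a single pointer can land on a larger boundary than the allocator guarantees, so several allocations are worth checking.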

5.5 suggests that misalignment can defeat the Store-to-Load forwarding mechanism, which is one of the main remedies for the X86-32 CPU's terrible shortage of registers.

5.11 suggests reordering the members of structs/classes by the size of their primitive types -- doubles first, then floats/longs, then shorts, then bytes -- to avert this misalignment (using padding where necessary).
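
To make the 5.11 suggestion concrete, here is a small sketch of the same members declared in arbitrary order and then largest-first. Both structs are hypothetical and not taken from OSG. The exact sizes depend on the compiler and ABI, so the code reports them rather than assuming them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical record with members in "as they came to mind" order.
// The compiler has to insert padding before 'distance' and 'weight'
// so that each member lands on its natural alignment.
struct Unordered {
    std::uint8_t  flag;
    double        distance;
    std::uint16_t id;
    float         weight;
    std::uint8_t  kind;
};

// Same members, sorted largest-first as 5.11 suggests:
// doubles, then floats/longs, then shorts, then bytes.
struct Ordered {
    double        distance;
    float         weight;
    std::uint16_t id;
    std::uint8_t  flag;
    std::uint8_t  kind;
};

// Bytes of actual data in either struct, ignoring padding.
constexpr std::size_t kPayloadBytes =
    sizeof(double) + sizeof(float) + sizeof(std::uint16_t) + 2 * sizeof(std::uint8_t);

// Prints the size and padding overhead of both layouts.
void reportLayouts() {
    std::printf("payload:   %lu bytes\n", static_cast<unsigned long>(kPayloadBytes));
    std::printf("Unordered: %lu bytes (%lu padding)\n",
                static_cast<unsigned long>(sizeof(Unordered)),
                static_cast<unsigned long>(sizeof(Unordered) - kPayloadBytes));
    std::printf("Ordered:   %lu bytes (%lu padding)\n",
                static_cast<unsigned long>(sizeof(Ordered)),
                static_cast<unsigned long>(sizeof(Ordered) - kPayloadBytes));
    assert(offsetof(Ordered, distance) % alignof(double) == 0);
}
```

On the compilers I'd expect most people to use, the reordered version should come out smaller, since no padding is needed between members when they are sorted this way. That also means more objects per cache line, which is a separate benefit from the alignment itself.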


Has anyone gone this route? Using AMD's CodeAnalyst tool for Windows, one would come to the conclusion that a lot of time is spent in CPU pipeline stalls. Is this an effective code optimization, or is it a lot of work for very little benefit on a codebase the size of OSG?

--
Chris 'Xenon' Hanson aka Eric Hammil | http://www.3DNature.com/ eric at logrus
 "I set the wheels in motion, turn up all the machines, activate the programs,
  and run behind the scenes. I set the clouds in motion, turn up light and sound,
  activate the window, and watch the world go 'round." -Prime Mover, Rush.
_______________________________________________
osg-users mailing list
[email protected]
http://openscenegraph.net/mailman/listinfo/osg-users
http://www.openscenegraph.org/
