I heard that the Intel C++ compiler is able to optimize even better.
Furthermore, profiling first is a good approach. Maybe it would be reasonable to compare profiling data for the Math/Vector/Matrix classes with and without compiler optimizations and see whether some bottlenecks disappear when the optimizations are enabled.

I agree 100%, as that is the first thing I did. For the Matrixf mult I got a 50% improvement with aligned data and 35% with unaligned data. For Invert4x4 I got an 80% improvement with aligned data and 70% with unaligned data. I've submitted this code because these functions accounted for the most time in the profiles of our game.
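To illustrate where the aligned/unaligned difference comes from, here is a minimal sketch of a 4x4 float matrix multiply with SSE intrinsics. It is not the submitted OSG code; the function names `mult4x4SSE` and `isAligned16` are hypothetical, and it assumes row-major storage with out = a * b. The only difference between the two variants is `_mm_load_ps`/`_mm_store_ps` (which require 16-byte alignment) versus `_mm_loadu_ps`/`_mm_storeu_ps` (which accept any address).

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>  // SSE intrinsics (__m128, _mm_load_ps, _mm_loadu_ps, ...)

// Returns true if p sits on a 16-byte boundary, as _mm_load_ps/_mm_store_ps require.
inline bool isAligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical helper (not OSG's Matrixf::mult): out = a * b for row-major
// 4x4 float matrices. Each output row is a weighted sum of the rows of b,
// using the matching row of a as weights, so every row costs four
// broadcast-multiply-add steps on 128-bit registers.
template <bool Aligned>
void mult4x4SSE(const float* a, const float* b, float* out)
{
    if (Aligned) {
        // The aligned loads/stores fault on misaligned pointers; catch that in debug builds.
        assert(isAligned16(b) && isAligned16(out));
    }
    // Load the four rows of b once; they are reused for every output row.
    const __m128 b0 = Aligned ? _mm_load_ps(b + 0)  : _mm_loadu_ps(b + 0);
    const __m128 b1 = Aligned ? _mm_load_ps(b + 4)  : _mm_loadu_ps(b + 4);
    const __m128 b2 = Aligned ? _mm_load_ps(b + 8)  : _mm_loadu_ps(b + 8);
    const __m128 b3 = Aligned ? _mm_load_ps(b + 12) : _mm_loadu_ps(b + 12);

    for (int i = 0; i < 4; ++i) {
        const float* row = a + 4 * i;  // scalars only, so a needs no alignment
        __m128 r = _mm_mul_ps(_mm_set1_ps(row[0]), b0);
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(row[1]), b1));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(row[2]), b2));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(row[3]), b3));
        if (Aligned) _mm_store_ps(out + 4 * i, r);
        else         _mm_storeu_ps(out + 4 * i, r);
    }
}
```

Calling `mult4x4SSE<true>` on 16-byte-aligned storage and `mult4x4SSE<false>` otherwise lets both paths be benchmarked against a plain C++ loop, which is the comparison described above.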

While I am here: whatever we do, I think CMake should have an option to compile with SSE, and we should provide alternative C code for those who do not want it. At work, we used one technique back when SSE2 was only available on some machines: the main loop did the bulk of the work with SSE2, and a remainder loop finished the work in C. For machines without SSE2 we could then macro out the main loop, so execution fell through to the remainder code, which then did the entire loop. I believe the time for an SSE vs. SSE2 distinction has passed, so either a machine supports SSE2 or it uses the C version. It should go without saying that anyone who writes SSE/SSE2 code has tested it against the C code and seen a significant performance gain before proposing it for use.
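The main-loop/remainder-loop technique can be sketched as follows. This is an illustration only: the macro `USE_SSE2` and the function `axpy` are hypothetical names (the macro would be set by something like a CMake option), and the operation out[i] = s * x[i] + y[i] on doubles is just a stand-in for real work. When `USE_SSE2` is not defined, the SSE2 loop disappears and the plain C++ remainder loop processes every element.

```cpp
#include <cassert>
#include <cstddef>
#if defined(USE_SSE2)      // hypothetical switch, e.g. driven by a CMake option
#include <emmintrin.h>     // SSE2 intrinsics (__m128d, _mm_loadu_pd, ...)
#endif

// out[i] = s * x[i] + y[i] for i in [0, n)
void axpy(double s, const double* x, const double* y, double* out, std::size_t n)
{
    assert(n == 0 || (x != nullptr && y != nullptr && out != nullptr));
    std::size_t i = 0;
#if defined(USE_SSE2)
    // Main loop: two doubles per iteration in 128-bit SSE2 registers.
    const __m128d vs = _mm_set1_pd(s);
    for (; i + 2 <= n; i += 2) {
        const __m128d vx = _mm_loadu_pd(x + i);
        const __m128d vy = _mm_loadu_pd(y + i);
        _mm_storeu_pd(out + i, _mm_add_pd(_mm_mul_pd(vs, vx), vy));
    }
#endif
    // Remainder loop in plain C++: finishes the leftover element, or does the
    // whole job when the SSE2 main loop has been compiled out.
    for (; i < n; ++i)
        out[i] = s * x[i] + y[i];
}
```

Because the scalar loop is always compiled, it doubles as the reference implementation that the SSE2 path should be tested and timed against.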




James Killian
----- Original Message ----- From: "Benjamin Eikel" <[EMAIL PROTECTED]>
To: "OpenSceneGraph Users" <[email protected]>
Sent: Tuesday, July 29, 2008 7:28 AM
Subject: Re: [osg-users] Using SSE within OSG


On Tuesday, 29 July 2008 14:04:59, David Spilling wrote:
Dear All,
[...]
Any other suggestions?

*Question 3: (possibly the biggest) Should the core OSG include SSE?*
There are several downsides to including SSE. Firstly, cross-platform provision
of SSE may be tricky because of the way different compilers declare aligned
data, and how SSE instructions are used within the code. I personally don't
have much experience here, so any feedback on cross-platform issues is useful.
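One common way to hide the compiler differences in declaring aligned data is a small macro. The sketch below is an assumption, not an existing OSG macro: `ALIGN16`, `Vec4fAligned`, `isAddressAligned16` and `assertAligned16` are hypothetical names. MSVC uses `__declspec(align(16))`, GCC-compatible compilers use `__attribute__((aligned(16)))`, and C++11 compilers also accept `alignas(16)`. All three work in the `struct ALIGN16 Name` position. Heap allocations are a separate problem (e.g. `_mm_malloc`/`_mm_free`) and are not covered here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical portability macro; the name ALIGN16 is illustrative only.
#if defined(_MSC_VER)
#  define ALIGN16 __declspec(align(16))
#elif defined(__GNUC__)
#  define ALIGN16 __attribute__((aligned(16)))
#else
#  define ALIGN16 alignas(16)
#endif

// A 4-float vector whose instances start on a 16-byte boundary, so its data
// can be read with the aligned SSE load (_mm_load_ps).
struct ALIGN16 Vec4fAligned
{
    float v[4];
};

static_assert(alignof(Vec4fAligned) == 16, "Vec4fAligned must be 16-byte aligned");

inline bool isAddressAligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Debug-build check to place before an aligned SSE load or store.
inline void assertAligned16(const void* p)
{
    assert(isAddressAligned16(p) && "pointer is not 16-byte aligned");
}
```

Stack and static instances (including arrays of them) then get the required alignment from the compiler, with no changes at the call sites.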

Secondly, code readability drops, and the "use the source" argument may
be weaker when many readers don't know much SSE.
Hello David,

May I suggest that you check the assembler code that the compilers generate when compiling the OSG code? I have not done this for OSG, but I did it for another project some time ago, where I tried to optimize the performance of compositing depth-buffer-attached images for sort-last rendering. I was not able to do much better than the compiler. In some rare cases my routines were faster, but most of the time the compiler won. The code generated by compilers takes so many things into account (e.g. branch prediction by the processor, code reordering) that it is quite hard for a human programmer to beat it.
For example, if you use g++ with -march=core2 -O3 (see the man page for a description of these parameters), the compiler automatically uses SSE, SSE2 and other instruction-set extensions available on the target (such as 3DNow! on AMD targets). In some cases the compiler generates much better assembler code
than a typical programmer would. There are some cases, though, where manual
improvements could yield better results.
I heard that the Intel C++ compiler is able to optimize even better.
Furthermore, profiling first is a good approach. Maybe it would be reasonable to compare profiling data for the Math/Vector/Matrix classes with and without compiler optimizations and see whether some bottlenecks disappear when the optimizations are enabled.

Regards,
Benjamin


So, your opinions, experience and suggestions are welcome!

David


_______________________________________________
osg-users mailing list
[email protected]
http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org

