I heard that the Intel C++ compiler is able to optimize even better.
Furthermore the use of profiling first is a good approach. Maybe it would be
reasonable to compare profiling data of the Math/Vector/Matrix classes with
and without compiler optimizations and see if some bottlenecks disappear when
using the optimizations.
I agree with that 100%, as it is the first thing I did. For the Matrixf
multiply I got a 50% improvement with aligned data and 35% with unaligned
data. For Invert4x4 I got an 80% improvement with aligned data and 70% with
unaligned data. I've submitted this code, as these routines were where the
most time was spent in the profiles of our game.
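To illustrate the kind of routine being measured, here is a minimal sketch of a 4x4 matrix multiply that uses aligned SSE loads. It is not the submitted OSG code: `Mat4`, `mat4_mul` and `SKETCH_USE_SSE` are hypothetical names, the layout is assumed to be row-major, and `alignas` assumes a C++11 compiler. `_mm_load_ps` requires 16-byte-aligned data; for unaligned data one would use `_mm_loadu_ps` and `_mm_storeu_ps` instead.

```cpp
#include <cassert>

// Assumption: these predefined macros indicate that SSE code generation is on.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#define SKETCH_USE_SSE 1
#include <xmmintrin.h>
#endif

// Hypothetical row-major 4x4 float matrix, aligned to 16 bytes.
struct alignas(16) Mat4 {
    float m[16];
};

// Computes out = a * b. The output must not be one of the inputs.
void mat4_mul(Mat4& out, const Mat4& a, const Mat4& b)
{
    assert(&out != &a && &out != &b);
#ifdef SKETCH_USE_SSE
    // Aligned loads of the four rows of b; valid because Mat4 is alignas(16).
    const __m128 b0 = _mm_load_ps(b.m + 0);
    const __m128 b1 = _mm_load_ps(b.m + 4);
    const __m128 b2 = _mm_load_ps(b.m + 8);
    const __m128 b3 = _mm_load_ps(b.m + 12);
    for (int i = 0; i < 4; ++i) {
        // Row i of the result is a linear combination of the rows of b.
        __m128 r = _mm_mul_ps(_mm_set1_ps(a.m[4 * i + 0]), b0);
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(a.m[4 * i + 1]), b1));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(a.m[4 * i + 2]), b2));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(a.m[4 * i + 3]), b3));
        _mm_store_ps(out.m + 4 * i, r);
    }
#else
    // Plain C++ fallback with the same summation order.
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            float s = 0.0f;
            for (int k = 0; k < 4; ++k) {
                s += a.m[4 * i + k] * b.m[4 * k + j];
            }
            out.m[4 * i + j] = s;
        }
    }
#endif
}
```

Any speed-up from a routine like this would have to be confirmed by profiling against the plain C++ path on the target machines.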
While I am here: whatever we do, I think CMake should have an option to
compile using SSE, and we should provide alternative C code for those who do
not want it. One technique we used at work dates from the time when SSE2 was
available on only some machines: we wrote a main loop that did the bulk of
the work in SSE2 and a remainder loop that finished the work in C code. We
could then macro out the main loop for those who didn't have SSE2, so
everything fell through to the remainder code, which then did the entire loop.
I believe the time for distinguishing between SSE and SSE2 has passed, so
either someone can support SSE2, or they use the C code version. It should be
understood that people who write SSE/SSE2 code have tested it against the C
code and have seen a significant performance gain before considering its use.
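The main-loop/remainder-loop technique described above could look roughly like the following sketch. The names `add_arrays` and `SKETCH_USE_SSE2` are hypothetical, and the check of predefined compiler macros is an assumption; in a CMake build the macro would more likely be set by a build option.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: these predefined macros indicate that SSE2 code generation is on.
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#define SKETCH_USE_SSE2 1
#include <emmintrin.h>
#endif

// Hypothetical helper: dst[i] = a[i] + b[i] for i in [0, n).
void add_arrays(float* dst, const float* a, const float* b, std::size_t n)
{
    assert(n == 0 || (dst && a && b));
    std::size_t i = 0;
#ifdef SKETCH_USE_SSE2
    // Main loop: four floats per iteration, using unaligned loads and stores.
    for (; i + 4 <= n; i += 4) {
        const __m128 va = _mm_loadu_ps(a + i);
        const __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
#endif
    // Remainder loop: finishes the tail, or does the whole array when the
    // main loop is compiled out.
    for (; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}
```

With the macro undefined, `i` is still 0 when the remainder loop starts, so the plain C++ path handles every element without any separate code.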
James Killian
----- Original Message -----
From: "Benjamin Eikel" <[EMAIL PROTECTED]>
To: "OpenSceneGraph Users" <[email protected]>
Sent: Tuesday, July 29, 2008 7:28 AM
Subject: Re: [osg-users] Using SSE within OSG
On Tuesday, 29 July 2008 14:04:59, David Spilling wrote:
Dear All,
[...]
Any other suggestions?
*Question 3 : (possibly the biggest) Should the core OSG include SSE?*
There are several downsides to including SSE. Firstly, x-platform provision
of SSE may be tricky due to the way different compilers define aligned
data, and how SSE instructions are used within the code. I personally don't
have much experience here, so any feedback on x-platform issues is useful.
Secondly, the code readability drops, and the "use the source" argument may
be trickier when many might not know much SSE.
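As an illustration of the aligned-data issue, the sketch below wraps the compiler-specific alignment syntax in one macro. `SKETCH_ALIGN16`, `Vec4` and `is_aligned16` are hypothetical names, not part of OSG; the sketch assumes MSVC's `__declspec(align(16))` and the GCC/Clang `__attribute__((aligned(16)))` syntax, and falls back to C++11 `alignas` elsewhere.

```cpp
#include <cstdint>

// One macro hides the compiler-specific way of requesting 16-byte alignment.
#if defined(_MSC_VER)
#define SKETCH_ALIGN16 __declspec(align(16))
#elif defined(__GNUC__)
#define SKETCH_ALIGN16 __attribute__((aligned(16)))
#else
#define SKETCH_ALIGN16 alignas(16)  // assumes a C++11 compiler
#endif

// Hypothetical 4-float vector type suitable for aligned SSE loads.
struct SKETCH_ALIGN16 Vec4 {
    float v[4];
};

// Returns true when p sits on a 16-byte boundary.
inline bool is_aligned16(const void* p)
{
    return (reinterpret_cast<std::uintptr_t>(p) % 16) == 0;
}
```

A check such as `is_aligned16` can be useful in debug builds before calling a routine that relies on aligned loads.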
Hello David,
May I suggest that you check the assembler code that the compilers create
when compiling the OSG code? I have not done this for the OSG code, but I did
for another project some time ago. There I tried to optimize the performance
of composing depth-buffer attached images for sort-last rendering. Somehow I
was not able to do much better than the compiler. In some rare cases my
procedures were faster, but most of the time the compiler was the winner.
The code created by the compilers takes so many things into account (e.g.
branch prediction by the processor, code reordering) that it is quite hard
for a human programmer to beat them.
For example, if you use g++ with -march=core2 -O3 (see the man page for a
description of the parameters), the compiler automatically uses SSE, SSE2 and
similar instructions where it can (or 3DNow! when targeting AMD processors
that support it). In some cases the compiler generates much better assembler
code than a normal programmer would. There are some cases, though, where
manual improvements could yield better results.
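As a small experiment along these lines, a loop like the one below is a typical candidate for automatic vectorization. The function name `scale_add` is hypothetical, and `__restrict` is a compiler extension accepted by g++, Clang and MSVC. Compiling it with something like `g++ -O3 -march=core2 -S` and looking for packed instructions such as `mulps` and `addps` in the generated assembler shows whether the compiler vectorized it; the result depends on the compiler version and flags.

```cpp
#include <cassert>
#include <cstddef>

// Computes y[i] += s * x[i]. __restrict promises the compiler that the two
// buffers do not overlap, which makes vectorization easier to prove safe.
void scale_add(float* __restrict y, const float* __restrict x, float s,
               std::size_t n)
{
    // Only the most obvious violation of the no-overlap promise is checked.
    assert(n == 0 || y != x);
    for (std::size_t i = 0; i < n; ++i) {
        y[i] += s * x[i];
    }
}
```

Comparing the assembler with and without `-O3` gives a quick first impression before any hand-written SSE is considered.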
I heard that the Intel C++ compiler is able to optimize even better.
Furthermore the use of profiling first is a good approach. Maybe it would be
reasonable to compare profiling data of the Math/Vector/Matrix classes with
and without compiler optimizations and see if some bottlenecks disappear when
using the optimizations.
Regards,
Benjamin
So - your opinion, experience and suggestions welcome!
David
_______________________________________________
osg-users mailing list
[email protected]
http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org