Hi All,
Regarding question 2:
Wouldn't it be possible to choose between different builds of the
OSG DLLs at load time?
There would be two versions of the DLLs, one with the
SSE optimizations and one with the straightforward code.
Some years ago I saw games that loaded different versions of their
DLLs depending on the machine the program was run on.
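A rough sketch of how that runtime choice might look, assuming an x86
build with MSVC, GCC or Clang. The base name "osgMath" and the "_sse" /
"_plain" suffixes are made up for illustration and are not real OSG
binaries.

```cpp
#include <cassert>
#include <string>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#endif

// Returns true if the CPU we are running on reports SSE support.
inline bool cpuHasSse()
{
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int info[4] = {0, 0, 0, 0};
    __cpuid(info, 1);                    // CPUID leaf 1: feature flags
    return (info[3] & (1 << 25)) != 0;   // EDX bit 25 is the SSE flag
#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse") != 0;
#else
    return false;  // unknown compiler or non-x86 target: take the plain build
#endif
}

// Builds the name of the library variant to load.
// The suffixes are hypothetical, chosen only for this sketch.
inline std::string pickMathLibrary(const std::string& baseName, bool hasSse)
{
    assert(!baseName.empty());
    return baseName + (hasSse ? "_sse" : "_plain");
}
```

The application would call pickMathLibrary("osgMath", cpuHasSse()) once
at startup and pass the result to LoadLibrary on Windows or dlopen
elsewhere. The cost is that both variants have to be built and shipped.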
cheers
Sebastian
Dear All,
There's a discussion going on at the moment over in osg-submissions,
and it has been raised that this ought to be opened up to the
non-submissions community for feedback. Note that the following is my
reading of the issues, and certainly doesn't represent the consensus
view of the osg-submissions crowd, so feel free to challenge what I'm
saying!
*Background*
Several people already use SSE instructions
(http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions) alongside OSG
to obtain speed improvements through parallelising math operations.
The general point that has been raised is that, under the hood, OSG
does quite a lot that could benefit from the potential performance
boost given by SSE operations. Obvious targets include some of the
Vec/Matrix routines, for example. SSE is now sufficiently mainstream
that the risk of processor incompatibility is felt to be low.
*Question 1 : Where could the core OSG include SSE?*
Most people follow the sensible approach of profiling to determine
their bottlenecks, and then optimising particular methods in order to
gain speed-up. This would be a sensible approach to follow, as SSEing
all methods would probably be a waste of effort. It would therefore
be instructive firstly to know if anybody is using SSE with OSG, and
where. Secondly, for those who have profiling data and know how much
time they spend in Vec/Matrix/whatever methods, it would be useful to
know which methods the community considered good targets for SSEing.
Any other maths "heavy lifting" going on? (e.g. Intersection testing?
Delaunay triangulation? etc.)
*Question 2 : How could the core OSG include SSE?*
SSE code benefits from aligned data. Hence there are several ways in
which OSG could include SSE:
a) Provide an aligned Vec4f and aligned Matrix4f class, which support
SSE operations. This would appear (to me) to be the least intrusive.
b) Provide branching code within the existing Vec4/Matrix4 methods for
detecting whether data is aligned, and performing the correct
operations. This would appear to me to be the most user-transparent.
Although it would appear to be a performance hit, testing so far on
some specific code would support the argument that the speed gains
from SSE outweigh the branch cost; more testing needed, I guess.
c) Robert suggested that SSE enabled array operators (e.g. providing a
cross-product operator for Vec3Array) might be appropriate and provide
the best speed improvement for those who want it. Certainly using SSE
on large array type data sets is where one gains the most performance
improvement.
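To make the three options concrete, here is a minimal sketch, not a
proposal for the actual OSG classes. AlignedVec4f, add4 and addArray are
hypothetical names, alignas assumes a C++11 compiler, and plain addition
stands in for the more interesting operations (the Vec3Array
cross-product in (c) would need extra shuffling). A scalar fallback is
used when SSE is not available at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0
#endif

// Option (a): a separate 16-byte-aligned type (hypothetical, not osg::Vec4f).
struct alignas(16) AlignedVec4f
{
    float v[4];
};

inline bool isAligned16(const void* p)
{
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}

// Option (b): one entry point that branches on alignment at run time.
// Computes out = a + b, each pointing at four floats.
inline void add4(const float* a, const float* b, float* out)
{
    assert(a && b && out);
#if SKETCH_HAS_SSE
    if (isAligned16(a) && isAligned16(b) && isAligned16(out))
    {
        // Aligned loads and stores: the fast path.
        _mm_store_ps(out, _mm_add_ps(_mm_load_ps(a), _mm_load_ps(b)));
    }
    else
    {
        // Unaligned loads and stores: still SSE, but usually slower.
        _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
    }
#else
    for (int i = 0; i < 4; ++i) out[i] = a[i] + b[i];
#endif
}

// Option (c): operate on a whole array, four floats per step,
// with a scalar loop for the leftover elements.
inline void addArray(const float* a, const float* b, float* out, std::size_t n)
{
    std::size_t i = 0;
#if SKETCH_HAS_SSE
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(out + i, _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
#endif
    for (; i < n; ++i) out[i] = a[i] + b[i];
}
```

With (a) the caller opts in by choosing the aligned type, with (b) the
check is paid on every call, and with (c) the per-call overhead is
spread over the whole array.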
This question includes the possibility of linking out to, or pulling
source code out of, an external optimised math library.
Any other suggestions?
*Question 3 : (possibly the biggest) Should the core OSG include SSE?*
There are several downsides to including SSE. Firstly, x-platform
provision of SSE may be tricky due to the way different compilers
define aligned data, and how SSE instructions are used within the
code. I personally don't have much experience here, so any feedback on
x-platform issues is useful.
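As an illustration of the alignment part of the problem, a sketch of
the kind of wrapper macro that would be needed, assuming only MSVC and
GCC/Clang matter. The SKETCH_ALIGN16_* macros and Float4Storage are
hypothetical names and do not exist in OSG.

```cpp
#include <cassert>
#include <cstdint>

// Each compiler spells "align this type to 16 bytes" differently,
// and the keyword goes in a different place in the declaration.
#if defined(_MSC_VER)
#define SKETCH_ALIGN16_PRE  __declspec(align(16))
#define SKETCH_ALIGN16_POST
#elif defined(__GNUC__) || defined(__clang__)
#define SKETCH_ALIGN16_PRE
#define SKETCH_ALIGN16_POST __attribute__((aligned(16)))
#else
#define SKETCH_ALIGN16_PRE  alignas(16)   // C++11 fallback
#define SKETCH_ALIGN16_POST
#endif

// Four floats that are safe to use with aligned SSE loads and stores.
struct SKETCH_ALIGN16_PRE Float4Storage
{
    float v[4];
} SKETCH_ALIGN16_POST;

inline bool isSixteenByteAligned(const void* p)
{
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}

// Hands out the raw floats, checking the alignment promise in debug builds.
inline float* asFloats(Float4Storage& s)
{
    assert(isSixteenByteAligned(&s));
    return s.v;
}
```

Note that this only covers objects with static or automatic storage.
Whether heap allocations come back 16-byte aligned depends on the
platform's allocator, which is a separate portability question.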
Secondly, the code readability drops, and the "use the source"
argument may be trickier when many might not know much SSE.
So - your opinion, experience and suggestions welcome!
David
------------------------------------------------------------------------
_______________________________________________
osg-users mailing list
[email protected]
http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org