Gerrit Voss wrote:
> Just a quick update: I ran some more tests, same compiler options,
> VS2005:
> 
> original opensg:
> 
> Ratio: 1.32355
> 
> 
> hand coded Vec3f class (with __forceinline):
> 
> Ratio: 1.29539
> 
> 
> gmtl (even more templates) :
> 
> Ratio: 0.992 - 1.0003

I did some more tests and the problem is not the inlining (in fact, 
__forceinline doesn't inline as well if /EHsc is specified, for example). 
As Andreas stated, on an AMD there seems to be no difference, while there 
is a large difference on Core2Duo and Xeon P4 systems (I have now tested 
3 different systems, and the OpenSG version is always slower by a factor 
of 1.5-1.9).
Instead, the problem seems to be the loop: Visual Studio doesn't seem to 
unroll it, and the loop is therefore slower on Intel machines, while 
the AMD system doesn't seem to have the same issue.
Also, enabling and disabling SSE code generation makes a huge 
difference. For some reason, SSE code generation slows the OpenSG code 
down even more, while the SSE code for the manually unrolled loop is 
already pretty good. The hand coded Vec3f class seems to have the same 
problem as mine: Visual Studio doesn't inline it well. It still uses 
temporaries although it doesn't need to.
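
To make the loop-versus-unrolled distinction concrete, here is a minimal 
sketch of the two styles being compared. The Vec3 struct and the function 
names addLoop and addUnrolled are hypothetical and only for illustration; 
this is not the actual OpenSG or gmtl code, and it says nothing about which 
variant a given compiler will generate better code for.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal 3-component vector, NOT OpenSG's actual Vec3f.
struct Vec3 {
    float v[3];

    float& operator[](std::size_t i) { assert(i < 3); return v[i]; }
    float operator[](std::size_t i) const { assert(i < 3); return v[i]; }
};

// Loop-based addition: relies on the compiler to unroll the
// fixed three-iteration loop on its own.
inline Vec3 addLoop(const Vec3& a, const Vec3& b) {
    Vec3 r;
    for (std::size_t i = 0; i < 3; ++i) {
        r.v[i] = a.v[i] + b.v[i];
    }
    return r;
}

// Manually unrolled addition: the three component sums are written
// out explicitly, so no loop is left for the compiler to unroll.
inline Vec3 addUnrolled(const Vec3& a, const Vec3& b) {
    Vec3 r = {{ a.v[0] + b.v[0], a.v[1] + b.v[1], a.v[2] + b.v[2] }};
    return r;
}
```

Both functions compute the same result; any speed difference between them 
would come only from the code the compiler generates, which is why looking 
at the assembly output is the reliable way to compare them.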

The gmtl result sounds interesting, although I think the difference is 
within tolerance and it will be exactly as fast as the manually unrolled 
code if a more accurate test is used (unless it is more aggressive and 
uses 4-value SSE instead of 1-value; one should really look at the 
assembly code then).
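
One possible way to get a less noisy measurement is to repeat each run 
several times and keep the fastest one. The sketch below assumes a C++11 
compiler (so it would not build on VS2005 as is), and the helper name 
bestSeconds is hypothetical. It is not the benchmark that produced the 
ratios quoted above.

```cpp
#include <cassert>
#include <chrono>
#include <limits>

// Runs `work` `repeats` times and returns the duration of the fastest
// run in seconds. Taking the minimum filters out runs that were
// disturbed by other processes or by cold caches.
template <typename F>
double bestSeconds(F work, int repeats) {
    assert(repeats > 0);
    double best = std::numeric_limits<double>::max();
    for (int r = 0; r < repeats; ++r) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        const double s = std::chrono::duration<double>(stop - start).count();
        if (s < best) {
            best = s;
        }
    }
    return best;
}
```

A ratio between two variants could then be formed by dividing the best 
time of one by the best time of the other, with each workload made large 
enough that the timer resolution does not dominate.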

Greetings

Michael



_______________________________________________
Opensg-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensg-users
