Gerrit Voss wrote:
> Just a quick update, I ran some more test, same compiler options,
> VS2005:
>
> original opensg:
>
> Ratio: 1.32355
>
> hand coded Vec3f class (with __forceinline):
>
> Ratio: 1.29539
>
> gmtl (even more templates):
>
> Ratio: 0.992 - 1.0003

I did some more tests, and the problem is not the inlining (in fact, even __forceinline fails to inline if, for example, /EHsc is specified). As Andreas stated, there seems to be no difference on an AMD, while there is a large difference on Core2Duo and Xeon P4 systems. I have now tested 3 different systems, and the OpenSG version is always slower by a factor of 1.5-1.9.

The problem seems to be the loop: Visual Studio doesn't seem to unroll it, so the code is slower on Intel machines, while the AMD system doesn't seem to have the same issue. Enabling or disabling SSE code generation also makes a huge difference. For some reason, SSE code generation slows the OpenSG code down even more, while the SSE code generated for the manually unrolled loop is already pretty good.

The hand coded Vec3f class seems to have the same problem as mine: Visual Studio doesn't inline it well. It still uses temporaries although it doesn't need to.

The gmtl result sounds interesting, although I think the difference is within tolerance, and it will turn out to be exactly as fast as the manually unrolled code if a more accurate test is used (unless it is more aggressive and uses 4-value (packed) SSE instead of 1-value (scalar) SSE; one should really look at the assembly code then).

Greetings
Michael
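
For reference, the loop-versus-unrolled distinction discussed above can be sketched as follows. This is only a minimal illustration with hypothetical names (`Vec3`, `addLoop`, `addUnrolled`); it is not the actual OpenSG, gmtl, or benchmark code, and it makes no claim about what any particular compiler generates for it.

```cpp
#include <cstddef>

// Hypothetical 3-component vector; NOT the OpenSG Vec3f class.
struct Vec3 {
    float v[3];
};

// Loop-based addition: depends on the compiler to unroll the
// three iterations and to avoid unnecessary temporaries.
inline Vec3 addLoop(const Vec3& a, const Vec3& b) {
    Vec3 r;
    for (std::size_t i = 0; i < 3; ++i) {
        r.v[i] = a.v[i] + b.v[i];
    }
    return r;
}

// Manually unrolled addition: the three component additions are
// written out explicitly, so no loop is left for the compiler to unroll.
inline Vec3 addUnrolled(const Vec3& a, const Vec3& b) {
    Vec3 r;
    r.v[0] = a.v[0] + b.v[0];
    r.v[1] = a.v[1] + b.v[1];
    r.v[2] = a.v[2] + b.v[2];
    return r;
}
```

Both functions compute the same result; whether they perform differently would have to be checked by comparing the generated assembly for the compiler options in question.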
