Gerrit Voss wrote:
>
>>> Is the compiler really that bad, I thought they were making progress
>>> and brought in some people who actually know how to write compilers
>>> ;-(
Yes, of course the compiler should do the job. But it doesn't, and
programming for something that should work but doesn't means programming
without regard for reality. Also: what are the other options? GCC on
Windows is a no-go, since many library binaries are only available in
MSVC-compatible library format (OK, you can convert them, but I really
don't like that option since it is one more source of failures). The
Intel compiler takes ages to compile template code and, in our tests, is
about 10% slower for our code than msvc8. And older Visual Studio
compilers don't support some things (OMP, for example) and most likely
won't work any better.
> inline and __inline in itself yes. But usually there must be a way of
> forcing the compiler to inline. Otherwise the compiler is unusable for
> anything that comes even close to high performance. It's a tool so it
> should never ever have a big impact on the design as long as the design
> is within the language specification, especially if other tools prove
> that it is possible. And it should always listen to the programmer who
> hopefully knows what he/she is doing ;-)
Well, there is a way to force inlining with msvc8, but OpenSG does not
use it: the code uses plain inline everywhere, not __forceinline. A
define would be better here.
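To make the "define" idea concrete, here is a minimal sketch of what I
mean. The macro name SKETCH_FORCEINLINE and the Point3 type are made up
for this mail and are not part of OpenSG; the GCC branch assumes the
always_inline attribute, and every other compiler falls back to plain
inline.

```cpp
// Hypothetical macro, not an OpenSG define: picks the strongest inlining
// request each compiler offers and falls back to plain inline otherwise.
#if defined(_MSC_VER)
#  define SKETCH_FORCEINLINE __forceinline
#elif defined(__GNUC__)
#  define SKETCH_FORCEINLINE inline __attribute__((always_inline))
#else
#  define SKETCH_FORCEINLINE inline
#endif

// Stand-in for a math class; illustration only, not osg::Pnt3f.
struct Point3
{
    float v[3];
};

// Component-wise addition, marked with the macro instead of plain inline.
SKETCH_FORCEINLINE Point3 add(const Point3& a, const Point3& b)
{
    Point3 r;
    for (int i = 0; i < 3; ++i)
        r.v[i] = a.v[i] + b.v[i];
    return r;
}
```

With something like this in one central header, switching the math
classes between inline and forced inlining would be a one-line change.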
> not really, comparing the size of a completely (e.g. type, size and
> simd/altivec) unrolled implementation against a templated/specialised
> one I would still say the templated version is smaller. And more
> important if you have to fix a bug you don't have to trace it through n
> unrolled version.
Certainly, the bug-fixing argument is a good one, and templates have
their advantages. But basic math classes are the absolute foundation of
rendering applications, and losing performance here means you lose
performance everywhere. Also, how likely is it that you will change
these classes once they are up and running? And a Vec3f is NOT the same
as a Vec4f; there are pretty big differences (the homogeneous
coordinate!), so you will need to reimplement most functions anyway
since a simple loop will not work. So there are two options: reimplement
everything from scratch, with the possibility to optimize as far as it
gets (using SIMD, intelligent functions doing only the work that is
necessary), or reimplement most of the class using templates and lose
every chance to do specific optimizations for certain classes. I would
definitely choose the first option.
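A rough sketch of the two options, to show what I mean. All names here
(VecT, Vec4fHand, SKETCH_HAVE_SSE) are invented for this mail and are not
OpenSG classes; the SSE path assumes an x86 target with <xmmintrin.h>
and falls back to a scalar loop elsewhere.

```cpp
#include <cstddef>

// Use SSE where the compiler reports it; otherwise stay scalar.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#  include <xmmintrin.h>
#  define SKETCH_HAVE_SSE 1
#else
#  define SKETCH_HAVE_SSE 0
#endif

// Option "templates": one generic loop serves every size and type.
template <typename T, std::size_t N>
struct VecT
{
    T v[N];
};

template <typename T, std::size_t N>
inline VecT<T, N> add(const VecT<T, N>& a, const VecT<T, N>& b)
{
    VecT<T, N> r;
    for (std::size_t i = 0; i < N; ++i)
        r.v[i] = a.v[i] + b.v[i];
    return r;
}

// Option "from scratch": a hand-written 4-float class can do all four
// components, including the homogeneous one, in a single SSE add.
struct Vec4fHand
{
    float v[4];
};

inline Vec4fHand add(const Vec4fHand& a, const Vec4fHand& b)
{
    Vec4fHand r;
#if SKETCH_HAVE_SSE
    _mm_storeu_ps(r.v, _mm_add_ps(_mm_loadu_ps(a.v), _mm_loadu_ps(b.v)));
#else
    for (int i = 0; i < 4; ++i)
        r.v[i] = a.v[i] + b.v[i];
#endif
    return r;
}
```

Whether the hand-written version actually wins depends on the compiler
and the surrounding code; the point is only that the specialised class
leaves room for this kind of optimization.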
>>> Before we start changing code and design, I would like to verify if
>>> this is a MSVC only problem. If it is I'm tempted to say bad luck, if
>>> they still don't know how to build a proper compiler. I mean they
>>> seem to have managed to add ipo and pgo.
Well, even if it is, you simply don't have many other options. As I
already said, at least in our case the Intel compiler is slower and
takes ages with template code, while GCC wouldn't work at all on Windows.
So you really are required to optimize for the msvc compiler.
>>> Anyway usually there should be a way to force the compiler to inline
>>> functions, something like __forceinline.
As already said: __forceinline should therefore be used in the OpenSG
math classes, but it isn't.
> Can anyone supply some concrete examples, e.g which function and the
> calling place. I really would like to verify it.
Here we go. For me, the manually inlined version is about 1.9 times
faster than the non-inlined OpenSG version:
// PerformanceTest.cpp : Defines the entry point for the console
// application.
//
#include <OpenSG/OSGConfig.h>
#include <OpenSG/OSGVector.h>
#include <stdio.h>
#include <intrin.h>
#include <iostream>
#define SIZE 0xFFFFFF
using namespace osg;
using namespace std;
// Read the CPU time stamp counter (MSVC intrinsic).
inline unsigned __int64 getCPUCycles()
{
    return __rdtsc();
}
int main( int argc, char* argv[])
{
    Pnt3f* pointArray = new Pnt3f[SIZE];
    Pnt3f* results = new Pnt3f[SIZE];
    for( unsigned int i = 0; i < SIZE; ++i)
    {
        Real32 value = Real32(i);
        pointArray[i].setValue(Pnt3f(value, value, value));
    }
    // Variant 1: component-wise addition written out by hand.
    unsigned __int64 startCycles2 = getCPUCycles();
    for( unsigned int i = 1; i < SIZE; ++i)
    {
        results[i][0] = pointArray[i][0] + pointArray[i-1][0];
        results[i][1] = pointArray[i][1] + pointArray[i-1][1];
        results[i][2] = pointArray[i][2] + pointArray[i-1][2];
    }
    unsigned __int64 endCycles2 = getCPUCycles();
    std::cout << "CPU cycles with manual inlining: "
              << endCycles2 - startCycles2 << std::endl;
    // Variant 2: the same addition through the Pnt3f operator+.
    unsigned __int64 startCycles = getCPUCycles();
    for( unsigned int i = 1; i < SIZE; ++i)
    {
        results[i] = pointArray[i] + pointArray[i-1];
    }
    unsigned __int64 endCycles = getCPUCycles();
    std::cout << "CPU cycles with point interface: "
              << endCycles - startCycles << std::endl;
    // Ratio > 1 means the operator+ version needed more cycles.
    std::cout << "Ratio: "
              << static_cast<float>(endCycles - startCycles) /
                 static_cast<float>(endCycles2 - startCycles2)
              << std::endl;
    delete[] results;
    delete[] pointArray;
    return 0;
}
Compiled with:
/O2 /Ob2 /Oi /Ot /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_WINDOWS"
/D "WINVER=0x0400" /D "_WIN32_WINDOWS=0x0410" /D "_WIN32_WINNT=0x0400"
/D "_OSG_HAVE_CONFIGURED_H_" /D "OSG_BUILD_DLL" /D "OSG_WITH_GLUT" /D
"OSG_WITH_GIF" /D "OSG_WITH_TIF" /D "OSG_WITH_JPG" /D "_UNICODE" /D
"UNICODE" /GF /FD /EHsc /MD /GS- /arch:SSE2 /fp:fast /Fo"Release\\"
/Fd"Release\vc80.pdb" /W3 /nologo /c /Wp64 /Zi /TP /errorReport:prompt
And linked with
/SUBSYSTEM:CONSOLE /OPT:REF /OPT:ICF /LTCG /MACHINE:X86 /ERRORREPORT:PROMPT
Greetings
Michael
_______________________________________________
Opensg-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensg-users