> After doing some benchmarks, I found out that the OpenGL performance
> is not too bad compared to Windows : about 25 % slower on the
> Tirtanium benchmark when removing the X11 critical section protection,
> 50 % slower with it.

I don't think a comparison with Windows is that technically relevant,
since the graphics card drivers are completely different.

Of course, for real-life and marketing reasons it does matter, so
we should try to optimize it if possible.
 
> Now, I think most of the remaining FPS are lost in the CDECL ->
> STDCALL conversion of all the OpenGL routines. I looked at the code
> GCC generated for the OpenGL code and it's not really efficient : it
> 'pops' all the arguments in registers and then pushes them again for
> the calling of the CDECL function...

Not very surprising.
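For reference, the kind of C wrapper being discussed looks roughly like
the sketch below. The names wine_glClearIndex, host_glClearIndex and the
STDCALL_SKETCH macro are made up for illustration (they are not taken
from opengl_norm.c), and the attribute is only applied with GCC on
32-bit x86, where the two conventions actually differ.

```c
/* stdcall only exists on 32-bit x86; elsewhere this expands to nothing
 * (assumption: a GCC-compatible compiler). */
#if defined(__GNUC__) && defined(__i386__)
# define STDCALL_SKETCH __attribute__((__stdcall__))
#else
# define STDCALL_SKETCH
#endif

/* Hypothetical stand-in for the native CDECL entry point:
 * with CDECL the caller removes the arguments from the stack. */
static float host_last_index;

static void host_glClearIndex(float c)
{
    host_last_index = c;
}

/* Hypothetical entry point seen by the Windows program: STDCALL, so the
 * callee removes the arguments. A compiler-generated body has to load
 * each argument and push it again for the inner CDECL call. */
void STDCALL_SKETCH wine_glClearIndex(float c)
{
    host_glClearIndex(c);
}
```

With one float argument the overhead is small, but every argument of
every wrapped routine is copied this way.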
 
> So I had two ideas to optimize this, based on what Marcus did for
> elf.c (as I am not an x86 ASM guru, I would have some difficulties
> doing any of my proposals, but well :-) ) :

I can do it. In fact, I already have working code that does CDECL to
STDCALL translation, which I implemented for compiling Wine with
Solaris C, which doesn't support STDCALL.
 
I will look at it in more detail tonight, if I find time,
but I can give some quick comments now.

>  1) instead of generating C code for the conversion (as in
>     opengl_norm.c), generate some ASM in-line to do it as fast as
>     possible. The problem with this is how to get the address of the
>     'destination' function to put in the ASM... 

I'm not sure exactly what you mean by "how to get the address of the
destination function", since your thunks are static,
but I think I have solved that problem for the Solaris C
thunking, which is dynamic.

>  2) the other possibility would have been to modify 'build' to have a
>     new keyword for functions that are 'synonyms' between Windows and
>     Linux with only the calling convention that changes. OpenGL's spec
>     file would look like this :
> 
> @  stdcall wglUseFontOutlines(long long long long long long long) wglUseFontOutlines
> @  synonym glClearIndex(long )
> 
>     For these functions, when GetProcAddress is called (or the
>     equivalent for 'static' linking) a code equivalent to the one in
>     'elf.c' would be generated.
> 
>     This would greatly simplify the OpenGL code (no more
>     auto-generated opengl_norm.c and opengl_ext.c files).

Perhaps. I will comment on this later.

>     I think the only real problem remaining would be how to generate
>     this ASM code to be at the same time efficient and thread-safe (I
>     thought a bit about it, and it seems non-trivial).

Thread safety is a problem, but I think I have solved it.
What I did was to reshuffle the stack and allocate space _before_
the arguments. That is not very efficient, but it is optimized
assembler and likely more efficient than what GNU C generates.

I have attached some code that might interest you.
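To illustrate why keeping everything on the stack makes a thunk
thread-safe, here is a small C model of one possible reshuffle. It is
only a guess at the general idea, not the code in thunk32.tar, which
may well differ (for instance by allocating an extra slot). All names
(model_stack, cdecl_sum, stdcall_thunk) are hypothetical, and the
"stack" is just an array.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of a downward-growing 32-bit stack; sp indexes the top word. */
typedef struct {
    uint32_t words[32];
    size_t   sp;
    uint32_t pc;   /* where the final "ret" jumped to */
} model_stack;

static inline void push(model_stack *s, uint32_t v) { s->words[--s->sp] = v; }
static inline uint32_t pop(model_stack *s) { return s->words[s->sp++]; }

/* Hypothetical CDECL target: reads its arguments and leaves them in place. */
static uint32_t cdecl_sum(const uint32_t *args, size_t nargs)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < nargs; i++)
        sum += args[i];
    return sum;
}

/* STDCALL entry. On entry the stack holds [ret][arg1]...[argN]. */
uint32_t stdcall_thunk(model_stack *s, size_t nargs)
{
    uint32_t ret = s->words[s->sp];

    /* Rotate: slide each argument one slot toward the top and park the
     * return address behind them. Nothing is kept in static storage, so
     * two threads running the thunk cannot disturb each other. */
    for (size_t i = 0; i < nargs; i++)
        s->words[s->sp + i] = s->words[s->sp + i + 1];
    s->words[s->sp + nargs] = ret;

    /* The arguments are now on top, as a CDECL callee expects. */
    uint32_t result = cdecl_sum(&s->words[s->sp], nargs);

    s->sp += nargs;   /* CDECL caller cleanup (add $4*N, %esp) */
    s->pc = pop(s);   /* plain "ret" to the original caller */
    return result;
}
```

After the thunk returns, the arguments and the return address are all
gone from the stack, which is what a STDCALL caller expects.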

thunk32.tar
