Hi,

On Friday 19 September 2008 18:40, James Killian wrote:
> Ok in short the new optimization you submitted now (i.e. without SSE) is
> 10-20% better than what we had (i.e. your initial optimization with my SSE
> optimization).  Back then the Matrix Mult optimization had more of a
> significant impact.
That is expected. You also need to optimize the additional routines if you 
want them SSE-optimized.
Anyway, I would prefer generic optimizations in any case ...
Also, newer gcc versions do a pretty good job at vectorization in many cases.
The next time I find time for such things, I will look at which hint gcc 
needs so that it does not miss these opportunities ...
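
For illustration only, one hint of that kind is a no-aliasing promise on the
pointer arguments. The sketch below is not the osg::Matrix code; mult4x4 is a
hypothetical helper, the row-major layout is an assumption, and __restrict__
is a gcc extension rather than standard C++. Whether gcc really vectorizes
the inner loop depends on the version and flags (e.g. -O3 or
-ftree-vectorize), so treat it as something to measure, not a guarantee.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not osg code): r = a * b for row-major 4x4 matrices.
// __restrict__ promises the compiler that r, a and b do not overlap, which
// removes one common obstacle to auto-vectorization.
void mult4x4(double* __restrict__ r,
             const double* __restrict__ a,
             const double* __restrict__ b)
{
    // The restrict contract: the output must not alias either input.
    assert(r != a && r != b);

    for (std::size_t i = 0; i < 4; ++i) {
        for (std::size_t j = 0; j < 4; ++j)
            r[4 * i + j] = 0.0;

        for (std::size_t k = 0; k < 4; ++k) {
            const double aik = a[4 * i + k];
            // The innermost loop walks contiguous memory in both r and b.
            for (std::size_t j = 0; j < 4; ++j)
                r[4 * i + j] += aik * b[4 * k + j];
        }
    }
}
```

The loop order keeps the innermost accesses contiguous, which is generic C++
and needs no SSE intrinsics.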

> My conclusion is that your new submission is good enough to not need to use
> SSE for the Matrix Mult.  I have not been working on CPU specific
> handtuning, since the primary bottleneck for our code is due to thread
> synchronization.  Currently I am learning some new thread synchronization
> strategies (e.g. APC calls during context switch)... once I have grasped
> how these work I may want to get together with the author of the atomic
> code in OpenThreads.
Good to see that a generic optimization is sufficient.

That is me with the atomics.
What do you want to do?

> In regards to performance:
> One thing I propose is that OSG team dedicate some research into new
> parallelization strategies.  See http://www.threadingbuildingblocks.org/ as
> this covers some of the concepts.  We have been able to make templated
> classes that are a bit more friendlier than what is written in TBB
> (Threaded Building Blocks).  I feel that as time progresses it will be
> common for people to have multiple processors in their machines, so
> becoming proficient in threaded code will be a must, and with these new
> helper classes, it should make writing threaded code simple, and error
> proof (e.g. eliminating the need to use critical sections).
Well, osg's threading support is really good. I cannot see many additional 
improvements that could be made on osg's side of the coin.
Use the multithreaded viewer code, build up a good scenegraph, keep your 
update stage short (since this stage serializes your application with the 
viewer), and let it run.
This *will* run as fast as it can.
Besides that, the Intel stuff is neat, but it does not cover much more than 
the basic ideas *everybody* programming threads has. Nothing new there ...

Greetings and thanks

Mathias

-- 
Dr. Mathias Fröhlich, science + computing ag, Software Solutions
Hagellocher Weg 71-75, D-72070 Tuebingen, Germany
Phone: +49 7071 9457-268, Fax: +49 7071 9457-511


_______________________________________________
osg-submissions mailing list
[email protected]
http://lists.openscenegraph.org/listinfo.cgi/osg-submissions-openscenegraph.org