Hi,

On Friday 19 September 2008 18:40, James Killian wrote:
> Ok in short the new optimization you submitted now (i.e. without SSE) is
> 10-20% better than what we had (i.e. your initial optimization with my SSE
> optimization). Back then the Matrix Mult optimization had more of a
> significant impact.

That is expected. If you want the additional routines to be SSE
optimized as well, you need to optimize them too. In any case I would
prefer generic optimizations ...
Newer gcc versions also do a pretty good job of vectorizing in many
cases. The next time I find time for such things, I will look at which
hint gcc is missing that makes it pass up these opportunities ...
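To make the point about generic code concrete, here is a minimal sketch of
a plain 4x4 multiply written so that a vectorizing compiler has a fair
chance. It is not the code from the submission: the name mult4x4, the
row-major double[16] layout and the use of __restrict (a compiler
extension in GCC, Clang and MSVC, not standard C++) are all assumptions
made for illustration. Whether gcc really vectorizes it depends on the
version and flags (for example -O3).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not taken from the OSG sources.
// Computes out = a * b for row-major 4x4 matrices stored as double[16].
// __restrict promises the compiler that the three arrays do not overlap,
// which is one of the hints an auto-vectorizer typically needs.
inline void mult4x4(const double* __restrict a,
                    const double* __restrict b,
                    double* __restrict out)
{
    for (std::size_t row = 0; row < 4; ++row) {
        for (std::size_t col = 0; col < 4; ++col)
            out[row * 4 + col] = 0.0;
        for (std::size_t k = 0; k < 4; ++k) {
            const double aik = a[row * 4 + k];
            // The innermost loop walks contiguous memory in b and out.
            for (std::size_t col = 0; col < 4; ++col)
                out[row * 4 + col] += aik * b[k * 4 + col];
        }
    }
}

// Sanity check: multiplying by the identity must return the input.
inline void identityCheck()
{
    const double id[16] = {1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  0, 0, 0, 1};
    double m[16];
    double r[16];
    for (int i = 0; i < 16; ++i)
        m[i] = i + 1.0;
    mult4x4(m, id, r);
    for (int i = 0; i < 16; ++i)
        assert(r[i] == m[i]);
}
```

The loop order (row, k, col) keeps the innermost accesses contiguous, so
no CPU-specific intrinsics are needed and the same source stays portable.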

> My conclusion is that your new submission is good enough to not need to use
> SSE for the Matrix Mult. I have not been working on CPU specific
> handtuning, since the primary bottleneck for our code is due to thread
> synchronization. Currently I am learning some new thread synchronization
> strategies (e.g. APC calls during context switch)... once I have grasped
> how these work I may want to get together with the author of the atomic
> code in OpenThreads.

Good to see that a generic optimization is sufficient.
The atomics are mine. What do you want to do?

> In regards to performance:
> One thing I propose is that OSG team dedicate some research into new
> parallelization strategies. See http://www.threadingbuildingblocks.org/ as
> this covers some of the concepts. We have been able to make templated
> classes that are a bit friendlier than what is written in TBB
> (Threading Building Blocks). I feel that as time progresses it will be
> common for people to have multiple processors in their machines, so
> becoming proficient in threaded code will be a must, and with these new
> helper classes, it should make writing threaded code simple, and error
> proof (e.g. eliminating the need to use critical sections).

Well, osg's threading support is really good. I cannot see much more
that could be improved on osg's side of the coin.
Use the multithreaded viewer code, build up a good scenegraph, keep your
update stage short, since that stage serializes your application with
the viewer, and let it run. This *will* run as fast as it can.

Besides that, the Intel stuff is neat, but it does not cover much more
than the basic ideas that *everybody* programming with threads already
has. Nothing new there ...

Greetings and thanks

Mathias

-- 
Dr. Mathias Fröhlich, science + computing ag, Software Solutions
Hagellocher Weg 71-75, D-72070 Tuebingen, Germany
Phone: +49 7071 9457-268, Fax: +49 7071 9457-511
