Hi Mathias,

Sorry for the slow review, I've been busy with VPB and volume rendering these last two weeks and am just now surfacing for air...
I've reviewed all your changes; they look sound and are almost all merged as is. The use of #include <limits> and associated code does make me a little concerned about compiling under MipsPro, as I vaguely recall problems with some elements of the std library. I've merged regardless of this concern, as you are one of the best placed people to test this out, being amongst the few that retain IRIX boxes. So could you specifically test this?

I didn't merge a couple of the refactorings from Matrix*Matrix*Matrix usage to the new optimized pre/post mult methods in a couple of the examples, as the changes produced an awkward mix of orderings that made the code less readable. These are just examples, with the op being done once per frame, so it is of little consequence to overall performance; here clarity is more important.

I also changed a couple of instances of 0 and 1 to 0.0 and 1.0 respectively, to avoid VS compilers complaining about using an int in code that has double maths in it.

I've run all the examples and so far they all seem to be working just fine, so fingers crossed no new bugs have been introduced. We'll need the community to test it out for a while before we can be sure.

Many thanks for these optimizations,
Robert.

On Wed, Sep 3, 2008 at 12:51 PM, Mathias Fröhlich <[EMAIL PROTECTED]> wrote:
>
> Hi Robert,
>
> I have attached a change to the matrix multiply stuff.
> This is a generic optimization that does not depend on any cpu or instruction
> set.
>
> The optimization is based on the observation that matrix matrix multiplication
> with a dense 4x4 matrix is 4^3 operations whereas multiplication with a
> translation or scale matrix is only 4^2 operations. That is a gain of a
> *FACTOR*4* for these special cases.
> The change implements these special cases, provides a unit test for these
> implementations and converts uses of the more expensive dense matrix matrix
> routine to the specialized versions.
>
> Depending on the transform nodes in the scenegraph this change gives a
> noticeable improvement.
> For example the osgforest code using the MatrixTransform is about 20% slower
> than the same codepath using the PositionAttitudeTransform instead of the
> MatrixTransform with this patch applied.
>
> If I remember right, the sse type optimizations did *not* provide a factor 4
> improvement. Also these changes are totally independent of any cpu or
> instruction set architecture. So I would prefer to have this current kind of
> change instead of some hand coded and cpu dependent assembly stuff. If we
> need that hand tuned stuff, it can go on top of these changes, and must
> then provide hand optimized additional variants of the specialized versions
> to give an even better result in the end.
>
> Another change included here is a change to the rotation matrix from
> quaternion code. There is a sqrt call which could be optimized away, since we
> divide in effect by sqrt(length)*sqrt(length) which is just length ...
>
> The change is based on rev 8828.
>
> Greetings
>
> Mathias
>
> --
> Dr. Mathias Fröhlich, science + computing ag, Software Solutions
> Hagellocher Weg 71-75, D-72070 Tuebingen, Germany
> Phone: +49 7071 9457-268, Fax: +49 7071 9457-511
> --
> Vorstand/Board of Management:
> Dr. Bernd Finkbeiner, Dr. Florian Geyer,
> Dr. Roland Niemeier, Dr. Arno Steitz, Dr. Ingrid Zech
> Vorsitzender des Aufsichtsrats/
> Chairman of the Supervisory Board:
> Prof. Dr. Hanns Ruder
> Sitz/Registered Office: Tuebingen
> Registergericht/Registration Court: Stuttgart
> Registernummer/Commercial Register No.: HRB 382196
>
>
> _______________________________________________
> osg-submissions mailing list
> [email protected]
> http://lists.openscenegraph.org/listinfo.cgi/osg-submissions-openscenegraph.org
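
For readers of the archive, the operation-count argument in the quoted message can be illustrated with a small standalone sketch. It is not the code from the patch: the Mat4 type and the functions denseMult, postMultTranslate and postMultScale below are hypothetical stand-ins written for this illustration, and the sketch assumes the row-vector convention (translation stored in the last row), which may not match every code path. The dense product needs 4^3 = 64 multiplications, while the specialized variants touch only 12 elements each.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical 4x4 matrix type for this sketch, indexed m[row][col].
using Mat4 = std::array<std::array<double, 4>, 4>;

Mat4 identity() {
    Mat4 m{};  // value-initialized to all zeros
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0;
    return m;
}

// Assumed row-vector convention: the translation lives in the last row.
Mat4 makeTranslate(double x, double y, double z) {
    Mat4 t = identity();
    t[3][0] = x; t[3][1] = y; t[3][2] = z;
    return t;
}

Mat4 makeScale(double sx, double sy, double sz) {
    Mat4 s = identity();
    s[0][0] = sx; s[1][1] = sy; s[2][2] = sz;
    return s;
}

// General dense product a * b: 4 * 4 * 4 = 64 multiplications.
Mat4 denseMult(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// m = m * T(x, y, z) in place. Only the first three columns change,
// so 12 multiply-adds replace the 64 of the dense product.
void postMultTranslate(Mat4& m, double x, double y, double z) {
    const double v[3] = {x, y, z};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 3; ++j)
            m[i][j] += m[i][3] * v[j];
}

// m = m * S(sx, sy, sz) in place: each of the first three columns is
// scaled by one factor, again 12 multiplications.
void postMultScale(Mat4& m, double sx, double sy, double sz) {
    const double s[3] = {sx, sy, sz};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 3; ++j)
            m[i][j] *= s[j];
}

// Element-wise comparison with a small absolute tolerance.
bool nearlyEqual(const Mat4& a, const Mat4& b, double eps = 1e-12) {
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            if (std::fabs(a[i][j] - b[i][j]) > eps) return false;
    return true;
}
```

A quick way to convince yourself the specialization is valid is to compare it against denseMult(m, makeTranslate(x, y, z)) for an arbitrary m, which is also roughly what a unit test for such variants would check.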
