> > By my reading even fairly simple iterative implementation should give
> > a single precision result in about 4 iterations (2 modified multiply-add
> > per iteration, probably 9 or 10 instructions total)
> >
> That's really interesting. What about integer division, though?

Not sure. Many of the chips I deal with just Don't Do That :-)

Do we actually need integer division in practice? I'd guess that the data
workloads are going to be float based, and control code is almost entirely
division by constant, which the compiler can change to a fixed-point
multiply (i.e. widening multiply+shift).

The nVidia programming guides say that "integer division and modulo
operations are particularly costly and should be avoided", which I take as
meaning it's probably done in software.

Paul
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
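
For reference, here is a rough sketch of the "widening multiply+shift"
trick for division by a constant. It assumes an unsigned 32-bit dividend
and the constant 10. The names div10, mod10 and div10_agrees are made up
for illustration; a real compiler picks its own multiplier and shift for
each constant, so treat this as one worked instance rather than what any
particular compiler emits.

```c
#include <stdint.h>

/* Divide an unsigned 32-bit value by the constant 10 without a divide
 * instruction. 0xCCCCCCCD is ceil(2^35 / 10); the 32x32 -> 64 bit
 * product is the "widening multiply", and the shift by 35 removes the
 * fixed-point scale factor again. */
static uint32_t div10(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
}

/* The remainder costs one more multiply and a subtract. */
static uint32_t mod10(uint32_t n)
{
    return n - div10(n) * 10u;
}

/* Hypothetical self-check helper: compares against the ordinary
 * operators for one input. Returns nonzero when both results agree. */
static int div10_agrees(uint32_t n)
{
    return div10(n) == n / 10u && mod10(n) == n % 10u;
}
```

For example, div10(1234) gives 123 and mod10(1234) gives 4. The point is
that hardware with a widening multiplier but no divider can still handle
the control-code case cheaply.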
