> > You should only implement 1-cycle operations. If you really need div,
> > pipeline (1/x) with MUL, with enough guard bits to have the required
> > precision. There are lots of 1-cycle operations for complex functions
> > (1/x, 1/sqrt(x)).
> >
> Any division operation, even if it is supposedly 1 cycle, is in reality
> this: 1 operation is issued per cycle, but the latency will be between
> 25 and 64 cycles. It depends on the operation requested and the data
> type. Doing so will require ~64 subtractors if we support
> fractional-result divide for integers, or 32 subtractors if we support
> divide and modulo only.

For floating point at least, it's fairly common to have a low-precision
reciprocal estimate (LUT + a bit of exponent twiddling), then do explicit
N-R (Newton-Raphson) iterations [to get x = 1/d, iterate x <- x(2 - dx)]
to reach the desired precision. Feeding the N-R iterations through the
regular ALU (possibly with ISA cooperation) may give better overall
throughput (by spending the area on more ALU capacity) than a dedicated
divider.

Paul
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
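
A minimal software sketch of the estimate-plus-N-R scheme described above, in Python. The table size (LUT_BITS), the function names, and the default iteration count are all hypothetical choices made for illustration; real hardware would index the table with the top mantissa bits and adjust the exponent field directly rather than call frexp/ldexp. Zero is rejected, and infinities, NaNs and subnormals are not handled.

```python
import math

LUT_BITS = 4  # assumption: 16-entry table, indexed by the top mantissa bits

# One reciprocal estimate per sub-interval of the mantissa range [0.5, 1),
# taken at the midpoint of each sub-interval.
RECIP_LUT = [1.0 / (0.5 + (i + 0.5) / (2 << LUT_BITS))
             for i in range(1 << LUT_BITS)]


def recip_estimate(d):
    """Low-precision 1/d: table lookup on the mantissa, negate the exponent."""
    if d == 0.0:
        raise ZeroDivisionError("reciprocal of zero")
    m, e = math.frexp(abs(d))              # abs(d) == m * 2**e, 0.5 <= m < 1
    i = int((m - 0.5) * (2 << LUT_BITS))   # which sub-interval m falls in
    est = math.ldexp(RECIP_LUT[i], -e)     # 1/(m * 2**e) ~= lut[i] * 2**-e
    return math.copysign(est, d)


def reciprocal(d, iterations=3):
    """Refine the estimate with N-R steps x <- x * (2 - d * x).

    Each step uses only a multiply and a multiply-subtract, and roughly
    squares the relative error of x (so the number of good bits doubles).
    """
    x = recip_estimate(d)
    for _ in range(iterations):
        x = x * (2.0 - d * x)
    return x


def divide(n, d, iterations=3):
    """n / d computed as n * (1/d); no dedicated divider involved."""
    return n * reciprocal(d, iterations)
```

With this table the initial estimate is good to about 5 bits, so the iteration count is what sets the final precision: each extra pass through the multiplier roughly doubles the number of correct bits, up to the limit of the underlying floating-point format.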
