> > You should only implement 1-cycle operations. If you really need div,
> > pipeline (1/x) with MUL, with enough guard bits to have the required
> > precision. There are lots of 1-cycle operations for complex functions
> > (1/x, 1/sqrt(x)).
> 
> Any division operation, even if it's supposedly 1 cycle, is in reality
>  pipelined: 1 operation is issued per cycle, but the latency will be between
>  25 and 64 cycles. It depends on the operation requested and the data type.
>  Doing so will require ~64 subtractors if we support fractional-result
>  divide for integers, or 32 subtractors if we support divide and modulo only.
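
One way to read the subtractor count above, assuming a plain unrolled restoring
divider (the quoted post does not name the algorithm): each quotient bit costs
one trial subtraction, so a 32-bit divide/modulo needs 32 stages. A minimal
sketch of that in Python; the function name and the `bits` parameter are mine,
for illustration only.

```python
def restoring_divide(n, d, bits=32):
    """Unsigned restoring division: one trial subtraction per quotient bit.

    Each loop iteration stands in for one pipeline stage (one subtractor).
    Returns (quotient, remainder).
    """
    if d == 0:
        raise ZeroDivisionError("division by zero")
    limit = 1 << bits
    if not (0 <= n < limit and 0 < d < limit):
        raise ValueError("operands must fit in `bits` unsigned bits")

    q = 0
    r = 0
    for i in range(bits - 1, -1, -1):
        r = (r << 1) | ((n >> i) & 1)  # shift in the next dividend bit
        trial = r - d                  # this stage's subtractor
        if trial >= 0:                 # subtraction "fits": keep it, set bit
            r = trial
            q |= 1 << i
        # otherwise "restore": r is left unchanged, quotient bit stays 0
    return q, r


if __name__ == "__main__":
    print(restoring_divide(100, 7))
```

Running further stages with zero bits shifted in would produce fractional
quotient bits, which is presumably where the ~64 figure comes from.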

For floating point at least, it's fairly common to have a low-precision
reciprocal estimate (LUT + a bit of exponent twiddling), then do explicit N-R
iterations [to get x = 1/d, iterate x = x(2 - dx)] to reach the desired
precision. Feeding the N-R iterations through the regular ALU (possibly with
ISA cooperation) may give better overall throughput than a dedicated divider,
since the area saved can go toward more ALU space.
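
A rough sketch of the scheme in Python, just to show the shape of it. The
table size (16 entries), the midpoint table values, and all the names here are
my own assumptions, not anything from an actual design. Zero, infinities, NaN
and denormals are deliberately not handled.

```python
import math

TABLE_BITS = 4                      # assumed LUT size: 16 entries
TABLE_SIZE = 1 << TABLE_BITS

# Precomputed offline: reciprocal of the midpoint of each mantissa interval
# in [0.5, 1). Real hardware would store these as fixed-point constants.
RECIP_LUT = [1.0 / (0.5 + (i + 0.5) / (2 * TABLE_SIZE))
             for i in range(TABLE_SIZE)]


def reciprocal_estimate(d):
    """Low-precision 1/d: table lookup on the mantissa, negate the exponent."""
    if d == 0 or not math.isfinite(d):
        raise ValueError("sketch only handles finite, nonzero inputs")
    m, e = math.frexp(d)            # d = m * 2**e with 0.5 <= |m| < 1
    sign = -1.0 if m < 0 else 1.0
    idx = int((abs(m) - 0.5) * 2 * TABLE_SIZE)   # top mantissa bits
    return sign * math.ldexp(RECIP_LUT[idx], -e)


def reciprocal(d, iterations=4):
    """Refine the estimate with N-R steps: x = x * (2 - d * x)."""
    x = reciprocal_estimate(d)
    for _ in range(iterations):
        x = x * (2.0 - d * x)       # two multiplies and one subtract
    return x


def divide(a, d):
    """a / d as a multiply by the refined reciprocal."""
    return a * reciprocal(d)


if __name__ == "__main__":
    print(reciprocal_estimate(3.0), reciprocal(3.0), divide(1.0, 3.0))
```

The relative error squares on every step (1 - dx becomes (1 - dx)^2), so the
number of correct bits roughly doubles per iteration; a better initial table
buys fewer iterations.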

Paul
