On Wed, Sep 23, 2009 at 1:48 PM, Paul Brook <[email protected]> wrote:
>> > You should only implement 1-cycle operations. If you really need div,
>> > pipeline (1/x) with a MUL, with enough guard bits to reach the required
>> > precision. There are lots of 1-cycle operations for complex functions
>> > (1/x, 1/sqrt(x)).
>>
>> Any division operation, even if it's supposedly 1-cycle, is in reality:
>>  one operation issued per cycle, but with a latency of between 25 and 64
>>  cycles, depending on the operation requested and the data type.
>>  Doing so will require ~64 subtractors if we support integer divide with a
>>  fractional result, or 32 subtractors if we support only divide and modulo.
>
> For floating point at least it's fairly common to have a low-precision
> reciprocal estimate (LUT + a bit of exponent twiddling), then do explicit N-R
> iterations [x = 1/d => x = x(2 - dx)] to get the desired precision. Feeding the
> N-R iterations through the regular ALU (possibly with ISA cooperation) may
> give better overall throughput (through increased ALU space) than a dedicated
> divider.
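A minimal sketch of the estimate-then-refine scheme Paul describes, written in Python for readability rather than as hardware. The 8-entry table, its midpoint sampling, and the helper names (recip_estimate, nr_step, reciprocal) are assumptions for illustration only. The key property is that x(2 - dx) squares the relative error each step: if x = (1 - e)/d, the next estimate is (1 - e^2)/d. So a seed with error below 1/16 needs about 3 steps for single precision and 4 for double.

```python
import math

# Assumed for illustration: an 8-entry table of 1/m for mantissas m in [0.5, 1),
# each entry sampled at the midpoint of its bucket (bucket width 1/16).
LUT_BITS = 3
BUCKETS = 2 ** LUT_BITS
LUT = [1.0 / (0.5 + (i + 0.5) / (2 * BUCKETS)) for i in range(BUCKETS)]


def recip_estimate(d: float) -> float:
    """Low-precision 1/d: table lookup on the mantissa, exponent negated."""
    if d == 0.0 or not math.isfinite(d):
        raise ValueError("d must be finite and non-zero")
    m, e = math.frexp(abs(d))                     # |d| = m * 2**e, 0.5 <= m < 1
    i = int((m - 0.5) * 2 * BUCKETS)              # top mantissa bits pick the entry
    return math.copysign(math.ldexp(LUT[i], -e), d)


def nr_step(x: float, d: float) -> float:
    """One Newton-Raphson step toward 1/d: x <- x * (2 - d*x)."""
    return x * (2.0 - d * x)


def reciprocal(d: float, iterations: int = 4) -> float:
    """Seed from the table, then refine with a fixed number of N-R steps."""
    x = recip_estimate(d)
    for _ in range(iterations):
        x = nr_step(x, d)
    return x


if __name__ == "__main__":
    d = 3.0
    x = recip_estimate(d)
    for step in range(5):
        print(step, abs(1.0 - d * x))             # relative error after `step` steps
        x = nr_step(x, d)
```

Each nr_step is just two multiplies and a subtract, which is why these iterations can in principle go through the regular ALU instead of a dedicated divider.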

Ah.  It sounds like you're suggesting that we unroll the divider in
decode and send the subops down to the ALU.  I like this.
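One way the decode-side unrolling might look, again as a Python model rather than RTL: a hypothetical decoder expands a divide into a reciprocal estimate, three ALU micro-ops per Newton-Raphson step, and a final multiply. The opcode names (RCP_EST, MUL, RSUB2), the register names, and the linear seed 48/17 - 32/17*m (relative error at most 1/17) are all assumptions for illustration, not anything specified in this thread.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical micro-op format (opcode, dest, src_a, src_b); the opcode and
# register names are illustrative assumptions, not a real instruction set.
MicroOp = Tuple[str, str, str, str]


def rcp_estimate(d: float) -> float:
    """Stand-in for a hardware estimate: linear seed 48/17 - 32/17*m on the mantissa."""
    if d == 0.0:
        raise ZeroDivisionError("division by zero")
    m, e = math.frexp(abs(d))                       # |d| = m * 2**e, 0.5 <= m < 1
    return math.copysign(math.ldexp(48 / 17 - 32 / 17 * m, -e), d)


def expand_fdiv(dst: str, num: str, den: str, iterations: int = 4) -> List[MicroOp]:
    """Micro-ops a decoder might emit for dst = num / den."""
    ops: List[MicroOp] = [("RCP_EST", "t0", den, "")]   # t0 = x ~= 1/d
    for _ in range(iterations):
        ops.append(("MUL", "t1", den, "t0"))            # t1 = d * x
        ops.append(("RSUB2", "t1", "t1", ""))           # t1 = 2 - t1
        ops.append(("MUL", "t0", "t0", "t1"))           # x = x * t1
    ops.append(("MUL", dst, num, "t0"))                 # q = n * x
    return ops


def execute(ops: List[MicroOp], regs: Dict[str, float]) -> Dict[str, float]:
    """Run micro-ops in order through a trivial single-issue 'ALU' model."""
    for op, dst, a, b in ops:
        if op == "RCP_EST":
            regs[dst] = rcp_estimate(regs[a])
        elif op == "MUL":
            regs[dst] = regs[a] * regs[b]
        elif op == "RSUB2":
            regs[dst] = 2.0 - regs[a]
        else:
            raise ValueError(f"unknown micro-op {op!r}")
    return regs


if __name__ == "__main__":
    prog = expand_fdiv("q", "n", "d")
    out = execute(prog, {"n": 10.0, "d": 3.0})
    print(len(prog), out["q"])
```

With four iterations this is 14 dependent micro-ops, so a single divide still has long latency; the throughput argument assumes the ALU can interleave independent work between them. An ISA with fused multiply-add could merge each MUL/RSUB2 pair into one op computing 2 - d*x. Note that q = n * (1/d) is not guaranteed to be correctly rounded.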


-- 
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
