> > For floating point at least it's fairly common to have a low-precision
> > reciprocal estimate (LUT + a bit of exponent twiddling), then do explicit
> > N-R iterations [x = 1/d => x = x(2 - dx)] to get the desired precision.
> > Feeding the N-R iterations through the regular ALU (possibly with ISA
> > cooperation) may give better overall throughput (through increased ALU
> > space) than a dedicated divider.
>
> Ah. It sounds like you're suggesting that we unroll the divider in
> decode and send the subops down to the ALU. I like this.

Yes. Either in decode or in the compiler, depending on whether control
logic complexity or code size is the critical constraint: unrolling in
decode costs control logic, while unrolling in the compiler costs code
size.

Doing it in decode means you need to be able to cope with multi-stage
instructions. This is fine if you already have other macro instructions,
but could be a real PITA if all other instructions take a single execute
cycle. Loads and pipelined operations don't count, because they stall or
move down the pipe rather than occupying multiple execute slots.

Paul
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
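
For anyone who wants to play with the numbers, here is a rough software
model of the scheme quoted above: a small LUT indexed by the top mantissa
bits, exponent negation for the scaling, then explicit x = x(2 - dx)
steps. It is only a sketch. The names (LUT_BITS, recip_estimate,
nr_reciprocal), the 16-entry table and the default of four iterations are
my own assumptions for illustration, not anything proposed for the actual
hardware.

```python
import math

# Hypothetical table size: 16 entries, indexed by the top mantissa bits.
LUT_BITS = 4
LUT_SIZE = 1 << LUT_BITS

# math.frexp gives a mantissa m in [0.5, 1). Split that range into
# LUT_SIZE equal intervals and store the reciprocal of each midpoint.
RECIP_LUT = [1.0 / (0.5 + (i + 0.5) / (2 * LUT_SIZE)) for i in range(LUT_SIZE)]


def recip_estimate(d):
    """Low-precision 1/d: table lookup on the mantissa, negate the exponent."""
    if d == 0.0:
        raise ZeroDivisionError("reciprocal of zero")
    m, e = math.frexp(abs(d))             # abs(d) == m * 2**e, 0.5 <= m < 1
    idx = int((m - 0.5) * 2 * LUT_SIZE)   # top LUT_BITS bits of the mantissa
    est = math.ldexp(RECIP_LUT[idx], -e)  # 1/(m * 2**e) ~= LUT[idx] * 2**-e
    return -est if d < 0 else est


def nr_reciprocal(d, iterations=4):
    """Refine the estimate with Newton-Raphson steps x = x * (2 - d * x)."""
    x = recip_estimate(d)
    for _ in range(iterations):
        # One multiply, one subtract-from-2, one multiply: the sub-ops
        # that would be fed through the regular ALU.
        x = x * (2.0 - d * x)
    return x


if __name__ == "__main__":
    d = 3.0
    for n in range(5):
        x = nr_reciprocal(d, iterations=n)
        print(n, x, abs(1.0 - d * x))
```

With this table the initial relative error is at most about 2^-5, and
each step roughly squares it, so the useful bits double per iteration
until the floating-point rounding floor is reached. That makes the
LUT size versus iteration count trade-off easy to see: a bigger table
buys fewer trips through the ALU.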
