On 2/5/08, Kenneth Ostby <[EMAIL PROTECTED]> wrote:

> An interesting point would be how much we're actually gaining/losing
> from adding a 4th step to the multiplier? Seeing how it should be
> possible to pipeline it, we shouldn't have to sacrifice a lot of
> performance to gain accuracy?

In the cases where we can simply pipeline it, it only costs us logic
area.  For the most part, the pipeline is a straight shot, and
inserting an extra cycle of latency won't have any impact on
throughput.  The one extra cycle also means we carry one more pixel
in flight in the pipeline.
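
To put rough numbers on that, here is a minimal sketch of a
straight-shot pipeline that accepts one pixel per cycle and never
stalls.  The function name and the stage counts (3 versus 4) are
assumptions for illustration only, not figures from the actual design.

```python
def total_cycles(num_pixels, depth):
    """Cycles to drain num_pixels through a stall-free pipeline.

    Assumes one pixel enters per cycle and each pixel spends `depth`
    cycles in the pipeline (hypothetical model, not the real design).
    """
    if num_pixels == 0:
        return 0
    # The first pixel takes `depth` cycles; each later pixel finishes
    # one cycle after the one before it.
    return depth + num_pixels - 1


# Hypothetical comparison: a 3-stage versus a 4-stage multiplier.
pixels = 1000
extra = total_cycles(pixels, 4) - total_cycles(pixels, 3)
print(extra)  # extra cycles for the whole batch
```

In this model the extra stage costs one cycle for the whole batch,
regardless of how many pixels go through, and the number of pixels in
flight at once grows from 3 to 4.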

There are two places where we have "loops" where an extra cycle could
hurt us.  One is the rasterizer, which only uses adders, so we're safe
there.  The other is in the texture unit.  There, we could end up
fetching up to 8 texels to make one fragment, and that would be done
in a loop.  I'm pretty sure we don't use a multiplier there either.
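
For contrast, here is a rough sketch of why a loop is different, under
the assumption (mine, for illustration) that each iteration has to
wait for the previous result before it can start.  The latencies are
made-up numbers; only the count of 8 texels comes from the text above.

```python
def dependent_loop_cycles(iterations, unit_latency):
    """Cycles for a loop whose iterations cannot overlap.

    Assumes every iteration waits for the previous one to leave a
    unit with `unit_latency` stages (hypothetical model).
    """
    return iterations * unit_latency


# Hypothetical: 8 texels per fragment, unit latency 3 versus 4 cycles.
texels = 8
penalty = dependent_loop_cycles(texels, 4) - dependent_loop_cycles(texels, 3)
print(penalty)  # extra cycles per fragment
```

Under that assumption the extra stage is paid once per iteration, so
it costs 8 cycles per fragment rather than one cycle overall.  That is
the reason it matters whether a multiplier sits inside such a loop.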

Here's another way to deal with this issue.  We only restrict
ourselves specifically to a 17-bit mantissa because the multipliers
take signed 18-bit operands.  Aside from that, we could easily enough
handle wider numbers (up to 23 for single precision).  Perhaps an
appropriate thing to do would be to hold wider floats, but when you're
going to multiply, only use the most significant 17 bits of each
operand.  The result from the multiplier can then be taken as the
wider number.  This is no worse than what we were doing before.

One way to test this would be to hack float25 so that it does
everything 32-bit, except multiply, where it masks the lower bits of
the mantissa before computing the product.
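
A minimal sketch of that experiment, written in Python rather than as
a change to the real float25 code.  The function names are
hypothetical, and it assumes the 17-bit mantissa includes the implicit
leading 1, so 16 of the 23 stored fraction bits are kept.  It also
rounds the product to single precision, where hardware might truncate
instead.

```python
import struct

FRACTION_BITS = 23       # stored fraction bits in IEEE 754 single precision
KEPT_FRACTION_BITS = 16  # assumption: 17-bit mantissa = implicit 1 + 16 bits


def to_float32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def truncate_mantissa(x, kept_bits=KEPT_FRACTION_BITS):
    """Zero the low-order fraction bits of x's single-precision encoding."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    mask = (0xFFFFFFFF << (FRACTION_BITS - kept_bits)) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]


def masked_multiply(a, b):
    """Multiply using only the top mantissa bits of each operand.

    The product of two 17-bit mantissas fits in 34 bits, so the double
    multiply is exact; the result is then kept at float32 width.
    """
    return to_float32(truncate_mantissa(a) * truncate_mantissa(b))


a, b = 1.2345678, 7.654321
exact = to_float32(a) * to_float32(b)
print(masked_multiply(a, b), exact)
```

Comparing masked_multiply against the full-width product over a range
of inputs gives a feel for how much accuracy the narrower multiplier
operands actually cost.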

That being said, wherever we can we will chop lower-order mantissa bits.

-- 
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project