On 2/5/08, Kenneth Ostby <[EMAIL PROTECTED]> wrote:
> An interesting point would be how much we're actually gaining/losing
> from adding a 4th step to the multiplier? Seeing how it should be
> possible to pipeline it, we shouldn't have to sacrifice a lot of
> performance to gain accuracy?

In the cases where we can simply pipeline it, it only costs us logic area. For the most part, the pipeline is a straight shot, and inserting an extra cycle of latency won't have any impact. For the one extra cycle, we also carry one extra pixel outstanding in the pipeline.

There are two places where we have "loops" where an extra cycle could hurt us. One is the rasterizer, which only uses adders, so we're safe there. The other is in the texture unit. There, we could end up fetching up to 8 texels to make one fragment, and that would be done in a loop. I'm pretty sure we don't use a multiplier there either.

Here's another way to deal with this issue. We only restrict ourselves specifically to a 17-bit mantissa because the multipliers take signed 18-bit operands. Aside from that, we could easily enough handle wider mantissas (up to 23 bits for single precision). Perhaps an appropriate thing to do would be to hold wider floats, but when you're going to multiply, only use the most significant 17 bits of each operand's mantissa. The result from the multiplier can then be taken as the wider number. This is no worse than what we were doing before.

One way to test this would be to hack float25 so that it does everything in 32-bit, except multiply, where it masks the lower bits of the mantissa before computing the product.

That being said, wherever we can, we will chop lower-order mantissa bits.

--
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project
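
P.S. A rough sketch of the masking experiment described above, in Python rather than as a patch to float25 itself. Everything here is an assumption made for illustration: the function names are hypothetical, and I'm assuming the 17-bit mantissa means the implicit leading one plus 16 stored fraction bits, so the low 7 of the 23 stored single-precision fraction bits get cleared before the multiply. Zero, denormals, infinities and NaNs get no special treatment.

```python
import struct

FRACTION_BITS = 23        # stored fraction bits in IEEE 754 single precision
KEPT_FRACTION_BITS = 16   # assumption: 17-bit mantissa = implicit 1 + 16 stored bits


def float_to_bits(x):
    """Round x to single precision and return its 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(bits):
    """Interpret a 32-bit pattern as a single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def truncate_mantissa(x, kept=KEPT_FRACTION_BITS):
    """Clear the low-order fraction bits, keeping only the top `kept` bits."""
    drop = FRACTION_BITS - kept
    mask = (0xFFFFFFFF << drop) & 0xFFFFFFFF
    return bits_to_float(float_to_bits(x) & mask)


def masked_multiply(a, b, kept=KEPT_FRACTION_BITS):
    """Multiply using only the most significant mantissa bits of each operand.

    The product of two truncated operands is exact in a Python float
    (double precision); it is then rounded back to single precision,
    standing in for "taking the result as the wider number".
    """
    product = truncate_mantissa(a, kept) * truncate_mantissa(b, kept)
    return bits_to_float(float_to_bits(product))
```

Adds, subtracts and everything else would stay at full 32-bit precision; only the multiply goes through masked_multiply. Comparing rendered output against an all-32-bit run would then show how much the narrower multiplier operands actually cost.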
