That sounds like a really neat idea. Mind you, if we can avoid using multipliers, that would be even better. As it is, I'm not sure we'll have enough multipliers.
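
For reference, here is a rough Python sketch of the kind of test program Lourens mentions in the quoted message. It is only a sketch under assumptions the thread leaves open: the 16-bit input n is taken to be the fraction of a mantissa x = 1 + n/65536 in [1, 2), the base field stores 1/x - 0.5, and the slope field stores the magnitude of the drop across one table segment, which is subtracted rather than added because 1/x is decreasing. The names build_table, reciprocal and max_error are made up for illustration.

```python
# Sketch only: the input range, field encodings and function names are assumptions.
INPUT_BITS = 16                        # width of the input n
INDEX_BITS = 10                        # input bits 15:6 address the table
FRAC_BITS = INPUT_BITS - INDEX_BITS    # input bits 5:0 drive the interpolation
WORD_BITS = 18                         # width of one table word


def build_table(base_bits, slope_bits):
    """Pack a quantised (base, slope) pair into one word per table segment."""
    base_max = (1 << base_bits) - 1
    slope_max = (1 << slope_bits) - 1
    segments = 1 << INDEX_BITS
    table = []
    for idx in range(segments):
        y0 = 1.0 / (1.0 + idx / segments)          # 1/x at the segment's left edge
        y1 = 1.0 / (1.0 + (idx + 1) / segments)    # 1/x at the segment's right edge
        # Base: (y0 - 0.5) in units of 2^-(base_bits + 1), clamped to the field.
        base = min(base_max, round((y0 - 0.5) * (1 << (base_bits + 1))))
        # Slope: drop across the segment, scaled so the steepest segment
        # (near x = 1) just about fills the field; clamped to the field.
        slope = min(slope_max, round((y0 - y1) * (1 << (slope_bits + INDEX_BITS))))
        table.append((base << slope_bits) | slope)
    return table


def reciprocal(n, table, base_bits, slope_bits):
    """Approximate 1/(1 + n/2^16) with one table read and one multiply."""
    word = table[n >> FRAC_BITS]                   # address with the upper input bits
    frac = n & ((1 << FRAC_BITS) - 1)              # lower input bits
    base = word >> slope_bits
    slope = word & ((1 << slope_bits) - 1)
    # Align the base with the slope * frac product before subtracting.
    shift = slope_bits + INPUT_BITS - base_bits - 1
    fixed = (base << shift) - slope * frac
    return 0.5 + fixed / (1 << (base_bits + 1 + shift))


def max_error(base_bits, slope_bits):
    """Worst-case absolute error over all 2^16 inputs for a given split."""
    table = build_table(base_bits, slope_bits)
    return max(
        abs(reciprocal(n, table, base_bits, slope_bits)
            - 1.0 / (1.0 + n / (1 << INPUT_BITS)))
        for n in range(1 << INPUT_BITS)
    )


if __name__ == "__main__":
    # Sweep the ways of dividing the 18-bit word between base and slope.
    for slope_bits in range(2, 11):
        base_bits = WORD_BITS - slope_bits
        print(base_bits, slope_bits, max_error(base_bits, slope_bits))
```

Calling max_error(12, 6) gives the worst-case absolute error for the 12/6 split described in the quoted message; the loop at the bottom sweeps other ways of dividing the 18 bits. The result is kept at full precision here, so a hardware version that truncates the product would do somewhat worse.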

On Wed, 2 Feb 2005 09:52:34 +0100, Lourens Veen <[EMAIL PROTECTED]> wrote:
> On Tuesday 01 February 2005 19:33, Daniel Phillips wrote:
> > > > > Therein lies a problem. Since the reciprocal isn't precise (we
> > > > > use only 10 mantissa bits when computing it)
> > > >
> > > > Hmm, I thought we had 18 bits of precision readily available. Is
> > > > this a consequence of using linear interpolation for the divide?
> > >
> > > I was also under the impression that 10 mantissa bits were used for
> > > the LUT, and the other bits were used for linear interpolation
> > > between two 18 bit values from the LUT. This should actually yield a
> > > pretty good result, I think Nicolas Caspens was the one who
> > > contributed most of this in the original discussion (obviously, I may
> > > be mistaken, so please don't kill me if I got the attribution wrong).
> > > In any case, with linear interpolation the precision should be *much*
> > > better than 10 bits.
> >
> > Anyway, if interpolation doesn't work out for some reason there's always
> > Newton-Raphson, which is tried and true. I seem to recall that
> > Newton-Raphson needs two multipliers for the single iteration step
> > required, so if linear interpolation can do the job with one then I
> > guess it's better.
>
> I've been thinking about this for a bit. How about the following. Instead of
> just storing 16 bits of the reciprocal, how about storing both the reciprocal
> and its derivative in those 18 bits? Then we would essentially have a
> quantised approximation to a piecewise linear approximation to 1/x, rather
> than a quantised approximation to 1/x. The numbers would have to be adjusted
> slightly because we truncate rather than round to get the table index, but
> that's doable. The question is how we divide those 18 bits over the two
> numbers.
>
> Calculating the final number would then be something like
> Read 1 18-bit word using lines 15:6 of the input for the address
> Take bits 5:0 of the result, multiply by bits 5:0 of the input, and add to
> bits 17:6 of the result
>
> That would fit the RAM gate constraints for a two-pixel pipeline, and require
> only a single multiplier. The question is how accurate it is and whether it's
> worth it.
>
> What is the input range for this? 16 bits, but what does it map to? And how
> should the output be represented? If I can find the time I might just write a
> test program and see if I can figure out what the best split is and how good
> it is.
>
> Lourens
> _______________________________________________
> Open-graphics mailing list
> [email protected]
> http://lists.duskglow.com/mailman/listinfo/open-graphics
> List service provided by Duskglow Consulting, LLC (www.duskglow.com)
