On Friday 04 February 2005 20:47, Daniel Phillips wrote:
> On Friday 04 February 2005 09:14, Lourens Veen wrote:
> > <fast divide for perspective correction>
> >
> > What kind of precision is acceptable for this?
>
> Hi Lourens,
>
> My kneejerk reaction is that 16 bits is required for the perspective
> divide. I did experiment with 16/8 divides at one point in software but
> never managed to produce stable results. Floating point output might have
> helped there, but I really expect 8 bit divide precision to cause a lot of
> easily visible artifacts.
Well, we're not really limited to those two options. The input value is a 24-bit float, with an 8-bit exponent and 17 bits of mantissa including the hidden 1 bit, and what we need is the reciprocal.

The exponent is not the problem, since that's a simple matter of addition and subtraction. For the mantissa we need to do a divide, and we don't want to do divides in hardware, because a division unit takes up way too much space on the FPGA. We do have some 18-bit integer multipliers, but they're scarce, so we only want to use them if nothing else is good enough.

So the function we're looking for takes a 16-bit mantissa (with an implicit 1 in front) and returns a 16-bit mantissa (same thing) for the reciprocal. Instead of dividing, we use a LUT, which will be put in a RAM block in the FPGA. These RAM blocks are 1k 18-bit words.

1k words means that we use the topmost 10 bits of the input mantissa as an index into the table. If we store the 16-bit results in the table, that gives us 9 bits of precision (1/x is not a linear function, and we lose a bit in the approximation). That is, you get a 16-bit value back, but the least significant bits are garbage. You can be sure, however, that if you round it to a 9-bit value, you get the same result as if you had rounded the correct result to 9 bits. 9 bits is not too good though.

The second option is to read two consecutive 16-bit values from the LUT and do a linear interpolation between them. That costs us a multiplier, but gives 15 bits of precision. Unfortunately, it requires reading two words from the RAM block per pixel, and we can only read two words at a time. So this is not going to work for a dual-pixel pipeline.

My latest attempt is to store two values in each 18-bit word: a 14-bit approximation of the actual value, and a 4-bit approximation of the difference to the next value. That allows us to do a linear interpolation while reading only one word per pixel, at the cost of some precision.
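To sanity-check the numbers, here is a rough Python model of the packed one-word scheme. The field layout (14-bit base stored as value/4, 4-bit step stored as delta/16, interpolation over the low 6 input bits) is my own guess at an encoding that fits an 18-bit word, not the actual hardware design:

```python
# Hypothetical model of the packed-LUT reciprocal: one 1k x 18-bit RAM
# block, each word holding a 14-bit base and a 4-bit step.  Scalings are
# illustrative assumptions, not the real design.
import math

N = 1 << 10      # 1024 entries: one FPGA RAM block
FRAC = 1 << 16   # 16-bit fractional mantissa scale

def exact(frac):
    # Reciprocal of m = 1 + frac/2^16, renormalized to [1, 2),
    # returned as a real-valued 16-bit fraction.
    return (2.0 / (1.0 + frac / FRAC) - 1.0) * FRAC

# The delta between adjacent entries ranges from ~32 to ~128 LSBs
# (the slope of 1/x over [1, 2)), so delta/16 fits in 4 bits.
table = []
for i in range(N):
    v0 = exact(i << 6)
    v1 = exact((i + 1) << 6)
    base = min(int(round(v0 / 4.0)), (1 << 14) - 1)  # clamp at frac = 0
    step = int(round((v0 - v1) / 16.0))
    table.append((base, step))

def recip_frac(frac):
    # One table read plus a small 4x6-bit multiply, cheap in plain logic:
    # interpolate from the stored value toward the (approximated) next one.
    i, lo = frac >> 6, frac & 63
    base, step = table[i]
    return base * 4 - (step * 16 * lo) // 64

worst = max(abs(recip_frac(f) - exact(f)) for f in range(FRAC))
print(f"worst error: {worst:.1f} LSBs, "
      f"~{16 - math.log2(worst):.1f} bits of precision")
```

Under these assumed scalings the worst-case error works out to roughly a dozen LSBs of the 16-bit result, i.e. a bit over 12 bits of usable precision, which is consistent with the "almost 13 bits" figure below.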
It looks like I can almost get 13 bits of precision this way, and a 4-bit multiplier can probably be done in normal logic, without using the built-in multipliers on the FPGA. I guess when I get it to work we'll have to put it into the software model and see what the result looks like.

Incidentally, the software model currently uses 32-bit floats where the hardware would be using 25-bit floats. I guess we need a real float25 class with appropriately diminished performance...

Lourens

_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
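As a starting point for the float25 class mentioned above, a minimal sketch of rounding a double to 17 bits of mantissa precision (matching the hidden-1 mantissa width discussed in this thread; the exact field layout of the hardware format is my assumption):

```python
# Minimal sketch: round a Python double to a hypothetical 25-bit float's
# precision (17 mantissa bits including the hidden 1).  This only models
# the mantissa truncation, not exponent range limits.
import math

MANT_BITS = 17  # mantissa precision including the hidden 1

def round_to_float25(x):
    """Return x rounded to 17 bits of mantissa precision."""
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)                          # x = m * 2**e, |m| in [0.5, 1)
    m = round(m * (1 << MANT_BITS)) / (1 << MANT_BITS)
    return math.ldexp(m, e)
```

Passing every intermediate result of the software model through a function like this would expose precision problems that 32-bit floats currently hide.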
