On Friday 04 February 2005 20:47, Daniel Phillips wrote:
> On Friday 04 February 2005 09:14, Lourens Veen wrote:
<fast divide for perspective correction>
> >
> > What kind of precision is acceptable for this?
>
> Hi Lourens,
>
> My kneejerk reaction is that 16 bits is required for the perspective
> divide. I did experiment with 16/8 divides at one point in software but
> never managed to produce stable results.  Floating point output might have
> helped there, but I really expect 8 bit divide precision to cause a lot of
> easily visible artifacts.

Well, we're not really limited to those two options. The input value is a 
25-bit float, with a sign bit, an 8-bit exponent and 17 bits of mantissa 
including the hidden 1 bit, and what we need is the reciprocal.

The exponent is not the problem, since that's a simple matter of addition and 
subtraction. For the mantissa, we need to do a divide, and we don't want to 
do divides in hardware because a division unit takes up way too much space on 
the FPGA. We do have some 18-bit integer multipliers, but they're scarce, so 
we only want to use them if nothing else is good enough.

Now, this function that we're looking for takes a 16-bit mantissa (with an 
implicit 1 in front) and returns a 16-bit mantissa (same thing) for the 
reciprocal. Instead of dividing, we're using a LUT, which will be put in a 
RAM block in the FPGA. These RAM blocks are 1k 18-bit words.

Now, 1k words means that we use the topmost 10 bits of the input mantissa as 
an index into the table. If we store the 16-bit results in the table, that 
gives us 9 bits of precision (1/x is not a linear function, and we lose a bit 
in the approximation). That is, you get a 16-bit value back, but the least 
significant bits are garbage. You can be sure, however, that if you round it 
to a 9-bit value, you get the same result as if you had rounded the correct 
result to 9 bits. 9 bits is not very good though.
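
A rough way to check the plain-LUT option in software is sketched below. The details are assumptions for illustration, not the actual table layout: entries are sampled at the interval midpoint and stored as 1/x in 0.16 fixed point. How many good bits it reports depends on where the table is sampled and how the error is measured, so the figure it prints need not match the one quoted above.

```python
import math

INDEX_BITS = 10                      # 1k-entry table
MANT_BITS = 16                       # stored mantissa bits (hidden 1 excluded)
LOW_BITS = MANT_BITS - INDEX_BITS    # mantissa bits the lookup ignores

def build_table():
    """One 16-bit entry per interval, sampled at the interval midpoint."""
    table = []
    for i in range(1 << INDEX_BITS):
        x_mid = 1.0 + (i + 0.5) / (1 << INDEX_BITS)
        table.append(round((1 << 16) / x_mid))   # 1/x as 0.16 fixed point
    return table

def max_error(table):
    """Worst absolute error over every 16-bit input mantissa."""
    worst = 0.0
    for m in range(1 << MANT_BITS):
        x = 1.0 + m / (1 << MANT_BITS)
        approx = table[m >> LOW_BITS] / (1 << 16)
        worst = max(worst, abs(approx - 1.0 / x))
    return worst

table = build_table()
err = max_error(table)
print("max abs error: 2^%.2f" % math.log2(err))
```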

The second option is to read two consecutive 16-bit values from the LUT, and 
do a linear interpolation between them. That costs us a multiplier, but gives 
15 bits precision. Unfortunately, it requires reading two words from the RAM 
block per pixel, and a RAM block can only deliver two words at a time. So 
this is not going to work for a dual-pixel pipeline, which would need four.
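
The two-read interpolation can be modelled along these lines. Again this is only a sketch under assumed details: entries are sampled at interval starts, stored as 1/x in 0.16 fixed point, the first entry is clamped so it fits in 16 bits, and a 1025th entry is added so the last interval has a right-hand neighbour (real hardware would need some other way to handle that).

```python
import math

N = 1 << 10            # number of intervals
LOW_BITS = 6           # mantissa bits used as the interpolation fraction

def build_table():
    """N + 1 endpoint samples of 1/x, clamped to 16 bits."""
    return [min(round((1 << 16) * N / (N + i)), 0xFFFF) for i in range(N + 1)]

def recip_interp(table, m):
    """Interpolate between two consecutive entries for 16-bit mantissa m."""
    idx, frac = m >> LOW_BITS, m & ((1 << LOW_BITS) - 1)
    a, b = table[idx], table[idx + 1]          # the two RAM reads
    # b <= a, so the product is negative; adding 32 before the shift rounds.
    return a + (((b - a) * frac + 32) >> LOW_BITS)

def max_error(table):
    worst = 0.0
    for m in range(1 << 16):
        x = 1.0 + m / (1 << 16)
        worst = max(worst, abs(recip_interp(table, m) / (1 << 16) - 1.0 / x))
    return worst

table = build_table()
err = max_error(table)
print("max abs error: 2^%.2f" % math.log2(err))
```

The multiply here is (b - a) * frac, which is the multiplier this option costs.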

My latest attempt is to store two values in the 18-bit words: a 14-bit 
approximation of the actual value, and a 4-bit approximation of the 
difference to the next value. That allows us to do a linear interpolation 
while reading only one word per pixel, at a cost of some precision. It looks 
like I can almost get 13 bits of precision this way, and a 4-bit multiplier 
can probably be done in normal logic, without using the inbuilt multipliers 
on the FPGA.
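
Here is one possible way to model the packed-word idea in Python. The encoding is a guess for illustration only: the value is 1/x at the interval start in units of 2^-14, the delta is the (positive) step down to the next interval in the same units, both clamped to their field widths, and the low 6 mantissa bits are the interpolation fraction.

```python
import math

N = 1 << 10            # number of table entries
LOW_BITS = 6           # mantissa bits used as the interpolation fraction

def f14(i):
    """Exact 1/x at the start of interval i, in units of 2**-14."""
    return (1 << 14) * N / (N + i)

def build_words():
    """Pack a 14-bit value and a 4-bit step into each 18-bit word."""
    words = []
    for i in range(N):
        value = min(round(f14(i)), 0x3FFF)             # 14-bit value
        delta = min(round(f14(i) - f14(i + 1)), 0xF)   # 4-bit step down
        words.append((value << 4) | delta)
    return words

def recip_packed(words, m):
    """One RAM read per lookup; the multiply is only 4 bits by 6 bits."""
    w = words[m >> LOW_BITS]
    value, delta = w >> 4, w & 0xF
    frac = m & ((1 << LOW_BITS) - 1)
    return value - ((delta * frac + 32) >> LOW_BITS)

def max_error(words):
    worst = 0.0
    for m in range(1 << 16):
        x = 1.0 + m / (1 << 16)
        worst = max(worst, abs(recip_packed(words, m) / (1 << 14) - 1.0 / x))
    return worst

words = build_words()
err = max_error(words)
print("max abs error: 2^%.2f" % math.log2(err))
```

With this layout the step between neighbouring entries runs from roughly 16 down to 4 units, which is why it only just fits in 4 bits; a different scaling or an offset on the delta field may well do better.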

I guess when I get it to work we'll have to put it into the software model and 
see what the result looks like.

Incidentally, the software model currently uses 32-bit floats where the 
hardware would be using 25-bit floats. I guess we need a real float25 class 
with appropriately diminished performance...
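
As a starting point for such a class, the rounding step could look something like this sketch. It assumes 16 stored mantissa bits plus the hidden 1, rounds to nearest, and does not clamp the exponent to 8 bits; to_float25 is a hypothetical name, not something in the current software model.

```python
import math

MANT_BITS = 16   # stored bits; 17 significant bits with the hidden 1

def to_float25(x):
    """Round x to the mantissa precision assumed for float25."""
    if x == 0.0 or math.isinf(x) or math.isnan(x):
        return x
    f, e = math.frexp(x)                  # x = f * 2**e, 0.5 <= |f| < 1
    scale = 1 << (MANT_BITS + 1)
    q = round(f * scale) / scale          # keep 17 significant bits
    return math.ldexp(q, e)               # exponent range is not clamped here
```

Applying this after every arithmetic operation in the model would approximate the precision loss, at the cost of speed.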

Lourens

