On Friday 04 February 2005 15:21, Lourens Veen wrote:
> On Friday 04 February 2005 20:47, Daniel Phillips wrote:
> > On Friday 04 February 2005 09:14, Lourens Veen wrote:
> > <fast divide for perspective correction>
> >
> > What kind of precision is acceptable for this?
> >
> > Hi Lourens,
> >
> > My kneejerk reaction is that 16 bits is required for the
> > perspective divide. I did experiment with 16/8 divides at one point
> > in software but never managed to produce stable results. Floating
> > point output might have helped there, but I really expect 8 bit
> > divide precision to cause a lot of easily visible artifacts.
>
> Well, we're not really limited to those two options. The input value
> is a 24-bit float, with 8 bit exponent and 17 bits of mantissa
> including the hidden 1 bit, and what we need is the reciprocal.
>
> The exponent is not the problem, since that's a simple matter of
> addition and subtraction. For the mantissa, we need to do a divide,
> and we don't want to do divides in hardware
Whatever way it ends up, it will be "in hardware" since it's just
programmable logic. By the way, take a look at how this FPGA works,
it's all done with lookup tables. Maybe not the dedicated multipliers,
but then again, maybe those too. Anyway, thinking of the problem in
terms of lookup tables seems highly appropriate.

> because a division unit
> takes up way too much space on the FPGA. We do have some 18-bit
> integer multipliers, but they're scarce, so we only want to use them
> if nothing else is good enough.

This is really an excellent place to burn a multiplier or two, please
don't be shy :-)

> Now, this function that we're looking for takes a 16-bit mantissa
> (with an implicit 1 in front) and returns a 16-bit mantissa (same
> thing) for the reciprocal. Instead of dividing, we're using a LUT,
> which will be put in a RAM block in the FPGA. These RAM blocks are
> 1k 18-bit words.
>
> Now, 1k words means that we use the topmost 10 bits of the input
> mantissa as an index into the table. If we store the 16-bit results
> in the table, that gives us 9 bits of precision (1/x is not a linear
> function, and we lose a bit in the approximation). That is, you get
> a 16-bit value back, but the least significant bits are garbage.

Speaking more precisely, they've been quantized.

> You can be sure however that if you round it to a 9-bit value, you
> get the same result as when you had rounded the correct result to 9
> bits. 9 bits is not too good though.
>
> Second option is to read two consecutive 16-bit values from the LUT,
> and do a linear interpolation between them. That costs us a
> multiplier, but gives 15 bits precision. Unfortunately, it requires
> reading two words from the RAM block per pixel, and we can only read
> two words at a time. So this is not going to work for a dual-pixel
> pipeline.

I'm thinking that the divide approximation really should be worked out
in detail for the two-pixel case. There has to be some redundancy to
take advantage of.
> My latest attempt is to store two values in the 18-bit words: a
> 14-bit approximation of the actual value, and a 4-bit approximation
> of the difference to the next value.

Wait, if it's just the difference between two lookup iterations, why
not compute it on the fly? OK, I see, what you're proposing looks
pretty interesting.

> That allows us to do a linear
> interpolation while reading only one word per pixel, at a cost of
> some precision. It looks like I can almost get 13 bits of precision
> this way, and a 4-bit multiplier can probably be done in normal
> logic, without using the inbuilt multipliers on the FPGA.
>
> I guess when I get it to work we'll have to put it into the software
> model and see what the result looks like.

Yes. I imagine you're getting pretty close to the money with 13 bits
of precision.

> Incidentally, the software model currently uses 32-bit floats where
> the hardware would be using 25-bit floats. I guess we need a real
> float25 class with appropriately diminished performance...

Yes, and just to be kind to old C hacks like me, please don't overload
the math operators, just make it a function so it's readable out of
context.

Regards,

Daniel

_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
