Re: [Open-graphics] Rounding reciprocals

Lourens Veen Thu, 17 Feb 2005 03:06:00 -0800

On Wednesday 16 February 2005 03:02, Daniel Phillips wrote:
> On Tuesday 15 February 2005 18:44, Lourens Veen wrote:
> > On Tuesday 15 February 2005 23:47, I wrote:
> > > At this point I'm fretting more about DDA precision than the
> > > perspective divide.  I seriously doubt we'll get stable results
> > > stepping across the whole screen with 16 bit precision.  I'm
> > > mulling over a couple of suggestions, short of adding more bits.
> >
> > ...At any rate, could we assume a maximum of 2048x1536 for the
> > resolution? That means 11 bits for integer screen coordinates, so
> > we'd have an 11.5 split if we did fixed point (and floating point
> > just doesn't make sense to me here really, comments?).
>
> Floating point makes a whole lot of sense here because we don't have
> much control over the input parameters, which can vary over wide
> ranges.  The W divide introduces a further, large degree of variation
> for near and far objects.  A typical nasty case is a viewpoint near a
> huge vertical plane running far into the distance, i.e., looking along
> the side of a building.


Colours are still relatively limited, I'd say. But I see your point: I hadn't
thought of texture coordinates outside of the texture, but of course that
happens when the texture wraps.

> > and what if we drew the span from
> > both sides to cut the error in half (ie, we do two pixels at a time,
> > but not next to one another, but one starting at the left side of the
> > span and the other on the right side)?
>
> This problem needs stronger medicine than just cutting the damage in
> half.  Also, meeting in the middle makes any cumulative error easy to
> see.  The tears down the middle of triangles will crawl around the
> screen in a distracting way.  And it's not going to be friendly to the
> DRAM interface.

It was intended more as an additional measure than as a complete one. But I
see your point; it's not a good idea.

> > Just some thoughts.
>
> Three options that seem viable to me are:
>
>   1) Correct each interpolant on the fly using a single multiplier in a
>      round robin.
>
>   2) Chop up big geometry in the driver vertically and horizontally into
>      bite size chunks.
>
>   3) More bits, just for the interpolants.

Okay, you're with 2), I'll take the other two :-).

Let's take a look at the vertical interpolation of, for example, the X1 
coordinate. It's a linear interpolation, so the formula is

X1 = X1_0 + dX1dY * (Y - Y_0)

which we calculate as

X1 = X1_0 + dX1dY + dX1dY + dX1dY + ... + dX1dY // (Y - Y_0 additions)

and if I understand correctly, the problem is that we do not have enough bits
in dX1dY and X1, so the error accumulates as Y grows. Because of rounding, we
do not store dX1dY, but (dX1dY + delta). If the fractional part of the stored
dX1dY is n bits and we round to nearest, then |delta| <= 2**-(n+1). So, worst
case, what we actually calculate by cumulatively adding is

X1 = X1_0 + (dX1dY + 2**-(n+1)) * (Y - Y_0)
   = X1_0 + dX1dY * (Y - Y_0) + (2**-(n+1) * (Y - Y_0))

or

X1 = X1_0 + dX1dY * y + err * y

with y = (Y - Y_0) and err = (2**-(n+1))
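
To put a number on that, here is a minimal sketch in Python (my own
illustration, not anything from the design). It assumes round-to-nearest
when storing the slope, takes n = 5 from the 11.5 split mentioned earlier,
and uses an arbitrary example slope of 1/3. The names quantize and
dda_error are made up for this sketch.

```python
from fractions import Fraction

def quantize(value, n):
    """Round value to the nearest multiple of 2**-n (n fractional bits)."""
    scale = 1 << n
    return Fraction(round(value * scale), scale)

def dda_error(slope, n, steps):
    """Error after `steps` cumulative additions of the n-bit rounded slope."""
    stored = quantize(slope, n)       # this is dX1dY + delta
    x = Fraction(0)
    for _ in range(steps):
        x += stored                   # the DDA: one addition per scanline
    return x - slope * steps          # equals delta * steps

n = 5                                 # assumed: fraction bits of an 11.5 format
slope = Fraction(1, 3)                # assumed example slope
delta = quantize(slope, n) - slope    # 11/32 - 1/3 = 1/96
err = dda_error(slope, n, 1536)       # 1536 * 1/96 = 16 pixels
```

So with these assumed numbers the edge is off by 16 pixels at the bottom of
a 1536-line screen, even though delta itself is within the 2**-(n+1) bound.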

A real multiplier would not calculate the product by doing y additions. 
Instead, it does

X1 = X1_0 + dX1dY * y:0 + (dX1dY << 1) * y:1 + ... + (dX1dY << m) * y:m
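
As a sanity check, the shift-and-add product above can be sketched in
Python like this. It is only an illustration on integers (think of dX1dY
as a fixed-point value held in an int); shift_add_multiply is a name I
made up.

```python
def shift_add_multiply(d, y):
    """Multiply integer d by non-negative integer y using shifts and adds.

    One term (d << bit) is added for every bit of y that is set, which is
    what the y:0, y:1, ..., y:m factors in the formula select.
    """
    acc = 0
    bit = 0
    while y >> bit:                  # stop once all remaining bits are zero
        if (y >> bit) & 1:           # y:bit is set
            acc += d << bit
        bit += 1
    return acc
```

The number of additions is the number of set bits in y, not y itself.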

We want to do this incrementally. Take the bit representation of y, which
runs from 0 to height H. We can write y as the sum of a previous value of y
and a power of two:

0 = 000
1 = 001 = 000 + 001
2 = 010 = 000 + 010
3 = 011 = 010 + 001
4 = 100 = 000 + 100
5 = 101 = 100 + 001
6 = 110 = 100 + 010
7 = 111 = 110 + 001
...
1024 = 0 + (1 << 10)

Multiply by dX1dY and you get

X1[0] = 0
X1[1] = 1 * dX1dY = X1[0] + dX1dY
X1[2] = 2 * dX1dY = X1[0] + (dX1dY << 1)
X1[3] = 3 * dX1dY = X1[2] + dX1dY
X1[4] = 4 * dX1dY = X1[0] + (dX1dY << 2)
X1[5] = 5 * dX1dY = X1[4] + dX1dY
X1[6] = 6 * dX1dY = X1[4] + (dX1dY << 1)
X1[7] = 7 * dX1dY = X1[6] + dX1dY
...
X1[1024] = 1024 * dX1dY = X1[0] + (dX1dY << 10)

(or, in general, X1[x] = X1[x & (x-1)] + (dX1dY << number-of-rightmost-zeros
of x). Note that we can latch x-1 and use a priority encoder on the inverted
carry outputs of the incrementor to get the shift factor.)
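
A rough Python model of this recurrence, to check that it really yields
y * dX1dY with one addition per step. The register layout (one slot per
trailing-zero count) is my assumption about how the intermediate results
could be kept, and trailing_zeros and incremental_products are
hypothetical names.

```python
def trailing_zeros(x):
    """Number of rightmost zero bits of x (x > 0)."""
    return (x & -x).bit_length() - 1

def incremental_products(d, height):
    """Yield (y, y * d) for y = 0..height using one addition per step."""
    # regs[k] holds the most recent result whose y had k trailing zeros.
    regs = [0] * (height.bit_length() + 1)
    base = 0                                   # X1[0]
    yield 0, base
    for y in range(1, height + 1):
        k = trailing_zeros(y)                  # the shift factor
        prev_y = y & (y - 1)                   # y with its lowest set bit cleared
        prev = base if prev_y == 0 else regs[trailing_zeros(prev_y)]
        regs[k] = prev + (d << k)              # the single addition
        yield y, regs[k]
```

For example, list(incremental_products(3, 7)) walks through the same
chain as the table above (X1[6] is built from X1[4], X1[7] from X1[6]).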

That's still one addition per increment. If we store enough bits of dX1dY
(i.e., 10 bits of fraction in this example, so that dX1dY becomes
dX1dY' >> 10) and start with X1[0] = 0.5 + X1_0 (to round properly), then the
rounding error grows with the base-2 logarithm of y, which is the same rate at
which you lose precision in a limited-precision floating point number. Unless
I'm missing something, this means perfect results without a multiplier at all.
We do need a bunch of registers to store the intermediate results, however: 15
per interpolant for a 16-bit mantissa, and I'm not sure how expensive that
gets. The RAM blocks don't have enough ports to be used effectively here (it
would require one RAM block per interpolant for the horizontal rasterisation,
which is rather wasteful).
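
Here is one possible reading of the precision argument as a Python
experiment; treat it as a sketch, not a spec. The assumptions are mine:
registers keep N = 5 fraction bits, the stored slope keeps M = 10 more,
each shifted term is rounded to nearest before it is added, and the
0.5 + X1_0 starting offset is left out. max_errors is a made-up name.

```python
from fractions import Fraction

N, M = 5, 10   # assumed: 5 fraction bits in registers, 10 extra in the slope

def max_errors(slope, height):
    """Return (worst DDA error, worst shift-and-add error) over y = 1..height."""
    d_lo = round(slope * (1 << N))          # slope rounded to N fraction bits
    d_hi = round(slope * (1 << (N + M)))    # slope with N + M fraction bits
    half = 1 << (M - 1)                     # rounding constant for the >> M
    regs = [0] * (height.bit_length() + 1)  # indexed by trailing-zero count
    naive = 0
    worst_naive = worst_tree = Fraction(0)
    for y in range(1, height + 1):
        naive += d_lo                       # plain cumulative addition
        k = (y & -y).bit_length() - 1       # trailing zeros of y
        prev_y = y & (y - 1)
        prev = regs[(prev_y & -prev_y).bit_length() - 1] if prev_y else 0
        regs[k] = prev + (((d_hi << k) + half) >> M)
        exact = slope * y
        worst_naive = max(worst_naive, abs(Fraction(naive, 1 << N) - exact))
        worst_tree = max(worst_tree, abs(Fraction(regs[k], 1 << N) - exact))
    return worst_naive, worst_tree

naive_err, tree_err = max_errors(Fraction(1, 3), 1024)
```

Under these assumptions each addition contributes at most 2**-(N+1), and
the chain leading to y has one addition per set bit of y, so for y up to
1024 the shift-and-add error stays within 11 * 2**-6 of a pixel, while
the plain DDA error for this slope reaches 32/3 pixels.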

> The second of these seems the most pragmatic since it can be offloaded
> to the host, saving gates and Timothy cycles.  The extra work would
> level out nicely because larger triangles incur most of the penalty,
> and there should be fewer of them.  For best visual stability, the
> clipping planes would form a rectangular mesh in screen space.  Note
> that this means small triangles do not necessarily escape intact,
> however they are more likely to.

I guess we really need to know how big this error can get, and how far we can 
go without creating artifacts. So we need a theoretical worst case. The 
problem seems to be that you can rotate a triangle arbitrarily close to 
edge-on, so you can always make it worse...

How would mipmapping influence this? Needs thought...

> > Incidentally, these differentials are calculated
> > on the host, not the card, right?
>
> Most probably off-card because we ran out of multipliers some time ago.
> The problem with that is, it really bulks up the DMA stream, and PCI
> bandwidth is already tight.  This is probably just a case of grin and
> bear it.

Yeah, and I don't see us doing all those reciprocals in parallel in hardware 
either. Perhaps we can still do something with the colour values, since they 
have a limited range.

Lourens
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
