On Wed, 23 Feb 2005 19:34:42 -0500, Daniel Phillips <[EMAIL PROTECTED]> wrote:
> Hi Timothy,
> 
> Trilinear blending is not in the spec, but I would like to add it.  This
> is a fairly important feature.  Without it, banding at mipmap breaks
> will be annoyingly visible.  The quality difference between bilinear
> and trilinear filtered mipmapping is much greater than, for example,
> between trilinear and anisotropic.  As a sweetener, trilinear blending
> could be used to generate 2D cross fade effects.

I'm quite certain it's in there.  I also know for a fact that it's in
the model.  It hasn't, however, been tested.

> 
> I think trilinear blending can be added at a cost of only 4 8x8
> multipliers and some control logic.  Please bear with me as I attempt
> to express the concept as conceived from the point of view of a
> software guy with a bare inkling of how hardware works.
> 
> First, some review.  A bilinear blend combines four color values using
> three linear interpolation operations.  Given
> 
>    lerp(t, a, b) = (1-t)*a + t*b
> 
> then the bilinear blend of four color values c1..c4 is
> 
>    lerp(frac(y), lerp(frac(x), c1, c2), lerp(frac(x), c3, c4))
> 
> To save multiplies, we rewrite lerp as
> 
>    lerp(t, a, b) = t*(b - a) + a

Now, this I definitely like.  It's an optimization that I hadn't
thought of, but it will definitely make the logic simpler.
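
For anyone who wants to check the rewrite numerically, here is a minimal Python sketch. It is not taken from the spec or the model; the function names lerp_two_mul and biblend_xy are made up for illustration, and floats stand in for the 8-bit fixed-point values the hardware would use.

```python
def lerp_two_mul(t, a, b):
    """Textbook form (1-t)*a + t*b: two multiplies."""
    return (1 - t) * a + t * b

def lerp(t, a, b):
    """Rewritten form t*(b - a) + a: one multiply."""
    return t * (b - a) + a

def biblend_xy(fx, fy, c):
    """Bilinear blend of four values c = (c1, c2, c3, c4).

    fx and fy play the role of frac(x) and frac(y) in the quoted formula.
    """
    c1, c2, c3, c4 = c
    return lerp(fy, lerp(fx, c1, c2), lerp(fx, c3, c4))

# Both lerp forms agree on a sample input.
same = lerp_two_mul(0.25, 10, 50) == lerp(0.25, 10, 50)
center = biblend_xy(0.5, 0.5, (0, 100, 100, 200))
```

Each bilinear blend makes three lerp calls, so the rewritten form needs three multiplies per blended value instead of six.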

> This has to be performed on four color values for each of two pixels per
> clock, totalling 24 8x8 multiplies.  The pipeline delivers one pair of
> bilinear blended pixels per clock in two stages.  The first stage has
> 16 multiplies and the second, 8.

Well, I wasn't going to do it in exactly that order, but okay.
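
The multiplier counts quoted above can be reproduced with a little arithmetic. The sketch below assumes that "four color values" means four 8-bit components per texel (for example RGBA), which is one reading of the quoted text rather than something the message states outright; the constant and function names are hypothetical.

```python
# Assumption: four 8-bit color components per texel, two pixels per clock.
CHANNELS = 4
PIXELS_PER_CLOCK = 2

def multipliers(lerps):
    """8x8 multipliers needed for `lerps` one-multiply lerps per pixel."""
    return lerps * CHANNELS * PIXELS_PER_CLOCK

stage1 = multipliers(2)   # the two lerps along x
stage2 = multipliers(1)   # the single lerp along y
bilinear_total = stage1 + stage2
print(stage1, stage2, bilinear_total)
```

Under that assumption the two stages come to 16 and 8 multipliers, 24 in total, matching the figures in the quoted paragraph.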

> A trilinear blend is two bilinear blends lerped together.  This totals
> 56 multiplies.  We probably can't afford that much real estate for this
> feature.  I propose to modify the bilinear hardware to do this at half
> clock speed, with 4 additional multipliers.  Given

It doesn't require that many multiplies, actually.  Since only two
pixels come in per clock, only one multiply has to be done per clock
to blend them.  I think the whole thing could be done with two or
three multipliers without any performance hit (relatively speaking).

>   triblend(t1, t2, t3, c[2][4]) =
>      lerp(t3, biblend(t1, t2, c[0]), biblend(t1, t2, c[1]))
> 
> Then the idea is to repurpose the bilinear blend pipeline by alternating
> between the left and right members of the pixel pair, computing the two
> bilinear blend components in parallel.  Four multipliers appended to
> the end of the pipeline complete one trilinear blend per clock,
> alternating between left and right pixels.
> 
> How am I doing so far?

Not bad.  There are two sets of texture registers, but there is only
one texture unit that can read only two pixels per clock from RAM.
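
To make the "two bilinear blends lerped together" description concrete, here is a small software sketch of the trilinear blend. It is only an illustration of the arithmetic, not the pipeline design: the final step is written as a plain lerp with weight t3, floats stand in for 8-bit values, and the lerp_calls counter is a hypothetical helper added to count multiplies.

```python
lerp_calls = 0  # hypothetical counter: one multiply per lerp call

def lerp(t, a, b):
    """One-multiply linear interpolation: t*(b - a) + a."""
    global lerp_calls
    lerp_calls += 1
    return t * (b - a) + a

def biblend(t1, t2, c):
    """Bilinear blend of four values c = (c1, c2, c3, c4)."""
    c1, c2, c3, c4 = c
    return lerp(t2, lerp(t1, c1, c2), lerp(t1, c3, c4))

def triblend(t1, t2, t3, c):
    """Blend two mipmap levels: c[0] and c[1] each hold four values."""
    return lerp(t3, biblend(t1, t2, c[0]), biblend(t1, t2, c[1]))

# Sample values chosen arbitrarily for the example.
result = triblend(0.5, 0.5, 0.25, [(0, 100, 100, 200), (40, 40, 40, 40)])
```

One call to triblend performs seven lerps (three per bilinear blend plus the final one). If each pixel carries four 8-bit components and two pixels are processed per clock, that is 7 * 4 * 2 = 56 multiplies, which agrees with the count in the quoted text.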