On Fri, 25 Feb 2005 16:10:14 -0500, Daniel Phillips <[EMAIL PROTECTED]> wrote:

> 
> LRU implements the policy I described so long as the eight texels of a
> trilinear blend are accessed in the proper order, which isn't the best
> order for the DRAM (it alternates between two textures).  But if the memory
> request queue is long enough, maybe the accesses can be reordered there.
> Or simpler, have two independent texture caches per port, or switch that on
> when trilinear blending is active.

As a matter of fact, building the two texture units as two truly
independent units simplifies some things and complicates others, but it
would give us room to improve efficiency by reordering memory accesses.
Given the change of direction, I may do it that way.

> 
> OK, it looks like you have the memory controller under control (so to
> speak).  It's obviously something you have a lot of experience with in the
> 2D case.  The reason I'm worrying about it so much is, I'm wondering if
> there is any way to squeeze 2x2 FSAA + trilinear blending out of this card,
> even if only usable at 640 x 480.  This for me is where we enter the realm
> of "doesn't suck" for a 3D card.

The way I was going to do FSAA was to have one stage that divided the
screen coordinates by 2**n and modified the alpha channel
appropriately.  Horizontally associated fragments would be
automatically combined, but vertically associated fragments would be
resolved by the blender.  That would lose some accuracy and
performance, but oh well.
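
To make the coordinate division concrete, here is a minimal software
sketch of the result such a stage aims for: every supersampled value is
routed to the pixel at (x >> n, y >> n) and weighted so that a full
2**n x 2**n block sums to one pixel. It is a plain box filter, not the
hardware pipeline described above; the function name and the
1 / (2**n)**2 weighting are assumptions made for illustration.

```python
def downsample(samples, n):
    """Average each 2**n x 2**n block of supersampled values into one pixel.

    `samples` is a list of equal-length rows of numbers. Rows and columns
    beyond a whole number of blocks are ignored.
    """
    scale = 1 << n
    height = len(samples) // scale
    width = len(samples[0]) // scale
    # Assumed weighting: each sample contributes an equal share of its pixel.
    weight = 1.0 / (scale * scale)
    out = [[0.0] * width for _ in range(height)]
    for sy in range(height * scale):
        for sx in range(width * scale):
            # Dividing the screen coordinates by 2**n selects the target pixel.
            out[sy >> n][sx >> n] += samples[sy][sx] * weight
    return out


# A 4x4 supersampled patch with an edge running through the top-left block.
patch = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
result = downsample(patch, 1)
print(result)
```

The half-covered top-left block comes out as 0.5, which is the softened
edge value that antialiasing is after.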

However, it's been suggested that one solution is to render the scene
N times with slightly different camera positions and then composite
them.  It seems to me that that would be a lot slower, but I haven't
put a lot of thought into it.

> A pixel is one word, so I'll do the estimate in words.  We have 400
> megawords per second bandwidth, ignoring pre-charge delays.  We have to
> draw 640x480 at double resolution, that is, 4 times the pixels.  Each pixel
> accesses 8 words of texture and writes one word of frame buffer, that's 9
> words/pixel.  Quake 3 has roughly a 2X overdraw factor, so assuming that
> and 30 frames/second, the grand total is:
> 
>   640 * 480 * 4 * 9 * 2 * 30 = 663,552,000 words/sec
> 
> which is 65% too much.  The scanout adds another 640 * 480 * 4 * 30 =
> 36,864,000 words/sec, making it 75% over, not counting DRAM pre-charge
> latencies.  The texel access pattern will sometimes bottleneck on one or
> two chips, slowing things down some more.  The point is, we're not that far
> away, and even if we can't quite get a playable frame rate, it would still
> be wonderful to at least have the option of seeing a scene in full glorious
> non-jagginess from time to time.
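
The arithmetic in the estimate above can be checked with a few lines of
Python. All figures come straight from the quoted text; only the
variable and function names are invented for illustration.

```python
# Bandwidth budget from the estimate above, in 32-bit words per second.
WIDTH, HEIGHT = 640, 480
SUPERSAMPLE = 4           # 2x2 FSAA: four samples per displayed pixel
TEXEL_READS = 8           # trilinear blend reads eight texels per sample
FB_WRITES = 1             # one frame-buffer word written per sample
OVERDRAW = 2              # rough Quake 3 overdraw factor
FPS = 30
AVAILABLE = 400_000_000   # words/sec, ignoring pre-charge delays

samples_per_frame = WIDTH * HEIGHT * SUPERSAMPLE
render = samples_per_frame * (TEXEL_READS + FB_WRITES) * OVERDRAW * FPS
scanout = samples_per_frame * FPS
total = render + scanout


def percent_over(demand, budget=AVAILABLE):
    """Return how far demand exceeds the budget, as a percentage."""
    return 100.0 * (demand - budget) / budget


print(render)    # 663552000
print(scanout)   # 36864000
print(total)     # 700416000
print(round(percent_over(render), 1), round(percent_over(total), 1))
```

Rendering alone comes to just under 66% over budget, and rendering plus
scanout to about 75% over, in line with the figures quoted.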

Only after starting this project did I start to notice that polygon
edges in games I play are jagged.  :)

> 
> There are several ways to claw back some fill rate:
> 
>   1) Texture cache, saving maybe 25% of texel accesses

There are all sorts of things that would qualify as a "texture cache".

>   2) 16 bit textures, saving 50% of texel bandwidth and improving cache hit
>      rate slightly

Making all memory words exactly 32 bits is a simplification that I
don't want to give up.

>   3) 16 bit frame buffer, saving 50% of pixel write and scanout bandwidth.
> 
> If we do all of these, we just might hit 30 frames/sec for Quake 3, with
> decent filtering and antialiasing.

This is something to consider.  Quake 3 isn't so advanced that we
shouldn't pay attention to it.

But we're going to do what we can do, and that's going to be more than
good enough for most embedded and desktop applications.
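
For reference, here is a rough sketch of how the three proposed savings
would combine against the 400 megaword/sec budget. It assumes the
savings multiply independently (a 25% cut in texel accesses, then half
the words per texel with 16-bit textures) and that a 16-bit frame buffer
halves both pixel writes and scanout. Those composition rules and the
function name are assumptions, not measurements.

```python
WIDTH, HEIGHT, SUPERSAMPLE = 640, 480, 4
OVERDRAW, FPS = 2, 30
AVAILABLE = 400_000_000   # words/sec, ignoring pre-charge delays


def demand(cache=False, tex16=False, fb16=False):
    """Estimate total words/sec for rendering plus scanout."""
    texel_words = 8.0                 # trilinear: eight texels per sample
    if cache:
        texel_words *= 0.75           # assumed 25% of texel accesses saved
    if tex16:
        texel_words *= 0.5            # two 16-bit texels per 32-bit word
    fb_words = 0.5 if fb16 else 1.0   # 16-bit pixels halve writes and scanout
    samples = WIDTH * HEIGHT * SUPERSAMPLE
    render = samples * (texel_words + fb_words) * OVERDRAW * FPS
    scanout = samples * fb_words * FPS
    return render + scanout


for label, options in [
    ("baseline", {}),
    ("cache only", {"cache": True}),
    ("all three", {"cache": True, "tex16": True, "fb16": True}),
]:
    words = demand(**options)
    print(f"{label:10s} {words:13,.0f} words/sec  "
          f"{100 * words / AVAILABLE:5.1f}% of budget")
```

Under these assumptions the texture cache alone still leaves the total
at roughly 138% of the budget, while all three together bring it down to
about 69%, before DRAM pre-charge latencies and bank conflicts are
counted.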
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
