On Sun, 30 Jan 2005 20:54:40 -0500, Daniel Phillips <[EMAIL PROTECTED]> wrote:
> Hi Timothy,
>
> You mentioned that the logic should end up clocked somewhere around
> 200 MHz but that the card would be capable of 400 MHz fill rate. This
> means two pixel pipelines, no? This would seem to correspond well with
> the number of multipliers available. The question is, how best to
> factor the work between the units? I can imagine several alternatives:
>
> 1) Two horizontally adjacent pixels per clock

This one. It ensures that the memory controller can maximize
throughput for writes (well, makes it easier), and allows a lot of
logic to be shared between pipelines. Really, it's not two
pipelines--I'm going to design one pipeline that processes two pixels
and see just how much simplification I can make that way.

> 2) Two vertically adjacent pixels per clock
>
> 3) Two adjacent trapezoid spans in parallel
>
> 4) Two arbitrary trapezoids in parallel
>
> The higher you go up the food chain, so to speak, the bigger a penalty
> DRAM row crossing becomes but the more flexibility is introduced. My
> guess is, you're thinking of alternative (1), since it would need the
> least logic and earlier setup stages don't look like bottlenecks.

Yup.

> Next question, should the pixel units be identical, or should one of
> them be more capable than the other, to handle some of the less common
> but important render combinations. Then at least one pixel unit could
> still run instead of falling all the way back to software. This is
> analogous to the way the original Pentium was organized. It makes a
> lot of sense to me.

The idea is to make them totally unified.

> With alternative (4) above there is a lot of flexibility. For example,
> if only one of the pixel units is capable of multitexturing, the other
> can be busy with single-textured triangles. But a fairly complex
> scheduler stage would be needed in front of trapezoid setup, so maybe
> this is a good idea but for a later rev.

This makes sense, but you point out some of the problems. If space
gets too tight, I may consider something like this. When memory
bandwidth usage reaches a certain level, it doesn't matter how many
pipelines there are.

> It does seem well worth the effort to implement two parallel pixel units
> right from the beginning, doubling the fill rate. It's probably
> sensible to make them identical and save some design time for the first
> rev.

Making one and instantiating twice would make some of it easier, but I
think I can save some space just coding them together into one.

> In case (1) a nice optimization would be to do the perspective divide
> only every second pixel and linearly interpolate for the other. This
> adds more logic, but it could conceivably be a way to get to a 4x pixel
> unit design within the limited supply of multipliers.

I've thought about things like that. I'm not sure what effect it would
have on the image quality.

> Just thinking aloud here and trying to get my mind wrapped around the
> tradeoffs involved.

That's what we're here for! :)

_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
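
One rough way to get a feel for the image-quality question raised above (perspective divide on every second pixel, linear interpolation for the pixel in between) is to compare the two along a single span in software. The sketch below is only an illustration under stated assumptions, not the design discussed in the thread: the function names, the span parameterisation (texture coordinate u and depth w at the two span ends), and the choice to average the two exact neighbours are all hypothetical.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b."""
    return a + (b - a) * t


def exact_span(u0, w0, u1, w1, n):
    """Perspective-correct u at each of n pixels (n >= 2): one divide per pixel.

    u/w and 1/w are interpolated linearly in screen space, then divided.
    """
    out = []
    for i in range(n):
        t = i / (n - 1)
        u_over_w = lerp(u0 / w0, u1 / w1, t)
        one_over_w = lerp(1.0 / w0, 1.0 / w1, t)
        out.append(u_over_w / one_over_w)
    return out


def half_rate_span(u0, w0, u1, w1, n):
    """Assumed scheme: divide only at even pixels; each odd pixel is the
    average of its two even neighbours. A trailing odd pixel with no right
    neighbour keeps its exact value."""
    exact = exact_span(u0, w0, u1, w1, n)  # source of the even-pixel samples
    out = list(exact)
    for i in range(1, n - 1, 2):
        out[i] = 0.5 * (exact[i - 1] + exact[i + 1])
    return out


def max_error(u0, w0, u1, w1, n):
    """Largest absolute difference in u between the two schemes."""
    a = exact_span(u0, w0, u1, w1, n)
    b = half_rate_span(u0, w0, u1, w1, n)
    return max(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    # Flat span (constant w): interpolation is affine, so no error is expected.
    print(max_error(0.0, 1.0, 256.0, 1.0, 17))
    # Span receding in depth (w goes from 1 to 4): the shortcut deviates.
    print(max_error(0.0, 1.0, 256.0, 4.0, 17))
```

With constant w the two schemes agree, since u is then linear across the span. The difference shows up only where depth changes across the span, and it is largest where 1/w is smallest, so steeply receding surfaces would be the cases to check first.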
