Hi Timothy,

On Wednesday 02 February 2005 15:56, Timothy Miller wrote:
> On Wed, 2 Feb 2005 15:32:45 -0500, Daniel Phillips <[EMAIL PROTECTED]> wrote:
> > So this is all by way of convincing myself that if we do the DMA in
> > a fairly simple-minded way, it doesn't work out too badly.  Any
> > major oversights?
>
> No, this is good analysis.  One thing:  My initial estimate for
> maximum triangle throughput was 1 million triangles/second.  That's
> from an estimate of 32-word command packets on 33mhz PCI.  Someone
> else told me that that's not bad, considering the CPU overhead just
> for computing the geometry in the first place.

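As a quick sanity check of that estimate, here is a minimal sketch of the arithmetic. It assumes things the quoted text does not state: a 32-bit bus moving one 4-byte word per clock at a nominal 33 MHz, one triangle per command packet, and no arbitration or protocol overhead, so the result is a ceiling rather than a prediction. The function name and its parameters are made up for illustration.

```python
def triangles_per_second(bus_hz: int, bytes_per_word: int, words_per_packet: int) -> int:
    """Upper bound on triangles/second if the bus does nothing but move packets.

    Assumes one word transferred per bus clock and one triangle per packet.
    """
    peak_bytes_per_second = bus_hz * bytes_per_word       # one word per bus clock
    bytes_per_packet = words_per_packet * bytes_per_word  # one triangle per packet
    return peak_bytes_per_second // bytes_per_packet


if __name__ == "__main__":
    # Nominal 33 MHz, 32-bit PCI with 32-word command packets.
    print(triangles_per_second(33_000_000, 4, 32))
```

Under those assumptions the ceiling comes out a little over 1 million triangles/second, and it scales inversely with packet size, so halving the packet to 16 words doubles it.
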
So far, so good.  Using the simple-minded "small triangle" compression, 3 million triangles/second seems like a more worthy goal.  That's 100 thousand triangles/frame at 30 frames/sec, not too shabby for a PCI card.

The on-card triangle setup doesn't look like a bottleneck at all.  Maybe it's time to take a stab at estimating that.  As a wild guess, each trapezoid will need 2 or 3 clocks for non-perspective setup, or more if it is done iteratively to save multipliers.  So we'd have to drop below a handful of pixels/triangle before setup becomes a bottleneck.  By spending some extra real estate, I imagine the setup overhead could be pipelined away even for single-pixel triangles, and the host ought to be able to cull zero-pixel triangles unless we're computing coverage masks, which for sure won't happen on the initial rev.

Per-parameter perspective divides should probably be done on the host for the time being, meaning the 8.8 fixed-point format is not appropriate and the temptation is to go to 24-bit floating point per non-geometry parameter, adding an extra 6 bytes per textured triangle and messing up the internal alignment a little.  With a little bit of repacking, I think we can still hit 3 million triangles/second even before putting in the effort to implement more efficient primitives.

As far as host CPU requirements go, it would be a shame to let that lovely Athlon 64 just sit there idle.  SSE2 and 3DNow! are perfect for this task.  For simple textured triangles at 5 million triangles/second we're only asking for about 60 million divides/second (roughly 12 per triangle) on machines that deliver well over a gigaflop, and there's plenty of room for optimization.  So there will be lots of CPU left over for game physics.

All of the above "in my humble opinion" of course.

Regards,

Daniel
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
