One other thing I'm thinking about: [a] We're going to want to process some number of pixels in parallel. [b] We're going to have trouble scheduling instructions to make the best use of the functional units.
So, let's take advantage of that. Let's assume that data dependencies can make different pixels require different instruction flow. We can pull a Niagara and feed instructions from four threads through a smaller number of execution units. Say our add/mul units are capable of both vector and scalar computations, and we have two such units (or two of each type; whatever). Then we can schedule two vector computations per clock, or some arbitrary assortment of scalars on one or both units. After empirical analysis of resource contention, we may add some functional units later, but the idea is to remain reasonably small. Just like with Niagara, interleaving threads gives us lots of opportunities to avoid control and data hazards, so we don't need to account for them. (We may want to have some locks in place, but we can afford to just stall.) Even the effective memory read latency seen by each pixel is smaller.
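
To make the scheduling idea concrete, here is a rough Python sketch of four pixel threads sharing two add/mul units. It is only a toy model under assumptions of my own that are not settled above: each unit is 4 lanes wide, a vector op takes a whole unit, a scalar op takes one lane, each thread issues at most one op per clock, and priority rotates round-robin. The names (simulate, LANES_PER_UNIT, NUM_UNITS) are made up for illustration, and pipeline depth and memory latency are not modeled at all.

```python
from collections import deque

LANES_PER_UNIT = 4  # assumption: each add/mul unit is 4 lanes wide
NUM_UNITS = 2       # two shared units, as proposed in the post


def simulate(threads):
    """Issue ops from several threads onto the shared units.

    threads: one list of ops per thread, each op being "vec" or "scalar".
    Returns a per-clock log of (thread_id, op, unit) tuples.
    """
    queues = [deque(ops) for ops in threads]
    log = []
    start = 0  # thread that gets first pick this clock
    while any(queues):
        free = [LANES_PER_UNIT] * NUM_UNITS  # free lanes per unit
        issued = []
        for i in range(len(queues)):
            tid = (start + i) % len(queues)
            queue = queues[tid]
            if not queue:
                continue
            # A vector op needs a whole unit; a scalar op needs one lane.
            need = LANES_PER_UNIT if queue[0] == "vec" else 1
            for unit in range(NUM_UNITS):
                if free[unit] >= need:
                    free[unit] -= need
                    issued.append((tid, queue.popleft(), unit))
                    break
            # If no unit had room, the thread simply stalls this clock.
        log.append(issued)
        start = (start + 1) % len(queues)  # rotate priority
    return log


# Hypothetical workload: four pixel threads with different instruction mixes.
threads = [
    ["vec", "scalar"],
    ["vec", "vec"],
    ["scalar", "scalar"],
    ["vec"],
]
log = simulate(threads)
for clock, issued in enumerate(log):
    print(clock, issued)
```

With this kind of model it is easy to vary the number of units or lanes and count stalls, which is roughly the "empirical analysis of resource contention" I have in mind before adding any functional units.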
