One other thing I'm thinking about:

[a] We're going to want to process some number of pixels in parallel.
[b] We're going to have trouble scheduling instructions to make the best
use of the functional units.

So, let's take advantage of that.  Let's assume we can have data
dependencies that make different pixels require different instruction
flows.  We can pull a Niagara and feed instructions for four threads
(one per pixel) through a smaller number of execution units.  Our
add/mul units can do both vector and scalar computations, so we have
two such units (or two of each type; whatever) and can schedule two
vector computations per clock, or some arbitrary assortment of scalars
on one or both.  If empirical analysis shows too much resource
contention, we may add more functional units later, but the idea is to
stay reasonably small.
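To make the issue idea a bit more concrete, here is a minimal Python sketch of how an issue stage might fill the two units each clock. It assumes things the post leaves open: each unit is 4 lanes wide (one RGBA vector), a vector op occupies a whole unit, a scalar op takes one lane, scalars from different threads may share a unit, and each thread issues at most one instruction per clock. The names (`LANES`, `issue_cycle`, `run`) are hypothetical.

```python
from collections import deque

# Assumed, not from the post: each add/mul unit is 4 lanes wide (one RGBA vector).
LANES = 4


def issue_cycle(queues, start, num_units=2):
    """Issue at most one instruction per thread for a single clock.

    queues: one deque per thread (pixel) holding "vec" or "scalar" entries.
    start: index of the thread considered first; rotated each clock for fairness.
    Returns a list of (thread_id, kind, unit) tuples for what issued.
    """
    free = [LANES] * num_units  # free lanes left in each unit this clock
    issued = []
    n = len(queues)
    for k in range(n):
        t = (start + k) % n
        if not queues[t]:
            continue  # this thread has finished its pixel
        kind = queues[t][0]
        need = LANES if kind == "vec" else 1  # a vector op needs a whole unit
        for u in range(num_units):
            if free[u] >= need:
                free[u] -= need
                queues[t].popleft()
                issued.append((t, kind, u))
                break
        # If nothing fit, the thread simply waits for a later clock.
    return issued


def run(programs, num_units=2):
    """Issue every thread's instruction stream; return the per-clock issue lists."""
    queues = [deque(p) for p in programs]
    clocks = []
    start = 0
    while any(queues):
        clocks.append(issue_cycle(queues, start, num_units))
        start = (start + 1) % len(queues)
    return clocks


if __name__ == "__main__":
    # Four threads with divergent instruction flows, sharing two units.
    progs = [["vec", "scalar", "vec"], ["scalar", "scalar"],
             ["vec", "vec"], ["scalar", "vec", "scalar"]]
    for clock, ops in enumerate(run(progs)):
        print(clock, ops)
```

With four threads of pure vector work this issues two ops per clock; with pure scalar work all four threads can issue in the same clock, because their scalars pack into one unit's lanes.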

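Just as with Niagara, we have lots of opportunities to avoid control
and data hazards: consecutive instructions from one thread are
separated by instructions from the other threads, so we don't need to
account for hazards explicitly.  (We may want to have some locks in
place, but we can afford to just stall.)  For each pixel, even the
effective memory read latency is smaller, since the other threads keep
the units busy while one thread waits on a read.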
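A rough cycle-count model of the hazard and latency argument, again only a Python sketch under assumed numbers: every op depends on the previous op in the same thread (worst case), an ALU result is usable 4 cycles after issue, a memory read 12 cycles after, there is a single issue slot by default, and a thread that isn't ready just stalls (the "just stall" lock). `simulate`, `PROGRAM`, and `LATENCY` are hypothetical.

```python
# Assumed numbers, not from the post: cycles until a result can be used.
LATENCY = {"alu": 4, "load": 12}
# Hypothetical per-pixel program: one memory read, then three dependent ALU ops.
PROGRAM = ["load", "alu", "alu", "alu"]


def simulate(num_threads, pixels, program=PROGRAM, latency=LATENCY, issue_width=1):
    """Return the cycles needed to run `program` once for each pixel.

    Pixels are dealt out to num_threads hardware threads. Every op waits for
    the previous op of its own thread (worst-case dependency chain); a thread
    that is not ready just stalls. Each cycle, up to issue_width ready threads
    issue, picked round-robin.
    """
    work = [pixels // num_threads + (1 if t < pixels % num_threads else 0)
            for t in range(num_threads)]  # pixels left per thread
    pc = [0] * num_threads        # next op within the current pixel
    ready_at = [0] * num_threads  # cycle when the thread's last result is usable
    cycle = 0
    start = 0
    while any(work):
        issued = 0
        for k in range(num_threads):
            if issued == issue_width:
                break
            t = (start + k) % num_threads
            if work[t] == 0 or ready_at[t] > cycle:
                continue  # finished, or stalled on its previous result
            ready_at[t] = cycle + latency[program[pc[t]]]
            pc[t] += 1
            if pc[t] == len(program):  # pixel done; start the next one
                pc[t] = 0
                work[t] -= 1
            issued += 1
        start = (start + 1) % num_threads
        cycle += 1
    return max(cycle, max(ready_at))  # wait for the last result to land


if __name__ == "__main__":
    print("1 thread :", simulate(1, 4), "cycles for 4 pixels")
    print("4 threads:", simulate(4, 4), "cycles for 4 pixels")
```

With these made-up numbers, one thread needs 96 cycles for four pixels (24 per pixel), while four interleaved threads finish the same four pixels in 27. In the four-thread run each thread issues every fourth cycle, exactly when its previous ALU result is ready, so the ALU ops never wait on each other; only the memory read causes stalls. Setting `issue_width=2` roughly models the two units.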
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
