Timothy Miller wrote:

So, let's take advantage of that.  Let's assume we can have data
dependencies that make different pixels require different instruction
flow.  We can pull a Niagara and feed instructions for four threads
through a smaller number of execution units.  Say our add/mul units
are capable of both vector and scalar computation; then we need only
two such units (or two of each type; whatever) and can schedule two
vector computations per clock, or some arbitrary assortment of
scalars on one or both.  Based on empirical analysis of resource
contention, we may add some functional units later, but the idea is
to remain reasonably small.
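The scheduling idea above can be sketched roughly as follows. This is an illustrative model only (the thread count, unit count, and round-robin policy are taken from the paragraph, not from any actual design): four threads share two execution units, and each clock we issue up to two ready ops, one per unit.

```python
# Hypothetical sketch of the Niagara-style idea above: four shader
# threads share two ALUs, each of which accepts one op (vector or
# scalar) per clock. Names and policy are illustrative assumptions.

from collections import deque

def schedule(threads, num_units=2):
    """Round-robin issue: each cycle, pull the next ready op from up
    to `num_units` threads. Returns cycles needed to drain all queues."""
    queues = deque(deque(ops) for ops in threads)
    cycles = 0
    while any(queues):
        issued = 0
        for _ in range(len(queues)):
            q = queues[0]
            queues.rotate(-1)      # rotate so every thread gets a turn
            if q and issued < num_units:
                q.popleft()        # this op occupies one unit this clock
                issued += 1
        cycles += 1
    return cycles

# Four threads with four ops each: 16 ops through 2 units = 8 cycles.
print(schedule([["v"] * 4, ["s"] * 4, ["v", "s"] * 2, ["s", "v"] * 2]))
```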

Don't get too carried away with Niagara comparisons. A GPU
has to execute exactly the same shader program for every
pixel in a given triangle/primitive. Only a small amount of
data varies per primitive: the coords/normal/tex coords at
the vertex level and color/texcoords for fragments, which
amounts to about a dozen 4x32-bit registers at most. There's
also a K or two of OpenGL state that the shader can read but
not write, plus a K (?) or so of app state with the same
restriction.
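A back-of-envelope check using the rough figures above (these are the numbers from the paragraph, not measurements of any real GPU) shows how little data actually varies per fragment compared to the shared read-only state:

```python
# Rough sizes from the text above; illustrative arithmetic only.

VEC4_BYTES = 4 * 4            # one 4x32-bit register
varying_regs = 12             # "about a dozen 4x32-bit registers at most"
per_fragment = varying_regs * VEC4_BYTES

gl_state = 2 * 1024           # "a K or two" of read-only OpenGL state
app_state = 1 * 1024          # "a K (?) or so" of read-only app state

print(per_fragment)           # bytes that vary per fragment: 192
print(gl_state + app_state)   # shared read-only bytes: 3072
```

So the per-fragment working set is tiny next to the shared state, which is what makes replicating execution units cheap.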

Now that shaders have branches it's no longer guaranteed
that they all execute in lockstep, but there is a very high
probability that all the execution units will need to read
from the same memory location at the same time. Brute-force
replication might work better than dynamic scheduling.
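One common way lockstep machines cope with branches is predication: every lane executes both sides of the branch, and a per-lane mask selects which result commits. A minimal sketch of that technique (my illustration, not anything the hardware discussed here is committed to):

```python
# Why lockstep replication survives branches: run both paths on all
# lanes, then use a per-lane mask to pick results. All "units" fetch
# the same instruction stream; only the mask differs per lane.

def simd_select(xs, then_fn, else_fn, cond):
    mask = [cond(x) for x in xs]             # per-lane predicate
    then_vals = [then_fn(x) for x in xs]     # every lane runs the THEN path
    else_vals = [else_fn(x) for x in xs]     # ...and the ELSE path too
    return [t if m else e for m, t, e in zip(mask, then_vals, else_vals)]

# Lanes diverge on the condition, yet no lane ever branched.
print(simd_select([1, -2, 3, -4],
                  then_fn=lambda x: x * 2,
                  else_fn=lambda x: 0,
                  cond=lambda x: x > 0))     # [2, 0, 6, 0]
```

The cost is paying for both paths on every lane, which is why divergence hurts throughput but never breaks the "same instruction, same time" property.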

--
        Hugh Fisher
        DCS, ANU
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)