Timothy Miller wrote:
> But what I was
> thinking was that if they all needed the vmul unit on one cycle but
> not on the next, then two of the threads' instructions could be
> scheduled on one cycle and two on the next. What are the chances that
> we'll get a long stream of vmuls all in a row with no breaks? In that
> case, it would definitely be better to have four completely
> independent functional units.

Four vmuls (actually dot products) in a row is very common
for matrix multiplies. The sample shaders I've got, from
the OpenGL Shading Language book and GPU Gems, are all very
math intensive. I doubt you're going to be able to share
ALUs between threads. On the other hand, condition/branch
logic probably could be shared.

But on the gripping hand, any statistics from generation
1 and 2 shaders are going to be biased in favour of math
ops, because those shaders were written before branches
became widespread. So it is possible that shader code will
have an instruction mix more like generic C/C++ over the
next few years. I'd bet on heavy floating point staying,
though.
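
To make the matrix multiply case concrete, here is a rough Python
sketch. It is only a toy model, not a description of any real
hardware: the in-order issue rule, the fixed thread priority, the
"vmul"/"other" op names, and the function names are all my own
assumptions for illustration. It shows a 4x4 matrix times a vector
as four back-to-back dot products, and counts cycles when several
threads compete for a limited number of vmul units.

```python
def dot4(a, b):
    """Dot product of two 4-component vectors (one 'vmul'-style op)."""
    return sum(x * y for x, y in zip(a, b))


def mat4_mul_vec4(m, v):
    """4x4 matrix times vec4: one dot product per row, four in a row."""
    return [dot4(row, v) for row in m]


def cycles_needed(streams, vmul_units):
    """Cycles to drain all instruction streams in a toy in-order model.

    Assumptions (hypothetical, for illustration only):
    - each thread issues at most one instruction per cycle, in order;
    - at most `vmul_units` "vmul" instructions issue per cycle;
    - any other op always issues;
    - lower-numbered threads get first pick of the vmul units.
    """
    pcs = [0] * len(streams)
    cycles = 0
    while any(pc < len(s) for pc, s in zip(pcs, streams)):
        free = vmul_units
        for t, stream in enumerate(streams):
            if pcs[t] >= len(stream):
                continue  # this thread has finished
            if stream[pcs[t]] == "vmul":
                if free == 0:
                    continue  # no unit left: thread stalls this cycle
                free -= 1
            pcs[t] += 1
        cycles += 1
    return cycles


# Four threads, each doing a matrix multiply: four vmuls with no breaks.
all_vmul = [["vmul"] * 4 for _ in range(4)]
# Four threads alternating vmul with some other op.
mixed = [["vmul", "other"] * 2 for _ in range(4)]

print(cycles_needed(all_vmul, 4), cycles_needed(all_vmul, 2))
print(cycles_needed(mixed, 4), cycles_needed(mixed, 2))
```

In this model the alternating mix loses only one cycle when the
four threads share two units, but the unbroken run of vmuls takes
twice as long, which is the case I'd expect from math-heavy shaders.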

> There are definitely some things we would want to do about multiple
> threads accessing the same (or nearby) memory locations.

You'll probably get some sequential access patterns across
threads rather than within them. If a horizontal span of
fragments is being done in parallel by 2/4/N threads, it's
quite likely (especially for a 2D GUI) that thread #0 will
need texel P+0, thread #1 P+1, ...
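
As a rough sketch of why that pattern matters, the Python below
counts how many memory bursts a span of threads would touch if
thread #t reads texel P+t. The burst size of four texels, the
alignment rule, and the function names are assumptions of mine for
illustration, not anything from an actual design.

```python
def texel_addresses(base, num_threads):
    """Texel index read by each thread: thread t reads base + t."""
    return [base + t for t in range(num_threads)]


def burst_reads(addresses, texels_per_burst):
    """Number of distinct aligned bursts needed to cover the addresses.

    Assumes (hypothetically) that memory returns aligned groups of
    `texels_per_burst` consecutive texels per read.
    """
    return len({a // texels_per_burst for a in addresses})


aligned = texel_addresses(100, 4)    # texels 100..103
unaligned = texel_addresses(102, 4)  # texels 102..105

print(burst_reads(aligned, 4))
print(burst_reads(unaligned, 4))
```

Under these assumptions an aligned span of four threads is served
by a single burst, and an unaligned one by two, instead of one read
per thread.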
Sheesh, I'm glad I'm a software person and don't have to
worry about designing and building this stuff :-)
--
Hugh Fisher
DCS, ANU