On 4/17/06, Lourens Veen <[EMAIL PROTECTED]> wrote:

>
> You can still get high throughput with pipelined functional units. It
> doesn't matter much if it takes ten cycles to multiply two numbers (or
> vectors of numbers), as long as you can provide two new numbers to
> multiply every cycle, and read out the result of the calculation that
> started ten cycles ago. Throughput will still be ok (or at least as
> good as it gets at the given clock rate).
>

One thing we're forgetting is that static (compile-time) scheduling is
still well behind the curve, while dynamic scheduling requires a lot of
extra hardware.  Unless we hand-code most of what we run on this, or
have some massive peephole optimizer library, we're always going to get
sub-optimal code.

The only way to keep the computing units busy with a new fragment
every cycle is to avoid data dependency hazards.  We can only do that
if we can overlap the processing for different fragments (like
threads).  Then we have to keep track of multiple processor states.
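
To put rough numbers on that, here is a toy model (a sketch only, not a
description of any actual design).  It assumes one fully pipelined unit
with a fixed latency, and that every operation of a fragment depends on
that fragment's previous result.  The name simulate() and its parameters
are made up for illustration.

```python
def simulate(num_fragments, ops_per_fragment, latency):
    """Round-robin issue over per-fragment chains of dependent operations.

    Returns (issue_cycles, utilization), where utilization is the
    fraction of issue cycles in which the unit accepted a new operation.
    """
    # ready[i] is the first cycle at which fragment i may issue again,
    # i.e. when the result of its previous operation is available.
    ready = [0] * num_fragments
    remaining = [ops_per_fragment] * num_fragments
    total = num_fragments * ops_per_fragment
    issued = 0
    cycle = 0
    nxt = 0  # round-robin pointer

    while issued < total:
        # Issue at most one operation per cycle, from the first fragment
        # (in round-robin order) whose previous result has come back.
        for k in range(num_fragments):
            i = (nxt + k) % num_fragments
            if remaining[i] > 0 and ready[i] <= cycle:
                remaining[i] -= 1
                ready[i] = cycle + latency
                issued += 1
                nxt = (i + 1) % num_fragments
                break
        cycle += 1

    return cycle, issued / cycle


if __name__ == "__main__":
    for n in (1, 5, 10):
        cycles, util = simulate(n, ops_per_fragment=4, latency=10)
        print(f"{n:2d} fragments: {cycles} issue cycles, "
              f"utilization {util:.2f}")
```

In this model a single fragment can only issue once every `latency`
cycles, while interleaving at least `latency` fragments lets the unit
accept a new operation every cycle.  The price is the one already
mentioned: `ready` and `remaining` stand in for the per-fragment state
the hardware would have to keep.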

Only slightly related: the statistics I have on branch delay slots say
that they can be filled only about 60% of the time, and that a filled
slot does useful work only about 80% of the time.  That makes a delay
slot useful only about 50% of the time (0.6 x 0.8 = 0.48).
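
For anyone who wants to plug in their own figures, the arithmetic is
just the product of the two rates.  A minimal sketch, where the function
name and the 60%/80% defaults are only the numbers quoted above, not
measurements of any particular ISA or compiler:

```python
def useful_slot_fraction(fill_rate=0.60, useful_when_filled=0.80):
    """Fraction of branch delay slots that end up doing useful work."""
    return fill_rate * useful_when_filled


if __name__ == "__main__":
    frac = useful_slot_fraction()
    print(f"useful delay slots: {frac:.0%}")
    print(f"wasted slots per 100 branches: {round((1 - frac) * 100)}")
```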
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)