On Saturday 15 April 2006 18:21, Timothy Miller wrote:
> On 4/15/06, Lourens Veen <[EMAIL PROTECTED]> wrote:
> > Can't we combine this with Timothy's MISC idea? Have a "CPU" with
> > load/move/store, and a bunch of functional units that can each
> > perform a complex (think Altivec/3DNow!/SSE3 or even more complex
> > than that, like a dot product) instruction. Newer processors can
> > simply have more functional units, and could be backwards
> > compatible with their predecessors.
>
> The idea I keep thinking about is to have a pipeline of general
> functional units.  As a fragment passes down the pipeline, it's like
> executing instructions.  If the number of instructions to be executed
> exceeds the pipeline length, the fragment gets forwarded back up to
> the beginning.  Loops would get unrolled to the pipeline length;
> longer ones would work via the forwarding mechanism.  Any sequence of
> instructions shorter than the pipeline length would get padded with
> NOOPs.

So basically it would be a pipeline of processors. But that is possibly 
less efficient than having a single MISC "scheduler" in the middle, and 
a lot of functional units around it. Each processor in your pipeline 
only ever does one instruction, and all the hardware for the other 
functions it can perform is idle. In contrast, separate functional 
units could all work at the same time, if they could get data quickly 
enough. Perhaps there should be multiple MISC cores, since a single one 
is likely to become a bottleneck.
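
To put rough numbers on the pipeline scheme quoted above, here is a minimal sketch in Python. It assumes the simplest reading of Timothy's description: a fragment occupies one stage per instruction, short programs are padded with NOOPs, and longer ones are forwarded back to the start for another pass. The function names and the "one of k operations per stage" model are my own illustration, not part of any actual design.

```python
import math

def pipeline_cost(n_instructions, depth):
    """Return (passes, noop_slots) for one fragment.

    Assumption: each pass through the pipeline executes up to `depth`
    instructions, and unused stages in the last pass run NOOPs.
    """
    passes = max(1, math.ceil(n_instructions / depth))
    noop_slots = passes * depth - n_instructions
    return passes, noop_slots

def stage_function_utilization(ops_per_stage):
    """Fraction of a stage's function hardware in use per cycle.

    Assumption: a general stage implements `ops_per_stage` distinct
    operations but performs only one of them on any given fragment.
    """
    return 1.0 / ops_per_stage

if __name__ == "__main__":
    print(pipeline_cost(5, 8))    # short program: one pass, padded with NOOPs
    print(pipeline_cost(20, 8))   # long program: needs forwarding back up
    print(stage_function_utilization(4))
```

Under these assumptions, a 20-instruction program on an 8-stage pipeline takes three passes and wastes four slots on NOOPs, and a stage that can do four different operations keeps only a quarter of that hardware busy at a time.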

> The problem is that any more than a few general purpose registers
> would make every pipeline stage a massive amount of logic, limiting
> the number of stages.  But the idea is to get great throughput at a
> low clock rate.  We cannot design something to run at 500MHz.

You can still get high throughput with pipelined functional units. It 
doesn't matter much if it takes ten cycles to multiply two numbers (or 
vectors of numbers), as long as you can provide two new numbers to 
multiply every cycle, and read out the result of the calculation that 
started ten cycles ago. Throughput will still be ok (or at least as 
good as it gets at the given clock rate).
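
The latency-versus-throughput point can be sketched with a small cycle-by-cycle simulation in Python. It is only an illustration: the ten-stage depth comes from the example above, while the function name and the model (one operand pair accepted per cycle, one result read out per cycle) are assumptions of mine.

```python
from collections import deque

def run_pipelined_multiplier(pairs, latency=10):
    """Simulate a multiplier with `latency` pipeline stages.

    One operand pair enters per cycle. Returns a list of
    (cycle, product) tuples, where `cycle` is the cycle in which
    that product is read out of the last stage.
    """
    pairs = list(pairs)
    pipeline = deque([None] * latency)   # one slot per stage
    inputs = iter(pairs)
    results = []
    pending = len(pairs)
    cycle = 0
    while pending:
        cycle += 1
        operands = next(inputs, None)    # None once the input runs dry
        entering = None if operands is None else operands[0] * operands[1]
        pipeline.appendleft(entering)    # new work enters the first stage
        leaving = pipeline.pop()         # the oldest work leaves the last stage
        if leaving is not None:
            results.append((cycle, leaving))
            pending -= 1
    return results

if __name__ == "__main__":
    out = run_pipelined_multiplier([(i, i + 1) for i in range(1, 101)])
    print("first result in cycle", out[0][0])
    print("last result in cycle", out[-1][0])
```

In this model 100 multiplications finish in 110 cycles, where a multiplier that had to finish each ten-cycle operation before starting the next would need 1000. After the initial fill, one result comes out every cycle.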

Lourens

_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)