On 4/19/06, Robin Mélinand <[EMAIL PROTECTED]> wrote:

> Data stack manipulation instructions are like mov commands, but instead
> of moving the lengthy data stored in the registers, they just
> alter the register order.
> Optional "modules" like the ALU always take their operands from the
> registers labeled "top" and "next" (0 and 1) and put the result back in
> "top".
> If the design is such that data stack manipulation instructions can
> re-order the stack before the ALU gives its result, you achieve the same
> clock cycle usage as with Timothy Baldridge's MISC idea.

I had a lengthy discussion about stack machines with my computer arch
professor.  For a general-purpose CPU, I had the idea of combining a
stack machine with register renaming.  The ISA is stack-based, but it
gets translated into register-based micro-ops on a 32-entry register
file.

The first advantage is that operands are implicit, making the
instructions very short.  This reduces bandwidth required for
instruction fetch (mostly just for cache misses) by a significant
margin.

Another advantage to this whole approach is that you can maintain ILP
because any "push" onto the stack is really just the allocation of a
new register.  Usually a stack architecture has resource conflicts
over the top of the stack; if you're waiting on an instruction to
execute, one which is going to push something onto the stack, you
cannot do anything else to the stack, because the stack ordering
serializes instructions.  But if you do renaming, then the target of
an earlier instruction is just a pending location on the stack; if you
push other operands or reshuffle the stack, there is no hazard.
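
To make the renaming idea concrete, here is a minimal sketch in Python.
It is only an illustration under stated assumptions, not a description
of any real design: the instruction names, the micro-op tuples, and the
unbounded register supply (instead of a 32-entry file with freeing) are
all hypothetical.  Pushes and ALU ops allocate a fresh register, while
shuffles only edit the rename map and emit no micro-op.

```python
from itertools import count

def rename(stack_program):
    """Translate stack instructions into register micro-ops (toy model)."""
    fresh = count()   # assumption: unbounded supply of physical registers
    stack = []        # rename map: stack slot -> physical register number
    micro_ops = []
    for op, *args in stack_program:
        if op == "push":
            dst = next(fresh)            # a push is just a new register
            stack.append(dst)
            micro_ops.append(("li", dst, args[0]))
        elif op == "swap":
            # Shuffles touch only the map; no data moves, no micro-op.
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "dup":
            # Two slots alias one register (real hardware would need
            # reference counting before freeing it).
            stack.append(stack[-1])
        elif op in ("add", "sub", "mul"):
            b = stack.pop()
            a = stack.pop()
            dst = next(fresh)            # result goes to a pending register
            stack.append(dst)
            micro_ops.append((op, dst, a, b))
        else:
            raise ValueError(f"unknown instruction: {op}")
    return micro_ops, stack

# (2 * 3) + (4 * 5) written as stack code
program = [("push", 2), ("push", 3), ("mul",),
           ("push", 4), ("push", 5), ("mul",), ("add",)]
micro_ops, final_stack = rename(program)
for m in micro_ops:
    print(m)
```

In the output, the two "mul" micro-ops write different registers and
read disjoint sources, so the second one does not have to wait for the
first even though both results pass through the top of the stack.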

Commonly the way operands are handled on stack machines is with stack
shuffle instructions.  You can't always do everything with pushes and
pops from the top, so you push on all your data, execute an
instruction, reorder the stack, execute, etc.  It turns out that those
rotates, picks, and dups you have to do often impose enough overhead
so as to offset the benefit of using a stack ISA.  Essentially, we end
up encoding the register numbers in terms of stack offsets for
shuffling, thereby bringing the instruction byte count back up from
the ideal.  And on top of that, we've added this translation front-end
to convert from a stack ISA to basically a RISC micro-op encoding.
(Translation front-ends are evil for control hazards!)

It's true that if the compiler is sufficiently intelligent, you can
avoid a lot of the shuffles and dups, but it's not as wonderful as you
might first think. I encourage others to do some of their own
back-of-the-envelope calculations on this to see how it would go.
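
One way such a back-of-the-envelope calculation might look is sketched
below in Python.  Every number in it is an assumption made up for the
illustration: a 5-bit opcode in both encodings, a 5-bit field for each
register number or stack offset, and "pick"/"roll" as the only shuffle
instructions.  The example computes (a - b) * (a + b) with a and b
already on the stack (or in registers), and the toy interpreter is there
only to check that the stack sequence really computes that value.

```python
OPCODE_BITS = 5   # assumed opcode width for both encodings
FIELD_BITS = 5    # assumed width of a register number or stack offset

def run_stack(program, stack):
    """Execute a tiny stack program and return the final stack."""
    stack = list(stack)
    for op, *arg in program:
        if op == "pick":      # copy the entry at depth arg[0] to the top
            stack.append(stack[-1 - arg[0]])
        elif op == "roll":    # move the entry at depth arg[0] to the top
            stack.append(stack.pop(-1 - arg[0]))
        else:                 # binary ALU op on "next" and "top"
            b, a = stack.pop(), stack.pop()
            stack.append({"add": a + b, "sub": a - b, "mul": a * b}[op])
    return stack

def stack_bits(program):
    """Opcode per instruction, plus one offset field per shuffle operand."""
    return sum(OPCODE_BITS + FIELD_BITS * len(arg) for _, *arg in program)

def register_bits(n_instructions, operands=3):
    """Three-operand register encoding: opcode plus three register fields."""
    return n_instructions * (OPCODE_BITS + operands * FIELD_BITS)

# Stack version: four shuffles are needed around three ALU ops.
program = [("pick", 1), ("pick", 1), ("sub",),
           ("roll", 2), ("roll", 2), ("add",), ("mul",)]

# Register version would be: sub t0,a,b ; add t1,a,b ; mul t2,t0,t1
print(run_stack(program, [7, 3]))   # check against (7 - 3) * (7 + 3)
print(stack_bits(program), "bits for the stack encoding")
print(register_bits(3), "bits for the register encoding")
```

Under these particular assumptions the stack version comes out only
slightly smaller, and it takes seven instructions instead of three;
different field widths or a smarter instruction sequence would shift
the numbers either way.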

In our case, we can store relatively large programs in on-chip RAM
blocks, making instructions relatively cheap to fetch.  Certainly, the
smaller your instructions, the more you can fit in the buffer, but you
don't gain any performance unless programs get so large that we have
to store them in graphics memory.