On 4/19/06, Robin Mélinand <[EMAIL PROTECTED]> wrote:
> Data stack manipulation instructions are like mov commands, except that
> they don't move the lengthy data stored in the registers, they just
> alter the register order.
> Optional "modules" like the ALU always take their operands from the registers
> labeled "top" and "next" (0 and 1) and put the result back in "top".
> If the design is such that data stack manipulation instructions can
> re-order the stack before the ALU gives its result, you achieve the same
> clock cycle usage as with Timothy Baldridge's MISC idea.

I had a lengthy discussion about stack machines with my computer architecture professor. For a general-purpose CPU, I had the idea of combining a stack machine with register renaming. The ISA is stack-based, but it gets translated into register-based micro-ops on a 32-entry register file.

The first advantage is that operands are implicit, which makes the instructions very short. This reduces the bandwidth required for instruction fetch (which matters mostly on cache misses) by a significant margin.

Another advantage of this approach is that you can maintain ILP, because any "push" onto the stack is really just the allocation of a new register. Usually a stack architecture has resource conflicts over the top of the stack: if you're waiting on an instruction that is going to push something onto the stack, you cannot do anything else to the stack, because the stack ordering serializes instructions. But if you do renaming, then the target of an earlier instruction is just a pending location on the stack; if you push other operands or reshuffle the stack, there is no hazard.

The usual way operands are handled on stack machines is with stack shuffle instructions. You can't always do everything with pushes and pops from the top, so you push all your data, execute an instruction, reorder the stack, execute, and so on. It turns out that the rotates, picks, and dups you have to do often impose enough overhead to offset the benefit of using a stack ISA.
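To make the renaming idea a bit more concrete, here is a rough Python sketch of such a translation front-end. Everything in it is hypothetical and chosen only for back-of-the-envelope counting: the opcode names, the `Renamer`/`translate` helpers, and the encoded sizes (1 byte per stack opcode, 2 bytes for a push with an 8-bit operand, 4 bytes per micro-op). It also skips register reclamation at retirement, so it simply runs out after 32 allocations.

```python
from dataclasses import dataclass, field

# Hypothetical stack ISA; sizes are assumptions for rough byte counting only.
STACK_BYTES = {"push": 2, "dup": 1, "swap": 1, "over": 1, "rot": 1,
               "add": 1, "sub": 1, "mul": 1}
UOP_BYTES = 4            # assumed fixed-width 32-bit micro-op
NUM_PHYS_REGS = 32       # the 32-entry register file
ALU_OPS = {"add", "sub", "mul"}


@dataclass
class Renamer:
    stack: list = field(default_factory=list)  # physical register names, top is last
    next_reg: int = 0
    uops: list = field(default_factory=list)   # emitted register-based micro-ops

    def alloc(self) -> str:
        # Simplification: registers are never freed (no retirement modeled).
        if self.next_reg >= NUM_PHYS_REGS:
            raise RuntimeError("out of physical registers")
        name = f"p{self.next_reg}"
        self.next_reg += 1
        return name

    def step(self, op: str, arg=None) -> None:
        s = self.stack
        if op == "push":
            dst = self.alloc()                 # a push is just a new register
            self.uops.append(("ld", dst, arg))
            s.append(dst)
        elif op in ALU_OPS:
            b, a = s.pop(), s.pop()            # "top" and "next"
            dst = self.alloc()                 # result gets a fresh register
            self.uops.append((op, dst, a, b))
            s.append(dst)
        # Shuffles only reorder names in the rename map: no micro-ops emitted.
        elif op == "dup":
            s.append(s[-1])                    # two stack slots, one register
        elif op == "swap":
            s[-1], s[-2] = s[-2], s[-1]
        elif op == "over":
            s.append(s[-2])
        elif op == "rot":
            s.append(s.pop(-3))                # ( a b c -- b c a )
        else:
            raise ValueError(f"unknown opcode {op!r}")


def translate(program):
    r = Renamer()
    for instr in program:
        r.step(*instr)
    stack_bytes = sum(STACK_BYTES[op] for op, *_ in program)
    return r, stack_bytes


# (a + b) * (a + c), reusing a via dup/swap instead of reloading it
prog = [("push", "a"), ("dup",), ("push", "b"), ("add",), ("swap",),
        ("push", "c"), ("add",), ("mul",)]
r, nbytes = translate(prog)
for u in r.uops:
    print(u)
print(nbytes, "stack bytes vs", len(r.uops) * UOP_BYTES, "micro-op bytes")
# 11 stack bytes vs 24 micro-op bytes
```

In this toy run the dup and swap emit no micro-ops, and the second add reads p0 and p3 rather than waiting on p2, which is the ILP point. The shuffles still occupy bytes in the stack instruction stream, though, and they grow as expressions need deeper picks and rotates.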
Essentially, we end up encoding the register numbers as stack offsets for shuffling, which brings the instruction byte count back up from the ideal. On top of that, we've added a translation front-end to convert from a stack ISA to what is basically a RISC micro-op encoding. (Translation front-ends are evil for control hazards!) It's true that with a sufficiently intelligent compiler you can avoid a lot of the shuffles and dups, but it's not as wonderful as you might first think. I encourage others to do some back-of-the-envelope calculations of their own to see how it would go.

In our case, we can store relatively large programs in on-chip RAM blocks, which makes instructions relatively cheap to fetch. Certainly, the smaller your instructions, the more you can fit in the buffer, but you don't gain any performance unless programs get so large that we have to store them in graphics memory.

_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
