Read this: http://yarchive.net/comp/zero_register.html
The conclusion drawn is that with enough registers (e.g. 32 or more), the cost of having the special-purpose register is low compared to the benefit in reducing instruction count and reducing special cases in the logic.

I still don't know how well this applies to GPUs, but I'm trying to make some other fixes to the ISA, like adding PC-relative branches and some other things, and I find that some options basically cry out for a hard-wired zero register.

Actually, one of the debates in my mind is this:

Choice 1:
  PC-rel branch (PC+reg+offset)
  Absolute jump (reg+reg)

Choice 2:
  PC-rel branch (PC+offset)
  Absolute jump (reg)

If I go with choice 1, I need a zero register. If I go with choice 2, I don't.

The other, much bigger effect of choice 1, however, is on the critical path in the decode logic. Choice 2 uses either an adder (logic delay A) or a register file lookup (logic delay B). Choice 1 needs the register value before it can add, so its circuit delay is the sum of those (A+B). That sucks, so I don't think I want to do it. (Even if I still have a zero register.)

Modifying the ISA that I published earlier, what is the benefit of having a zero register (thereby freeing up the space for the wr flag)?

- We could add one more bit to immediates
- We could add one more opcode bit

That's about it. What advantage is there to having 17-bit immediates? (And 13-bit offsets?) Not much, on average. I also don't know what we'd get out of having more opcode bits. For RR-type instructions, we have the "function" extension to the opcode, and we have plenty of room there. There are then another 15 opcodes, where space is tight, but I don't know what I'd put in there if I had more room.

BTW, I've already decided to drop MUX and support only conditional MOV. The other benefit is that the RI-type MUXI instruction is replaced with an unconditional load-immediate (LI) instruction that can load a 23-bit constant into any register.

So what we're left with, in terms of benefit of having a zero register, is that zero is a common operand.
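To make the two branch choices and the zero register concrete, here is a rough Python sketch. It is only a model under assumptions that are not part of the published ISA: the register count (32), the 32-bit address width, the choice of r0 as the zero register, and all of the function and class names are hypothetical. It shows that Choice 1 with the zero register as its operand computes the same targets as Choice 2, and that Choice 1 has to read the register file before it can add.

```python
# Hypothetical model. NUM_REGS, ZERO_REG, MASK32 and all names below are
# illustrative assumptions, not taken from the published ISA.
NUM_REGS = 32
ZERO_REG = 0           # assumed index of the hard-wired zero register
MASK32 = 0xFFFFFFFF    # assumed 32-bit address width


class RegFile:
    """Register file whose ZERO_REG always reads 0 and ignores writes."""

    def __init__(self):
        self.regs = [0] * NUM_REGS

    def read(self, idx):
        return 0 if idx == ZERO_REG else self.regs[idx]

    def write(self, idx, value):
        # A write to the zero register is dropped, which also acts as a
        # cancelled writeback.
        if idx != ZERO_REG:
            self.regs[idx] = value & MASK32


def branch_choice1(rf, pc, reg, offset):
    # PC + reg + offset: the register read (delay B) feeds the adder
    # (delay A), so the two delays are in series (A + B).
    return (pc + rf.read(reg) + offset) & MASK32


def jump_choice1(rf, reg_a, reg_b):
    # reg + reg: again a register read followed by an add.
    return (rf.read(reg_a) + rf.read(reg_b)) & MASK32


def branch_choice2(pc, offset):
    # PC + offset: adder only (delay A), no register read.
    return (pc + offset) & MASK32


def jump_choice2(rf, reg):
    # reg: register read only (delay B), no adder.
    return rf.read(reg)
```

With the zero register as the extra operand, `branch_choice1(rf, pc, ZERO_REG, offset)` and `jump_choice1(rf, reg, ZERO_REG)` reduce to the Choice 2 forms, which is why Choice 1 needs a zero register to express the plain cases.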
Not having to clear a register first eliminates an instruction and any stalls associated with real and spurious dependencies. When would this help?

- Assuming I choose Choice 2 above, it makes no difference for branches.
- Most ALU operations have immediate versions, so no help there.
- The LD instruction could load from an absolute address 0 to 1023, which MIGHT be useful if that's where we store a table of constants. This is especially useful for float constants, which can't fit into 23 bits (LI). (ST has an equivalent case.)
- The dependency handling logic for single-input RR-type instructions is slightly simpler. (Except for a barrel processor that doesn't have this problem.)

Again, that's about it. There is a strong argument for being able to cancel writeback, but very little call for a zero register input for _this_ ISA.

Let's see some more debate. :)

-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project
