Read this:  http://yarchive.net/comp/zero_register.html

The conclusion drawn is that with enough registers (e.g. 32 or more), the
cost of having the special-purpose register is low compared to the benefit
in reducing instruction count and reducing special cases in the logic.

I still don't know how well this applies to GPUs, but I'm trying to make
some other fixes to the ISA, like adding PC-relative branches and some
other things, and I find that some options basically cry out for a
hard-wired zero register.

Actually, one of the debates in my mind is this:

Choice 1:
PC-rel branch (PC+reg+offset)
Absolute jump (reg+reg)

Choice 2:
PC-rel branch (PC+offset)
Absolute jump (reg)

If I go with choice 1, I need a zero register.  If I go with choice 2, I
don't.

The other, much bigger effect of choice 1, however, is on the critical
path in the decode logic.  Choice 2 uses either an adder (logic delay A) or
a register file lookup (logic delay B), never both.  Choice 1's circuit
delay is the sum of those (A+B).  That sucks, so I don't think I want to do
it, even if I still have a zero register.
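
To make the comparison concrete, here is a rough Python sketch of the two
choices.  It is only a toy model: the delay numbers are made-up placeholders
in arbitrary units, and the names (ZERO_REG, read_reg, target_delay) are
hypothetical, not part of the published ISA.  It shows why choice 1 needs a
zero register to express the simple forms, and why its delay is A+B.

```python
# Toy model of the two addressing choices. Delay values are made-up
# placeholders in arbitrary units, not measurements of any real design.
ADDER_DELAY = 1.0      # "A": delay through the address adder (assumed)
REGFILE_DELAY = 1.5    # "B": delay of a register file lookup (assumed)

ZERO_REG = 0           # hypothetical index of a hard-wired zero register


def read_reg(regs, index):
    """Register read where register ZERO_REG always returns 0."""
    return 0 if index == ZERO_REG else regs[index]


def choice1_branch(pc, regs, reg, offset):
    """Choice 1 PC-relative branch: PC + reg + offset."""
    return pc + read_reg(regs, reg) + offset


def choice1_jump(regs, reg_a, reg_b):
    """Choice 1 absolute jump: reg + reg."""
    return read_reg(regs, reg_a) + read_reg(regs, reg_b)


def choice2_branch(pc, offset):
    """Choice 2 PC-relative branch: PC + offset (adder only)."""
    return pc + offset


def choice2_jump(regs, reg):
    """Choice 2 absolute jump: reg (register lookup only)."""
    return regs[reg]


def target_delay(choice):
    """Delay to produce a target address, as described in the text."""
    if choice == 1:
        # The register lookup feeds the adder, so the delays add up.
        return ADDER_DELAY + REGFILE_DELAY
    # Each instruction uses either the adder or the lookup, never both.
    return max(ADDER_DELAY, REGFILE_DELAY)
```

With the zero register as the register operand, choice1_branch and
choice1_jump give the same targets as their choice 2 counterparts, but
target_delay(1) is always larger than target_delay(2).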

Modifying the ISA that I published earlier, what is the benefit of having a
zero register (thereby freeing up the space for the wr flag):
- We could add one more bit to immediates
- We could add one more opcode bit

That's about it.  What advantage is there to having 17-bit immediates
(and 13-bit offsets)?  Not much, on average.

I also don't know what we'd get out of having more opcode bits.  For
RR-type instructions, we have the "function" extension to the opcode, and
we have plenty of room there.  There's then another 15 opcodes, where space
is tight, but I don't know what I'd put in there if I had more room.

BTW, I've already decided to drop MUX and support only conditional MOV.
A side benefit is that the RI-type MUXI instruction is replaced with an
unconditional load-immediate (LI) instruction that can load a 23-bit
constant into any register.

So what we're left with, in terms of the benefit of having a zero register,
is that zero is a common operand.  Not having to clear a register first
eliminates an instruction and any stalls associated with real and spurious
dependencies.  When would this help?

- Assuming I choose Choice 2 above, it makes no difference for branches.
- Most ALU operations have immediate versions, so no help there.
- The LD instruction could load from an absolute address 0 to 1023, which
MIGHT be useful if that's where we store a table of constants.  This is
especially useful for float constants, which can't fit into the 23 bits
that LI can load.  (ST has an equivalent case.)
- The dependency handling logic for single-input RR-type instructions is
slightly simpler.  (Except on a barrel processor, which doesn't have this
problem.)
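
The LD case above can be sketched in Python.  This is an illustration under
assumptions, not the actual ISA: it assumes the 23-bit LI immediate is
zero-extended, and the assembly syntax and register names in the strings
are invented.  It checks that a 32-bit float bit pattern generally does not
fit in LI, and counts the instructions needed to reach a constant table at
addresses 0 to 1023 with and without a zero register.

```python
import struct

LI_BITS = 23            # width of the LI immediate, from the text
TABLE_LIMIT = 1024      # absolute addresses 0..1023, from the text


def float_bits(value):
    """Return the IEEE 754 single-precision bit pattern as an integer."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def fits_in_li(pattern):
    """True if a 32-bit pattern fits a zero-extended LI immediate (assumed)."""
    return 0 <= pattern < (1 << LI_BITS)


def load_constant_sequence(address, have_zero_reg):
    """Hypothetical instruction sequences for loading from a constant table."""
    assert 0 <= address < TABLE_LIMIT
    if have_zero_reg:
        # r0 is the hard-wired zero, so the offset is the absolute address.
        return [f"LD r1, [r0 + {address}]"]
    # Without a zero register, some register has to be cleared first.
    return ["LI r2, 0", f"LD r1, [r2 + {address}]"]
```

For example, float_bits(1.0) is 0x3F800000, which needs 30 bits, so LI
cannot load it directly, while load_constant_sequence(16, True) needs one
instruction instead of two.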

Again, that's about it.  There is a strong argument for being able to
cancel writeback, but very little call for a zero register input for _this_
ISA.


Let's see some more debate.  :)


-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project