(Except for one case
involving an immediate.)

Yes, that was the reason I did it. I considered it a free feature that
we can compute (- REG + signed_const), and I planned to fix it up in the
assembler by providing a "sub" which may be replaced with either an
"add" or a "rsub". There are also other derived instructions we could
consider, like "move", "jump", "noop", "neg", ... just to make life
easier when programming it.
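To make the assembler idea concrete, here is a minimal sketch of such a rewriting pass. It is hypothetical: the function name `expand`, the tuple representation, and the rule that registers are strings and constants are ints are all assumptions. It uses the operand convention of the instruction list in this thread (op rX, rcY, rZ, where only rcY may be a constant). "jump" and "noop" are left out because their encodings are not settled here.

```python
def expand(mnemonic, operands):
    """Hypothetical assembler pass: rewrite a pseudo-instruction as one base
    instruction of the form (op, rX, rcY, rZ).

    Registers are strings such as "r3"; signed constants are ints.  As in the
    base format, only the rcY slot may hold a constant.
    """
    if mnemonic == "sub":                  # dst := a - b
        a, b, dst = operands
        if isinstance(b, int):
            if isinstance(a, int):
                raise ValueError("sub needs at least one register operand")
            return ("add", a, -b, dst)     # rX - c  ==  rX + (-c)
        return ("rsub", b, a, dst)         # a - rY  ==  rsub rY, a
    if mnemonic == "neg":                  # dst := 0 - src
        src, dst = operands
        return ("rsub", src, 0, dst)
    if mnemonic == "move":                 # dst := src + 0
        src, dst = operands
        return ("add", src, 0, dst)
    return (mnemonic, *operands)           # already a base instruction
```

For example, `expand("sub", [7, "r3", "r4"])` gives `("rsub", "r3", 7, "r4")`, i.e. the (- REG + signed_const) case. A real assembler would also have to check that -c still fits in the immediate field, since negating the most negative constant does not.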

If we are planning on having any sort of stack, that would be a
useful option. Have the stack grow down from the top of memory; being
able to PUSH in a single cycle would then be handy.
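A toy model of a stack that grows down from the top of memory, as a sketch only. It assumes a word-addressed memory and a pre-decrement PUSH / post-increment POP convention; the class name `DownwardStack` is hypothetical. A single-cycle PUSH would have to do both the SP decrement and the store in one instruction.

```python
class DownwardStack:
    """Toy model: a stack at the top of word-addressed memory, growing down."""

    def __init__(self, mem_words):
        self.mem = [0] * mem_words
        self.sp = mem_words            # empty stack: SP points just past the top word

    def push(self, value):
        self.sp -= 1                   # pre-decrement SP...
        self.mem[self.sp] = value      # ...then store at the new SP

    def pop(self):
        value = self.mem[self.sp]      # load from SP...
        self.sp += 1                   # ...then post-increment
        return value
```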

   `define QOP_MULT  6

If we combine these with the QMODE_ARITH mode, then we get the instructions

   and rX, rcY, rZ     ; rZ := rX & rcY
   or rX, rcY, rZ      ; rZ := rX | rcY
   xor rX, rcY, rZ     ; rZ := rX ^ rcY
   lsh rX, rcY, rZ     ; rZ := if rcY < 0 then rX >> -rcY else rX << rcY
   add rX, rcY, rZ     ; rZ := rX + rcY
   rsub rX, rcY, rZ    ; rZ := rcY - rX
   mult rX, rcY, rZ    ; rZ := rX * rcY ; Note: always signed!
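A small reference model of the instruction list above may help when checking the HDL later. This is a sketch under assumptions not stated in the thread: 32-bit words, rcY read as a signed value for lsh and mult, a logical (not arithmetic) right shift, and shift counts of 32 or more giving 0. The function name `alu` is hypothetical.

```python
MASK = 0xFFFFFFFF


def to_signed(v):
    """Interpret the low 32 bits of v as a two's-complement number."""
    v &= MASK
    return v - (1 << 32) if v & 0x80000000 else v


def alu(op, rx, rcy):
    """Value written to rZ by 'op rX, rcY, rZ' (32-bit model)."""
    y = to_signed(rcy)
    if op == "and":
        r = rx & rcy
    elif op == "or":
        r = rx | rcy
    elif op == "xor":
        r = rx ^ rcy
    elif op == "lsh":
        x = rx & MASK
        if y <= -32 or y >= 32:
            r = 0                      # assumption: everything is shifted out
        elif y < 0:
            r = x >> -y                # assumption: logical right shift
        else:
            r = x << y
    elif op == "add":
        r = rx + rcy
    elif op == "rsub":
        r = rcy - rx
    elif op == "mult":
        r = to_signed(rx) * y          # signed, as noted in the listing
    else:
        raise ValueError(op)
    return r & MASK                    # keep the low 32 bits
```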

Now that you mention that, perhaps we'd like to have signed and
unsigned multiply instructions.

The built-in multipliers are 18*18 -> 36.  I think you need four of
them to make 36*36->72.  They're signed, so for us to do signed or
unsigned multiplication, we just need to decide whether or not we
replicate the high bit of each operand out to the full multiplier width.
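A sketch of how the signed/unsigned choice falls out of operand extension. One detail: because the 18x18 blocks are signed, the low half of each operand must be an unsigned 17-bit value, so four blocks give a signed 35x35 product rather than 36x36. That is still wide enough for 32-bit operands either sign-extended or zero-extended. The names `mul18x18`, `mul35x35` and `mult32` are hypothetical; this models arithmetic, not timing.

```python
def mul18x18(a, b):
    """Model of one built-in signed 18x18 -> 36-bit multiplier."""
    assert -(1 << 17) <= a < (1 << 17) and -(1 << 17) <= b < (1 << 17)
    return a * b


def mul35x35(a, b):
    """Signed 35x35 product built from four signed 18x18 multipliers.

    Each operand is split into a signed 18-bit high part and an unsigned
    17-bit low part; the low part fits a signed 18-bit input as a
    non-negative value.
    """
    a_hi, a_lo = a >> 17, a & 0x1FFFF
    b_hi, b_lo = b >> 17, b & 0x1FFFF
    return ((mul18x18(a_hi, b_hi) << 34)
            + ((mul18x18(a_hi, b_lo) + mul18x18(a_lo, b_hi)) << 17)
            + mul18x18(a_lo, b_lo))


def mult32(rx, ry, signed):
    """Full product of two 32-bit register values.

    signed=True sign-extends (replicates bit 31); signed=False zero-extends.
    Either way the operands fit in 35 signed bits.
    """
    rx, ry = rx & 0xFFFFFFFF, ry & 0xFFFFFFFF
    if signed:
        rx -= (rx >> 31) << 32         # subtract 2**32 when bit 31 is set
        ry -= (ry >> 31) << 32
    return mul35x35(rx, ry)
```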

Hmmm. Maybe we should start thinking about code and see what we need? If
we chain the multiplies like this, isn't that a recipe to lose clock
speed?  Maybe we can manage with 18*18 or 18*32 multiplies?  The most
common case, I'd guess, is 32 bits * small constant.  If 32*32 cases are
rare, and we gain speed by reducing width, we may gain overall by
using more instructions here?
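If we only had a narrow 32x18 multiply, the small-constant case is one instruction, and the low word of a full 32*32 product can still be built from two narrow multiplies plus a shift and an add. A sketch, assuming a hypothetical `mul32x18` instruction that takes a signed 18-bit second operand and keeps the low 32 bits; the unsigned 16-bit halves of the second operand fit that input.

```python
MASK32 = 0xFFFFFFFF


def mul32x18(a, b):
    """Hypothetical narrow multiply: 32-bit a times signed 18-bit b, low 32 bits kept."""
    assert -(1 << 17) <= b < (1 << 17)
    return (a * b) & MASK32


def mul32x32_lo(a, b):
    """Low 32 bits of a*b using two narrow multiplies, one shift and one add."""
    b_lo = b & 0xFFFF                  # unsigned 16-bit halves of b
    b_hi = (b >> 16) & 0xFFFF
    return (mul32x18(a, b_lo) + (mul32x18(a, b_hi) << 16)) & MASK32
```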

The other thing to be figured out is how to deal with the upper 32
bits.  If you multiply two 32-bit numbers, you get a 64-bit word.
[...]
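One fact that may simplify the upper-32-bit question: the low 32 bits of the product are identical for signed and unsigned multiplication, so only an instruction that returns the high word would need signed and unsigned variants. A minimal sketch; `mul_hi_lo` and `to_signed32` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(v):
    """Interpret the low 32 bits of v as a two's-complement number."""
    v &= MASK32
    return v - (1 << 32) if v & 0x80000000 else v


def mul_hi_lo(rx, ry, signed):
    """Split the 64-bit product of two 32-bit words into (hi, lo) words."""
    if signed:
        a, b = to_signed32(rx), to_signed32(ry)
    else:
        a, b = rx & MASK32, ry & MASK32
    p = (a * b) & ((1 << 64) - 1)      # 64-bit two's-complement result
    return p >> 32, p & MASK32
```

For example, 0xFFFFFFFF * 2 gives (0xFFFFFFFF, 0xFFFFFFFE) when signed and (0x00000001, 0xFFFFFFFE) when unsigned: same low word, different high word.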

Your suggestions here sound fine to me; I don't have any better ones.
Again, we should probably check what we'll need.

I have been reading a bunch of Xilinx documentation lately, and
they use different strategies. In app note 467, they have a simple
35x35 multiplier. There is also a double-pumping strategy used in
the matrix multipliers in app note 284. I am not saying that Xilinx is
always right, but they probably have experience with their own products.
I personally like the double pumping; at 100 MHz it should not be a problem.

Another thought: do we even need a multiplier? It seems to
me that none of the things this CPU is doing really *need* multiplication,
and if we can get the design to run faster by removing it...
The only thing I can think of is the VGA translation, which probably needs
a multiplier for the memory mapping, where address(x,y) = x + y*MAX_X.
I think we should probably keep it in, but it is something to consider.
Just because everyone else has multipliers in their RISC designs does not
mean we need one in ours. At the very least, dropping it leaves more
hardware multipliers for the shader pipeline!
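If MAX_X is a fixed constant, the VGA mapping does not strictly need a multiplier: the product can be done with a few shifts and adds. A sketch, assuming a 640-pixel line purely as an example (640 = 512 + 128); the function names are hypothetical.

```python
def pixel_addr(x, y, max_x):
    """Linear address with a general multiply: x + y*MAX_X."""
    return x + y * max_x


def pixel_addr_640(x, y):
    """Same address for an assumed MAX_X of 640 = 512 + 128, using only shifts and adds."""
    return x + (y << 9) + (y << 7)


def mul_const_shift_add(y, c):
    """y*c for a non-negative constant c: one shifted add per set bit of c."""
    total, shift = 0, 0
    while c:
        if c & 1:
            total += y << shift
        c >>= 1
        shift += 1
    return total
```

For a width with few set bits this is two or three ALU instructions, which may be acceptable if the mapping is not in an inner loop.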


=== Branching ===

Finally, branches are different from the above, since they don't use
the ALU. The ALU is skipped here partly because it's critical to load
the target address back into stage 1 (instruction fetch) as soon as
possible. That also means the ALU bits in the instruction word can be
used to specify the condition of the branch. The relevant bits of the
instruction word are

However, I presume that the return address is forwarded through the
ALU to WB so that it appears in the register file. I guess we'll want
to generate an instruction that splices in the return address, adds
it to zero, and is also a no-op for MEM.  We need to diagram this out
so that we know when the return address is actually available to be
used.

The current branching instruction will pass the program counter just
past the branching instruction into register z.  Because of the
one-cycle delay of the branch, the instruction just past the branch
(the delay slot) will be executed again on return.  If we don't want
that, we could add another increment, so that register z holds the
address two past the branch.
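A toy trace model of the delay-slot issue, as a sketch only. It assumes exactly one delay slot for both call and return, and a made-up program layout; `run`, `PROGRAM` and the instruction kinds are hypothetical. `link_offset` is what gets added to the branch's PC to form the value in register z: 1 for the current behaviour, 2 for the proposed extra increment.

```python
# pc -> (kind, argument).  pc 1 is the call's delay slot, pc 12 the return's.
PROGRAM = {
    0: ("call", 10), 1: ("op", "A"), 2: ("op", "B"), 3: ("halt", None),
    10: ("op", "F"), 11: ("ret", None), 12: ("op", "G"),
}


def run(program, link_offset):
    """Return the list of PCs executed on a machine with one branch delay slot."""
    trace = []
    pc, pending, link = 0, None, None
    while True:
        kind, _arg = program[pc]
        trace.append(pc)
        next_pc = pc + 1
        if pending is not None:        # branch issued last cycle takes effect now
            next_pc, pending = pending, None
        if kind == "call":
            link = pc + link_offset    # value written to register z
            pending = program[pc][1]
        elif kind == "ret":
            pending = link
        elif kind == "halt":
            return trace
        pc = next_pc
```

With `link_offset=1` the trace is [0, 1, 10, 11, 12, 1, 2, 3], so instruction 1 runs twice; with `link_offset=2` it is [0, 1, 10, 11, 12, 2, 3].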

That is probably a good idea. If we don't, that is sure to cause some
debugging headaches down the road.


_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
