Re: [Open-graphics] The Central Processor

Nicholas S-A Wed, 28 Mar 2007 17:16:45 -0800

 (Except for one case
involving an immediate.)
Yes, that was the reason I did it. I considered it a free feature that
we can compute (- REG + signed_const), and I planned to fix it up in the
assembler by providing a "sub" which may be replaced with either an "add"
or a "rsub". There are also other derived instructions we can consider,
like "move", "jump", "noop", "neg", ... just to make life easier when
programming it.
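
A minimal sketch of how an assembler might expand such derived instructions, written in Python. It assumes the operand order "op rX, rcY, rZ" and the "rsub" meaning (rZ := rcY - rX) from the instruction list quoted below. The function name expand, the tuple encoding, and the choice of "or" with a zero constant for "move" are hypothetical, not part of the proposed design.

```python
def expand(mnemonic, *ops):
    """Expand a derived instruction into one real instruction (hypothetical encoding)."""
    if mnemonic == "sub":
        rx, rcy, rz = ops                      # wanted: rZ := rX - rcY
        if isinstance(rcy, int):
            # Constant operand: subtracting c is the same as adding -c.
            return ("add", rx, -rcy, rz)
        # Register operand: rsub computes (second - first), so swap the sources.
        return ("rsub", rcy, rx, rz)
    if mnemonic == "neg":
        rx, rz = ops                           # wanted: rZ := -rX
        return ("rsub", rx, 0, rz)             # rZ := 0 - rX
    if mnemonic == "move":
        rx, rz = ops                           # wanted: rZ := rX
        return ("or", rx, 0, rz)               # assumption: rZ := rX | 0
    # Anything else is treated as a real instruction and passed through.
    return (mnemonic, *ops)


print(expand("sub", "r1", 5, "r2"))
print(expand("sub", "r1", "r3", "r2"))
```

The sketch ignores the range of the immediate field, so negating the most negative constant would need separate handling.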


If we are planning on having any sort of stack, that would be a
useful option. Have the stack grow down from the top of memory; being
able to PUSH in a single cycle should be useful.

   `define QOP_MULT  6
If we combine these with the QMODE_ARITH mode, then we get the instructions
   and rX, rcY, rZ     ; rZ := rX & rcY
   or rX, rcY, rZ
   xor rX, rcY, rZ
   lsh rX, rcY, rZ     ; rZ := if rcY < 0 then rX >> -rcY else rX << rcY
   add rX, rcY, rZ     ; rZ := rX + rcY
   rsub rX, rcY, rZ    ; rZ := rcY - rX
   mult rX, rcY, rZ    ; rZ := rX * rcY ; Note: always signed!
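
The semantics listed above can be modelled in a few lines of Python as a sanity check. This is only a sketch under assumptions: a 32-bit word, a logical (zero-filling) right shift for negative shift counts, "add" taken to mean rX + rcY, and only the low word of the product kept for "mult". The names alu, to_signed and MASK are illustrative.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit word


def to_signed(x):
    """Interpret the low 32 bits of x as a two's-complement value."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def alu(op, rx, rcy):
    """Return rZ for 'op rX, rcY, rZ' as a 32-bit unsigned value."""
    rx &= MASK
    rcy &= MASK
    if op == "and":
        r = rx & rcy
    elif op == "or":
        r = rx | rcy
    elif op == "xor":
        r = rx ^ rcy
    elif op == "lsh":
        s = to_signed(rcy)                     # negative count shifts right
        r = rx >> -s if s < 0 else rx << s     # assumption: logical right shift
    elif op == "add":
        r = rx + rcy
    elif op == "rsub":
        r = rcy - rx
    elif op == "mult":
        r = to_signed(rx) * to_signed(rcy)     # always signed, low word kept
    else:
        raise ValueError(op)
    return r & MASK


print(hex(alu("lsh", 0x10, 2)), hex(alu("lsh", 0x10, -2)))
```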


Now that you mention that, perhaps we'd like to have signed and
unsigned multiply instructions.

The built-in multipliers are 18*18 -> 36.  I think you need four of
them to make 36*36->72.  They're signed at 36 bits, so for us to do
signed or not, we just need to decide whether or not we replicate the
high bits of each operand out to the full word length.
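
A small Python sketch of that idea: the only difference between the signed and unsigned multiply is how each 32-bit operand is extended before it reaches a signed multiplier. It assumes 32-bit registers and a 64-bit result; the function names are hypothetical, and Python integers stand in for the 36-bit hardware operands.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def extend(word32, signed):
    """Value seen by a wider signed multiplier after extending a 32-bit word."""
    word32 &= MASK32
    if signed and word32 & 0x80000000:
        return word32 - (1 << 32)   # replicate the sign bit into the high bits
    return word32                   # fill the high bits with zeros


def multiply(a, b, signed):
    """64-bit product of two 32-bit words, signed or unsigned."""
    return (extend(a, signed) * extend(b, signed)) & MASK64


print(hex(multiply(0xFFFFFFFF, 2, signed=True)))
print(hex(multiply(0xFFFFFFFF, 2, signed=False)))
```

With 0xFFFFFFFF as an operand, the signed variant treats it as -1 and the unsigned variant as 4294967295, so the upper halves of the two results differ while the lower 32 bits agree.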

Hmmm. Maybe we'll start thinking of code and see what we need? If we
chain the multiplies like this, isn't that a recipe to lose clock
speed?  Maybe we can manage with 18*18 or 18*32 multiplies?  The most
common case I'd guess is 32 bits * small constant.  If 32*32 cases are
rare, and we gain speed by reducing width, we may gain in general by
using more instructions here?
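
To illustrate the trade-off, here is a hedged Python sketch of building the low 32 bits of a 32*32 product from two narrower multiplies plus a shift and an add. It assumes a hypothetical narrow multiply that takes a 32-bit and a 16-bit operand (which would fit an 18*32 multiplier); the names and the 16-bit split are illustrative only.

```python
MASK32 = 0xFFFFFFFF


def narrow_mult(a32, b16):
    """Hypothetical narrow multiply: 32-bit operand times 16-bit operand."""
    assert 0 <= b16 <= 0xFFFF
    return (a32 & MASK32) * b16


def mult32_low(a, b):
    """Low 32 bits of a*b using two narrow multiplies, a shift and an add."""
    b &= MASK32
    lo = narrow_mult(a, b & 0xFFFF)          # a * low half of b
    hi = narrow_mult(a, b >> 16)             # a * high half of b
    return (lo + (hi << 16)) & MASK32


print(hex(mult32_low(123456789, 987654321)))
```

The common case of a small constant needs only the first narrow multiply; the full sequence is paid for only when both operands are wide. The low 32 bits are the same for signed and unsigned operands, so this sketch does not need to distinguish them.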

The other thing to be figured out is how to deal with the upper 32
bits.  If you multiply two 32-bit numbers, you get a 64-bit word.
[...]


Your suggestions here sound fine to me; I don't have any better ones.
Again, we should probably check what we'll need.


I have been reading a bunch of Xilinx documentation lately, and
they use different strategies. In app note 467, they have a simple
35x35 multiplier. There is also a double-pumping strategy used in
the matrix multipliers in app note 284. I am not saying that Xilinx is
always right, but they probably have experience with their own products.

I personally like the double pumping - at 100 MHz it should not be a problem.


Another thought is - do we even need to have a multiplier? It seems to
me that none of the things that this CPU is doing really *need* multiplication,
and if we can get the design to run faster by removing it...

The only thing that I can think of is the VGA translation, which probably needs
a multiplier for the memory mapping where (x,y) = x + y*MAX_X.
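
As a rough illustration in Python, the mapping can be written with a multiply, or, when MAX_X is a fixed constant with few set bits, with shifts and adds only. The value MAX_X = 640 is an assumption chosen for the example (640 = 512 + 128); the function names are hypothetical.

```python
MAX_X = 640  # assumed line width for the example


def address_mult(x, y):
    """Linear address of pixel (x, y) using a multiply."""
    return x + y * MAX_X


def address_shift_add(x, y):
    """Same address without a multiplier: y*640 = (y << 9) + (y << 7)."""
    return x + (y << 9) + (y << 7)


print(address_mult(10, 2), address_shift_add(10, 2))
```

This only works when the line width is known in advance; a width chosen at run time would still need a multiply or a loop.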

I think we should probably keep it in, but it is something to consider. Just
because everyone else has multipliers in their RISC designs does not mean
we need it for ours. At the very least, it leaves more for the shader pipeline!

=== Branching ===
Finally, branches are different from the above, since they don't use the
ALU. The ALU is skipped here partly because it's very critical to load
the target address back into stage 1 (instruction fetch) as soon as
possible. That also means the ALU bits in the instruction word can be
used to specify the condition of the branch. The relevant bits of the
instruction word are
However, I presume that the return address is forwarded through the
ALU to WB so that it appears in the register file. I guess we'll want
to generate an instruction that splices in the return address and adds
it to zero and is also a no-op for MEM.  We need to diagram this out
so that we know when the return address is actually available to be
used.
The current branching instruction will pass the program counter just
past the branching instruction into register z.  Because of the
one-cycle delay of the branching, the instruction just past the branch
will be executed again at return.  If we don't want that, we could add
another increment.
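
The effect can be seen in a toy Python model of a branch with a one-instruction delay. This is a sketch, not the real pipeline: the instruction names ("call", "ret", "op", "halt"), the single link register, and the link_offset parameter are all hypothetical. A link_offset of 1 stores the address just past the branch; 2 corresponds to the extra increment.

```python
PROGRAM = [
    ("call", 4),          # 0: branch to 4, saving a return address
    ("op", "slot"),       # 1: instruction just past the branch (delay slot)
    ("op", "after"),      # 2
    ("halt",),            # 3
    ("op", "body"),       # 4: subroutine body
    ("ret",),             # 5: branch back to the saved address
    ("op", "ret_slot"),   # 6: delay slot of the return
]


def run(program, link_offset):
    """Execute the toy program and return the ordinary instructions in order."""
    trace, link, pc, pending = [], None, 0, None
    while True:
        op = program[pc]
        target, pending = pending, None    # a branch issued last step lands now
        if op[0] == "halt":
            break
        if op[0] == "call":
            link = pc + link_offset        # saved return address
            pending = op[1]                # takes effect after the delay slot
        elif op[0] == "ret":
            pending = link
        else:
            trace.append(op[1])
        pc = target if target is not None else pc + 1
    return trace


print(run(PROGRAM, 1))  # "slot" runs twice
print(run(PROGRAM, 2))  # "slot" runs once
```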


That is probably a good idea. If we don't, that is sure to cause some
debugging headaches down the road.


