The ideas are

  * Move cheap unary operators from stage 3 to stage 2, e.g. istead of
    implementing {add, sub} in 3, implement only {add} and optional
    negation on one of the operands.  The motivation is that parallel
    functions in the ALU sits idle while one is working, and by moving
    some computation to the previous stage we can instead exploit more
    combinations.


The other advantage of this is that the ALU latency is reduced for single operations (e.g. add in this case), so with some tricks the ALU can be synthesized
faster.

  * Let the ALU run independent of whether we do arithemetic or memory
operations. In case of memory operations, the output of the ALU is
    the address, thus giving us more powerful addressing "for free".
    For store, the second operand is the stored value so we need to
    weaken the addressing by forcing immediate mode for the second
    argument to the ALU.


Are we going to have some sort of parallel ALU arch? If so, we could have
one ALU dedicated to addressing for memory operations. We can compute
address offsets using a simpler ALU, and only allow a subset for that one for
normal operation. This allows for commands like:
STORE ax + b
FETCH rega*regb*memc (since the multiplier is FPGA hardware)
STORE (a-b)*c
and such in ONE cycle! (maybe not the second one)

  * Only basic one-cycle operations goes in the CPU.  If we need any
    multi-cycle operation (div?), we could add it as IO ports, using
    the "modularised RISC" idea,
    http://lists.duskglow.com/open-graphics/2006-April/005307.html.

Below is the sketch.  I hope you can read the Unicode.  Some
conventions:

  * I have taken the liberty to misuse logical operators as bitwise
    logical operators.

  * Conditional bitwise not:
        (¬)^s x = | x     if s = 0
                  | ¬ x   if s = 1

To clarify for people who find this confusing, I think that this
is raising the not operation to the sth power (so if s is zero, this
just doesn't change x). Right?


 1. Instruction Fetch

Fetch one instruction of the form (qmode, qop, ix, sx, mx, iy, sy, iz, c)
    where

        qmode   2 bits  0 = ARITH, 1 = FETCH, 2 = STORE, 3 = BRANCH
qop 3 bits ALU function if qmode ≠ BRANCH, else branch condition
        ix      5 bits  first operand register index
        sx      1 bit   first operand bitwise not option
        mx      1 bit   first operand increment by 1 option
        iy      5 bits  second operand register index
        sy      1 bit   second operand bitwise not option
        iz      5 bits  write-back register
        c       9 bits  immediate value

 2. Register Fetch, Operand Computation, and Branch Condition

  2a. Operand Computation.

    Here we compute

        x := (¬)^sx r_ix + mx

        y := | (¬)^sy ⌊2^c r_iy⌋  if iy ≠ 0 and qmode ≠ STORE
             | c                  if iy = 0  or qmode = STORE

I am confused about this:
⌊2^c r_iy⌋
Is that the floor operation?
This just uses the immediate as a bit shift, right?

That is pretty clever if it is what I think it is.


    If we expand the first on in terms of (sx, mx) we get the 4
    operations:

        x := | r_ix       if sx = 0, mx = 0
             | ¬ r_ix     if sx = 1, mx = 0
             | r_ix + 1   if sx = 0, mx = 1
             | - r_ix     if sx = 1, mx = 1

    I assume splitting up negation into bitwise-not and increment does
    not add significantly to the complexity.  If that is not the case,
    we should drop the mx from the instruction and simplify x to

        x := (-1)^sx r_ix

    Register r_0 is hardcoded to 0 for reads.

This might cause some issues. Based on the way that y is implemented,
one cannot have it be based on r_0 (since iy=0 signals immediate, I think). So, if you want to not input, then ix = 0 and iy = register number, with c=0 and
sy = 1. If you want to add one, you use the ALU add operation.

Ok, I guess it doesn't cause a problem. never mind ;-)


  2b. Branch Condition.

        do_branch := | false    if qmode ≠ BRANCH
                     | x = 0    if qmode = BRANCH and qop_0 = 0
                     | x ≥ 0    if qmode = BRANCH and qop_0 = 1

The x ≥ 0 test is cheap (check upper bit). If the x = 0 test is too expensive here, then we could add an equality comparison to the ALU
    which returns -1 for true and 0 for false.


That looks good.

 3. ALU

        z := | x + y    if qop = ADDSUB
             | x ⋅ y    if qop = MULT
             | x ∧ y    if qop = AND
             | x ∨ y    if qop = OR
             | x ⊻ y    if qop = XOR
             | ⌊2^y x⌋  if qop = SHIFT
             | ...

Note that the choice of x and y in SHIFT gives the maximum flexibility
    by not duplicating the stage 2 shift operation.


I like that!

 4. Memory and IO

    Optionally fetch from or store to local memory location z:

        z' := z                 if qmode = ARITH or BRANCH
        z' := M_z               if qmode = FETCH
        M_z := r_iy             if qmode = STORE


so, with my ALU idea if qmode = STORE, M_z := z2. I don't know if that is possible - if not, it should at least run through the stage 2 +1 logic (by switching
 the qmode=STORE test from y to x in stage 2).
If this doesn't make much sense, ignore it.
We only have 512 instructions, so this is a matter of code
compression and speed. Since the qmode = STORE instructions are going
to pass down the pipeline like any other instruction, we might as well use the
logic.

 5. Write-Back

        r_iz := z'  if iz ≠ 0 and qmode ≠ 11

    We could add a bitwise-not here, and drop OR from the ALU, but I'm
    running out of instruction bits.

Yeah, either way would work.

I like the design!

_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)

Reply via email to