The ideas are
* Move cheap unary operators from stage 3 to stage 2, e.g. istead of
implementing {add, sub} in 3, implement only {add} and optional
negation on one of the operands. The motivation is that parallel
functions in the ALU sits idle while one is working, and by moving
some computation to the previous stage we can instead exploit more
combinations.
The other advantage of this is that the ALU latency is reduced for
single
operations (e.g. add in this case), so with some tricks the ALU can
be synthesized
faster.
* Let the ALU run independent of whether we do arithemetic or memory
operations. In case of memory operations, the output of the
ALU is
the address, thus giving us more powerful addressing "for free".
For store, the second operand is the stored value so we need to
weaken the addressing by forcing immediate mode for the second
argument to the ALU.
Are we going to have some sort of parallel ALU arch? If so, we could
have
one ALU dedicated to addressing for memory operations. We can compute
address offsets using a simpler ALU, and only allow a subset for that
one for
normal operation. This allows for commands like:
STORE ax + b
FETCH rega*regb*memc (since the multiplier is FPGA hardware)
STORE (a-b)*c
and such in ONE cycle! (maybe not the second one)
* Only basic one-cycle operations goes in the CPU. If we need any
multi-cycle operation (div?), we could add it as IO ports, using
the "modularised RISC" idea,
http://lists.duskglow.com/open-graphics/2006-April/005307.html.
Below is the sketch. I hope you can read the Unicode. Some
conventions:
* I have taken the liberty to misuse logical operators as bitwise
logical operators.
* Conditional bitwise not:
(¬)^s x = | x if s = 0
| ¬ x if s = 1
To clarify for people who find this confusing, I think that this
is raising the not operation to the sth power (so if s is zero, this
just doesn't change x). Right?
1. Instruction Fetch
Fetch one instruction of the form (qmode, qop, ix, sx, mx, iy,
sy, iz, c)
where
qmode 2 bits 0 = ARITH, 1 = FETCH, 2 = STORE, 3 = BRANCH
qop 3 bits ALU function if qmode ≠ BRANCH, else
branch condition
ix 5 bits first operand register index
sx 1 bit first operand bitwise not option
mx 1 bit first operand increment by 1 option
iy 5 bits second operand register index
sy 1 bit second operand bitwise not option
iz 5 bits write-back register
c 9 bits immediate value
2. Register Fetch, Operand Computation, and Branch Condition
2a. Operand Computation.
Here we compute
x := (¬)^sx r_ix + mx
y := | (¬)^sy ⌊2^c r_iy⌋ if iy ≠ 0 and qmode ≠ STORE
| c if iy = 0 or qmode = STORE
I am confused about this:
⌊2^c r_iy⌋
Is that the floor operation?
This just uses the immediate as a bit shift, right?
That is pretty clever if it is what I think it is.
If we expand the first on in terms of (sx, mx) we get the 4
operations:
x := | r_ix if sx = 0, mx = 0
| ¬ r_ix if sx = 1, mx = 0
| r_ix + 1 if sx = 0, mx = 1
| - r_ix if sx = 1, mx = 1
I assume splitting up negation into bitwise-not and increment does
not add significantly to the complexity. If that is not the case,
we should drop the mx from the instruction and simplify x to
x := (-1)^sx r_ix
Register r_0 is hardcoded to 0 for reads.
This might cause some issues. Based on the way that y is implemented,
one cannot have it be based on r_0 (since iy=0 signals immediate, I
think).
So, if you want to not input, then ix = 0 and iy = register number,
with c=0 and
sy = 1. If you want to add one, you use the ALU add operation.
Ok, I guess it doesn't cause a problem. never mind ;-)
2b. Branch Condition.
do_branch := | false if qmode ≠ BRANCH
| x = 0 if qmode = BRANCH and qop_0 = 0
| x ≥ 0 if qmode = BRANCH and qop_0 = 1
The x ≥ 0 test is cheap (check upper bit). If the x = 0 test
is too
expensive here, then we could add an equality comparison to the
ALU
which returns -1 for true and 0 for false.
That looks good.
3. ALU
z := | x + y if qop = ADDSUB
| x ⋅ y if qop = MULT
| x ∧ y if qop = AND
| x ∨ y if qop = OR
| x ⊻ y if qop = XOR
| ⌊2^y x⌋ if qop = SHIFT
| ...
Note that the choice of x and y in SHIFT gives the maximum
flexibility
by not duplicating the stage 2 shift operation.
I like that!
4. Memory and IO
Optionally fetch from or store to local memory location z:
z' := z if qmode = ARITH or BRANCH
z' := M_z if qmode = FETCH
M_z := r_iy if qmode = STORE
so, with my ALU idea if qmode = STORE, M_z := z2. I don't know if
that is
possible - if not, it should at least run through the stage 2 +1
logic (by switching
the qmode=STORE test from y to x in stage 2).
If this doesn't make much sense, ignore it.
We only have 512 instructions, so this is a matter of code
compression and speed. Since the qmode = STORE instructions are going
to pass down the pipeline like any other instruction, we might as
well use the
logic.
5. Write-Back
r_iz := z' if iz ≠ 0 and qmode ≠ 11
We could add a bitwise-not here, and drop OR from the ALU, but I'm
running out of instruction bits.
Yeah, either way would work.
I like the design!
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)