On 2007-03-18, Timothy Normand Miller wrote:
> We typically see the pipeline as: logic - register - logic - register, etc.
>
> Well, sometimes, the register is built into the logic in a way that
> won't let us insert extra logic. In particular, the block RAMs are
> synchronous. Read data is available one cycle after the address is
> asserted. That means we can't insert logic between the "address MUX"
> and the "pipeline register".
>
> We have a similar problem with the distributed RAM we'd use for the
> 32-entry register file. In this case, we CAN do asynchronous reads
> from the RAM, but it's inefficient. Moreover, the address mux is
> heavy-weight enough that we may want to limit the pipeline stage to
> just that RAM lookup.
So, the first two stages are fully saturated by two lookups. I see.
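To make sure I understand the synchronous-read constraint, here is a toy C model of it (all names are illustrative, not actual RTL): the read address is registered at the clock edge, so the data only shows up one cycle after the address is asserted.

```c
/* Toy C model of a synchronous-read block RAM.  The address register
   is "built in": output data corresponds to the address latched on
   the PREVIOUS clock edge, so there is no room to insert logic
   between the address mux and that register. */
#include <stdint.h>

#define BRAM_DEPTH 16

struct bram {
    uint32_t mem[BRAM_DEPTH];
    uint32_t addr_reg;          /* the register we cannot move */
};

/* One clock cycle: return the output for the previously latched
   address, then capture the new address at the edge. */
uint32_t bram_clock(struct bram *b, uint32_t addr)
{
    uint32_t q = b->mem[b->addr_reg];
    b->addr_reg = addr % BRAM_DEPTH;
    return q;
}
```

So a read asserted in cycle t is only usable by logic in cycle t+1, which is exactly why the lookup saturates a whole stage.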
> generally have opcodes for things like A&~B, and "not" operations are
> generally "free".
I wouldn't worry about the instruction set, we can always have the first
assembler use a "sensible" subset, and exploit exotic combinations by
hand-coding critical loops in the last stage of development. (A smarter
assembler would be able to merge instructions.)
If I understand things right, the 5-stage design already exhibits some
exotic behaviour in register dependency: Even if we short-wire outputs
from the ALU when the output register of one instruction is the input
register to the next, there is still one stage before it is written to
the register file.
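A small timing sketch of that lag (stage names IF/RF/EX/MEM/WB are my assumption for the 5-stage layout): even with an ALU-output bypass, an instruction issued only one or two slots behind the producer reads the register file before the producer's writeback lands.

```c
/* Assumed stage layout: IF=0, RF=1, EX=2, MEM=3, WB=4.  A producer
   entering IF at cycle 0 commits its result to the register file in
   WB at cycle 4; a consumer issued `gap` cycles later reads its
   operands in RF at cycle gap+1.  Returns 1 when that read happens
   before the writeback, i.e. forwarding (or a bubble) is needed. */
int needs_forwarding(int gap)
{
    const int RF = 1, WB = 4;       /* stage indices */
    int read_cycle  = gap + RF;     /* consumer's register fetch  */
    int write_cycle = WB;           /* producer's writeback       */
    return read_cycle < write_cycle;
}
```

With a write-before-read register file, the instruction three slots behind (gap = 3) is the first one that can rely on the register file alone.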
> > * Let the ALU run independent of whether we do arithmetic or memory
> > operations. In case of memory operations, the output of the ALU is
> > the address, thus giving us more powerful addressing "for free".
> > For store, the second operand is the stored value so we need to
> > weaken the addressing by forcing immediate mode for the second
> > argument to the ALU.
>
> MIPS does it this way. Addressing is "reg_val + immediate_constant".
> That's why they put the MEM stage right after the ALU. I've thought
> about the fact that we can support a "reg + reg" mode for reads, but
> I'm bothered by the lack of orthogonality. If we have the opcode
> "space", we can consider adding it. How useful would it be?
I think the lack of orthogonality is due to STORE, since we need to
enforce an immediate operand, but checking for STORE is just a two-bit
comparison. Or do you refer to the semantic asymmetry between FETCH and
STORE? My philosophy is to make the hardware as simple and powerful as
possible and let the code-generating software deal with oddities.
> > * Conditional bitwise not:
> > (¬)^s x = | x if s = 0
> > | ¬ x if s = 1
>
> I get the unicode, but I don't know what you're meaning here. Can you
> rewrite to look like C code?
if (s == 0)
    return x;
else
    return ~x;
> > 1. Instruction Fetch
> >
> > Fetch one instruction of the form (qmode, qop, ix, sx, mx, iy, sy, iz,
> > c)
> > where
[...]
> Have a look at how MIPS breaks it up.
Ok.
> For the MEM/IO stage, there are four operations: Do nothing (forward
> to next stage), perform write of B to address from A, read from
> address A, or cause shunt from address in A to address in B. That's a
> few bits for an opcode.
So it's possible to move a value from one address to another in the same
instruction within the same block RAM? Do we need it?
> Finally, the WB stage... there's a matter of deciding whether or not
> something from the previous stage will be written back to a register.
> So, we need to encode a register number, and we also need to know
> whether or not to write back.
>
> MIPS reserves a reg zero as a "bit bucket". Writes are thrown away,
> and reads always return zero. We can use this implicitly for some
> instructions.
Great, then we just mandate iz = 0 for STORE.
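In C, the MIPS-style bit bucket would behave like this (regs[] and the function names are just illustrative):

```c
/* Register file with r0 hard-wired to zero, MIPS style: writes to
   index 0 are silently dropped, reads of index 0 always return 0. */
#include <stdint.h>

#define NREGS 32
static uint32_t regs[NREGS];

uint32_t rf_read(unsigned i)
{
    return (i == 0) ? 0 : regs[i % NREGS];
}

void rf_write(unsigned i, uint32_t v)
{
    if (i != 0)
        regs[i % NREGS] = v;
}
```

Mandating iz = 0 for STORE then gives us "no writeback" without an extra control bit.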
> Some of these opcodes we'll want to encode explicitly -- a field is
> reserved in the instruction. For others, we'll encode them implicitly
> -- some other opcode is "decoded" somewhere to indicate the desired
> operation.
>
> > 2. Register Fetch, Operand Computation, and Branch Condition
> >
> > 2a. Operand Computation.
> >
> > Here we compute
> >
> > x := (¬)^sx r_ix + mx
> >
> > y := | (¬)^sy ⌊2^c r_iy⌋ if iy ≠ 0 and qmode ≠ STORE
> > | c if iy = 0 or qmode = STORE
>
> Can we get English or C translations of these?
Sorry,
x =
    if (sx == 1)
        return ~r[ix] + mx;
    else
        return r[ix] + mx;
y =
    if (iy != 0 && qmode != STORE) {
        if (sy == 1)
            return ~(r[iy] << c);
        else
            return r[iy] << c;
    } else
        return c;
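Folding the conditional not into an XOR mask, the whole operand stage could look like this in C (the QMODE_STORE value and field names are placeholders, not a fixed encoding; sx and sy must be 0 or 1):

```c
/* One possible C rendering of the operand-computation stage.
   x := (~)^sx r[ix] + mx
   y := (~)^sy (r[iy] << c)   if iy != 0 and qmode != STORE
      | c                     otherwise */
#include <stdint.h>

#define QMODE_STORE 2   /* placeholder encoding */

uint32_t operand_x(const uint32_t r[], unsigned ix, uint32_t sx,
                   uint32_t mx)
{
    return (r[ix] ^ (uint32_t)-sx) + mx;
}

uint32_t operand_y(const uint32_t r[], unsigned iy, uint32_t sy,
                   unsigned qmode, uint32_t c)
{
    if (iy == 0 || qmode == QMODE_STORE)
        return c;                          /* immediate path */
    return (r[iy] << c) ^ (uint32_t)-sy;   /* shifted, maybe inverted */
}
```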
> > I assume splitting up negation into bitwise-not and increment does
> > not add significantly to the complexity. If that is not the case,
> > we should drop the mx from the instruction and simplify x to
> >
> > x := (-1)^sx r_ix
>
> I think about the best we can do is combine add and sub into an addsub
> and make and, or, xor, and not sorta share some logic. Then there's
> shifting, comparisons, and a number of other things. In an FPGA, you
> get better results by not trying to be too efficient with your logic,
> while for an ASIC, you can get greater benefits from this sort of
> thing. It depends.
addsub sounds fine. (I was thinking the alternative was to have two
separate (and expensive) units for this.) I understand we are making
LEGO towers here.
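For the record, my understanding of the shared addsub unit is the standard two's-complement trick, one adder for both operations (sketched here in C; sub must be 0 or 1):

```c
/* Combined add/sub datapath: a - b = a + ~b + 1, so subtraction is
   addition with the second operand conditionally inverted and the
   subtract bit injected as carry-in. */
#include <stdint.h>

uint32_t addsub(uint32_t a, uint32_t b, uint32_t sub)
{
    return a + (b ^ (uint32_t)-sub) + sub;
}
```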
> Note that any ASIC technology we select later is likely to be similar
> enough that we might as well just do it this way.
I presume we don't want to change too much, since that would increase
the risk of introducing bugs. Is it fair to argue that we have
significantly more space on the ASIC than on the FPGA, so whatever fits
in the latter fits in the former? That is, we'd mostly need to do stuff
like replacing standard units of one technology with that of the other?
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)