On 2007-03-18, Timothy Normand Miller wrote:
> Yes, unfortunately. Also, we make a convention out of registering
> outputs (consistent with the way that these RAMs work), so we have to
> figure out how to do all our math from inputs inward.
So, if I understand this correctly, looking at your previous sketch, the
branch computation should be moved from stage 2 (decode_wb) to stage 1
(fetch):
// Compute branch
assign branch_addr = indirect_branch ? data_a : ins_out[...];
assign do_branch = unconditional_branch || (branch && somefunc(data_b));
I think that can be done by passing data_a and data_b as inputs to
stage 1, instead of passing it branch_addr and do_branch.
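To make that concrete, here is a rough sketch of how the fetch stage
might look with data_a and data_b as inputs. It is only an illustration:
the module name, the widths, the imm_target port (standing in for the
ins_out[...] field, which I leave unspecified) and the "nonzero" test
used in place of somefunc() are all assumptions of mine.

```verilog
// Hypothetical fetch stage; every name and width here is an assumption.
module fetch_branch #(
    parameter AW = 10,              // program address width (assumed)
    parameter DW = 32               // data width (assumed)
) (
    input  wire          clk,
    input  wire          rst,
    input  wire [DW-1:0] data_a,    // register operand: indirect branch target
    input  wire [DW-1:0] data_b,    // register operand: branch condition
    input  wire [AW-1:0] imm_target,// stands in for the ins_out[...] field
    input  wire          indirect_branch,
    input  wire          unconditional_branch,
    input  wire          branch,
    output reg  [AW-1:0] pc         // registered output, per the convention
);
    // Branch math is done on the inputs, before the output register.
    wire [AW-1:0] branch_addr = indirect_branch ? data_a[AW-1:0] : imm_target;

    // Placeholder for somefunc(): "taken when data_b is nonzero".
    wire do_branch = unconditional_branch ||
                     (branch && (data_b != {DW{1'b0}}));

    always @(posedge clk) begin
        if (rst)
            pc <= {AW{1'b0}};
        else if (do_branch)
            pc <= branch_addr;
        else
            pc <= pc + 1'b1;
    end
endmodule
```

The only registered value is pc; branch_addr and do_branch become
internal wires of stage 1 rather than signals passed between stages.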
> >If I understand things right, the 5-stage design already exhibits some
> >exotic behaviour in register dependency: Even if we short-wire outputs
> >from the ALU when the output register of one instruction is the input
> >register to the next, there is still one stage before it is written to
> >the register file.
>
> True. What we'll do is something like this: The inputs to the ALU
> (the ALU logic, not the ALU stage) will come through a MUX. One of
> the sources is the register file. The other sources are the
> registered outputs of the ALU stage or the registered outputs of the
> MEM stage. We'll need some clever sort of table that tracks which
> registers are where. Each operand to the ALU needs to be compared
> against only a few register numbers, so it's not too bad. The ALU may
> forward more numbers, but it has only one result. Same for the MEM
> stage. So we compare each input register to the ALU against two
> register numbers. If we can do it one stage in advance (indeed, I
> believe we can), even better.
Now, maybe I don't know exactly what our building blocks look like, but
what strikes me is that, in the spirit of minimising the hardware, we
could let register 1 always be the output of the ALU, and register 2
always the output of the memory stage. If the registers in the register
file are compatible with our stage output registers, then that should
eliminate the extra muxing, and we still have 29 registers left.
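For comparison, this is roughly how I picture the MUX-based forwarding
you describe, for a single ALU operand. It is a minimal sketch only:
the module name, the port names, the valid bits and the widths are my
own assumptions, not taken from the actual design.

```verilog
// Hypothetical forwarding mux for one ALU operand; names are assumptions.
module fwd_mux #(
    parameter DW = 32,              // data width (assumed)
    parameter RW = 5                // register-number width (assumed)
) (
    input  wire [RW-1:0] src_reg,   // register number this operand reads
    input  wire [DW-1:0] rf_data,   // value read from the register file
    input  wire          alu_valid, // ALU stage output holds a pending write
    input  wire [RW-1:0] alu_dst,   // its destination register number
    input  wire [DW-1:0] alu_result,// registered output of the ALU stage
    input  wire          mem_valid, // MEM stage output holds a pending write
    input  wire [RW-1:0] mem_dst,   // its destination register number
    input  wire [DW-1:0] mem_result,// registered output of the MEM stage
    output wire [DW-1:0] operand    // value fed to the ALU logic
);
    // Two comparisons per operand, as in your description.
    wire hit_alu = alu_valid && (alu_dst == src_reg);
    wire hit_mem = mem_valid && (mem_dst == src_reg);

    // The ALU-stage result belongs to the younger instruction, so it
    // takes priority over the MEM-stage result.
    assign operand = hit_alu ? alu_result :
                     hit_mem ? mem_result :
                               rf_data;
endmodule
```

With fixed registers for the two stage outputs, the two comparators and
the three-way mux above are what I would hope to save.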
> I think someone suggested the idea of setting up background moves.
> That is, we don't move the words. We tell some other logic to do some
> number of words. This also frees us from having to deal with pipeline
> stalls, because we can just request to move N words (more than are
> perhaps available at the time), and it takes an unknown amount of
> time, and we pick it up later by checking status.
>
> We're going to end up with an intricate network of queues. Read
> request queues, read return queues, write queues, and queues into
> which we put requests to do reads and writes. :)
I also like this. It seems to be a cheap way to avoid tying up the CPU
with trivia. I think there are many specifics to work out, but we can
probably do the processing pipeline without worrying too much about
this.
> >Sorry,
> >
> >x =
> > if (sx == 1) then
> > return ~r[ix] + mx
> > else
> > return r[ix] + mx
> >
> >y =
> > if (iy != 0 && qmode != STORE) {
> > if (sy == 1)
> > return r[iy] << c
> > else
> > return r[iy] << c
> > } else
> > return c
>
> With the loss of the stage 2 math, what do these become now?
x = r[ix]
y = (iy == 0? c : r[iy])
If we follow the registered-output convention, I guess the 'y'
computation goes into the ALU. I can see I already made the mistake of
using the 'iy' of the next instruction in the 'iy == 0' computation.
I think your registered-output convention makes sense!
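To illustrate the 'iy' mix-up, here is a small sketch of the 'y' select
at the ALU input. It assumes a register file with a one-cycle read
latency, so that r[iy] arrives one cycle after iy is presented; the
module and signal names and the widths are hypothetical.

```verilog
// Hypothetical 'y' operand select; names and widths are assumptions.
module operand_y #(
    parameter DW = 32,              // data width (assumed)
    parameter RW = 5                // register-number width (assumed)
) (
    input  wire          clk,
    input  wire [RW-1:0] iy,        // presented to the register file now
    input  wire [DW-1:0] c,         // constant from the same instruction
    input  wire [DW-1:0] r_iy,      // r[iy], valid one cycle after iy
    output wire [DW-1:0] y
);
    // Delay iy and c by one cycle so they line up with r_iy.
    reg [RW-1:0] iy_q;
    reg [DW-1:0] c_q;

    always @(posedge clk) begin
        iy_q <= iy;
        c_q  <= c;
    end

    // y = (iy == 0 ? c : r[iy]), using the delayed copies. Comparing
    // against the undelayed iy would test the next instruction instead.
    assign y = (iy_q == {RW{1'b0}}) ? c_q : r_iy;
endmodule
```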
_______________________________________________
Open-graphics mailing list
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)