Re: [Open-graphics] The Central Processor

Petter Urkedal Sun, 01 Apr 2007 04:49:16 -0700

Thanks to Paul, Nicholas and Tim for the feedback.

I have a new version which I'd like to check into the repository.  The
register-fetch stage has changed the most, especially to handle register
forwarding.  I'll mention some of the other changes below.  The new
version is at http://www.eideticdew.org/~urkedal/ogp/, but I'll remove
the special treatment of r31.

On 2007-03-28, Timothy Normand Miller wrote:
> Now that you mention it, simple negation is rather expensive.  Less so
> than an add or subtract, but not that much less.  You for sure would
> not want to feed the result of that into a barrel shifter.  Too much
> delay.
> 
> However, consider this:
> 
> input [31:0] x;
> input [5:0] shift;
> output [31:0] z;
> wire [63:0] y = {x, 32'b0};
> assign z = y >> (shift + 6'd32);
> 
> The shift+32 is cheap, because all you're really doing is inverting
> the high bit of the shift amount.  The only question is how much more
> expensive is a 64-bit shifter than a 32-bit shifter.
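To make the trick concrete, here is a bit-accurate Python sketch of the shifter above (a software model, not the actual RTL). It assumes `shift` is a 6-bit two's-complement amount in -32..31, with positive values shifting right and negative values shifting left, and it checks that inverting bit 5 of the stored field gives the same amount as adding 32. The name `bidir_shift` is illustrative only.

```python
MASK32 = 0xFFFFFFFF


def bidir_shift(x: int, shift: int) -> int:
    """Model of z = {x, 32'b0} >> (shift + 32), keeping the low 32 bits.

    Assumes x is a 32-bit word and shift a 6-bit two's-complement amount
    in -32..31: positive shifts right, negative shifts left.
    """
    if not (0 <= x <= MASK32 and -32 <= shift <= 31):
        raise ValueError("x must be 32-bit and shift in -32..31")
    y = x << 32                     # wire [63:0] y = {x, 32'b0}
    field = shift & 0x3F            # the 6-bit shift field as stored
    amount = field ^ 0x20           # invert bit 5: same as (field + 32) mod 64
    assert amount == shift + 32     # both views agree over -32..31
    return (y >> amount) & MASK32   # output [31:0] z takes the low 32 bits
```

Because `>>` fills with zeros from above bit 63, the right shifts in this model are logical.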

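Taking that argument in another direction: since a constant added to
the RHS of a shift operator can be absorbed as a fixed shift, which is
just wiring, we can turn an expensive two's-complement negation into a
cheap one's-complement.  Because -y = ~y + 1, shifting right by -y is
the same as shifting right by one and then by ~y, which gives a cheap,
almost correct shift operator: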

    case ...
        // Both branches are $signed so that >>> is an arithmetic shift.
        `QOP_SHL:  res_o <= y[31] ? $signed({x[31], x[31:1]}) >>> ~y[4:0]
                                  : $signed(x) << y[4:0];

This is only almost correct, since I've taken Paul's suggestion of
ignoring the oddities of long shifts (amounts outside -32..31 wrap
around), but it does handle signed RHS registers correctly.
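A Python sketch of what the QOP_SHL line is meant to compute, treating `>>>` as an arithmetic right shift and, like the RTL, looking only at y[31] and y[4:0]. The names `qop_shl`, `reference_shift` and `to_signed32` are illustrative; comparing against an exact signed shift shows the result is exact for amounts in -32..31 and wraps outside that range.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(v: int) -> int:
    """Interpret the low 32 bits of v as a two's-complement integer."""
    v &= MASK32
    return v - (1 << 32) if v & 0x80000000 else v


def qop_shl(x: int, y: int) -> int:
    """y >= 0 shifts x left by y; y < 0 shifts x right (arithmetic) by -y.

    Like the RTL, only y[31] and y[4:0] are looked at.
    """
    x &= MASK32
    y &= MASK32
    if y >> 31:                           # y[31] set: negative shift amount
        x_half = to_signed32(x) >> 1      # {x[31], x[31:1]}: fixed shift by one
        inv = ~y & 0x1F                   # ~y[4:0], i.e. -y - 1 for y in -32..-1
        return (x_half >> inv) & MASK32   # Python's >> on ints is arithmetic
    return (x << (y & 0x1F)) & MASK32     # x << y[4:0]


def reference_shift(x: int, s: int) -> int:
    """Exact signed shift, for comparison."""
    xs = to_signed32(x)
    return ((xs << s) if s >= 0 else (xs >> -s)) & MASK32
```

Outside the exact range the wrap-around shows up directly: `qop_shl(1, 32)` returns 1, whereas a true shift by 32 gives 0.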

We can leave it open whether we want arithmetic shift, logical shift, or
both.  If we go with all-signed, then arithmetic shift is the most
logical choice.  On the other hand, shifts are often used to select
bitfields in a word, and a logical shift allows writing x >> SHIFT
instead of (x >> SHIFT) & MASK when selecting an unsigned uppermost
bitfield (though an arithmetic x >> SHIFT is already correct for a
signed uppermost bitfield).  See also below for my suggestion of a
byte-shuffling move instruction.
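The bitfield point can be illustrated with a small Python model of 32-bit logical (`lsr`) and arithmetic (`asr`) right shifts. The 8-bit field in bits [31:24] and the names `SHIFT` and `MASK` are made up for the example.

```python
MASK32 = 0xFFFFFFFF


def lsr(x: int, n: int) -> int:
    """Logical shift right of a 32-bit word (zero fill)."""
    return (x & MASK32) >> n


def asr(x: int, n: int) -> int:
    """Arithmetic shift right of a 32-bit word (sign fill), as 32 bits."""
    x &= MASK32
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> n) & MASK32


# Hypothetical layout: an 8-bit field in bits [31:24].
SHIFT, MASK = 24, 0xFF
word = 0xF0123456

unsigned_field = lsr(word, SHIFT)        # logical: no mask needed
masked_field = asr(word, SHIFT) & MASK   # arithmetic: mask needed for unsigned
signed_field = asr(word, SHIFT)          # arithmetic: field sign-extended
```

Here the logical shift yields 0xF0 directly, the arithmetic shift needs the mask to get 0xF0, and without the mask it gives 0xFFFFFFF0, the 32-bit sign extension of the signed field value -16.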

> This is why I added 2 to the PC (well, next_pc or whatever it was),
> and passed that down the pipeline.  This way, what's stored is the
> branch target.
> 
> Let's say, however, that as part of the feedback from REG to FETCH, we
> pass not just the register value that contains the branch target but
> also an offset (imm only?   or selectably reg value?).  Then the
> address we branch to is the sum of two numbers, so we can add in an
> offset.  This way, the return address stored can be 1 less than where
> we want to go, so the "RET" opcode actually includes an offset.
> 
> The problem is that we end up with the program file load address
> being fed from not just MUXes but also an adder, which is too many
> levels of logic.
> 
> The truth is that adding a constant of 2 is expensive but not so
> horribly expensive that we shouldn't consider it.

What I've done here is put the PC increment into the ALU, since the
value needs to go through the ALU anyway, but for now the increment is
separate from the ALU's ADDSUB unit.  If the extra muxing does not cause
a timing bottleneck, it's easy to change the ALU to reuse the ADDSUB
unit for the PC increment.

> >That is, the bitbucket is special only when used as an X operand, so it
> >can still be used for temporary values.  But, I think I've made the
> >mistake of putting a computation between regfile[ix] and its
> >registration.  If I've understood your previous comments, we should
> >instead mux after the lookup:
> >
> >    always ...
> >        x_o_if_reg <= regfile[ix];
> >        x_o_if_pc <= pc + 1;
> >    assign x_o = ... x_o_if_reg ... x_o_if_pc ... 0
> >
> >Right?
> 
> Yes.
> 
> >
> >So, we'll probably be better off storing 0 in the bit-bucket.
> 
> R0 should be the bitbucket in all cases.  It's very important to the
> simplicity of our instruction set to be able to always rely on R0
> being zero on reads and being a place where we can throw away results.

If we add a move instruction, then we don't need to treat any register
specially from the CPU's point of view, since a plain move no longer has
to be expressed as an operation with a zero register.  We can make the
move instruction a bit more powerful, as well:

Since we may be working with pixels, I suppose it could be useful to
allow the move instruction to optionally shuffle around the bytes, and
optionally mask out all but the lower byte/word.  The move instruction
does not use the x-register bits of the instruction, so we have 5 free
bits.  What I currently have in the code is

// Byte-shuffling is used only for move instructions.  It reuses the bits of
// the x-register field to specify what to shuffle.
wire[4:0] shuffle = insn[`X_REG_BITS];
wire[31:0] yw =  // Bit 0: swap words
    shuffle[0]? {y[15:0], y[31:16]} : y;
wire[31:0] yb =  // Bit 1: swap the bytes within each word
    shuffle[1]? {yw[23:16], yw[31:24], yw[7:0], yw[15:8]} : yw;
wire[31:0] ys =  // Bits 2..3: filter all but low byte/word
    {{16{shuffle[3]}} & yb[31:16], {8{shuffle[2]}} & yb[15:8], yb[7:0]};
// A clear bit 2 zeroes yb[15:8], a clear bit 3 zeroes yb[31:16]; bit 4 is unused.

[... and, in the case statement over the operation ...]

            `QOP_MOVE: res_o <= ys;     // move with optional byte-shuffle

The x-register bits are set to binary 01100 for plain moves.
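For checking the encoding, here is a Python sketch of the move datapath (a model, not the RTL). `shuffle_move` is a hypothetical name, `shuffle` is the 5-bit field taken from the instruction's x-register bits, and the swap-bytes step is modelled as a plain byte swap within each 16-bit word, as the comment on bit 1 describes.

```python
MASK32 = 0xFFFFFFFF


def shuffle_move(y: int, shuffle: int) -> int:
    """Model of the move-with-shuffle datapath.

    shuffle is the 5-bit field taken from the x-register bits:
    bit 0 swaps the 16-bit words, bit 1 swaps the bytes within each word,
    bits 2 and 3 keep bits [15:8] and [31:16] (a clear bit zeroes them),
    and bit 4 is unused.
    """
    y &= MASK32

    # Bit 0: swap words -> yw
    yw = ((y & 0xFFFF) << 16) | (y >> 16) if shuffle & 0b0001 else y

    # Bit 1: yb = {yw[23:16], yw[31:24], yw[7:0], yw[15:8]}
    if shuffle & 0b0010:
        yb = ((((yw >> 16) & 0xFF) << 24) | (((yw >> 24) & 0xFF) << 16)
              | ((yw & 0xFF) << 8) | ((yw >> 8) & 0xFF))
    else:
        yb = yw

    # Bits 2..3: mask out all but the low byte/word unless the bit is set
    keep = 0xFF
    if shuffle & 0b0100:
        keep |= 0xFF00
    if shuffle & 0b1000:
        keep |= 0xFFFF0000
    return yb & keep
```

With this model, 0b01100 leaves the value unchanged, and additionally setting bits 0 and 1 (0b01111) reverses all four bytes: `shuffle_move(0x11223344, 0b01111)` gives 0x44332211.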
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
