On 2009-10-27, Yann Guidon wrote:
> Mojn,
> 
> Petter Urkedal wrote:
> > 4.  I like the idea of adding minimum and maximum functions in the
> > instruction set; if we need them, that is.  They should not require much
> > logic.  But in this case, note that there is a difference between signed
> > and unsigned.  Do we want both?  On the other hand it only takes 2 to 3
> > cycles to compute any of these if my idea of the instruction set is
> > correct.
> 
> Contrary to Timothy's opinion, I think that
> UMIN/SMIN and UMAX/SMAX are worth it if you have enough bits in
> the opcode. They spare conditional jumps and similar control stuffs,
> and the logic is quite simple : take the carry out of the add/sub
> unit, XOR with one bit of the opcode (which discriminates
> MAX and MIN) and send the result to the "write enable" signal
> of the register set's write port.

Reusing the add/sub unit this way seems like a good idea.

I made a sketch of how to implement the integer part of the ALU by
combining unary modifiers on the inputs with a binary function on the
following stages:

    ======  ======  ======  ==================
    bop     moda    modb    final instructions
    ======  ======  ======  ==================
    and     not_x   not_y   and, andn, nor
    nand    not_x   not_y   nand, orn, or
    xor     [?]     not_y   xorn
    add     [?]     neg_x   sub [1]
    shift   s/u     neg_x   lsl, lsr, asl, asr [1]
    mul     [?]     [?]     mul
    min     s/u     not_xy  smin, smax, umin, umax [2]

where

    bop -- basic binary operation
    moda, modb -- modifier bits
    s/u -- selects between signed and unsigned
    not_x -- bitwise negation of the x operand
    not_y -- bitwise negation of the y operand
    not_xy -- bitwise negation of both operands
    neg_x -- inversion of the x operand

I've left out div and the upper bits of multiply for now.  We can split
up {shift, min} into {sshift, ushift, smin, umin} so that moda is only
used for a single purpose.  I opted for compressing the instruction word
over simplifying logic.

[1] The "neg" modifiers are in fact implemented as "not" at this stage,
with one flag passed down to the next stage.  For "add" the flag
connects to the carry bit.  For "shift" I'm sure we can find another
simple solution, esp since the effective width is only 5 bits plus sign.

[2] The "not" modifiers only applies to the inputs to the add/sub unit.
The actual write-back is taken from the original operands.

[?] It's not obvious what are the most useful functions here, but we can
get "not" for free if we don't have a strong preference.

The s/u modifier is just passed down to the binary unit.  For shift,
it's used for filling the upper bits of right shifts.  For min/max:

> I don't remember exactly but you can perform signed/unsigned
> integer MIN/MAX by XORing the MSB of the operands.

Let's see.  Assume we have the unsigned order.  The signed order is the
same if the operands are both non-negative or both negative.  If only
one is negative then the negative operand is the larger, according to
unsigned comparison.  So yes, the signed order is the 3-port XOR of the
signs of the operands and the unsigned order.

> This is pointless for FP. However if the critical datapath
> must be balanced, I would put the XORs at the same place,
> that is just before the ADD/SUB unit, instead of at the end
> (XOR of both operands instead of one result).

Maybe starting with the integer ordering for FP isn't that too far off.
Looking at IEEE 754,

  * +0, normalised and subnormal positive floats, and Inf are correctly
    ordered by the integer ordering,

  * -0, normalised and subnormal negative floats, and -Inf are ordered
    opposite to their integer order, and

  * we probably don't care about the ordering of NaN.

> Furthermore, I have seen (in old/past studies) that MIN/MAX
> are used a lot in signal processing and graphics, for
> clipping to coordinates and stuffs like that, so since
> it's so "simple" to implement, I usually include it
> in my CPUs' instruction sets, along with a complete ROP2
> (8 boolean operations : OR/ORN/NOR/AND/ANDN/NAND/XOR/XORN)

That sounds right to me.  (And every cycle spend on trivial operations
is waisting the floating point adder and multiplier, so it's important
to do what we can to optimise the trivial parts within what the
instruction width admits.)
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)

Reply via email to