On 2009-10-27, Yann Guidon wrote:
> Mojn,
>
> Petter Urkedal wrote:
> > 4. I like the idea of adding minimum and maximum functions in the
> > instruction set; if we need them, that is. They should not require much
> > logic. But in this case, note that there is a difference between signed
> > and unsigned. Do we want both? On the other hand it only takes 2 to 3
> > cycles to compute any of these if my idea of the instruction set is
> > correct.
>
> Contrary to Timothy's opinion, I think that
> UMIN/SMIN and UMAX/SMAX are worth it if you have enough bits in
> the opcode. They spare conditional jumps and similar control stuffs,
> and the logic is quite simple : take the carry out of the add/sub
> unit, XOR with one bit of the opcode (which discriminates
> MAX and MIN) and send the result to the "write enable" signal
> of the register set's write port.
Reusing the add/sub unit this way seems like a good idea.
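To convince myself of the write-enable trick, here is a minimal Python sketch of it. The 32-bit width, the two-operand "dest = MIN/MAX(dest, src)" form, and the names sub_carry and minmax_unsigned are my own assumptions for illustration, not something we have agreed on.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit datapath


def sub_carry(a, b):
    """Carry out of a + ~b + 1, which is 1 exactly when a >= b (unsigned)."""
    total = (a & MASK) + (~b & MASK) + 1
    return (total >> 32) & 1


def minmax_unsigned(dest, src, is_max):
    """Model 'dest = MIN/MAX(dest, src)' using only a register write enable.

    is_max is the opcode bit that discriminates MAX (1) from MIN (0).
    """
    carry = sub_carry(dest, src)           # 1 when dest >= src
    write_enable = carry ^ is_max          # MIN overwrites when dest >= src
    return src if write_enable else dest   # otherwise the register keeps dest
```

When the operands are equal the write is harmless, so the tie case needs no extra logic.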
I made a sketch of how to implement the integer part of the ALU by
combining unary modifiers on the inputs with a binary function on the
following stages:
======  ======  ======  ==================
bop     moda    modb    final instructions
======  ======  ======  ==================
and     not_x   not_y   and, andn, nor
nand    not_x   not_y   nand, orn, or
xor     [?]     not_y   xorn
add     [?]     neg_x   sub [1]
shift   s/u     neg_x   lsl, lsr, asl, asr [1]
mul     [?]     [?]     mul
min     s/u     not_xy  smin, smax, umin, umax [2]
======  ======  ======  ==================
where
  bop         -- basic binary operation
  moda, modb  -- modifier bits
  s/u         -- selects between signed and unsigned
  not_x       -- bitwise negation of the x operand
  not_y       -- bitwise negation of the y operand
  not_xy      -- bitwise negation of both operands
  neg_x       -- arithmetic (two's complement) negation of the x operand
I've left out div and the upper bits of multiply for now. We can split
up {shift, min} into {sshift, ushift, smin, umin} so that moda is only
used for a single purpose. I opted for compressing the instruction word
over simplifying logic.
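The logic rows of the table follow from De Morgan's laws, which the Python sketch below checks on 8-bit words. Which modifier combination maps to which mnemonic is my reading of the table (for instance orn as x | ~y via nand with not_x), and alu_logic is just a hypothetical name for the model.

```python
MASK = 0xFF  # 8-bit words keep the check small


def alu_logic(bop, not_x, not_y, x, y):
    """Apply the unary input modifiers, then the basic binary operation."""
    if not_x:
        x = ~x & MASK
    if not_y:
        y = ~y & MASK
    if bop == "and":
        return x & y
    if bop == "nand":
        return ~(x & y) & MASK
    if bop == "xor":
        return x ^ y
    raise ValueError(bop)


def check_identities():
    """Compare each modified operation against the instruction it should give."""
    for x in range(0, 256, 5):
        for y in range(0, 256, 7):
            assert alu_logic("and", 0, 0, x, y) == x & y                # and
            assert alu_logic("and", 0, 1, x, y) == x & ~y & MASK        # andn
            assert alu_logic("and", 1, 1, x, y) == ~(x | y) & MASK      # nor
            assert alu_logic("nand", 0, 0, x, y) == ~(x & y) & MASK     # nand
            assert alu_logic("nand", 1, 0, x, y) == (x | ~y) & MASK     # orn
            assert alu_logic("nand", 1, 1, x, y) == x | y               # or
            assert alu_logic("xor", 0, 1, x, y) == ~(x ^ y) & MASK      # xorn
    return True
```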
[1] The "neg" modifiers are in fact implemented as "not" at this stage,
with one flag passed down to the next stage. For "add" the flag
connects to the carry bit. For "shift" I'm sure we can find another
simple solution, especially since the effective width is only 5 bits plus sign.
[2] The "not" modifiers only apply to the inputs to the add/sub unit.
The actual write-back is taken from the original operands.
[?] It's not obvious which functions are the most useful here, but we can
get "not" for free if we don't have a strong preference.
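Note [2] can be illustrated with a small Python model of the unsigned case. Complementing both operands reverses their unsigned order, so one comparison path serves both min and max as long as the write-back uses the original operands. The 32-bit width and the name umin_unit are assumptions of this sketch.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit datapath


def umin_unit(x, y, not_xy):
    """Unsigned min unit; with not_xy set it yields the maximum instead."""
    # The modifier only affects what the add/sub unit sees.
    cx = ~x & MASK if not_xy else x
    cy = ~y & MASK if not_xy else y
    # Carry out of cx + ~cy + 1 is 1 exactly when cx >= cy.
    carry = ((cx + (~cy & MASK) + 1) >> 32) & 1
    # Write-back is taken from the original, unmodified operands.
    return y if carry else x
```

Since ~x equals 2^32 - 1 - x, the comparison ~x >= ~y holds exactly when x <= y, which is why the same selection logic returns the larger operand.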
The s/u modifier is just passed down to the binary unit. For shift,
it's used for filling the upper bits of right shifts. For min/max:
> I don't remember exactly but you can perform signed/unsigned
> integer MIN/MAX by XORing the MSB of the operands.
Let's see. Assume we have the unsigned order. The signed order is the
same if the operands are both non-negative or both negative. If only
one is negative then the negative operand is the larger, according to
unsigned comparison. So yes, the signed order is the 3-port XOR of the
signs of the operands and the unsigned order.
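A quick exhaustive check of that claim on 8-bit words, as a Python sketch. The width and the helper names are arbitrary choices of mine.

```python
BITS = 8
MASK = (1 << BITS) - 1


def unsigned_less(a, b):
    """1 if a < b under the unsigned order, else 0."""
    return int((a & MASK) < (b & MASK))


def signed_less(a, b):
    """Signed order as the 3-input XOR of both sign bits and the unsigned order."""
    sign_a = (a >> (BITS - 1)) & 1
    sign_b = (b >> (BITS - 1)) & 1
    return unsigned_less(a, b) ^ sign_a ^ sign_b


def to_signed(v):
    """Reinterpret a BITS-wide word as a two's complement integer."""
    return v - (1 << BITS) if v & (1 << (BITS - 1)) else v


def check_all():
    """Compare against ordinary signed comparison for every operand pair."""
    return all(
        signed_less(a, b) == int(to_signed(a) < to_signed(b))
        for a in range(1 << BITS)
        for b in range(1 << BITS)
    )
```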
> This is pointless for FP. However if the critical datapath
> must be balanced, I would put the XORs at the same place,
> that is just before the ADD/SUB unit, instead of at the end
> (XOR of both operands instead of one result).
Maybe starting with the integer ordering for FP isn't too far off.
Looking at IEEE 754,
* +0, normalised and subnormal positive floats, and +Inf are correctly
ordered by the integer ordering,
* -0, normalised and subnormal negative floats, and -Inf are ordered
opposite to their integer order, and
* we probably don't care about the ordering of NaN.
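The three points above can be tried out in Python on single-precision bit patterns. This is only a sketch under my own assumptions: it ignores NaN, it orders -0 below +0 instead of treating them as equal, and f32_bits and float_less_via_int are hypothetical names.

```python
import struct


def f32_bits(x):
    """Bit pattern of x as an IEEE 754 single, read as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def float_less_via_int(a, b):
    """Order two non-NaN floats using integer comparison of their bits."""
    ia, ib = f32_bits(a), f32_bits(b)
    sa, sb = ia >> 31, ib >> 31
    if sa != sb:
        return bool(sa)   # the operand with the sign bit set is the smaller
    if sa == 0:
        return ia < ib    # both non-negative: integer order is correct
    return ia > ib        # both negative: integer order is reversed
```

For example, float_less_via_int(-1.5, -1e-40) and float_less_via_int(1e-40, 1.0) both hold, where 1e-40 is subnormal in single precision.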
> Furthermore, I have seen (in old/past studies) that MIN/MAX
> are used a lot in signal processing and graphics, for
> clipping to coordinates and stuffs like that, so since
> it's so "simple" to implement, I usually include it
> in my CPUs' instruction sets, along with a complete ROP2
> (8 boolean operations : OR/ORN/NOR/AND/ANDN/NAND/XOR/XORN)
That sounds right to me. (And every cycle spent on trivial operations
is wasting the floating point adder and multiplier, so it's important
to do what we can to optimise the trivial parts within what the
instruction width admits.)
_______________________________________________
Open-graphics mailing list