You have some good points about some ops being redundant, with respect to a CMP instruction. But we want other instructions to produce condition codes, and we want them to be meaningful. Of course, GE, GT, etc. don't make any sense on the condition results of an ADD.

So, let's reduce this by removing the symmetrical comparisons. We need:

- T
- LT signed
- LT unsigned
- LE signed
- LE unsigned
- C
- ~C
- N
- ~N
- V
- ~V
- Z (EQ)
- ~Z (NE)

Doing this, I can only reduce it to 13, which still requires 4 bits.

On another note, I was wondering if it would be useful to have two
different kinds of conditional moves. One moves or does nothing,
depending on the condition codes. The other moves one input operand
or the other depending on the truth value of the code, kinda like the
?: operator in C.

On Wed, Jun 6, 2012 at 6:44 PM, <[email protected]> wrote:
> Hello !
>
>
> On Wed, 6 Jun 2012 15:35:56 -0400, Timothy Normand Miller wrote:
>>
>> Andre and I had debated over whether the shader pipeline should
>> support condition codes (like most processors) or not (like MIPS).
>> With MIPS, this reduces the amount of state that flows from one
>> instruction to the next. But our GPU shader doesn't need any
>> restrictions on this since we do away with pipeline hazards by having
>> more round-robin threads than there are pipeline stages. After some
>> discussion, I am entertaining the idea of not only having condition
>> codes but also predicating every instruction's execution based on
>> them, just like ARM's 32-bit ISA. Predicating like Itanium didn't
>> work out well, because it has rather limited utility for such an
>> elaborate system, but in the limited scope to which ARM has put it, it
>> has worked out well. So, what conditions? I didn't bother looking up
>> ARM's conditions. Instead, I pulled out an MC68000 manual. We need a
>> 4-bit field in the instruction, plus a 5th bit to indicate whether or
>> not the instruction changes the codes.
>>
>> ~C -- carry clear
>> C -- carry set
>> ~Z -- not equal
>> Z -- equal
>> ~V -- not overflow
>> V -- overflow
>> ~N -- non-negative
>> N -- negative
>> N&V + ~N&~V -- greater than or equal (for signed ints)
>> N&~V + ~N&V -- less than (signed)
>> ~C&~Z -- greater than (unsigned)
>> Z + N&~V + ~N&V -- less than or equal (signed)
>> C + Z -- less than or equal (unsigned)
>> N&V&~Z + ~N&~V&~Z -- greater than (signed)
>
>
> My issue with these conditions is that several,
> in my totally humble opinion, are redundant.
> For example, greater or less than, is pointless :
> just swap the arguments. (OK sometimes we have
> to use immediate data so it won't work as is)
>
> You want signed or unsigned comparisons ?
> Just indicate it inside the comparison instruction,
> and save some precious bits in the predication/condition codes
> that are embedded in ALL instructions.
> My drawings show that it only requires a pair of XOR
> gates at the start of the add/sub unit, with no critical
> datapath impact.
>
> So in the YASEP I have one carry and one zero flag,
> but 4 comparison instructions that change the carry computation
> (whether signed, unsigned, greater or lower).
> Note that "Greater or equal" is equivalent to "not less than"
> so only 4 instructions are necessary and the Carry condition
> is negated on demand.
>
> Some of these considerations have led me to design
> condition codes as described at the bottom of
> http://yasep.org/#!doc/forms
> * no less/greater : just check the MSB or a carry.

That works for unsigned comparisons but not for signed comparisons.

> * I use odd/even a lot too : just check the LSB.

Implicitly or by masking and comparing to zero?

> * Equal or Zero (register is cleared) : OR all the bits
>   of a register (can be "cached").
> * I also did a concession to the Carry flag and Zero flag
>   (last written value was zero) to ease some code.
>
> That uses 7 bits, including a 4-bits register number.
> If you really are into barebones/gates/energy efficiency,
> at least consider putting the comparisons as instructions,
> not conditional flags.
>
> Look at the POWER architecture too, they use other neat tricks.
>
>
>> T -- always true
>
>
>> F -- always false (no-op)
>
> Can be useful for other things as well.
> Don't waste it.
>
>
>> FCMP will generate codes that have equivalent meaning.
>>
>> The hardware for this requires an 8-input MUX and an additional XOR
>> for the output:
>>
>> D0 = 1
>> D1 = C
>> D2 = Z
>> D3 = V
>> D4 = N
>> D5 = N xor V
>> D6 = C or Z
>> D7 = (N xor V) or Z
>
>
> The synthesizer will do as it wishes anyway...
>
>
>> BTW, when I say that I recommend something like this, what I mean is
>> that I suggest adding it as an option that can be enabled. At the
>> very least, we'll want predicated move and we'll want min/max
>> instructions to avoid branching.
>
> certainly.
>
>
>> In a GPU, conditional branching is
>> to be avoided, because it can lead to branch divergence.
>
>
> I expect to read another post from you on this matter :-)
>
>
>> On a completely unrelated note, when I was studying fuzzy logic back
>> in the 90's, I came up with these formulas, which we'd probably never
>> want to use in a hardware implementation. They're just helpful for
>> doing algebra with min/max fuzzy logic.
>>
>> min(a,b) = (a + b - abs(a-b)) / 2
>> max(a,b) = (a + b + abs(a-b)) / 2
>
>
> Or we can do an "exchange" with 3 XORs :-P
>
>
>
> _______________________________________________
> Open-graphics mailing list
> [email protected]
> http://lists.duskglow.com/mailman/listinfo/open-graphics
> List service provided by Duskglow Consulting, LLC (www.duskglow.com)

--
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project
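
P.S. To make the quoted 8-input MUX plus output XOR concrete, here is a minimal C sketch of how a 4-bit condition field could be evaluated. The bit assignment (bits 2..0 select D0..D7, bit 3 inverts), the names flags_t, flags_from_cmp and cond_true, and the 32-bit operand width are all assumptions made for illustration; nothing in this thread fixes an encoding. The carry flag is modelled as a borrow out of a - b, which is what makes "C + Z" mean less than or equal (unsigned) in the table above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag container; the thread only names C, Z, V and N. */
typedef struct {
    bool c, z, v, n;
} flags_t;

/* Flags a CMP a,b might produce, modelled as the subtraction a - b.
 * Assumption: C is the borrow out, as on the MC68000. */
static flags_t flags_from_cmp(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    flags_t f;
    f.c = a < b;                             /* unsigned borrow */
    f.z = r == 0;
    f.n = (r >> 31) != 0;                    /* sign bit of the result */
    f.v = (((a ^ b) & (a ^ r)) >> 31) != 0;  /* signed overflow of a - b */
    return f;
}

/* Evaluate a 4-bit condition field.
 * Assumed encoding: bits 2..0 pick MUX input D0..D7, bit 3 inverts. */
static bool cond_true(uint8_t cond, flags_t f)
{
    assert(cond < 16);
    bool d;
    switch (cond & 7u) {
    case 0:  d = true;                break;  /* D0 = 1 (T; inverted: F) */
    case 1:  d = f.c;                 break;  /* D1 = C */
    case 2:  d = f.z;                 break;  /* D2 = Z */
    case 3:  d = f.v;                 break;  /* D3 = V */
    case 4:  d = f.n;                 break;  /* D4 = N */
    case 5:  d = f.n != f.v;          break;  /* D5 = N xor V (LT signed) */
    case 6:  d = f.c || f.z;          break;  /* D6 = C or Z (LE unsigned) */
    default: d = (f.n != f.v) || f.z; break;  /* D7 = (N xor V) or Z (LE signed) */
    }
    bool invert = ((cond >> 3) & 1u) != 0;
    return d != invert;                       /* the output XOR */
}
```

With the flags from comparing 1 against 2, cond_true(5, f) (N xor V, signed less-than) holds and cond_true(13, f), its inverse (greater than or equal), does not.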

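For reference, the two conditional-move flavors described at the top of this message, and the min/max formulas quoted near the end, can be written down as a small C model. This is only a sketch: the names cmov, csel, min_formula and max_formula are hypothetical, registers are modelled as plain int32_t variables, and the formulas are evaluated in 64 bits so that a + b and a - b cannot overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Flavor 1: move or do nothing. The destination keeps its old value
 * when the condition is false, so it is also an implicit source. */
static void cmov(int32_t *rd, int32_t rs, bool cond)
{
    if (cond)
        *rd = rs;
}

/* Flavor 2: select one of two inputs, like C's ?: operator.
 * The destination is always written. */
static void csel(int32_t *rd, int32_t ra, int32_t rb, bool cond)
{
    *rd = cond ? ra : rb;
}

/* min(a,b) = (a + b - abs(a-b)) / 2, computed in 64 bits. */
static int32_t min_formula(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + b;
    int64_t diff = (int64_t)a - b;
    return (int32_t)((sum - llabs(diff)) / 2);
}

/* max(a,b) = (a + b + abs(a-b)) / 2, computed in 64 bits. */
static int32_t max_formula(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + b;
    int64_t diff = (int64_t)a - b;
    return (int32_t)((sum + llabs(diff)) / 2);
}
```

The numerator in each formula is exactly twice the min or max, so the division by 2 never truncates. A min instruction could equally be modelled as csel(&r, a, b, a < b), which is the branch-free form the predicated-move discussion is after.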