Re: [Open-graphics] OGA2 Basic Shader Instructions

Petter Urkedal Wed, 28 Oct 2009 01:45:18 -0700

On 2009-10-27, Timothy Normand Miller wrote:
> On Mon, Oct 26, 2009 at 5:45 PM, Petter Urkedal <[email protected]> wrote:
> >> > 4.  I like the idea of adding minimum and maximum functions in the
> >> > instruction set; if we need them, that is.  They should not require much
> >> > logic.  But in this case, note that there is a difference between signed
> >> > and unsigned.  Do we want both?  On the other hand it only takes 2 to 3
> >> > cycles to compute any of these if my idea of the instruction set is
> >> > correct.
> >>
> > On 2009-10-26, Timothy Normand Miller wrote:
> >> We CAN implement this with branches, which are a bit cheaper because
> >> we don't have a delay slot.  We'll sort of automatically have both
> >> signed and unsigned variants of branch instructions in some cases.
> >
> > I'm hoping we can support a fairly complete set of branch conditions.
> > If we have space in the instruction word, conditional write-back would
> > reduce min and max to two instructions, though that seems questionable
> > at the moment.  Still, I'm not against min/max primitives if they're
> > common.
> 
> We can either have a write-back flag, or we can make r0 the bit bucket.


I was really thinking of conditioned instructions, not just a fixed
throw-away of the result.  E.g. computation of min as

        ;; Computation of r1 := min {r1, r2}
        sub r1, r2, r0
        move r2, r1 if_neg r0

Though without flags, the conditional modifier is only applicable to
unary instructions, which severely limits its usability.
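
To make the two-instruction min above concrete, here is a minimal Python
model of it.  It is only a sketch: it assumes that "sub r1, r2, r0" means
r0 := r2 - r1 with 32 bit wrap-around, and that "if_neg" tests bit 31 of
r0.  Neither is settled in this thread, and the function names are made
up for illustration.

```python
MASK32 = 0xFFFFFFFF
SIGN_BIT = 0x80000000


def to_signed(x):
    """Interpret a 32 bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & SIGN_BIT else x


def min_via_conditional_move(r1, r2):
    """Model of: sub r1, r2, r0 ; move r2, r1 if_neg r0."""
    r1 &= MASK32
    r2 &= MASK32
    # sub r1, r2, r0   (assumed operand order: r0 := r2 - r1, wrapping)
    r0 = (r2 - r1) & MASK32
    # move r2, r1 if_neg r0   (r1 := r2 only when the sign bit of r0 is set)
    if r0 & SIGN_BIT:
        r1 = r2
    return r1
```

Like the assembly, this only gives the signed minimum when the true
difference r2 - r1 fits in the signed 32 bit range.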

> >> Will we need special float branch instructions?  We may need special
> >> ge/le instructions as usual since we can't recover those from the
> >> difference, but for floats and ints, the sign bits are in the same
> >> place and zero is always zero.
> >
> > You are not thinking of two-operand compare-and-branch instructions, but
> > rather test instructions which store 0 or 1?  I outlined an option of
> > using bits 32 and 33 of the BRAM below before I realised what you meant.
> 
> Well, actually, that's an excellent idea.  The sign bit is only one
> bit, so we don't need anything special for that.  When doing
> write-back we can check to see if the value is zero and set a flag
> bit.  We could also store whether or not the computation was the
> result of an overflow.  This would be like processor flags, but one
> set for each register.  (And since we get 4 bits free, why not use
> them!)
> 
> The main problem with the overflow bit is that it would get lost on a
> context switch.  Right now, we can't do context switches, so it
> doesn't matter.  But what if we change our minds in the future?

By context switches, do you mean writing to main memory?  That would
require logic to recode 36 bit words into 32 bit words, e.g. by shifting
the top four bits into a 32 bit register and writing it between every 8
normal writes.  Though, this may bring up other issues with our memory
cache and dealing with memory alignment.
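
A rough sketch of that recoding, in Python rather than logic: every
group of eight 36 bit words becomes eight 32 bit data words followed by
one 32 bit word holding the eight 4 bit tops.  The nibble order inside
the ninth word and the function names are assumptions made for
illustration only.

```python
def pack_36_to_32(words36):
    """Pack groups of eight 36 bit words into nine 32 bit words each."""
    if len(words36) % 8 != 0:
        raise ValueError("expected a multiple of 8 words")
    out = []
    for base in range(0, len(words36), 8):
        tags = 0
        for i, w in enumerate(words36[base:base + 8]):
            out.append(w & 0xFFFFFFFF)            # normal 32 bit write
            tags |= ((w >> 32) & 0xF) << (4 * i)  # collect the top 4 bits
        out.append(tags)                          # extra write after 8 words
    return out


def unpack_32_to_36(words32):
    """Inverse of pack_36_to_32."""
    if len(words32) % 9 != 0:
        raise ValueError("expected a multiple of 9 words")
    out = []
    for base in range(0, len(words32), 9):
        tags = words32[base + 8]
        for i in range(8):
            top = (tags >> (4 * i)) & 0xF
            out.append((top << 32) | words32[base + i])
    return out
```

Note that the 9-word groups are what would disturb alignment: a group no
longer starts on a multiple of eight words in memory.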

> Having a zero flag is a good idea since it's easy to recompute every
> time.  I'm not bothered by having a few extra compare instructions.
> Most will just rely on subtract, but some signed ones will require
> special compares, which we know about from the MIPS instruction set.
> 
> Here are some ideas for "summary flags" that make use of the extra
> bits in the register word:
> - zero
> - fp infinity
> - fp NaN

Contrary to carry and overflow, these are computable from the lower 32
bits.  So, this would be purely an optimisation, but do we need it
considering the depth of the pipeline?
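
To illustrate that these three flags depend only on the lower 32 bits,
here is a small Python sketch.  It assumes the floats are IEEE 754
single precision, and the flag names are invented for the example.

```python
def summary_flags(word):
    """Derive the proposed summary flags from the low 32 bits of a word."""
    low = word & 0xFFFFFFFF
    exponent = (low >> 23) & 0xFF   # 8 bit exponent field (assumed binary32)
    fraction = low & 0x7FFFFF       # 23 bit fraction field
    return {
        "zero": low == 0,
        "fp_inf": exponent == 0xFF and fraction == 0,
        "fp_nan": exponent == 0xFF and fraction != 0,
    }
```

With this definition the pattern 0x80000000 (negative zero as a float)
does not set the zero flag; whether it should is a separate decision.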

Here is another thought.  We don't need write-back for branches.  If 8
bits suffice for short jumps, that leaves room for two operands.  Thus,
we could make branch instructions which share logic with the arithmetic
instructions, up to the point where the write-back or jump happens.
However, the implied functionality seems redundant, and this
approach may leave us no bits for selecting the condition on which to
branch.  So instead they could be based solely on subtraction:

    ifeq ri, rj, target
    ifneq ri, rj, target
    ifule ri, rj, target  ; if (uint)ri ≤ (uint)rj then jump_to target
    ifult ri, rj, target
    ifsle ri, rj, target
    ifslt ri, rj, target

The latter four instructions will coincide with min and max logic.  The
idea here is to test and act on the carry and overflow flags on the same
cycle they are generated, so that we don't need to save them.  If an 8
bit target is too narrow, the compiler can generate

    ifule ri, rj, l0
    jump target
l0:

which is still the same number of instructions (and fewer cycles on the
average) as

    sub ri, rj, rk
    ifnpos rk, target
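
For reference, the six subtraction-based conditions listed above can be
modelled in Python as follows.  This is a sketch under assumptions: the
flags come from computing ri - rj, the carry is represented as a borrow,
and the helper names are made up.

```python
MASK32 = 0xFFFFFFFF
SIGN_BIT = 0x80000000


def to_signed(x):
    """Interpret a 32 bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & SIGN_BIT else x


def sub_flags(ri, rj):
    """Flags of the 32 bit subtraction ri - rj."""
    ri &= MASK32
    rj &= MASK32
    diff = (ri - rj) & MASK32
    zero = diff == 0
    neg = bool(diff & SIGN_BIT)
    borrow = ri < rj                                   # unsigned borrow out
    overflow = bool((ri ^ rj) & (ri ^ diff) & SIGN_BIT)  # signed overflow
    return zero, neg, borrow, overflow


def branch_taken(cond, ri, rj):
    """Decide a branch in the same step that produces the flags."""
    zero, neg, borrow, overflow = sub_flags(ri, rj)
    less_signed = neg != overflow
    return {
        "ifeq": zero,
        "ifneq": not zero,
        "ifule": borrow or zero,
        "ifult": borrow,
        "ifsle": less_signed or zero,
        "ifslt": less_signed,
    }[cond]
```

Nothing is stored between the subtraction and the decision, which is the
point of testing the flags on the cycle they are generated.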

If we don't have a register fixed to zero, we may also want

    ifzero ri, target
    ifnzero ri, target  ; nonzero
    ifnpos ri, target   ; non-positive
    ifneg ri, target
    ifpos ri, target
    ifnneg ri, target

Moreover, these can have 16 bit relative targets.  We probably still
want

    jump far_target, wb  ; used for calls
    jump far_target

> I'm not sure what else.  We have sign for free.  We can live without
> overflow.  What else might we want to know about quickly?

I think not having overflow basically means that we can't easily branch
on a comparison where the difference of the operands exceeds the signed
range.  That can be a problem if the shader specification requires that
"if (x <= y)" does the right thing for the full 32 bit range of x and y.
The solution may be, as you have mentioned, to have dedicated compare
instructions.  After all, one rarely uses both the result and the
overflow/carry flag of the same subtraction except for implementing
multi-word integers.
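
A small Python illustration of the problem, assuming 32 bit two's
complement registers (function names are made up): deciding x <= y from
only the sign and zero-ness of the wrapped difference goes wrong once
x - y leaves the signed range, while also consulting the overflow of the
same subtraction gives the right answer.

```python
MASK32 = 0xFFFFFFFF
SIGN_BIT = 0x80000000


def sle_by_sign_only(x, y):
    """x <= y judged from the wrapped difference x - y alone."""
    diff = (x - y) & MASK32
    return diff == 0 or bool(diff & SIGN_BIT)


def sle_with_overflow(x, y):
    """x <= y judged from sign, zero and signed overflow of x - y."""
    xu = x & MASK32
    yu = y & MASK32
    diff = (xu - yu) & MASK32
    neg = bool(diff & SIGN_BIT)
    overflow = bool((xu ^ yu) & (xu ^ diff) & SIGN_BIT)
    return diff == 0 or (neg != overflow)


# The largest positive value compared against the most negative one.
x, y = 2**31 - 1, -(2**31)
wrong = sle_by_sign_only(x, y)    # claims x <= y
right = sle_with_overflow(x, y)   # agrees with the mathematical answer
```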

> > Also, these are constant shifts only.
> > In HQ we did variable shifts and even treated negative and big exponents
> > (RHSs) correctly.  I don't think it cost us that much hardware.
> 
> Why do they have to be constant shifts?  We can use a decoder to turn
> a binary value into a one-hot and make that the multiplier.  The
> decoder plus the multiplier will be smaller than a barrel shifter,
> especially considering that the multiplier has multiple uses.

My mistake, I thought the suggestion was to use the multiplication
instruction as is.  Still, how does this compare to sharing the logic
with rotate and the other shift?  I guess an n-cold is as easy to make
as a 1-hot, so given the (sign-independent) rot-result, it remains to
AND it with either a |y|-cold upper or lower mask, depending on the sign
of y.
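
Both variants are easy to model in Python for comparison.  This is only
a behavioural sketch with invented names, assuming 32 bit words, that a
positive y means shift left and a negative y means shift right, and that
shifts of 32 or more yield zero.

```python
MASK32 = 0xFFFFFFFF


def shl_by_multiply(x, n):
    """Left shift as multiplication by a one-hot value, for 0 <= n < 32."""
    one_hot = 1 << n              # what the decoder would produce
    return (x * one_hot) & MASK32


def rotl(x, n):
    """Rotate a 32 bit word left by n (taken modulo 32)."""
    n &= 31
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def shift_by_rotate_and_mask(x, y):
    """Logical shift of x by y using a rotate and an |y|-cold mask."""
    n = abs(y)
    if n >= 32:
        return 0
    rotated = rotl(x, y % 32)     # a negative y rotates right by |y|
    if y >= 0:
        mask = (MASK32 << n) & MASK32   # lower n bits cold
    else:
        mask = MASK32 >> n              # upper n bits cold
    return rotated & mask
```

The rotate does not care about the sign of y beyond the modulo, so only
the choice of mask differs between the two shift directions.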
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
