Re: [Open-graphics] OGA2 Basic Shader Instructions

Petter Urkedal Mon, 26 Oct 2009 15:00:43 -0700

On 2009-10-26, Timothy Normand Miller wrote:
> Also, no flags.  I'm very much a fan of the MIPS approach because it
> doesn't create any of these extra inter-instruction dependencies
> through a narrow straw.


Yes.  I think I read too much into the Google doc.

> > 3.  Finally on this chain of thought, f2u and f2i will be equivalent
> > except for overflow detection.  Again we can use the carry flag for the
> > unsigned case and overflow for the signed case.
> 
> Without any flags, we might need the extra instructions.  Or we could
> support only f2i and let software do a branch on the sign bit for
> clamping.

Ah, I thought we'd do modulo 2^32 truncation (or whatever it's called),
but now that you mention it, clamping the result is much more useful for
graphics, and in that case I think it's best to keep the current spec.
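
To make the clamping behaviour concrete, here is a minimal C sketch of
what a saturating f2i and f2u could compute.  The function names and
the choice of mapping NaN to 0 are assumptions for illustration only;
nothing here is taken from the spec.

```c
#include <stdint.h>
#include <math.h>

/* Hypothetical model of a clamping f2i: out-of-range inputs saturate
 * instead of wrapping modulo 2^32.  NaN -> 0 is an assumption. */
int32_t f2i_clamp(float x)
{
    if (isnan(x))
        return 0;
    if (x >= 2147483648.0f)      /* 2^31 and above */
        return INT32_MAX;
    if (x <= -2147483648.0f)     /* -2^31 and below */
        return INT32_MIN;
    return (int32_t)x;           /* in range: truncates toward zero */
}

/* Hypothetical unsigned counterpart: negative inputs clamp to 0. */
uint32_t f2u_clamp(float x)
{
    if (isnan(x) || x <= 0.0f)
        return 0;
    if (x >= 4294967296.0f)      /* 2^32 and above */
        return UINT32_MAX;
    return (uint32_t)x;
}
```

With only f2i in hardware, the unsigned case would be f2i followed by
a branch on the sign bit, as suggested in the quoted text.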

> > 4.  I like the idea of adding minimum and maximum functions in the
> > instruction set; if we need them, that is.  They should not require much
> > logic.  But in this case, note that there is a difference between signed
> > and unsigned.  Do we want both?  On the other hand it only takes 2 to 3
> > cycles to compute any of these if my idea of the instruction set is
> > correct.
> 
> We CAN implement this with branches, which are a bit cheaper because
> we don't have a delay slot.  We'll sort of automatically have both
> signed and unsigned variants of branch instructions in some cases.

I'm hoping we can support a fairly complete set of branch conditions.
If we have space in the instruction word, conditional write-back would
reduce min and max to two instructions, but that seems questionable at
the moment.  Still, I'm not against min/max primitives if they're common.
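
As a rough illustration of the two options, the following C sketch
models min done with a compare-and-branch and min done with a test
followed by a conditional write-back.  The function names are made up,
and the mapping of C statements to instructions is only an assumption
about how such an ISA might look.

```c
#include <stdint.h>

/* Branch version: d starts as a; the move of b is skipped when a <= b. */
int32_t min_s32_branch(int32_t a, int32_t b)
{
    int32_t d = a;
    if (a <= b)
        goto done;               /* signed compare-and-branch */
    d = b;
done:
    return d;
}

/* Conditional write-back version: a test, then a predicated move. */
int32_t min_s32_cwb(int32_t a, int32_t b)
{
    int32_t d = a;
    int take_b = (b < a);        /* signed test, yields 0 or 1 */
    d = take_b ? b : d;          /* write b only when the test held */
    return d;
}

/* The unsigned variant differs only in the comparison used. */
uint32_t min_u32_cwb(uint32_t a, uint32_t b)
{
    uint32_t d = a;
    int take_b = (b < a);        /* unsigned test */
    d = take_b ? b : d;
    return d;
}
```

The same bit pattern gives different answers: 0xFFFFFFFF is the
smaller operand when read as signed -1, and the larger one when read
as unsigned, which is why both comparison flavours are needed.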
 
> Will we need special float branch instructions?  We may need special
> ge/le instructions as usual since we can't recover those from the
> difference, but for floats and ints, the sign bits are in the same
> place and zero is always zero.

You are not thinking of two-operand compare-and-branch instructions, but
rather test instructions which store 0 or 1?  I outlined an option of
using bits 32 and 33 of the BRAM below before I realised what you meant.
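
A small C sketch of such test instructions, working on the raw 32-bit
pattern so that the same test serves both ints and floats.  The names
are hypothetical, and IEEE 754 single precision is assumed for the
float encoding.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its raw 32-bit pattern (IEEE 754 assumed). */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical "test negative": stores 1 if bit 31 is set, else 0. */
uint32_t test_neg(uint32_t raw)
{
    return raw >> 31;
}

/* Hypothetical "test zero": stores 1 if all 32 bits are clear. */
uint32_t test_zero(uint32_t raw)
{
    return raw == 0;
}
```

One caveat with sharing the zero test: the float -0.0 has the pattern
0x80000000, so a pure bit test reports it as negative and non-zero.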
 
> > 5.  To complete the set of shift instructions we need the arithmetic and
> > logic shift distinction (aka signed and unsigned shifts).
> 
> There are just three.  Andre wants to use the multiplier as a way to
> do shifts.  I like that idea.  However, the problem I see is that
> while it's easy to do left shifts, right shifts are another matter.
> If we have a 64-bit product, then it's just 32-shift and use the upper
> 32 bits.  But if we have only a 48-bit product, we can't do that.
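
The multiplier trick described in the quote above can be sketched in C
as follows, assuming a full 64-bit product is available.  The function
names are hypothetical.

```c
#include <stdint.h>

/* Left shift by k (0..31): multiply by 2^k, keep the low 32 bits. */
uint32_t shl_by_mul(uint32_t x, unsigned k)
{
    uint64_t p = (uint64_t)x * (UINT64_C(1) << k);
    return (uint32_t)p;
}

/* Logical right shift by k (1..32): multiply by 2^(32-k) and take the
 * upper 32 bits of the 64-bit product. */
uint32_t shr_by_mul(uint32_t x, unsigned k)
{
    uint64_t p = (uint64_t)x * (UINT64_C(1) << (32 - k));
    return (uint32_t)(p >> 32);
}
```

The right shift relies on bits 32..63 of the product, which is exactly
what a 48-bit product would not fully provide.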

I'm not convinced about using the multiplier.  Will we support constants
in the instruction word?  I thought we would not, and then these shifts
would turn into two instructions.  Also, these are constant shifts only.
In HQ we did variable shifts and even treated negative and big exponents
(RHSs) correctly.  I don't think it cost us that much hardware.
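
For reference, a C sketch of a variable shift that handles negative
and large shift amounts.  What "correctly" meant in HQ is not spelled
out here, so the semantics below are an assumption: a negative amount
shifts the other way, and an amount of 32 or more shifts everything
out.  The function names are hypothetical.

```c
#include <stdint.h>

/* Logical shift left by a signed amount n; negative n shifts right.
 * Amounts of magnitude >= 32 yield 0. */
uint32_t shl_var(uint32_t x, int32_t n)
{
    if (n >= 32 || n <= -32)
        return 0;
    return n >= 0 ? x << n : x >> -n;
}

/* Arithmetic shift right by a signed amount n; negative n shifts left.
 * Large right shifts yield the sign fill (0 or -1).  The conversions
 * back to int32_t assume the usual two's complement wrap. */
int32_t sar_var(int32_t x, int32_t n)
{
    if (n >= 32)
        return x < 0 ? -1 : 0;
    if (n <= -32)
        return 0;
    if (n < 0)
        return (int32_t)((uint32_t)x << -n);
    uint32_t u = (uint32_t)x >> n;
    if (x < 0 && n > 0)
        u |= ~(UINT32_MAX >> n);  /* replicate the sign into the top n bits */
    return (int32_t)u;
}
```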

> > 6.  There may be more flags than we need.  For integer division by
> > zero we can use the overflow, since that's the only way a division can
> > be out of range.  For floating point division by zero, infinity seems
> > like a natural choice, but I recall there was some discussion on the
> > list some time ago about special requirements for the Inf and NaN
> > semantics for rendering so I'm not sure whether we need to differentiate
> > Inf and zero division for float.
> 
> Yeah.  I hate flags.  Let's see if we can do without them.

Okay, no flags.

> > 7.  Assuming we've agreed to not run threads in dependent groups, do we
> > care about the loop unit?  Maybe a decrement-and-jump-if-nonzero would
> > do?
> 
> MAYBE.  But without it, it's only an overhead of two instructions.
> With the special instruction, it's one instruction.  And with loop
> hardware, maybe we can do it in zero.  But those one or two
> instructions only matter much if the loop is REALLY short.  And if it
> is, we can sometimes unroll.

Okay, I can see this may be of interest, since we also want to save
program space.
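
To show where the one or two instructions of overhead come from, here
is a C sketch of a loop closed by a decrement-and-jump-if-nonzero.
The function name is made up, and reading the last line of the loop as
a single fused instruction is an assumption about the ISA.

```c
#include <stdint.h>

/* Sum n words.  The loop tail "--n != 0 -> goto" models one fused
 * decrement-and-jump-if-nonzero; without it, the decrement and the
 * branch would be two separate instructions per iteration. */
uint32_t sum_djnz(const uint32_t *v, uint32_t n)
{
    uint32_t acc = 0;
    if (n == 0)
        return acc;              /* the fused form cannot run zero times */
loop:
    acc += *v++;                 /* loop body */
    if (--n != 0)
        goto loop;               /* decrement-and-jump-if-nonzero */
    return acc;
}
```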

> > 8.  We are going with conventional flag-based branch instructions,
> > right?  How many bits of address do we need in the branch instructions?
> > Do we need computed branches?
> 
> (a) Not if we can avoid it.  Mind you, in this case, it's not so evil,
> but I still don't like it.  Convince me otherwise by showing me how it
> makes both the hardware simpler and the software faster.

No, I agree.  We should have space for at least one operand in the
instruction word for branches.  The flag-less ISA naturally supports the
negative and zero flags, but lacks the equivalent of carry and overflow
flags for integers.  Floating point has Inf and NaN, so if we can
condition on these, that'll pass for overflow.  Integer is more tricky.
We could utilise the extra width of the BRAM register, and put the two
flags in bit 32 and 33.
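
A C sketch of that idea, modelling the wide register as a 64-bit value
whose low 32 bits hold the sum.  Which of the two bits carries which
flag is an assumption, as are the names.

```c
#include <stdint.h>

/* Assumed layout: bit 32 = carry out, bit 33 = signed overflow. */
#define CARRY_BIT    (UINT64_C(1) << 32)
#define OVERFLOW_BIT (UINT64_C(1) << 33)

/* Add two 32-bit values and tag the result with both conditions. */
uint64_t add_tagged(uint32_t a, uint32_t b)
{
    uint64_t wide = (uint64_t)a + b;
    uint32_t sum = (uint32_t)wide;
    uint64_t r = sum;

    if (wide >> 32)
        r |= CARRY_BIT;          /* unsigned result did not fit */

    /* Signed overflow: operands share a sign that the sum lacks. */
    if (~(a ^ b) & (a ^ sum) & UINT32_C(0x80000000))
        r |= OVERFLOW_BIT;

    return r;
}
```

A later branch could then condition on bit 32 or bit 33 of the source
register without any global flag state.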

> (b) Ah, bits in address.  Good question.  It's fairly long.  We need
> to support lengthy shader programs.  Note that we no longer have a
> local program file; instead, we have a proper icache.

As mentioned above, we could differentiate long unconditional jumps and
short relative branches if we need to.

> (c) Someone else will have to tell us about computed branches.

(Just quoting this as a reminder to the readers.)