On Mon, Oct 26, 2009 at 4:06 PM, Petter Urkedal <[email protected]> wrote:
> I've had a closer look at the basic instructions listed at
> http://docs.google.com/View?id=dfsp4qpd_41dtrrskfb#Operation_to_support_in_shader_8117638312455733
> to which I have some comments and questions.
>
> 1.  The document lists both signed and unsigned additive operations.
> These are equivalent, except possibly for the flags.  I suggest not
> differentiating signed and unsigned additive instructions, and instead
> adopting the conventional semantics for carry and overflow flags.

Are they listed as separate instructions?  Or just different
"datatypes"?  In any case, I agree with you.  For integers, signedness
matters only for multiplication.  (There are no divide instructions.)
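A minimal C sketch, assuming 32-bit two's complement registers, of why signed and unsigned add can share one instruction: the sum bits are identical, and only the out-of-range condition you would test differs. The function names are illustrative, not part of any agreed instruction set.

```c
#include <stdint.h>

/* One 32-bit adder serves both "add" and "addu": the sum bits are the same. */
static uint32_t add32(uint32_t a, uint32_t b) { return a + b; }

/* Unsigned out of range: carry out of bit 31, i.e. the sum wrapped around. */
static int unsigned_carry(uint32_t a, uint32_t b) { return add32(a, b) < a; }

/* Signed out of range: both operands share a sign and the sum's sign differs. */
static int signed_overflow(uint32_t a, uint32_t b) {
    uint32_t s = add32(a, b);
    return (int)(((a ^ s) & (b ^ s)) >> 31);
}
```

For example, 0xFFFFFFFF + 1 carries as unsigned (4294967295 + 1) but does not overflow as signed (-1 + 1), while 0x7FFFFFFF + 1 is the reverse.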

> 2.  Do we need to support extraction of the upper 32 bits of a
> multiplication?  If not, then mult and umult are also equivalent except
> possibly for the flags.  I think we can reuse the carry flag for
> overflow of an unsigned multiplication, by analogy with the additive
> instructions.

Good question.  I'm pretty sure we can live without it.  For floats,
we need 48-bit products (which we shift).  Maybe we can save some
logic by simply not supporting 64-bit multiplies.
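A C sketch, assuming 32-bit registers, of why mult and umult coincide if only the low word is kept: the low 32 bits of the product are the same under either interpretation, and only the high word differs. The last function illustrates the float case: two 24-bit significands (hidden bit included) never produce more than 48 product bits. All names are illustrative.

```c
#include <stdint.h>

/* Low 32 bits of the product: identical for signed and unsigned operands. */
static uint32_t mul_lo(uint32_t a, uint32_t b) { return a * b; }

/* High 32 bits, unsigned interpretation. */
static uint32_t umul_hi(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* High 32 bits, signed interpretation.  Converting values above INT32_MAX
   to int32_t is implementation-defined before C23, but gives the two's
   complement value on mainstream compilers. */
static uint32_t smul_hi(uint32_t a, uint32_t b) {
    int64_t p = (int64_t)(int32_t)a * (int32_t)b;
    return (uint32_t)((uint64_t)p >> 32);
}

/* Float significand product: 24 bits x 24 bits fits in 48 bits, and is
   then shifted to renormalize. */
static uint64_t sig_product(uint32_t sa, uint32_t sb) {
    return (uint64_t)(sa & 0xFFFFFFu) * (sb & 0xFFFFFFu);
}
```

With a = 0xFFFFFFFF and b = 2, both interpretations give a low word of 0xFFFFFFFE, but the high word is 1 as unsigned and 0xFFFFFFFF (that is, -1) as signed.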

Also, no flags.  I'm very much a fan of the MIPS approach because it
doesn't create any of these extra inter-instruction dependencies
through the narrow straw of a flags register.
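A C sketch of the MIPS approach: a compare such as sltu writes 0 or 1 into an ordinary register, so no flags register links one instruction to the next. Multi-word arithmetic still works without a carry flag; here a 64-bit add is built from 32-bit halves following the MIPS sequence addu, sltu, addu, addu. sltu and add64 are written as C functions purely for illustration.

```c
#include <stdint.h>

/* MIPS-style compare: the result goes into a general register. */
static uint32_t sltu(uint32_t a, uint32_t b) { return a < b ? 1u : 0u; }

/* 64-bit add from 32-bit halves with no carry flag. */
static void add64(uint32_t a_hi, uint32_t a_lo, uint32_t b_hi, uint32_t b_lo,
                  uint32_t *r_hi, uint32_t *r_lo) {
    uint32_t lo = a_lo + b_lo;        /* addu */
    uint32_t carry = sltu(lo, a_lo);  /* sltu: the low add wrapped iff lo < a_lo */
    *r_hi = a_hi + b_hi + carry;      /* addu, addu */
    *r_lo = lo;
}
```

The carry costs one extra instruction, but it lives in a normal register that the scheduler can track like any other operand.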

> 3.  Finally, on this chain of thought, f2u and f2i will be equivalent
> except for overflow detection.  Again we can use the carry flag for the
> unsigned case and overflow for the signed case.

Without any flags, we might need the extra instructions.  Or we could
support only f2i and let software do a branch on the sign bit for
clamping.
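A C sketch of f2u built from f2i plus a branch on the sign bit. f2i_sat is a hypothetical model of an f2i instruction that saturates out-of-range inputs and maps NaN to 0; that behavior is an assumption, since the thread doesn't specify it. Negative inputs are clamped to 0 by the sign-bit branch. Inputs at or above 2^31, which f2i cannot represent, take one more branch that subtracts 2^31 before converting and adds it back afterward (the subtraction is exact in that range).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical f2i model: truncate toward zero, saturate, NaN -> 0. */
static int32_t f2i_sat(float x) {
    if (x != x) return 0;                       /* NaN */
    if (x >= 2147483648.0f) return INT32_MAX;   /* >= 2^31 */
    if (x < -2147483648.0f) return INT32_MIN;   /* < -2^31 */
    return (int32_t)x;
}

/* f2u in software on top of f2i. */
static uint32_t f2u_via_f2i(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    if (bits >> 31) return 0;                   /* sign bit set: clamp to 0 */
    if (x >= 2147483648.0f) {                   /* beyond f2i's range */
        if (x >= 4294967296.0f) return UINT32_MAX;
        return (uint32_t)f2i_sat(x - 2147483648.0f) + 0x80000000u;
    }
    return (uint32_t)f2i_sat(x);                /* also maps +NaN to 0 */
}
```

If shaders rarely convert values above 2^31, the second branch could be dropped and only the sign-bit branch kept.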

> 4.  I like the idea of adding minimum and maximum functions to the
> instruction set, if we need them, that is.  They should not require much
> logic.  But in this case, note that there is a difference between signed
> and unsigned.  Do we want both?  On the other hand it only takes 2 to 3
> cycles to compute any of these if my idea of the instruction set is
> correct.

We CAN implement this with branches, which are a bit cheaper because
we don't have a delay slot.  We'll sort of automatically have both
signed and unsigned variants of branch instructions in some cases.
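A C sketch, assuming 32-bit registers, of min and max done with a compare and a branch. The same bit patterns give different answers under signed and unsigned comparison, so dedicated instructions would need both variants, whereas in software the choice reduces to using a signed or an unsigned branch. The names are illustrative.

```c
#include <stdint.h>

/* The int32_t conversion of values above INT32_MAX is implementation-defined
   before C23, but yields the two's complement value on mainstream compilers. */
static uint32_t min_s(uint32_t a, uint32_t b) {
    if ((int32_t)a < (int32_t)b) return a;   /* signed branch */
    return b;
}
static uint32_t min_u(uint32_t a, uint32_t b) {
    if (a < b) return a;                     /* unsigned branch */
    return b;
}
static uint32_t max_s(uint32_t a, uint32_t b) {
    if ((int32_t)a < (int32_t)b) return b;
    return a;
}
static uint32_t max_u(uint32_t a, uint32_t b) {
    if (a < b) return b;
    return a;
}
```

With a = 0xFFFFFFFF and b = 1, the signed minimum is 0xFFFFFFFF (-1) but the unsigned minimum is 1.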

Will we need special float branch instructions?  We may need special
ge/le instructions as usual since we can't recover those from the
difference, but for floats and ints, the sign bits are in the same
place and zero is always zero.
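A C sketch of branching on floats with integer-style tests on the raw bits, assuming IEEE 754 binary32 and 32-bit two's complement integers. Bit 31 is the sign in both formats. One caveat: +0.0 is all zero bits, but -0.0 is 0x80000000, so a float zero test has to ignore the sign bit. NaNs are not handled specially here; a NaN with its sign bit set would test as negative.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's bits as they would sit in a register. */
static uint32_t bits_of(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Sign test: bit 31, the same for int32 and binary32. */
static int is_negative_bits(uint32_t r) { return (int)(r >> 31); }

/* Float zero test: drop the sign bit so that -0.0 also counts as zero. */
static int is_float_zero_bits(uint32_t r) { return (r << 1) == 0; }
```

An integer-only "branch if zero" (r == 0) would treat -0.0 as nonzero, so either the float case needs its own zero test or -0.0 must never reach such a branch.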

> 5.  To complete the set of shift instructions we need the arithmetic and
> logical shift distinction (aka signed and unsigned shifts).

There are just three.  Andre wants to use the multiplier as a way to
do shifts.  I like that idea.  However, the problem I see is that
while it's easy to do left shifts, right shifts are another matter.
If we have a 64-bit product, then a right shift by n is just a multiply
by 2^(32-n), keeping the upper 32 bits.  But if we have only a 48-bit
product, we can't do that.
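A C sketch, assuming a 32x32 to 64-bit multiplier, of shifting by multiplication: a left shift by s is a multiply by 2^s (keep the low word), and a logical right shift by s is a multiply by 2^(32-s) (keep the high word). The s = 0 case needs special handling, since 2^32 does not fit in a 32-bit operand. The right-shift trick reads bits 32 through 63 of the product, and a 48-bit product lacks the top 16 of them. Only logical shifts are shown; the names are illustrative.

```c
#include <stdint.h>

/* Left shift: low word of x * 2^s, for s in 0..31. */
static uint32_t shl_mul(uint32_t x, unsigned s) {
    return x * (1u << s);
}

/* Logical right shift: high word of x * 2^(32 - s), for s in 0..31. */
static uint32_t shr_mul(uint32_t x, unsigned s) {
    if (s == 0) return x;                       /* 2^32 is not a 32-bit operand */
    uint64_t p = (uint64_t)x * (1u << (32 - s));
    return (uint32_t)(p >> 32);
}
```

For example, 0xF0000000 shifted right by 4 becomes 0xF0000000 * 2^28, whose high word is 0x0F000000.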

>
> 6.  There may be more flags than we need.  For integer division by
> zero we can use the overflow, since that's the only way a division can
> be out of range.  For floating point division by zero, infinity seems
> like a natural choice, but I recall there was some discussion on the
> list some time ago about special requirements for the Inf and NaN
> semantics for rendering, so I'm not sure whether we need to distinguish
> Inf from division by zero for floats.

Yeah.  I hate flags.  Let's see if we can do without them.
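Under IEEE 754 semantics (C's Annex F), float division by zero needs no flag: the quotient itself carries the information. A minimal C sketch; div_f32 and classify_quotient are illustrative names. Note that an overflowing quotient such as 1e30f / 1e-30f is also +inf, so +inf alone cannot tell division by zero apart from overflow.

```c
#include <math.h>

/* Plain IEEE division: x/0 gives +-inf, 0/0 gives NaN, no trap. */
static float div_f32(float a, float b) { return a / b; }

enum qclass { Q_FINITE, Q_POS_INF, Q_NEG_INF, Q_NAN };

/* Classify a quotient without consulting any status flag. */
static enum qclass classify_quotient(float q) {
    if (isnan(q)) return Q_NAN;                 /* e.g. 0/0 or inf/inf */
    if (isinf(q)) return signbit(q) ? Q_NEG_INF : Q_POS_INF;
    return Q_FINITE;
}
```

This only holds with strict IEEE arithmetic; fast-math style compiler modes may assume NaN and inf never occur.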

> 7.  Assuming we've agreed to not run threads in dependent groups, do we
> care about the loop unit?  Maybe a decrement-and-jump-if-nonzero would
> do?

MAYBE.  But without it, it's only an overhead of two instructions.
With the special instruction, it's one instruction.  And with loop
hardware, maybe we can do it in zero.  But those one or two
instructions only matter much if the loop is REALLY short.  And if it
is, we can sometimes unroll.
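A C sketch of the trade-off, with the loop control written so it maps onto the two instructions mentioned (decrement, then branch if nonzero). In the unrolled version the same pair runs once per four elements, plus a short remainder loop for n % 4. The function names are illustrative.

```c
#include <stdint.h>

/* Loop control per element: decrement + branch-if-nonzero. */
static uint32_t sum_simple(const uint32_t *a, uint32_t n) {
    uint32_t s = 0;
    if (n == 0) return 0;
    do {
        s += *a++;
    } while (--n != 0);               /* decrement, branch if nonzero */
    return s;
}

/* Unrolled by four: the decrement/branch pair runs once per four elements. */
static uint32_t sum_unrolled4(const uint32_t *a, uint32_t n) {
    uint32_t s = 0;
    uint32_t blocks = n / 4;
    uint32_t rest = n % 4;
    while (blocks != 0) {
        s += a[0] + a[1] + a[2] + a[3];
        a += 4;
        blocks--;
    }
    while (rest != 0) {               /* leftover 0..3 elements */
        s += *a++;
        rest--;
    }
    return s;
}
```

When the body is a single instruction, unrolling by four cuts loop overhead from two instructions per element to two per four elements, which is most of what a zero-overhead loop unit would buy.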

> 8.  We are going with conventional flag-based branch instructions,
> right?  How many bits of address do we need in the branch instructions?
> Do we need computed branches?

(a) Not if we can avoid it.  Mind you, in this case, it's not so evil,
but I still don't like it.  Convince me otherwise by showing me how it
makes both the hardware simpler and the software faster.
(b) Ah, bits in address.  Good question.  The field needs to be fairly
long.  We need to support lengthy shader programs.  Note that we no
longer have a local program file; instead, we have a proper icache.
(c) Someone else will have to tell us about computed branches.
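To put numbers on the branch-address question, here is a C sketch assuming a hypothetical PC-relative encoding: a signed BRANCH_OFFSET_BITS-bit field counted in instructions and added to the branch's own address. With n bits, a branch reaches from -2^(n-1) to 2^(n-1) - 1 instructions; with 16 bits that is -32768 to 32767. The field width and the relative base are assumptions, not anything the thread has settled.

```c
#include <stdint.h>

/* Hypothetical field width, in bits. */
#define BRANCH_OFFSET_BITS 16

/* Does a signed instruction offset fit in the field? */
static int offset_fits(int32_t off) {
    return off >= -(1 << (BRANCH_OFFSET_BITS - 1)) &&
           off < (1 << (BRANCH_OFFSET_BITS - 1));
}

/* Pack a (fitting) offset into the low BRANCH_OFFSET_BITS bits. */
static uint32_t encode_offset(int32_t off) {
    return (uint32_t)off & ((1u << BRANCH_OFFSET_BITS) - 1);
}

/* Sign-extend the field back to a 32-bit offset. */
static int32_t decode_offset(uint32_t field) {
    field &= (1u << BRANCH_OFFSET_BITS) - 1;
    if (field & (1u << (BRANCH_OFFSET_BITS - 1)))
        return (int32_t)field - (1 << BRANCH_OFFSET_BITS);
    return (int32_t)field;
}

/* Target = address of the branch + offset (wraps modulo 2^32). */
static uint32_t branch_target(uint32_t pc, uint32_t field) {
    return pc + (uint32_t)decode_offset(field);
}
```

Programs longer than the field's reach would need either a wider field or a register-indirect jump for far targets.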


-- 
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
