On Mon, Oct 26, 2009 at 4:06 PM, Petter Urkedal <[email protected]> wrote:
> I've had a closer look at the basic instructions listed at
> http://docs.google.com/View?id=dfsp4qpd_41dtrrskfb#Operation_to_support_in_shader_8117638312455733
> to which I have some comments and questions.
>
> 1. The document lists both signed and unsigned additive operations.
> These are equivalent, except possibly for the flags. I suggest not to
> differentiate signed and unsigned additive instructions, and instead
> adopt the conventional semantics for carry and overflow flags.

Are they listed as separate instructions? Or just different "datatypes"? In any case, I agree with you. For integers, signedness only matters for multiplication. (There are no divide instructions.)

> 2. Do we need to support extraction of the upper 32 bits of a
> multiplication? If not, then mult and umult are also equivalent except
> possibly for the flags. I think we can reuse the carry flag for
> overflow of an unsigned multiplication in analogy to the additive
> instructions.

Good question. I'm pretty sure we can live without it. For floats, we need 48-bit products (24-bit by 24-bit significands), which we then shift. Maybe it'll save us some logic to just not support 64-bit multiplies.

Also, no flags. I'm very much a fan of the MIPS approach (no condition flags; branches compare registers directly) because it doesn't create any of these extra inter-instruction dependencies through a narrow straw.

> 3. Finally on this chain of thought, f2u and f2i will be equivalent
> except for overflow detection. Again we can use the carry flag for the
> unsigned case and overflow for the signed case.

Without any flags, we might need the extra instructions. Or we could support only f2i and let software branch on the sign bit for clamping.

> 4. I like the idea of adding minimum and maximum functions in the
> instruction set; if we need them, that is. They should not require much
> logic. But in this case, note that there is a difference between signed
> and unsigned. Do we want both? On the other hand it only takes 2 to 3
> cycles to compute any of these if my idea of the instruction set is
> correct.

We CAN implement this with branches, which are a bit cheaper because we don't have a delay slot. In some cases we'll sort of automatically have both signed and unsigned variants of the branch instructions. Will we need special float branch instructions? We may need special ge/le instructions as usual, since we can't recover those from the difference, but for floats and ints the sign bits are in the same place and zero is always zero.

> 5. To complete the set of shift instructions we need the arithmetic and
> logic shift distinction (aka signed and unsigned shifts).

There are just three (shift left, logical shift right, arithmetic shift right). Andre wants to use the multiplier as a way to do shifts. I like that idea. However, the problem I see is that while left shifts are easy, right shifts are another matter. If we have a 64-bit product, then a right shift by n is just a multiply by 2^(32-n), taking the upper 32 bits of the product. But if we have only a 48-bit product, we can't do that.

> 6. There may be more flags than we need. For integer division by
> zero we can use the overflow, since that's the only way a division can
> be out of range. For floating point division by zero, infinity seems
> like a natural choice, but I recall there was some discussion on the
> list some time ago about special requirements for the Inf and NaN
> semantics for rendering so I'm not sure whether we need to differentiate
> Inf and zero division for float.

Yeah. I hate flags. Let's see if we can do without them.

> 7. Assuming we've agreed to not run threads in dependent groups, do we
> care about the loop unit? Maybe a decrement-and-jump-if-nonzero would
> do?

MAYBE. But without it, the loop overhead is only two instructions (a decrement and a branch). With the special instruction, it's one instruction. And with loop hardware, maybe we can do it in zero. But those one or two instructions only matter much if the loop is REALLY short. And if it is, we can sometimes unroll.

> 8. We are going with conventional flag-based branch instructions,
> right? How many bits of address do we need in the branch instructions?
> Do we need computed branches?

(a) Not if we can avoid it. Mind you, in this case, it's not so evil, but I still don't like it. Convince me otherwise by showing me how it makes both the hardware simpler and the software faster.

(b) Ah, bits in address. Good question. It'll need to be fairly long, since we need to support lengthy shader programs. Note that we no longer have a local program file; instead, we have a proper icache.

(c) Someone else will have to tell us about computed branches.

-- 
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
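
For points 1 and 2, a minimal C sketch (not part of the original message) of why signed and unsigned adds, and the low half of signed and unsigned multiplies, can share one instruction: in two's complement the result bits are identical, and only the carry/overflow interpretation differs. The names add32, mul_lo_signed, mul_lo_unsigned and add_result are hypothetical, and the flag logic is the conventional textbook derivation, not a proposed OGP design.

```c
#include <assert.h>
#include <stdint.h>

/* Result of one 32-bit add, with the two flags a conventional ALU
   would set. The struct and field names are illustrative only. */
typedef struct {
    uint32_t sum;  /* same bits whether a and b are read as signed or unsigned */
    int carry;     /* 1 if the unsigned result wrapped past 2^32 - 1 */
    int overflow;  /* 1 if the signed result fell outside the int32_t range */
} add_result;

static add_result add32(uint32_t a, uint32_t b)
{
    add_result r;
    r.sum = a + b;        /* unsigned arithmetic wraps modulo 2^32 */
    r.carry = r.sum < a;  /* a wrap makes the sum smaller than either operand */
    /* Signed overflow: operands share a sign bit but the sum's sign differs. */
    r.overflow = (int)(((~(a ^ b) & (a ^ r.sum)) >> 31) & 1u);
    return r;
}

/* Low 32 bits of a product: signed and unsigned multiplies agree here;
   only the upper 32 bits (point 2) would need separate instructions. */
static uint32_t mul_lo_unsigned(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * b);
}

static uint32_t mul_lo_signed(int32_t a, int32_t b)
{
    return (uint32_t)((int64_t)a * b);  /* conversion to uint32_t is modulo 2^32 */
}

static void add_mul_self_check(void)
{
    /* -1 + 1: the unsigned view carries out, the signed view is simply 0. */
    add_result r = add32(0xFFFFFFFFu, 1u);
    assert(r.sum == 0u && r.carry == 1 && r.overflow == 0);

    /* INT32_MAX + 1: no carry, but signed overflow. */
    r = add32(0x7FFFFFFFu, 1u);
    assert(r.sum == 0x80000000u && r.carry == 0 && r.overflow == 1);

    /* -3 * 7 and 0xFFFFFFFD * 7 share the same low 32 bits. */
    assert(mul_lo_signed(-3, 7) == mul_lo_unsigned((uint32_t)-3, 7u));
}
```

Note that the sum and low product are computed once; the flags are pure reinterpretation, which is why a flag-free design (as argued above) loses nothing by merging the signed and unsigned opcodes.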

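For point 5, a minimal C sketch of Andre's multiply-as-shift idea, assuming a multiplier that exposes the full 64-bit product (shr_via_mul) or, hypothetically, one that keeps only the low 48 bits (shr_via_mul48). All function names are illustrative. With the full product, a logical right shift by n is the upper 32 bits of x * 2^(32-n). With only 48 bits, the same trick can give wrong results for n < 16, whenever x is large enough that the product runs past bit 47.

```c
#include <assert.h>
#include <stdint.h>

/* Left shift by n (0 <= n <= 31): multiply by 2^n, keep the low 32 bits. */
static uint32_t shl_via_mul(uint32_t x, unsigned n)
{
    return (uint32_t)((uint64_t)x * (UINT64_C(1) << n));
}

/* Logical right shift by n (1 <= n <= 31) with a full 64-bit product:
   multiply by 2^(32 - n) and keep the upper 32 bits. n == 0 is excluded
   because its multiplier, 2^32, does not fit in a 32-bit operand. */
static uint32_t shr_via_mul(uint32_t x, unsigned n)
{
    uint64_t product = (uint64_t)x * (UINT64_C(1) << (32 - n));
    return (uint32_t)(product >> 32);
}

/* Same trick on a hypothetical multiplier that keeps only the low 48 bits
   of the product. Exact for n >= 16, since then x * 2^(32 - n) < 2^48;
   for n < 16 the product can exceed 48 bits and its top bits are lost. */
static uint32_t shr_via_mul48(uint32_t x, unsigned n)
{
    uint64_t product = ((uint64_t)x * (UINT64_C(1) << (32 - n)))
                       & ((UINT64_C(1) << 48) - 1);
    return (uint32_t)(product >> 32);
}

static void shift_self_check(void)
{
    assert(shl_via_mul(3u, 4) == 48u);
    assert(shr_via_mul(0xFFFFFFFFu, 4) == 0x0FFFFFFFu);
    assert(shr_via_mul48(0xFFFFFFFFu, 16) == 0xFFFFu);     /* n >= 16: exact */
    assert(shr_via_mul48(0xFFFFFFFFu, 4) != 0x0FFFFFFFu);  /* n < 16: top bits lost */
}
```

Only the logical right shift is sketched; an arithmetic right shift done this way would also need a signed product, which this sketch does not attempt.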