On Wed, Oct 28, 2009 at 3:30 PM, Petter Urkedal <[email protected]> wrote:
> On 2009-10-27, Andre Pouliot wrote:
>> 2009/10/26 Petter Urkedal <[email protected]>
>> > 1. The document lists both signed and unsigned additive operations.
>> > These are equivalent, except possibly for the flags.  I suggest not to
>> > differentiate signed and unsigned additive instructions, and instead
>> > adopt the conventional semantics for carry and overflow flags.
>> >
>>
>> In the operations the two are differentiated for two reasons: the flags,
>> and there is also going to be a bit field in the instruction to indicate
>> the data type to the pipeline.
>
> Do we need such a bit?  A large part of the instructions are
> sign-agnostic, and I don't think it simplifies the logic since we can
> always assign numbers 2n and 2n + 1 to those instructions which have
> signedness, using the lower bit of the instruction number to indicate
> signedness for just those instructions.
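
The 2n / 2n + 1 numbering suggested in the quoted text can be sketched as follows. This is only an illustration: the opcode numbers, the mnemonics and the `decode` helper are all hypothetical, since nothing in this thread assigns actual opcodes.

```python
# Hypothetical opcode map; the thread does not assign any real numbers.
# Sign-agnostic instructions get a single opcode each.
SIGN_AGNOSTIC = {0x00: "add", 0x01: "sub", 0x02: "and", 0x03: "mul_lo"}

# Instructions with signedness occupy an even/odd pair:
# opcode 2n is the unsigned form, opcode 2n + 1 the signed form.
SIGNED_PAIRS = {0x10: "mul_hi", 0x12: "div", 0x14: "shr"}


def decode(opcode):
    """Return (mnemonic, signed); signed is None for sign-agnostic ops."""
    if opcode in SIGN_AGNOSTIC:
        return SIGN_AGNOSTIC[opcode], None
    base = opcode & ~1  # clear the low bit to find the pair
    if base in SIGNED_PAIRS:
        return SIGNED_PAIRS[base], bool(opcode & 1)
    raise ValueError("unassigned opcode 0x%02x" % opcode)
```

With this layout no instruction bit is spent on signedness for instructions that do not need it, and the decoder reads the low bit only for the paired opcodes.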

We may be able to partition the opcode into fields.  That would
simplify decoding, but it might leave some opcodes unassignable.
Instead, we might want to simply order the opcodes so that they're
relatively easy to decode.  We'll have some logic in the decode stage
that takes 8 bits in and gets quite a lot of bits out.

>
>> Maybe not that useful for the add but for the
>> multiplication or other operations it can be critical.
>
> Only the upper 32 bits of the multiplication depend on the sign.

I suggest that, for now, we support only a 32-bit multiply (the lower
half of the product).  This way, we can avoid this problem.  If some
developers later complain about being unable to get at the upper bits,
we'll add two more instructions: signed upper half and unsigned upper
half.  So for now, we reserve opcodes.  I wonder if anyone's keeping
track of all these opcodes we're reserving.  :)

>
>> > 2. Do we need to support extraction of the upper 32 bits of a
>> > multiplication?  If not, then mult and umult are also equivalent except
>> > possibly for the flags.  I think we can reuse the carry flag for
>> > overflow of an unsigned multiplication in analogy to the additive
>> > instructions.
>> >
>>
>> The cost for supporting the selection of the upper or lower part of the
>> multiplication is small since we will probably use a signed 33-bit
>> multiplier.  Yes, it does carry a cost to do it that way, but the multiplier
>> will be reused to do the shift operation in both directions.  It should also
>> support rotate instructions.
>
> We have 18 × 18 -> 36 multipliers at our disposal.  A 32 × 32 -> 64
> multiplier takes 4 of these, whereas 32 × 32 -> 32 takes 3.  Also the
> final adder has the width of the result.  So, I think it costs enough that
> we should carefully consider whether we need it.

The same multipliers will be used for float, which means we need at
least 24x24 -> 48 (unsigned).  That's still three.  But if we're going
to use the multipliers for shifts, we need the full 32x32 multiplier.
That is, unless you can come up with a clever way to do a right shift
with a multiplier that keeps only the lower half of the product.  If we
think of the multiplier as working in a ring (in this case, it doesn't
qualify as a Galois field), we may be able to find a multiplier for any
multiplicand that will give us any 32-bit product we want.  The trouble
is that the required multiplier could be so complicated to compute that
it's not worth using the multiplier at all.  I did some tinkering with
16-bit numbers, and I can find no single multiplier that, for all
multiplicands, yields the multiplicand shifted right by 4 bits.

>
>> > 6. There may be more flags than we need.  For integer division by
>> > zero we can use the overflow, since that's the only way a division can
>> > be out of range.  For floating point division by zero, infinity seems
>> > like a natural choice, but I recall there was some discussion on the
>> > list some time ago about special requirements for the Inf and NaN
>> > semantics for rendering, so I'm not sure whether we need to differentiate
>> > Inf and zero division for float.
>> >
>>
>> The flags are the ones present in the OpenCL and OpenGL documents.
>> Considering the number of registers we will need, the cost of adding
>> those is minimal for the flexibility.
>
> What do the specs say about how these are used?  Should they be signals,
> or is it sufficient that they are admissible as conditions for control
> flow?  For floating point is it sufficient to be able to verify whether
> a register holds NaN or Inf and must division by zero be distinct from
> Inf?

I took from Hugh's comment that it doesn't matter.  I'm guessing that
when we convert float to fixed (uint8) for writing to the framebuffer,
we can convert Inf and NaN to the max value.

>> > 8. We are going with conventional flag-based branch instructions,
>> > right?  How many bits of address do we need in the branch instructions?
>> > Do we need computed branches?
>> >
>>
>> If we go with branch instructions that carry the address, we should be
>> able to support around a 24-bit address (3 * 8-bit operands); it should
>> be more than enough for any kernel that is running.
>
> A 24-bit address will only be possible if we branch on flags, but as
> mentioned, we can always have near and far jumps.

Worst case, we can combine a conditional branch with an unconditional
branch: for example, a short conditional branch that skips over an
unconditional far jump.

-- 
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project
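
The multiplier points discussed in this message can be checked with a small sketch. It models the hardware with plain Python integers, and all function names are hypothetical. It shows three things: the lower 32 bits of a product do not depend on signedness, a logical right shift falls out of the upper half of a full 32x32 -> 64 product, and (assuming only the low 16 bits of a 16x16 product are kept) no single multiplier implements a right shift by 4 for every multiplicand.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def to_signed32(x):
    """Reinterpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x >> 31 else x


def mul_halves(a, b, signed):
    """Return (upper 32 bits, lower 32 bits) of the 64-bit product."""
    if signed:
        a, b = to_signed32(a), to_signed32(b)
    else:
        a, b = a & MASK32, b & MASK32
    product = (a * b) & MASK64
    return product >> 32, product & MASK32


def shr_via_mul(x, k):
    """Logical right shift by k (1..31) taken from the upper product half."""
    if not 1 <= k <= 31:
        raise ValueError("shift amount must be in 1..31")
    upper, _ = mul_halves(x, 1 << (32 - k), signed=False)
    return upper


def shr4_multipliers_16():
    """All m with (x * m) mod 2**16 == x >> 4 for every 16-bit x."""
    return [m for m in range(1 << 16)
            if all((x * m) & 0xFFFF == x >> 4 for x in range(1 << 16))]
```

The search finishes quickly because `all` stops at the first failing multiplicand: x = 1 forces m = 0, and m = 0 then fails at x = 16. The result is an empty list, which is consistent with the 16-bit tinkering described above.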
