Re: [Open-graphics] OGA2 Basic Shader Instructions

Timothy Normand Miller Thu, 29 Oct 2009 07:59:55 -0700

On Thu, Oct 29, 2009 at 7:24 AM, André Pouliot <[email protected]> wrote:
> Petter Urkedal wrote:
>> On 2009-10-27, Andre Pouliot wrote:
>>
>>> 2009/10/26 Petter Urkedal <[email protected]>
>>>
>>>> 1.  The document lists both signed and unsigned additive operations.
>>>> These are equivalent, except possibly for the flags.  I suggest not
>>>> differentiating signed and unsigned additive instructions, and instead
>>>> adopting the conventional semantics for carry and overflow flags.
>>>>
>>>>
>>> In the operations the two are differentiated for two reasons: the flags,
>>> and a bit field in the instruction that will indicate the data type to
>>> the pipeline.
>>>
>>
>> Do we need such a bit?  A large part of the instructions are
>> sign-agnostic, and I don't think it simplifies the logic since we can
>> always assign numbers 2n and 2n + 1 to those instructions which have
>> signedness, using the lower bit of the instruction number to indicate
>> signedness for just those instructions.
>>
> We will need it to distinguish between signed, unsigned and
> float in the ALU pipeline.


This bit doesn't have to be explicit.  Some operations are agnostic.
For those that aren't, it can be implied by the opcode.  For instance,
we don't need four shift instructions.  We need LSR, ASR, and LSL.
There's no ASL.  So we define three opcodes.  For that matter, we may
never have to explicitly compute this "bit".  It depends on what we're
doing.
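
To illustrate why three opcodes suffice, here is a rough C sketch of the three shifts. The function names and the 32-bit register width are assumptions for illustration only. A left shift produces the same bits for signed and unsigned operands, so an ASL would duplicate LSL; only the right shift has to choose between zero fill and sign fill.

```c
#include <assert.h>
#include <stdint.h>

/* Logical shift left: same result for signed and unsigned operands,
   which is why a separate ASL opcode would be redundant. */
uint32_t lsl(uint32_t x, unsigned n) {
    assert(n < 32);
    return x << n;
}

/* Logical shift right: vacated high bits are filled with zeros. */
uint32_t lsr(uint32_t x, unsigned n) {
    assert(n < 32);
    return x >> n;
}

/* Arithmetic shift right: vacated high bits are filled with copies of
   the sign bit. Done with unsigned arithmetic so it is portable C. */
uint32_t asr(uint32_t x, unsigned n) {
    assert(n < 32);
    uint32_t shifted = x >> n;
    if ((x & 0x80000000u) && n > 0) {
        shifted |= ~(0xFFFFFFFFu >> n);  /* set the top n bits */
    }
    return shifted;
}
```

For example, with the top bit set, `lsr(0x80000000, 4)` and `asr(0x80000000, 4)` differ only in the four high bits that were shifted in.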

Which integer instructions actually have signedness?  ASR is the only
one I can think of.  Later, if we can get the top 32 bits of a multiply,
then there would be two separate instructions, depending on
signedness.  We'll also need to consider conversion instructions,
although for now, let's require that we do tests and branches.  The
compiler knows about signedness.  Here's how to do a signed conversion
with just a u2f instruction:

int x;  // initially
if (x is negative) {
    x = abs(x)
    x = u2f(x)
    set sign bit of x
} else {
    x = u2f(x)
}

I suggest that we do it exactly this way for now.
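
The same sequence, written out as a C sketch. It assumes `float` is a 32-bit IEEE 754 single, and `u2f` here is only a stand-in for the proposed instruction, modelled with a C cast. The magnitude is computed in unsigned arithmetic so that the most negative integer does not overflow.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the proposed u2f instruction (assumption: modelled
   here as a plain unsigned-to-float cast). */
static float u2f(uint32_t x) {
    return (float)x;
}

/* Signed int -> float using only u2f, a sign test, and a sign-bit set. */
float i2f_via_u2f(int32_t x) {
    assert(sizeof(float) == sizeof(uint32_t));
    if (x < 0) {
        /* abs(x) in unsigned arithmetic, safe even for INT32_MIN */
        uint32_t magnitude = 0u - (uint32_t)x;
        float f = u2f(magnitude);
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);
        bits |= 0x80000000u;            /* set sign bit of the float */
        memcpy(&f, &bits, sizeof f);
        return f;
    }
    return u2f((uint32_t)x);
}
```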

As for converting the other way... I think we can live with an f2u
instruction, although it'll be a bit more complex to deal with the
overflow and underflow cases.
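
One possible way to handle the out-of-range cases is to saturate. The sketch below is purely illustrative: the clamping policy (NaN and negative inputs go to 0, values of 2^32 and above go to the maximum) is an assumption, not something that has been decided, and `f2u_saturating` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical saturating float -> unsigned conversion.
   The clamping policy is an assumption made for this sketch. */
uint32_t f2u_saturating(float f) {
    if (!(f > 0.0f)) {
        return 0u;                 /* NaN, zero, and negative inputs */
    }
    if (f >= 4294967296.0f) {
        return UINT32_MAX;         /* 2^32 and above, including +Inf */
    }
    return (uint32_t)f;            /* in range: truncate toward zero */
}
```

Writing the first test as `!(f > 0.0f)` rather than `f <= 0.0f` makes NaN fall into the same branch, since every comparison with NaN is false.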

>>
>>> Maybe not that useful for the add but for the
>>> multiplication or other operations it can be critical.
>>>
>>
>> Only the upper 32 bits of the multiplication depend on the sign.
>>
>>
>>>> 2.  Do we need to support extraction of the upper 32 bits of a
>>>> multiplication?  If not, then mult and umult are also equivalent except
>>>> possibly for the flags.  I think we can reuse the carry flag for
>>>> overflow of an unsigned multiplication in analogy to the additive
>>>> instructions.
>>>>
>>>>
>>> The cost of supporting selection of the upper or lower part of the
>>> multiplication is small, since we will probably use a signed 33-bit
>>> multiplier. Yes, doing it that way does carry a cost, but the multiplier
>>> will be reused to do the shift operations in both directions. It should
>>> also support rotate instructions.
>>>
>>
>> We have 18 × 18 -> 36 multipliers at our disposal.  A 32 × 32 -> 64
>> multiplier takes 4 of these, whereas 32 × 32 -> 32 takes 3.  Also the
>> final adder has the width of the result.  So, I think it costs enough that
>> we should carefully consider whether we need it.
>>
> The multiplier serves as a barrel shifter for the data alignment of the
> adder. It also serves for doing right and left shifts. If we were only
> doing the 32 × 32 multiplication with a 32-bit result, you would be right.

Oh, yeah.  I didn't even think about that.  You're right.  Way to
minimize the logic here!
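
For anyone following along, here is a rough C sketch of how a multiplier can stand in for a shifter. It is one reading of the idea, not a description of André's actual design, and the function names are made up. A left shift by n is the low half of x * 2^n, and a right shift by n is the high half of x * 2^(32-n). The n = 0 right shift needs a factor of 2^32, which does not fit in 32 bits, so a wider multiplier operand is convenient.

```c
#include <assert.h>
#include <stdint.h>

/* Left shift: low 32 bits of x * 2^n. */
uint32_t lsl_via_mul(uint32_t x, unsigned n) {
    assert(n < 32);
    uint64_t product = (uint64_t)x * ((uint64_t)1 << n);
    return (uint32_t)product;
}

/* Logical right shift: high 32 bits of x * 2^(32-n), x unsigned. */
uint32_t lsr_via_mul(uint32_t x, unsigned n) {
    assert(n < 32);
    uint64_t product = (uint64_t)x * ((uint64_t)1 << (32 - n));
    return (uint32_t)(product >> 32);
}

/* Arithmetic right shift: same product, but x is treated as signed. */
uint32_t asr_via_mul(uint32_t x, unsigned n) {
    assert(n < 32);
    /* Sign-extend x to 64 bits without implementation-defined casts. */
    int64_t sx = (x & 0x80000000u) ? (int64_t)x - ((int64_t)1 << 32)
                                   : (int64_t)x;
    int64_t product = sx * ((int64_t)1 << (32 - n));
    return (uint32_t)((uint64_t)product >> 32);
}
```

Both right shifts read the upper half of the product, which is why the 32 × 32 -> 32 multiplier would not be enough once the multiplier doubles as the shifter.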

>>
>>>> 6.  There may be more flags than we need.  For integer division by
>>>> zero we can use the overflow, since that's the only way a division can
>>>> be out of range.  For floating point division by zero, infinity seems
>>>> like a natural choice, but I recall there was some discussion on the
>>>> list some time ago about special requirements for the Inf and NaN
>>>> semantics for rendering, so I'm not sure whether we need to differentiate
>>>> Inf and zero division for float.
>>>>
>>>>
>>> The flags are the ones present in the OpenCL and OpenGL documents.
>>> Considering the number of registers we will need, the cost of adding
>>> them is minimal for the flexibility.
>>>
>>
>> What do the specs say about how these are used?  Should they be signals,
>> or is it sufficient that they are admissible as conditions for control
>> flow?  For floating point is it sufficient to be able to verify whether
>> a register holds NaN or Inf and must division by zero be distinct from
>> Inf?
>>
> I don't remember how it's written, but considering that we want to
> handle many threads at once, I prefer to go with something simple
> that's easy to understand and that can be sent to the instruction
> dispatcher.
>
> NaN and Inf are the two main corner cases that can cause problems in
> floating-point calculations. If we want to detect when a problem
> happens, we must detect those. Zero and negative are also supported
> for float if we want to branch on those conditions. Still, most
> branches should be avoided by using the min and max instructions.
>
> Division by zero is a corner case that can be generated by the divider.
> An Inf would come from the ALU. The two units are split for practical
> reasons. I already have a good idea of how to build the ALU; I have a
> schematic drawn and should soon publish a cleaned-up electronic
> version. For the divider there are also many possibilities for how to
> do it. I can't say which one looks most interesting right now.
>
>>
>>>> 8.  We are going with conventional flag-based branch instructions,
>>>> right?  How many bits of address do we need in the branch instructions?
>>>> Do we need computed branches?
>>>>
>>>>
>>> If we go with branch instructions that carry an address, we should be able
>>> to support an address of around 24 bits (3 × 8-bit operands), which should
>>> be more than enough for any running kernel.
>>>
>>
>> A 24-bit address will only be possible if we branch on flags, but as
>> mentioned, we can always have near and far jumps.
>>
> We could support relative jumps, but will they be useful? If possible, for
> the time being I prefer to keep the instruction list short. Once we have
> something working, more instructions can be considered as the need
> arises or for performance reasons.

Since a conditional branch requires a source operand, that leaves only
16 bits for the address, which may not be enough.  Thus, I was
thinking that this could be relative.
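
A minimal sketch of what a relative branch target computation might look like. Everything specific here is an assumption for illustration: that the 16-bit field is a signed offset, that it counts instruction words, and that it is relative to the instruction after the branch. None of that has been decided.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical target computation for a taken relative branch.
   Assumptions: offset_field is a signed 16-bit count of instruction
   words, relative to the instruction following the branch. */
uint32_t branch_target(uint32_t pc, uint16_t offset_field) {
    /* Sign-extend the 16-bit field to 32 bits. */
    int32_t offset = (offset_field & 0x8000u)
                         ? (int32_t)offset_field - 0x10000
                         : (int32_t)offset_field;
    return pc + 1u + (uint32_t)offset;
}
```

Under these assumptions a branch can reach roughly 32K instructions in either direction, which is likely to cover most loops and conditionals inside a kernel even if the kernel itself is larger than 64K instructions.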


-- 
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
