2012/5/28 Andre Pouliot <[email protected]>:
> Hi Xiaohan,
>
> 2012/5/28 Ma, Xiaohan <[email protected]>:
...
>>>> Each step of the pipeline has its own thread. For example, for a
>>>> pipeline with a depth of 8 you would have more than 8 programs
>>>> running.
>>
>> But still some of the threads need to wait for other threads if there's
>> a data dependency. Can't see the benefits here.
>>
> For shader programs the data dependency is low. That's also why there
> are more than 8 threads: if one blocks and waits for some data, you can
> still use the pipeline at 100% efficiency. You don't get an empty
> time slot, because some other program uses it.
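
The slot-filling argument above can be sketched with a toy model in
Python. This is only an illustration, not the actual design: the
function name, the round-robin policy and the stall numbers (a stall of
12 cycles after every 4 instructions) are all assumptions made up for
the example.

```python
def pipeline_utilization(num_threads, stall_every, stall_cycles,
                         total_cycles=10_000):
    """Fraction of issue slots filled by a round-robin scheduler.

    Hypothetical model: one instruction can be issued per cycle. After
    every `stall_every` instructions a thread waits `stall_cycles`
    cycles for data before it may issue again.
    """
    ready_at = [0] * num_threads   # cycle at which each thread may issue again
    issued = [0] * num_threads     # instructions issued so far, per thread
    next_thread = 0                # round-robin pointer
    used_slots = 0

    for cycle in range(total_cycles):
        # Take the first ready thread, starting at the round-robin pointer.
        for offset in range(num_threads):
            t = (next_thread + offset) % num_threads
            if ready_at[t] <= cycle:
                issued[t] += 1
                used_slots += 1
                if issued[t] % stall_every == 0:
                    # This thread now waits for its data.
                    ready_at[t] = cycle + 1 + stall_cycles
                next_thread = (t + 1) % num_threads
                break
        # If no thread was ready, the slot stays empty (a bubble).
    return used_slots / total_cycles
```

With these made-up numbers a single thread fills only a quarter of the
slots (4 busy cycles out of every 16), while 16 threads keep every slot
busy, because each thread's wait is hidden behind the other threads'
turns.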

Be careful: this technique does not scale at all. Intel uses only 2
threads on the P4, because cache efficiency decreases with the number
of threads. If there are too many threads, you will get only cache
misses. AMD, instead of duplicating only the decoder and register bank,
also duplicates the pipeline, so one of the programs can have double
the resources available. It's as if you completely separate the
decoding stage from the pipeline.
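
A rough way to see the cache effect is to simulate one LRU cache shared
by several interleaved threads. This is a sketch under invented
assumptions (a 64-line cache, a 48-line working set per thread, strict
LRU replacement, threads interleaved access by access); it does not
describe the P4 or any real cache.

```python
from collections import OrderedDict

def shared_cache_hit_rate(num_threads, cache_lines=64, working_set=48,
                          rounds=200):
    """Hit rate of one LRU cache shared by interleaved threads.

    Each thread loops over its own `working_set` lines. All sizes are
    made-up illustration values.
    """
    cache = OrderedDict()          # keys ordered from least to most recent
    hits = accesses = 0
    for _ in range(rounds):
        for line in range(working_set):
            for t in range(num_threads):
                key = (t, line)    # every thread touches its own addresses
                accesses += 1
                if key in cache:
                    hits += 1
                    cache.move_to_end(key)         # mark as most recent
                else:
                    cache[key] = None
                    if len(cache) > cache_lines:
                        cache.popitem(last=False)  # evict least recent
    return hits / accesses
```

In this model one thread fits in the cache and hits almost every time
after the first pass. With two threads the combined working set (96
lines) no longer fits in 64 lines, and with a cyclic access pattern LRU
evicts each line just before it is needed again, so every access
misses.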

>
>>>> The 32 threads could be controlled by 8 different programs.
>>
>> How to handle control dependencies (or right now we just use ARM-like
>> conditional execution)?
>
> There's some capacity for control, but the instruction set was designed
> to reduce the need for control. Not a lot of branches. There were some
> instructions defined to find the max or min of two values and store
> that value without needing a register with a flag in it. A few other
> things like that. It was mostly geared toward executing programs with
> no branches.
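
To illustrate the kind of min/max instruction described above, here is
a small Python sketch of a select that does not branch on its
condition. The names (select, branchless_min, branchless_max, clamp)
are hypothetical and not taken from the instruction set under
discussion; the sketch only shows the data flow such an instruction
could implement.

```python
def select(cond, a, b):
    """Return a if cond is true, else b, using a mask instead of a branch."""
    mask = -int(bool(cond))        # -1 (all ones) if true, 0 if false
    return (a & mask) | (b & ~mask)

def branchless_min(a, b):
    return select(a < b, a, b)

def branchless_max(a, b):
    return select(a > b, a, b)

def clamp(x, lo, hi):
    """Limit x to the range [lo, hi] with no conditional jump."""
    return branchless_min(branchless_max(x, lo), hi)
```

The comparison result is used as data (a mask), not as a flag that a
later jump has to test, so a shader doing something like
clamp(value, 0, 255) needs no flag register and no branch.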

http://code.entropywave.com/orc/ could be a starting point for
inspiration for the instruction set.
>
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
