Hello Nicolas,

On Tue, 26 Jun 2012 10:55:53 +0200, Nicolas Boulay wrote:
A GPU instruction set does not need to be as compact as a CPU instruction
set; shader programs are small.

So you can add a few fields present on all instructions, like Itanium
does. You could target 64-bit instructions instead of 32-bit ones.

I remember Tim saying he wanted to leave the instruction word size undetermined
for now, and not necessarily a power of two, so he could design a word structure
that suits the actual needs of the pipeline. So, as far as I understand,
size does not matter much, but the smaller the better.

Besides that, some people did not like conditional instructions at all
(there was a Linus Torvalds post about it), because branch prediction is
easy to get quite right, whereas conditional code requires data
prediction, which is much harder.

http://ondioline.org/mail/cmov-a-bad-idea-on-out-of-order-cpus

An out-of-order CPU looks like a horrible big mess, but if you keep it
reasonable, you can save a lot of CPU cycles without too much hardware
(branch prediction, a bit of register renaming).

That's another interesting post :-) However, it has only limited relevance here:

 - Linus only tested CMOV on x86. Everybody agrees that x86 sucks;
   where are the tests with ARM and Itanic?
 - Linus shows that CMOV performance varies with the microarchitecture,
   so in itself CMOV is neither "bad" nor "good". It depends.
 - You are speaking about a general-purpose, application-class CPU,
   while OG is a GPU (not even a GPGPU).
 - The cost of branch prediction in OG is not the same, since the
   execution path is totally different (a single instruction stream
   feeds several execution pipelines).
 - "(and 95+% of all branches are [predictable])":
     - at what cost? (logic complexity, die area, big lookup
       tables of past branches and addresses, etc.)
     - is flushing the pipeline on a mispredict a good idea?
     - for what kinds of datasets?
 - The title says it all: "cmov-a-bad-idea-on-out-of-order-cpus"
   (note the "on out-of-order CPUs").
   Doing OoO in OG is out of the question, as Tim said. Its pipeline
   is about throughput, not latency: just stream all the data through
   many parallel units and process them.

Could you find programming and architectural documentation about GPUs?

Nicolas
Yann

_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
