> The other option would be to
> work some magic trying to fuse threads into vector ops when we
> see multiple scalar instructions, but this seems overly complex.

My understanding is that this is effectively how much modern shader hardware 
works[1]. While you have many threads, many/most are executing the same code 
fragment. As long as your vector unit has per-lane enable selects, a single 
instruction stream/vector unit effectively executes a set of "scalar" 
threads in parallel. For example, an if-then-else block is implemented by 
executing both the "then" and "else" clauses as a single linear block and 
using the lane enable toggles to discard the unwanted results. It's no 
coincidence that the functionality exposed by OpenGL/OpenCL fits this 
technique much better than the arbitrary application code you might find on 
a general purpose CPU.
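
To make the if-then-else case concrete, here is a rough sketch in Python 
that simulates the lane-enable idea in software. It is only an illustration 
under my own assumptions: vec_op and run_if_then_else are made-up names, 
the branch being predicated is an arbitrary example, and none of this is 
taken from any real GPU instruction set.

```python
# Hypothetical simulation of a vector unit with per-lane enable bits.
# Each list element stands for one "scalar" thread mapped to one lane.

def vec_op(op, mask, dst, *srcs):
    """Issue one vector instruction.

    op is applied lane by lane; a lane whose enable bit is off keeps
    its previous dst value, i.e. its result is discarded.
    """
    return [
        op(*(s[i] for s in srcs)) if mask[i] else dst[i]
        for i in range(len(dst))
    ]

def run_if_then_else(x):
    """Per-thread source being emulated:

        if x < 0: y = -x
        else:     y = x * 2
    """
    n = len(x)
    all_on = [True] * n
    y = [0] * n

    # The compare runs on every lane and produces a per-lane mask.
    cond = vec_op(lambda a: a < 0, all_on, [False] * n, x)

    # "then" clause: issued once, enabled only where cond is true.
    y = vec_op(lambda a: -a, cond, y, x)

    # "else" clause: issued once, enabled only where cond is false.
    not_cond = [not c for c in cond]
    y = vec_op(lambda a: a * 2, not_cond, y, x)

    return y
```

Calling run_if_then_else on a list of per-thread inputs issues exactly 
three vector operations (compare, "then", "else") no matter how the lanes 
split between the two clauses, which is the cost of the linear-block 
approach when threads diverge.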

Paul

[1] I'm pretty sure at least ATI and Larrabee do this. Not sure about nVidia.