There was some misunderstanding about the current architecture. The documentation had no section on how it works, and the earlier material was not clear. The text that follows will now be included in the document.
Explanation of how the architecture works.

Each shader is made up of several parts. The basic unit is the scalar ALU. The ALU is pipelined, so it can have several instructions in flight at once. To avoid dependencies between those instructions, each pipeline stage executes an instruction from a different kernel (program). This lets several kernels run at once on one ALU and prevents data-dependency hazards. A kernel that already has an instruction in the pipeline cannot send another instruction to the ALU.

For each ALU we usually have k more kernels running than there are pipeline stages. The k extra kernels help keep the ALU saturated, because some kernels will be waiting on either a barrier dependency or a data access.

Since the processing load is mostly the same, multiple ALUs can be controlled by the same kernel. Each ALU runs its own thread (data set). This means that a shader runs multiple threads at once, controlled by multiple programs.

The register file can be represented as an [M x N] matrix, M being the number of kernels and N the number of threads. Each entry of the matrix holds O memory elements.

This means that every cycle N new threads are executed by 1 kernel. At the same time, P kernels (P being the number of pipeline stages) are being executed in the ALU and 1 is delivering its results. So we normally have N*P threads executing at once and P kernels being executed, with P smaller than M.
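To make the scheduling concrete, here is a small Python sketch of the issue scheme described above. It is only an illustration under my own assumptions, not the actual design: the names (simulate_issue, make_register_file, num_stages and so on) are made up, the issue order is plain round-robin, and stalls on barriers or data accesses are not modeled, so the k extra kernels only show up as extra entries in the ready queue.

```python
from collections import deque


def make_register_file(num_kernels, num_threads, words_per_entry):
    """Build the [M x N] register matrix; each entry holds O words."""
    return [[[0] * words_per_entry for _ in range(num_threads)]
            for _ in range(num_kernels)]


def simulate_issue(num_stages, num_kernels, num_alus, cycles):
    """Issue one instruction per cycle, each from a kernel not in the pipeline.

    Returns (instructions issued per kernel, peak threads in flight).
    Assumption: round-robin order, no stalls.
    """
    if num_kernels < num_stages:
        raise ValueError("need at least as many kernels as pipeline stages")

    pipeline = [None] * num_stages        # kernel id held by each stage
    ready = deque(range(num_kernels))     # kernels allowed to issue
    issued = [0] * num_kernels
    peak_threads = 0

    for _ in range(cycles):
        # The kernel in the last stage delivers its results and leaves.
        retired = pipeline[-1]
        pipeline = [None] + pipeline[:-1]
        if retired is not None:
            ready.append(retired)

        # Only a kernel with nothing in flight may enter stage 0.
        if ready:
            kernel = ready.popleft()
            pipeline[0] = kernel
            issued[kernel] += 1

        in_flight = [k for k in pipeline if k is not None]
        # No kernel ever occupies two stages, so no data hazards arise.
        assert len(in_flight) == len(set(in_flight))
        # Every kernel in flight drives one thread on each ALU.
        peak_threads = max(peak_threads, len(in_flight) * num_alus)

    return issued, peak_threads


if __name__ == "__main__":
    P, K, N, O = 4, 2, 8, 16              # stages, extra kernels, ALUs, words
    M = P + K
    regs = make_register_file(M, N, O)
    issued, peak = simulate_issue(P, M, N, cycles=12)
    print("register matrix:", len(regs), "x", len(regs[0]), "x", len(regs[0][0]))
    print("issued per kernel:", issued)
    print("peak threads in flight:", peak)
```

With P = 4, k = 2 and N = 8, the peak number of threads in flight comes out as N*P = 32, and at no point does a kernel occupy two pipeline stages. A more faithful model would take stalled kernels out of the ready queue, which is where the k extra kernels start to matter.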
