There was some misunderstanding about the current architecture. The
documentation had no section explaining how it works, and the earlier
material was not clear, so the text that follows will now be added to the
document.

Explanation of how the architecture works.
Each shader is made up of multiple parts. The basic unit is the scalar
ALU. The ALU is pipelined, so it can execute multiple instructions at
once. To avoid data dependencies, each pipeline stage executes an
instruction from a different kernel (program). This allows multiple
kernels to run at once on one ALU without data-dependency collisions. A
kernel that already has an instruction in the ALU pipeline cannot send
another instruction to the ALU. For each ALU we usually have k more
kernels running than there are pipeline stages. These k extra kernels
help to saturate the ALU, because at any moment some kernels will be
waiting on either a barrier dependency or a data access.
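The interleaving described above can be sketched as a small cycle-level
simulation of one ALU. This is a hypothetical Python model, not the actual
hardware: the function name alu_utilization, the round-robin choice of the
next kernel, and modelling barrier or memory waits as a fixed per-kernel
stall (stall_after) are all assumptions made for illustration.

```python
from collections import deque


def alu_utilization(stages, kernels, stall_after, cycles):
    """Fraction of cycles in which one pipelined scalar ALU issues an instruction.

    Hypothetical model (assumption, not the real design):
      stages:      P, the pipeline depth of the ALU
      kernels:     resident kernels (P + k in the text)
      stall_after: {kernel_id: cycles} extra wait after each instruction,
                   standing in for a barrier or memory access
    A kernel with an instruction still in the pipeline may not issue again,
    so no two stages ever hold the same kernel.
    """
    pipeline = deque([None] * stages)  # kernel id per stage, index 0 = first stage
    in_flight = set()                  # kernels with an instruction in the pipeline
    ready_at = [0] * kernels           # earliest cycle each kernel may issue
    next_kernel = 0                    # round-robin starting point
    issued = 0
    for cycle in range(cycles):
        retiring = pipeline.pop()      # last stage hands back its result
        if retiring is not None:
            in_flight.discard(retiring)
            ready_at[retiring] = cycle + stall_after.get(retiring, 0)
        chosen = None
        for offset in range(kernels):
            k = (next_kernel + offset) % kernels
            if k not in in_flight and ready_at[k] <= cycle:
                chosen = k
                break
        pipeline.appendleft(chosen)    # None is a bubble (idle stage)
        if chosen is not None:
            in_flight.add(chosen)
            issued += 1
            next_kernel = (chosen + 1) % kernels
    return issued / cycles


print(alu_utilization(4, 4, {}, 400))      # 1.0: one kernel per stage, no stalls
print(alu_utilization(4, 3, {}, 400))      # 0.75: too few kernels leaves bubbles
print(alu_utilization(4, 6, {0: 3}, 400))  # 1.0: k = 2 extra kernels hide the stall
```

Under these assumptions, P kernels are enough to fill the pipeline when
nothing stalls, and the k extra kernels are what keep it full while some
kernel is waiting.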

Since the processing load is mostly the same, multiple ALUs can be
controlled by the same kernel. Each ALU runs its own thread (data set).
This means that a shader runs multiple threads at once, controlled by
multiple programs. The registers can be represented as an [M x N]
matrix, M being the number of kernels and N the number of threads. Each
element of the matrix contains O elements of memory.
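One possible layout of that register matrix, as a hypothetical Python
sketch (the class RegisterFile, its methods, and the flat row-major
address mapping are assumptions, not the actual design): row m belongs to
kernel m, column n to thread n, and each element holds O words. The
issue_add method shows one instruction from a single kernel being applied
by all N ALUs, each on its own thread.

```python
class RegisterFile:
    """Hypothetical [M x N] register matrix, each element holding O words."""

    def __init__(self, kernels, threads, words):
        self.M, self.N, self.O = kernels, threads, words
        self.storage = [0] * (kernels * threads * words)

    def address(self, kernel, thread, reg):
        # Row-major mapping: kernel selects the row, thread the column,
        # reg the word inside that element. Bounds checks stop a bad index
        # from silently aliasing another kernel's or thread's registers.
        if not (0 <= kernel < self.M and 0 <= thread < self.N
                and 0 <= reg < self.O):
            raise IndexError("register index out of range")
        return (kernel * self.N + thread) * self.O + reg

    def read(self, kernel, thread, reg):
        return self.storage[self.address(kernel, thread, reg)]

    def write(self, kernel, thread, reg, value):
        self.storage[self.address(kernel, thread, reg)] = value

    def issue_add(self, kernel, dst, src_a, src_b):
        """One instruction from `kernel`, executed by all N ALUs (one thread each)."""
        for thread in range(self.N):
            total = self.read(kernel, thread, src_a) + self.read(kernel, thread, src_b)
            self.write(kernel, thread, dst, total)


rf = RegisterFile(kernels=6, threads=4, words=8)
for t in range(4):
    rf.write(2, t, 0, t)   # each thread of kernel 2 gets its own data
    rf.write(2, t, 1, 10)
rf.issue_add(2, dst=2, src_a=0, src_b=1)
print([rf.read(2, t, 2) for t in range(4)])  # [10, 11, 12, 13]
print(rf.read(3, 0, 2))                      # 0: kernel 3's row is untouched
```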

This means that on every cycle, 1 kernel issues an instruction for N new
threads. At the same time, P other kernels (P being the number of pipeline
stages) are executing in the ALUs, and 1 kernel is giving its results.

So we normally have N*P threads executing at once and P kernels in
flight, while P is smaller than M: the register matrix also holds the k
waiting kernels, so typically M = P + k.
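The counts above can be checked with a short sketch. It assumes a steady
state with strict round-robin issue and no stalls; the function names
pipeline_snapshot and occupancy and the example values (P = 4, N = 16,
k = 2) are hypothetical.

```python
def pipeline_snapshot(cycle, stages, kernels):
    """Return (stage_contents, retiring_kernel) for one ALU at `cycle`.

    Hypothetical model: steady state, strict round-robin issue over
    `kernels` resident kernels, no stalls. The kernel issued at cycle c is
    c % kernels; it occupies stage s at cycle c + s and hands back its
    result at cycle c + stages.
    """
    if kernels < stages:
        raise ValueError("need at least one resident kernel per stage")
    stage_contents = [(cycle - s) % kernels for s in range(stages)]
    retiring_kernel = (cycle - stages) % kernels
    return stage_contents, retiring_kernel


def occupancy(stages, threads, extra):
    """Counts from the text: M = P + k resident, P in flight, N * P threads."""
    resident_kernels = stages + extra     # M
    in_flight_kernels = stages            # P, one per pipeline stage
    threads_in_flight = threads * stages  # each kernel drives N ALUs
    return resident_kernels, in_flight_kernels, threads_in_flight


stages, threads, extra = 4, 16, 2  # P, N, k (example values only)
print(occupancy(stages, threads, extra))              # (6, 4, 64)
print(pipeline_snapshot(10, stages, stages + extra))  # ([4, 3, 2, 1], 0)
```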
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
