Timothy Normand Miller wrote:
> One of the design details that seems to be hard to present is the MIMD
> architecture. At first glance, it looks like a SIMD architecture.
> But all of you are right to point out that shader workloads are
> primarily scalar.

I'd like to see some evidence for this.
Some years ago I wrote a bunch of demonstration GPU shader programs
in low level ARB/nVidia assembly. You can still find them at:
<http://cs.anu.edu.au/~Hugh.Fisher/3dstuff/lowlevel.html>
About 80% of the instructions in those shaders are vector, and only
20% scalar. The ratio of scalar instructions increases very slightly
with the more complex shaders, to perhaps 25%. The single most common
instruction is the dot product (DP3/DP4) over three or four components
of a vertex, color, or matrix row.
If you're using shaders to emulate the original fixed function
OpenGL/Direct3D pipelines, the ratio of SIMD to scalar will be
even higher.
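
For anyone who wants to check a ratio like this on their own shaders, here is a minimal sketch of one way to count it. It assumes "scalar" means an opcode whose source operands are single scalars (RCP, RSQ, EX2 and so on) and treats every other opcode as vector; other definitions would give other numbers. The sample program and the function name count_instructions are made up for illustration and are not taken from the shaders linked above.

```python
# Opcodes in ARB vertex/fragment program assembly whose source operands
# are single scalars. ASSUMPTION: every other opcode is counted as vector.
SCALAR_OPS = {"COS", "EX2", "EXP", "LG2", "LOG", "POW", "RCP", "RSQ", "SCS", "SIN"}

# Statements that declare names or options rather than execute anything.
DECLARATIONS = {"ATTRIB", "PARAM", "TEMP", "ADDRESS", "OUTPUT", "ALIAS", "OPTION"}


def count_instructions(source):
    """Count vector and scalar instructions in ARB-style assembly text."""
    # Drop '#' comments line by line, then split the program on ';'.
    code = " ".join(line.split("#", 1)[0] for line in source.splitlines())
    counts = {"vector": 0, "scalar": 0}
    for statement in code.split(";"):
        # The "!!ARBvp1.0" header has no ';', so it is fused with the
        # first statement; filter it out here.
        words = [w for w in statement.split() if not w.startswith("!!")]
        if not words:
            continue
        opcode = words[0].upper().split("_")[0]  # strip suffixes like _SAT
        if opcode in DECLARATIONS or opcode == "END":
            continue
        kind = "scalar" if opcode in SCALAR_OPS else "vector"
        counts[kind] += 1
    return counts


# Hypothetical sample shader, written only to exercise the counter.
SAMPLE = """!!ARBvp1.0
# Transform the vertex and scale the color by an inverse distance.
PARAM mvp[4] = { state.matrix.mvp };
TEMP r0;
DP4 result.position.x, mvp[0], vertex.position;
DP4 result.position.y, mvp[1], vertex.position;
DP4 result.position.z, mvp[2], vertex.position;
DP4 result.position.w, mvp[3], vertex.position;
DP3 r0.x, vertex.position, vertex.position;
RSQ r0.x, r0.x;
MUL result.color, vertex.color, r0.x;
END
"""

if __name__ == "__main__":
    print(count_instructions(SAMPLE))
```

Classifying by opcode alone ignores write masks, so a vector opcode that writes a single component still counts as vector here.
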
OK, my shaders are old, and predate Shader Model 3.0 and widespread
use of high level languages. They still do what every 3D engine
spends most of its time doing: multiply vertices by a matrix, and
RGB/RGBA colors by other colors.
I'm happy to be proved wrong on this, but let's do so on the basis
of real world shaders written by graphics programmers.
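
To make the two operations mentioned above concrete, here is a small sketch of a vertex transform written as four 4-component dot products, and a color modulate written as a component-wise multiply. The function names dp4, transform, and modulate are my own labels for illustration, and the matrix is assumed to be stored as four rows.

```python
def dp4(a, b):
    """Four-component dot product: four multiplies and three adds."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]


def transform(matrix_rows, vertex):
    """Multiply a 4x4 matrix (given as four rows) by a 4-component vertex.

    Each output component is one dot product of a matrix row with the vertex.
    """
    return [dp4(row, vertex) for row in matrix_rows]


def modulate(color_a, color_b):
    """Component-wise RGBA multiply of two colors."""
    return [x * y for x, y in zip(color_a, color_b)]


if __name__ == "__main__":
    # A translation by (2, 3, 4), used only as an example input.
    translate = [
        [1.0, 0.0, 0.0, 2.0],
        [0.0, 1.0, 0.0, 3.0],
        [0.0, 0.0, 1.0, 4.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
    print(transform(translate, [1.0, 1.0, 1.0, 1.0]))
    print(modulate([1.0, 0.5, 0.25, 1.0], [0.5, 0.5, 0.5, 1.0]))
```

Written out as separate scalar operations, the transform alone is 16 multiplies and 12 adds per vertex.
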

> We might include vector instructions if we find
> them helpful, but they'll just translate into optimized use of the
> scalar ALUs. We'll have three basic datatypes: float32, int32, and
> uint32. (And int32 and uint32 will mostly be treated identically
> except in cases of mult, div, and conversions.) All other data types
> will simply be converted on the way in/out of some other resource.
> Since we have a successful MIPS-like architecture in HQ, we'll just
> extend this (conceptually). The MIPS architecture, lacking things
> like carry/overflow/etc flags is just simply easier to implement.
> Those instances where we end up requiring a couple extra instructions
> are a worthwhile tradeoff to allow us to have only the register file
> and program counter in the active thread context.

I'd suggest a MIPS with each floating-point register extended to
128 bits, holding either 4 x 32-bit or 2 x 64-bit floats, with every
add/etc. instruction now being SIMD. For you Intel folk, think of it
as using SSE for everything.
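
As a rough sketch of the difference between the two proposals, the following models a 128-bit register as four float lanes and compares one SIMD add against the same add lowered to per-lane scalar ALU operations. The names LANES, simd_add, and scalar_expand_add are hypothetical, and the model only counts issued operations; it is not a description of any actual hardware design.

```python
LANES = 4  # ASSUMPTION: one 128-bit register holds four 32-bit floats


def simd_add(a, b):
    """SSE-like model: a single instruction adds all four lanes."""
    return [x + y for x, y in zip(a, b)]


def scalar_expand_add(a, b):
    """Scalar-ALU model: the vector add is lowered to one add per lane.

    Returns the result together with the number of scalar adds issued.
    """
    result = []
    issued = 0
    for lane in range(LANES):
        result.append(a[lane] + b[lane])
        issued += 1
    return result, issued


if __name__ == "__main__":
    r1 = [1.0, 2.0, 3.0, 4.0]
    r2 = [10.0, 20.0, 30.0, 40.0]
    print(simd_add(r1, r2))
    print(scalar_expand_add(r1, r2))
```

Both paths produce the same values; the difference is one issued instruction against four. This says nothing about how many scalar ALUs might run in parallel.
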
--
Hugh Fisher
CECS, ANU