Timothy Normand Miller wrote:
But you've also convinced me further of my other argument. If we add
vector instructions, we'll make 60% of the instructions optimistically
4x faster.
100/(60/4 + 40) = 1.82
So we get an 82% speedup on a single thread. That's hardly worth the
cost of quadrupling the hardware requirement. With an area
constraint, we'll end up getting 45% of the throughput we would have had
with scalar engines. Hell, even if they're only 3x larger, we still
have a 40% performance loss. There's no way they'll be less than 2x
larger, in which case, we have about a 10% performance hit.
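
The arithmetic above can be checked with a short sketch. It applies Amdahl's law with the figures from the quoted message (60% of instructions vectorisable, 4x faster when vectorised). The function names are illustrative, and the area model is an assumption: under a fixed area budget, the number of engines that fit is taken to scale as 1/area.

```python
def single_thread_speedup(vector_fraction, vector_gain):
    """Amdahl-style speedup when only part of the instruction stream gets faster."""
    return 1.0 / (vector_fraction / vector_gain + (1.0 - vector_fraction))


def relative_throughput(speedup, area_factor):
    """Throughput relative to scalar engines under a fixed area budget.

    Assumption: an engine that is area_factor times larger means
    1/area_factor as many engines fit on the chip.
    """
    return speedup / area_factor


speedup = single_thread_speedup(0.60, 4.0)
print(f"single-thread speedup: {speedup:.2f}x")
for area in (4.0, 3.0, 2.0):
    share = relative_throughput(speedup, area)
    print(f"{area:.0f}x area: {share:.0%} of scalar throughput")
```

Under these assumptions the 4x, 3x, and 2x area cases come out at roughly 45%, 61%, and 91% of scalar throughput, which matches the losses quoted above.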
Thanks for explaining the hardware issues in detail to this
software guy. OK, I now understand that a bunch of scalar ALUs
in parallel can deliver better overall throughput than a
smaller number of SIMD for this kind of workload.
So, we are (most likely) going to have a SIMD instruction set
(not all SIMD) which runs on parallel scalar threads. Any single
SIMD instruction
MUL R1.rgba, R2.rgba -> R3.rgba
executes on each component in sequence
MUL R1.r, R2.r -> R3.r
MUL R1.g, R2.g -> R3.g
MUL R1.b, R2.b -> R3.b
MUL R1.a, R2.a -> R3.a
but overall performance doesn't suffer, because there can be
N other threads executing the same shader code in parallel
for different vertices/fragments.
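
A rough sketch of that execution model, in Python for illustration only. The register layout, the function names, and the two example threads are all made up here; the real hardware would run the inner loop over threads in parallel rather than one after another.

```python
COMPONENTS = "rgba"


def scalarize(op, src_a, src_b, dst):
    """Expand one 4-component instruction into four scalar instructions."""
    return [f"{op} {src_a}.{c}, {src_b}.{c} -> {dst}.{c}" for c in COMPONENTS]


def execute_mul(threads):
    """Run MUL R1.rgba, R2.rgba -> R3.rgba one component per step.

    Each entry in `threads` is a private register file. Within a step,
    every thread performs one scalar multiply (conceptually in parallel).
    Returns the number of sequential steps taken.
    """
    steps = 0
    for i, _ in enumerate(COMPONENTS):   # four sequential steps
        for regs in threads:             # N threads, parallel in hardware
            regs["R3"][i] = regs["R1"][i] * regs["R2"][i]
        steps += 1
    return steps


# Two hypothetical threads, e.g. two different fragments.
threads = [
    {"R1": [1.0, 2.0, 3.0, 4.0], "R2": [0.5, 0.5, 0.5, 0.5], "R3": [0.0] * 4},
    {"R1": [2.0, 2.0, 2.0, 2.0], "R2": [1.0, 2.0, 3.0, 4.0], "R3": [0.0] * 4},
]

for line in scalarize("MUL", "R1", "R2", "R3"):
    print(line)
steps = execute_mul(threads)
print(steps, "steps for", len(threads), "threads")
```

The step count stays at four however many threads are in flight, so the cost of serialising the components is hidden as long as there are enough vertices/fragments to keep the scalar ALUs busy.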
--
Hugh Fisher
CECS, ANU