Andre Pouliot wrote:
Yes, all the vector ops will be emitted as scalar ops. The program does get
longer, but scalar ops can be made shorter since we have fewer instructions to
support.
SIMD instructions are the same length as scalar instructions on any
sensible CPU: see MIPS and PowerPC. Four scalar MADD instructions
are going to be four times longer than the equivalent vector MADD.
(Uh, you *are* intending to use fixed length instructions, right?
Please tell me you're not thinking of variable length opcodes?)
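
To make the size argument concrete, here is a small sketch in Python. It assumes a fixed 32-bit instruction encoding (as on MIPS and PowerPC) and a 4-lane vector; the names `scalar_madd`, `vector_madd` and `INSTR_BYTES` are made up for illustration and are not part of any actual design.

```python
# Assumption: every instruction, scalar or vector, is encoded in 4 bytes.
INSTR_BYTES = 4

def scalar_madd(a, b, c):
    """One scalar multiply-add: a * b + c."""
    return a * b + c

def vector_madd(a, b, c):
    """One 4-lane vector multiply-add, modelled as a single instruction."""
    return [scalar_madd(x, y, z) for x, y, z in zip(a, b, c)]

a = [1.0, 2.0, 3.0, 4.0]
b = [0.5, 0.5, 0.5, 0.5]
c = [1.0, 1.0, 1.0, 1.0]

# Scalar version: four separate instructions, one per component.
scalar_result = [scalar_madd(a[i], b[i], c[i]) for i in range(4)]
scalar_code_bytes = 4 * INSTR_BYTES

# Vector version: one instruction covers all four components.
vector_result = vector_madd(a, b, c)
vector_code_bytes = 1 * INSTR_BYTES
```

Both versions compute the same values; under these assumptions the scalar sequence occupies 16 bytes of program memory against 4 for the vector form.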
We are doing only one fetch to execute on multiple data, since most
of the data is controlled by the same program. We parallelize the data set,
but we consider each result independently from the others executed at the
same time (different threads). The organisation of memory would be
essentially the same between a SIMD architecture and our current one. Both
require a 256-bit memory access for an add operation and a 128-bit memory
write. Control is also the same. The FPGA doesn't allow port access wider
than 32 bits with a single memory block. Because of those requirements,
either the current architecture or a SIMD one would require 2 memory blocks
per ALU. The connection is mostly wires, with no read-ahead for the data.
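
The figures quoted above can be reproduced with a little arithmetic. The sketch below assumes four 32-bit lanes with one ALU per lane, and that each FPGA memory block contributes one 32-bit port; those assumptions are inferred from the numbers in the message, not stated in it.

```python
LANES = 4             # assumption: four data elements processed per fetch
WORD_BITS = 32        # assumption: each element is a 32-bit value
PORT_BITS = 32        # widest port access on a single memory block (from the message)
OPERANDS_PER_ADD = 2  # an add reads two source operands

# Bits read and written by one add across all lanes.
read_bits = LANES * OPERANDS_PER_ADD * WORD_BITS
write_bits = LANES * WORD_BITS

# Memory blocks needed so each ALU can read both operands at once.
blocks_per_alu = (OPERANDS_PER_ADD * WORD_BITS) // PORT_BITS
total_blocks = LANES * blocks_per_alu
```

This gives 256 bits read, 128 bits written and 2 blocks per ALU, matching the message, and the count is the same whether the four lanes are driven as one SIMD operation or as four threads sharing one fetch.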
OK, I see the point. But won't a SIMD design be much easier to
speed up when the port width increases to 64/128 bits in a future
version?
Those optimizations were to improve 3D rendering and scientific processing on
a general-purpose processor. You don't have the same requirements and
workload as a GPU. Different problems and contexts require different solutions.
It's exactly the same requirements and workload! 3D vertices have to be
multiplied by a 4x4 transform matrix. Doesn't matter whether it's on the
CPU or GPU.
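
As an illustration of that workload, here is a minimal Python sketch of a 4x4 transform expressed as four vector multiply-adds. It assumes a column-major matrix and a vector MADD that broadcasts one scalar across all lanes; `vector_madd` and `transform` are hypothetical helper names, not instructions from any particular ISA.

```python
def vector_madd(a, s, c):
    """4-lane multiply-add with scalar broadcast: a[i] * s + c[i]."""
    return [ai * s + ci for ai, ci in zip(a, c)]

def transform(columns, v):
    """Multiply a 4x4 matrix (given as four columns) by a 4-component vertex."""
    acc = [0.0, 0.0, 0.0, 0.0]
    for col, comp in zip(columns, v):
        acc = vector_madd(col, comp, acc)  # one vector instruction per column
    return acc

# Example matrix: translation by (10, 20, 30), stored column by column.
columns = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [10.0, 20.0, 30.0, 1.0],
]
vertex = [1.0, 2.0, 3.0, 1.0]
result = transform(columns, vertex)

vector_instructions = len(columns)                 # one MADD per column
scalar_instructions = len(columns) * len(vertex)   # one MADD per matrix element
```

Under these assumptions the transform takes 4 vector MADDs or 16 scalar ones, and the arithmetic is identical wherever it runs.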
--
Hugh Fisher
CECS, ANU
_______________________________________________
Open-graphics mailing list
http://lists.duskglow.com/mailman/listinfo/open-graphics