On 2013-03-18 15:38, Nicolas Boulay wrote:
2013/3/18 Timothy Normand Miller <[email protected]>
For CPUs, 32 was found to be optimal by some paper published back in
the early 90s, I think. 16 was second best, while 64 had
diminishing returns. I'm not sure how this applies to GPUs,
however. One problem with doubling the RF size is that you slow it
down.
That number was arrived at without superpipelining and superscalar
execution in mind.
Not to mention register-renamed, out-of-order architectures...
Unrolling loops is a good way to avoid control-flow instructions and
to remove dependencies between instructions, but it needs at least
twice the number of registers.
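
Here is a minimal sketch (mine, not from the original posts) of why
unrolling raises register pressure: the rolled loop below keeps one
accumulator live, while the 4x-unrolled version keeps four
independent accumulators live, so it needs roughly four times as many
registers to stay free of dependencies. The names and the unroll
factor are just illustrative.

/* rolled: one live accumulator, one branch per element */
float sum_rolled(const float *a, int n)
{
    float s = 0.0f;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* unrolled by 4: four independent accumulators, one branch per four
 * elements; assumes n is a multiple of 4 to keep the sketch short */
float sum_unrolled(const float *a, int n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    for (int i = 0; i < n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    return (s0 + s1) + (s2 + s3);
}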
From memory of the F-CPU design: if we consider a constant stream of
computation instructions with 2 reads and 1 write each, whose
registers cannot overlap (the dependencies are loose and the code is
unrolled to fit the pipeline), then
32 registers => up to 11 instructions "in flight" in the pipeline at a
time without dependencies. That means a 5-deep, 2-wide superpipeline.
64 was chosen for F-CPU because the pipeline could be pushed to
3 wide by 7 deep, or 5 deep by 4 wide. That was the end of the 90s :-)
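
To make the arithmetic explicit, here is my own back-of-the-envelope
sketch, assuming each in-flight instruction ties up 2 source + 1
destination registers with no overlap, as stated above: the register
file size divided by 3 bounds the number of independent instructions
in flight, which is what the depth-times-width figures match.

#include <stdio.h>

int main(void)
{
    const int regs_per_insn = 3;            /* 2 reads + 1 write */
    const int reg_counts[] = { 16, 32, 64 };

    for (int i = 0; i < 3; i++) {
        int rf = reg_counts[i];
        int in_flight = rf / regs_per_insn; /* independent insns in flight */
        printf("%2d registers -> ~%2d instructions in flight\n",
               rf, in_flight);
    }
    /* 32/3 ~ 10-11 fits a 5-deep x 2-wide pipeline (10 slots);
     * 64/3 ~ 21 fits 7-deep x 3-wide (21) or 5-deep x 4-wide (20). */
    return 0;
}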