On 8/20/06, Jon Smirl <[EMAIL PROTECTED]> wrote:
> But I don't know if you read my earlier post where I described the
> theoretical 1000x throughput difference between fixed-function and
> programmable designs.
>
> Something isn't matching here. I know NV/ATI are removing the fixed
> function pipeline from future designs. Maybe what is missing is that
> these GPUs are SIMD and can also dispatch multiple instructions in
> parallel for each data stream. I have never seen the actual designs
> but I suspect they look like VLIW. The GPGPU people should know.
>
> I believe the NV7800 can process 24 data streams in parallel while
> executing 2 simultaneous instructions on each stream. For example,
> each stream can do two FP mul/add instructions in a single clock.
> They make everything SIMD compatible (adjusting for loops and
> branches) in the compile phase. AFAIK the shaders are running at
> 500-600MHz clocks.
I didn't do a good job with my reasoning before, so here's a
re-explanation:

Fixed-function design, one pixel wide: All operations are separate
macro pipeline stages, which are further subdivided. With 100 pipeline
stages, 100 fragments are in flight in parallel, and one finished
fragment comes out every clock. Due to pipelining, we could run at,
say, 100MHz in an FPGA.

Programmable shader design, one pixel wide: All operations are shader
instructions, executed sequentially. For any instruction, only a small
portion of the hardware is utilized. If the average number of
instructions to compute a fragment is, say, 10, and they can be
arranged to take only 10 cycles, then the GPU can only push out one
pixel every 10 cycles. Due to more feedback in the architecture of the
shader, it runs at only 50MHz in the FPGA.

This is for relative comparison, so don't worry about the fact that
they'd all run much faster in an ASIC.

Anyhow, given this hypothetical example, the fixed-function design
delivers 100 Mpixels/s (100MHz, one pixel per clock) against the
shader's 5 Mpixels/s (50MHz, one pixel per 10 clocks), so it can
theoretically process pixels 20 times as fast as the shader in a given
architecture. This is why the shaders have to run at 500MHz to fully
utilize their memory bandwidth, while the fixed-function design can run
at a much lower clock rate to get the same performance, with much
better power utilization.

And remember that for 99% of what desktop users need, even OGA is
overkill. A programmable shader is out of the ballpark.
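
For anyone who wants to play with the numbers, here is a rough sketch
of the arithmetic in Python. It only models the hypothetical
one-pixel-wide FPGA example above; the function names are made up for
illustration, and it assumes both designs are kept fully busy (no
stalls, no memory limits), which real hardware won't manage.

```python
def pixels_per_second(clock_hz, cycles_per_pixel):
    """Steady-state throughput of a one-pixel-wide design."""
    return clock_hz / cycles_per_pixel


def clock_to_match(target_pixels_per_second, cycles_per_pixel):
    """Clock a design needs in order to reach a target throughput."""
    return target_pixels_per_second * cycles_per_pixel


# Fixed-function: fully pipelined, so one fragment completes per clock.
# The 100 stages add latency but do not reduce throughput.
fixed = pixels_per_second(100e6, 1)

# Shader: 10 sequential instructions per fragment, one per clock.
shader = pixels_per_second(50e6, 10)

ratio = fixed / shader

# Clock the shader would need to equal the fixed-function design.
shader_clock_needed = clock_to_match(fixed, 10)

print(f"fixed-function: {fixed / 1e6:.0f} Mpixels/s")
print(f"shader:         {shader / 1e6:.0f} Mpixels/s")
print(f"ratio:          {ratio:.0f}x")
print(f"shader clock to match: {shader_clock_needed / 1e6:.0f} MHz")
```

Changing the instruction count or either clock shows how quickly the
gap moves. With these assumed numbers the shader would need a 1000MHz
clock just to break even with the 100MHz fixed-function pipeline.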
