On 8/20/06, Jon Smirl <[EMAIL PROTECTED]> wrote:

> But I don't know if you read my earlier post where I described the
> theoretical 1000x throughput difference between fixed-function and
> programmable designs.

Something isn't matching here. I know NV/ATI are removing the
fixed-function pipeline from future designs. Maybe what is missing is
that these GPUs are SIMD and can also dispatch multiple instructions in
parallel for each data stream. I have never seen the actual designs
but I suspect they look like VLIW. The GPGPU people should know.

I believe the NV7800 can process 24 data streams in parallel while
executing 2 simultaneous instructions on each stream. For example each
stream can do two FP mul/add instructions in a single clock. They make
everything SIMD compatible (adjusting for loops and branches) in the
compile phase. AFAIK the shaders are running at 500-600MHz clocks.

I didn't really do a good job with my reasoning before, so here's
another attempt at explaining it:

Fixed-function design, one pixel wide:  All operations are separate
macro pipeline stages, which are further subdivided.  With 100
pipeline stages, 100 fragments are in flight in parallel.  Due to
pipelining, we could run at, say, 100MHz in an FPGA.

Programmable shader design, one pixel wide:  All operations are shader
instructions, executed sequentially.  For any instruction, only a
small portion of the hardware is utilized.  If the average number of
instructions to compute a fragment is, say, 10, and they can be
arranged to take only 10 cycles, then the GPU can only push out one
pixel every 10 cycles.  Due to more feedback in the architecture of
the shader, it runs at only 50MHz in the FPGA.

This is for relative comparison, so don't worry about the fact that
they'd all run much faster in an ASIC.

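Anyhow, given this hypothetical example, the fixed-function design
pushes out one pixel per cycle at 100MHz (100 Mpixels/s), while the
shader pushes out one pixel every 10 cycles at 50MHz (5 Mpixels/s).
So the fixed-function design can theoretically process pixels 20
times as fast as the shader in the same technology.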
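
To make the arithmetic explicit, here is a rough sketch in Python. The
function name and the idea of counting "cycles per pixel" are my own
shorthand for this example, and the clock rates are just the
hypothetical FPGA figures from above, not measurements of any real
design.

```python
def pixels_per_second(clock_hz, cycles_per_pixel):
    """Peak pixel rate of a one-pixel-wide design (illustrative model only)."""
    return clock_hz / cycles_per_pixel

# Fixed-function: fully pipelined, so one pixel completes every cycle.
fixed_rate = pixels_per_second(100e6, 1)

# Programmable shader: ~10 sequential instructions per fragment, slower clock.
shader_rate = pixels_per_second(50e6, 10)

ratio = fixed_rate / shader_rate
print(f"fixed-function: {fixed_rate / 1e6:.0f} Mpixels/s")
print(f"shader:         {shader_rate / 1e6:.0f} Mpixels/s")
print(f"ratio:          {ratio:.0f}x")
```

Changing the instruction count or either clock shows how sensitive the
ratio is to those two assumptions.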

This is why the shaders have to run at 500MHz to fully utilize their
memory bandwidth, while the fixed-function design can run at a much
lower clock rate to get the same performance, and with much better
power efficiency.
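
Turning the same toy model around gives a feel for the clock a
one-pixel-wide shader would need in order to keep up. This is only a
sketch under the hypothetical numbers used earlier in this message
(one pixel per cycle at 100MHz for fixed-function, 10 cycles per pixel
for the shader); the function name is made up for illustration.

```python
def required_clock_hz(target_pixels_per_second, cycles_per_pixel):
    """Clock needed to sustain a target pixel rate (illustrative model only)."""
    return target_pixels_per_second * cycles_per_pixel

# Pixel rate of the hypothetical fixed-function design: 100MHz, 1 pixel/cycle.
target_rate = 100e6

# A shader spending 10 cycles per pixel needs 10x the clock to match it.
shader_clock = required_clock_hz(target_rate, 10)
print(f"shader clock needed: {shader_clock / 1e6:.0f} MHz")
```

Real GPUs close that gap with many parallel pipes as well as higher
clocks, which this one-pixel-wide model deliberately ignores.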

And remember that for 99% of what desktop users need, even OGA is
overkill.  A programmable shader is out of the ballpark.