Re: [Open-graphics] Re: Fwd: Direction of graphics card design

James Richard Tyrer Tue, 22 Aug 2006 21:42:42 -0700

Timothy Miller wrote:

On 8/20/06, Jon Smirl <[EMAIL PROTECTED]> wrote:
But I don't know if you read my earlier post where I described the theoretical 1000x throughput difference between fixed-function and programmable designs.
Something isn't matching here. I know NV/ATI are removing the fixed
function pipeline from future designs. Maybe what is missing is that these GPUs are SIMD and can also dispatch multiple instructions in parallel for each data stream. I have never seen the actual designs, but I suspect they look like VLIW. The GPGPU people should know.
I believe the NV7800 can process 24 data streams in parallel while
executing 2 simultaneous instructions on each stream. For example, each stream can do two FP mul/add instructions in a single clock. They make everything SIMD compatible (adjusting for loops and branches) in the compile phase. AFAIK the shaders are running at 500-600 MHz clocks.
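For a rough sense of scale, the NV7800 figures above (taken at face value, not verified) can be turned into peak rates. This is a back-of-the-envelope sketch only: the 10 instructions per fragment is a hypothetical shader length, and the function name is made up for illustration. It ignores stalls, branch divergence and memory bandwidth, so real throughput would be lower.

```python
def peak_shader_rates(streams: int, instrs_per_clock: int, clock_hz: float,
                      instrs_per_fragment: int) -> tuple[float, float]:
    """Hypothetical helper: return (peak instructions/s, peak fragments/s),
    ignoring stalls, branch divergence and memory bandwidth limits."""
    instr_rate = streams * instrs_per_clock * clock_hz
    return instr_rate, instr_rate / instrs_per_fragment


# 24 streams x 2 instructions/clock, at the quoted 500 and 600 MHz clocks,
# with an assumed 10-instruction shader per fragment.
for clock in (500e6, 600e6):
    instrs, frags = peak_shader_rates(24, 2, clock, 10)
    print(f"{clock / 1e6:.0f} MHz: {instrs / 1e9:.1f} G instr/s, "
          f"{frags / 1e9:.2f} G fragments/s")
```

At 500 MHz this gives 24 G instructions/s and 2.4 G fragments/s under those assumptions, which is why wide parallelism lets a programmable design keep up despite sequential instruction execution.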
I didn't really do a good job with my reasoning before, so here's a re-explanation:
Fixed-function design, one pixel wide: All operations are separate macro pipeline stages, which are further subdivided. With 100 pipeline stages, 100 fragments are in flight in parallel. Due to pipelining, we could run at, say, 100MHz in an FPGA.


I think that you are missing something here.  With your example, you are
throwing more hardware at the problem, therefore it will run faster.
Now it is true that with fixed-function hardware you don't have to
throw as much hardware at the problem as you would with fully
programmable hardware, but you could achieve the same results with fully
programmable hardware -- it would just require more hardware.

Programmable shader design, one pixel wide: All operations are shader instructions, executed sequentially. For any instruction, only a small portion of the hardware is utilized.


Well, not exactly true.  If the hardware is SIMD over four 32-bit words,
you are only going to use all of it when one of the variables in the
operation is a four-word vector.

Would it be possible to dynamically allocate hardware resources
depending on the op and the data?

If the average number of instructions to compute a fragment is, say, 10, and they can be arranged to take only 10 cycles, then the GPU can
only push out one pixel every 10 cycles.


You can pipeline fully programmable shaders too.  But it would be
better to use them in parallel to increase throughput.  If you had 10 in
parallel, you could start a fragment every clock cycle and output a
pixel every clock.
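The parallel-shader argument can be checked with a tiny cycle-count model. This is a sketch under simplifying assumptions of my own: the dispatcher issues at most one fragment per clock to whichever shader unit frees up first, every fragment takes exactly 10 cycles, and there are no memory or texture stalls. The function name is hypothetical.

```python
def completion_cycles(num_units: int, cycles_per_fragment: int,
                      num_fragments: int) -> list[int]:
    """Hypothetical model: cycle at which each fragment finishes when at most
    one fragment is dispatched per clock to the earliest-free shader unit."""
    free_at = [0] * num_units   # cycle at which each unit becomes idle
    finished = []
    next_issue = 0              # earliest cycle the dispatcher may issue
    for _ in range(num_fragments):
        unit = min(range(num_units), key=lambda k: free_at[k])
        start = max(next_issue, free_at[unit])
        free_at[unit] = start + cycles_per_fragment
        finished.append(free_at[unit])
        next_issue = start + 1
    return finished


print(completion_cycles(1, 10, 4))    # one unit: [10, 20, 30, 40]
print(completion_cycles(10, 10, 12))  # ten units: 10, 11, 12, ... 21
```

With one unit a pixel comes out every 10 cycles; with ten units, after the initial 10-cycle latency, a pixel comes out every clock, at the cost of ten times the shader hardware.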

Due to more feedback in the architecture of the shader, it runs at
only 50MHz in the FPGA.
This is for relative comparison, so don't worry about the fact that they'd all run much faster in an ASIC.
Anyhow, given this hypothetical example, the fixed-function design can theoretically process pixels 20 times as fast as the shader in a given architecture.
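The factor of 20 follows from the hypothetical numbers in this example: 100MHz at one pixel per clock versus 50MHz at one pixel per 10 clocks. A minimal worked version, using only those illustrative FPGA figures (not measurements); the helper name is made up:

```python
# Illustrative figures from the example above (one pixel wide, FPGA clocks).
FIXED_CLOCK_HZ = 100e6        # fixed-function pipeline clock
FIXED_CYCLES_PER_PIXEL = 1    # fully pipelined: one fragment retires per clock
SHADER_CLOCK_HZ = 50e6        # shader clock (more feedback, lower fmax)
SHADER_CYCLES_PER_PIXEL = 10  # ~10 sequential instructions per fragment


def pixels_per_second(clock_hz: float, cycles_per_pixel: float,
                      units: int = 1) -> float:
    """Hypothetical helper: peak fill rate of `units` identical engines,
    each retiring one pixel every `cycles_per_pixel` clocks."""
    return units * clock_hz / cycles_per_pixel


fixed = pixels_per_second(FIXED_CLOCK_HZ, FIXED_CYCLES_PER_PIXEL)
shader = pixels_per_second(SHADER_CLOCK_HZ, SHADER_CYCLES_PER_PIXEL)
print(f"fixed-function: {fixed / 1e6:.0f} Mpixel/s")  # 100
print(f"single shader:  {shader / 1e6:.0f} Mpixel/s")  # 5
print(f"ratio: {fixed / shader:.0f}x")                 # 20
```

Passing `units=10` for the shader gives 50 Mpixel/s, so ten shaders in parallel shrink the gap to 2x, the remainder coming from the clock difference. That trade of throughput for extra hardware is exactly what the question below is about.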


But how much more hardware does it have?

This is why the shaders have to run at 500MHz to fully utilize their
memory bandwidth, while the fixed-function design can run at a much
lower clock rate to get the same performance. And get much better power utilization.
And remember that for 99% of what desktop users need, even OGA is overkill. A programmable shader is out of the ballpark.

Except that it is what the future needs. That is, only with programmable hardware can we be sure of having a viable product for the future.

OTOH, this is not an either/or question. If there are very common fragment shader operations, I see nothing wrong with having a hardware-configured shader to do them. The issue is whether they will be utilized enough so that there will be a net hardware savings vs. fully programmable.


--
JRT
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
