On Saturday 24 May 2008 12:12:27 Dieter wrote:
> It looks like ATI plans to take a similar path with GPUs
> as with CPUs.  Rather than keep making GPUs bigger and bigger
> with resulting increases in cost, power consumption and heat,
> it looks like they plan to make smaller GPUs and use more
> than one together to build high end products.  The little I've
> read indicates that they will be using multiple dies.
>
> This approach has some manufacturing advantages.  If you can
> build a range of product with a single type of die, it would
> cost less to manufacture.  Only one mask to have made.  Larger
> quantities of a single chip.  With smaller dies, a defect would
> spoil a smaller percentage of the wafer.  Yield would increase.
>
> I know a bit about SMP, but close to nothing about the
> Crossfire/SLI style multiple GPU systems.  How well does it
> scale?

Way back we discussed the needed precision in the multipliers and 
reciprocals for the 3D engine, and it became quite clear that we'd have 
to chop up the trapezoids that we're rendering into smaller bits. 
Essentially, we'd tile the screen to keep the spans small and limit 
accumulation of roundoff errors.

Once you have that, having a separate renderer for each tile seems easy 
enough. Tiling like that has been done on video cards before (PowerVR 
was the first consumer-level card to do it, I think), and it's being 
done on a larger scale with tiled display walls (one machine for every 
couple of monitors).
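To make the idea concrete, here's a hand-waving Python sketch of the 
tiling scheme: chop the screen into fixed-size tiles and deal them out 
round-robin to N renderers. The function names and the 64x64 tile size 
are made up for the example, not a proposal for the actual hardware.

```python
def make_tiles(width, height, tile_w, tile_h):
    """Yield (x0, y0, x1, y1) bounds for each screen tile."""
    for y in range(0, height, tile_h):
        for x in range(0, width, tile_w):
            yield (x, y, min(x + tile_w, width), min(y + tile_h, height))

def assign_tiles(tiles, n_renderers):
    """Round-robin assignment: renderer i gets every i-th tile."""
    buckets = [[] for _ in range(n_renderers)]
    for i, tile in enumerate(tiles):
        buckets[i % n_renderers].append(tile)
    return buckets

# A 640x480 screen in 64x64 tiles, spread over 4 renderers.
buckets = assign_tiles(make_tiles(640, 480, 64, 64), 4)
```

Round-robin also spreads the load a bit: neighbouring tiles (which tend 
to have similar amounts of geometry) end up on different renderers.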

Essentially, it's a blackboard style architecture, or even a tuple space 
kind of thing if you just DMA the command sets to a separate piece of 
memory and have the processors scoop them up and execute them whenever 
they are available.
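In Python terms, the tuple-space idea looks roughly like this: the host 
drops command sets into a shared pool, and each renderer thread scoops 
up whatever is available and executes it. A queue stands in for the 
piece of DMA'd memory, and the command format is invented for the 
sketch.

```python
import queue
import threading

commands = queue.Queue()   # the shared "blackboard" of command sets
results = queue.Queue()    # who executed what, for inspection

def renderer(renderer_id):
    while True:
        cmd = commands.get()       # scoop up the next available command
        if cmd is None:            # sentinel: no more work for us
            break
        # "execute" the command: here we just record who ran it
        results.put((renderer_id, cmd))

threads = [threading.Thread(target=renderer, args=(i,)) for i in range(4)]
for t in threads:
    t.start()

for tile in range(16):             # host side: post 16 tile commands
    commands.put(("render_tile", tile))
for _ in threads:
    commands.put(None)             # one sentinel per renderer
for t in threads:
    t.join()
```

The nice property is that nobody assigns work up front: a renderer that 
finishes a cheap tile early just grabs the next command, so load 
balancing falls out for free.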

The main problem, it seems to me, is getting all these processors to 
access the framebuffer at the same time. Then again, the memory would 
just have to be fast enough to keep up with the renderers; you'd have 
the same problem with a single very fast GPU.

> How much extra work is it to create a multiple GPU 
> system?  Would it be feasible for OGP to go this route?  If
> we can, this could allow us to be *far* more competitive
> while keeping chip fab costs down.

Right, I've been thinking about this in the context of a completely free 
PC. Build a ground plate that supplies power and cooling, and then 
stack a bunch of cubes containing a CPU (at, say, 586 level of 
performance), some memory, and some fast interconnect to all sides on 
top of it. Some cubes would have external I/O on them. Need more 
computing power, simply add more cubes, which would be cheap 
individually because they would be made in large volumes.

The challenge would of course be the operating system, because you're 
not going to hand-rewrite your software to run efficiently on your 
particular topology of cubes, so it would have to be partitioned 
dynamically. Essentially it'd be a microcluster, with all the 
advantages and disadvantages that come with it. But that's something 
for the other mailing list, I guess.

> It looks like ray tracing and radiosity are going to become
> more and more important.  Does OGP need to do anything to
> be ready for this?  (e.g. architecture to support it)

It's been a decade or so since I've dealt with those, but let's see what 
I can remember..  Back then radiosity was cool because it was used in 
Quake II as a (very slow) preprocessing step for calculating lightmaps. 
IIRC, the main part of a radiosity calculation is computing the 
transfer function (the "form factor"). Given two polygons, it tells you 
how much they "see" of each other, and you use that to figure out how 
much of the light radiated by one ends up on the other polygon. It's 
linear algebra, probably a bunch of dot and cross products. You do that 
for each pair 
of polygons to calculate a transfer matrix, and then take the initial 
luminosities of the polygons and multiply them by the matrix repeatedly 
until you get to a steady state, or until you get to the shipping 
deadline on your game. I'm not sure about the details, but it sounds 
about right. So, maybe we should explore that DSP idea again.
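The iteration part, at least, is simple enough to sketch. Here's a toy 
3-patch version in Python: F holds the made-up transfer factors between 
patches, E is the light each patch emits on its own, and we keep 
gathering reflected light until the values settle down. All the numbers 
are invented; the point is just the shape of the computation.

```python
# F[i][j]: fraction of light leaving patch j that reaches patch i
F = [
    [0.0, 0.4, 0.3],
    [0.4, 0.0, 0.3],
    [0.3, 0.3, 0.0],
]
E = [1.0, 0.0, 0.0]        # only patch 0 emits light itself
rho = [0.5, 0.8, 0.8]      # reflectivity of each patch

def radiosity_step(B):
    """One gather: each patch picks up light reflected by the others."""
    return [
        E[i] + rho[i] * sum(F[i][j] * B[j] for j in range(len(B)))
        for i in range(len(B))
    ]

B = E[:]                   # start from just the emitted light
for _ in range(100):       # iterate toward steady state
    B = radiosity_step(B)
```

Each step is a matrix-vector multiply plus a vector add, which is 
exactly the kind of mult-add streaming a DSP is good at.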

If you want correct shadows, you also have to take any objects in 
between the two faces into account for the transfer function, which is 
where the ray tracing part comes from. I think most radiosity renderers 
from that era would just shoot one or a few rays between the polygons 
and multiply the result by the proportion that got through, Monte Carlo 
style. Or you can forgo the radiosity strategy completely and do 
everything by ray tracing. I think they also use ray tracing for real 
time 3D sound.
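The Monte Carlo visibility trick is also easy to sketch: pick random 
points on the two patches, shoot a segment between them, and count what 
fraction gets past the occluder. The scene below (two unit squares 
facing each other with a sphere in between) and all the numbers are 
made up for the example.

```python
import random

random.seed(42)

def blocked(p, q, center, radius):
    """Does the segment from p to q pass through the sphere?"""
    d = [q[i] - p[i] for i in range(3)]          # segment direction
    m = [p[i] - center[i] for i in range(3)]     # p relative to center
    dd = sum(x * x for x in d)
    t = -sum(m[i] * d[i] for i in range(3)) / dd # closest approach
    t = max(0.0, min(1.0, t))                    # clamp to the segment
    c = [m[i] + t * d[i] for i in range(3)]      # closest point - center
    return sum(x * x for x in c) < radius * radius

def visibility(n_rays, center, radius):
    """Fraction of random rays between the two patches that get through."""
    passed = 0
    for _ in range(n_rays):
        p = (random.random(), random.random(), 0.0)  # point on patch A
        q = (random.random(), random.random(), 1.0)  # point on patch B
        if not blocked(p, q, center, radius):
            passed += 1
    return passed / n_rays

v = visibility(2000, center=(0.5, 0.5, 0.5), radius=0.25)
```

You'd then multiply the transfer factor for that pair of patches by v. 
More rays means less noise, at the usual Monte Carlo cost.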

Anyway, it seems that it's all linear algebra, lots of adds, mults, and 
mult-adds. And that this too could be parallelised, come to think of 
it...

Cheers,

Lourens, who should really get started on his LinuxTag presentation 
rather than writing long posts about parallel graphics hardware :-)


_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
