On Saturday 24 May 2008 12:12:27 Dieter wrote:
> It looks like ATI plans to take a similar path with GPUs as with
> CPUs. Rather than keep making GPUs bigger and bigger, with resulting
> increases in cost, power consumption and heat, it looks like they
> plan to make smaller GPUs and use more than one together to build
> high-end products. The little I've read indicates that they will be
> using multiple dies.
>
> This approach has some manufacturing advantages. If you can build a
> range of products with a single type of die, it would cost less to
> manufacture: only one mask to have made, and larger quantities of a
> single chip. With smaller dies, a defect would spoil a smaller
> percentage of the wafer, so yield would increase.
>
> I know a bit about SMP, but close to nothing about the
> Crossfire/SLI style multiple-GPU systems. How well does it scale?
Way back we discussed the precision needed in the multipliers and reciprocals for the 3D engine, and it became quite clear that we'd have to chop the trapezoids we're rendering into smaller bits. Essentially, we'd tile the screen to keep the spans short and limit the accumulation of round-off errors. Once you have that, having a separate renderer for each tile seems easy enough. Tiling like that has been done on video cards before (PowerVR was the first consumer-level card that did it, I think), and it's being done on a larger scale with tiled display walls (one machine for every couple of monitors).

Essentially, it's a blackboard-style architecture, or even a tuple-space kind of thing if you just DMA the command sets to a separate piece of memory and have the processors scoop them up and execute them whenever they become available. The main problem, it seems to me, is getting all these processors to access the framebuffer at the same time. Then again, the memory would just have to be fast enough to keep up with the renderers; you'd have the same problem with a single very fast GPU.

> How much extra work is it to create a multiple GPU system? Would it
> be feasible for OGP to go this route? If we can, this could allow us
> to be *far* more competitive while keeping chip fab costs down.

Right, I've been thinking about this in the context of a completely free PC. Build a ground plate that supplies power and cooling, and then stack on top of it a bunch of cubes, each containing a CPU (at, say, 586 level of performance), some memory, and fast interconnects to all sides. Some cubes would have external I/O on them. Need more computing power? Simply add more cubes, which would be cheap individually because they would be made in large volumes. The challenge would of course be the operating system, because you're not going to hand-rewrite your software to run efficiently on your particular topology of cubes, so the work would have to be partitioned dynamically.
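Coming back to the tiling idea for a moment, the command-queue scheme could be sketched roughly like this. All the names and sizes here are made up for illustration; the point is just that each primitive gets "DMA'd" into the queue of every tile its bounding box touches, and a per-tile renderer can then drain its own queue independently:

```python
# Hypothetical sketch of the tile-binning scheme: bin each primitive
# into the command queue of every screen tile its bounding box
# overlaps.  Tile size and screen dimensions are assumptions.

TILE_SIZE = 64                 # pixels per tile edge (assumption)
SCREEN_W, SCREEN_H = 640, 480  # likewise

def tiles_for_bbox(x0, y0, x1, y1):
    """Yield (tx, ty) coordinates of every tile a bounding box overlaps."""
    for tx in range(x0 // TILE_SIZE, x1 // TILE_SIZE + 1):
        for ty in range(y0 // TILE_SIZE, y1 // TILE_SIZE + 1):
            yield tx, ty

# One command queue per tile -- the "tuple space" the renderers scoop from.
queues = {}

def submit(prim, bbox):
    """DMA a primitive's command set into every tile queue it touches."""
    for tile in tiles_for_bbox(*bbox):
        queues.setdefault(tile, []).append(prim)

submit("triangle A", (10, 10, 100, 50))     # straddles two tiles
submit("triangle B", (600, 400, 630, 470))  # straddles two tiles vertically
```

The nice property is that once the queues are filled, the per-tile renderers never have to talk to each other; only the binning step sees the whole screen.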
Essentially, that stack of cubes would be a microcluster, with all the advantages and disadvantages that come with it. But that's something for the other mailing list, I guess.

> It looks like ray tracing and radiosity are going to become more and
> more important. Does OGP need to do anything to be ready for this?
> (e.g. architecture to support it)

It's been a decade or so since I've dealt with those, but let's see what I can remember. Back then radiosity was cool because it was used in Quake II as a (very slow) preprocessing step for calculating lightmaps. IIRC, the main part of a radiosity calculation is computing the transfer function: given two polygons, it tells you how much they "see" of each other, which in turn tells you how much of the light radiated by one ends up on the other polygon. It's linear algebra, probably a bunch of dot and cross products. You do that for each pair of polygons to build a transfer matrix, and then take the initial luminosities of the polygons and multiply them by the matrix repeatedly until you reach a steady state, or until you reach the shipping deadline on your game. I'm not sure about the details, but it sounds about right. So maybe we should explore that DSP idea again.

If you want correct shadows, you also have to take any objects between the two faces into account in the transfer function, which is where the ray-tracing part comes in. I think most radiosity renderers from that era would just shoot one or a few rays between the polygons and multiply the result by the proportion that got through, Monte Carlo style. Or you can forgo the radiosity strategy completely and do everything by ray tracing. I think they also use ray tracing for real-time 3D sound. Anyway, it seems that it's all linear algebra: lots of adds, mults, and mult-adds. And that this too could be parallelised, come to think of it...
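The "multiply by the matrix until steady state" part could be sketched like this. The form factors, emissions, and reflectivities below are made-up numbers, not a real scene; the point is just the shape of the iteration, B = E + rho * (F x B), repeated until the luminosities settle:

```python
# Hedged sketch of the radiosity iteration: given a (made-up)
# form-factor matrix F, repeatedly bounce light between patches
# until the luminosities stop changing.

# F[i][j] = fraction of patch i's view occupied by patch j (assumed values).
F = [
    [0.0, 0.3, 0.2],
    [0.3, 0.0, 0.4],
    [0.2, 0.4, 0.0],
]
E = [1.0, 0.0, 0.0]    # initial emission: only patch 0 is a light source
rho = [0.5, 0.7, 0.6]  # reflectivity of each patch (assumption)

def radiosity(F, E, rho, iterations=100):
    """Iterate B = E + rho * (F x B); converges when reflectivities < 1."""
    n = len(E)
    B = E[:]
    for _ in range(iterations):
        B = [E[i] + rho[i] * sum(F[i][j] * B[j] for j in range(n))
             for i in range(n)]
    return B

B = radiosity(F, E, rho)  # every patch ends up brighter than its emission
```

Each step is just a matrix-vector multiply plus an add, i.e. exactly the mult-add heavy workload described above, and each row of the multiply is independent, so it parallelises trivially.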
Cheers, Lourens, who should really get started on his LinuxTag presentation rather than writing long posts about parallel graphics hardware :-)
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
