Stephen A. Lawrence wrote:


Jed Rothwell wrote:
Robin van Spaandonk wrote:

Somewhat off topic, but see:

http://www.intel.com/research/platform/terascale/teraflops.htm?iid=newstab+supercomputing


I wonder what they charge for it?

It is NFS (Not For Sale). It is just a prototype device. It does not
do any useful computation, but it does useless work at
record-breaking speed. (Which, come to think of it, is how you might
describe Windows.) This one was designed to test the new "mesh"
interconnections between the cores. This interconnection scheme can
be scaled up to thousands of cores, apparently.

Physically, a mesh scales to arbitrary size, so hardware designers love it.

In terms of software algorithms which use mesh communication, however, the scaling is horrible. With uniform traffic, the load on the links in the middle of a 2D mesh grows roughly as the square root of the number of nodes, so you can't scale very far before the central links are saturated.
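
Here's a rough back-of-the-envelope sketch in Python (my own illustration, assuming a square k x k mesh and uniform all-to-all traffic -- not anything specific to Intel's design):

    import math

    # With uniform traffic, about half of all messages must cross the
    # central "bisection" of the mesh, but only k links cross it, so the
    # load per central link grows roughly as sqrt(n).
    def central_link_load(n_nodes, msgs_per_node=1.0):
        k = math.isqrt(n_nodes)                      # mesh is k x k
        bisection_traffic = 0.5 * n_nodes * msgs_per_node
        bisection_width = k                          # links crossing the middle
        return bisection_traffic / bisection_width

    for n in (16, 64, 256, 1024, 4096):
        print(n, central_link_load(n))
    # prints 2.0, 4.0, 8.0, 16.0, 32.0 -- per-link load keeps climbing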

This is something you can _see_ in action. Just drive along Interstate 80 near Chicago and look at all the trucks -- and look at the license plates, and see how many are local. Very few.

The United States is mesh connected, and there's a bit of a choke point just south of the Great Lakes where all traffic between New England and the western states must choose among a relative handful of reasonably direct routes. If we tried to double the size of the country in both dimensions (NS and EW), while keeping the same sort of distribution network and the same population density everywhere (and extending that into the newly annexed regions), the highway system in the middle would most likely jam up completely. (For this exercise, assume we could magically turn the adjacent oceans into dry land.) Doubling both dimensions quadruples the area and hence the population, and with the same highly nonlocal distribution patterns, truck traffic across the middle of the country would roughly quadruple as well -- while the number of direct routes through that choke point would only double.


To break the bottleneck you need to add "long lines" so distant nodes can communicate directly, and at that point it's not a simple mesh anymore.

The diameter -- the worst-case number of hops between two nodes -- is also pretty bad compared to fancier architectures; for a 2D mesh it grows as the square root of the node count. (Of course, this is the flip side of the link saturation problem, as sending a message along a path with many hops ties up a lot of interconnect resources.)
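
To put rough numbers on that (again my own illustrative comparison, assuming a square 2D mesh versus a hypercube with the same node count):

    import math

    def mesh_diameter(n):
        # corner-to-corner hops in a sqrt(n) x sqrt(n) mesh
        k = math.isqrt(n)
        return 2 * (k - 1)

    def hypercube_diameter(n):
        # hops across a hypercube with n = 2**d nodes
        return int(math.log2(n))

    for n in (64, 256, 1024, 4096):
        print(n, mesh_diameter(n), hypercube_diameter(n))
    # 64: 14 vs 6, 256: 30 vs 8, 1024: 62 vs 10, 4096: 126 vs 12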

Adding dimensions to the mesh helps. Wrapping around the ends, to turn the mesh into a hypertorus, helps quite a lot, but it requires connections between opposite edges of the mesh. That's how BlueGene is architected, by the way.
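
A quick sketch of what the wrap-around buys you (my own numbers, for a square k x k network; BlueGene's actual interconnect is a 3D torus with plenty of other details):

    def mesh_diameter(k):
        # worst case for a k x k mesh: corner to corner
        return 2 * (k - 1)

    def torus_diameter(k):
        # wrap-around links halve the worst-case distance in each dimension
        return 2 * (k // 2)

    for k in (8, 16, 32):
        print(k, mesh_diameter(k), torus_diameter(k))
    # 8: 14 vs 8, 16: 30 vs 16, 32: 62 vs 32

The wrap-around links also double the bisection width, which is the other half of the win.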


They are still working
on the memory, which is some secret new configuration. See:

http://www.theinquirer.net/default.aspx?article=37572

They figure it will be available in a practical version in about 5
years.

Japanese researchers last year demonstrated a 512-core math
coprocessor that may achieve 2 PFlops next year. See:

http://www.channelregister.co.uk/2006/11/06/japan_512-core_co-pro/

- Jed




