Stephen A. Lawrence wrote:
Jed Rothwell wrote:
Robin van Spaandonk wrote:
Somewhat off topic, but see:
http://www.intel.com/research/platform/terascale/teraflops.htm?iid=newstab+supercomputing
I wonder what they charge for it?
It is NFS (Not For Sale). It is just a prototype device. It does not
do any useful computation, but it does useless work at
record-breaking speed. (Which, come to think of it, is how you might
describe Windows.) This one was designed to test the new "mesh"
interconnections between the cores. This interconnection scheme can
be scaled up to thousands of cores, apparently.
Physically, a mesh scales arbitrarily, so hardware designers love it.
In terms of software algorithms which use mesh communication, however,
the scaling is horrible. Traffic density at the middle of the mesh grows
polynomially with the number of nodes, faster than the number of links
crossing the middle, so you can't scale very far before the central
links are saturated.
This is something you can _see_ in action. Just drive along Interstate
80 near Chicago and look at all the trucks -- and look at the license
plates, and see how many are local. Very few.
The United States is mesh connected, and there's a bit of a choke point
just south of the Great Lakes where all traffic between New England and
the western states must choose among a relative handful of reasonably
direct routes. If we tried to double the size of the country in both
dimensions (NS and EW), while keeping the same sort of distribution
network and the same population density everywhere (and extending that
into the newly annexed regions), the highway system in the middle would
most likely jam up completely. (For this exercise, assume we could
magically turn the adjacent oceans into dry land. If we extended
everything over the "new territories" with unchanged densities, we'd
have four times the area, and we'd consequently quadruple the
population, and if we kept the same highly nonlocal distribution
patterns, truck traffic in the middle of the country would also roughly
quadruple.)
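The 4x factors above are the post's; as a gloss (my arithmetic, not in the original), note that the cut through the middle of the country only doubles in length, so each road through the choke point ends up carrying twice its old load:

```python
# The country-doubling thought experiment as arithmetic.  The area,
# population, and traffic factors are from the post; the roads factor
# and per-road conclusion are my own gloss.

area = 4                  # double N-S and double E-W
population = 4            # same density everywhere
crossing_traffic = 4      # same nonlocal shipping patterns
roads_through_middle = 2  # the N-S cut only doubles in length

load_per_road = crossing_traffic / roads_through_middle
print("each road carries", load_per_road, "times today's traffic")
```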
To break
the bottleneck you need to add "long lines" so distant nodes can
communicate, and at that point it's not a simple mesh anymore.
The diameter is also pretty bad compared to fancier architectures. (Of
course, this is the flip side of the link saturation problem, as sending
a message along paths with many hops uses a lot of interconnect resources.)
Adding dimensions to the mesh improves things. Wrapping around the
ends, to turn the mesh into a hypertorus, improves things quite a lot,
but it requires connections between opposite sides of the mesh. That's how
BlueGene is architected, by the way.
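A quick sketch of what the wrap-around links buy you (my construction, using worst-case hop count, i.e. network diameter, for an n x n grid):

```python
# Compare worst-case hop counts for an n x n mesh versus the same
# grid with wraparound links (a 2-D torus).  The wrap links roughly
# halve the diameter, since no destination is ever more than n//2
# hops away in either dimension.

def mesh_diameter(n):
    return 2 * (n - 1)      # corner to opposite corner

def torus_diameter(n):
    return 2 * (n // 2)     # wraparound halves each dimension

for n in (8, 16, 32):
    print(n, "x", n, ": mesh", mesh_diameter(n), "hops, torus",
          torus_diameter(n), "hops")
```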
They are still working
on the memory, which is some secret new configuration. See:
http://www.theinquirer.net/default.aspx?article=37572
They figure it will be available in a practical version in about 5
years.
Japanese researchers last year demonstrated a 512-core math
coprocessor that may achieve 2 PFlops next year. See:
http://www.channelregister.co.uk/2006/11/06/japan_512-core_co-pro/
- Jed