On Oct 27, 2006, at 11:36 AM, Michael L Torrie wrote:
> Besides all this, computing is evolving to be distributed nowadays,
> with a non-unified memory architecture. Nodes do not share memory;
> they communicate via a protocol. There's a reason why, in
> supercomputing, MPI and other message-passing schemes are king.
> Threads obviously don't make sense in any kind of distributed
> architecture. Now I believe that OSes and computing systems will be
> designed to hide this fact from programs, allowing normal programs to
> be spread dynamically across nodes, maybe through some system that
> emulates shared memory and local devices (mapping remote ones). Even
> in a system that emulates shared memory (say, by swapping pages of
> memory across nodes), your threads may think they are accessing
> memory directly rather than copying it, but they are not. Beyond
> that, I think it's probably a bad idea to code with any particular
> assumptions about the underlying machine architecture (VM or not).
There have been efforts to build distributed shared memory systems,
but I think they are fundamentally misguided. Even with today's
high-speed, low-latency interconnect fabrics, remote memory access is
still so much slower than local access that hiding it behind an
abstraction layer is counterproductive. In order
to predict the performance of your system, you still need to know
exactly when an access is local and when it is remote. Considering
that the point of these systems is high performance, abstracting away
an important factor in performance is not particularly wise.
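To make that concrete, here's roughly what explicit remote access looks
like with MPI-2 one-sided operations. This is just a sketch, assuming an
MPI implementation with one-sided support and at least two ranks; the
point is that every remote read is spelled out in the code:

    #include <mpi.h>
    #include <stdio.h>

    /* Sketch: rank 1 exposes an int in an RMA window, rank 0 reads it
       with MPI_Get.  The remote access is explicit, so you always know
       when you are crossing the interconnect. */
    int main(int argc, char **argv)
    {
        int rank, value = 0;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 1)
            value = 42;                    /* the data lives on rank 1 */

        MPI_Win_create(&value, sizeof(int), sizeof(int),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        MPI_Win_fence(0, win);             /* open an access epoch */
        if (rank == 0) {
            int remote = 0;
            MPI_Get(&remote, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
            MPI_Win_fence(0, win);         /* complete the transfer */
            printf("rank 0 read %d from rank 1\n", remote);
        } else {
            MPI_Win_fence(0, win);         /* fences are collective */
        }

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }

In a distributed shared memory system the same read would be an
ordinary pointer dereference, and its cost would be invisible.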
This is especially true the less tightly connected your compute nodes
get. A multi-processor computer with a HyperTransport bus can probably
get away with hiding the distinction between local and remote memory
access. In a multi-node cluster connected by an InfiniBand fabric,
latency differences between local and remote access become
significant, but one can typically assume fairly low latency and
fairly high reliability and bandwidth. A cluster built on gigabit
Ethernet moves to higher latency and lower bandwidth, and a grid
system whose nodes span multiple networks makes treating remote
operations like local ones downright insane.
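Some ballpark figures make the gap obvious (order-of-magnitude guesses
on my part, not measurements):

    local DRAM access        ~100 ns     (1x)
    InfiniBand RDMA          ~5 us       (~50x local)
    gigabit Ethernet RTT     ~100 us     (~1,000x local)
    grid / WAN round trip    ~50 ms      (~500,000x local)

A factor of 50 is something you might paper over; a factor of half a
million is not.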
Add these details to the increased difficulty of programming in a
shared-state concurrency system, and it starts to look like a pretty
bad idea. There are plenty of established mechanisms for concurrency
and distribution that work well and provide a model simple enough to
reason about effectively. Letting people who are used to writing
threaded code in C/C++/Java on platforms with limited parallelism
carry those paradigms over to highly parallel systems is NOT a good
idea. Retraining them to use MPI, tuple spaces, or some other
reasonable mechanism for distributed programming is definitely worth
the effort.
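For anyone who hasn't seen it, the message-passing style isn't exotic.
A minimal MPI exchange looks something like this (a sketch, assuming at
least two ranks are launched):

    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    /* Sketch: rank 0 sends a message, rank 1 receives and prints it.
       Everything that crosses the interconnect is an explicit send or
       receive; nothing pretends to be a local memory access. */
    int main(int argc, char **argv)
    {
        int rank;
        char buf[64];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            strcpy(buf, "hello from rank 0");
            MPI_Send(buf, sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 got: %s\n", buf);
        }

        MPI_Finalize();
        return 0;
    }

That explicitness is exactly the property you want when a remote
operation costs orders of magnitude more than a local one.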
--Levi