On 10/27/06, Levi Pearson <[EMAIL PROTECTED]> wrote:
There have been efforts to build distributed shared memory systems,
but I think they are fundamentally misguided.  Even with today's high
speed, low-latency interconnect fabrics, remote memory access is
still so much slower than local memory access that hiding it behind
an abstraction layer is counterproductive.  In order
to predict the performance of your system, you still need to know
exactly when an access is local and when it is remote.  Considering
that the point of these systems is high performance, abstracting away
an important factor in performance is not particularly wise.

I agree with Levi.  The simplicity provided by an abstraction like
that is tempting, but the details it hides are so dramatic that the
abstraction isn't just unhelpful, it's actively harmful.  When going
off box is hundreds or thousands of times slower than local access,
you're going to want to control that explicitly as a developer.
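
Just to put rough numbers on that (ballpark figures I'm assuming, not
measurements from any particular cluster): a local DRAM access is on
the order of 100 ns, while a round trip over gigabit ethernet is
easily 100 us or more, so the penalty is roughly three orders of
magnitude.  A quick C sketch of the arithmetic:

  /* Ballpark, assumed latencies -- illustrative only, not measured. */
  #include <stdio.h>

  int main(void)
  {
      double local_dram_ns = 100.0;    /* assumed local DRAM access, ~100 ns    */
      double gige_rtt_ns = 100000.0;   /* assumed gigabit ethernet RTT, ~100 us */

      /* how many local accesses fit in one trip off the box */
      printf("remote/local penalty: ~%.0fx\n", gige_rtt_ns / local_dram_ns);
      return 0;
  }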

This is especially true the less tightly connected your compute nodes
get.  A multi-processor computer with a HyperTransport bus can
probably get away with abstracting away local vs. remote memory
access.

I believe LinuxNetworx does this, but like you said, they have
intimate control over the nodes involved and their interconnect.  It's
more similar to an integrated circuit than a loose network cluster.

In a multi-node cluster connected by an InfiniBand fabric,
latency differences between local and remote access become
significant, but one can typically assume fairly low latency and
fairly high reliability and bandwidth.   A cluster with gigabit
ethernet moves to higher latency and lower bandwidth, and a grid
system consisting of nodes spanning multiple networks makes treating
remote operations like local ones downright insane.

What he said.

Add these details to the increased difficulty of programming in a
shared-state concurrency system, and it starts to look like a pretty
bad idea.  There are plenty of established mechanisms for concurrency
and distribution that work well and provide a model simple enough to
reason about effectively.  Letting people used to writing threaded
code in C/C++/Java on a platform with limited parallelism carry their
paradigms over to highly-parallel systems is NOT a good idea in this
case.  Retraining them to use MPI, tuple space, or some other
reasonable mechanism for distributed programming is definitely worth
the effort.

                --Levi

I agree completely.
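
For anyone who hasn't poked at the MPI Levi mentions, the model really
isn't hard.  Here's a minimal sketch, assuming a standard MPI
implementation like MPICH or Open MPI (compile with mpicc, run with
mpirun -np 2): rank 0 sends one integer to rank 1.  The point is that
every off-box transfer is an explicit MPI_Send/MPI_Recv call, so you
always know when you're paying for the network.

  /* minimal MPI send/recv sketch -- two ranks, one message */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, value;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0) {
          value = 42;
          /* explicit transfer to rank 1 -- the network cost is visible here */
          MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
      } else if (rank == 1) {
          MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
          printf("rank 1 got %d from rank 0\n", value);
      }

      MPI_Finalize();
      return 0;
  }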

-Bryan

/*
PLUG: http://plug.org, #utah on irc.freenode.net
Unsubscribe: http://plug.org/mailman/options/plug
Don't fear the penguin.
*/
