On Oct 27, 2006, at 11:36 AM, Michael L Torrie wrote:
> Besides all this, computing is evolving to be distributed nowadays,
> with a non-unified memory architecture. Nodes do not share memory;
> they communicate via a protocol. There's a reason why, in
> supercomputing, MPI and other message-passing schemes are king.
> Threads obviously don't make sense in any kind of distributed
> architecture. Now I believe that OSes and computing systems will be
> designed to hide this fact from programs, allowing normal programs to
> be spread dynamically across nodes, maybe through some system that
> emulates shared memory and local devices (mapping remote ones). Even
> in a system that emulates shared memory (say, by swapping pages of
> memory across nodes), your threads may think they are accessing
> memory directly rather than copying it, but they are not. Beyond
> that, I think it's probably a bad idea to code with any particular
> assumptions about the underlying machine architecture (VM or not).
There have been efforts to build distributed shared memory systems,
but I think they are fundamentally misguided. Even with today's
high-speed, low-latency interconnect fabrics, remote memory access is
still so much slower than local access that hiding it behind an
abstraction layer is counterproductive. In order
to predict the performance of your system, you still need to know
exactly when an access is local and when it is remote. Considering
that the point of these systems is high performance, abstracting away
an important factor in performance is not particularly wise.
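To make that concrete, here's roughly what explicit remote access looks
like with MPI-2 one-sided operations. This is just a sketch, assuming an
MPI implementation with one-sided support and at least two ranks; the
point is that every remote read is spelled out in the code:

    #include <mpi.h>
    #include <stdio.h>

    /* Sketch: rank 1 exposes an int in an RMA window, rank 0 reads it
       with MPI_Get.  The remote access is explicit, so you always know
       when you are crossing the interconnect. */
    int main(int argc, char **argv)
    {
        int rank, value = 0;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 1)
            value = 42;                    /* the data lives on rank 1 */

        MPI_Win_create(&value, sizeof(int), sizeof(int),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        MPI_Win_fence(0, win);             /* open an access epoch */
        if (rank == 0) {
            int remote = 0;
            MPI_Get(&remote, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
            MPI_Win_fence(0, win);         /* complete the transfer */
            printf("rank 0 read %d from rank 1\n", remote);
        } else {
            MPI_Win_fence(0, win);         /* fences are collective */
        }

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }

In a distributed shared memory system the same read would be an
ordinary pointer dereference, and its cost would be invisible.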
This is especially true the less tightly connected your compute nodes
get. A multi-processor computer with a HyperTransport bus can probably
get away with hiding the distinction between local and remote memory
access. In a multi-node cluster connected by an InfiniBand fabric,
latency differences between local and remote access become
significant, but one can typically assume fairly low latency and
fairly high reliability and bandwidth. A cluster built on gigabit
Ethernet moves to higher latency and lower bandwidth, and a grid
system whose nodes span multiple networks makes treating remote
operations like local ones downright insane.
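Some ballpark figures make the gap obvious (order-of-magnitude guesses
on my part, not measurements):

    local DRAM access        ~100 ns     (1x)
    InfiniBand RDMA          ~5 us       (~50x local)
    gigabit Ethernet RTT     ~100 us     (~1,000x local)
    grid / WAN round trip    ~50 ms      (~500,000x local)

A factor of 50 is something you might paper over; a factor of half a
million is not.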
Add these details to the increased difficulty of programming in a
shared-state concurrency system, and it starts to look like a pretty
bad idea. There are plenty of established mechanisms for concurrency
and distribution that work well and provide a model simple enough to
reason about effectively. Letting people who are used to writing
threaded code in C/C++/Java on platforms with limited parallelism
carry those paradigms over to highly parallel systems is NOT a good
idea. Retraining them to use MPI, tuple spaces, or some other
reasonable mechanism for distributed programming is definitely worth
the effort.
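For anyone who hasn't seen it, the message-passing style isn't exotic.
A minimal MPI exchange looks something like this (a sketch, assuming at
least two ranks are launched):

    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    /* Sketch: rank 0 sends a message, rank 1 receives and prints it.
       Everything that crosses the interconnect is an explicit send or
       receive; nothing pretends to be a local memory access. */
    int main(int argc, char **argv)
    {
        int rank;
        char buf[64];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            strcpy(buf, "hello from rank 0");
            MPI_Send(buf, sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 got: %s\n", buf);
        }

        MPI_Finalize();
        return 0;
    }

That explicitness is exactly the property you want when a remote
operation costs orders of magnitude more than a local one.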
--Levi