On Oct 25, 2006, at 11:51 AM, Bryan Sant wrote:
> I always just go with threads. But then again, I do a lot of desktop
> software, where interaction between components is frequent and shared
> memory is more efficient, reliable, and convenient than message
> passing via pipes or some other IPC mechanism. I'm not saying that
> Levi's points aren't valid; on the contrary, they are. Memory space
> protection provided by a process is valuable... Valuable if you're
> using C or some other language that can stomp on or leak memory. If
> you're using a language with memory management (Perl, C#, Java, Lisp),
> then the protection provided by processes has little value and some
> downsides.
You're conflating two different problems here. First, there is the
problem of memory safety: C and C++ make it fairly easy to write into
memory you shouldn't, while memory-safe languages don't let you do it
at all. Memory safety requires that all
primitive memory allocation and especially deallocation be managed by
the language runtime, typically via a garbage collector.
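To make "stomp on memory" concrete, here's a tiny C sketch (entirely
illustrative, names mine) of the kind of out-of-bounds write a
memory-safe language would refuse to perform:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        int  flag = 0;     /* an innocent neighbor on the stack */
        char buf[8];

        /* 15 characters plus the NUL terminator: 16 bytes written
           into an 8-byte buffer. C compiles this without complaint;
           the overflow may clobber flag, the return address, or
           anything else nearby. The behavior is undefined. */
        strcpy(buf, "fifteen chars!!");

        printf("flag = %d\n", flag);  /* may print garbage, crash, or "work" */
        return 0;
    }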
The second problem is the nondeterministic interleaving of execution
that exists in the shared-memory concurrency model. Every heap
variable is, by default, shared by all threads. Since the scheduler
can switch between threads at arbitrary times, a program that uses
heap variables naively will almost certainly behave unpredictably and
not do what you want. Enter locks. They let you re-serialize
execution in certain regions of your program, so that only one thread
at a time can run there. This solves one problem, but creates a few
more.
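Here's what that naive use of shared state looks like in practice: a
minimal pthreads sketch (all names mine; a shared global stands in
for any shared heap data) where two threads bump the same counter
with no coordination:

    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;   /* shared by every thread, by default */

    static void *bump(void *arg)
    {
        int i;
        (void)arg;
        for (i = 0; i < 1000000; i++)
            counter++;         /* the scheduler can interleave these updates */
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, bump, NULL);
        pthread_create(&b, NULL, bump, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        /* "Should" print 2000000; on a real machine it usually prints
           something less, and something different on each run. */
        printf("counter = %ld\n", counter);
        return 0;
    }

Compile with -pthread and run it a few times; the nondeterminism
shows up immediately.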
First of all, you must remember to put locks in all the right
places. Some higher-level languages help out quite a bit with this,
but if you're doing raw pthreads in C, it's pretty easy to screw up
and create a race condition, where nondeterminism creeps into your
program again. And in any language higher-level than assembly, it's
entirely possible that an operation that looks atomic on the surface
(i.e., can't be broken down any further in that language) actually
consists of many machine operations, so the scheduler could switch to
a different thread /in the middle/ of that operation. Doing shared-
memory concurrency safely in a high-level language requires a lot of
information about the implementation of that language, which kind of
defeats the purpose.
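To make that concrete: counter++ in the sketch above looks atomic in
C, but it compiles down to a load, an add, and a store, and the
scheduler can preempt between any two of them. The standard fix is a
mutex around the whole read-modify-write (again, names mine):

    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;
    static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

    static void *bump(void *arg)
    {
        int i;
        (void)arg;
        for (i = 0; i < 1000000; i++) {
            pthread_mutex_lock(&counter_lock);
            counter++;   /* the load/add/store is now indivisible,
                            as far as other threads can tell */
            pthread_mutex_unlock(&counter_lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, bump, NULL);
        pthread_create(&b, NULL, bump, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("counter = %ld\n", counter);  /* 2000000, every run */
        return 0;
    }

Now the count comes out right every time, provided you remembered the
lock at every other place counter is touched; that is exactly the
"all the right places" problem.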
Second, you are hampered in your ability to create new abstractions.
When multiple shared resources are involved, you must be careful to
obtain and release the locks in a consistent order. This is a pain:
it creates concerns that cut across abstraction barriers, and it is
generally an impediment to good software design.
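A sketch of the problem, using a made-up pair of accounts, each
guarded by its own mutex (all names and structure hypothetical):

    #include <pthread.h>

    struct account {
        pthread_mutex_t lock;
        long            balance;
    };

    /* Deadlock-prone: transfer(a, b) and transfer(b, a) running at
       the same time can each take its first lock, then wait forever
       for the other's. */
    void transfer_naive(struct account *from, struct account *to, long amount)
    {
        pthread_mutex_lock(&from->lock);
        pthread_mutex_lock(&to->lock);
        from->balance -= amount;
        to->balance   += amount;
        pthread_mutex_unlock(&to->lock);
        pthread_mutex_unlock(&from->lock);
    }

    /* One conventional fix: impose a single global order on lock
       acquisition, here by address. */
    void transfer(struct account *from, struct account *to, long amount)
    {
        struct account *first  = (from < to) ? from : to;
        struct account *second = (from < to) ? to   : from;
        pthread_mutex_lock(&first->lock);
        pthread_mutex_lock(&second->lock);
        from->balance -= amount;
        to->balance   += amount;
        pthread_mutex_unlock(&second->lock);
        pthread_mutex_unlock(&first->lock);
    }

The ordered version works, but notice that the locking discipline is
now part of every caller's contract; the concern has leaked across
the abstraction boundary.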
Finally, locks can create performance issues. The purpose of a lock
is to serialize part of your program, and if there are too many of
them, your parallelism drops through the floor and you end up with an
effectively serial program. In the worst case, you can deadlock and
bring the program to a halt. Getting good performance out of locks
while eliminating 100% of race conditions and deadlocks is a very
hard thing to do. As the amount of concurrency goes up, the
performance penalty of locks and the chance of hitting a lurking race
condition go up, too.
So, I hope that made the distinction between the problems caused by
lack of memory safety and the problems caused by shared-state
concurrency clear. Regardless of the problems, both are still
sometimes the right solution. They just shouldn't be the DEFAULT
solution for a programmer who wants to write correct code, in
general. Some particular high-level languages and programming
environments make every other concurrency paradigm at least as
difficult as shared-state threading; programmers in such environments
are simply screwed, and should demand better tools.
> You can achieve a much more natural programming model by
> using threads and semaphores than processes and marshaled messages.
What feels natural to do is largely defined by the language you are
using, so that is only true for a subset of languages. I would
argue that languages that make shared-state concurrency the most
natural way to approach a problem ought to be redesigned so that
shared-state concurrency is well-supported when necessary, but
alternatives feel just as natural, or preferably more so.
You have also left out one important option from your list, though:
threads that by default share nothing, but can explicitly ask for
regions of memory to be shared. Combine that with software
transactional memory (a.k.a. optimistic or lock-free concurrency),
message passing, and deterministic concurrency, each used where it is
appropriate, and you can pick the tool that suits your problem and
eliminate the possibility of large classes of programming errors,
just as memory protection eliminates another large class of
programming errors.
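To give a flavor of the optimistic style: instead of locking, you
read the current value, compute the update, and try to publish it
with a compare-and-swap, retrying if another thread got there first.
A minimal C sketch using GCC's __sync builtins (my choice of
primitive; full STM generalizes this retry pattern from one word to
whole transactions):

    static long counter = 0;

    void increment(void)
    {
        long seen, next;
        do {
            seen = counter;     /* optimistic read */
            next = seen + 1;    /* compute the update */
            /* Publish only if nobody changed counter in the
               meantime; otherwise loop and retry. */
        } while (!__sync_bool_compare_and_swap(&counter, seen, next));
    }

There are no locks here, so there is nothing to deadlock on;
contention costs a retry instead of a wait.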
--Levi
/*
PLUG: http://plug.org, #utah on irc.freenode.net
Unsubscribe: http://plug.org/mailman/options/plug
Don't fear the penguin.
*/