On 7/13/10 1:08 PM, Andrew Whitworth wrote:
> I've been talking to Chandon a lot about his GSoC project, and one
> problem that he's going to start running into is the issue of sharing
> pointers (specifically PMCs) across threads, and the mechanisms
> necessary to lock and protect them. Unfortunately, implementing a
> whole system of synchronization or locking primitives is probably
> outside the scope of his GSoC project, so we can't rely on something
> like that being designed and implemented by the end of his project
> work in August.
>
> My question to the larger parrot community is this: Assuming that in a
> month we have a more-or-less working and usable threading
> implementation in Parrot, how do we want to handle sharing of data?
> As safely, cleanly, and with as little complexity as possible.
>
> Do we want a global interpreter lock, like Python uses?
The GIL is the single biggest problem with CPython today, and if they
could get rid of it without breaking a ton of existing code, they would.
So, definitely not that.
> A new implementation of STM?
STM isn't a magic silver bullet either. Even the greatest advocates of
STM (e.g. Simon Peyton Jones) will tell you that STM doesn't actually
resolve the problem of deadlocks. (And since deadlocks are one of the
central problems STM was trying to eliminate, that's saying a lot.)
There also tends to be a high cost associated with re-executing the
transaction code repeatedly until it succeeds.
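To make the re-execution cost concrete, here is a toy sketch in Python
(not Parrot code, and not a real STM). The names TVar, atomically and
run_demo are hypothetical, invented for this illustration. Each
"transaction" works on a snapshot and commits only if nobody else
committed in the meantime; otherwise the whole transaction body runs
again.

```python
import threading


class TVar:
    """Toy transactional variable: a value plus a version counter."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        # Held only while reading a snapshot or committing, never while
        # the transaction body itself is running.
        self._commit_lock = threading.Lock()


def atomically(tvar, transaction):
    """Run transaction(old_value) -> new_value, retrying until it commits.

    Returns the number of attempts, so the retry cost is visible.
    """
    attempts = 0
    while True:
        attempts += 1
        with tvar._commit_lock:
            seen_version, snapshot = tvar.version, tvar.value
        new_value = transaction(snapshot)  # speculative work, no lock held
        with tvar._commit_lock:
            if tvar.version == seen_version:  # nobody committed meanwhile
                tvar.value = new_value
                tvar.version += 1
                return attempts
        # Conflict: another thread committed first, so the work is redone.


def run_demo(n_threads=4, increments=200):
    """Return (final counter value, total attempts across all threads)."""
    counter = TVar(0)
    attempts_per_thread = []

    def worker():
        total = 0
        for _ in range(increments):
            total += atomically(counter, lambda v: v + 1)
        attempts_per_thread.append(total)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value, sum(attempts_per_thread)
```

The final value always equals n_threads * increments, but the total
number of attempts can be higher. Every attempt beyond that count is a
transaction body that ran and was thrown away.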
> COW clones for shared PMCs?
The case where two threads have independent copies of a variable really
isn't a problem. (COW might save a bit of memory, but you might as well
just make a copy.) The difficult and costly part of truly shared
variables is when two threads are making changes to a variable and need
to see each other's changes.
> A library of
> locking primitives (probably with some version of a limited GIL to
> protect interpreter-global data)?
There are several things you could mean by this, but I'm guessing you
mean a set of thread-safe PMCs that can be shared across threads. It's
a possibility.
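As a rough picture of what a thread-safe shared type could look like,
here is a minimal Python sketch (not a PMC, and not Parrot's API). The
class SharedCounter and the helper run_workers are hypothetical names
made up for this example. The point is only that the locking lives
inside the type, so callers never touch a lock directly.

```python
import threading


class SharedCounter:
    """Hypothetical thread-safe type: every access goes through its own lock."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def increment(self, by=1):
        # The read-modify-write happens entirely under the lock.
        with self._lock:
            self._value += by
            return self._value

    def get(self):
        with self._lock:
            return self._value


def run_workers(n_threads=4, increments=1000):
    """Share one SharedCounter across threads and return its final value."""
    counter = SharedCounter()

    def worker():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.get()
```

Calling run_workers(4, 1000) returns 4000, because no increment can be
lost while the lock is held.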
> Or, do we want to maybe start planning for a new architecture entirely
> and use a message-passing system like erlang?
That is the direction the most recent round of refactors and the current
PDD are heading, and the direction we'll continue heading through Lorito.
> This could be
> interesting, but does nothing to make threading usable in parrot in
> the next few months.
It's a bit strong to say "unusable".
Make a list of the features Chandon needs (not wishlist, but absolute
essentials for his GSoC project), and we'll make sure they work. Though,
does he really even need shared variables for his project? The idea was
to prototype a new style of threading; the GSoC project doesn't require
him to drop it in as a wholesale replacement for the current implementation.
I've been thinking a lot about concurrency lately for Lorito. In the
longer term:
We don't have a single stock answer of "X concurrency model will rule
the world" because no one does. It would be a really, really bad idea to
sell our souls to any one concurrency model available today, because
they're all broken in some way. The Parroty way is to provide the
building blocks for multiple different approaches, without dictating one.
The wider world has thrown up its hands at concurrency and gone to the
cloud. Cloud architectures are essentially the extreme case of the
Erlang concurrency model, independent units of code where you don't have
to think about parallelism, combined with message passing remotely over
the network rather than within a single process on a single machine.
One feature that would give us an advantage in this model is a
lightweight interpreter with rapid startup time (hey presto, Lorito).
Internal message passing is the other most important model to support,
and should be easy to integrate with external message passing.
The other two are shared and unshared threads. Threads with no shared
variables are easy. Threads with shared variables (that aren't
message-passing) are more complex in that there are multiple ways to do
it, and we need to decide which ways to support, which ways to allow
(i.e. you can use it, but not together with some other sets of features
or some other concurrency models), which we can emulate, and if there
are any models we want to explicitly disallow.
I'd put chromatic's "unshared by default with explicit shared variables"
in the "supported" category. We can gain some added safety by only
allowing specific thread-safe or locking variable types to be shared,
and by segmenting shared-variable memory off from the regular pools.
This form of sharing can safely interact with message-passing (which is
basically just unshared, plus one shared message channel).
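
The "unshared, plus one shared message channel" idea can be sketched in
Python as follows (an illustration only, not Parrot code). The function
names worker and sum_with_workers are hypothetical. Each thread keeps
its state in local variables, and the only objects the threads have in
common are the queues, which Python's standard library already makes
thread-safe.

```python
import queue
import threading


def worker(inbox, outbox):
    """Sum the numbers received on inbox, then report the total on outbox."""
    local_total = 0  # private to this thread, never shared
    while True:
        msg = inbox.get()
        if msg is None:  # sentinel: no more work for this worker
            break
        local_total += msg
    outbox.put(local_total)


def sum_with_workers(numbers, n_workers=3):
    """Distribute numbers to workers over a queue and combine their totals."""
    inbox, outbox = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(inbox, outbox))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for n in numbers:
        inbox.put(n)
    for _ in threads:
        inbox.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return sum(outbox.get() for _ in threads)
```

Here sum_with_workers(range(1, 101)) returns 5050. This sketch uses one
queue in each direction for clarity; no user-level locking is needed
because nothing else is shared.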
Python's GIL and Perl 5 ithreads I'd put in the "allowed" category, i.e.
we make it possible, but if you're going that way, it's a whole-hog
option. Don't expect it to play well with other concurrency models
running in the same interpreter at the same time.
Allison
_______________________________________________
http://lists.parrot.org/mailman/listinfo/parrot-dev