RFC 178 (v1) Lightweight Threads

Perl6 RFC Librarian Wed, 30 Aug 2000 23:19:15 -0700
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Lightweight Threads

=head1 VERSION

  Maintainer: Steven McDougall <[EMAIL PROTECTED]>
  Date: 30 Aug 2000
  Mailing List: [EMAIL PROTECTED]
  Version: 1
  Number:  178

=head1 ABSTRACT

A lightweight thread model for Perl.

=over 4

=item *

All threads see the same compiled subroutines

=item *

All threads share the same global variables

=item *

Threads can create thread-local storage by C<local>izing global variables

=item *

All threads share the same file-scoped lexicals

=item *

Each thread gets its own copy of block-scoped lexicals upon execution
of C<my>

=item *

Open code can only be executed by a thread that compiles it

=item *

The interpreter guarantees data coherence

=back 

=head1 DESCRIPTION

The overriding design principle in this model is that there is one
program executing in multiple threads. One body of code; one set of
global variables; many threads of execution. I like this model because

=over 4

=item *

I understand it

=item *

it does what I want

=item *

I think it can be implemented

=back


=head2 Notation

=over 4

=item I<main> and I<spawned> threads

We'll call the first thread that executes in a program the I<main>
thread. It isn't distinguished in any other way. All other threads are
called I<spawned> threads.

=item I<open code>

Code that isn't contained in a BLOCK.

=back

Examples are written in Perl5, and use the thread programming model
documented in C<Thread.pm>. Discussions of performance and
implementation issues are all based on the Perl5 internals; obviously,
these are subject to change.


=head2 All threads see the same compiled subroutines

Subroutines are typically defined during the initial compilation of a
program. C<use>, C<require>, C<do>, and C<eval> can later define
additional subroutines or redefine existing ones. Regardless, at any
point in its execution, a program has one and only one collection of
defined subroutines, and all threads see this collection.

Example 1

    sub foo      { print 1 }
    sub hack_foo { eval 'sub foo { print 2 }' }
    foo();
    Thread->new(\&hack_foo)->join;
    foo();

Output: 12. The main thread executes C<foo>; the spawned thread
redefines C<foo>; the main thread executes the redefined subroutine.


Example 2

    sub foo      { print 1 }
    sub hack_foo { eval 'sub foo { print 2 }' }
    foo();
    Thread->new(\&hack_foo);
    foo();

Output: 11 or 12, according as the main thread does or does not make
the second call to C<foo()> before the spawned thread redefines it. If
the user cares which happens first, then they are responsible for
doing their own synchronization, for example, with C<join>, as shown
in Example 1.

Code refs (like all Perl data objects) are reference counted. Threads
increment the reference count upon entry to a subroutine, and
decrement it upon exit. This ensures that the op tree won't be garbage
collected while the thread is executing it.
    

=head2 All threads share the same global variables

Example 3

    #!/my/path/to/perl
    $a = 1;
    Thread->new(\&foo)->join;
    print $a;
    
    sub foo { $a++ }

Output: 2. C<$a> is a global, and it is the I<same> global in both the
main thread and the spawned thread.


=head2 Threads can create thread-local storage by C<local>izing global
variables

Example 4

    #!/my/path/to/perl
    $a = 1;
    Thread->new(\&foo);
    print $a;
    
    sub foo { local $a = 2 }

Output: 1. The spawned thread gets it's own copy of C<$a>. The copy of
C<$a> in the main thread is unaffected. It doesn't matter whether the
assignment in C<foo> executes before or after the C<print> in the main
thread. It doesn't matter whether the copy of C<$a> goes out of scope
before or after the C<print> executes.


As in Perl5, C<local>ized variables are visible to any subroutines
called while they remain in scope.

Example 5

    #!/my/path/to/perl
    $a = 1;
    Thread->new(\&foo);
    bar();
    
    sub foo 
    { 
        local $a = 2;
        bar();
    }
    
    sub bar { print $a }

Output: 12 or 21, depending on the order in which the calls to C<bar>
execute.


Dynamic scopes are not inherited by spawned threads.

Example 6

    #!/my/path/to/perl
    $a = 1;
    foo();
         
    sub foo 
    { 
        local $a = 2;
        Thread->new(\&bar)->join;
    }
    
    sub bar { print $a }

Output: 1. The spawned thread sees the original value of C<$a>.


=head2 All threads share the same file-scoped lexicals

Example 7

    #!/my/path/to/perl
    my $a = 1;
    Thread->new(\&foo)->join;
    print $a;
    
    sub foo { $a = 2 }

Output: 2. C<$a> is a file-scoped lexical. It is the same variable in
both the main thread and the spawned thread.


=head2 Each thread gets its own copy of block-scoped lexicals upon
execution of C<my>

Example 8

    #!/my/path/to/perl
    foo();
    Thread->new(\&foo);
    
    sub foo 
    {
        my $a = 1; 
        print $a++;
    }

Output: 11. This result is guaranteed, even if the statements execute
in this order

        Main thread     Spawned thread
        my $a = 1;
                        my $a = 1;
        print $a++;
                        print $a++

C<$a> is a block-scoped lexical variable. Every time a thread executes
the C<my>, a new variable is created, completely unrelated to any
other variable in any thread.


=head2 Open code can only be executed by a thread that compiles it

Threads execute BLOCKs

    new   Thread \&foo
    new   Thread sub { ... }
    async Thread     { ... }

This means that code that is not contained in a BLOCK can only be
executed by a thread that compiles it.


Example 9

    #!/my/path/to/perl
    Thread->new(\&foo)->join;
    Thread->new(\&foo)->join;
    print $a;
    
    sub foo { require Bar; }

    # Bar.pm
    $a++;

Output: 1. C<require> won't compile the same file twice, so the
increment only executes in the first spawned thread.


Example 10

    #!/my/path/to/perl
    Thread->new(\&foo)->join;
    Thread->new(\&foo)->join;
    print $a;
    
    sub foo { do 'Bar.pm'; }

    # Bar.pm
    $a++;

Output: 2. C<do> will compile the same file repeatedly, so the
increment executes in both spawned threads.


Example 11

    #!/my/path/to/perl
    Thread->new(\&foo)->join;
    Thread->new(\&foo)->join;
    
    sub foo { do 'Bar.pm'; }

    # Bar.pm
    my $a = 1;
    print $a++;

Output: 11. The C<my> creates a new file-scoped lexical each time it
executes.


Example 12

    #!/my/path/to/perl
    $a++; 
    async { do $0 } if $a < 2;
    print $a;

Output: 12 or 21. Evil, but straightforward. The main thread and the
spawned thread both compile and execute the program.


=head2 The interpreter guarantees data coherence

Any implementation of threads implicitly defines a boundary between
the interpreter and the programmer: a boundary between what the
interpreter guarantees about the execution of threads and what the
programmer must ensure through synchronization mechanisms. This
boundary must be 

=over *

=item *

precise 

=item *

intuitive

=back

Otherwise, programmers will forever be guessing (wrong) about what
exactly they can rely on the interpreter for.

This RFC proposes that the boundary be placed at I<data coherence>.
Data coherence means that primitive operations in Perl always produce
a result that could be obtained if the operation executed atomically.

Example 13

    #!/my/path/to/perl
    
    $a = "abcdef";
    $b = "123456";
    
    my $t1 = async { $a = $b };
    my $t2 = async { $b = $a };
    
    $t1->join;
    $t2->join;
    
    print "$a $b";

Output: `abcdef abcdef' or `123456 123456'

Assignment is a primitive operation. Even though the code doesn't
synchronize execution of the two C<async> blocks, data coherence
guarantees that each assignment executes as if it were atomic. In
particular, we are guaranteed not to get output like `ab34ef ab34ef'.

Controlling the order in which the two assignment statements execute
is a I<data synchronization> problem. The interpreter doesn't make any
guarantees regarding data synchronization. If the programmer cares
about this, they must add synchronization mechanisms (e.g. mutexes) to
the code.

To fully specify a data coherence model, we have to decide which
operations are primitive. As a starting point, we can take all the
operators documented in C<perlop.pod> and all the functions documented
in C<perlfunc.pod> as primitive.

Example 14

    #!/my/path/to/perl
    
    my $t1 = async { push @a, (1, 2, 3 ) };
    my $t2 = async { push @a, (4, 5, 6 ) };
    
    $t1->join;
    $t2->join;
    
    print @a;

Output: 123456 or 456123

C<push> is a primitive operation. Data coherence guarantees that we
won't get output like `142536'.


Example 15

    #!/my/path/to/perl
    
    $a = 'aa';
    my $t1 = async { $a = 'bb' };
    my $t2 = async { $a = 'cc' };
    
    print "$a $a $a";

Possible output: `aa cc bb'

Variable interpolation is a primitive operation. There are three
interpolations in the C<print> statement; C<$a> is liable to change
between any of them. Data coherence does guarantee that we won't get
output like `ac cc bb'.


=head1 IMPLEMENTATION

Perl6 could have either cooperative threads or preemptive threads.

=head2 Cooperative threads

RFC 47 proposes that "there...be one event loop for all of Perl". This
event loop would dispatch op codes, deliver signals, and invoke
callbacks. It would be a natural extension of this architecture for
the event loop to dispatch op codes for multiple threads of a Perl
program.

The big advantage of cooperative threads is that the Perl interpreter
remains a single-threaded program. A Perl program may have many
threads, but the interpreter has only one: it runs an event loop and
it dispatches op codes. Because it is single-threaded, the interpreter
is not subject to race conditions, and requires no synchronization
code.

Cooperative threads have several disadvantages

=over 4

=item *

The interpreter has to do asynchronous I/O. But Perl6 may support
asynchronous I/O per RFC 47, and the interpreter has an event loop to
run the callbacks...

=item *

The interpreter can't preempt XSUBs. But XSUBs don't I<have> to be
ill-behaved. The XS interface could expose a C<yield()> call for XSUBs
to call during time-consuming operations.

=item *

We don't get Symmetric MultiProcessing (SMP). No way around this one.
Boo, Hiss.

=back


=head2 Preemptive threads

The interpreter is implemented on top of a native threading package,
such as PThreads. Each Perl thread runs in its own native thread. We
get SMP, and we always get control back from XSUBs. (Although XSUBs
can still crash the interpreter...)

The big drawback of preemptive threads is that the interpreter itself
becomes a multi-threaded program, with all attendant synchronization
requirements. If Perl6 gets preemptive threads, expect race conditions
to become the kind of ongoing headache that memory leaks were for
Perl4 and Perl5.


=head2 C<local>

C<local> needs some new functionality in a multi-threaded program.

Global variables are bound to globs at compile time. The glob holds a
pointer to the actual data value.

    Perl source                 Internal data representation
    $a = 'abcd';                *a -> 'abcd'

In a single-threaded program, C<local> creates a new data value and
installs it in the glob. The new data value holds a pointer to the old
value, so that the old value can be restored when the new one goes out
of scope.

    { local $a = 42; }          *a -> 42 -> "abcd"

In a multi-threaded program, each thread can create its own local
values for a global. The glob can keep these separate by having an
array of pointers, indexed by thread ID. The glob also needs a pointer
to the original value from the main thread, because that is the value
that spawned threads see if they don't localize the variable.

    $a = 1;                     # ID 0; main thread
    { 
                local $a = 2;
        async {       $a = 3 }; # ID 1; sets original value
        async { local $a = 4 }; # ID 2
    }
    
                                *foo
                                ORIGINAL --------+
                                                 |
                                [0] -> 2 -> 3 <--+
                                [2] -> 4

Threads come and go, but the IDs keep incrementing, so the array will
usually be sparse. A B* tree might be a suitable implementation.

Threads that don't localize the variable don't get an entry in the
array; instead, they follow the ORIGINAL pointer. However, the
interpreter still has to search the array to discover this.


=head2 Primitive operations

Primitive operations typically lock their operands to avoid race
conditions

    Perl source         C Implementation
    $a = $b             lock(a.mutex);
                        lock(b.mutex);
                        free(a.pData);
                        a.length = b.length;
                        a.pData  = malloc(a.length);
                        memcpy(a.pData, b.pData, a.length);
                        unlock(a.mutex);
                        unlock(b.mutex);
                
Two threads that need to lock the same variables could deadlock. One
way to avoid this is to lock variables in storage order

                        if (a < b)
                        {
                            lock(a.mutex);
                            lock(b.mutex);
                        }
                        else
                        {
                            lock(b.mutex);
                            lock(a.mutex);
                        }


=head1 DISCUSSION

=head2 Globals and Reentrancy

RFC1 "Implementation of Threads in Perl" proposes that, by default,
threads be isolated in separate data spaces.

=over 4

=item *

Each thread gets its own copy of all global variables. A special stash
named C<global::> provides shared storage between threads.

    $a          # different in different threads
    $global::a  # shared between different threads

=item *

Each thread reC<use>'s all its modules, so that it any module data can
be reinitialized for that thread.

=back

Discussion on perl6-language-flow has further suggested that each
thread get its own copy of file-scoped lexicals. A C<:shared>
attribute could be used to declare file-scoped lexical that are shared
between threads.

    my $a           # different in different threads
    my $a : shared  # shared between different threads


We'll call this an I<isolated> data model. The rational for adopting
an isolated data model is that it will make existing Perl5 modules
reentrant.


This RFC proposes that Perl not take any special steps to isolate
threads in separate data spaces. Globals are shared unless localized,
and file-scoped lexicals are shared unless a thread recompiles the
file. We'll call this a I<shared> data model.

I prefer a shared data model because

=over 4

=item *

It does what I want.

=item *

One of the goals of Perl6 is to get out from under the backwards
compatibility constraints that have boxed in Perl5. Organizing the
threading model around the need to make Perl5 modules reentrant seems
inconsistent with this.

=item *

The collection of Perl5 modules that an isolated data model can rescue
from reentrancy problems may be vanishingly small; conversely, it may
break modules that genuinely need global data.

=back


This isn't something that we can argue about with thought experiments.
The modules are out there on CPAN; we have to look and see how they
behave. I took a quick stroll through the modules that are installed
on my own system; here is a small, non-random sample of what I found.

=over 4

=item C<Sys::Hostname>

C<Sys::Hostname> gets the system hostname and caches it in
C<$Sys::Hostname::host>. This will work correctly in a shared data
model, even without any synchronization mechanism. An isolated data
model will defeat the cache, forcing every thread to look up the
hostname itself.

=item C<Set::IntSpan>

Set::IntSpan uses one global: C<$Set::IntSpan::Empty_String>. All
C<Set::IntSpan> must see the same value for this global. Applications
typically set this global once and then leave it untouched; methods in
C<Set::IntSpan> read it, but do not write it. This works correctly in
a shared data model; it breaks in an isolated data model.

=item C<Time::Local>

Time::Local caches the start times of months in
C<%Time::Local::cheat>. This works correctly in shared data model; an
isolated data model defeats the cache.

Some methods in C<Time::Local> store temporary values in package
globals, e.g. C<$Time::Local::ym>. This works correctly in an isolated
data model, and breaks in a shared data model.

=item C<File::Find>

C<File::Find> stores the name of the current file in
C<$File::Find::name>, and the current directory path in
C<$File::Find::path>. This works in an isolated data model, and breaks
in a shared data model.

However, C<File::Find> also C<cd>s to the directory where the current
file is. This isn't reentrant, and it can't be made reentrant, because
a process has only one CWD, which is shared by all threads. This means
that the C<File::Find> interface is intrinsically broken under
threads.

=item C<Term::Complete>

C<Term::Complete> stores key codes in globals:
C<$Term::Complete::complete>, C<$Term::Complete::kill>,
C<$Term::Complete::erase1>, and C<$Term::Complete::erase2>. This is
reentrant in an isolated data model, and not in a shared data model.

However, C<Term::Complete> isn't even reentrant I<under Perl5>. If two
different parts of an application both use C<Term::Complete>, they
don't need threads to fight over the values of its globals. I'm hard
pressed to see that the design of Perl6 should be driven by the need
to fix modules that are broken in Perl5.

=back

Again, this sample of modules isn't large, or random, or
representative. But it does show that

=over 4

=item *

globals don't necessarily cause concurrency problems

=item *

not all concurrency problems can be fixed with an isolated data model

=back


=head2 Other concurrency mechanisms

RFCs 27 and 31 discuss coroutines. RFC 47 discusses asynchronous I/O.
I'm happy to have other concurrency mechanisms in Perl, but I want
threads, and I don't want to give up any features of threads on the
grounds that you can do the same thing with some other concurrency
mechanism.


=head1 REFERENCES

RFC   1: Implementation of Threads in Perl 

RFC  27: Coroutines for Perl

RFC  31: Co-routines

RFC  47: Universal Asynchronous I/O 

RFC xxx: Thread Programming Model
RFC 178 (v1) Lightweight Threads

Reply via email to