RFC 185 (v1) Thread Programming Model

Perl6 RFC Librarian Thu, 31 Aug 2000 16:49:55 -0700
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Thread Programming Model

=head1 VERSION

  Maintainer: Steven McDougall <[EMAIL PROTECTED]>
  Date: 31 Aug 2000
  Mailing List: [EMAIL PROTECTED]
  Version: 1
  Number: 185
  Status: Developing

=head1 ABSTRACT

  use Thread;
  
  $thread  = new Thread \&func     , @args;
  $thread  = new Thread sub { ... }, @args;
             async { ... };
  $result  = join $thread;
             
  $thread  = this Thread;
  @threads = all  Thread;
  
  $thread1 == $thread2 and ...
  yield();
    
  critical { ... };   # one thread at a time in this block

  $mutex = new    Mutex;
           lock   $mutex;
  $ok    = try    $mutex;
           unlock $mutex;
  
        $semaphore = new Semaphore $initial;
  $ok = $semaphore->up($n);
        $semaphore->down;
  
  $event = auto    Event; 
  $event = manual  Event;
           set    $event;
           reset  $event;
           wait   $event;
  
  $timer = Timer->delay($seconds);
  $timer = Timer->alarm($time);
  $timer->wait;

  $readable = $fh->readable;
  $writable = $fh->writable;
  $failure  = $fh->failure;
   
  $ok = wait_all(@objects);
  $i  = wait_any(@objects);


=head1 DESCRIPTION

C<Thread> provides the programming interface to Perl6 threads. It
includes a rich set of synchronization facilities.


=head2 Thread

=over 4

=item I<$thread> = C<new> C<Thread> \&I<func>, I<@args>

Executes I<func>(I<@args>) in a separate thread. The return value is
a reference to a C<Thread> object that manages the thread.


=item I<$thread> = C<new> C<Thread> C<sub> { ... }, I<@args>

Executes an anonymous subroutine in a separate thread, passing it
I<@args>. The return value is a reference to a C<Thread> object that
manages the thread.

The subroutine executes in its enclosing lexical context. References
to lexical variables in the enclosing context are bound at thread
creation time, in a manner analogous to closures.


=item C<async> BLOCK

Executes BLOCK in a separate thread. Syntactically, C<async> BLOCK
works like C<do> BLOCK. In particular, it does not return a C<Thread>
object. If you want the thread object, use one of the C<new> C<Thread>
forms shown above.

The BLOCK executes in its enclosing lexical context. References to
lexical variables in the enclosing context are bound at thread
creation time, in a manner analogous to closures.


=item I<$thread> = C<this> C<Thread>

Returns a reference to the C<Thread> object that manages the current
thread.


=item I<@threads> = C<all> C<Thread>

Returns a list of references to all existing C<Thread> objects in the
program.

=item I<$result> = C<join> I<$thread>

=item I<@result> = C<join> I<$thread>

Blocks until I<$thread> terminates. May be called repeatedly,
by any number of threads.

Returns the value of the last expression evaluated in I<$thread>.
This expression is always evaluated in list context inside the thread.

If C<join> is called in list context, it returns the entire list; if
C<join> is called in scalar context, it returns the first element of
the list.


=item I<$thread1> == I<$thread2>

Evaluates to true iff I<$thread1> and I<$thread2> reference the same
C<Thread> object.


=item C<yield>()

Gives the interpreter an opportunity to switch to another thread. The
interpreter is not obligated to take this opportunity, and the calling
thread may regain control after an arbitrarily short period of time.

=back
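The C<join> rules above can be modeled in Python. This is a sketch, not the proposed interface: Python has no scalar/list context, so a hypothetical C<scalar> flag stands in for the caller's context, and C<PerlThread> is a made-up name. The thread's result is always stored as a list, which mirrors evaluation in list context inside the thread.

```python
import threading

class PerlThread:
    """Hypothetical model of the proposed Thread object's join() semantics."""

    def __init__(self, func, *args):
        self._result = []
        self._thread = threading.Thread(target=self._run, args=(func, args))
        self._thread.start()

    def _run(self, func, args):
        value = func(*args)
        # Keep the result as a list, standing in for "evaluated in list
        # context inside the thread".
        self._result = list(value) if isinstance(value, (list, tuple)) else [value]

    def join(self, scalar=False):
        # Block until the thread terminates; may be called repeatedly,
        # by any number of threads.
        self._thread.join()
        if scalar:
            # Scalar context: the first element (None stands in for undef).
            return self._result[0] if self._result else None
        return list(self._result)  # list context: a copy of the whole list

t = PerlThread(lambda a, b: (a + b, a * b), 3, 4)
print(t.join())             # [7, 12]
print(t.join(scalar=True))  # 7
```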


=head2 Critical section

C<critical> is a new keyword. Syntactically, it works like C<do>. 

  critical { ... }; 

The interpreter guarantees that only one thread at a time can execute
a C<critical> block.
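A C<critical> block behaves like a block guarded by a private mutex that only the interpreter can see. The Python sketch below approximates this with one hidden lock per block. The C<CriticalBlock> helper is hypothetical. Python cannot allocate a lock per source block automatically, so in this sketch each block creates its own helper once.

```python
import threading

class CriticalBlock:
    """Hypothetical stand-in for a critical block: one hidden lock per block."""

    def __init__(self):
        self._lock = threading.Lock()  # never exposed to the caller

    def __enter__(self):
        self._lock.acquire()

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()  # released even if the body raises
        return False

counter = 0
counter_block = CriticalBlock()  # one instance per critical block

def bump(times):
    global counter
    for _ in range(times):
        with counter_block:  # only one thread at a time runs this body
            counter += 1

workers = [threading.Thread(target=bump, args=(10_000,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(counter)  # 40000
```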


=head2 Mutex

=over 4

=item I<$mutex> = C<new> C<Mutex>

Creates and returns a new C<Mutex> object.

=item C<lock> I<$mutex>

If I<$mutex> is unlocked, locks it and returns immediately. If
I<$mutex> is locked, blocks until I<$mutex> is unlocked, then locks it
and returns.

=item I<$ok> = C<try> I<$mutex>

If I<$mutex> is unlocked, locks it and returns true. If I<$mutex> is
locked, returns false. C<try> never blocks.

=item I<$ok> = C<unlock> I<$mutex>

If I<$mutex> was locked by the calling thread, unlocks it and returns
true. If I<$mutex> is not locked, or was locked by another thread,
does nothing and returns false.

=back
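The ownership rule for C<unlock> (only the thread that locked the mutex can unlock it) can be sketched in Python as follows. The class is hypothetical. Because C<try> is a Python keyword, the sketch names that operation C<try_lock>. The RFC does not say what happens when a thread locks a mutex it already holds, and this version simply deadlocks in that case.

```python
import threading

class Mutex:
    """Hypothetical model of the proposed owner-checked Mutex."""

    def __init__(self):
        self._lock = threading.Lock()   # the lock itself
        self._guard = threading.Lock()  # protects _owner
        self._owner = None

    def lock(self):
        self._lock.acquire()  # blocks while another thread holds it
        with self._guard:
            self._owner = threading.get_ident()

    def try_lock(self):
        if self._lock.acquire(blocking=False):  # never blocks
            with self._guard:
                self._owner = threading.get_ident()
            return True
        return False

    def unlock(self):
        with self._guard:
            if self._owner != threading.get_ident():
                return False  # not locked, or locked by another thread
            self._owner = None
            self._lock.release()
            return True

m = Mutex()
first = m.try_lock()   # True: it was unlocked
second = m.try_lock()  # False: already locked
from_other = []
t = threading.Thread(target=lambda: from_other.append(m.unlock()))
t.start()
t.join()               # another thread cannot unlock it
owner_unlock = m.unlock()
again = m.unlock()
print(first, second, from_other[0], owner_unlock, again)
# True False False True False
```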

=head2 Semaphore

A semaphore manages a number, called a I<count>. The count is always
between zero and a system-dependent maximum. C<up> and C<down> are
guaranteed to execute atomically.

=over 4

=item I<$semaphore> = C<new> C<Semaphore> I<$n>

Creates and returns a new C<Semaphore> object, with an initial count
of I<$n>. If I<$n> is omitted, the initial count is zero.

=item I<$ok> = I<$semaphore>->C<up>(I<$n>)

If the count can be increased by I<$n> without exceeding the maximum,
does so and returns true. Otherwise, does nothing and returns false.
If I<$n> is omitted, it defaults to 1.


=item I<$semaphore>->C<down>

Blocks until the count is positive, then decrements the count and
returns.

=back
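The following is a minimal Python sketch of these semaphore rules. It is built on C<threading.Condition> (a lock plus a wait queue), not on the proposed Perl6 primitives. The C<maximum> parameter is an assumption that stands in for the system-dependent limit.

```python
import threading

class Semaphore:
    """Hypothetical model of the proposed Semaphore (count in [0, maximum])."""

    def __init__(self, n=0, maximum=2**31 - 1):
        self._count = n
        self._max = maximum  # stands in for the system-dependent limit
        self._cond = threading.Condition()

    def up(self, n=1):
        with self._cond:
            if self._count + n > self._max:
                return False  # would exceed the maximum: no change
            self._count += n
            self._cond.notify(n)  # wake up to n blocked down() callers
            return True

    def down(self):
        with self._cond:
            while self._count == 0:  # block until the count is positive
                self._cond.wait()
            self._count -= 1

s = Semaphore(0, maximum=2)
fits = s.up(2)     # True: count goes 0 -> 2
overflow = s.up()  # False: 3 would exceed the maximum
s.down()
s.down()           # count is back to 0; a third down() would block
refill = s.up()    # True
```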

=head2 Event

Events allow one thread to wait until something happens in another
thread.

Events have two states: I<set> and I<reset>. Threads I<wait> on an
event; the C<wait> call blocks until the event is set.

There are two kinds of events: I<manual> and I<automatic>.

When a manual event is set, it remains set until a C<reset> call
is made on it. All waiting threads are immediately unblocked, and
subsequent calls to C<wait> return immediately.

When an automatic event is set, one waiting thread is unblocked and
the event is immediately reset. If there are no waiting threads, the
first call to C<wait> resets the event and returns immediately.

=over 4

=item I<$event> = C<auto> C<Event>

Creates and returns an automatic C<Event> object. The event is initially reset.

=item I<$event> = C<manual> C<Event>

Creates and returns a manual C<Event> object. The event is initially reset.

=item C<set> I<$event>

Sets I<$event>.

When a manual event is set, it remains set until a C<reset> call is
made on it. During this time, any number of threads may C<wait> on it
without blocking. When an automatic event is set, it is reset by the
first thread that C<wait>s on it.

=item C<reset> I<$event>

Resets I<$event>.

=item C<wait> I<$event>

Blocks until I<$event> is set.

=back
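The difference between manual and automatic events can be sketched in Python with a condition variable. The class is hypothetical, and C<is_set> is a helper added only for the demonstration; it is not part of the proposal. Python's own C<threading.Event> behaves roughly like a manual event.

```python
import threading

class Event:
    """Hypothetical model of the proposed manual/automatic Event."""

    def __init__(self, manual):
        self._manual = manual
        self._set = False  # events start out reset
        self._cond = threading.Condition()

    def set(self):
        with self._cond:
            self._set = True
            if self._manual:
                self._cond.notify_all()  # manual: release every waiter
            else:
                self._cond.notify()      # automatic: release one waiter

    def reset(self):
        with self._cond:
            self._set = False

    def wait(self):
        with self._cond:
            while not self._set:
                self._cond.wait()
            if not self._manual:
                self._set = False  # automatic: the waiter consumes the set

    def is_set(self):  # demo helper, not in the proposal
        with self._cond:
            return self._set

auto, manual = Event(manual=False), Event(manual=True)
auto.set()
auto.wait()    # no waiters were blocked: returns at once and resets
manual.set()
manual.wait()
manual.wait()  # stays set until reset() is called
print(auto.is_set(), manual.is_set())  # False True
```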

=head2 Timer

=over 4

=item I<$timer> = C<Timer>->C<delay>(I<$seconds>)

Creates and returns a new C<Timer> object. The timer will expire
I<$seconds> seconds after it is created. I<$seconds> may be a floating
point number, so this interface supports whatever time resolution the
platform provides.


=item I<$timer> = C<Timer>->C<alarm>(I<$time>)

Creates and returns a new C<Timer> object. The timer will expire at
I<$time> seconds after the epoch. I<$time> may be a floating-point
number, so this interface supports whatever time resolution the
platform provides.


=item C<wait> I<$timer>

Blocks until the timer expires.

=back
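Both constructors reduce to a deadline in seconds since the epoch. The Python sketch below shows this. The class is hypothetical, and it uses wall-clock time because C<alarm> is defined in terms of the epoch.

```python
import time

class Timer:
    """Hypothetical model of the proposed relative/absolute timers."""

    def __init__(self, deadline):
        self._deadline = deadline  # seconds since the epoch

    @classmethod
    def delay(cls, seconds):
        # Relative timer: expires `seconds` after creation.
        return cls(time.time() + seconds)

    @classmethod
    def alarm(cls, when):
        # Absolute timer: expires at `when` seconds after the epoch.
        return cls(when)

    def expired(self):
        return time.time() >= self._deadline

    def wait(self):
        # Loop in case a sleep ends before the deadline, e.g. after a
        # wall-clock adjustment.
        while True:
            remaining = self._deadline - time.time()
            if remaining <= 0:
                return
            time.sleep(remaining)

start = time.time()
Timer.delay(0.05).wait()  # blocks for about 50 ms
elapsed = time.time() - start
past = Timer.alarm(time.time() - 1)
past.wait()               # already expired: returns at once
```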

=head2 Wait functions

Threads, mutexes, semaphores, events, and timers are collectively
referred to as I<synchronization objects>. In addition, we can create
C<readable>, C<writable>, and C<failure> objects from a file handle.

A synchronization object is said to be I<signaled> when a thread
waiting on it would not block. The conditions for an object to be
signaled depend on the kind of object.

=over 4

=item *

Threads are signaled when they have completed execution.

=item *

Mutexes are signaled when they are unlocked.

=item *

Semaphores are signaled when they have a positive count.

=item *

Events are signaled while they are set.

=item *

Timers are signaled when they have expired.

=item *

Readable objects are signaled when a read on the underlying file
handle will not block.

=item *

Writable objects are signaled when a write on the underlying file
handle will not block.

=item *

Failure objects are signaled when I/O on the underlying file
handle will fail.

=back


=over 4

=item I<$readable> = I<$fh>->C<readable>

Returns a synchronization object that is signaled when there is data
available to be read on I<$fh>.

=item I<$writable> = I<$fh>->C<writable>

Returns a synchronization object that is signaled when a write to
I<$fh> will not block.

=item I<$failure> = I<$fh>->C<failure>

Returns a synchronization object that is signaled when I/O operations
on I<$fh> will fail.

=item I<$ok> = C<wait_all>(I<@objects>)

Blocks until all of the synchronization I<@objects> are signaled.
C<wait_all> does not change the state of any object until all the
objects are signaled. This prevents deadlock, at least between
competing wait functions.

Returns true on success, false on error. An error occurs if an element
of I<@objects> is not a synchronization object.


=item I<$i> = C<wait_any>(I<@objects>)

Blocks until at least one of the synchronization I<@objects> is
signaled. On success, returns the index in I<@objects> of a signaled
object. Returns -1 on error. An error occurs if an element
of I<@objects> is not a synchronization object.


=back
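The Python sketch below illustrates the two wait functions. It assumes that every synchronization object exposes a hypothetical C<signaled()> predicate and announces state changes on one shared condition variable. C<Flag> is a toy object invented for the example. Each wait checks every object under that single lock, so C<wait_all> sees one consistent snapshot. This sketch only observes objects. A real C<wait_all> would also have to acquire them (lock the mutexes, decrement the semaphores) under the same lock once all are signaled.

```python
import threading

_changed = threading.Condition()  # shared "some object changed" signal

class Flag:
    """Toy synchronization object: signaled once set() has been called."""

    def __init__(self):
        self._signaled = False

    def set(self):
        with _changed:
            self._signaled = True
            _changed.notify_all()  # wake every waiter so it can re-check

    def signaled(self):
        return self._signaled

def _valid(objects):
    return all(callable(getattr(o, "signaled", None)) for o in objects)

def wait_any(objects):
    """Return the index of a signaled object, or -1 on error."""
    # An empty list would block forever; treating it as an error is an
    # assumption of this sketch.
    if not objects or not _valid(objects):
        return -1
    with _changed:
        while True:
            for i, obj in enumerate(objects):
                if obj.signaled():
                    return i
            _changed.wait()

def wait_all(objects):
    """Block until every object is signaled; return False on error."""
    if not _valid(objects):
        return False
    with _changed:
        _changed.wait_for(lambda: all(o.signaled() for o in objects))
    return True

a, b = Flag(), Flag()
threading.Timer(0.05, b.set).start()  # b becomes signaled after 50 ms
first = wait_any([a, b])              # 1
bad = wait_any([a, 42])               # -1: 42 is not a sync object
a.set()
both = wait_all([a, b])               # True
```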


=head1 IMPLEMENTATION

All of these features should be doable if threads are built into Perl.

Making file handles and sockets into synchronization objects probably
requires asynchronous I/O.

Not everything has to be in the core. For example, semaphores are
easily built out of mutexes.


=head1 DISCUSSION

This interface is an amalgam of

=over 4

=item *

the C<Thread.pm> interface from Perl 5.6.0

=item *

the Win32 thread interface

=item *

my own wish list (you can't get it if you don't ask...)

=back

Here are some issues to consider.

=head2 Thread creation

Threads are created by 

  new Thread \&func
  new Thread sub { ... }
  async { ... }

We arguably don't need three different ways to create threads.
However, the different syntaxes fit into the language in slightly
different ways, and I'm not sure which one I'd be willing to give up.
The first is the most fundamental; losing it would be a serious
inconvenience. Perl generally allows an anonymous subroutine wherever
it allows a code ref, so the second also seems appropriate. And
the third allows us to create threads with the kind of lightweight
syntax that makes Perl such a fluent language.


=head2 C<join>

The calling context of C<join> can't be propagated into the thread,
for several reasons.

=over 4

=item *

The thread can compute only one return value, but C<join> can be
called repeatedly in different contexts.

=item *

The thread might terminate before the first call to C<join>. C<join> can
return the last expression evaluated in the thread, but it can't
retroactively affect the context in which that expression was evaluated.

=back

Not allowing multiple C<join>s on a thread might help with the first
problem; I can't see any way around the second.


=head2 Critical sections

This interface provides the

  critical { ... } 

construct. In principle, we don't need this: you can always do the
same thing with a mutex:

  { lock $mutex; ... unlock $mutex; }

Nonetheless, critical sections have several attractive features.

=over 4

=item *

They reduce clutter. There is no mutex to create, lock, and unlock.

=item *

Along with less clutter come fewer chances for bugs. There isn't a
mutex floating around to get abandoned (locked by a thread that has
terminated), or locked twice by the same thread, or locked by the
wrong thread, or locked and never unlocked, or...

=item *

The implementation can be highly optimized. Internally, a critical
section is still protected by some kind of mutex. However, this mutex
isn't user visible: the interpreter has complete control over it.
Therefore, it can be very lightweight.

=back


Efficiency matters, because critical sections are used to manage things that
are...well...critical. Important, global, high-contention resources like
memory managers and process schedulers. Granted, these are poor
examples for Perl, but you get the idea.

There are other kinds of built-in serialization mechanisms. For
example, Java provides one mutex per object; all method calls on the
object are serialized. C<Thread.pm> documents a C<lock(&sub)> call
and a C<:locked> attribute for subroutines. The problem with these
approaches is that they tie serialization to other language
structures, and those structures may not be at the right granularity
for a particular application.

For example, a class may have some methods that require serialization
and some that do not; a mutex that automatically serializes all method
calls on the object needlessly reduces performance (and increases the
chances for deadlock). Conversely, an application may need to
serialize access to a subsystem that comprises multiple objects. In
this case, the programmer has to create and manage their own mutexes;
the built-in mutexes don't help with this.

Similarly, the C<lock(&sub)> call and the C<:locked> attribute co-opt
the subroutine structure of the program in favor of synchronization.
Subroutine structure should be driven by program design, not dictated by
language features.


=head2 Mutexes

I dropped the 

  lock($scalar)

call from this interface in favor of Mutexes. The two features provide
roughly the same functionality, so this is partly a matter of style. One
possible reason to prefer mutexes is to simplify implementation of
C<wait_all> and C<wait_any>. I'm open to use cases on this.



=head2 Events

I dropped the C<cond_wait> mechanism in C<Thread.pm> in favor of
Events. Events do essentially the same thing with a simpler interface.
In particular, Events don't expose a mutex the way C<cond_wait> does.
(As far as I can tell, this mutex is an artifact of the PThreads
implementation.)

One substantive difference between Events and C<cond_wait> is that the
manual/automatic distinction for Events is a property of the Event
object, while the corresponding broadcast/signal distinction for
C<cond_wait> is a property of the signaling call. I'm open to use
cases that would show a preference for one of these architectures over
the other.
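For comparison, Python's C<threading.Condition> follows the same model as C<cond_wait>. The caller holds the condition's mutex explicitly, and each signaling call chooses whether to wake one waiter (C<notify>) or all of them (C<notify_all>). That choice is not fixed when the object is created. The snippet below shows only that contrast and is not a model of the proposed Events.

```python
import threading

cond = threading.Condition()  # bundles the mutex that cond_wait exposes
ready = False
woken = []

def waiter(name):
    with cond:                        # must hold the mutex to wait
        cond.wait_for(lambda: ready)  # releases it while blocked
        woken.append(name)

threads = [threading.Thread(target=waiter, args=(n,)) for n in range(3)]
for t in threads:
    t.start()

with cond:
    ready = True
    # "Broadcast": wake every waiter. notify() here would wake only one,
    # and the others would stay blocked.
    cond.notify_all()

for t in threads:
    t.join()
print(sorted(woken))  # [0, 1, 2]
```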


=head2 C<die>

I dropped the I<$thread>->C<eval> call from this interface, and didn't
say what happens if a thread C<die>s. There are several possibilities:

=over 4

=item *

The exception is propagated to any thread that C<join>s it. This has a
certain logic to it, but it suffers from the fact that a program
needn't C<join> its threads, so it doesn't guarantee that exceptions
will actually be handled.

=item *

The interpreter prints C<$@> on stderr and exits. This is what C++
does. It ensures that exceptions won't just disappear into the void;
however, it also causes a good deal of anxiety and paranoia, because
I<any> thread can potentially blow your program out of the water. (I
speak from experience here.)

=item *

The thread just quietly goes away. After working with threads in C++,
I'm actually partial to this one. We still need some way to recover
C<$@> when a thread C<die>s. Returning C<$@> to C<join> is probably
the Wrong Thing.

=back


=head2 C<==>

I dropped I<$thread>->C<equal> in favor of overloading C<==> to
compare threads. This seems more natural, and should be easy to
implement if threads are built into the language.


=head2 Thread IDs

I dropped thread IDs from the interface. You don't want thread IDs.
Thread IDs are an implementation artifact. Carrying around explicit
numerical indices isn't the Perl way. They were broken anyway (wrap at
2^32, with no guarantee of uniqueness after that). 


=head2 Detach

I dropped C<detach> from the interface. Detach is an artifact of
languages that require programmers to manage their own storage. It has
rigorous semantics, there's no going back, and if you get it wrong,
you either leak threads or you crash.

In Perl, detachment is more a state of mind. We have threads, and we
have C<Thread> objects to manage them. The thread holds a reference on
its C<Thread> object until it terminates. The C<Thread> object holds a
reference on its thread as long as the C<Thread> object exists.

If there are no user-visible references to a C<Thread> object (i.e.
the only reference on the C<Thread> object is the one held by the
thread), then the thread is said to be detached. A call to
C<Thread>->C<all> or C<Thread>->C<this> could recover a reference to
the C<Thread> object of a detached thread; when this happens, the
thread is no longer detached.

In any case, you don't have to worry about it. Like so many others,
C<detach> is a problem that Perl doesn't have.


=head2 Semaphores

PThreads allows an application to get the current count of a
semaphore. This feature is useless, and an open invitation to bugs. It
does not appear in this interface.


=head2 Import

To minimize namespace pollution, we could @EXPORT_OK the functions
that appear in this interface. 

  use Thread qw(yield wait_all wait_any);

On the other hand, if they get moved into the core, the issue is
probably moot.


=head2 Wait functions

C<wait_all> and C<wait_any> are generalizations of the C<select>(2)
Unix system call and the C<WaitForMultipleObjects> Win32 call. 

C<readable>, C<writable>, and C<failure> are documented as being
file handle methods; however, it is anticipated that file handles will
subsume sockets in Perl6. For an unconnected socket, the semantics of
C<readable> are extended so that it is signaled when a C<connect> or
C<accept> call will not block. Allowing applications to block on
network I/O in a controlled fashion is an important use of the wait
functions.
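The readable and writable objects map onto the readiness test that C<select>(2) performs. For reference, the Python sketch below runs that test on a connected socket pair with the standard C<select> module. It shows only the underlying mechanism, not the proposed Perl6 interface.

```python
import select
import socket

a, b = socket.socketpair()  # two connected sockets

# Zero timeout: just poll. Nothing has been sent yet, so b is
# writable (its send buffer has room) but not readable.
readable, writable, _ = select.select([b], [b], [], 0)
before = (b in readable, b in writable)

a.sendall(b"ping")
# Wait up to one second for the data to arrive on b.
readable, _, _ = select.select([b], [], [], 1.0)
after = b in readable

print(before, after)  # (False, True) True
a.close()
b.close()
```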

The wait functions may seem overdone; however, applications really do
need these features, and they can be I<very> difficult to implement
without language support. For example, C<select>(2) doesn't work with
file descriptors for the console, and C<WaitForMultipleObjects>
doesn't work with sockets. I have direct experience with the
difficulty of programming around these deficiencies.

An outstanding problem with the interface documented here is that it
does not guarantee that a socket will still be readable or writable at
the time the application actually attempts I/O, nor does it indicate
I<how many> bytes may be read or written without blocking.

A better approach might be to do asynchronous I/O, and obtain a
synchronization object that is signaled when the I/O operation
completes. I hesitate to specify such an interface until there is more
definition for file handles and asynchronous I/O in Perl6.


=head2 Timer

There are two kinds of timers: relative and absolute. Obviously, you
can always build one kind out of the other, but I wanted to
distinguish them with different constructors. I named the constructors
C<delay> and C<alarm>, respectively. These are short, and read fairly
naturally.


=head2 C<this Thread>

C++ partisans will get brain freeze reading code like

  my $thread = this Thread;

but that's not why I traded in C<self> for C<this>. Really. I did it
because it reads more naturally to me.


=head1 REFERENCES

RFC   1: Implementation of Threads in Perl 

RFC  27: Coroutines for Perl

RFC  31: Co-routines

RFC  47: Universal Asynchronous I/O 

RFC 178: Lightweight Threads
  
Thread.pm
  
PThreads info page