This and other RFCs are available on the web at
  http://tmtowtdi.perl.org/rfc/

=head1 TITLE

Implementation of Threads in Perl

=head1 VERSION

  Maintainer: Bryan C. Warnock <[EMAIL PROTECTED]>
  Date: 04 Aug 2000
  Version: 2
  Mailing List: [EMAIL PROTECTED]
  Number: 1

=head1 ABSTRACT

Perl 6 should be built around threads from the beginning.

=head1 DESCRIPTION

Perl 5 attempted (with relatively good success) to implement threads
atop the current architecture. It did, unfortunately, leave several
gaps, traps, and "features" under heavy concurrent use. These
weaknesses could be fixed if Perl were built with threading from the
start.

All Perl programs are threaded. Most just have only one.

=head1 MOTIVATORS

Impatience, Hubris, and Laziness, in that order.

=head1 IMPLEMENTATION

Attempt to build thread constructs into the internals, while allowing a
Thread module to add user thread constructs safely and robustly,
without making things worse for the single-threaded folks.

=head2 SUMMARY OF IMPLEMENTATION

The summary is based on the current Perl 5 architecture. As the
internal structure changes (for example, by moving to vtables), the
thread design will have to change.

=over 4

=item *

Create an additional pseudo-global stash, one per thread created, that
is local to that thread. This stash would be the default space for
non-lexical variables. C<$main::foo> == C<$foo> within one thread,
while C<$main::foo> != C<$main::foo> in different threads. There need
be no way to specify a particular thread-space, as it should be visible
only to the owning thread.

=item *

The Thread module should add a C<global> keyword or function that
explicitly accesses a variable in the program-global stash.

C<global $main::foo = $foo; # Let another thread know what my $foo is.>

C<global $main::foo = \$foo; # Share my local foo. Dangerous!>

C<$foo = global $main::foo; # Localize this instance of $main::foo.>

=item *

The Thread module should, on inclusion, also set the optree flag that
triggers mutex locking on variables within the perl core itself. (This
is distinct from a user-created and user-controlled mutex.) This is to
guarantee that the above constructs will actually work - user-created
race conditions aside.

=item *

Populate the thread-space stash, rather than the program-global stash,
with the built-ins. Very few of the built-ins are meaningless in this
threaded construct; most are truly independent, and those that aren't,
like C<$^O>, should probably be read-only anyway.

=back

=head2 IMPACT

=over 4

=item *

Impact on Perl on a non-thread-supporting architecture.

None. (The mutex locking code would be no-opped out, and the Thread
module would fail on inclusion, preventing any of the global semantics
from being invoked. The thread space would appear to the program to be
a standard global stash.)

=item *

Impact on Perl built for non-threaded use.

None. Same as above.

=item *

Impact on a single-threaded program under a multi-threaded Perl.

None, most likely, for the above reasons. (There would be an
additional flag check instead of, I believe, the automatic mutex
locking under the current scheme.)

=item *

Impact on multi-threaded scripts under a multi-threaded Perl.

Some. Mutex locking would occur much as it does today. Current Perl
scripts, written without knowledge of global versus thread space, would
find data-sharing broken. Threads have been declared experimental, and
I believe the benefits of simplifying threads in general outweigh the
heartache of those (who would benefit) that would have to change their
programs. In addition, see the notes about module inclusion below.

=item *

Impact on Perl 5.

Possible mutual compatibility between Perl 5 and Perl 6, with the
exception of C<use Thread> and the semantics it would add. See the
notes below about module inclusion. (Obviously, other changes to the
language notwithstanding.)

=back

=head2 UNKNOWNS

=over 4

=item *

Probably the biggest unknown, and the one with the largest potential
impact, will be exactly how module inclusion will work with threads
under Perl 6.

Currently, modules are parsed and interpreted at compile time in a
global scope. Under the above architecture, this will populate the
primary thread by default, as secondary threads are a run-time issue.
So how do secondary threads C<use> a module? How do a module's symbols
find themselves in the proper thread space, instead of cramming the
primary thread space at compile time, or, worse yet, completely undoing
the entire point of threads by making everything global?

Certainly, the global approach, by default, is not the desired
solution, as you now lose any ability to make your interface reentrant,
unless it is specifically designed and tested for thread use. In that
case, the module would then need to C<use Thread>, which would then
initiate multi-threading, assuming the core and platform supported it,
even if the original program didn't. Evil, evil, evil.

Another possibility, and another one I do not like, is that a thread
inherits the entire stash of the parent thread. Now, you either need
to duplicate the entire stash, or resign yourself to automatically
sharing all the data. Neither one is acceptable, for what I hope are
obvious reasons.

So that means that C<use Thread> must now also define a method for
runtime inclusion of modules. This, in and of itself, should not be
too difficult. A possible syntax might be to include the necessary
module names as arguments to the spawning call. But lexical scoping
across multiple threads could be an issue.

Lastly, how do other compile-time constructs, such as C<BEGIN> and
C<END> blocks, deal with thread-space? Is there going to be a need to
support similar constructs for thread creation?

=item *

Mutex locking of a hash or array, and the scalars they contain, and
vice versa?

=item *

Mutex locking of a reference and the referent.

=item *

Limitations or assumptions on threading schemes other than those in
pthreads, due to the author's lack of experience with anything but.

=back

=head1 REFERENCES

None, currently.

=head1 CHANGES

=over 4

=item *

Added module inclusion lament under L<"UNKNOWNS">.

=back
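
=head1 ILLUSTRATIVE SKETCH

The split between a per-thread stash and the program-global stash, as described under SUMMARY OF IMPLEMENTATION, can be modelled with two plain hashes. This is only a toy sketch in today's Perl 5 and makes several assumptions: it uses no real threads (thread ids are just integers), and the names C<thread_set>, C<thread_get>, C<global_set>, and C<global_get> are hypothetical stand-ins for ordinary variable access and for the proposed C<global> keyword. It says nothing about how the core would actually implement stashes or locking.

```perl
use strict;
use warnings;

# Toy model only: one program-global stash, plus one private stash per
# (simulated) thread id. None of these names exist in the perl core.
my %global_stash;    # visible to every thread
my %thread_stash;    # thread id => { variable name => value }

# Ordinary non-lexical access resolves in the calling thread's own stash.
sub thread_set {
    my ($tid, $name, $value) = @_;
    $thread_stash{$tid}{$name} = $value;
    return $value;
}

sub thread_get {
    my ($tid, $name) = @_;
    return $thread_stash{$tid}{$name};
}

# Hypothetical stand-ins for the proposed "global" keyword.
sub global_set {
    my ($name, $value) = @_;
    $global_stash{$name} = $value;
    return $value;
}

sub global_get {
    my ($name) = @_;
    return $global_stash{$name};
}

# Two simulated threads each hold their own $main::foo.
thread_set(1, 'main::foo', 'one');
thread_set(2, 'main::foo', 'two');

# Thread 1:  global $main::foo = $foo;   (publish a copy)
global_set('main::foo', thread_get(1, 'main::foo'));

# Thread 2:  $foo = global $main::foo;   (localize the global value)
thread_set(2, 'main::foo', global_get('main::foo'));

# Thread 1 later changes its own copy; the global and thread 2 are untouched.
thread_set(1, 'main::foo', 'changed');

printf "thread 1: %s\n", thread_get(1, 'main::foo');
printf "thread 2: %s\n", thread_get(2, 'main::foo');
printf "global:   %s\n", global_get('main::foo');
```

Because values are copied between stashes, thread 1's later change does not leak to thread 2 or to the global stash. Publishing a reference instead (the C<\$foo> form marked "Dangerous!" above) would defeat that isolation, which is where the core's mutex locking would have to come in.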