Re: some thoughts on multitasking

Rocco Caputo Sun, 20 Jul 2003 12:37:57 -0700

On Sun, Jul 20, 2003 at 10:49:29AM -0700, Rich Morin wrote:
> I have been giving some thought to different approaches to scheduling
> activities.  Here are some notes, for comment and discussion...
> 
> -r
> 
>  * shared process
> 
>    All activities share a single process.  Process-level state (e.g.,
>    nice(2) level) is common to all; program data may or may not be shared.
> 
>    * call a subroutine
> 
>      This technique, used by POE, has the advantage of low start-up
>      overhead.  Unfortunately, the multitasking is "cooperative" (a la Mac
>      OS 9), rather than "pre-emptive".  So, if an activity doesn't play
>      nicely, the entire subsystem gets dragged down.
> 
>      Because subroutines share a common name space for data, communication
>      among the activities is easy.  Concurrency issues are (largely)
>      avoided because only one subroutine can be running at a time.  If a
>      subroutine "yields" control, however, care must be taken to ensure
>      that its working data does not get compromised.


I think you are mostly correct, but you don't note that POE includes a
mechanism to keep each session's data separate.

Each POE session includes its own storage space.  It's called a "heap"
because sessions are in some ways modeled after UNIX processes.  Heaps
are anonymous hash references that POE guarantees not to touch.  They
are reserved for sessions' use only.

Subroutines are called within the context of one or another session
instance.  This is also modeled after UNIX processes: when you call
alarm(3), the kernel knows which process gets SIGALRM.  When you call
$kernel->alarm() in POE, POE::Kernel knows which session to register the
alarm with.  That session will receive the alarm event, and the
associated subroutine is called in that session's context.

The session context includes $_[SESSION] (a reference to the session
instance) and $_[HEAP] (a reference to the session's heap).  If a
subroutine uses $_[HEAP] for storage, several session instances can
share it without data collisions.

On the other hand, file-scoped lexical and package-scoped global
variables are an easy way to "share" memory between sessions.
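
Here is a minimal sketch of both points, assuming POE is installed.  The
names on_start, on_tick, and @log are made up for the illustration; they
are not part of POE.  Two sessions run the very same subroutines, each
keeps its counter in $_[HEAP], and $kernel->delay() registers the timer
with whichever session is current when it is called.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POE;    # loads POE::Kernel and POE::Session, exports KERNEL, HEAP, ARG0

# A file-scoped lexical: visible to every session, so it is "shared".
my @log;

# Both sessions use these two subroutines.  Only $_[HEAP] differs.
sub on_start {
    my ($kernel, $heap, $name) = @_[KERNEL, HEAP, ARG0];
    $heap->{name}  = $name;    # per-session storage
    $heap->{count} = 0;
    $kernel->delay(tick => 0.01);    # timer belongs to the current session
}

sub on_tick {
    my ($kernel, $heap) = @_[KERNEL, HEAP];
    $heap->{count}++;
    push @log, "$heap->{name}=$heap->{count}";
    $kernel->delay(tick => 0.01) if $heap->{count} < 3;
}

for my $name (qw(a b)) {
    POE::Session->create(
        inline_states => { _start => \&on_start, tick => \&on_tick },
        args          => [$name],    # arrives in _start as ARG0
    );
}

POE::Kernel->run();    # returns once both sessions have run out of work
print "$_\n" for sort @log;
```

Each session counts to three on its own, even though the code is shared.
The @log array is the opposite case: it lives outside any heap, so both
sessions write to the same one.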

>    * spawn a thread
> 
>      In Perl's implementation, each thread has its own, independent, set
>      of data.  Threads can share data, by agreement, but care must be
>      taken to avoid deadlocks, etc.

POE is due to adopt Perl's threading.  Arthur Bergman (the author of
Perl's threading implementation) has been awarded a Perl Foundation
grant to incorporate it into POE.  If I had to guess, he will start
working on it after all the conferences are over.

>  * separate process
> 
>    Each activity has its own process.  Process-level state (e.g., nice(2)
>    level) and program data are not shared.
> 
>    * fork(2), then exec(2) a new program
> 
>      This technique, used by cron(8), provides complete independence for
>      each activity.  The program may be written in an arbitrary language,
>      have its own nice(2) level, etc.
> 
>      The drawback is that the exec(2) and subsequent start-up activities
>      are quite time-consuming; if we're doing a lot of this, the overhead
>      will be substantial.
> 
>    * fork(2) the process, then call a subroutine
> 
>      This technique, used by many system daemons, takes advantage of the
>      fact that fork(2) is a highly-optimized mechanism on BSD systems.
>      Although each process acts as if it has independent state, only a
>      small amount of information must be copied initially; the remainder
>      is copied (via "copy on write" facilities) if and when the processes
>      diverge.
> 
>      If the parent Perl script loads (via require) all of the routines
>      that any of the children will need to call, the start-up overhead
>      for the individual activities will be very small.

I think your descriptions are accurate.

POE::Wheel::Run supports both techniques.  This class also handles pipe
redirection of the child's stdio.  A put() call on a Wheel::Run instance
sends information to the child's STDIN.  STDOUT and STDERR arrive at the
parent as events.
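
A rough sketch of the "fork, then call a subroutine" case, assuming POE
is installed on a UNIX-like system.  The child_main routine, the event
names, and the "quit" convention are invented for this example.  Passing
a command string or list as Program instead of a code reference gives
the fork/exec flavor.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POE qw(Wheel::Run);

my @replies;

# Runs in the forked child: upper-case each line until told to quit.
sub child_main {
    $| = 1;    # do not let the child's output sit in a buffer
    while (my $line = <STDIN>) {
        chomp $line;
        last if $line eq 'quit';
        print uc($line), "\n";
    }
}

POE::Session->create(
    inline_states => {
        _start => sub {
            my ($kernel, $heap) = @_[KERNEL, HEAP];
            my $wheel = POE::Wheel::Run->new(
                Program     => \&child_main,    # code ref: fork, then call
                StdoutEvent => 'got_stdout',
                StderrEvent => 'got_stderr',
                CloseEvent  => 'got_close',
            );
            $heap->{wheel} = $wheel;
            $kernel->sig_child($wheel->PID, 'got_sigchld');    # reap child
            $wheel->put('hello', 'world', 'quit');    # goes to child's STDIN
        },
        got_stdout  => sub { push @replies, $_[ARG0] },    # one line per event
        got_stderr  => sub { warn "child stderr: $_[ARG0]\n" },
        got_close   => sub { delete $_[HEAP]{wheel} },     # child closed its output
        got_sigchld => sub { },                            # nothing else to do
    },
);

POE::Kernel->run();
print "$_\n" for @replies;
```

The default filter on each handle is line-based, so put() adds the
newlines and got_stdout receives one line at a time without them.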

IPC between a parent process and its children allows us to spawn several
persistent workers.  Even if they must be spawned through exec(2), that
overhead can be absorbed at initialization time and kept away from the
runtime.

After the pool is filled, requests can be sent to free children, and
their completed responses may be passed back asynchronously.

POE::Filter::Reference helps by encapsulating Perl data serialization.
It's used to send arbitrarily complex Perl structures across sockets and
pipes, including the pipes between parent and child processes.
POE::Filter::Reference does not require the rest of POE to be loaded, so
forked child processes may be relatively lightweight.
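
As a small illustration, here is a sketch that uses the filter by
itself, assuming POE and Storable are installed.  The data structure and
the variable names are made up.  It freezes one nested structure, cuts
the byte stream in two the way a pipe might deliver it, and thaws it
with a second filter instance standing in for the other process.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POE::Filter::Reference;    # the rest of POE is not loaded here

my $sender   = POE::Filter::Reference->new();
my $receiver = POE::Filter::Reference->new();

# An arbitrary nested structure to send (hypothetical request format).
my $request = { job => 'resize', args => [640, 480], meta => { retries => 2 } };

# put() takes an array ref of references, returns an array ref of byte chunks.
my $wire = join '', @{ $sender->put([$request]) };

# Pretend the pipe delivered the stream in two arbitrary pieces.
my $half   = int(length($wire) / 2);
my @pieces = (substr($wire, 0, $half), substr($wire, $half));

# get() buffers partial input and returns only complete references.
my @decoded;
push @decoded, @{ $receiver->get([$_]) } for @pieces;

printf "got %d reference(s), job=%s\n", scalar(@decoded), $decoded[0]{job};
```

In a worker, the same put() and get() calls would sit around plain
sysread() and syswrite() on STDIN and STDOUT.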

-- Rocco Caputo - [EMAIL PROTECTED] - http://poe.perl.org/
