[PHP-DEV] Implementing Threading: Some Hints (Was Re: [PHP-DEV] phpthreads - hints anyone...)

Wez Furlong Thu, 01 Aug 2002 06:58:10 -0700

Hi Alan,

I've missed the start of this thread on threads because our
internet connection was physically down for a couple of days,
but hopefully what I say here is useful to you.
It started as a couple of comments but then grew a bit long...


You might find some of the code in the ActiveScript SAPI useful;
it's in CVS and only builds on Windows, but it might give you a couple
of hints.  It doesn't implement threading (directly), but it does
have some code that helps you see what is required.

Each thread must have its own copy of the engine.
This is a "hard" requirement of the PHP/TSRM architecture.

Communication between threads needs some kind of synchronization
mechanism.  Zvals need to be proxied so that access is serialized.
ActiveScript does this by exporting the zvals as IDispatch-able
COM objects and letting Windows manage serialization via its message
queues.

For other platforms you would need to implement some way to "pre-empt"
the owning thread so that it can access a zval on behalf of another
thread and return the data.  You could implement this using ticks or a
similar mechanism, perhaps as a Zend extension.
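A minimal sketch of that "pre-emption" in plain POSIX threads, assuming the owner thread polls at tick points: a foreign thread queues a request and blocks, and the owner answers every queued request from its tick handler, so the guarded value is only ever read on its own thread. `mailbox`, `mailbox_remote_read()` and `mailbox_tick()` are hypothetical names, and an `int` stands in for a zval.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* One pending "read the guarded value" request from a foreign thread. */
typedef struct request {
    int done;                /* set by the owner once serviced */
    int result;              /* value copied out on the owner's thread */
    struct request *next;
} request;

/* Hypothetical per-owner mailbox; owned_value stands in for a zval. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t serviced;
    request *head;           /* pending requests */
    int owned_value;         /* only read by the owner thread */
} mailbox;

void mailbox_init(mailbox *mb, int value)
{
    pthread_mutex_init(&mb->lock, NULL);
    pthread_cond_init(&mb->serviced, NULL);
    mb->head = NULL;
    mb->owned_value = value;
}

/* Foreign thread: queue a request and block until the owner services it. */
int mailbox_remote_read(mailbox *mb)
{
    request req = { 0, 0, NULL };
    pthread_mutex_lock(&mb->lock);
    req.next = mb->head;
    mb->head = &req;
    while (!req.done)
        pthread_cond_wait(&mb->serviced, &mb->lock);
    pthread_mutex_unlock(&mb->lock);
    return req.result;
}

/* Owner thread, called at each "tick": answer every queued request. */
int mailbox_tick(mailbox *mb)
{
    int n = 0;
    pthread_mutex_lock(&mb->lock);
    while (mb->head != NULL) {
        request *r = mb->head;
        mb->head = r->next;
        r->result = mb->owned_value;
        r->done = 1;
        n++;
    }
    if (n > 0)
        pthread_cond_broadcast(&mb->serviced);
    pthread_mutex_unlock(&mb->lock);
    return n;
}

typedef struct { mailbox *mb; int got; } reader_arg;

static void *reader_thread(void *p)
{
    reader_arg *a = p;
    a->got = mailbox_remote_read(a->mb);
    return NULL;
}

/* Demo: a second thread reads the value; the calling thread plays owner. */
int demo_remote_read(int value)
{
    mailbox mb;
    reader_arg arg;
    pthread_t tid;
    int rc;

    mailbox_init(&mb, value);
    arg.mb = &mb;
    arg.got = -1;
    rc = pthread_create(&tid, NULL, reader_thread, &arg);
    assert(rc == 0);
    (void)rc;
    while (mailbox_tick(&mb) == 0)   /* the owner's tick loop */
        sched_yield();
    pthread_join(tid, NULL);
    pthread_cond_destroy(&mb.serviced);
    pthread_mutex_destroy(&mb.lock);
    return arg.got;
}
```

The requester blocks for the whole round trip, which is why every proxied access costs a context switch to the owner.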

Proxying the zvals means that a given zval can only be accessed in
the context of the thread that created it.  That is not 100% useful for
threading, but it is a "limitation" (feature?) of the memory
management system: memory is managed per-thread, so write accesses
(including changing a refcount) may trigger allocations/deallocations.
If that happens on the wrong thread, you get inconsistent hash tables
and most probably segfaults in BOTH threads.
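Why a wrong-thread free is fatal can be illustrated with a toy allocator. This is an assumption for illustration, not how emalloc works internally: each block records the thread that allocated it, and a free from any other thread is refused instead of silently corrupting that thread's pool. `owned_alloc()` and `owned_free()` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Header stored in front of each block, recording the allocating thread.
 * The extra members only force a conservative alignment for the payload. */
typedef union {
    pthread_t owner;
    long double align_ld;
    void *align_ptr;
} block_header;

void *owned_alloc(size_t size)
{
    block_header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->owner = pthread_self();
    return h + 1;
}

/* 0 on success; -1 (and nothing freed) if the caller is not the owner.
 * With a real per-thread heap, freeing here would corrupt the owner's pool. */
int owned_free(void *ptr)
{
    block_header *h;
    assert(ptr != NULL);
    h = (block_header *)ptr - 1;
    if (!pthread_equal(h->owner, pthread_self()))
        return -1;
    free(h);
    return 0;
}

typedef struct { void *ptr; int result; } free_arg;

static void *free_thread(void *p)
{
    free_arg *a = p;
    a->result = owned_free(a->ptr);
    return NULL;
}

/* Tries to free ptr from a freshly created thread; returns owned_free()'s result. */
int free_from_other_thread(void *ptr)
{
    free_arg arg = { ptr, 0 };
    pthread_t tid;
    if (pthread_create(&tid, NULL, free_thread, &arg) != 0)
        return -2;
    pthread_join(tid, NULL);
    return arg.result;
}
```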

Likewise, the Zend function and class structures are managed per-thread
in a similar way.  You must make your own copy of them for each thread
in case that thread includes/requires other scripts, or creates new
functions/classes dynamically using eval or similar.
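A toy model of that per-thread copy, assuming a fixed-size table of names is an acceptable stand-in for the engine's function table (the real one is a Zend hash table): each new thread starts from a copy of its parent's table, and anything it defines later lands only in its own copy. `func_table` and its helpers are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_FUNCS 16
#define NAME_LEN 32

/* Hypothetical, much-simplified stand-in for a per-thread function table. */
typedef struct {
    int count;
    char names[MAX_FUNCS][NAME_LEN];
} func_table;

/* At thread start: give the new thread its own copy of the parent's table.
 * The fixed-size arrays make memcpy() a deep copy. */
void func_table_copy(func_table *dst, const func_table *src)
{
    memcpy(dst, src, sizeof *dst);
}

/* What include/require/eval would do: define in the caller's table only. */
int func_table_define(func_table *t, const char *name)
{
    assert(t != NULL && name != NULL);
    if (t->count >= MAX_FUNCS || strlen(name) >= NAME_LEN)
        return -1;
    strcpy(t->names[t->count++], name);
    return 0;
}

int func_table_has(const func_table *t, const char *name)
{
    int i;
    for (i = 0; i < t->count; i++)
        if (strcmp(t->names[i], name) == 0)
            return 1;
    return 0;
}
```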

To make threading useful, you would need to somehow arrange for multiple
threads to access the same underlying zval data without blocking all
the threads.  This just isn't possible AFAIK.

If you want fast read-only access to "shared" zvals, you can serialize
the zval from the main thread (just like sessions do) and unserialize it
into your new thread.  This isn't strictly read-only: the new thread can
write to its copy, but those writes are only visible to the new thread.
The advantage is that since the zval lives in the thread's own engine
and memory, no cross-thread synchronization occurs, so performance is "better".
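As a hedged illustration of the snapshot approach, the sketch below uses PHP's `serialize()` format for a plain string (`s:<len>:"<bytes>";`) to copy a value from a parent thread into a child. The child writes to its own copy, the parent's value is untouched, and no locking is needed after the hand-off. The helper names are hypothetical and only strings are handled.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Encode a C string in PHP's serialize() string format: s:<len>:"<bytes>"; */
char *serialize_str(const char *s)
{
    size_t len, cap;
    char *out;
    assert(s != NULL);
    len = strlen(s);
    cap = len + 32;                 /* room for the s:<len>:"..."; wrapper */
    out = malloc(cap);
    if (out != NULL)
        snprintf(out, cap, "s:%zu:\"%s\";", len, s);
    return out;
}

/* Decode the format above into a new malloc()ed copy; NULL on bad input. */
char *unserialize_str(const char *in)
{
    size_t len;
    int start = -1;
    char *out;

    if (sscanf(in, "s:%zu:\"%n", &len, &start) != 1 || start < 0)
        return NULL;
    if (strlen(in) < (size_t)start + len + 2 ||
        in[start + len] != '"' || in[start + len + 1] != ';')
        return NULL;
    out = malloc(len + 1);
    if (out != NULL) {
        memcpy(out, in + start, len);
        out[len] = '\0';
    }
    return out;
}

typedef struct {
    const char *wire;   /* snapshot serialized by the parent */
    char *local;        /* the child's private, writable copy */
} snapshot_arg;

static void *snapshot_child(void *p)
{
    snapshot_arg *a = p;
    a->local = unserialize_str(a->wire);
    if (a->local != NULL && a->local[0] != '\0')
        a->local[0] = 'H';          /* a write only this thread will see */
    return NULL;
}

/* Runs a child on the snapshot and returns its (modified) private copy. */
char *run_child_with_snapshot(const char *wire)
{
    snapshot_arg arg = { wire, NULL };
    pthread_t tid;
    if (pthread_create(&tid, NULL, snapshot_child, &arg) != 0)
        return NULL;
    pthread_join(tid, NULL);
    return arg.local;
}
```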


To summarize the above, threading would work something like this:

  Given a zval, allocates a struct that identifies the zval
  as belonging to this thread and assigns it an id (much like the resource
  management system, but using global malloc()ed memory).
  The id can be passed to other threads (much like session serialization)
  and used to construct a proxy (or to get the actual zval back directly
  when it is used on the thread that owns the zval).

proxy_data proxyize(zval * value);
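A minimal sketch of the registry behind `proxyize()`, assuming an integer id is enough to identify a proxy and a `void *` stands in for the `zval *`. Entries live in plain `malloc()`ed memory visible to all threads, and the raw value is only handed back to the thread that registered it. The `long` typedef for `proxy_data`, `proxy_local_value()` and `lookup_from_other_thread()` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef long proxy_data;     /* hypothetical: the id handed between threads */

typedef struct {
    pthread_t owner;         /* thread whose engine owns the value */
    void *value;             /* stands in for the owner's zval * */
} proxy_entry;

/* Process-wide registry in plain malloc()ed memory, guarded by a mutex. */
static proxy_entry *registry;
static long registry_len, registry_cap;
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;

/* Records value as belonging to the calling thread; returns its id or -1. */
proxy_data proxyize(void *value)
{
    proxy_data id = -1;
    pthread_mutex_lock(&registry_lock);
    if (registry_len == registry_cap) {
        long cap = registry_cap ? registry_cap * 2 : 8;
        proxy_entry *grown = realloc(registry, (size_t)cap * sizeof *grown);
        if (grown != NULL) {
            registry = grown;
            registry_cap = cap;
        }
    }
    if (registry_len < registry_cap) {
        registry[registry_len].owner = pthread_self();
        registry[registry_len].value = value;
        id = registry_len++;
    }
    assert(registry_len <= registry_cap);
    pthread_mutex_unlock(&registry_lock);
    return id;
}

/* The raw value if the caller owns it; NULL means "build a proxy instead". */
void *proxy_local_value(proxy_data id)
{
    void *result = NULL;
    pthread_mutex_lock(&registry_lock);
    if (id >= 0 && id < registry_len &&
        pthread_equal(registry[id].owner, pthread_self()))
        result = registry[id].value;
    pthread_mutex_unlock(&registry_lock);
    return result;
}

typedef struct { proxy_data id; void *result; } lookup_arg;

static void *lookup_thread(void *p)
{
    lookup_arg *a = p;
    a->result = proxy_local_value(a->id);
    return NULL;
}

/* Performs proxy_local_value(id) on a freshly created thread. */
void *lookup_from_other_thread(proxy_data id)
{
    lookup_arg arg = { id, NULL };
    pthread_t tid;
    if (pthread_create(&tid, NULL, lookup_thread, &arg) != 0)
        return NULL;
    pthread_join(tid, NULL);
    return arg.result;
}
```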


  Given a proxy id: if the zval is on the same thread as the current
  caller, return the zval from the proxy data.  Otherwise, look in
  the list of "running" proxies for this thread; if we already have
  a proxy for this zval, return it.  Otherwise we have not yet accessed
  the zval from this thread, so create an overloaded object that will
  use the thread synchronization methods to access the zval in its
  owner's thread.

zval * proxy_to_zval(proxy_data data);
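The lookup order described for `proxy_to_zval()` can be sketched the same way, assuming the registry entry for an id has already been fetched into a hypothetical `proxy_info`. The owner thread gets the raw value; any other thread reuses a proxy from its own cache of "running" proxies, or creates and caches a new one. `remote_proxy` is only a placeholder for the overloaded object and does not forward anything.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical: what the registry knows about a proxy id. */
typedef struct {
    long id;
    pthread_t owner;     /* thread whose engine owns the value */
    void *value;         /* only dereferenced on the owner thread */
} proxy_info;

/* Placeholder for the overloaded object that would forward to the owner. */
typedef struct {
    long id;
    pthread_t owner;
} remote_proxy;

#define CACHE_SIZE 32

/* One per thread: the proxies this thread has already created. */
typedef struct {
    int count;
    remote_proxy *entries[CACHE_SIZE];
} proxy_cache;

/* Owner thread gets the raw value (*is_proxy = 0); any other thread gets its
 * cached proxy, or a new one that is then cached (*is_proxy = 1). */
void *proxy_resolve(proxy_cache *cache, const proxy_info *info, int *is_proxy)
{
    int i;
    remote_proxy *p;

    assert(cache != NULL && info != NULL && is_proxy != NULL);
    if (pthread_equal(info->owner, pthread_self())) {
        *is_proxy = 0;
        return info->value;
    }
    *is_proxy = 1;
    for (i = 0; i < cache->count; i++)
        if (cache->entries[i]->id == info->id)
            return cache->entries[i];
    if (cache->count == CACHE_SIZE)
        return NULL;                 /* sketch: no eviction policy */
    p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->id = info->id;
    p->owner = info->owner;
    cache->entries[cache->count++] = p;
    return p;
}

void proxy_cache_free(proxy_cache *cache)
{
    int i;
    for (i = 0; i < cache->count; i++)
        free(cache->entries[i]);
    cache->count = 0;
}

typedef struct { const proxy_info *info; int ok; } resolve_arg;

/* From a non-owner thread, two lookups must yield the same cached proxy. */
static void *resolve_thread(void *p)
{
    resolve_arg *a = p;
    proxy_cache cache = { 0 };
    int first_is_proxy = 0, second_is_proxy = 0;
    void *first = proxy_resolve(&cache, a->info, &first_is_proxy);
    void *second = proxy_resolve(&cache, a->info, &second_is_proxy);
    a->ok = first != NULL && first == second && first_is_proxy &&
            second_is_proxy && cache.count == 1;
    proxy_cache_free(&cache);
    return NULL;
}

int resolve_twice_from_other_thread(const proxy_info *info)
{
    resolve_arg arg = { info, 0 };
    pthread_t tid;
    if (pthread_create(&tid, NULL, resolve_thread, &arg) != 0)
        return 0;
    pthread_join(tid, NULL);
    return arg.ok;
}
```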


  Returns a PHP script representing all the functions and classes
  defined in the current engine context.  This would be generated
  by examining the Zend structures.  It has to be serialized like
  this because the Zend structures are emalloc()ed, i.e. they live in
  the creating thread's own memory.
  Ideally, this would return a malloc()ed copy of the structures that
  would then be emalloc()ed into the memory of the new thread.
  As a first attempt, it might be simpler just to convert everything to
  a string and ask the new engine to recompile it.

char * serialize_functions_and_classes(void);
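The "convert to a string" first attempt can be sketched as follows, with one big assumption: a hypothetical `func_record` that still carries each function's source text, which the real engine does not keep (it has compiled opcodes). The point it shows is that the output lives in one plain `malloc()`ed buffer, so it can outlive the creating thread's per-request memory and be recompiled by the new engine.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record; the real engine keeps compiled opcodes, not source. */
typedef struct {
    const char *name;
    const char *params;   /* e.g. "$a, $b" */
    const char *body;     /* body source text */
} func_record;

/* Emits "function name(params) {body}\n" for each record into one plain
 * malloc()ed buffer that can be handed to another thread. */
char *serialize_functions(const func_record *funcs, size_t n)
{
    const size_t fixed = strlen("function () {}\n");  /* template characters */
    size_t i, total = 1;                              /* 1 for the final NUL */
    char *out, *p;

    for (i = 0; i < n; i++)
        total += fixed + strlen(funcs[i].name) + strlen(funcs[i].params)
                 + strlen(funcs[i].body);
    out = malloc(total);
    if (out == NULL)
        return NULL;
    p = out;
    *p = '\0';
    for (i = 0; i < n; i++)
        p += sprintf(p, "function %s(%s) {%s}\n",
                     funcs[i].name, funcs[i].params, funcs[i].body);
    assert((size_t)(p - out) + 1 == total);   /* buffer was sized exactly */
    return out;
}
```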


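/* Hand-off data for a new thread.  Everything in here is malloc()ed,
   not emalloc()ed, so it outlives the creating thread's engine.
   The Zend calls below follow the PHP 4 (ZE1) extension API. */
typedef struct {
  char * functions;
  int globals_count;
  char ** globals_keys;
  proxy_data * globals_values;
  char * threadfunc;
} thread_create_info;

static void * phpthreads_threadfunc(void * arg);

// proto resource thread_create(string threadfunc);
PHP_FUNCTION(thread_create)
{
   char * threadfunc, * key;
   int threadfunc_len, n, i = 0;
   uint key_len;
   ulong num_index;
   HashTable * globals = &EG(symbol_table);
   HashPosition pos;
   zval ** data;
   pthread_t tid;
   thread_create_info * info;

   if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s",
         &threadfunc, &threadfunc_len) == FAILURE) {
      return;
   }

   info = malloc(sizeof(thread_create_info));
   info->functions = serialize_functions_and_classes();
   info->threadfunc = strdup(threadfunc);

   // could do similar thing for passing params to the thread func
   n = zend_hash_num_elements(globals);
   info->globals_keys = malloc(n * sizeof(char *));
   info->globals_values = malloc(n * sizeof(proxy_data));
   for (zend_hash_internal_pointer_reset_ex(globals, &pos);
        zend_hash_get_current_data_ex(globals, (void **) &data, &pos) == SUCCESS;
        zend_hash_move_forward_ex(globals, &pos)) {
      if (zend_hash_get_current_key_ex(globals, &key, &key_len, &num_index,
            0, &pos) != HASH_KEY_IS_STRING) {
         continue;   // only named globals are exported
      }
      info->globals_keys[i] = strdup(key);
      info->globals_values[i] = proxyize(*data);
      i++;
   }
   info->globals_count = i;

   // use your platform-specific function here
   pthread_create(&tid, NULL, phpthreads_threadfunc, info);

   // return some kind of token for that thread
}

static void * phpthreads_threadfunc(void * arg)
{
   thread_create_info * info = arg;
   zval fname, retval, * value;
   int i;

   /* create a new zend engine instance */
   ...

   /* load in functions (and classes) */
   zend_eval_string(info->functions, NULL, "thread functions" TSRMLS_CC);
   free(info->functions);

   /* import global data */
   // could do similar thing to import thread function args
   for (i = 0; i < info->globals_count; i++) {
      // ZEND_SET_SYMBOL needs an lvalue, so use a temporary
      value = proxy_to_zval(info->globals_values[i]);
      ZEND_SET_SYMBOL(&EG(symbol_table), info->globals_keys[i], value);
      free(info->globals_keys[i]);
   }
   free(info->globals_keys);
   free(info->globals_values);

   /* the C equivalent of call_user_func(info->threadfunc) */
   ZVAL_STRING(&fname, info->threadfunc, 0);
   if (call_user_function(EG(function_table), NULL, &fname, &retval,
         0, NULL TSRMLS_CC) == SUCCESS) {
      zval_dtor(&retval);
   }

   free(info->threadfunc);
   free(info);

   /* close down the zend engine instance */
   return NULL;
}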

For the user, their script would look like this:

<?php
function another_thread()
{
   // This $GLOBALS access will switch contexts to the main thread
   // to retrieve the value (= slow)
   echo "Another thread: hello is " . $GLOBALS["hello"] . "\n";
}

$GLOBALS["hello"] = "hi there";
$thread = thread_create("another_thread");

// If the main engine dies before the child, there is a chance of
// segfaults, so wait for the child to finish first.
thread_wait($thread);
?>
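The `thread_wait()` call matters because the child may still be using memory the parent owns. In plain POSIX threads the same rule reads "join before tearing down". The sketch below is an assumption about how `thread_create()`/`thread_wait()` might map onto `pthread_create()`/`pthread_join()`; the `sketch_` names are hypothetical, and computing a string length stands in for running the thread function.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical analogue of thread_create_info: heap data handed to the child. */
typedef struct {
    char *threadfunc;      /* name the child would call */
    size_t *result;        /* parent-owned slot the child writes into */
} start_info;

static void *child_main(void *arg)
{
    start_info *info = arg;
    /* a real implementation would start its own engine and call threadfunc */
    *info->result = strlen(info->threadfunc);
    free(info->threadfunc);            /* the child owns the hand-off data */
    free(info);
    return NULL;
}

/* Analogue of thread_create(): 0 on success, -1 on failure. */
int sketch_thread_create(pthread_t *tid, const char *threadfunc, size_t *result)
{
    size_t len;
    start_info *info;

    assert(tid != NULL && threadfunc != NULL && result != NULL);
    len = strlen(threadfunc);
    info = malloc(sizeof *info);
    if (info == NULL)
        return -1;
    info->threadfunc = malloc(len + 1);
    info->result = result;
    if (info->threadfunc == NULL) {
        free(info);
        return -1;
    }
    memcpy(info->threadfunc, threadfunc, len + 1);
    if (pthread_create(tid, NULL, child_main, info) != 0) {
        free(info->threadfunc);
        free(info);
        return -1;
    }
    return 0;
}

/* Analogue of thread_wait(): anything the child may still touch (here,
 * *result) must stay alive until this returns. */
int sketch_thread_wait(pthread_t tid)
{
    return pthread_join(tid, NULL);
}
```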

Conclusion:
I'm not sure whether this threading implementation will perform well enough
to warrant its use (generally speaking), although high-performance scripts
can be written in this framework if the user is aware of the issues.

If we change the way we create the thread so that there is a superglobal
called $THREADGLOBALS that holds the proxyized contents of $GLOBALS, and
make $GLOBALS in the new thread hold copies of the original $GLOBALS made
by serializing and unserializing them, we end up with a faster-by-default
version:

<?php
function another_thread()
{
   // This $GLOBALS access occurs within our own thread (=fast)
   echo "Another thread: hello is " . $GLOBALS["hello"] . "\n";

   // This changes the thread-local version only.  It is fast,
   // but other threads can't see it.
   $GLOBALS["hello"] = "local";

   // This changes the value in the main thread, which is slower
   // than the methods above.
   // The new value is only visible to child threads via the
   // $THREADGLOBALS superglobal, or to the main thread via its
   // regular $GLOBALS access.
   // The main thread should create $THREADGLOBALS as an alias
   // for $GLOBALS so that thread-aware code can use $THREADGLOBALS
   // without worrying about which thread it is running in.
   $THREADGLOBALS["hello"] = "global";

   // Copy the shared value into local space (=slow)
   $GLOBALS["hello"] = $THREADGLOBALS["hello"];
}

$GLOBALS["hello"] = "hi there";
$thread = thread_create("another_thread");

// If the main engine dies before the child, there is a chance of
// segfaults, so wait for the child to finish first.
thread_wait($thread);
?>
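The two tiers map naturally onto a private copy plus a lock-protected shared slot. The sketch below is an assumption, not the proposed engine design: a `char` buffer stands in for a zval, a mutex stands in for the cross-thread proxy, and `shared_slot`/`child_ctx` are hypothetical names. The child follows the same steps as `another_thread()` in the script above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define VAL_MAX 32

/* Shared tier ($THREADGLOBALS): the main thread's value behind a lock. */
typedef struct {
    pthread_mutex_t lock;
    char value[VAL_MAX];
} shared_slot;

void shared_init(shared_slot *s, const char *v)
{
    pthread_mutex_init(&s->lock, NULL);
    snprintf(s->value, VAL_MAX, "%s", v);
}

void shared_set(shared_slot *s, const char *v)      /* slow: takes the lock */
{
    pthread_mutex_lock(&s->lock);
    snprintf(s->value, VAL_MAX, "%s", v);
    pthread_mutex_unlock(&s->lock);
}

void shared_get(shared_slot *s, char *out)          /* out holds VAL_MAX bytes */
{
    pthread_mutex_lock(&s->lock);
    snprintf(out, VAL_MAX, "%s", s->value);
    pthread_mutex_unlock(&s->lock);
}

/* Per-thread tier ($GLOBALS in the child): a private snapshot. */
typedef struct {
    shared_slot *shared;
    char local[VAL_MAX];
    char shared_after_local_write[VAL_MAX];
} child_ctx;

static void *two_tier_child(void *p)
{
    child_ctx *c = p;
    shared_get(c->shared, c->local);                /* snapshot at start */
    snprintf(c->local, VAL_MAX, "%s", "local");     /* fast, private */
    shared_get(c->shared, c->shared_after_local_write);
    shared_set(c->shared, "global");                /* visible to main */
    shared_get(c->shared, c->local);                /* copy shared into local */
    return NULL;
}

/* Runs the child to completion, leaving its context for inspection. */
int run_two_tier(shared_slot *s, child_ctx *c)
{
    pthread_t tid;
    assert(s != NULL && c != NULL);
    c->shared = s;
    if (pthread_create(&tid, NULL, two_tier_child, c) != 0)
        return -1;
    return pthread_join(tid, NULL);
}
```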

Phew!  That's a long mail.  I hope it makes sense.
What I suggest, if you are (still!) serious about threading PHP,
is that you work with ZE2 and use Harald's RPC extension for the
proxies (that will make things a little bit easier).

I don't think there is an easier way than this!

--Wez.





-- 
PHP Development Mailing List <http://www.php.net/>
To unsubscribe, visit: http://www.php.net/unsub.php
