Hi Zope (and Python) experts!

There seems to be a problem when an external Python module segfaults
during a Zope request: the remaining worker threads are left deadlocked.

I think this is the same problem as Dieter pointed out in his message
to zope-dev "[Problem] strange state after SIGSEGV":


The reason is the way Python handles threads on some systems
(RedHat-7.3, kernel 2.4.20, without NPTL). I've written a small Python
extension which does nothing but segfault[1]. With it, I ran the
following simulation, in which one thread acquires a lock and segfaults:

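  #!/usr/bin/env python2.3

  import thread
  import time
  import _segfault

  _lock = thread.allocate_lock()

  def worker():
      # Body reconstructed from the description: take the lock,
      # then crash while still holding it.
      _lock.acquire()
      _segfault.segfault()

  # Assumed: the main thread holds the lock while the workers start,
  # so all four block in acquire() during the first sleep.
  _lock.acquire()
  thread.start_new_thread(worker, ())
  thread.start_new_thread(worker, ())
  thread.start_new_thread(worker, ())
  thread.start_new_thread(worker, ())

  time.sleep(10)
  _lock.release()
  time.sleep(10)   # assumed: give one worker time to segfault

  print 'Bye...'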

On my RedHat-7.3 box (kernel 2.4.20-18, without NPTL) I get the
following behaviour. After starting the program, pstree shows this:


After the 10-second sleep, one worker gets the lock and
segfaults. After that, pstree shows this:


Three worker threads remain (the main thread is gone).

gdb shows that they are waiting for the lock (which they will never get):

  (gdb) info stack
  #0  0x420293d5 in sigsuspend () from /lib/i686/libc.so.6
  #1  0x40031609 in __pthread_wait_for_restart_signal ()
     from /lib/i686/libpthread.so.0
  #2  0x4003272c in sem_wait@@GLIBC_2.1 () from /lib/i686/libpthread.so.0
  #3  0x080c7b2d in PyThread_acquire_lock (lock=0x8170728, waitflag=1)
      at Python/thread_pthread.h:406
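
The waiting side of this can be illustrated without a crashing
extension. The following is only a minimal sketch, not what Zope or
the interpreter does internally: a thread that exits without releasing
the lock stands in for the segfaulted thread, and the workers poll with
non-blocking acquires so that the demo terminates (the real workers
block forever in sem_wait). The names doomed, worker and run_demo are
made up for this illustration.

```python
import threading
import time


def doomed(lock):
    # Stand-in for the crashing thread: take the lock and go away
    # without ever releasing it.
    lock.acquire()


def worker(lock, deadline, results):
    # Try to get the lock until the deadline passes. A non-blocking
    # acquire is used only so that this demo can finish.
    while time.time() < deadline:
        if lock.acquire(False):
            results.append(True)
            lock.release()
            return
        time.sleep(0.05)
    results.append(False)


def run_demo(n_workers=3, wait=0.5):
    lock = threading.Lock()
    results = []

    holder = threading.Thread(target=doomed, args=(lock,))
    holder.start()
    holder.join()  # the holder is gone, but the lock is still held

    deadline = time.time() + wait
    workers = [threading.Thread(target=worker, args=(lock, deadline, results))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


if __name__ == '__main__':
    print(run_demo())
```

Every worker reports False: nobody is left who could release the lock,
which matches the state of the three threads in the stack trace above.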

(As a side note: since Python threads block all signals, these worker
threads cannot be stopped with SIGTERM; they must be killed with SIGKILL.)

All this has the consequences Dieter described:
>   Consequences:
>     *  Zope did no longer respond to requests
>     *  "stop" did not work (as "SIGTERM" was ineffective)
>     *  "start" did not work, as the dangling processes kept
>        the HTTP port bound.

So I think I know what's happening, but I don't know how to fix it!
Can anyone help, please? Any hints are highly appreciated!


PS: A RedHat-9 system (kernel 2.4.20, with NPTL) shows different
behaviour. After the segfault, all threads disappeared. So maybe
all is OK with NPTL, but I've not tested it any further yet...

[1] segfault module


segfault.c:

void segfault(void)
{
  char *x = 0;

  *x = 'a';   /* write through a null pointer */
}

segfault.i:

%module segfault

void segfault(void);


$ swig -python segfault.i
$ gcc -I/usr/local/include/python2.3 -c segfault_wrap.c -o segfault_wrap23.o
$ gcc -c -o segfault.o segfault.c
$ gcc -shared segfault_wrap23.o segfault.o -o _segfault.so

[EMAIL PROTECTED]                Fax: +43/1/31336/9207
Zentrum fuer Informatikdienste, Wirtschaftsuniversitaet Wien, Austria
