[Dieter Maurer]
>>> The problem occurred in a ZEO client which called "asyncore.poll"
>>> in the forked subprocess. This "poll" deterministically
>>> stole ZEO server invalidation messages from the parent.

[Tim Peters]
>> I'm sorry, but this is still too vague to guess what happened.

[Dieter Maurer]
> Even when I sometimes make errors, my responses usually contain
> all relevant information.

I agree, but for whatever reason I'm having a very hard time following
this message thread.

>> - Which operating system was in use?

> The ZEO client application mentioned above is almost independent
> of the operating system -- besides the fact that it uses
> "fork" (and therefore requires the OS to support it).

The OS is important because the semantics of fork() depend on the OS.

> Therefore, I did not mention that the application was running
> on Linux 2.

OK, so, e.g., the Solaris fork() semantics play no role in the actual
damage you saw.

>> - Which thread package?

> The application mentioned above does not use any thread.
> Therefore, it is independent of the thread package.
> Would it use threads it were "LinuxThreads" (but it does not).

You said the app was a ZEO client, and, if that's so, it uses multiple
threads whether or not your part of the app creates threads of its own.
For example, a ZEO client creates a new thread just to connect to a ZEO
server. If this is a ZEO client that never connects to a ZEO server,
then perhaps threads are wholly irrelevant.

> There is no mystery at all that the application lost ZEO server
> invalidation messages. It directly follows from the fork
> semantics with respect to file descriptors.

I can believe that's the truth, but I confess I still don't see how.

> The problem I saw for wider Zope/ZEO client usage came alone
> from reading the Linux "fork" manual page which indicates
> (or at least can be interpreted) that child and parent have the
> same threads.
> There was no concrete observation that messages are lost/duplicated
> in this scenario.

Good! Thanks.

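To make the file-descriptor point concrete: a forked child inherits the
parent's open sockets, so whichever process reads first consumes the data.
The sketch below is only an illustration, not ZEO code. It assumes a POSIX
platform with os.fork(), uses socket.socketpair() as a stand-in for the
connection to the ZEO server, and the function name child_steals_message
and the message text are made up for the example.

```python
import os
import socket


def child_steals_message():
    """Return (child_got_it, parent_got_it) for one message sent after fork."""
    # Stand-in for the server <-> client connection (assumption, not ZEO).
    server_end, client_end = socket.socketpair()
    message = b"invalidate oid 42"  # hypothetical payload

    pid = os.fork()
    if pid == 0:
        # Child: client_end was inherited, so this recv() consumes data
        # that the server addressed to the parent's connection.
        data = client_end.recv(1024)
        os._exit(0 if data == message else 1)

    # Parent: the message is sent only after the fork, and the parent does
    # not read until the child has exited, so the child always wins here.
    server_end.sendall(message)
    _, status = os.waitpid(pid, 0)
    child_got_it = os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0

    client_end.setblocking(False)
    try:
        client_end.recv(1024)
        parent_got_it = True
    except BlockingIOError:
        parent_got_it = False  # nothing left: the child already read it

    server_end.close()
    client_end.close()
    return child_got_it, parent_got_it
```

In this sketch the parent deliberately waits for the child, which makes the
outcome deterministic. If both processes were polling the shared socket,
the result would depend on which one reads first.
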
> Meanwhile, I checked that "fork" under Linux with LinuxThreads
> behaves with respect to threads as dictated by the POSIX
> standard: the forked process has a single thread and
> does not inherit other threads from its parent.
>
> I will soon check how our Solaris version of Python behaves.
> If this, too, has only one thread, I will apologize for
> the premature warning...

Solaris offers (or imposes <0.9 wink>) choices that don't exist on other
platforms. One Solaris choice is whether you link Python with native
Solaris threads, or with the Sun POSIX pthreads library. Another choice
is whether you call Solaris fork() or Solaris fork1() (note that Python
exposes fork1() on platforms that have it -- fork1() clones only the
calling thread). The dangerous combination is Solaris threads + Solaris
fork(). The other 3 combinations are harmless in this respect.

Note that even using Solaris threads, it doesn't follow that places where
Linux calls fork() under the covers are also places Solaris calls fork()
under the covers. For example, Solaris system() calls Solaris vfork()
under the covers, which differs from Solaris fork() in several key
respects (and also differs from Solaris fork1()). The most relevant way
vfork() differs from fork() under Solaris is that vfork() only clones the
calling thread.

>> - In the ZEO client that called fork(), did it call fork() directly, or
>>   indirectly as the result of a system() or popen() call? Or what?

> The ZEO client has the basic structure:
>
>     while 1:
>         work_to_do = get_work(...)
>         for work in work_to_do:
>             pid = fork()
>             if pid == 0:
>                 do_work(work)
>                 # will not return
>         sleep(...)
>
> "do_work" opens a new ZEO connection.
> "get_work" and "do_work" use "asyncore.poll" to synchronize with incoming
> messages from ZEO -- no "asyncore.mainloop" around.
>
> The "poll" in "do_work" has stolen ZEO invalidation messages
> destined for the parent such that "get_work" has read old state
> and returned work items already completed.
> That is the problem I saw.

Well, don't do that then <wink>.

> All this is easy to understand, (almost) platform independent
> and independent of the thread library.

I still wouldn't say it's easy to understand. While the thread that calls
fork isn't running an asyncore loop, it must still be the case that
asyncore in the parent has a non-empty map -- yes? If it had an empty map,
the child processes would start with a clean slate (map), and so wouldn't
pick up socket traffic meant for the parent. If that's so, it looks like
just clearing asyncore's map in the child (before do_work()) would solve
the (main) problem.

> *Iff* a thread library lets a forked child inherit all threads
> then the problem I announced in this "Warning" thread can
> occur, as it then behaves similarly to my application
> above (with an automatic rather than an explicit "poll").

I still don't want to rush to generalizations; as above, even on Solaris
with native Solaris threads and clone-everything Solaris fork(), system()
should be harmless regardless. I don't know about popen() on Solaris,
though; etc etc.

> It may well be that there is no thread library that does this.
> In your words: all thread implementations may be "sane"
> with respect to thread inheritance...

At least Solaris fork() with Solaris native threads is not sane in this
respect. Solaris fork1() with Solaris native threads is sane, ditto any
flavor of Solaris fork with Sun pthreads. And sanity is relative <wink>.

_______________________________________________
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists -
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )
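
The suggestion above, clearing asyncore's map in the child before
do_work(), could look roughly like the sketch below. It assumes a POSIX
os.fork() and has not been tried against a real ZEO client.
asyncore.socket_map is the module-global map that asyncore.poll() consults
when no map is passed; run_in_child is a hypothetical helper name, and the
plain-dict fallback is there only because asyncore was removed from the
standard library in Python 3.12.

```python
import os

try:
    import asyncore
    socket_map = asyncore.socket_map  # map used by asyncore.poll() by default
except ImportError:
    socket_map = {}  # stand-in so the sketch still runs without asyncore


def run_in_child(do_work, work):
    """Fork; the child drops inherited registrations, works, then exits.

    Returns the child's pid in the parent. Never returns in the child.
    """
    pid = os.fork()
    if pid == 0:
        # Child: forget the dispatchers inherited from the parent, so a
        # later poll() in this process cannot read the parent's sockets.
        socket_map.clear()
        try:
            do_work(work)
        finally:
            os._exit(0)  # never fall back into the parent's work loop
    return pid
```

Clearing the map only stops the child from polling the inherited sockets.
The descriptors themselves stay open in the child until it closes them or
exits, and any connection do_work() opens afterwards registers itself in
the now-empty map as usual.
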