Hi Tim,

Tim Peters wrote at 2004-6-27 17:06 -0400:
>[Dieter Maurer]
>> The problem occurred in a ZEO client which called "asyncore.poll"
>> in the forked subprocess. This "poll" deterministically
>> stole ZEO server invalidation messages from the parent.
>
>I'm sorry, but this is still too vague to guess what happened.
Even though I sometimes make errors, my responses usually contain all
relevant information.

>- Which operating system was in use?

The ZEO client application mentioned above is almost independent of the
operating system, apart from the fact that it uses "fork" (and therefore
requires the OS to support it). Therefore, I did not mention that the
application was running on Linux 2.

>- Which thread package?

The application mentioned above does not use any threads. Therefore, it is
independent of the thread package. If it did use threads, the package would
be "LinuxThreads" (but it does not).

There is no mystery at all about why the application lost ZEO server
invalidation messages. It follows directly from the fork semantics with
respect to file descriptors: the child inherits the parent's open ZEO
socket, so whatever the child reads from it is no longer available to
the parent.

The problem I anticipated for wider Zope/ZEO client usage came solely from
reading the Linux "fork" manual page, which states (or can at least be
interpreted to state) that child and parent have the same threads. There
was no concrete observation that messages are lost or duplicated in this
scenario.

Meanwhile, I have checked that "fork" under Linux with LinuxThreads
behaves with respect to threads as dictated by the POSIX standard: the
forked process has a single thread and does not inherit other threads
from its parent. I will soon check how our Solaris version of Python
behaves. If it, too, has only one thread, I will apologize for the
premature warning...

>- In the ZEO client that called fork(), did it call fork() directly, or
>  indirectly as the result of a system() or popen() call?  Or what?
>  I'd like to understand a specific failure before rushing to
>  generalization.

The ZEO client has the following basic structure:

    while 1:
        work_to_do = get_work(...)
        for work in work_to_do:
            pid = fork()
            if pid == 0:
                do_work(work)  # will not return
        sleep(...)

"do_work" opens a new ZEO connection. "get_work" and "do_work" use
"asyncore.poll" to synchronize with incoming messages from ZEO; there is
no "asyncore.mainloop" running.
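The file-descriptor effect can be reproduced without ZEO. What follows is a minimal sketch, assuming a Unix system with `os.fork()`: a `socket.socketpair()` stands in for the ZEO connection, and the function name `child_steals_message` and the message text are hypothetical. A plain `recv()` in the child plays the role of the child's "asyncore.poll" (asyncore itself has since been removed from the standard library, in Python 3.12).

```python
import os
import socket


def child_steals_message(child_reads_inherited: bool = True) -> bytes:
    """Return what the parent can still read after a forked child has run.

    Hypothetical stand-in: server_end plays the ZEO server, client_end
    the parent's connection, which the child inherits through fork().
    """
    server_end, client_end = socket.socketpair()
    try:
        # The "server" sends an invalidation message before the fork, so it
        # sits in the receive buffer shared by every copy of client_end.
        server_end.sendall(b"invalidate oid 42")
        pid = os.fork()
        if pid == 0:
            # Child: fork() duplicated the descriptor, so both processes
            # refer to the same open socket and the same receive buffer.
            if child_reads_inherited:
                client_end.recv(1024)  # consumes the parent's message
            client_end.close()  # closing only the child's copy
            os._exit(0)  # leave immediately, skipping the finally block
        os.waitpid(pid, 0)
        # Parent: read without blocking to see what is left.
        client_end.setblocking(False)
        try:
            return client_end.recv(1024)
        except BlockingIOError:
            return b""  # buffer empty: the child took the message
    finally:
        server_end.close()
        client_end.close()


if __name__ == "__main__":
    print(child_steals_message(True))   # b''
    print(child_steals_message(False))  # b'invalidate oid 42'
```

With `child_reads_inherited=False` the child merely closes its own copy of the descriptor, and the parent still receives the message. Closing inherited connections in the child before it opens its own is therefore one possible precaution; it is an assumption of this sketch, not part of the setup described above.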
The "poll" in "do_work" stole ZEO invalidation messages destined for the
parent, so that "get_work" read stale state and returned work items that
had already been completed. That is the problem I saw.

All this is easy to understand, (almost) platform independent, and
independent of the thread library.

*Iff* a thread library lets a forked child inherit all threads, then the
problem I announced in this "Warning" thread can occur, because the
situation then resembles my application above (with an automatic rather
than an explicit "poll"). It may well be that no thread library does this.
In your words: all thread implementations may be "sane" with respect to
thread inheritance...

--
Dieter
_______________________________________________
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists -
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )