Hi. I think I have discovered dangerous behaviour in Zope's shutdown code.
I've had a strange problem on a site where persistent objects are created from data inserted in an SQL table. Upon object creation, the SQL table is updated to mark the line as imported. Such an import was triggered just before a shutdown. After a restart and another import, documents were created twice.

What I believe happened (but I could not find any hard evidence of it) is that Zope blindly exited while the worker thread was running, and in the worst possible method: tpc_finish. ZODB was already committed, but MySQL was not. So MySQL rolled back its changes, the lines were left in a "ready to import" state, and they were imported again at the next import attempt.

Reading the shutdown code, I discovered 2 distinct timeout mechanisms (note: having just one is enough to trigger the problem):

- Lifetime.py: iterating through asyncore sockets, it alerts servers that it will shut down soon. If they veto for too long, the veto is ignored and shutdown continues. The default timeout is 20 seconds, meaning there is at most one minute from the first shutdown notice to the effective process exit (taking all running threads down). "zopectl stop" runs a "fast" shutdown, which means the timeout is shortened to 1 second, so the total maximum shutdown time is 3 seconds. This timeout can be worked around by just writing blocking shutdown methods and not using the veto system.

- zdaemon/zdrun.py: if the instance being shut down still responds after 10 seconds, it is sent a SIGKILL. This cannot be worked around without changing code in zdrun.py or not executing it at all (no idea if there is any alternative).

I could easily reproduce the problem by writing a simple connection manager which calls time.sleep(3600) in its _finish method and defines a sortKey method that makes it commit after another connection manager.
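For illustration, here is a minimal, self-contained sketch of that reproduction. It does not use the real transaction machinery or the Zope connection manager base classes: FastManager, SlowManager, finish_all and simulate_shutdown are hypothetical names, the sortKey values are simplified, and the join timeout only emulates a shutdown deadline expiring while the commit is still in progress.

```python
import threading
import time


class FastManager:
    """Hypothetical stand-in for the ZODB connection: finishes at once."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def sortKey(self):
        # Low key: this manager is finished first.
        return "0:" + self.name

    def tpc_finish(self):
        self.log.append(self.name + " finished")


class SlowManager(FastManager):
    """Hypothetical stand-in for the SQL connection: slow final commit."""

    def __init__(self, name, log, delay):
        super().__init__(name, log)
        self.delay = delay

    def sortKey(self):
        # "~" sorts after digits and letters, so this manager goes last.
        return "~:" + self.name

    def tpc_finish(self):
        time.sleep(self.delay)  # stands in for the very long sleep in my test
        super().tpc_finish()


def finish_all(managers):
    """Simplified second phase: finish each manager in sortKey order."""
    for manager in sorted(managers, key=lambda m: m.sortKey()):
        manager.tpc_finish()


def simulate_shutdown(delay, deadline):
    """Run the commit in a worker thread and stop waiting after `deadline`.

    Returns what had been finished when the deadline expired, and whether
    the worker was still busy at that point.
    """
    log = []
    managers = [SlowManager("sql", log, delay), FastManager("zodb", log)]
    worker = threading.Thread(target=finish_all, args=(managers,), daemon=True)
    worker.start()
    worker.join(deadline)  # emulates the shutdown timeout
    return list(log), worker.is_alive()
```

With a delay longer than the deadline, for example simulate_shutdown(0.5, 0.05), the first returned value should contain only the "zodb" entry while the worker is still alive. Since the worker is a daemon thread, a process exiting at that moment abandons the second finish, which is the inconsistent state described above.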
I could not find any mechanism that prevents a commit from happening while a shutdown is in progress, and I don't think there should be one. Some storages are accessed over a network, so latency can make tpc_finish take a long time to complete. Checking that there is no pending shutdown before entering this function would therefore not solve the problem, because a shutdown can still begin while tpc_finish is running.

I suggest removing all those timeouts. If a user wants Zope to shut down for a reason serious enough to send it a SIGKILL or to cause immediate Python thread termination, that is the user's responsibility. But I think the regular shutdown mechanism must not do that.

Also, the same problem can happen with "zopectl fg", since Zope does not go through any shutdown sequence as far as I can tell (it just dies).

-- 
Vincent Pelletier
_______________________________________________
Zope-Dev maillist - Zope-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zope-dev