Not entirely sure whether this belongs here or on a Zope list, but here goes...

System info on this cluster:

  One ZEO server
  Two ZEO clients; let's call the clients ZFred and ZJoe
  All running Zope 2.7.3

Twice now I have observed the following pattern:

* Somebody complains to me that ZSyncer is not working: it reports
  success, but the new data doesn't actually appear in pages generated
  by the cluster. (For those not familiar, ZSyncer just uses the
  "Import / Export" feature of Zope, except that the imported package
  is received via HTTP POST and the data is imported from a
  cStringIO.StringIO() instance instead of the filesystem.) So it's as
  if you've imported some data and apparently succeeded, but the new
  data doesn't actually seem to be there. (ZFred is the Zope server on
  which the new data gets imported.)

* Later the same day, *while I am looking at the Zope management
  interface*, one Zope (the ZJoe client) gets stuck. Responses stop
  coming out, and the debug (aka trace aka "big M") log shows that new
  requests are coming in but none are completing (lots of "B"s and
  "I"s, no "A"s or "E"s). The CPU is mostly idle and there is plenty
  of free RAM, so presumably we are blocking on some I/O. In both
  cases I had done a bit of poking around in the management interface
  to no apparent harm. In both cases, a request to a folder's
  manage_main was the first request in the long series of "B and I but
  no A and E" requests.

* After some time (both times it was around 11-13 minutes), ZJoe gets
  unstuck and there is a flood of completed requests in the debug log.
  This coincides with a series of ClientDisconnected errors in the
  Zope event log (corresponding to some HTTP 500 errors in the access
  log). (That's why I think it's a ZEO issue and decided to ask here.)

* The ZEO server log shows nothing at all unusual during this whole
  time ... all quiet.

* The other ZEO client, ZFred, has been up all this time and reported
  no problems.

* We still don't see the data that we imported unless I either restart
  Zope or use the Control Panel to clear the in-memory ZODB cache.
  And then suddenly we see it.

So it appears that the sync succeeds, and ZFred successfully got the
ZEO server to store the changes, but both ZFred and ZJoe are using
stale cache data until I restart them or flush the cache.

I have no idea if the ZEO client disconnection is really relevant or
just a nasty coincidence or what.

Is it possibly relevant that we have significant system clock skew?
ZEO server is about 13 minutes slow, one ZEO client (ZJoe) is 32
minutes slow, ZFred is 16 minutes slow.

I thought it might be simple network issues, but the two ZEO clients
are on the same subnet and the admins swear they weren't monkeying
with the firewall or anything else.

-- 
Paul Winkler
http://www.slinkp.com
_______________________________________________
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list
http://mail.zope.org/mailman/listinfo/zodb-dev
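
For anyone wanting to check a trace log for the "B and I but no A and E" pattern described above, something like the following might help. It is only a rough sketch: it assumes each trace line starts with a one-letter code (B, I, A or E) followed by a request id, and it ignores everything after those two fields. The function name, the sample lines, and their timestamps are made up for illustration and are not taken from the real logs.

```python
def find_stuck_requests(lines):
    """Return the 'B' lines of requests that began but never logged an 'E'.

    Assumption: the first whitespace-separated field is the event code
    and the second is the request id. Other fields are not inspected.
    """
    pending = {}  # request id -> its 'B' line, in order of arrival
    for line in lines:
        parts = line.split()
        if len(parts) < 2 or parts[0] not in ("B", "I", "A", "E"):
            continue  # skip blank or unrecognised lines
        code, request_id = parts[0], parts[1]
        if code == "B":
            pending[request_id] = line.strip()
        elif code == "E":
            pending.pop(request_id, None)
    return list(pending.values())


# Hypothetical sample data, invented purely to exercise the function.
SAMPLE_LOG = """\
B 1001 2004-12-01T10:00:00 GET /folder/manage_main
I 1001 2004-12-01T10:00:00 0
B 1002 2004-12-01T10:00:05 GET /index_html
I 1002 2004-12-01T10:00:05 0
B 1003 2004-12-01T10:00:07 GET /about
I 1003 2004-12-01T10:00:07 0
A 1003 2004-12-01T10:00:08 200 512
E 1003 2004-12-01T10:00:08
"""

for stuck in find_stuck_requests(SAMPLE_LOG.splitlines()):
    print(stuck)
```

The first line it prints is the earliest request that never finished, which is how one could spot that a manage_main request heads the stuck series.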
