Not entirely sure whether this belongs here or on a zope list, but here goes.
System info on this cluster:
One ZEO server
Two ZEO clients, let's call the clients ZFred and ZJoe.
all using zope 2.7.3
Twice now I have observed the following pattern:
* Somebody complains to me that ZSyncer is not working
in that it reports success, but the new data doesn't actually
appear in pages generated by the cluster.
(For those not familiar, ZSyncer just uses the "Import / Export"
feature of Zope, except that the imported package is received via http
POST and the data is imported from a cStringIO.StringIO() instance
instead of the filesystem.)
So it's as if you've imported some data and apparently succeeded,
but the new data doesn't actually seem to be there.
(ZFred is the zope server on which the new data gets imported.)
* Later the same day, *while I am looking at the zope management
interface*, one Zope (the ZJoe client) gets stuck.
Responses stop coming out,
and the debug (aka trace aka "big M") log shows that new requests
are coming in but none are completing (lots of "B"s and "I"s, no "A"s or "E"s).
The CPU is mostly idle and there is plenty of free ram, so
presumably we are blocking on some I/O.
In both cases I had done a bit of poking around in the management
interface to no apparent harm. In both cases, a request to a
folder's manage_main was the
first request in the long series of "B and I but no A and E" requests.
* After some time (both times it was around 11-13 minutes),
ZJoe gets unstuck and there is a flood of
completed requests in the debug log.
This coincides with a series of ClientDisconnected errors
in the zope event log (corresponding to some http 500 errors in the
access log). (That's why I think it's a ZEO issue and decided
to ask here.)
* The ZEO server log shows nothing at all unusual during this whole
time ... all quiet.
* The other ZEO client, ZFred, has been up all this time and reported no errors.
* We still don't see the data that we imported unless I either restart
zope or use the Control Panel to clear the in-memory ZODB cache.
And then suddenly we see it.
So it appears that the sync succeeds, and ZFred successfully got the
ZEO server to store the changes, but both ZFred and ZJoe are using
stale cache data until I restart them or flush the cache.
I have no idea if the ZEO client disconnection is really relevant or
just a nasty coincidence or what.
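
For anyone who wants to look for the same "B and I but no A and E" pattern in
their own trace log, here is a rough sketch of how the stuck requests could be
picked out. It assumes each trace log line starts with a one-letter code, then a
request id, then a timestamp, with any remaining text (such as the method and
path on a "B" line) after that. The function name find_unfinished is made up for
this example, and the exact log layout should be checked against your own log
before trusting the output.

```python
from collections import OrderedDict


def find_unfinished(lines):
    """Return (request_id, (timestamp, detail)) for requests that began but never ended.

    Assumes each line looks like: CODE REQUEST_ID TIMESTAMP [DETAIL...]
    where CODE is one of B (begin), I (input received), A (response
    started) or E (request finished). This layout is an assumption.
    """
    pending = OrderedDict()  # request id -> (timestamp, detail), in arrival order
    for line in lines:
        parts = line.split(None, 3)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        code, req_id, stamp = parts[0], parts[1], parts[2]
        if code == "B":
            detail = parts[3].strip() if len(parts) > 3 else ""
            pending[req_id] = (stamp, detail)
        elif code == "E":
            # The request completed, so it is no longer stuck.
            pending.pop(req_id, None)
    return list(pending.items())
```

Feeding it the lines of the trace log (for example, an open file object) returns
the requests still waiting at the end of the log, oldest first, so the first
entry would be the request at the head of the stuck series.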
Is it possibly relevant that we have significant system clock skew?
ZEO server is about 13 minutes slow, one ZEO client (ZJoe)
is 32 minutes slow, ZFred is 16 minutes slow.
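
To make the skew figures easier to compare, here is the arithmetic as a small
sketch. The numbers are the ones given above; the names minutes_slow and
relative_skew are just for illustration. Whether ZEO cares about any of these
differences is exactly what I don't know.

```python
# Minutes each host's clock runs behind the correct time, as reported above.
minutes_slow = {"ZEO server": 13, "ZJoe": 32, "ZFred": 16}


def relative_skew(a, b):
    """Minutes by which host a's clock is behind host b's clock."""
    return minutes_slow[a] - minutes_slow[b]


# Pairwise differences between the three machines.
zjoe_vs_server = relative_skew("ZJoe", "ZEO server")    # 32 - 13 = 19
zfred_vs_server = relative_skew("ZFred", "ZEO server")  # 16 - 13 = 3
zjoe_vs_zfred = relative_skew("ZJoe", "ZFred")          # 32 - 16 = 16
```

So relative to the ZEO server, ZJoe is 19 minutes behind and ZFred is 3 minutes
behind, and ZJoe is 16 minutes behind ZFred.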
I thought it might be simple network issues, but the two ZEO clients are
on the same subnet and the admins swear they weren't monkeying with
the firewall or anything else.