Forgot to mention: I did the "debug spinning zope" dance
(gdb python; attach PID; info threads) and got nothing at all
from "info threads". Next time I'll try DeadlockDebugger,
see if that turns up anything.
On Mon, Aug 01, 2005 at 05:17:10PM -0400, Paul Winkler wrote:
> Not entirely sure whether this belongs here or a zope list, but
> here goes...
> System info on this cluster:
> One ZEO server
> Two ZEO clients, let's call the clients ZFred and ZJoe.
> all using zope 2.7.3
> Twice now I have observed the following pattern:
> * Somebody complains to me that ZSyncer is not working
> in that it reports success, but the new data doesn't actually
> appear in pages generated by the cluster.
> (For those not familiar, ZSyncer just uses the "Import / Export"
> feature of Zope, except that the imported package is received via http
> POST and the data is imported from a cStringIO.StringIO() instance
> instead of the filesystem.)
> So it's as if you've imported some data and apparently succeeded,
> but the new data doesn't actually seem to be there.
> (ZFred is the zope server on which the new data gets imported.)
> * Later the same day, *while I am looking at the zope management
> interface*, one Zope (the ZJoe client) gets stuck.
> Responses stop coming out,
> and the debug (aka trace aka "big M") log shows that new requests
> are coming in but none are completing (lots of "B"s and "I"s, no "A"s or "E"s).
> The CPU is mostly idle and there is plenty of free ram, so
> presumably we are blocking on some I/O.
> In both cases I had done a bit of poking around in the management
> interface to no apparent harm. In both cases, a request to a
> folder's manage_main was the
> first request in the long series of "B and I but no A and E" requests.
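One way to pin down which requests are stuck is to pair up the codes in the
trace log. This is only a minimal sketch, assuming each log line has the form
"CODE REQUEST_ID TIMESTAMP [details]" with codes B (begin), I (input
received), A (application returned) and E (end). That field layout is an
assumption and should be checked against the real log. find_unfinished is a
hypothetical helper name, and the sample lines are made up for illustration,
not copied from the actual log.

```python
def find_unfinished(lines):
    """Return (request_id, begin_timestamp) for requests with a B line but no E line.

    Assumed line layout (not verified against a real log):
        CODE REQUEST_ID TIMESTAMP [details]
    """
    started = {}  # request id -> timestamp of its B line, in order seen
    for line in lines:
        parts = line.split(None, 3)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        code, req_id, stamp = parts[:3]
        if code == "B":
            started[req_id] = stamp
        elif code == "E":
            # Request finished, so it is no longer a candidate.
            started.pop(req_id, None)
    return list(started.items())


# Synthetic sample lines, invented for this example.
sample = [
    "B 1001 2005-08-01T17:00:01 GET /folder/manage_main",
    "I 1001 2005-08-01T17:00:01 0",
    "B 1002 2005-08-01T17:00:05 GET /index_html",
    "I 1002 2005-08-01T17:00:05 0",
    "A 1002 2005-08-01T17:00:06 200 512",
    "E 1002 2005-08-01T17:00:06",
]

unfinished = find_unfinished(sample)
print(unfinished)  # [('1001', '2005-08-01T17:00:01')]
```

The first entry in the result is the earliest request that never finished,
which is the one to look at when hunting for whatever started the stall.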
> * After some time (both times it was around 11-13 minutes),
> ZJoe gets unstuck and there is a flood of
> completed requests in the debug log.
> This coincides with a series of ClientDisconnected errors
> in the zope event log (corresponding to some http 500 errors in the
> access log). (That's why I think it's a ZEO issue and decided
> to ask here.)
> * The ZEO server log shows nothing at all unusual during this whole
> time ... all quiet.
> * The other ZEO client, ZFred, has been up all this time and reported no
> * We still don't see the data that we imported unless I either restart
> zope or use the Control Panel to clear the in-memory ZODB cache.
> And then suddenly we see it.
> So it appears that the sync succeeds, and ZFred successfully got the
> ZEO server to store the changes, but both ZFred and ZJoe are using
> stale cache data until I restart them or flush the cache.
> I have no idea if the ZEO client disconnection is really relevant or
> just a nasty coincidence or what.
> Is it possibly relevant that we have significant system clock skew?
> ZEO server is about 13 minutes slow, one ZEO client (ZJoe)
> is 32 minutes slow, ZFred is 16 minutes slow.
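To make the skew between the machines explicit, the three figures above can
be turned into pairwise differences. A small sketch; skew_relative_to is a
hypothetical helper name, and the only inputs are the three offsets quoted
above. It says nothing about whether the skew actually causes the problem.

```python
# Minutes each machine's clock is behind true time, as reported above.
minutes_slow = {"ZEO server": 13, "ZJoe": 32, "ZFred": 16}


def skew_relative_to(reference, offsets):
    """Return how many minutes each other host lags behind the reference host."""
    base = offsets[reference]
    return {host: slow - base for host, slow in offsets.items() if host != reference}


relative = skew_relative_to("ZEO server", minutes_slow)
print(relative)  # {'ZJoe': 19, 'ZFred': 3}
```

So relative to the ZEO server, ZJoe's clock is 19 minutes behind and ZFred's
is 3 minutes behind, and the two clients differ from each other by 16 minutes.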
> I thought it might be simple network issues, but the two ZEO clients are
> on the same subnet and the admins swear they weren't monkeying with
> the firewall or anything else.
> Paul Winkler
> For more information about ZODB, see the ZODB Wiki:
> ZODB-Dev mailing list - ZODB-Dev@zope.org