Forgot to mention: I did the "debug spinning zope" dance (gdb python; attach PID; info threads) and got nothing at all from "info threads". Next time I'll try DeadlockDebugger, see if that turns up anything.
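What I'm hoping to get next time is a Python-level stack for every thread, which is roughly what "info threads" failed to give me. A minimal sketch of that idea, assuming a Python new enough to have sys._current_frames() (2.5 and later, so not the interpreter under Zope 2.7). This is not DeadlockDebugger's own code, and dump_threads is a name made up for illustration:

```python
import sys
import threading
import traceback


def dump_threads():
    """Return a text report with the current stack of every Python thread.

    Hypothetical helper for illustration; relies on sys._current_frames(),
    which maps thread ids to each thread's topmost stack frame.
    """
    # Map thread ids to readable names where the threading module knows them.
    names = dict((t.ident, t.name) for t in threading.enumerate())
    report = []
    for thread_id, frame in sys._current_frames().items():
        report.append("Thread %s (%s):" % (thread_id, names.get(thread_id, "unknown")))
        # format_stack() returns the frames as a list of strings, outermost first.
        report.append("".join(traceback.format_stack(frame)))
    return "\n".join(report)
```

In a stuck process, the interesting part of such a report would be the innermost frame of each worker thread, since that is where the thread is blocked.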
On Mon, Aug 01, 2005 at 05:17:10PM -0400, Paul Winkler wrote:
> Not entirely sure whether this belongs here or a zope list, but
> here goes...
>
> System info on this cluster:
> One ZEO server
> Two ZEO clients, let's call the clients ZFred and ZJoe.
> all using zope 2.7.3
>
> Twice now I have observed the following pattern:
>
> * Somebody complains to me that ZSyncer is not working
>   in that it reports success, but the new data doesn't actually
>   appear in pages generated by the cluster.
>   (For those not familiar, ZSyncer just uses the "Import / Export"
>   feature of Zope, except that the imported package is received via http
>   POST and the data is imported from a cStringIO.StringIO() instance
>   instead of the filesystem.)
>   So it's as if you've imported some data and apparently succeeded
>   but the new data doesn't actually seem to be there.
>
>   (ZFred is the zope server on which the new data gets imported.)
>
> * Later the same day, *while I am looking at the zope management
>   interface*, one Zope (the ZJoe client) gets stuck.
>   Responses stop coming out,
>   and the debug (aka trace aka "big M") log shows that new requests
>   are coming in but not completing (lots of "B"s and "I"s, no "A"s or "E"s).
>   The CPU is mostly idle and there is plenty of free RAM, so
>   presumably we are blocking on some I/O.
>
>   In both cases I had done a bit of poking around in the management
>   interface to no apparent harm. In both cases, a request to a
>   folder's manage_main was the
>   first request in the long series of "B and I but no A and E" requests.
>
> * After some time (both times it was around 11-13 minutes),
>   ZJoe gets unstuck and there is a flood of
>   completed requests in the debug log.
>   This coincides with a series of ClientDisconnected errors
>   in the zope event log (corresponding to some http 500 errors in the
>   access log). (That's why I think it's a ZEO issue and decided
>   to ask here.)
>
> * The ZEO server log shows nothing at all unusual during this whole
>   time ... all quiet.
>
> * The other ZEO client, ZFred, has been up all this time and reported no
>   problems.
>
> * We still don't see the data that we imported unless I either restart
>   zope or use the Control Panel to clear the in-memory ZODB cache.
>   And then suddenly we see it.
>
> So it appears that the sync succeeds, and ZFred successfully got the
> ZEO server to store the changes, but both ZFred and ZJoe are using
> stale cache data until I restart them or flush the cache.
> I have no idea if the ZEO client disconnection is really relevant or
> just a nasty coincidence or what.
>
> Is it possibly relevant that we have significant system clock skew?
> ZEO server is about 13 minutes slow, one ZEO client (ZJoe)
> is 32 minutes slow, ZFred is 16 minutes slow.
>
> I thought it might be simple network issues, but the two ZEO clients are
> on the same subnet and the admins swear they weren't monkeying with
> the firewall or anything else.
>
> --
>
> Paul Winkler
> http://www.slinkp.com
> _______________________________________________
> For more information about ZODB, see the ZODB Wiki:
> http://www.zope.org/Wikis/ZODB/
>
> ZODB-Dev mailing list - [email protected]
> http://mail.zope.org/mailman/listinfo/zodb-dev

--
Paul Winkler
http://www.slinkp.com
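The "B and I but no A and E" pattern from the quoted report can be checked mechanically instead of by eyeballing the log. A rough sketch, assuming only that each trace-log line starts with a one-letter code followed by a request id (the rest of the line is ignored). The function name and the sample lines are made up for illustration and are not taken from the real logs:

```python
def unfinished_requests(lines):
    """Return ids of requests that have a B (begin) line but no E (end) line.

    Assumption: each line looks like "<code> <request id> ...", where the
    code is one of B, I, A or E. Anything else is skipped.
    """
    started = []        # request ids in the order their B line appeared
    finished = set()    # request ids that reached E
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        code, request_id = parts[0], parts[1]
        if code == "B":
            started.append(request_id)
        elif code == "E":
            finished.add(request_id)
    return [r for r in started if r not in finished]


if __name__ == "__main__":
    # Hypothetical sample: request 1 completes, request 2 hangs after "I".
    sample = [
        "B 1 2005-08-01T21:00:00 GET /index_html",
        "I 1 2005-08-01T21:00:00 0",
        "A 1 2005-08-01T21:00:01 200 512",
        "E 1 2005-08-01T21:00:01",
        "B 2 2005-08-01T21:00:05 GET /folder/manage_main",
        "I 2 2005-08-01T21:00:05 0",
    ]
    print(unfinished_requests(sample))
```

The first id in the returned list would be the request that started the stall, which in both incidents was a folder's manage_main.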
