Forgot to mention: I did the "debug spinning zope" dance
(gdb python; attach PID; info threads) and got nothing at all
from "info threads".  Next time I'll try DeadlockDebugger
to see if that turns up anything.
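
For reference, the kind of per-thread stack dump that a tool such as
DeadlockDebugger is meant to provide can be roughly sketched in plain
Python. This is only an illustration, not DeadlockDebugger's code: it
assumes a Python where sys._current_frames() exists (2.5 or later, so
newer than the Python that Zope 2.7 runs on), and the function name
dump_thread_stacks is made up for the example.

```python
import sys
import threading
import traceback


def dump_thread_stacks():
    """Return a string holding the current stack of every Python thread.

    Hypothetical helper for illustration; relies on sys._current_frames(),
    which is an assumption about the Python version in use.
    """
    # Map thread ids to names so the output is easier to read.
    names = dict((t.ident, t.name) for t in threading.enumerate())
    chunks = []
    for thread_id, frame in sys._current_frames().items():
        chunks.append("Thread %s (%s):" % (thread_id, names.get(thread_id, "unknown")))
        chunks.append("".join(traceback.format_stack(frame)))
    return "\n".join(chunks)


if __name__ == "__main__":
    print(dump_thread_stacks())
```

A thread that is blocked on I/O would be expected to show up parked in
the same socket or lock call each time the dump is taken.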

On Mon, Aug 01, 2005 at 05:17:10PM -0400, Paul Winkler wrote:
> Not entirely sure whether this belongs here or a zope list, but
> here goes...
> 
> System info on this cluster:
> One ZEO server
> Two ZEO clients, let's call the clients ZFred and ZJoe.
> all using zope 2.7.3
>  
> Twice now I have observed the following pattern:
> 
> * Somebody complains to me that ZSyncer is not working
>   in that it reports success, but the new data doesn't actually
>   appear in pages generated by the cluster.
>   (For those not familiar, ZSyncer just uses the "Import / Export"
>   feature of Zope, except that the imported package is received via http
>   POST and the data is imported from a cStringIO.StringIO() instance
>   instead of the filesystem.)
>   So it's as if you've imported some data and apparently succeeded,
>   but the new data doesn't actually seem to be there.
> 
>   (ZFred is the zope server on which the new data gets imported.)
> 
> * Later the same day, *while I am looking at the zope management
>   interface*, one Zope (the ZJoe client) gets stuck.  
>   Responses stop coming out,
>   and the debug (aka trace aka "big M") log shows that new requests
>   are coming in but none are completing (lots of "B"s and "I"s,
>   no "A"s or "E"s).
>   The CPU is mostly idle and there is plenty of free ram, so 
>   presumably we are blocking on some I/O.
> 
>   In both cases I had done a bit of poking around in the management
>   interface to no apparent harm. In both cases, a request to a 
>   folder's manage_main was the 
>   first request in the long series of "B and I but no A and E" requests. 
> 
> * After some time (both times it was around 11-13 minutes), 
>   ZJoe gets unstuck and there is a flood of 
>   completed requests in the debug log.
>   This coincides with a series of ClientDisconnected errors
>   in the zope event log (corresponding to some http 500 errors in the
>   access log). (That's why I think it's a ZEO issue and decided
>   to ask here.)
> 
> * The ZEO server log shows nothing at all unusual during this whole
>   time ... all quiet.
> 
> * The other ZEO client, ZFred, has been up all this time and reported no
>   problems.
> 
> * We still don't see the data that we imported unless I either restart
>   zope or use the Control Panel to clear the in-memory ZODB cache.
>   And then suddenly we see it.
> 
> So it appears that the sync succeeds, and ZFred successfully got the
> ZEO server to store the changes, but both ZFred and ZJoe are using
> stale cache data until I restart them or flush the cache.
> I have no idea if the ZEO client disconnection is really relevant or
> just a nasty coincidence or what.
> 
> Is it possibly relevant that we have significant system clock skew?
> ZEO server is about 13 minutes slow, one ZEO client (ZJoe) 
> is 32 minutes slow, ZFred is 16 minutes slow.
> 
> I thought it might be simple network issues, but the two ZEO clients are
> on the same subnet and the admins swear they weren't monkeying with
> the firewall or anything else.
> 
> -- 
> 
> Paul Winkler
> http://www.slinkp.com
> _______________________________________________
> For more information about ZODB, see the ZODB Wiki:
> http://www.zope.org/Wikis/ZODB/
> 
> ZODB-Dev mailing list  -  ZODB-Dev@zope.org
> http://mail.zope.org/mailman/listinfo/zodb-dev
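
The "B and I but no A and E" pattern described in the quoted message
can be checked mechanically. Below is a minimal sketch, assuming each
trace log line starts with a one-letter code (B, I, A or E) followed by
a request id and a timestamp; that layout, the function name
unfinished_requests and the sample lines are all assumptions made up
for illustration, not output from the cluster.

```python
def unfinished_requests(lines):
    """Return {request_id: 'B' line} for requests with no 'E' entry.

    Assumes lines look like '<code> <request id> <timestamp> ...',
    which is an assumption about the trace log layout.
    """
    started = {}
    for line in lines:
        parts = line.split(None, 3)
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        code, request_id = parts[0], parts[1]
        if code == "B":
            started[request_id] = line.rstrip()
        elif code == "E":
            started.pop(request_id, None)
    return started


# Hypothetical sample: request 1 completes, requests 2 and 3 never do.
SAMPLE_LOG = [
    "B 1 2005-08-01T16:00:00 GET /index_html",
    "I 1 2005-08-01T16:00:00 0",
    "A 1 2005-08-01T16:00:01 200 1024",
    "E 1 2005-08-01T16:00:01",
    "B 2 2005-08-01T16:05:00 GET /folder/manage_main",
    "I 2 2005-08-01T16:05:00 0",
    "B 3 2005-08-01T16:05:30 GET /index_html",
    "I 3 2005-08-01T16:05:30 0",
]

if __name__ == "__main__":
    for request_id, begin_line in sorted(unfinished_requests(SAMPLE_LOG).items()):
        print(begin_line)
```

Sorting the surviving "B" lines by timestamp would show which request
was first in the stuck series.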

-- 

Paul Winkler
http://www.slinkp.com