Thanks so much for helping to pin this down, and for the observation that the error occurs more frequently when the timeout is small. This is likely because the timeout value directly controls how often "housekeeping" operations run, in which old buckets are garbage collected and new buckets are added. The housekeeping operations must somehow confuse the underlying storage. I don't know how this happens, but it does, and I suspect it's related to conflict errors.
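To make the timeout/housekeeping relationship concrete, here is a toy model. It is only a sketch under stated assumptions, not the actual Products.Transience code: it assumes the timeout is divided into a fixed number of slices, that each slice gets its own bucket, and that a bucket is dropped once it is older than the timeout. The function names and the dict-of-buckets layout are hypothetical.

```python
# Toy model only: NOT the real Products.Transience implementation.
# Assumption: the timeout is split into `slices` equal periods, one bucket
# per period, and buckets older than the timeout are garbage collected.

def bucket_start(now, period):
    """Return the start time of the bucket that `now` falls into."""
    return int(now) - int(now) % period


def housekeep(buckets, now, timeout_secs, period):
    """Drop expired buckets and make sure the current bucket exists.

    Returns the number of buckets that were garbage collected.
    """
    current = bucket_start(now, period)
    expired = [start for start in buckets if start + timeout_secs <= current]
    for start in expired:
        del buckets[start]
    buckets.setdefault(current, {})
    return len(expired)


def count_housekeeping_changes(timeout_secs, slices, duration):
    """Count how often housekeeping changes the bucket set in `duration` seconds."""
    period = max(1, timeout_secs // slices)
    buckets = {}
    changes = 0
    for now in range(0, duration, period):
        before = set(buckets)
        housekeep(buckets, now, timeout_secs, period)
        if set(buckets) != before:
            changes += 1
    return changes


if __name__ == "__main__":
    # One simulated hour, three slices per timeout (an arbitrary choice).
    print(count_housekeeping_changes(60, 3, 3600))    # 1 minute timeout -> 180
    print(count_housekeeping_changes(1200, 3, 3600))  # 20 minute timeout -> 9
```

In this model a 1 minute timeout reshuffles the buckets twenty times as often as a 20 minute timeout, which would fit the observation that small timeouts expose the error sooner.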
So we seem to have two bugs:

- The first is that turning off read conflicts allows the session data
  bucket and the session data index to get out of sync. I admit that I
  knew this already, but I haven't figured out a way around it yet that
  doesn't cause massive numbers of conflicts. That said, it may be that
  we can't avoid them, and I may turn this behavior off in the next
  revision. Another alternative is to use a nonpersistent index, but
  that has other challenges (it could also prevent the shared use of
  session data between ZEO clients).

- The second is an as-yet-unidentified bug in TemporaryStorage. Most of
  the code for TemporaryStorage came right out of the Berkeley
  "Packless" storage. I should probably try to replicate this under
  load (this should not be difficult).

To verify without much trouble that there really is a bug in TemporaryStorage, you could temporarily put your transient object container (session_data) in the main database, point the session data manager at it, and run for a while under this config. This is not recommended for production, however, as it will cause main database bloat.

If I find and fix the TemporaryStorage bug I will let you know. In the meantime, you can probably work around this by using a different non-undoing storage to put your session data in (e.g. Berkeley, or maybe DirectoryStorage?).

- C

On Fri, 2003-03-14 at 20:28, John Eikenberry wrote:
>
> Sorry for the length of this one... but I'm trying to braindump to give you
> as much info about the problem as possible.
>
> To be sure it doesn't get lost in my below ramblings, there is probably an
> important piece of information I haven't mentioned yet... that is that
> these errors seem to coincide with the session data timeout setting. I
> don't get the errors at all until the timeout is reached or has passed.
>
> The timeout setting I'm referring to is denoted by the label: "Data
> object timeout value in minutes" on the /temp_folder/session_data object.
>
>
> Chris McDonough wrote:
>
> > OK, thanks John. Let's try one more thing... currently the mounted
> > database used to store the session data uses a connection that ignores
> > read conflicts. This is known to be bad because the machinery which
> > deals with keeping the sessioning index data will also ignore read
> > conflicts, which may create inconsistencies between two data structures
> > (BTrees) that need to be kept in sync.
>
> I tried this and it seemed to help some. I haven't seen the get() error
> we've been discussing yet, but the load() error just occurred (line 94 in
> TemporaryStorage - this was error #1 in my original email). The
> traceback is a bit different from my original email, though, as the
> LowConflictConnection isn't being used. Here's the new Traceback:
>
> Error Type: KeyError
> Error Value: [non-ascii chars]
>
> Traceback (innermost last):
>
> * Module ZPublisher.Publish, line 98, in publish
> * Module ZPublisher.mapply, line 88, in mapply
> * Module ZPublisher.Publish, line 39, in call_object
> * Module Products.DotOrg.Pages.KPage, line 110, in testSession
> * Module Products.DotOrg.Utils.Spawn, line 42, in launchProcess
> * Module Products.DotOrg.Utils.Spawn, line 73, in storeArgs
> * Module Products.Sessions.SessionDataManager, line 180, in _getSessionDataObject
> * Module Products.Transience.Transience, line 175, in new_or_existing
> * Module Products.Transience.Transience, line 797, in get
> * Module Products.Transience.Transience, line 546, in _getCurrentBucket
> * Module ZODB.Connection, line 509, in setstate
> * Module Products.TemporaryFolder.TemporaryStorage, line 94, in load
>
>
> > Here's a patch to lib/python/Products/TemporaryFolder/TemporaryFolder.py
> > that reenables read conflict generation on the database.
> >
> > Index: TemporaryFolder.py
> > ===================================================================
> > RCS file:
> > /cvs-repository/Zope/lib/python/Products/TemporaryFolder/TemporaryFolder.py,v
> > retrieving revision 1.7
> > diff -r1.7 TemporaryFolder.py
> > 72c72
> > < db.klass = LowConflictConnection
> > ---
> > > #db.klass = LowConflictConnection
> >
> > You may see many more conflicts with this running. But maybe the data
> > structures will not become desynchronized.
>
> You weren't kidding about the increase in conflict errors.
>
> > Another problem, still unexplained, experienced by Andrew Athan, is that
> > if a reference is made to a session data object from within the standard
> > error message, somehow things get screwy under high load. If you're
> > doing the same, please let me know.
>
> Before this started happening there was a hasSessionData check getting
> called during standard error publishing, though we removed that early this
> week when this started happening.
>
> ---
>
> It might help you to better understand what might be causing the problem if
> you know where we're using sessions and how we can force this problem to
> occur. Not sure if this will be of much help, but thought it couldn't
> hurt.
>
> We use sessions primarily as a sort of authenticated user marker. It just
> stores their username and a state field that get used in non-authenticated
> sections of our site to detect the user as having logged into the site (we
> can then raise an unauthorized error to get the basic auth info for that
> user). Anyways, these calls happen on our basic Content class (subclassed
> from DTMLMethod) in its __call__() method. We use it a couple other places
> for small things, but this one sees the most use.
>
> I've figured out how to force these errors to happen to some extent.
> I've written a method that starts up a thread, which uses Client.call to
> call another method, which then basically just loops endlessly calling
> hasSessionData and getSessionData, incrementing a number in the session
> data and sleeping for N seconds between loops. One of these guys will
> run forever without a problem.
>
> Once you start a second thread, ReadConflictErrors start getting raised.
> Which thread gets the conflict and which one keeps working seems variable
> (probably just a timing thing). If I start enough of these threads I can
> cause the error to happen. But only once the session timeout is reached.
>
> Note that to help speed up getting the errors I either set the session
> timeout to 1 minute via a _setTimeout() call or even manually tweak the
> appropriate session data manager's attributes (_timeout_secs, _period and
> _timeout_slices) to very small values (i.e. a few seconds).
>
>
> --
>
> John Eikenberry [EMAIL PROTECTED]
> ______________________________________________________________
> "A society that will trade a little liberty for a little order
> will deserve neither and lose both."
> --B. Franklin
>
> _______________________________________________
> Zope-Dev maillist - [EMAIL PROTECTED]
> http://mail.zope.org/mailman/listinfo/zope-dev
> ** No cross posts or HTML encoding! **
> (Related lists -
> http://mail.zope.org/mailman/listinfo/zope-announce
> http://mail.zope.org/mailman/listinfo/zope )