On Sun, Feb 8, 2009 at 9:23 PM, Christian Boos <[email protected]> wrote: >> >> It turned out to be locked cache after timeout. At first I thought >> about a race condition, but it was false. The cache can easily be >> jammed with a long resync from repository on a busy server. It took >> about 40 seconds to sync 200 revisions before timeout occurred. Take a >> cache.py sync() >> http://trac.edgewall.org/browser/branches/0.11-stable/trac/versioncontrol/cache.py#L158 >> >> On line 159 we check if a revision is being entered in DB by looking >> if next revision number is already present in `revisions` table. That >> means another thread is made an INSERT in line 184. The cache is now >> locked. To unlock it, another thread should update youngest revision >> at line 221. >> > Correct. >> So, the problem is when a timeout occurs during execution of 202:221 - >> there is an update in `node_change` table that can potentially take a >> long time to experience a timeout (esp. when server is under a heavy >> load). There are also cset.get_changes() and >> self.repos.next_rev(next_youngest) calls that can contribute to the >> issue. > > I don't really see how a time out can happen here. Do you mean a > TimeoutError exception? > That shouldn't happen, as the "db" variable already refers to a > connection, and TimeoutError exceptions are only raised when trying to > get a connection from the cache.
I am running Trac via fastcgi on Apache. I think server kills the process in the middle of operation. If a user dropped connection I can't see why would Apache should allow process to run. Should there be an exception? There must be timeouts for scripts as well. Are those scripts notified before the kill? Log doesn't show that. > But you're nevertheless right, if anything wrong happens there (unicode > exception, db level timeout?), then there's no rollback done here. > That exception ends up being ignored at a later point (converted to a > warning) and if this is followed by a commit (e.g. session save), then > the cache remains locked. I completely ignored the fact that there should be a commit() for the lock to occur. That makes things complicated. Here is the log content for the lock time. The message "Resync from other thread in progress" is inserted in "for" block after line 158. No warnings are displayed. 2009-02-08 03:27:48,106 Trac[__init__] DEBUG: Caching node change in [2391]: (u'trunk/unicode_far/vbuild.m4', 'file', 'edit', u'trunk/unicode_far/vbuild.m4', 2389) 2009-02-08 03:27:48,107 Trac[__init__] DEBUG: Caching node change in [2391]: (u'trunk/unicode_far/wrap.cpp', 'file', 'edit', u'trunk/unicode_far/wrap.cpp', 2 387) 2009-02-08 03:27:48,181 Trac[__init__] DEBUG: Dispatching <Request "GET u'/chrome/site/fartrac.ico'"> 2009-02-08 03:27:48,191 Trac[__init__] INFO: Trying to sync revision [2392] 2009-02-08 03:27:48,342 Trac[__init__] DEBUG: 18 unreachable objects found. 2009-02-08 03:27:50,422 Trac[__init__] DEBUG: Dispatching <Request "GET u'/log/trunk/enc/enc_rus/styles'"> 2009-02-08 03:27:50,430 Trac[__init__] INFO: repos rev [2559] != cached rev [2391] 2009-02-08 03:27:50,434 Trac[__init__] DEBUG: resync from other thread in progress 2 2009-02-08 03:27:50,452 Trac[__init__] DEBUG: Prepare chrome data for request 2009-02-08 03:27:50,453 Trac[__init__] DEBUG: No policy allowed anonymous performing TRAC_ADMIN on None 2009-02-08 03:27:50,453 Trac[__init__] DEBUG: No policy allowed anonymous performing PERMISSION_GRANT on None 2009-02-08 03:27:50,453 Trac[__init__] DEBUG: No policy allowed anonymous performing PERMISSION_REVOKE on None 2009-02-08 03:27:50,559 Trac[__init__] DEBUG: No policy allowed anonymous performing EMAIL_VIEW on None 2009-02-08 03:27:50,918 Trac[__init__] DEBUG: 330 unreachable objects found. 2009-02-08 03:27:51,525 Trac[__init__] DEBUG: Dispatching <Request "GET u'/browser/trunk/unicode_far'"> 2009-02-08 03:27:51,568 Trac[__init__] INFO: repos rev [2559] != cached rev [2391] 2009-02-08 03:27:51,572 Trac[__init__] DEBUG: resync from other thread in progress 2009-02-08 03:27:51,577 Trac[__init__] DEBUG: Prepare chrome data for request 2009-02-08 03:27:51,578 Trac[__init__] DEBUG: No policy allowed anonymous performing TRAC_ADMIN on None 2 How did it happen that youngest revision was not updated in system table if a commit that inserted rev number into revision table was successful? Maybe transaction mechanism doesn't work for MySQL backend? > Rather simply doing a rollback in all cases when something goes. > I'll work on a patch. > That would be great. >> It would be more useful to see pid or thread id displayed in log >> instead of Trac[__init__] string. Does this Trac[__init__] change at >> all? > > Have a look at http://trac.edgewall.org/wiki/TracLogging#LogFormat and the > "The format is a string which can contain any of the Python logging > Formatter variables. " part. > You can indeed add the thread id there. I'll follow your advice. Thanks. I wonder why by default the setting is so strange? > Thanks for you in-depth analysis of the problem, this might solve one of > the "weird" issues we still carry on since Trac 0.10 (e.g. #5440). Hope this helps. BTW, what do these log messages about unreachable objects mean? -- --anatoly t. --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Trac Development" group. To post to this group, send email to [email protected] To unsubscribe from this group, send email to [email protected] For more options, visit this group at http://groups.google.com/group/trac-dev?hl=en -~----------~----~----~----~------~----~------~--~---
