anatoly techtonik wrote: > On Sun, Feb 8, 2009 at 9:23 PM, Christian Boos <[email protected]> wrote: > >>> It turned out to be locked cache after timeout. At first I thought >>> about a race condition, but it was false. The cache can easily be >>> jammed with a long resync from repository on a busy server. It took >>> about 40 seconds to sync 200 revisions before timeout occurred. Take a >>> cache.py sync() >>> http://trac.edgewall.org/browser/branches/0.11-stable/trac/versioncontrol/cache.py#L158 >>> >>> On line 159 we check if a revision is being entered in DB by looking >>> if next revision number is already present in `revisions` table. That >>> means another thread is made an INSERT in line 184. The cache is now >>> locked. To unlock it, another thread should update youngest revision >>> at line 221. >>> >>> >> Correct. >> >>> So, the problem is when a timeout occurs during execution of 202:221 - >>> there is an update in `node_change` table that can potentially take a >>> long time to experience a timeout (esp. when server is under a heavy >>> load). There are also cset.get_changes() and >>> self.repos.next_rev(next_youngest) calls that can contribute to the >>> issue. >>> >> I don't really see how a time out can happen here. Do you mean a >> TimeoutError exception? >> That shouldn't happen, as the "db" variable already refers to a >> connection, and TimeoutError exceptions are only raised when trying to >> get a connection from the cache. >> > > I am running Trac via fastcgi on Apache. I think server kills the > process in the middle of operation. If a user dropped connection I > can't see why would Apache should allow process to run. Should there > be an exception? There must be timeouts for scripts as well. Are those > scripts notified before the kill? Log doesn't show that. > > >> But you're nevertheless right, if anything wrong happens there (unicode >> exception, db level timeout?), then there's no rollback done here. >> That exception ends up being ignored at a later point (converted to a >> warning) and if this is followed by a commit (e.g. session save), then >> the cache remains locked. >> > > I completely ignored the fact that there should be a commit() for the > lock to occur. That makes things complicated. Here is the log content > for the lock time. The message "Resync from other thread in progress" > is inserted in "for" block after line 158. No warnings are displayed. > > 2009-02-08 03:27:48,106 Trac[__init__] DEBUG: Caching node change in > [2391]: (u'trunk/unicode_far/vbuild.m4', 'file', 'edit', > u'trunk/unicode_far/vbuild.m4', > 2389) > 2009-02-08 03:27:48,107 Trac[__init__] DEBUG: Caching node change in > [2391]: (u'trunk/unicode_far/wrap.cpp', 'file', 'edit', > u'trunk/unicode_far/wrap.cpp', 2 > 387) > 2009-02-08 03:27:48,181 Trac[__init__] DEBUG: Dispatching <Request > "GET u'/chrome/site/fartrac.ico'"> > 2009-02-08 03:27:48,191 Trac[__init__] INFO: Trying to sync revision [2392] > 2009-02-08 03:27:48,342 Trac[__init__] DEBUG: 18 unreachable objects found. > 2009-02-08 03:27:50,422 Trac[__init__] DEBUG: Dispatching <Request > "GET u'/log/trunk/enc/enc_rus/styles'"> > 2009-02-08 03:27:50,430 Trac[__init__] INFO: repos rev [2559] != > cached rev [2391] > 2009-02-08 03:27:50,434 Trac[__init__] DEBUG: resync from other thread > in progress > 2 > 2009-02-08 03:27:50,452 Trac[__init__] DEBUG: Prepare chrome data for request > 2009-02-08 03:27:50,453 Trac[__init__] DEBUG: No policy allowed > anonymous performing TRAC_ADMIN on None > 2009-02-08 03:27:50,453 Trac[__init__] DEBUG: No policy allowed > anonymous performing PERMISSION_GRANT on None > 2009-02-08 03:27:50,453 Trac[__init__] DEBUG: No policy allowed > anonymous performing PERMISSION_REVOKE on None > 2009-02-08 03:27:50,559 Trac[__init__] DEBUG: No policy allowed > anonymous performing EMAIL_VIEW on None > 2009-02-08 03:27:50,918 Trac[__init__] DEBUG: 330 unreachable objects found. > 2009-02-08 03:27:51,525 Trac[__init__] DEBUG: Dispatching <Request > "GET u'/browser/trunk/unicode_far'"> > 2009-02-08 03:27:51,568 Trac[__init__] INFO: repos rev [2559] != > cached rev [2391] > 2009-02-08 03:27:51,572 Trac[__init__] DEBUG: resync from other thread > in progress > 2009-02-08 03:27:51,577 Trac[__init__] DEBUG: Prepare chrome data for request > 2009-02-08 03:27:51,578 Trac[__init__] DEBUG: No policy allowed > anonymous performing TRAC_ADMIN on None > 2 > > How did it happen that youngest revision was not updated in system > table if a commit that inserted rev number into revision table was > successful? Maybe transaction mechanism doesn't work for MySQL > backend? > > >> Rather simply doing a rollback in all cases when something goes. >> I'll work on a patch. >> >> > > That would be great. >
See follow-up on http://trac.edgewall.org/ticket/8067 Still a bit mysterious to me. >>> It would be more useful to see pid or thread id displayed in log >>> instead of Trac[__init__] string. Does this Trac[__init__] change at >>> all? >>> >> Have a look at http://trac.edgewall.org/wiki/TracLogging#LogFormat and the >> "The format is a string which can contain any of the Python logging >> Formatter variables. " part. >> You can indeed add the thread id there. >> > > I'll follow your advice. Thanks. I wonder why by default the setting > is so strange? > > >> Thanks for you in-depth analysis of the problem, this might solve one of >> the "weird" issues we still carry on since Trac 0.10 (e.g. #5440). >> > > Hope this helps. BTW, what do these log messages about unreachable objects > mean? > Nothing, debugging noise ;-) -- Christian --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Trac Development" group. To post to this group, send email to [email protected] To unsubscribe from this group, send email to [email protected] For more options, visit this group at http://groups.google.com/group/trac-dev?hl=en -~----------~----~----~----~------~----~------~--~---
