anatoly techtonik wrote:
> On Sun, Feb 8, 2009 at 9:23 PM, Christian Boos <[email protected]> wrote:
>   
>>> It turned out to be locked cache after timeout. At first I thought
>>> about a race condition, but it was false. The cache can easily be
>>> jammed with a long resync from repository on a busy server. It took
>>> about 40 seconds to sync 200 revisions before timeout occurred. Take a
>>> cache.py sync()
>>> http://trac.edgewall.org/browser/branches/0.11-stable/trac/versioncontrol/cache.py#L158
>>>
>>> On line 159 we check if a revision is being entered in DB by looking
>>> if next revision number is already present in `revisions` table. That
>>> means another thread is made an INSERT in line 184. The cache is now
>>> locked. To unlock it, another thread should update youngest revision
>>> at line 221.
>>>
>>>       
>> Correct.
>>     
>>> So, the problem is when a timeout occurs during execution of 202:221 -
>>> there is an update in `node_change` table that can potentially take a
>>> long time to experience a timeout (esp. when server is under a heavy
>>> load). There are also cset.get_changes() and
>>> self.repos.next_rev(next_youngest) calls that can contribute to the
>>> issue.
>>>       
>> I don't really see how a time out can happen here. Do you mean a
>> TimeoutError exception?
>> That shouldn't happen, as the "db" variable already refers to a
>> connection, and TimeoutError exceptions are only raised when trying to
>> get a connection from the cache.
>>     
>
> I am running Trac via fastcgi on Apache. I think server kills the
> process in the middle of operation. If a user dropped connection I
> can't see why would Apache should allow process to run. Should there
> be an exception? There must be timeouts for scripts as well. Are those
> scripts notified before the kill? Log doesn't show that.
>
>   
>> But you're nevertheless right, if anything wrong happens there (unicode
>> exception, db level timeout?), then there's no rollback done here.
>> That exception ends up being ignored at a later point (converted to a
>> warning) and if this is followed by a commit (e.g. session save), then
>> the cache remains locked.
>>     
>
> I completely ignored the fact that there should be a commit() for the
> lock to occur. That makes things complicated. Here is the log content
> for the lock time. The message "Resync from other thread in progress"
> is inserted in "for" block after line 158. No warnings are displayed.
>
> 2009-02-08 03:27:48,106 Trac[__init__] DEBUG: Caching node change in
> [2391]: (u'trunk/unicode_far/vbuild.m4', 'file', 'edit',
> u'trunk/unicode_far/vbuild.m4',
>  2389)
> 2009-02-08 03:27:48,107 Trac[__init__] DEBUG: Caching node change in
> [2391]: (u'trunk/unicode_far/wrap.cpp', 'file', 'edit',
> u'trunk/unicode_far/wrap.cpp', 2
> 387)
> 2009-02-08 03:27:48,181 Trac[__init__] DEBUG: Dispatching <Request
> "GET u'/chrome/site/fartrac.ico'">
> 2009-02-08 03:27:48,191 Trac[__init__] INFO: Trying to sync revision [2392]
> 2009-02-08 03:27:48,342 Trac[__init__] DEBUG: 18 unreachable objects found.
> 2009-02-08 03:27:50,422 Trac[__init__] DEBUG: Dispatching <Request
> "GET u'/log/trunk/enc/enc_rus/styles'">
> 2009-02-08 03:27:50,430 Trac[__init__] INFO: repos rev [2559] !=
> cached rev [2391]
> 2009-02-08 03:27:50,434 Trac[__init__] DEBUG: resync from other thread
> in progress
> 2
> 2009-02-08 03:27:50,452 Trac[__init__] DEBUG: Prepare chrome data for request
> 2009-02-08 03:27:50,453 Trac[__init__] DEBUG: No policy allowed
> anonymous performing TRAC_ADMIN on None
> 2009-02-08 03:27:50,453 Trac[__init__] DEBUG: No policy allowed
> anonymous performing PERMISSION_GRANT on None
> 2009-02-08 03:27:50,453 Trac[__init__] DEBUG: No policy allowed
> anonymous performing PERMISSION_REVOKE on None
> 2009-02-08 03:27:50,559 Trac[__init__] DEBUG: No policy allowed
> anonymous performing EMAIL_VIEW on None
> 2009-02-08 03:27:50,918 Trac[__init__] DEBUG: 330 unreachable objects found.
> 2009-02-08 03:27:51,525 Trac[__init__] DEBUG: Dispatching <Request
> "GET u'/browser/trunk/unicode_far'">
> 2009-02-08 03:27:51,568 Trac[__init__] INFO: repos rev [2559] !=
> cached rev [2391]
> 2009-02-08 03:27:51,572 Trac[__init__] DEBUG: resync from other thread
> in progress
> 2009-02-08 03:27:51,577 Trac[__init__] DEBUG: Prepare chrome data for request
> 2009-02-08 03:27:51,578 Trac[__init__] DEBUG: No policy allowed
> anonymous performing TRAC_ADMIN on None
> 2
>
> How did it happen that youngest revision was not updated in system
> table if a commit that inserted rev number into revision table was
> successful? Maybe transaction mechanism doesn't work for MySQL
> backend?
>
>   
>> Rather simply doing a rollback in all cases when something goes.
>> I'll work on a patch.
>>
>>     
>
> That would be great.
>   


See follow-up on http://trac.edgewall.org/ticket/8067
Still a bit mysterious to me.

>>> It would be more useful to see pid or thread id displayed in log
>>> instead of Trac[__init__] string. Does this Trac[__init__] change at
>>> all?
>>>       
>> Have a look at http://trac.edgewall.org/wiki/TracLogging#LogFormat and the
>> "The format is a string which can contain any of the Python logging
>> Formatter variables. " part.
>> You can indeed add the thread id there.
>>     
>
> I'll follow your advice. Thanks. I wonder why by default the setting
> is so strange?
>
>   
>> Thanks for you in-depth analysis of the problem, this might solve one of
>> the "weird" issues we still carry on since Trac 0.10 (e.g. #5440).
>>     
>
> Hope this helps. BTW, what do these log messages about unreachable objects 
> mean?
>   

Nothing, debugging noise ;-)

-- Christian

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Trac 
Development" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/trac-dev?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to