Hi Jason, On 03/02/15 15:27, Daniel-Constantin Mierla wrote: > Hi Jason, > > On 03/02/15 08:29, Jason Penton wrote: >> Hi Daniel, >> >> I have found another instance of a similar bug where the a process >> gets itself into deadlock. The sequence of events: >> >> 1. INVITE request is relayed downstream >> 2. error response or locally generated 408 occurs >> 2. Failure route in cfg is armed and tries to relay to alternate >> destination (voicemail/annoucnement server, etc) - using t_relay() >> 3. t_relay fails to destination for whatever reason (in this case the >> result of the destination being blacklisted in TM) >> 4. Original reply is attempted to be relayed upstream (this is all >> done safely - ie using t_reply_unsafe() as the REPLY_LOCK has already >> been acquired in failure route >> 5. At or around line 562 of t_reply.c we check if the reply is > 200 >> (final response). If it is and if its an INVITE transaction, we clean >> up timers and cancel all UAC branches. This is where we hit a problem: >> >> cancel_uacs( trans, &cancel_data, F_CANCEL_B_KILL, lock ); >> >> 6. cancel_uacs calls cancel_branch if there was no response on the >> branch by calling cancel_branch() >> 7. cancel branch calls LOCK_REPLIES(t) on the transaction and you hit >> deadlock.... (original call to LOCK_REPLIES(t) happened in failure >> route processing) >> >> This use case is slightly different to the one you fixed related to >> retransmission timers... (the beginning of this thread) >> >> Suggestion: What about a pkg/process variable that can be used to >> check if you have already acquired the reply lock to combat these >> non-reentrant deadlocks. A side-effect of this would save on passing >> variables around in various functions indicating intention to lock or >> not. >> >> Thoughts? > What about making the reply lock re-entrant? I think it should be safer > in long term to allow parts of code executed by same process to safely > acquire the lock without worrying if the same process did it before. > Global variables in pkg would need more complex management to set, test > and reset. > I made the reply lock re-entrant, this should handle the situation you were exposing it. if you have a chance to simulate it, then let me know the result.
On a slightly different idea, I wonder if async_mutex should be made re-entrant as well, what do you think? I guess it is unlikely to get two async attempts in the same process, but you know better how the mutex is planned to be used. Cheers, Daniel -- Daniel-Constantin Mierla http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda Kamailio World Conference, May 27-29, 2015 Berlin, Germany - http://www.kamailioworld.com _______________________________________________ sr-dev mailing list [email protected] http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
