yes okay - got it. I will test and analyse. Thanks Daniel!
On Thu, Apr 10, 2014 at 4:35 PM, Daniel-Constantin Mierla <[email protected]> wrote:

> Hello,
>
> iirc, there are several functions that the script writer can use, like
> t_reply_callid() from tmx. The idea is to analyze a bit in order to detect
> if a forced reply may end up in canceling some pending branches -- the
> reply on the branch doesn't matter anymore and should not be considered
> anymore for relaying upstream, because the script writer already decided
> what to send out.
>
> Cheers,
> Daniel
>
>
> On 10/04/14 13:24, Jason Penton wrote:
>
> Hey Daniel,
>
> which reply functions are you referring to? API functions?
>
> Cheers
> Jason
>
>
> On Thu, Apr 10, 2014 at 12:53 PM, Daniel-Constantin Mierla <
> [email protected]> wrote:
>
>> OK. I will leave it a bit in master to see if there are any new reports,
>> then I will backport. I will also have to review the tm reply functions
>> that can be used from config to align them to the new check.
>>
>> Cheers,
>> Daniel
>>
>>
>> On 10/04/14 09:06, Jason Penton wrote:
>>
>> oh excellent, I will look at it right away - was just getting ready to
>> jump in myself ;)
>>
>> Cheers
>> Jason
>>
>>
>> On Thu, Apr 10, 2014 at 9:01 AM, Daniel-Constantin Mierla <
>> [email protected]> wrote:
>>
>>> Hello Jason,
>>>
>>> I pushed a patch trying to fix this case, it is only on git master
>>> branch. Can you test it? If all goes fine, we can consider backporting it.
>>>
>>> Cheers,
>>> Daniel
>>>
>>>
>>> On 09/04/14 23:26, Jason Penton wrote:
>>>
>>> Hey Daniel,
>>>
>>> nothing extraordinary...
>>>
>>> # -- TM params --
>>> modparam("tm", "fr_timer", 20000);
>>> modparam("tm", "fr_inv_timer", 10000)
>>>
>>>
>>> Cheers
>>> Jason
>>>
>>>
>>> On Wed, Apr 9, 2014 at 10:32 PM, Jason Penton <[email protected]> wrote:
>>>
>>>> Hey Daniel,
>>>>
>>>> Yes I did a test with a very basic config file and I am not able to
>>>> re-create. However, with my *complex* cfg file I can re-create every time.
>>>> Tomorrow I will compare what is different and report back... hopefully with
>>>> fix ;)
>>>>
>>>> here is bt of timer process deadlocking itself:
>>>>
>>>> #0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:39
>>>> #1  0x00007f5009f22004 in futex_get (lock=0x7f4fc55030d8)
>>>>     at ../../mem/../futexlock.h:123
>>>> #2  0x00007f5009f223e1 in _lock (s=0x7f4fc55030d8,
>>>>     file=0x7f5009f90fd1 "t_cancel.c",
>>>>     function=0x7f5009f91980 "cancel_branch", line=250) at lock.h:99
>>>> #3  0x00007f5009f23271 in cancel_branch (t=0x7f4fc5501b40, branch=0,
>>>>     reason=0x7fff646d03a8, flags=3) at t_cancel.c:250
>>>> #4  0x00007f5009f22c02 in cancel_uacs (t=0x7f4fc5501b40,
>>>>     cancel_data=0x7fff646d03a0, flags=1) at t_cancel.c:123
>>>> #5  0x00007f5009f718c4 in _reply_light (trans=0x7f4fc5501b40,
>>>>     buf=0x7f500a24dc68 "SIP/2.0 500 Server error on LIR select next S-CSCF\r\nVia: SIP/2.0/UDP 10.0.1.167:6060;branch=z9hG4bKb7.2ae09f29ffbd0034cd6d58483053603b.1\r\nVia: SIP/2.0/UDP 10.0.1.166:4060;branch=z9hG4bKb7.3faa03ddea80"...,
>>>>     len=778, code=500,
>>>>     to_tag=0x7f500a1c7ae0 "c82b15d7f12ef185f95fe4945457d449-8bab",
>>>>     to_tag_len=37, lock=0, bm=0x7fff646d0b60) at t_reply.c:660
>>>> #6  0x00007f5009f7244c in _reply (trans=0x7f4fc5501b40,
>>>>     p_msg=0x7f500a1c6bc0, code=500,
>>>>     text=0x7f500a249a48 "Server error on LIR select next S-CSCF",
>>>>     lock=0) at t_reply.c:795
>>>> #7  0x00007f5009f76436 in t_reply_unsafe (t=0x7f4fc5501b40,
>>>>     p_msg=0x7f500a1c6bc0, code=500,
>>>>     text=0x7f500a249a48 "Server error on LIR select next S-CSCF")
>>>>     at t_reply.c:1643
>>>> #8  0x00007f5009f57621 in w_t_reply (msg=0x7f500a1c6bc0,
>>>>     p1=0x7f500a2497d8 "\340\332$\nP\177",
>>>>     p2=0x7f500a249870 "h\321$\nP\177") at tm.c:1324
>>>> #9  0x000000000041a700 in do_action (h=0x7fff646d1d30,
>>>>     a=0x7f500a24cee8, msg=0x7f500a1c6bc0) at action.c:1119
>>>> #10 0x0000000000423831 in run_actions (h=0x7fff646d1d30,
>>>>     a=0x7f500a24cee8, msg=0x7f500a1c6bc0) at action.c:1607
>>>> #11 0x000000000041a5a4 in do_action (h=0x7fff646d1d30,
>>>>     a=0x7f500a24d478, msg=0x7f500a1c6bc0) at action.c:1102
>>>> #12 0x0000000000423831 in run_actions (h=0x7fff646d1d30,
>>>>     a=0x7f500a249148, msg=0x7f500a1c6bc0) at action.c:1607
>>>> #13 0x000000000041a54e in do_action (h=0x7fff646d1d30,
>>>>     a=0x7f500a24c500, msg=0x7f500a1c6bc0) at action.c:1098
>>>> #14 0x0000000000423831 in run_actions (h=0x7fff646d1d30,
>>>>     a=0x7f500a247a28, msg=0x7f500a1c6bc0) at action.c:1607
>>>> #15 0x0000000000423fdf in run_top_route (a=0x7f500a247a28,
>>>>     msg=0x7f500a1c6bc0, c=0x0) at action.c:1693
>>>> #16 0x00007f5009f73815 in run_failure_handlers (t=0x7f4fc5501b40,
>>>>     rpl=0xffffffffffffffff, code=408, extra_flags=96) at t_reply.c:1061
>>>> #17 0x00007f5009f7527a in t_should_relay_response (Trans=0x7f4fc5501b40,
>>>>     new_code=408, branch=1, should_store=0x7fff646d201c,
>>>>     should_relay=0x7fff646d2018, cancel_data=0x7fff646d2070,
>>>>     reply=0xffffffffffffffff) at t_reply.c:1416
>>>> #18 0x00007f5009f76ede in relay_reply (t=0x7f4fc5501b40,
>>>>     p_msg=0xffffffffffffffff, branch=1, msg_status=408,
>>>>     cancel_data=0x7fff646d2070, do_put_on_wait=0) at t_reply.c:1819
>>>> #19 0x00007f5009f44c88 in fake_reply (t=0x7f4fc5501b40, branch=1,
>>>>     code=408) at timer.c:354
>>>> #20 0x00007f5009f450e7 in final_response_handler (r_buf=0x7f4fc5501e60,
>>>>     t=0x7f4fc5501b40) at timer.c:526
>>>> #21 0x00007f5009f4518d in retr_buf_handler (ticks=260027386,
>>>>     tl=0x7f4fc5501e80, p=0x3e8) at timer.c:584
>>>> #22 0x0000000000544119 in timer_list_expire (t=260027386,
>>>>     h=0x7f4fc527cbe0, slow_l=0x7f4fc527cdf0, slow_mark=0) at timer.c:894
>>>> #23 0x0000000000544418 in timer_handler () at timer.c:959
>>>> #24 0x00000000005446b2 in timer_main () at timer.c:998
>>>> #25 0x0000000000471ddf in main_loop () at main.c:1689
>>>>
>>>>
>>>> On Wed, Apr 9, 2014 at 9:34 PM, Daniel-Constantin Mierla <
>>>> [email protected]> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> that should not be a very rare case and I would have expected it to be
>>>>> caught by now; anyhow ... this looks easy to reproduce, have you tried it?
>>>>>
>>>>> You can have two kamailio instances, one relaying the invite to the
>>>>> second, which will reply with 100, then wait for the timeout on the first
>>>>> instance. You can add some debug messages in the code to see if the lock
>>>>> is called twice.
>>>>>
>>>>> Cheers,
>>>>> Daniel
>>>>>
>>>>>
>>>>> On 09/04/14 17:51, Jason Penton wrote:
>>>>>
>>>>> Hi All,
>>>>>
>>>>> I have been experiencing a deadlock when a timeout occurs on a
>>>>> t_relayed() INVITE. Going through the code I have noticed a possible
>>>>> chance of deadlock (without re-entrant enabled). Here is my thinking:
>>>>>
>>>>> t_should_relay_response() is called with REPLY_LOCK when the timer
>>>>> process fires on the fr_inv_timer (no response from the INVITE that was
>>>>> relayed, other than 100 provisional) and a 408 is generated. However, from
>>>>> within that function there are calls to run_failure_handlers() which in
>>>>> turn *could* try and lock the reply (viz. somebody having a t_reply() call
>>>>> in the cfg file - in failure route block). This would result in another
>>>>> lock on the same transaction's REPLY_LOCK....
>>>>>
>>>>> Has anybody else experienced something like this?
>>>>>
>>>>> this is on master btw.
>>>>>
>>>>> Cheers
>>>>> Jason
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> sr-dev mailing list
>>>>> [email protected]
>>>>> http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
>>>>>
>>>>>
>>>>> --
>>>>> Daniel-Constantin Mierla - http://www.asipto.com
>>>>> http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> sr-dev mailing list
>>>>> [email protected]
>>>>> http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
>>>>>
>>>>
>>>
>>> --
>>> Daniel-Constantin Mierla - http://www.asipto.com
>>> http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda
>>>
>>
>> --
>> Daniel-Constantin Mierla - http://www.asipto.com
>> http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda
>>
>
> --
> Daniel-Constantin Mierla - http://www.asipto.com
> http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda
>
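
The locking pattern in the original report (a timer path that holds a transaction's reply lock while running a failure handler that tries to take the same lock again) can be sketched outside Kamailio. What follows is only an illustration under stated assumptions, not Kamailio code: it uses POSIX mutexes rather than Kamailio's futex-based locks, and the names struct txn, init_txn(), failure_route_reply(), timer_path() and demo_nested_lock() are all hypothetical. The inner acquisition uses pthread_mutex_trylock() so the sketch reports the conflict instead of hanging the way the real timer process did.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* exposes PTHREAD_MUTEX_RECURSIVE in strict modes */
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a transaction that owns a reply lock. */
struct txn {
    pthread_mutex_t reply_lock;
};

/* Create the reply lock as either a plain or a re-entrant (recursive) mutex. */
static int init_txn(struct txn *t, int reentrant)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr,
            reentrant ? PTHREAD_MUTEX_RECURSIVE : PTHREAD_MUTEX_NORMAL);
    if (rc == 0)
        rc = pthread_mutex_init(&t->reply_lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Stand-in for a failure route that sends a reply and therefore needs the
 * reply lock. trylock is used so a conflict is returned, not waited on. */
static int failure_route_reply(struct txn *t)
{
    int rc = pthread_mutex_trylock(&t->reply_lock);
    if (rc == 0)
        pthread_mutex_unlock(&t->reply_lock);
    return rc; /* 0 on success, EBUSY if the lock is already held */
}

/* Stand-in for the timer path: it already holds the reply lock when it
 * runs the failure handler. */
static int timer_path(struct txn *t)
{
    int rc = pthread_mutex_lock(&t->reply_lock);
    assert(rc == 0);
    rc = failure_route_reply(t);
    pthread_mutex_unlock(&t->reply_lock);
    return rc;
}

/* Run the nested-lock scenario once; returns the inner lock attempt's result. */
static int demo_nested_lock(int reentrant)
{
    struct txn t;
    int rc;
    if (init_txn(&t, reentrant) != 0)
        return -1;
    rc = timer_path(&t);
    pthread_mutex_destroy(&t.reply_lock);
    return rc;
}
```

Calling demo_nested_lock(0) models the non-re-entrant case: the inner attempt reports EBUSY, which with a blocking lock call would be the self-deadlock seen in the backtrace. Calling demo_nested_lock(1) models a re-entrant lock, where the same thread can take the lock a second time and the inner attempt succeeds.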
_______________________________________________
sr-dev mailing list
[email protected]
http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
