Hello,

IIRC, there are several functions that a script writer can use, such as t_reply_callid() from tmx. The idea is to analyze them a bit in order to detect whether a forced reply may end up canceling some pending branches. In that case the reply on the branch doesn't matter anymore and should not be considered for relaying upstream, because the script writer has already decided what to send out.

Cheers,
Daniel


On 10/04/14 13:24, Jason Penton wrote:
Hey Daniel,

which reply functions are you referring to? API functions?

Cheers
Jason


On Thu, Apr 10, 2014 at 12:53 PM, Daniel-Constantin Mierla <[email protected] <mailto:[email protected]>> wrote:

    OK. I will leave it a bit in master to see if there are any new
    reports, then I will backport. I will also have to review the tm
    reply functions that can be used from config to align them to the
    new check.

    Cheers,
    Daniel


    On 10/04/14 09:06, Jason Penton wrote:
    oh excellent, I will look at it right away - was just getting
    ready to jump in myself ;)

    Cheers
    Jason


    On Thu, Apr 10, 2014 at 9:01 AM, Daniel-Constantin Mierla
    <[email protected] <mailto:[email protected]>> wrote:

        Hello Jason,

        I pushed a patch that tries to fix this case; it is only on
        the git master branch. Can you test it? If all goes fine, we
        can consider backporting it.

        Cheers,
        Daniel


        On 09/04/14 23:26, Jason Penton wrote:
        Hey Daniel,

        nothing extraordinary...

        # -- TM params --
        modparam("tm", "fr_timer", 20000)
        modparam("tm", "fr_inv_timer", 10000)


        Cheers
        Jason


        On Wed, Apr 9, 2014 at 10:32 PM, Jason Penton
        <[email protected] <mailto:[email protected]>> wrote:

            Hey Daniel,

            Yes, I did a test with a very basic config file and I am
            not able to re-create it. However, with my *complex* cfg
            file I can re-create it every time. Tomorrow I will compare
            what is different and report back... hopefully with a fix ;)

            Here is the backtrace of the timer process deadlocking itself:

            #0  syscall () at
            ../sysdeps/unix/sysv/linux/x86_64/syscall.S:39
            #1  0x00007f5009f22004 in futex_get
            (lock=0x7f4fc55030d8) at ../../mem/../futexlock.h:123
            #2  0x00007f5009f223e1 in _lock (s=0x7f4fc55030d8,
            file=0x7f5009f90fd1 "t_cancel.c",
            function=0x7f5009f91980 "cancel_branch", line=250) at
            lock.h:99
            #3  0x00007f5009f23271 in cancel_branch
            (t=0x7f4fc5501b40, branch=0, reason=0x7fff646d03a8,
            flags=3) at t_cancel.c:250
            #4  0x00007f5009f22c02 in cancel_uacs (t=0x7f4fc5501b40,
            cancel_data=0x7fff646d03a0, flags=1) at t_cancel.c:123
            #5  0x00007f5009f718c4 in _reply_light
            (trans=0x7f4fc5501b40,
                buf=0x7f500a24dc68 "SIP/2.0 500 Server error on LIR
            select next S-CSCF\r\nVia: SIP/2.0/UDP
            10.0.1.167:6060;branch=z9hG4bKb7.2ae09f29ffbd0034cd6d58483053603b.1\r\nVia:
            SIP/2.0/UDP
            10.0.1.166:4060;branch=z9hG4bKb7.3faa03ddea80"...,
            len=778, code=500, to_tag=0x7f500a1c7ae0
            "c82b15d7f12ef185f95fe4945457d449-8bab", to_tag_len=37,
            lock=0, bm=0x7fff646d0b60) at t_reply.c:660
            #6  0x00007f5009f7244c in _reply (trans=0x7f4fc5501b40,
            p_msg=0x7f500a1c6bc0, code=500, text=0x7f500a249a48
            "Server error on LIR select next S-CSCF", lock=0) at
            t_reply.c:795
            #7  0x00007f5009f76436 in t_reply_unsafe
            (t=0x7f4fc5501b40, p_msg=0x7f500a1c6bc0, code=500,
            text=0x7f500a249a48 "Server error on LIR select next
            S-CSCF") at t_reply.c:1643
            #8  0x00007f5009f57621 in w_t_reply (msg=0x7f500a1c6bc0,
            p1=0x7f500a2497d8 "\340\332$\nP\177", p2=0x7f500a249870
            "h\321$\nP\177") at tm.c:1324
            #9  0x000000000041a700 in do_action (h=0x7fff646d1d30,
            a=0x7f500a24cee8, msg=0x7f500a1c6bc0) at action.c:1119
            #10 0x0000000000423831 in run_actions (h=0x7fff646d1d30,
            a=0x7f500a24cee8, msg=0x7f500a1c6bc0) at action.c:1607
            #11 0x000000000041a5a4 in do_action (h=0x7fff646d1d30,
            a=0x7f500a24d478, msg=0x7f500a1c6bc0) at action.c:1102
            #12 0x0000000000423831 in run_actions (h=0x7fff646d1d30,
            a=0x7f500a249148, msg=0x7f500a1c6bc0) at action.c:1607
            #13 0x000000000041a54e in do_action (h=0x7fff646d1d30,
            a=0x7f500a24c500, msg=0x7f500a1c6bc0) at action.c:1098
            #14 0x0000000000423831 in run_actions (h=0x7fff646d1d30,
            a=0x7f500a247a28, msg=0x7f500a1c6bc0) at action.c:1607
            #15 0x0000000000423fdf in run_top_route
            (a=0x7f500a247a28, msg=0x7f500a1c6bc0, c=0x0) at
            action.c:1693
            #16 0x00007f5009f73815 in run_failure_handlers
            (t=0x7f4fc5501b40, rpl=0xffffffffffffffff, code=408,
            extra_flags=96) at t_reply.c:1061
            #17 0x00007f5009f7527a in t_should_relay_response
            (Trans=0x7f4fc5501b40, new_code=408, branch=1,
            should_store=0x7fff646d201c,
            should_relay=0x7fff646d2018, cancel_data=0x7fff646d2070,
            reply=0xffffffffffffffff) at t_reply.c:1416
            #18 0x00007f5009f76ede in relay_reply (t=0x7f4fc5501b40,
            p_msg=0xffffffffffffffff, branch=1, msg_status=408,
            cancel_data=0x7fff646d2070, do_put_on_wait=0) at
            t_reply.c:1819
            #19 0x00007f5009f44c88 in fake_reply (t=0x7f4fc5501b40,
            branch=1, code=408) at timer.c:354
            #20 0x00007f5009f450e7 in final_response_handler
            (r_buf=0x7f4fc5501e60, t=0x7f4fc5501b40) at timer.c:526
            #21 0x00007f5009f4518d in retr_buf_handler
            (ticks=260027386, tl=0x7f4fc5501e80, p=0x3e8) at timer.c:584
            #22 0x0000000000544119 in timer_list_expire
            (t=260027386, h=0x7f4fc527cbe0, slow_l=0x7f4fc527cdf0,
            slow_mark=0) at timer.c:894
            #23 0x0000000000544418 in timer_handler () at timer.c:959
            #24 0x00000000005446b2 in timer_main () at timer.c:998
            #25 0x0000000000471ddf in main_loop () at main.c:1689



            On Wed, Apr 9, 2014 at 9:34 PM, Daniel-Constantin Mierla
            <[email protected] <mailto:[email protected]>> wrote:

                Hello,

                That should not be a very rare case, and I would have
                expected it to be caught by now. Anyhow, this looks
                easy to reproduce; have you tried it?

                You can have two Kamailio instances, one relaying the
                INVITE to the second, which will reply with 100, then
                wait for the timeout on the first instance. You can add
                some debug messages in the code to see if the lock is
                taken twice.

                Cheers,
                Daniel


                On 09/04/14 17:51, Jason Penton wrote:
                Hi All,

                I have been experiencing a deadlock when a timeout
                occurs on an INVITE relayed with t_relay(). Going
                through the code I have noticed a possible deadlock
                (without re-entrant locks enabled). Here is my thinking:

                t_should_relay_response() is called with REPLY_LOCK
                held when the timer process fires on the fr_inv_timer
                (no response to the relayed INVITE other than a 100
                provisional) and a 408 is generated. However, from
                within that function there are calls to
                run_failure_handlers(), which in turn *could* try to
                lock the reply (e.g. somebody having a t_reply() call
                in the failure route block of the cfg file). This would
                result in a second lock attempt on the same
                transaction's REPLY_LOCK.

                Has anybody else experienced something like this?

                this is on master btw.

                Cheers
                Jason


                _______________________________________________
                sr-dev mailing list
                [email protected] <mailto:[email protected]>
                http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev

                --
                Daniel-Constantin Mierla - http://www.asipto.com
                http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda












--
Daniel-Constantin Mierla - http://www.asipto.com
http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda

