Ben,

To make it easier, please send me the instructions on how to reproduce the crash.

Thanks and Regards,

Bogdan-Andrei Iancu
OpenSIPS Founder and Developer
http://www.opensips-solutions.com

On 01.08.2016 20:17, Newlin, Ben wrote:

Bogdan,

I am not familiar with gdb, so I cannot double-check what you’ve assessed. If there are other steps with gdb you would like me to perform, just let me know what to do.

Is there a way to compile the memory debugger without using the interactive `make menuconfig` command? Our build system is completely automated, so it is impossible for me to do it this way. Can I pass the options as build parameters or alter the makefile in some way?

I can provide a SIPp scenario which should reproduce the issue on any basic script that uses Dialog with topology hiding, if that would be easier.

Ben Newlin

*From: *Bogdan-Andrei Iancu <[email protected]>
*Date: *Monday, August 1, 2016 at 10:57 AM
*To: *OpenSIPS users mailling list <[email protected]>, "Newlin, Ben" <[email protected]>
*Subject: *Re: [OpenSIPS-Users] OpenSIPS fix_route_dialog crashes

Hi Ben,

According to the BT, the crash is in a pkg_malloc() call:
                    route = pkg_malloc(size);
Please double check this with gdb info.

If so, this indicates memory corruption, and we have 2 options here:
    - you compile with the memory debugger (see my previous emails)
    - you provide step-by-step instructions on how to reproduce this crash.

Thanks and Regards,

Bogdan-Andrei Iancu
OpenSIPS Founder and Developer
http://www.opensips-solutions.com

On 29.07.2016 15:54, Newlin, Ben wrote:

    This is 1.11.6, running on CentOS 7.

    Ben Newlin

    *From: *<[email protected]>
    <mailto:[email protected]> on behalf of
    Bogdan-Andrei Iancu <[email protected]> <mailto:[email protected]>
    *Reply-To: *OpenSIPS users mailling list
    <[email protected]> <mailto:[email protected]>
    *Date: *Friday, July 29, 2016 at 8:50 AM
    *To: *"Newlin, Ben" <[email protected]>
    <mailto:[email protected]>, OpenSIPS users mailling list
    <[email protected]> <mailto:[email protected]>
    *Subject: *Re: [OpenSIPS-Users] OpenSIPS fix_route_dialog crashes

    Ben,

    What OpenSIPS version is this (the crashing one)? 1.11 or 2.1?

    Regards,


    Bogdan-Andrei Iancu

    OpenSIPS Founder and Developer

    http://www.opensips-solutions.com

    On 27.07.2016 19:02, Newlin, Ben wrote:

        I have identified that these crashes are occurring when the
        far end system is not returning the Record-Route headers in
        the 200 OK response. The headers are present in the 180
        response, but not the 200 OK. I have reproduced the scenario
        using SIPp and captured a SIP trace:
        http://pastebin.com/ckKk3EhY

        The crash occurs on receipt of the ACK request and attempt to
        match the dialog.

        I also captured a BT for this scenario as well, in case
        anything specific in the trace made the issue easier to find:
        http://pastebin.com/cM3FhPiw

        I am working with the other system to try to fix their behavior.

        Ideally the Record-Route headers from previous replies could
        be used in this case to allow the call to succeed, but I don’t
        know if that is possible.

        Thanks,

        Ben Newlin

        *From: *"Newlin, Ben" <[email protected]>
        <mailto:[email protected]>
        *Date: *Wednesday, July 27, 2016 at 9:44 AM
        *To: *Bogdan-Andrei Iancu <[email protected]>
        <mailto:[email protected]>, OpenSIPS users mailling list
        <[email protected]> <mailto:[email protected]>
        *Subject: *Re: [OpenSIPS-Users] OpenSIPS fix_route_dialog crashes

        Bogdan,

        This is a different scenario than the other you responded to.
        As I said, we have two types of servers that work together.
        One is a load-balancer and runs as a proxy. It uses double
        Record-Route because it sends messages between public and
        private networks. Then we have our other servers using TH
        which receive those requests. We are not using TH and RR on
        the same server (although I would like to).

        If validate_dialog() and fix_route_dialog() (and possibly
        loose_route()) should not be called when using TH, I believe
        the documentation should reference that. It states that
        match_dialog() must be used with TH, but does not indicate
        that the other functions should not be used or that the
        functionality won’t work. There is also no documentation of
        the incompatibility between RR and TH.

        Either way, I ran a test where I removed all calls to
        loose_route(), validate_dialog(), and fix_route_dialog() from
        my script. The crash still occurred and the BT still pointed
        to fix_route_dialog() function. So it must be getting called
        from within Dialog module somewhere. That BT is here:
        http://pastebin.com/wu2X2Hxh

        I collected this BT with loose_route() being called from my
        script, but not validate_dialog() or fix_route_dialog():
        http://pastebin.com/6V7yPaHF

        This BT was collected with all three functions being called
        from my script: http://pastebin.com/fZYYdndn

        Ben Newlin

        *From: *Bogdan-Andrei Iancu <[email protected]>
        <mailto:[email protected]>
        *Date: *Wednesday, July 27, 2016 at 3:57 AM
        *To: *OpenSIPS users mailling list <[email protected]>
        <mailto:[email protected]>, "Newlin, Ben"
        <[email protected]> <mailto:[email protected]>
        *Subject: *Re: [OpenSIPS-Users] OpenSIPS fix_route_dialog crashes

        Hi Ben,

        First, if you use TH, it makes no sense to do Record-Routing -
        these are 2 SIP concepts that overlap. You act either as an
        end-point (by doing TH) or as a proxy (doing RR).

        If doing TH, it makes no sense to use validate + fix, as these
        functions check and repair the routing information in the
        request (like Route and Contact headers). If you do TH, this
        routing info is actually hidden and added by OpenSIPS, so
        there is nothing to fix and repair.
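
        As a rough illustration only, a sequential-request block on a
        TH server could be reduced to something like the sketch below.
        It reuses the flag and route names from the script quoted
        further down (SEQ_REQUEST, ACC_FLAG, reply_error, EXIT), which
        are specific to that script. Leaving out loose_route() is an
        assumption made for the sketch; this thread does not settle
        whether it is still needed with TH.

        ```
        # Sketch, not a tested configuration: in-dialog requests with TH.
        if (has_totag())
        {
          if (match_dialog())
          {
            # With TH the routing info is hidden and added by OpenSIPS,
            # so validate_dialog()/fix_route_dialog() are not called.
            if (is_method("BYE"))
              setflag(ACC_FLAG);
            setflag(SEQ_REQUEST);
          }
          else
          {
            # Unmatched sequential request: reject everything but ACK.
            if (!is_method("ACK"))
              route(reply_error, "481", "Call Does Not Exist");
            return(EXIT);
          }
        }
        ```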

        Nevertheless, this should not crash or corrupt OpenSIPS. Have
        you managed to get a corefile?

        Also if you suspect memory corruption, you can compile-in the
        memory debugger - see
        http://www.opensips.org/Documentation/TroubleShooting-OutOfMem .

        Regards,




        Bogdan-Andrei Iancu

        OpenSIPS Founder and Developer

        http://www.opensips-solutions.com

        On 26.07.2016 23:20, Newlin, Ben wrote:

            I have had 3 OpenSIPS server crashes in the last week. All
            were due to segmentation faults. I was not able to capture
            core dumps; I am configuring that now to catch the next crash.

            My logs leading up to the crash are full of errors from
            fix_route_dialog() complaining about invalid URIs for
            sequential requests:

            Jul 26 19:34:02 [220] ERROR:dialog:fix_route_dialog:
            Failed to parse SIP uri

            Jul 26 19:34:02 [220] ERROR:core:parse_uri: bad uri, state
            0 parsed: <ip:1> (4) /
            <ip:10.18.8.18:5060;ftag=gK0448f137;lr;r2=on>> (44)

            Jul 26 19:11:19 [218] ERROR:dialog:fix_route_dialog:
            Failed to parse SIP uri

            Jul 26 19:11:19 [218] ERROR:core:parse_uri: bad uri, state
            0 parsed: <b0i2> (4) /
            <b0i2yjor;transport=udp<sip:10.18.8.17:5060;ftag=7207ce89;lr;r2=on>
            (65)

            Jul 26 17:43:19 [220] ERROR:dialog:fix_route_dialog:
            Failed to parse SIP uri

            Jul 26 17:43:19 [220] ERROR:core:parse_uri: bad uri, state
            0 parsed: <ervi> (4) /
            <ervice_id6fdbc70f-2438-4726-807c-0d081df4d87> (44)

            Many times the “URI” displayed in the error message is
            actually internal OpenSIPS variables, as in the last error
            above. When they are from the SIP message, I have verified
            that the messages themselves are correctly formatted. This
            leads me to believe there is memory corruption occurring.

            This all started when I updated my load-balancer servers
            to use Record-Routing, specifically the “double_rr”
            mechanism for when multiple interfaces exist. The
            Record-Routing is occurring on different servers which
            have not crashed. Only the servers receiving the
            Record-Routed messages are experiencing the errors.

            Here is a piece of the code processing sequential
            requests. I am using the topology_hiding() functionality
            of the Dialog module. Are validate_dialog() and
            fix_route_dialog() still valid in a topology_hiding scenario?

            if (t_check_trans())
              setflag(SEQ_REQUEST);

            if (has_totag())
            {
              loose_route();

              if (match_dialog())
              {
                if (!validate_dialog())
                  fix_route_dialog();

                if (is_method("BYE"))
                  setflag(ACC_FLAG);

                setflag(SEQ_REQUEST);
              }
              else if (!isflagset(SEQ_REQUEST))
              {
                if (!is_method("ACK")) {
                  route(rlog, LV_ERROR, "check_sequential", "Sequential request not matched");
                  route(reply_error, "481", "Call Does Not Exist");
                }
                return(EXIT);
              }
            }

            I will attempt to get core dumps of future crashes.

            Thanks,

            Ben Newlin







            _______________________________________________

            Users mailing list

            [email protected] <mailto:[email protected]>

            http://lists.opensips.org/cgi-bin/mailman/listinfo/users






