Hi, Federico!

Can you post the output from the core file on pastebin?

Best regards,

Răzvan Crainea
OpenSIPS Solutions
www.opensips-solutions.com

On 05/08/2015 04:51 PM, Federico Edorna wrote:
Hi Răzvan,

It was happening at least once a day. It started when we reached ~80 registered terminals, which is not a big load. Now I'm using the event_rabbitmq module with 400 terminals and it's working fine.

The event is a custom one that I called "E_REGISTERED". It just notifies an external process that a particular terminal has registered. This is the part of the configuration that crashes:

startup_route {
    subscribe_event("E_REGISTERED", "xmlrpc:10.10.11.2:10080:OSEvent");
}
...
...
route {
...
...
        if (is_method("REGISTER")) {
...
...
                $avp(attr-name) = "username";
                $avp(attr-val) = $tU;
                $avp(attr-name) = "domain";
                $avp(attr-val) = $td;
                raise_event("E_REGISTERED", $avp(attr-name), $avp(attr-val));
...
...
}

Another thing: when I compiled with DBG_QM_MALLOC instead of F_MALLOC to debug, I didn't have any crashes for about 5 days. Maybe I should have waited longer to confirm, but it seems that using DBG_QM_MALLOC made the crashes go away.

Regarding the core files, it seems that some module (event_xmlrpc, I suspect) is freeing memory that it should not. After this issue I realized that the module was in beta, so I moved to event_rabbitmq.

Thanks for your reply
Federico


On Fri, May 8, 2015 at 7:47 AM, Răzvan Crainea wrote:

    Hi, Federico!

    Is this easy to reproduce, or does it happen only once in a while?
    Also, what events are you raising?

    Best regards,

    Răzvan Crainea
    OpenSIPS Solutions
    www.opensips-solutions.com

    On 04/24/2015 05:44 PM, Federico Edorna wrote:
    Just in case somebody is dealing with the same issue: the problem seems
    to be the event_xmlrpc module. I switched to event_datagram to
    notify the external process and got no more crashes for a
    couple of weeks.
    Now I'm using the event_rabbitmq module instead of event_datagram,
    without problems for a couple of days.


    On Mon, Apr 6, 2015 at 4:46 PM, Federico Edorna wrote:

        Hello, I'm getting core dumps in version 1.11.3.
        Unlike our other OpenSIPS instances, which run without problems,
        this config uses some extra modules because OpenSIPS
        needs to notify an external process (via event_xmlrpc) when a
        terminal registers, and that external process afterwards
        sends OpenSIPS (via mi_datagram/t_uac_dlg) an MWI NOTIFY for
        the terminal.

        I'm pasting 3 backtraces (commit cbaf569, but it happened with
        previous commits too):

        http://pastebin.com/xZ2zqJ0F
        http://pastebin.com/8DWhsMfK
        http://pastebin.com/9ERCD3mZ

        This is what syslog shows:

        2015-04-03T13:45:16.228227-03:00 bermeja
        /home/gc/local/opensips/sbin/opensips[24272]:
        CRITICAL:core:recv_all: 1st recv on 36 failed: Connection
        reset by peer
        2015-04-03T13:45:16.228249-03:00 bermeja
        /home/gc/local/opensips/sbin/opensips[24272]:
        CRITICAL:core:handle_tcp_child: read from tcp child 0 (pid
        24240, no 0) Connection reset by
         peer [104]
        2015-04-03T13:45:16.228260-03:00 bermeja
        /home/gc/local/opensips/sbin/opensips[24272]:
        CRITICAL:core:receive_fd: EOF on 38
        2015-04-03T13:45:16.250712-03:00 bermeja
        /home/gc/local/opensips/sbin/opensips[24214]:
        INFO:core:handle_sigs: child process 24240 exited by a signal 11
        2015-04-03T13:45:16.250727-03:00 bermeja
        /home/gc/local/opensips/sbin/opensips[24214]:
        INFO:core:handle_sigs: core was generated
        2015-04-03T13:45:16.250735-03:00 bermeja
        /home/gc/local/opensips/sbin/opensips[24214]:
        INFO:core:handle_sigs: terminating due to SIGCHLD
        2015-04-03T13:45:16.250800-03:00 bermeja
        /home/gc/local/opensips/sbin/opensips[24270]:
        INFO:core:sig_usr: signal 15 received

        ----

        2015-04-03T13:54:48.179260-03:00 bermeja
        /home/gc/local/opensips/sbin/opensips[21747]:
        CRITICAL:core:recv_all: 1st recv on 36 failed: Connection
        reset by peer
        2015-04-03T13:54:48.179289-03:00 bermeja
        /home/gc/local/opensips/sbin/opensips[21747]:
        CRITICAL:core:handle_tcp_child: read from tcp child 0 (pid
        21715, no 0) Connection reset by
         peer [104]
        2015-04-03T13:54:48.179307-03:00 bermeja
        /home/gc/local/opensips/sbin/opensips[21747]:
        CRITICAL:core:receive_fd: EOF on 38
        2015-04-03T13:54:48.179373-03:00 bermeja
        /home/gc/local/opensips/sbin/opensips[21688]:
        INFO:core:handle_sigs: child process 21715 exited by a signal 11
        2015-04-03T13:54:48.179388-03:00 bermeja
        /home/gc/local/opensips/sbin/opensips[21688]:
        INFO:core:handle_sigs: core was generated
        2015-04-03T13:54:48.179402-03:00 bermeja
        /home/gc/local/opensips/sbin/opensips[21688]:
        INFO:core:handle_sigs: terminating due to SIGCHLD
        2015-04-03T13:54:48.179417-03:00 bermeja
        /home/gc/local/opensips/sbin/opensips[21746]:
        INFO:core:sig_usr: signal 15 received
        2015-04-03T13:54:48.179426-03:00 bermeja
        /home/gc/local/opensips/sbin/opensips[21745]:
        INFO:core:sig_usr: signal 15 received
        2015-04-03T13:54:48.179435-03:00 bermeja
        /home/gc/local/opensips/sbin/opensips[21743]:
        INFO:core:sig_usr: signal 15 received

        ----

        2015-04-03T14:44:01.064875-03:00 bermeja
        /home/gc/local/opensips/sbin/opensips[31736]:
        CRITICAL:core:recv_all: 1st recv on 36 failed: Connection
        reset by peer
        2015-04-03T14:44:01.064898-03:00 bermeja
        /home/gc/local/opensips/sbin/opensips[31736]:
        CRITICAL:core:handle_tcp_child: read from tcp child 0 (pid
        31704, no 0) Connection reset by
         peer [104]
        2015-04-03T14:44:01.064922-03:00 bermeja
        /home/gc/local/opensips/sbin/opensips[31736]:
        CRITICAL:core:receive_fd: EOF on 38
        2015-04-03T14:44:01.064943-03:00 bermeja
        /home/gc/local/opensips/sbin/opensips[31678]:
        INFO:core:handle_sigs: child process 31704 exited by a signal 11
        2015-04-03T14:44:01.064954-03:00 bermeja
        /home/gc/local/opensips/sbin/opensips[31678]:
        INFO:core:handle_sigs: core was generated
        2015-04-03T14:44:01.064963-03:00 bermeja
        /home/gc/local/opensips/sbin/opensips[31678]:
        INFO:core:handle_sigs: terminating due to SIGCHLD
        2015-04-03T14:44:01.066539-03:00 bermeja
        /home/gc/local/opensips/sbin/opensips[31736]:
        INFO:core:sig_usr: signal 15 received

        Thanks in advance
        Federico




    _______________________________________________
    Users mailing list
    [email protected]
    http://lists.opensips.org/cgi-bin/mailman/listinfo/users


