Re: [OpenSIPS-Users] CPU 100% with TCP

Bogdan-Andrei Iancu Fri, 26 Oct 2018 00:07:29 -0700

Hi Ben,

Thank you for the info.

It looks like the processes get stuck in a HEP-related internal lock. Do you see any HEP-related errors in your logs prior to the deadlock?

Also, as a PoC, could you disable HEP tracing to see if the problem goes away?


Thanks,

Bogdan-Andrei Iancu

OpenSIPS Founder and Developer
  http://www.opensips-solutions.com
OpenSIPS Bootcamp 2018
  http://opensips.org/training/OpenSIPS_Bootcamp_2018/

On 10/24/2018 10:18 PM, Ben Newlin wrote:


Bogdan,

I have run the command, but the output was too large for pastebin, so I have sent it to you directly.


Ben Newlin

*From: *Bogdan-Andrei Iancu <[email protected]>
*Date: *Wednesday, October 24, 2018 at 5:17 AM

*To: *OpenSIPS users mailing list <[email protected]>, Ben Newlin <[email protected]>

*Subject: *Re: [OpenSIPS-Users] CPU 100% with TCP

Hi Ben,

Could you run "opensipsctl trap"?

Regards,

Bogdan-Andrei Iancu
OpenSIPS Founder and Developer
   http://www.opensips-solutions.com
OpenSIPS Bootcamp 2018
   http://opensips.org/training/OpenSIPS_Bootcamp_2018/

On 10/24/2018 12:56 AM, Ben Newlin wrote:

    Hi,

    We have implemented TCP recently and are performing TCP<->UDP
    translation on one of our proxy types. This proxy only exists for
    that purpose; there are no DB queries, REST calls, or anything
    like that. It is designed to be very fast and high throughput.

    Recently we have found that when the remote endpoint of a TCP
    connection is lost (i.e., the server goes down) while OpenSIPS is
    under moderate load, OpenSIPS quickly reaches 100% CPU and becomes
    unresponsive. When this occurs, the “top” command shows that
    between 30% and 90% of CPU time is in system (kernel) space, and
    each OpenSIPS TCP process shows many times its normal CPU usage.
    We are running OpenSIPS 2.4.2 on Amazon Linux.

    I obtained as much information as I could using ps, strace, and
    gdb here: https://pastebin.com/JP3DnCqs. We can reproduce the
    failure consistently by removing a server during call traffic.

    A few things I noticed:

      * The number of running threads reported by OpenSIPS doesn’t
        align with our configuration, copied here:

    ####### Global Parameters #########

    children=32

    #// Allow 503 to pass back to Control
    disable_503_translation=yes

    #// Even though we are not receiving HEP,
    #// this listener is required by OpenSIPS
    #// in order to use the proto_hep module. :/
    listen=hep_tcp:10.32.40.245:9061 use_children 1

    #// Configure the listeners
    listen=udp:10.32.40.245:5060 as XXX.XXX.XXX.XXX
    listen=tcp:10.32.40.245:5060 as XXX.XXX.XXX.XXX

    #// Transaction Module
    loadmodule "tm.so"
    modparam("tm", "restart_fr_on_each_reply", 0)
    modparam("tm", "timer_partitions", 8)
    modparam("tm", "onreply_avp_mode", 1)
    modparam("tm", "wt_timer", 10)

    According to the documentation, if “tcp_children” is not set then
    the value of “children” will be used [1], but we have set
    “children” to 32 and have only the default 8 TCP processes. We
    also appear to have only 1 timer process, although we have set
    the number of timer partitions to 8.

      * The server that is terminated was using TCP connections
        exclusively, but all of the CPU seems to be in the UDP
        threads. The one I looked at appeared to be handling a CANCEL
        to one of the calls that was active and was attempting to send
        it out via TCP. I’m not sure why it would be trying to relay
        the CANCEL as no 100 Trying had been received from the server.
        I have noticed that in 2.x OpenSIPS will now send CANCELs for
        transactions even when 100 Trying was not received. Is that
        intentional? RFC 3261 states that no CANCEL should be sent
        unless a provisional response has been received.
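
For reference, the RFC 3261 (section 9.1) rule mentioned above can be sketched as a small state model. This is only an illustration of the rule as written in the RFC, not OpenSIPS code: the class name InviteClientTransaction and all of its fields and methods are hypothetical, and it says nothing about why OpenSIPS 2.x behaves as described.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InviteClientTransaction:
    """Hypothetical model of one outgoing INVITE branch (not OpenSIPS code)."""

    provisional_received: bool = False
    final_received: bool = False
    cancel_pending: bool = False
    sent: List[str] = field(default_factory=list)

    def request_cancel(self) -> None:
        """Upstream asked to cancel this branch."""
        if self.final_received:
            # A final response already arrived, so there is nothing to cancel.
            return
        if self.provisional_received:
            self.sent.append("CANCEL")
        else:
            # No 1xx yet: hold the CANCEL until a provisional response arrives.
            self.cancel_pending = True

    def on_response(self, status: int) -> None:
        """Handle a response received on this branch."""
        if 100 <= status < 200:
            self.provisional_received = True
            if self.cancel_pending:
                self.cancel_pending = False
                self.sent.append("CANCEL")
        elif status >= 200:
            self.final_received = True
            self.cancel_pending = False


tx = InviteClientTransaction()
tx.request_cancel()   # no provisional response yet, so nothing is sent
tx.on_response(100)   # 100 Trying arrives, the held CANCEL goes out
```

In this model, a branch whose server never answers (as when the TCP peer has gone down) would never emit a CANCEL at all; the pending request would simply be dropped when the transaction ends.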

    Any assistance with this would be appreciated.

    [1] -
    http://www.opensips.org/Documentation/Script-CoreParameters-2-4#toc66

    Ben Newlin




    _______________________________________________

    Users mailing list

    [email protected] <mailto:[email protected]>

    http://lists.opensips.org/cgi-bin/mailman/listinfo/users

