Bogdan,

I have run the command, but the output was too large for pastebin, so I have
sent it to you directly.

Ben Newlin

From: Bogdan-Andrei Iancu <[email protected]>
Date: Wednesday, October 24, 2018 at 5:17 AM
To: OpenSIPS users mailling list <[email protected]>, Ben Newlin 
<[email protected]>
Subject: Re: [OpenSIPS-Users] CPU 100% with TCP

Hi Ben,

Could you run "opensipsctl trap" ?

Regards,


Bogdan-Andrei Iancu

OpenSIPS Founder and Developer
  http://www.opensips-solutions.com

OpenSIPS Bootcamp 2018
  http://opensips.org/training/OpenSIPS_Bootcamp_2018/

On 10/24/2018 12:56 AM, Ben Newlin wrote:
Hi,

We have implemented TCP recently and are performing TCP<->UDP translation on 
one of our proxy types. This proxy only exists for that purpose; there are no 
DB queries, REST calls, or anything like that. It is designed to be very fast 
and high throughput.

Recently we have found that when the remote endpoint of a TCP connection is 
lost (i.e. the server goes down) while under moderate load, OpenSIPS quickly 
reaches 100% CPU and becomes unresponsive. When this occurs, the “top” command 
shows that between 30% and 90% of CPU time is in system (kernel) space, and 
each OpenSIPS TCP process shows many times its normal CPU usage. We are running 
OpenSIPS 2.4.2 on Amazon Linux.

I obtained as much information as I could using ps, strace, and gdb here: 
https://pastebin.com/JP3DnCqs. We can reproduce the failure consistently by 
removing a server during call traffic.

A few things I noticed:

  *   The number of running threads reported by OpenSIPS doesn’t align with our 
configuration, copied here:

####### Global Parameters #########

children=32

#// Allow 503 to pass back to Control
disable_503_translation=yes

#// Even though we are not receiving HEP,
#// this listener is required by OpenSIPS
#// in order to use the proto_hep module. :/
listen=hep_tcp:10.32.40.245:9061 use_children 1

#// Configure the listeners
listen=udp:10.32.40.245:5060 as XXX.XXX.XXX.XXX
listen=tcp:10.32.40.245:5060 as XXX.XXX.XXX.XXX

#// Transaction Module
loadmodule "tm.so"
modparam("tm", "restart_fr_on_each_reply", 0)
modparam("tm", "timer_partitions", 8)
modparam("tm", "onreply_avp_mode", 1)
modparam("tm", "wt_timer", 10)


According to the documentation, if “tcp_children” is not set then the value of 
“children” will be used [1], but we have set “children” to 32 and only have the 
default 8 TCP processes. Also, we appear to have only 1 timer process, although 
we have set the number of timer partitions to 8.

  *   The server that was terminated was using TCP connections exclusively, but 
all of the CPU seems to be in the UDP threads. The one I looked at appeared to 
be handling a CANCEL for one of the calls that was active and was attempting to 
send it out via TCP. I’m not sure why it would be trying to relay the CANCEL, 
as no 100 Trying had been received from the server. I have noticed that in 2.x 
OpenSIPS will now send CANCELs for transactions even when 100 Trying was not 
received. Is that intentional? RFC 3261 states that no CANCEL should be sent 
unless a provisional response has been received.

Any assistance with this would be appreciated.

[1] - http://www.opensips.org/Documentation/Script-CoreParameters-2-4#toc66


Ben Newlin




_______________________________________________
Users mailing list
[email protected]
http://lists.opensips.org/cgi-bin/mailman/listinfo/users

