After this had been happening for a while, OpenSIPS began constantly logging a 
single line:

Jul 26 18:20:17 [7006] ERROR:core:handle_new_connect: failed to accept 
connection(24): Too many open files

This was logging thousands of times per second and quickly filled our entire 
disk. It also began logging every SIP message received in full, even though 
logging levels had not been changed.

I took a look at file descriptors on the system and found that OpenSIPS’ TCP 
Main thread has 65536 open file descriptors.

# opensipsctl ps
Process::  ID=0 PID=6981 Type=attendant
Process::  ID=1 PID=6983 Type=event-route handler
Process::  ID=2 PID=6984 Type=MI Datagram
Process::  ID=3 PID=6985 Type=MI Datagram
Process::  ID=4 PID=6986 Type=MI Datagram
Process::  ID=5 PID=6987 Type=time_keeper
Process::  ID=6 PID=6988 Type=timer
Process::  ID=7 PID=6989 Type=SIP receiver udp:10.32.18.155:5060
Process::  ID=8 PID=6990 Type=SIP receiver udp:10.32.18.155:5060
Process::  ID=9 PID=6991 Type=SIP receiver udp:10.32.18.155:5060
Process::  ID=10 PID=6992 Type=SIP receiver udp:10.32.18.155:5060
Process::  ID=11 PID=6993 Type=SIP receiver udp:10.32.18.155:5060
Process::  ID=12 PID=6994 Type=SIP receiver udp:10.32.18.155:5060
Process::  ID=13 PID=6995 Type=SIP receiver udp:10.32.18.155:5060
Process::  ID=14 PID=6996 Type=SIP receiver udp:10.32.18.155:5060
Process::  ID=15 PID=6997 Type=TCP receiver
Process::  ID=16 PID=6998 Type=TCP receiver
Process::  ID=17 PID=6999 Type=TCP receiver
Process::  ID=18 PID=7000 Type=TCP receiver
Process::  ID=19 PID=7001 Type=TCP receiver
Process::  ID=20 PID=7002 Type=TCP receiver
Process::  ID=21 PID=7003 Type=TCP receiver
Process::  ID=22 PID=7004 Type=TCP receiver
Process::  ID=23 PID=7005 Type=Timer handler
Process::  ID=24 PID=7006 Type=TCP main

# ls /proc/7006/fd/ | wc -l
65536

All of the other OpenSIPS process had hundreds of open FDs as well. My first 
thought is that many of the connections being opened for HEP are being leaked.

I captured a trap with the system in this state: https://pastebin.com/H4M9eUSu

Ben Newlin

From: Users <[email protected]> on behalf of Ben Newlin 
<[email protected]>
Reply-To: OpenSIPS users mailling list <[email protected]>
Date: Friday, July 26, 2019 at 12:46 PM
To: OpenSIPS users mailling list <[email protected]>
Subject: Re: [OpenSIPS-Users] HEP Tracing exhausting TCP connections

It turns out the netstat example I provided was a smaller case. Here is an 
example where a single OpenSIPS instance had attempted over 400 TCP connections 
to the capture server: https://pastebin.com/4jKQtZZY

Ben Newlin

From: Users <[email protected]> on behalf of Ben Newlin 
<[email protected]>
Reply-To: OpenSIPS users mailling list <[email protected]>
Date: Friday, July 26, 2019 at 12:17 PM
To: "[email protected]" <[email protected]>
Subject: [OpenSIPS-Users] HEP Tracing exhausting TCP connections

Hello,

We are experiencing an issue with the siptrace module using HEPv3 over TCP. 
What we are seeing is that when traffic increases, OpenSIPS is opening a much 
larger number of sockets to our capture server than we expect. We have our 
capture server configured to accept a maximum of 2048 connections.

We believe that the module is intended to share TCP connections to the capture 
server; something like one per process. We have found that if we increase 
traffic at a certain pace, OpenSIPS will attempt to open multiple connections 
per process and will exhaust the 2048 port limit on the server. When this 
happens, connections begin to fail and the max async messages 
(hep_async_max_postponed_chunks) is exceeded. We believe this causes the 
proto_hep module to switch to blocking TCP, as we see all OpenSIPS processes 
then block against siptrace for up to 5 minutes. The proto_hep TCP connect and 
send timeouts are set to the defaults, but these do not seem to be being 
honored for non-async connections.

I was able to capture the output of netstat and opensipsctl ps during the issue 
and they are available here: https://pastebin.com/jAMnSE8z. You can see that 
even though we only have 16 UDP/TCP listener threads, there are almost 100 TCP 
connections open to the configured proto_hep server. This is for just one 
instance.

I was also able to capture a trap while this was occurring, but not while 
OpenSIPS was blocked which was my hope. It is difficult to time that occurrence 
exactly but we will keep trying. Trap output is here: 
https://pastebin.com/cqGhrZrf


Ben Newlin
_______________________________________________
Users mailing list
[email protected]
http://lists.opensips.org/cgi-bin/mailman/listinfo/users

Reply via email to