Hi List, I want to share that I also met this issue last week with my kamailio 5.2.2.
As far as I was able to see, SIP application were able to "connect()" with TCP, but my logs wasn't reporting any of the SIP message received with TCP. I have an pike right before an xlog showing every incoming request. However I suspect the issue was not related to pike module. The log didn't showed unusual number of blocked traffic. I'm almost sure I haven't reached any ulimit restrictions. I have many TCP, UDP childreen... Server was not under high load Nothing unusual. I'm running the default build for debian stretch from here: http://deb.kamailio.org/kamailio52 stretch And unfortunatly, I had some tiny pressure to restart the service so I was not able to get deeper into the issue. If I'm correct, I will certainly improve much things by using "set_reply_no_connect()". I have added it and restarted! (Tks Daniel for this tip!) I have been looking at issue reported here: "Kamailio 5.0.2 suddenly stops processing traffic, then generates a core when restarting." https://github.com/kamailio/kamailio/issues/1172 I have to say that I do have libssl1.1. And I do have crash when I restart my kamailio. (even when I simply restart after a configuration modification) Mar 21 18:28:50 sip kamailio[19222]: INFO: <core> [main.c:836]: sig_usr(): signal 15 received Mar 21 18:28:50 sip kamailio[19175]: ERROR: ctl [ctl.c:390]: mod_destroy(): ERROR: ctl: could not delete unix socket /var/run/kamailio/kamailio_ctl: Permission denied (13) [... one minute without nothing...] Mar 21 18:29:42 sip kamailio[19175]: ERROR: jsonrpcs [jsonrpcs_fifo.c:599]: jsonrpc_fifo_destroy(): FIFO stat failed: Permission denied Mar 21 18:29:42 sip kamailio[19175]: ERROR: jsonrpcs [jsonrpcs_sock.c:516]: jsonrpc_dgram_destroy(): socket stat failed: Permission denied Mar 21 18:29:50 sip kamailio[19175]: CRITICAL: <core> [main.c:662]: sig_alarm_abort(): shutdown timeout triggered, dying... As the 1172 issue is closed, should I expect kamailio to still have trouble with libssl1.1? I just restarted again my service (to see if it restart better after 30 minutes only instead of a week) Mar 21 19:07:30 sip kamailio[28737]: INFO: <core> [main.c:836]: sig_usr(): signal 15 received Mar 21 19:07:31 sip kamailio[28671]: ERROR: ctl [ctl.c:390]: mod_destroy(): ERROR: ctl: could not delete unix socket /var/run/kamailio/kamailio_ctl: Permission denied (13) [... one minute without nothing...] Mar 21 19:08:30 sip kamailio[28671]: CRITICAL: <core> [main.c:662]: sig_alarm_abort(): shutdown timeout triggered, dying... Still not able to restart in a clean way! Tks! Regards Aymeric Le mer. 20 mars 2019 à 15:08, Daniel-Constantin Mierla <[email protected]> a écrit : > Hello, > > based on the trap output I think I could figure out what happened there. > > You have tcp_children to very low value (1 or so), the problem is not > actually that one, but the fact that the connection to upstream (the > device/app sending the request) was closed after receiving the request > and routing of the reply gets stuck in the way of: > > - a reply is received and has to be forwarded > - connection was lost, so Kamailio tries to establish a new one, but > takes time till fails because the upstream is behind nat or so based on > the via header: > > Via: SIP/2.0/TLS > 10.1.0.4:10002 > ;rport=55229;received=13.94.188.218;branch=z9hG4bK-3336-7f2927bfd703ae907348edff3611bfc9 > > - the reply is retransmitted and gets to another worker, which tries > to forward it again, but discovers a connection structure for that > destination exists (created by previous reply worker) and now waits for > the connection to be released (or better said, for the mutex on writing > buffer to be unlocked) > > - as the second reply waits, there can be other retransmissions of the > reply ending up in other workers stuck on waiting for the mutex of the > connection write buffer > > The solution here is to use set_reply_no_connect() -- you can put it > first in request_route block. I think this would be a good addition to > the default configuration file as well, IMO, the sip server should not > connect for sending replies and should do it also for requests that go > behind nat. > > Cheers, > Daniel > > On 19.03.19 10:53, Kristijan Vrban wrote: > > So i had again the situation. But this time, incoming udp was > > affected. Kamailio was sending out OPTIONS (via dispatcher module) to > > a group of asterisk machines > > but the 200 OK reply to the OPTIONS where not processed, so the > > dispatcher module set all asterisk to inactive, even though they > > replied 200 OK > > > > Attached the output of kamctl trap during the situation. Hope there is > > any useful in it. Because after "kamctl trap" it was working again > > without kamailio restart. > > > > Best > > Kristijan > > > > Am Mo., 18. März 2019 um 12:27 Uhr schrieb Daniel-Constantin Mierla > > <[email protected]>: > >> Hello, > >> > >> setting tcp_children=1 is not a god option for scallability, practically > >> you set kamailio to process a single tcp message at one time, on high > >> traffic, that won't work well. > >> > >> Maybe try to set tcp_children to 2 or 4, that should make an eventual > >> race appear faster. > >> > >> Regarding the pid, if it is an outgoing connection, then it can be > >> created by any worker process, including a UDP worker, if that was the > >> one receiving the sip message over udp and sends it out via tcp. > >> > >> Cheers, > >> Daniel > >> > >> On 18.03.19 10:09, Kristijan Vrban wrote: > >>> Hi Daniel, > >>> > >>> for testing, i now had set: "tcp_children=1" and so far this issue did > not occur > >>> ever since. So now value to provide for "kamctl trap" yet. > >>> > >>> "kamctl ps" show this two process to handle tcp: > >>> > >>> ... > >>> }, { > >>> "IDX": 25, > >>> "PID": 71929, > >>> "DSC": "tcp receiver (generic) child=0" > >>> }, { > >>> "IDX": 26, > >>> "PID": 71933, > >>> "DSC": "tcp main process" > >>> } > >>> ... > >>> > >>> > >>> Ok, but then is was wondering to see a TCP connection on a udp > receiver child: > >>> > >>> > >>> netstat -ntp |grep 5061 > >>> > >>> ... > >>> tcp 0 0 172.17.217.10:5061 195.70.114.125:18252 > >>> ESTABLISHED 71895/kamailio > >>> ... > >>> > >>> An pid 71895 is: > >>> > >>> }, { > >>> "IDX": 3, > >>> "PID": 71895, > >>> "DSC": "udp receiver child=2 sock=127.0.0.1:5060" > >>> }, { > >>> > >>> > >>> > >>> And if i look into it via "lsof -p 71895" (the udp receiver child) > >>> > >>> ... > >>> kamailio 71895 kamailio 14u sock 0,9 0t0 > >>> 8856085 protocol: TCP > >>> kamailio 71895 kamailio 15u sock 0,9 0t0 > >>> 8886886 protocol: TCP > >>> kamailio 71895 kamailio 16u sock 0,9 0t0 > >>> 8854886 protocol: TCP > >>> kamailio 71895 kamailio 17u sock 0,9 0t0 > >>> 8828915 protocol: TCP > >>> kamailio 71895 kamailio 18u unix 0x000000005f73cb91 0t0 > >>> 1680314 type=DGRAM > >>> kamailio 71895 kamailio 19u IPv4 1846523 0t0 > >>> TCP kamailio-preview:sip-tls->XXX:18252 (ESTABLISHED) > >>> kamailio 71895 kamailio 20u sock 0,9 0t0 > >>> 8887192 protocol: TCP > >>> kamailio 71895 kamailio 21u sock 0,9 0t0 > >>> 8813634 protocol: TCP > >>> kamailio 71895 kamailio 22u unix 0x00000000c19bd102 0t0 > >>> 1681407 type=STREAM > >>> kamailio 71895 kamailio 23u sock 0,9 0t0 > >>> 8850488 protocol: TCP > >>> ... > >>> > >>> Not only the ESTABLISHED TCP session. But also this empty sockets > >>> "protocol: TCP" > >>> What are they doing there in the udp receiver? Is that how it's > supposed to be? > >>> > >>> Kristijan > >>> > >>> Am Do., 14. März 2019 um 14:48 Uhr schrieb Daniel-Constantin Mierla > >>> <[email protected]>: > >>>> Can you get file written by `kamctl trap`? It should have the > backtrace > >>>> for all kamailio processes. You need latest kamailio 5.2. > >>>> > >>>> Also, get the output for: kamctl ps > >>>> > >>>> Cheers, > >>>> Daniel > >>>> > >>>> On 14.03.19 13:52, Kristijan Vrban wrote: > >>>>> When i attach via gdb to one of the tcp worker, i see this: > >>>>> > >>>>> (gdb) bt > >>>>> #0 0x00007fdaf4d14470 in futex_wait (private=<optimized out>, > >>>>> expected=1, futex_word=0x7fdaeca92f8c) at > >>>>> ../sysdeps/unix/sysv/linux/futex-internal.h:61 > >>>>> #1 futex_wait_simple (private=<optimized out>, expected=1, > >>>>> futex_word=0x7fdaeca92f8c) at ../sysdeps/nptl/futex-internal.h:135 > >>>>> #2 __pthread_rwlock_wrlock_slow (rwlock=0x7fdaeca92f80) at > >>>>> pthread_rwlock_wrlock.c:67 > >>>>> #3 0x00007fdaf0912ee9 in CRYPTO_THREAD_write_lock () from > >>>>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 > >>>>> #4 0x00007fdaf08e1c08 in ?? () from > /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 > >>>>> #5 0x00007fdaf08a6f69 in ?? () from > /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 > >>>>> #6 0x00007fdaf08b36c7 in EVP_CIPHER_CTX_ctrl () from > >>>>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 > >>>>> #7 0x00007fdaf0c31144 in ?? () from > /usr/lib/x86_64-linux-gnu/libssl.so.1.1 > >>>>> #8 0x00007fdaf0c2bddb in ?? () from > /usr/lib/x86_64-linux-gnu/libssl.so.1.1 > >>>>> #9 0x00007fdaf0c22858 in ?? () from > /usr/lib/x86_64-linux-gnu/libssl.so.1.1 > >>>>> #10 0x00007fdaf0c1af61 in SSL_do_handshake () from > >>>>> /usr/lib/x86_64-linux-gnu/libssl.so.1.1 > >>>>> #11 0x00007fdaf0e8d31b in tls_accept (c=0x7fdaed26fa98, > >>>>> error=0x7ffffe2a2df0) at tls_server.c:422 > >>>>> #12 0x00007fdaf0e96a1b in tls_read_f (c=0x7fdaed26fa98, > >>>>> flags=0x7ffffe2c318c) at tls_server.c:1116 > >>>>> #13 0x0000556ead5e7c46 in tcp_read_headers (c=0x7fdaed26fa98, > >>>>> read_flags=0x7ffffe2c318c) at core/tcp_read.c:469 > >>>>> #14 0x0000556ead5ef9cb in tcp_read_req (con=0x7fdaed26fa98, > >>>>> bytes_read=0x7ffffe2c3184, read_flags=0x7ffffe2c318c) at > >>>>> core/tcp_read.c:1496 > >>>>> #15 0x0000556ead5f575f in handle_io (fm=0x7fdaf597aa98, events=1, > >>>>> idx=-1) at core/tcp_read.c:1862 > >>>>> #16 0x0000556ead5e2053 in io_wait_loop_epoll (h=0x556eadaaeec0 > <io_w>, > >>>>> t=2, repeat=0) at core/io_wait.h:1065 > >>>>> #17 0x0000556ead5f6b35 in tcp_receive_loop (unix_sock=49) at > >>>>> core/tcp_read.c:1974 > >>>>> #18 0x0000556ead4c8e24 in tcp_init_children () at > core/tcp_main.c:4853 > >>>>> #19 0x0000556ead3c352a in main_loop () at main.c:1735 > >>>>> #20 0x0000556ead3ca5f8 in main (argc=13, argv=0x7ffffe2c3828) at > main.c:2675 > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> Am Do., 14. März 2019 um 13:41 Uhr schrieb Kristijan Vrban > >>>>> <[email protected]>: > >>>>>> Hi, with full debug is see this in log for every incoming TCP SIP > request: > >>>>>> > >>>>>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG: > >>>>>> <core> [core/tcp_main.c:3871]: send2child(): WARNING: no free tcp > >>>>>> receiver, connection passed to the least busy one (105) > >>>>>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG: > >>>>>> <core> [core/tcp_main.c:3875]: send2child(): selected tcp worker 2 > >>>>>> 27(17937) for activity on [tls:172.17.217.10:5061], 0x7fdaeda8f928 > >>>>>> > >>>>>> So the Kamailio TCP process is working, and received TCP traffic. > But > >>>>>> the tcp workers are somehow busy. > >>>>>> > >>>>>> When i attach via strace to the TCP worker, i do not see any > activity. Just: > >>>>>> > >>>>>> futex(0x7fdaeca92f8c, FUTEX_WAIT_PRIVATE, 1, NULL > >>>>>> > >>>>>> and nothing, even when i see the main tcp process choose this > worker process. > >>>>>> > >>>>>> Kristijan > >>>>>> > >>>>>> Am Mi., 27. Feb. 2019 um 15:14 Uhr schrieb Kristijan Vrban > >>>>>> <[email protected]>: > >>>>>>> first of all thanks for the feedback. i prepared our system now to > run > >>>>>>> with debug=3 > >>>>>>> I hope to see more then then. > >>>>>>> > >>>>>>> Am Mi., 27. Feb. 2019 um 11:53 Uhr schrieb Kristijan Vrban > >>>>>>> <[email protected]>: > >>>>>>>> Hi kamailios, > >>>>>>>> > >>>>>>>> i have a creepy situation with v5.2.1 stable Kamilio. After a day > or > >>>>>>>> so, Kamailio stop to process incoming SIP traffic via TCP. The > >>>>>>>> incoming TCP network packages get TCP-ACK from the OS (Debian 9, > >>>>>>>> 4.18.0-15-generic-Linux) but Kamailio does not show any > processing for > >>>>>>>> the SIP-Traffic incoming via TCP. No logs, nothing. While traffic > via > >>>>>>>> UDP is working just totally fine. > >>>>>>>> > >>>>>>>> When i look via command "netstat -ntp" is see, that the Recv-Q get > >>>>>>>> bigger and bigger. e.g.: > >>>>>>>> > >>>>>>>> Proto Recv-Q Send-Q Local Address Foreign Address State > PID/Program > >>>>>>>> name tcp 4566 0 172.17.217.12:5060 xxx.xxx.xxx.xxx:57252 > ESTABLISHED > >>>>>>>> 31347/kamailio > >>>>>>>> > >>>>>>>> After Kamailio restart, all is working fine again for a day. We > have > >>>>>>>> maybe 10-20 devices online via TCP and low call volume (1-2 call > per > >>>>>>>> minute). The only settings for tcp we have is "tcp_delayed_ack=no" > >>>>>>>> > >>>>>>>> How to could we debug this situation? Again, no error, no warings > in > >>>>>>>> the log. Just nothing. > >>>>>>>> > >>>>>>>> Kristijan > >>>>> _______________________________________________ > >>>>> Kamailio (SER) - Users Mailing List > >>>>> [email protected] > >>>>> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users > >>>> -- > >>>> Daniel-Constantin Mierla -- www.asipto.com > >>>> www.twitter.com/miconda -- www.linkedin.com/in/miconda > >>>> Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com > >>>> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA > -- www.asipto.com > >>>> > >>> _______________________________________________ > >>> Kamailio (SER) - Users Mailing List > >>> [email protected] > >>> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users > >> -- > >> Daniel-Constantin Mierla -- www.asipto.com > >> www.twitter.com/miconda -- www.linkedin.com/in/miconda > >> Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com > >> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA -- > www.asipto.com > >> > -- > Daniel-Constantin Mierla -- www.asipto.com > www.twitter.com/miconda -- www.linkedin.com/in/miconda > Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com > Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA -- > www.asipto.com > > > _______________________________________________ > Kamailio (SER) - Users Mailing List > [email protected] > https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users > -- Antisip - http://www.antisip.com
_______________________________________________ Kamailio (SER) - Users Mailing List [email protected] https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
