Hi Aymeric, Are you sure the issue is with TCP and not strictly related to TLS? I highly suggest you compile with ssl1.0 and give it a try...
If you want to read how I got to that conclusion: https://github.com/kamailio/kamailio/issues/1172 Hope it helps! Joel. On Fri, Mar 22, 2019 at 11:58 AM Aymeric Moizard <amoiz...@gmail.com> wrote: > Hi Daniel, > > Tks for the tips. > > My traffic does include TLS as well. > > For TCP settings: > > tcp_connection_lifetime=3600 > tcp_async=yes > tcp_rd_buf_size=16384 > tcp_accept_no_cl=yes > tcp_max_connections=50000 > tcp_connect_timeout=7 > > For TLS: > enable_tls=yes > tls_max_connections=50000 > > I'm using "set_forward_no_connect();" after lookup(location) since a long > time. > > I have added this week "set_reply_no_connect();" in case it will help to > avoid the issue. > > If the issue occurs, I will try to get something via "kamctrl trap". > > In order to get a coredump (on restart timeout?) I have added this in > my kamailio.service > > WorkingDirectory=/var/run/kamailio > LimitCORE=infinity > > I have also DUMP_CORE=yes in /etc/default/kamailio > and disable_core_dump=no in my kamailio.cfg > > However, I'm not able to see any core dumps when restarting kamailio > even when I see "sig_alarm_abort(): shutdown timeout triggered, dying"... > > Am I supposed to get a core dump in such case? > > Tks a lot! > Aymeric > > Le ven. 22 mars 2019 à 14:19, Daniel-Constantin Mierla <mico...@gmail.com> > a écrit : > >> Do you have pure tcp traffic and facing this issue, or there are actually >> tls connections? >> >> What are the values for core parameters related to tcp connect and tcp >> send timeouts? >> >> As for restart taking long, see exit_timeout parameter: >> >> * https://www.kamailio.org/wiki/cookbooks/5.2.x/core#exit_timeout >> >> As for tls with libssl1.1/libcrypto1.1, I think I discover what the issue >> is. With v1.1 they use their own internal locking functions, not exposing >> any api to set them from outside. Before, kamailio was initializing the >> libray telling to use Kamailio locks, giving one lock per connection. As i >> could get from some gdb traces I received, with libssl 1.1, the same >> internal lock is used for when attempting to connect to different addresses >> as well as when trying to write to different connections. If one operation >> is slow for what so ever reason, the others are waiting for the lock to be >> lifted by the slow operation. I am digging in the source code of libssl1.1 >> to figure out a solution, it can still take a bit because I am travelling >> for several days with no much spare time. >> >> Among the tunnings would be lower timeouts to connect and send, do not >> attempt to connect unless you are sure the target expects new connections >> (e.g., sending to a gateway/sip server accepting traffic via tls, but don't >> do it even for the requests routed via lookup(location) as the registration >> is using a connection with an ephemeral source port and trying to connect >> back to it will fail). If still a major issue for what so ever reason, >> using a version compiled with libssl1.0 would be something to go for it. >> >> Cheers, >> Daniel >> On 21.03.19 19:17, Aymeric Moizard wrote: >> >> Hi List, >> >> I want to share that I also met this issue last week with my kamailio >> 5.2.2. >> >> As far as I was able to see, SIP application were able to "connect()" >> with TCP, but my logs wasn't reporting any of the SIP message received >> with TCP. >> >> I have an pike right before an xlog showing every incoming request. >> However >> I suspect the issue was not related to pike module. The log didn't showed >> unusual >> number of blocked traffic. >> >> I'm almost sure I haven't reached any ulimit restrictions. >> I have many TCP, UDP childreen... >> Server was not under high load >> Nothing unusual. >> >> I'm running the default build for debian stretch from here: >> http://deb.kamailio.org/kamailio52 stretch >> >> And unfortunatly, I had some tiny pressure to restart the service so I was >> not able to get deeper into the issue. >> >> If I'm correct, I will certainly improve much things by using >> "set_reply_no_connect()". >> I have added it and restarted! >> (Tks Daniel for this tip!) >> >> I have been looking at issue reported here: >> "Kamailio 5.0.2 suddenly stops processing traffic, then generates a core >> when restarting." >> https://github.com/kamailio/kamailio/issues/1172 >> >> I have to say that I do have libssl1.1. >> And I do have crash when I restart my kamailio. (even when I simply >> restart after a configuration modification) >> >> Mar 21 18:28:50 sip kamailio[19222]: INFO: <core> [main.c:836]: >> sig_usr(): signal 15 received >> Mar 21 18:28:50 sip kamailio[19175]: ERROR: ctl [ctl.c:390]: >> mod_destroy(): ERROR: ctl: could not delete unix socket >> /var/run/kamailio/kamailio_ctl: Permission denied (13) >> >> [... one minute without nothing...] >> >> Mar 21 18:29:42 sip kamailio[19175]: ERROR: jsonrpcs >> [jsonrpcs_fifo.c:599]: jsonrpc_fifo_destroy(): FIFO stat failed: Permission >> denied >> Mar 21 18:29:42 sip kamailio[19175]: ERROR: jsonrpcs >> [jsonrpcs_sock.c:516]: jsonrpc_dgram_destroy(): socket stat failed: >> Permission denied >> Mar 21 18:29:50 sip kamailio[19175]: CRITICAL: <core> [main.c:662]: >> sig_alarm_abort(): shutdown timeout triggered, dying... >> >> As the 1172 issue is closed, should I expect kamailio to still have >> trouble with libssl1.1? >> >> I just restarted again my service (to see if it restart better after 30 >> minutes only instead of a week) >> >> Mar 21 19:07:30 sip kamailio[28737]: INFO: <core> [main.c:836]: >> sig_usr(): signal 15 received >> Mar 21 19:07:31 sip kamailio[28671]: ERROR: ctl [ctl.c:390]: >> mod_destroy(): ERROR: ctl: could not delete unix socket >> /var/run/kamailio/kamailio_ctl: Permission denied (13) >> >> [... one minute without nothing...] >> >> Mar 21 19:08:30 sip kamailio[28671]: CRITICAL: <core> [main.c:662]: >> sig_alarm_abort(): shutdown timeout triggered, dying... >> >> Still not able to restart in a clean way! >> Tks! >> Regards >> Aymeric >> >> >> Le mer. 20 mars 2019 à 15:08, Daniel-Constantin Mierla <mico...@gmail.com> >> a écrit : >> >>> Hello, >>> >>> based on the trap output I think I could figure out what happened there. >>> >>> You have tcp_children to very low value (1 or so), the problem is not >>> actually that one, but the fact that the connection to upstream (the >>> device/app sending the request) was closed after receiving the request >>> and routing of the reply gets stuck in the way of: >>> >>> - a reply is received and has to be forwarded >>> - connection was lost, so Kamailio tries to establish a new one, but >>> takes time till fails because the upstream is behind nat or so based on >>> the via header: >>> >>> Via: SIP/2.0/TLS >>> 10.1.0.4:10002 >>> ;rport=55229;received=13.94.188.218;branch=z9hG4bK-3336-7f2927bfd703ae907348edff3611bfc9 >>> >>> - the reply is retransmitted and gets to another worker, which tries >>> to forward it again, but discovers a connection structure for that >>> destination exists (created by previous reply worker) and now waits for >>> the connection to be released (or better said, for the mutex on writing >>> buffer to be unlocked) >>> >>> - as the second reply waits, there can be other retransmissions of the >>> reply ending up in other workers stuck on waiting for the mutex of the >>> connection write buffer >>> >>> The solution here is to use set_reply_no_connect() -- you can put it >>> first in request_route block. I think this would be a good addition to >>> the default configuration file as well, IMO, the sip server should not >>> connect for sending replies and should do it also for requests that go >>> behind nat. >>> >>> Cheers, >>> Daniel >>> >>> On 19.03.19 10:53, Kristijan Vrban wrote: >>> > So i had again the situation. But this time, incoming udp was >>> > affected. Kamailio was sending out OPTIONS (via dispatcher module) to >>> > a group of asterisk machines >>> > but the 200 OK reply to the OPTIONS where not processed, so the >>> > dispatcher module set all asterisk to inactive, even though they >>> > replied 200 OK >>> > >>> > Attached the output of kamctl trap during the situation. Hope there is >>> > any useful in it. Because after "kamctl trap" it was working again >>> > without kamailio restart. >>> > >>> > Best >>> > Kristijan >>> > >>> > Am Mo., 18. März 2019 um 12:27 Uhr schrieb Daniel-Constantin Mierla >>> > <mico...@gmail.com>: >>> >> Hello, >>> >> >>> >> setting tcp_children=1 is not a god option for scallability, >>> practically >>> >> you set kamailio to process a single tcp message at one time, on high >>> >> traffic, that won't work well. >>> >> >>> >> Maybe try to set tcp_children to 2 or 4, that should make an eventual >>> >> race appear faster. >>> >> >>> >> Regarding the pid, if it is an outgoing connection, then it can be >>> >> created by any worker process, including a UDP worker, if that was the >>> >> one receiving the sip message over udp and sends it out via tcp. >>> >> >>> >> Cheers, >>> >> Daniel >>> >> >>> >> On 18.03.19 10:09, Kristijan Vrban wrote: >>> >>> Hi Daniel, >>> >>> >>> >>> for testing, i now had set: "tcp_children=1" and so far this issue >>> did not occur >>> >>> ever since. So now value to provide for "kamctl trap" yet. >>> >>> >>> >>> "kamctl ps" show this two process to handle tcp: >>> >>> >>> >>> ... >>> >>> }, { >>> >>> "IDX": 25, >>> >>> "PID": 71929, >>> >>> "DSC": "tcp receiver (generic) child=0" >>> >>> }, { >>> >>> "IDX": 26, >>> >>> "PID": 71933, >>> >>> "DSC": "tcp main process" >>> >>> } >>> >>> ... >>> >>> >>> >>> >>> >>> Ok, but then is was wondering to see a TCP connection on a udp >>> receiver child: >>> >>> >>> >>> >>> >>> netstat -ntp |grep 5061 >>> >>> >>> >>> ... >>> >>> tcp 0 0 172.17.217.10:5061 195.70.114.125:18252 >>> >>> ESTABLISHED 71895/kamailio >>> >>> ... >>> >>> >>> >>> An pid 71895 is: >>> >>> >>> >>> }, { >>> >>> "IDX": 3, >>> >>> "PID": 71895, >>> >>> "DSC": "udp receiver child=2 sock=127.0.0.1:5060" >>> >>> }, { >>> >>> >>> >>> >>> >>> >>> >>> And if i look into it via "lsof -p 71895" (the udp receiver child) >>> >>> >>> >>> ... >>> >>> kamailio 71895 kamailio 14u sock 0,9 0t0 >>> >>> 8856085 protocol: TCP >>> >>> kamailio 71895 kamailio 15u sock 0,9 0t0 >>> >>> 8886886 protocol: TCP >>> >>> kamailio 71895 kamailio 16u sock 0,9 0t0 >>> >>> 8854886 protocol: TCP >>> >>> kamailio 71895 kamailio 17u sock 0,9 0t0 >>> >>> 8828915 protocol: TCP >>> >>> kamailio 71895 kamailio 18u unix 0x000000005f73cb91 0t0 >>> >>> 1680314 type=DGRAM >>> >>> kamailio 71895 kamailio 19u IPv4 1846523 0t0 >>> >>> TCP kamailio-preview:sip-tls->XXX:18252 (ESTABLISHED) >>> >>> kamailio 71895 kamailio 20u sock 0,9 0t0 >>> >>> 8887192 protocol: TCP >>> >>> kamailio 71895 kamailio 21u sock 0,9 0t0 >>> >>> 8813634 protocol: TCP >>> >>> kamailio 71895 kamailio 22u unix 0x00000000c19bd102 0t0 >>> >>> 1681407 type=STREAM >>> >>> kamailio 71895 kamailio 23u sock 0,9 0t0 >>> >>> 8850488 protocol: TCP >>> >>> ... >>> >>> >>> >>> Not only the ESTABLISHED TCP session. But also this empty sockets >>> >>> "protocol: TCP" >>> >>> What are they doing there in the udp receiver? Is that how it's >>> supposed to be? >>> >>> >>> >>> Kristijan >>> >>> >>> >>> Am Do., 14. März 2019 um 14:48 Uhr schrieb Daniel-Constantin Mierla >>> >>> <mico...@gmail.com>: >>> >>>> Can you get file written by `kamctl trap`? It should have the >>> backtrace >>> >>>> for all kamailio processes. You need latest kamailio 5.2. >>> >>>> >>> >>>> Also, get the output for: kamctl ps >>> >>>> >>> >>>> Cheers, >>> >>>> Daniel >>> >>>> >>> >>>> On 14.03.19 13:52, Kristijan Vrban wrote: >>> >>>>> When i attach via gdb to one of the tcp worker, i see this: >>> >>>>> >>> >>>>> (gdb) bt >>> >>>>> #0 0x00007fdaf4d14470 in futex_wait (private=<optimized out>, >>> >>>>> expected=1, futex_word=0x7fdaeca92f8c) at >>> >>>>> ../sysdeps/unix/sysv/linux/futex-internal.h:61 >>> >>>>> #1 futex_wait_simple (private=<optimized out>, expected=1, >>> >>>>> futex_word=0x7fdaeca92f8c) at ../sysdeps/nptl/futex-internal.h:135 >>> >>>>> #2 __pthread_rwlock_wrlock_slow (rwlock=0x7fdaeca92f80) at >>> >>>>> pthread_rwlock_wrlock.c:67 >>> >>>>> #3 0x00007fdaf0912ee9 in CRYPTO_THREAD_write_lock () from >>> >>>>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 >>> >>>>> #4 0x00007fdaf08e1c08 in ?? () from >>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 >>> >>>>> #5 0x00007fdaf08a6f69 in ?? () from >>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 >>> >>>>> #6 0x00007fdaf08b36c7 in EVP_CIPHER_CTX_ctrl () from >>> >>>>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 >>> >>>>> #7 0x00007fdaf0c31144 in ?? () from >>> /usr/lib/x86_64-linux-gnu/libssl.so.1.1 >>> >>>>> #8 0x00007fdaf0c2bddb in ?? () from >>> /usr/lib/x86_64-linux-gnu/libssl.so.1.1 >>> >>>>> #9 0x00007fdaf0c22858 in ?? () from >>> /usr/lib/x86_64-linux-gnu/libssl.so.1.1 >>> >>>>> #10 0x00007fdaf0c1af61 in SSL_do_handshake () from >>> >>>>> /usr/lib/x86_64-linux-gnu/libssl.so.1.1 >>> >>>>> #11 0x00007fdaf0e8d31b in tls_accept (c=0x7fdaed26fa98, >>> >>>>> error=0x7ffffe2a2df0) at tls_server.c:422 >>> >>>>> #12 0x00007fdaf0e96a1b in tls_read_f (c=0x7fdaed26fa98, >>> >>>>> flags=0x7ffffe2c318c) at tls_server.c:1116 >>> >>>>> #13 0x0000556ead5e7c46 in tcp_read_headers (c=0x7fdaed26fa98, >>> >>>>> read_flags=0x7ffffe2c318c) at core/tcp_read.c:469 >>> >>>>> #14 0x0000556ead5ef9cb in tcp_read_req (con=0x7fdaed26fa98, >>> >>>>> bytes_read=0x7ffffe2c3184, read_flags=0x7ffffe2c318c) at >>> >>>>> core/tcp_read.c:1496 >>> >>>>> #15 0x0000556ead5f575f in handle_io (fm=0x7fdaf597aa98, events=1, >>> >>>>> idx=-1) at core/tcp_read.c:1862 >>> >>>>> #16 0x0000556ead5e2053 in io_wait_loop_epoll (h=0x556eadaaeec0 >>> <io_w>, >>> >>>>> t=2, repeat=0) at core/io_wait.h:1065 >>> >>>>> #17 0x0000556ead5f6b35 in tcp_receive_loop (unix_sock=49) at >>> >>>>> core/tcp_read.c:1974 >>> >>>>> #18 0x0000556ead4c8e24 in tcp_init_children () at >>> core/tcp_main.c:4853 >>> >>>>> #19 0x0000556ead3c352a in main_loop () at main.c:1735 >>> >>>>> #20 0x0000556ead3ca5f8 in main (argc=13, argv=0x7ffffe2c3828) at >>> main.c:2675 >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> Am Do., 14. März 2019 um 13:41 Uhr schrieb Kristijan Vrban >>> >>>>> <vrban.l...@gmail.com>: >>> >>>>>> Hi, with full debug is see this in log for every incoming TCP SIP >>> request: >>> >>>>>> >>> >>>>>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG: >>> >>>>>> <core> [core/tcp_main.c:3871]: send2child(): WARNING: no free tcp >>> >>>>>> receiver, connection passed to the least busy one (105) >>> >>>>>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG: >>> >>>>>> <core> [core/tcp_main.c:3875]: send2child(): selected tcp worker 2 >>> >>>>>> 27(17937) for activity on [tls:172.17.217.10:5061], >>> 0x7fdaeda8f928 >>> >>>>>> >>> >>>>>> So the Kamailio TCP process is working, and received TCP traffic. >>> But >>> >>>>>> the tcp workers are somehow busy. >>> >>>>>> >>> >>>>>> When i attach via strace to the TCP worker, i do not see any >>> activity. Just: >>> >>>>>> >>> >>>>>> futex(0x7fdaeca92f8c, FUTEX_WAIT_PRIVATE, 1, NULL >>> >>>>>> >>> >>>>>> and nothing, even when i see the main tcp process choose this >>> worker process. >>> >>>>>> >>> >>>>>> Kristijan >>> >>>>>> >>> >>>>>> Am Mi., 27. Feb. 2019 um 15:14 Uhr schrieb Kristijan Vrban >>> >>>>>> <vrban.l...@gmail.com>: >>> >>>>>>> first of all thanks for the feedback. i prepared our system now >>> to run >>> >>>>>>> with debug=3 >>> >>>>>>> I hope to see more then then. >>> >>>>>>> >>> >>>>>>> Am Mi., 27. Feb. 2019 um 11:53 Uhr schrieb Kristijan Vrban >>> >>>>>>> <vrban.l...@gmail.com>: >>> >>>>>>>> Hi kamailios, >>> >>>>>>>> >>> >>>>>>>> i have a creepy situation with v5.2.1 stable Kamilio. After a >>> day or >>> >>>>>>>> so, Kamailio stop to process incoming SIP traffic via TCP. The >>> >>>>>>>> incoming TCP network packages get TCP-ACK from the OS (Debian 9, >>> >>>>>>>> 4.18.0-15-generic-Linux) but Kamailio does not show any >>> processing for >>> >>>>>>>> the SIP-Traffic incoming via TCP. No logs, nothing. While >>> traffic via >>> >>>>>>>> UDP is working just totally fine. >>> >>>>>>>> >>> >>>>>>>> When i look via command "netstat -ntp" is see, that the Recv-Q >>> get >>> >>>>>>>> bigger and bigger. e.g.: >>> >>>>>>>> >>> >>>>>>>> Proto Recv-Q Send-Q Local Address Foreign Address State >>> PID/Program >>> >>>>>>>> name tcp 4566 0 172.17.217.12:5060 xxx.xxx.xxx.xxx:57252 >>> ESTABLISHED >>> >>>>>>>> 31347/kamailio >>> >>>>>>>> >>> >>>>>>>> After Kamailio restart, all is working fine again for a day. We >>> have >>> >>>>>>>> maybe 10-20 devices online via TCP and low call volume (1-2 >>> call per >>> >>>>>>>> minute). The only settings for tcp we have is >>> "tcp_delayed_ack=no" >>> >>>>>>>> >>> >>>>>>>> How to could we debug this situation? Again, no error, no >>> warings in >>> >>>>>>>> the log. Just nothing. >>> >>>>>>>> >>> >>>>>>>> Kristijan >>> >>>>> _______________________________________________ >>> >>>>> Kamailio (SER) - Users Mailing List >>> >>>>> sr-users@lists.kamailio.org >>> >>>>> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users >>> >>>> -- >>> >>>> Daniel-Constantin Mierla -- www.asipto.com >>> >>>> www.twitter.com/miconda -- www.linkedin.com/in/miconda >>> >>>> Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com >>> >>>> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, >>> USA -- www.asipto.com >>> >>>> >>> >>> _______________________________________________ >>> >>> Kamailio (SER) - Users Mailing List >>> >>> sr-users@lists.kamailio.org >>> >>> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users >>> >> -- >>> >> Daniel-Constantin Mierla -- www.asipto.com >>> >> www.twitter.com/miconda -- www.linkedin.com/in/miconda >>> >> Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com >>> >> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA >>> -- www.asipto.com >>> >> >>> -- >>> Daniel-Constantin Mierla -- www.asipto.com >>> www.twitter.com/miconda -- www.linkedin.com/in/miconda >>> Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com >>> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA -- >>> www.asipto.com >>> >>> >>> _______________________________________________ >>> Kamailio (SER) - Users Mailing List >>> sr-users@lists.kamailio.org >>> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users >>> >> >> >> -- >> Antisip - http://www.antisip.com >> >> -- >> Daniel-Constantin Mierla -- www.asipto.comwww.twitter.com/miconda -- >> www.linkedin.com/in/miconda >> Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com >> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA -- >> www.asipto.com >> >> > > -- > Antisip - http://www.antisip.com > _______________________________________________ > Kamailio (SER) - Users Mailing List > sr-users@lists.kamailio.org > https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users >
_______________________________________________ Kamailio (SER) - Users Mailing List sr-users@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users