Process 24103 (the blocked proc) last logged at 19:02:01; here is what it logged:
Oct 8 19:02:01 hvprxy osips[24103]: DBG:tm:t_lookup_request: start searching: hash=34811, isACK=0
Oct 8 19:02:01 hvprxy osips[24103]: DBG:tm:matching_3261: RFC3261 transaction matching failed
Oct 8 19:02:01 hvprxy osips[24103]: DBG:tm:t_lookup_request: no transaction found
Oct 8 19:02:01 hvprxy osips[24103]: DBG:statistics:w_update_stat: needed statistic is <cpbx_presence_notify_rcv>
Oct 8 19:02:01 hvprxy osips[24103]: DBG:statistics:parse_groupname: group: '', name: 'cpbx_presence_notify_rcv'
Oct 8 19:02:01 hvprxy osips[24103]: DBG:core:select_contacts: ct: sip:[snip]@[snip]:55087;transport=tls
Oct 8 19:02:01 hvprxy osips[24103]: DBG:core:push_branch: setting msg R-URI <sip:[snip]@[snip]:55087;transport=tls>
Oct 8 19:02:01 hvprxy osips[24103]: [RELAY] Performing mid-registrar lookup on sip:[snip]:5060
Oct 8 19:02:01 hvprxy osips[24103]: DBG:core:lookup: '' Not found in usrloc
Oct 8 19:02:01 hvprxy osips[24103]: DBG:statistics:w_update_stat: needed statistic is <cpbx_contact_lookup_failed>
Oct 8 19:02:01 hvprxy osips[24103]: DBG:statistics:parse_groupname: group: '', name: 'cpbx_contact_lookup_failed'
Oct 8 19:02:01 hvprxy osips[24103]: DBG:core:select_contacts: ct: sip:[snip]@[snip]:55087;transport=tls
Oct 8 19:02:01 hvprxy osips[24103]: DBG:core:push_branch: setting msg R-URI <sip:xxx@xxx:55087;transport=tls>
Oct 8 19:02:01 hvprxy osips[24103]: [RELAY] Performed mid_registrar_lookup: sip:xxx@xxx:55087;transport=tls
Oct 8 19:02:01 hvprxy osips[24103]: DBG:tm:t_newtran: transaction on entrance=(nil)
Oct 8 19:02:01 hvprxy osips[24103]: DBG:core:parse_headers: flags=ffffffffffffffff
Oct 8 19:02:01 hvprxy osips[24103]: DBG:core:parse_headers: flags=78
Oct 8 19:02:01 hvprxy osips[24103]: DBG:tm:t_lookup_request: start searching: hash=34811, isACK=0
Oct 8 19:02:01 hvprxy osips[24103]: DBG:tm:matching_3261: RFC3261 transaction matching failed
Oct 8 19:02:01 hvprxy osips[24103]: DBG:tm:t_lookup_request: no transaction found
Oct 8 19:02:01 hvprxy osips[24103]: DBG:tm:run_reqin_callbacks: trans=0x7fafa71d8150, callback type 1, id 0 entered
Oct 8 19:02:01 hvprxy osips[24103]: DBG:core:mk_proxy: doing DNS lookup...
Oct 8 19:02:01 hvprxy osips[24103]: DBG:core:parse_headers: flags=2000
Oct 8 19:02:01 hvprxy osips[24103]: DBG:core:parse_headers: flags=ffffffffffffffff
Oct 8 19:02:01 hvprxy osips[24103]: DBG:core:tcp_conn_get: con found in state 0
Oct 8 19:02:01 hvprxy osips[24103]: DBG:core:tcp_conn_get: tcp connection found (0x7fafa84997e8), acquiring fd
Oct 8 19:02:01 hvprxy osips[24103]: DBG:core:tcp_conn_get: c= 0x7fafa84997e8, n=16, Usock=647
< no more >

Proc 11468 (tcp main thread) logged:

Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:probe_max_sock_buff: getsockopt: snd is initially 425984
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:probe_max_sock_buff: setting snd buf to 851968 had no effect
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:probe_max_sock_buff: setting snd buf to 428032 had no effect
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:probe_max_sock_buff: using snd buffer of 416 kb
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:init_sock_keepalive: TCP keepalive enabled on socket 636
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:print_ip: tcpconn_new: new tcp connection to: xxx
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:tcpconn_new: on port 11934, proto 3
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:tcpconn_add: hashes: 1004, 320
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:handle_new_connect: new connection: 0x7fafa7cc0c80 636 flags: 001c
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:send2worker: no free tcp receiver, connection passed to the least busy one (proc #1, 3 con)
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:send2worker: to tcp worker 1 (0), 0x7fafa7cc0c80 rw 1
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:probe_max_sock_buff: getsockopt: snd is initially 425984
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:probe_max_sock_buff: setting snd buf to 851968 had no effect
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:probe_max_sock_buff: setting snd buf to 428032 had no effect
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:probe_max_sock_buff: using snd buffer of 416 kb
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:init_sock_keepalive: TCP keepalive enabled on socket 637
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:print_ip: tcpconn_new: new tcp connection to: xxx
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:tcpconn_new: on port 11850, proto 3
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:tcpconn_add: hashes: 824, 321
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:handle_new_connect: new connection: 0x7fafa7e76170 637 flags: 001c
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:send2worker: no free tcp receiver, connection passed to the least busy one (proc #5, 3 con)
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:send2worker: to tcp worker 5 (0), 0x7fafa7e76170 rw 1

Looking for maybe other interesting things (everything mentioning connection 0x7fafa84997e8 around that time):

Oct 8 19:01:55 hvprxy osips[11397]: DBG:core:tcp_conn_get: tcp connection found (0x7fafa84997e8), acquiring fd
Oct 8 19:01:55 hvprxy osips[11397]: DBG:core:tcp_conn_get: c= 0x7fafa84997e8, n=16, Usock=555
Oct 8 19:01:55 hvprxy osips[11397]: DBG:core:tcp_conn_get: after receive_fd: c= 0x7fafa84997e8 n=8 fd=5
Oct 8 19:01:55 hvprxy osips[11397]: DBG:proto_tls:proto_tls_send: after write: c=0x7fafa84997e8 n=483 fd=5
Oct 8 19:02:01 hvprxy osips[24103]: DBG:core:tcp_conn_get: tcp connection found (0x7fafa84997e8), acquiring fd
Oct 8 19:02:01 hvprxy osips[24103]: DBG:core:tcp_conn_get: c= 0x7fafa84997e8, n=16, Usock=647
Oct 8 19:02:13 hvprxy osips[11398]: DBG:core:tcp_conn_get: tcp connection found (0x7fafa84997e8), acquiring fd
Oct 8 19:02:13 hvprxy osips[11398]: DBG:core:tcp_conn_get: c= 0x7fafa84997e8, n=16, Usock=557
Oct 8 19:02:13 hvprxy osips[11398]: DBG:core:tcp_conn_get: after receive_fd: c= 0x7fafa84997e8 n=8 fd=5
Oct 8 19:02:13 hvprxy osips[11398]: DBG:proto_tls:proto_tls_send: after write: c=0x7fafa84997e8 n=360 fd=5
Oct 8 19:02:13 hvprxy osips[11460]: DBG:proto_tls:tcp_handle_req: Nothing more to read on TCP conn 0x7fafa84997e8, currently in state 0
Oct 8 19:02:43 hvprxy osips[11398]: DBG:core:tcp_conn_get: tcp connection found (0x7fafa84997e8), acquiring fd
Oct 8 19:02:43 hvprxy osips[11398]: DBG:core:tcp_conn_get: c= 0x7fafa84997e8, n=16, Usock=557
Oct 8 19:02:43 hvprxy osips[11398]: DBG:core:tcp_conn_get: after receive_fd: c= 0x7fafa84997e8 n=8 fd=5
Oct 8 19:02:43 hvprxy osips[11398]: DBG:proto_tls:proto_tls_send: after write: c=0x7fafa84997e8 n=360 fd=5
Oct 8 19:02:43 hvprxy osips[11460]: DBG:proto_tls:tcp_handle_req: Nothing more to read on TCP conn 0x7fafa84997e8, currently in state 0
Oct 8 19:02:44 hvprxy osips[11460]: DBG:proto_tls:tcp_handle_req: Nothing more to read on TCP conn 0x7fafa84997e8, currently in state 0
Oct 8 19:02:44 hvprxy osips[24113]: DBG:core:tcp_conn_get: tcp connection found (0x7fafa84997e8), acquiring fd
Oct 8 19:02:44 hvprxy osips[24113]: DBG:core:tcp_conn_get: c= 0x7fafa84997e8, n=16, Usock=649
Oct 8 19:03:13 hvprxy osips[11458]: DBG:core:tcp_conn_get: tcp connection found (0x7fafa84997e8), acquiring fd
Oct 8 19:03:13 hvprxy osips[11458]: DBG:core:tcp_conn_get: c= 0x7fafa84997e8, n=16, Usock=625
Oct 8 19:03:13 hvprxy osips[11458]: DBG:core:tcp_conn_get: after receive_fd: c= 0x7fafa84997e8 n=8 fd=950
Oct 8 19:03:13 hvprxy osips[11458]: DBG:proto_tls:proto_tls_send: after write: c=0x7fafa84997e8 n=360 fd=950
Oct 8 19:03:13 hvprxy osips[11460]: DBG:proto_tls:tcp_handle_req: Nothing more to read on TCP conn 0x7fafa84997e8, currently in state 0

All this said, I can't say I'm sure what I should be looking for. Nothing stands out to me in the logs, but there are a few gigabytes of them, so it's hard to know.
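For anyone following along: the "acquiring fd" / "after receive_fd" lines above are the point where a worker asks the TCP main process for the connection's file descriptor, which is then passed between processes over a Unix socket as SCM_RIGHTS ancillary data. Here is a minimal Python sketch of that handover mechanism (illustrative only, not OpenSIPS code; a pipe stands in for the TCP connection). The key property is that the receiving side blocks in recv_fds() until the other side actually sends, which is consistent with a worker being stuck "waiting for a TCP fd" if the main process never answers:

```python
import os
import socket

# Unix socketpair standing in for the main<->worker IPC channel.
main_end, worker_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# "TCP main" side: a pipe write end stands in for the accepted TCP conn fd.
r, w = os.pipe()

# Pass the open fd to the worker via SCM_RIGHTS (Python 3.9+, Unix only).
socket.send_fds(main_end, [b"fd"], [w])

# "Worker" side: blocks here until main sends. If main never responds
# (the situation in the backtraces), this blocks forever unless a
# timeout is set, e.g. worker_end.settimeout(5).
msg, fds, flags, addr = socket.recv_fds(worker_end, 16, 1)
received_w = fds[0]  # a new fd in this process, same open file as w

os.write(received_w, b"hello")  # write through the passed fd
os.close(received_w)
print(os.read(r, 5))  # b'hello' arrives via the original pipe
```

In the real deadlock scenario the two ends live in different processes, so a blocked recv_fds() in one of them shows up exactly as the stuck receive in the backtraces.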
Andrew

On Sat, 9 Oct 2021 at 00:19, Bogdan-Andrei Iancu <[email protected]> wrote:
>
> Hi Andrew,
>
> The second blocked process (doing the TLS/TCP stuff) surprisingly got
> stuck while waiting for a TCP fd from the TCP Main process.
>
> You mentioned that the logs of the UDP worker (doing the TCP send)
> suddenly stopped - around that time, do you see any errors from that
> process or from the TCP MAIN processes ?
>
> Regards,
>
> Bogdan-Andrei Iancu
>
> OpenSIPS Founder and Developer
> https://www.opensips-solutions.com
> OpenSIPS eBootcamp 2021
> https://opensips.org/training/OpenSIPS_eBootcamp_2021/
>
> On 10/8/21 2:43 PM, Andrew Yager wrote:
> > Hi Bogdan-Andrei,
> >
> > Have restarted since the last bt, but have recreated again and
> > attached. Earlier today we did also get another bt full on the second
> > blocked pid, but I didn't save it. In that case it was a UDP reply
> > from one of our upstream servers that had gone through mid_registrar
> > and was being relayed to a TCP endpoint. The TCP endpoint did have an
> > open file descriptor we could see, and it had sent and was blocked on
> > receive at the same point (I'm getting better at reading backtraces! :D).
> >
> > The thing I do note is that in every example I have, a UDP message
> > received from an upstream server is being relayed to a client on a
> > TCP/TLS connection via a UDP worker.
> >
> > While we are using WolfSSL on this box, the other box where we have
> > the same behaviour (but where I haven't taken backtraces yet) is
> > running OpenSSL on 3.1.3; so it's not SSL-library specific.
> >
> > I'm going to see if I can get a backtrace from the 3.1.3 box shortly.
> >
> > Andrew
> >
> > On Fri, 8 Oct 2021 at 17:13, Bogdan-Andrei Iancu <[email protected]> wrote:
> >> Hi Andrew,
> >>
> >> OK, interesting progress here. So, the FIFO process blocks as it is
> >> trying to send an IPC JOB to a UDP process which looks like it is
> >> also blocked.
> >>
> >> Could you attach with GDB to that blocked UDP process too? (you have
> >> its PID in the printing of the pt[x] in the first gdb)
> >>
> >> Regards,
> >>
> >> Bogdan-Andrei Iancu
> >>
> >> OpenSIPS Founder and Developer
> >> https://www.opensips-solutions.com
> >> OpenSIPS eBootcamp 2021
> >> https://opensips.org/training/OpenSIPS_eBootcamp_2021/
> >>
> >> On 10/8/21 1:43 AM, Andrew Yager wrote:
> >>> Interestingly, where I usually see a range of continued messages
> >>> from a process in the debug log, they appear to stop for this PID
> >>> at 3:47am, and that process seems blocked on a tcp/tls send:
> >>>

_______________________________________________
Users mailing list
[email protected]
http://lists.opensips.org/cgi-bin/mailman/listinfo/users
