Process 24103 (the blocked proc) last logged at 19:02:01; here is what it logged:
Oct 8 19:02:01 hvprxy osips[24103]: DBG:tm:t_lookup_request: start searching: hash=34811, isACK=0
Oct 8 19:02:01 hvprxy osips[24103]: DBG:tm:matching_3261: RFC3261 transaction matching failed
Oct 8 19:02:01 hvprxy osips[24103]: DBG:tm:t_lookup_request: no transaction found
Oct 8 19:02:01 hvprxy osips[24103]: DBG:statistics:w_update_stat: needed statistic is <cpbx_presence_notify_rcv>
Oct 8 19:02:01 hvprxy osips[24103]: DBG:statistics:parse_groupname: group: '', name: 'cpbx_presence_notify_rcv'
Oct 8 19:02:01 hvprxy osips[24103]: DBG:core:select_contacts: ct: sip:[snip]@[snip]:55087;transport=tls
Oct 8 19:02:01 hvprxy osips[24103]: DBG:core:push_branch: setting msg R-URI <sip:[snip]@[snip]:55087;transport=tls>
Oct 8 19:02:01 hvprxy osips[24103]: [RELAY] Performing mid-registrar lookup on sip:[snip]:5060
Oct 8 19:02:01 hvprxy osips[24103]: DBG:core:lookup: '' Not found in usrloc
Oct 8 19:02:01 hvprxy osips[24103]: DBG:statistics:w_update_stat: needed statistic is <cpbx_contact_lookup_failed>
Oct 8 19:02:01 hvprxy osips[24103]: DBG:statistics:parse_groupname: group: '', name: 'cpbx_contact_lookup_failed'
Oct 8 19:02:01 hvprxy osips[24103]: DBG:core:select_contacts: ct: sip:[snip]@[snip]:55087;transport=tls
Oct 8 19:02:01 hvprxy osips[24103]: DBG:core:push_branch: setting msg R-URI <sip:xxx@xxx:55087;transport=tls>
Oct 8 19:02:01 hvprxy osips[24103]: [RELAY] Performed mid_registrar_lookup: sip:xxx@xxx:55087;transport=tls
Oct 8 19:02:01 hvprxy osips[24103]: DBG:tm:t_newtran: transaction on entrance=(nil)
Oct 8 19:02:01 hvprxy osips[24103]: DBG:core:parse_headers: flags=ffffffffffffffff
Oct 8 19:02:01 hvprxy osips[24103]: DBG:core:parse_headers: flags=78
Oct 8 19:02:01 hvprxy osips[24103]: DBG:tm:t_lookup_request: start searching: hash=34811, isACK=0
Oct 8 19:02:01 hvprxy osips[24103]: DBG:tm:matching_3261: RFC3261 transaction matching failed
Oct 8 19:02:01 hvprxy osips[24103]: DBG:tm:t_lookup_request: no transaction found
Oct 8 19:02:01 hvprxy osips[24103]: DBG:tm:run_reqin_callbacks: trans=0x7fafa71d8150, callback type 1, id 0 entered
Oct 8 19:02:01 hvprxy osips[24103]: DBG:core:mk_proxy: doing DNS lookup...
Oct 8 19:02:01 hvprxy osips[24103]: DBG:core:parse_headers: flags=2000
Oct 8 19:02:01 hvprxy osips[24103]: DBG:core:parse_headers: flags=ffffffffffffffff
Oct 8 19:02:01 hvprxy osips[24103]: DBG:core:tcp_conn_get: con found in state 0
Oct 8 19:02:01 hvprxy osips[24103]: DBG:core:tcp_conn_get: tcp connection found (0x7fafa84997e8), acquiring fd
Oct 8 19:02:01 hvprxy osips[24103]: DBG:core:tcp_conn_get: c= 0x7fafa84997e8, n=16, Usock=647
< no more >

Proc 11468 (tcp main thread) logged:

Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:probe_max_sock_buff: getsockopt: snd is initially 425984
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:probe_max_sock_buff: setting snd buf to 851968 had no effect
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:probe_max_sock_buff: setting snd buf to 428032 had no effect
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:probe_max_sock_buff: using snd buffer of 416 kb
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:init_sock_keepalive: TCP keepalive enabled on socket 636
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:print_ip: tcpconn_new: new tcp connection to: xxx
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:tcpconn_new: on port 11934, proto 3
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:tcpconn_add: hashes: 1004, 320
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:handle_new_connect: new connection: 0x7fafa7cc0c80 636 flags: 001c
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:send2worker: no free tcp receiver, connection passed to the least busy one (proc #1, 3 con)
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:send2worker: to tcp worker 1 (0), 0x7fafa7cc0c80 rw 1
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:probe_max_sock_buff: getsockopt: snd is initially 425984
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:probe_max_sock_buff: setting snd buf to 851968 had no effect
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:probe_max_sock_buff: setting snd buf to 428032 had no effect
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:probe_max_sock_buff: using snd buffer of 416 kb
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:init_sock_keepalive: TCP keepalive enabled on socket 637
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:print_ip: tcpconn_new: new tcp connection to: xxx
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:tcpconn_new: on port 11850, proto 3
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:tcpconn_add: hashes: 824, 321
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:handle_new_connect: new connection: 0x7fafa7e76170 637 flags: 001c
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:send2worker: no free tcp receiver, connection passed to the least busy one (proc #5, 3 con)
Oct 8 19:02:01 hvprxy osips[11468]: DBG:core:send2worker: to tcp worker 5 (0), 0x7fafa7e76170 rw 1

Looking for maybe other interesting things (everything mentioning connection 0x7fafa84997e8 around that time):

Oct 8 19:01:55 hvprxy osips[11397]: DBG:core:tcp_conn_get: tcp connection found (0x7fafa84997e8), acquiring fd
Oct 8 19:01:55 hvprxy osips[11397]: DBG:core:tcp_conn_get: c= 0x7fafa84997e8, n=16, Usock=555
Oct 8 19:01:55 hvprxy osips[11397]: DBG:core:tcp_conn_get: after receive_fd: c= 0x7fafa84997e8 n=8 fd=5
Oct 8 19:01:55 hvprxy osips[11397]: DBG:proto_tls:proto_tls_send: after write: c=0x7fafa84997e8 n=483 fd=5
Oct 8 19:02:01 hvprxy osips[24103]: DBG:core:tcp_conn_get: tcp connection found (0x7fafa84997e8), acquiring fd
Oct 8 19:02:01 hvprxy osips[24103]: DBG:core:tcp_conn_get: c= 0x7fafa84997e8, n=16, Usock=647
Oct 8 19:02:13 hvprxy osips[11398]: DBG:core:tcp_conn_get: tcp connection found (0x7fafa84997e8), acquiring fd
Oct 8 19:02:13 hvprxy osips[11398]: DBG:core:tcp_conn_get: c= 0x7fafa84997e8, n=16, Usock=557
Oct 8 19:02:13 hvprxy osips[11398]: DBG:core:tcp_conn_get: after receive_fd: c= 0x7fafa84997e8 n=8 fd=5
Oct 8 19:02:13 hvprxy osips[11398]: DBG:proto_tls:proto_tls_send: after write: c=0x7fafa84997e8 n=360 fd=5
Oct 8 19:02:13 hvprxy osips[11460]: DBG:proto_tls:tcp_handle_req: Nothing more to read on TCP conn 0x7fafa84997e8, currently in state 0
Oct 8 19:02:43 hvprxy osips[11398]: DBG:core:tcp_conn_get: tcp connection found (0x7fafa84997e8), acquiring fd
Oct 8 19:02:43 hvprxy osips[11398]: DBG:core:tcp_conn_get: c= 0x7fafa84997e8, n=16, Usock=557
Oct 8 19:02:43 hvprxy osips[11398]: DBG:core:tcp_conn_get: after receive_fd: c= 0x7fafa84997e8 n=8 fd=5
Oct 8 19:02:43 hvprxy osips[11398]: DBG:proto_tls:proto_tls_send: after write: c=0x7fafa84997e8 n=360 fd=5
Oct 8 19:02:43 hvprxy osips[11460]: DBG:proto_tls:tcp_handle_req: Nothing more to read on TCP conn 0x7fafa84997e8, currently in state 0
Oct 8 19:02:44 hvprxy osips[11460]: DBG:proto_tls:tcp_handle_req: Nothing more to read on TCP conn 0x7fafa84997e8, currently in state 0
Oct 8 19:02:44 hvprxy osips[24113]: DBG:core:tcp_conn_get: tcp connection found (0x7fafa84997e8), acquiring fd
Oct 8 19:02:44 hvprxy osips[24113]: DBG:core:tcp_conn_get: c= 0x7fafa84997e8, n=16, Usock=649
Oct 8 19:03:13 hvprxy osips[11458]: DBG:core:tcp_conn_get: tcp connection found (0x7fafa84997e8), acquiring fd
Oct 8 19:03:13 hvprxy osips[11458]: DBG:core:tcp_conn_get: c= 0x7fafa84997e8, n=16, Usock=625
Oct 8 19:03:13 hvprxy osips[11458]: DBG:core:tcp_conn_get: after receive_fd: c= 0x7fafa84997e8 n=8 fd=950
Oct 8 19:03:13 hvprxy osips[11458]: DBG:proto_tls:proto_tls_send: after write: c=0x7fafa84997e8 n=360 fd=950
Oct 8 19:03:13 hvprxy osips[11460]: DBG:proto_tls:tcp_handle_req: Nothing more to read on TCP conn 0x7fafa84997e8, currently in state 0

All this said, I can't say I'm sure what I should be looking for. Nothing stands out to me in the logs, but there are a few gigabytes of them, so it's hard to know.
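For anyone following along: the "acquiring fd" / "after receive_fd" lines above are the point where a worker asks the TCP main process for the connection's file descriptor, which is then passed between processes over a Unix socket as SCM_RIGHTS ancillary data. Here is a minimal Python sketch of that handover mechanism (illustrative only, not OpenSIPS code; a pipe stands in for the TCP connection). The key property is that the receiving side blocks in recv_fds() until the other side actually sends, which is consistent with a worker being stuck "waiting for a TCP fd" if the main process never answers:

```python
import os
import socket

# Unix socketpair standing in for the main<->worker IPC channel.
main_end, worker_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# "TCP main" side: a pipe write end stands in for the accepted TCP conn fd.
r, w = os.pipe()

# Pass the open fd to the worker via SCM_RIGHTS (Python 3.9+, Unix only).
socket.send_fds(main_end, [b"fd"], [w])

# "Worker" side: blocks here until main sends. If main never responds
# (the situation in the backtraces), this blocks forever unless a
# timeout is set, e.g. worker_end.settimeout(5).
msg, fds, flags, addr = socket.recv_fds(worker_end, 16, 1)
received_w = fds[0]  # a new fd in this process, same open file as w

os.write(received_w, b"hello")  # write through the passed fd
os.close(received_w)
print(os.read(r, 5))  # b'hello' arrives via the original pipe
```

In the real deadlock scenario the two ends live in different processes, so a blocked recv_fds() in one of them shows up exactly as the stuck receive in the backtraces.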
Andrew

On Sat, 9 Oct 2021 at 00:19, Bogdan-Andrei Iancu <[email protected]> wrote:
>
> Hi Andrew,
>
> The second blocked process (doing the TLS/TCP stuff) surprisingly got
> stuck while waiting for a TCP fd from the TCP Main process.
>
> You mentioned that the logs of the UDP worker (doing the TCP send)
> suddenly stopped - around that time, do you see any errors from that
> process or from the TCP MAIN processes ?
>
> Regards,
>
> Bogdan-Andrei Iancu
>
> OpenSIPS Founder and Developer
> https://www.opensips-solutions.com
> OpenSIPS eBootcamp 2021
> https://opensips.org/training/OpenSIPS_eBootcamp_2021/
>
> On 10/8/21 2:43 PM, Andrew Yager wrote:
> > Hi Bogdan-Andrei,
> >
> > Have restarted since the last bt, but have recreated again and
> > attached. Earlier today we did also get another bt full on the second
> > blocked pid, but I didn't save it. In that case it was a UDP reply
> > from one of our upstream servers that had gone through mid_registrar
> > and was being relayed to a TCP endpoint. The TCP endpoint did have an
> > open file descriptor we could see, and it had sent and was blocked on
> > receive at the same point (I'm getting better at reading backtraces! :D).
> >
> > The thing I do note is that in every example I have, a UDP message
> > received from an upstream server is being relayed to a client on a
> > TCP/TLS connection via a UDP worker.
> >
> > While we are using WolfSSL on this box, the other box where we have
> > the same behaviour (but where I haven't taken backtraces yet) is
> > running OpenSSL on 3.1.3; so it's not SSL-library specific.
> >
> > I'm going to see if I can get a backtrace from the 3.1.3 box shortly.
> >
> > Andrew
> >
> > On Fri, 8 Oct 2021 at 17:13, Bogdan-Andrei Iancu <[email protected]> wrote:
> >> Hi Andrew,
> >>
> >> OK, interesting progress here. So, the FIFO process blocks as it is
> >> trying to send an IPC JOB to a UDP process which looks like it is
> >> also blocked.
> >>
> >> Could you attach with GDB to that blocked UDP process too? (you have
> >> its PID in the printing of the pt[x] in the first gdb)
> >>
> >> Regards,
> >>
> >> Bogdan-Andrei Iancu
> >>
> >> OpenSIPS Founder and Developer
> >> https://www.opensips-solutions.com
> >> OpenSIPS eBootcamp 2021
> >> https://opensips.org/training/OpenSIPS_eBootcamp_2021/
> >>
> >> On 10/8/21 1:43 AM, Andrew Yager wrote:
> >>> Interestingly, where I usually see a range of continued messages
> >>> from a process in the debug log, they appear to stop for this PID
> >>> at 3:47am, and that process seems blocked on a tcp/tls send:
> >>>

_______________________________________________
Users mailing list
[email protected]
http://lists.opensips.org/cgi-bin/mailman/listinfo/users
