Hi all,
I’m running OVN 22.09 and occasionally see ovn-controller crash with a
segmentation fault. The backtrace is below:
(gdb) bt
#0 0x00007f0742707de1 in __strlen_sse2 () from /lib64/libc.so.6
#1 0x00007f0742788c5d in inet_pton () from /lib64/libc.so.6
#2 0x0000564f45a1c784 in ip_parse (s=<optimized out>,
ip=ip@entry=0x7f074040f90c) at lib/packets.c:698
#3 0x0000564f4594cbfb in svc_monitor_send_tcp_health_check__
(swconn=swconn@entry=0x7f0738000940,
svc_mon=svc_mon@entry=0x564f4c2960c0, ctl_flags=ctl_flags@entry=2,
tcp_seq=3858078915, tcp_ack=tcp_ack@entry=0,
tcp_src=<optimized out>) at controller/pinctrl.c:7513
#4 0x0000564f4594d47c in svc_monitor_send_tcp_health_check__
(tcp_src=<optimized out>, tcp_ack=0, tcp_seq=<optimized out>,
ctl_flags=2, svc_mon=0x564f4c2960c0, swconn=0x7f0738000940) at
controller/pinctrl.c:7502
#5 svc_monitor_send_health_check (swconn=swconn@entry=0x7f0738000940,
svc_mon=svc_mon@entry=0x564f4c2960c0)
at controller/pinctrl.c:7621
#6 0x0000564f4595869b in svc_monitors_run
(svc_monitors_next_run_time=0x564f45dd3970 <svc_monitors_next_run_time.37793>,
swconn=0x7f0738000940) at controller/pinctrl.c:7693
#7 pinctrl_handler (arg_=0x564f45e11240 <pinctrl>) at controller/pinctrl.c:3499
#8 0x0000564f45a0ad6f in ovsthread_wrapper (aux_=<optimized out>) at
lib/ovs-thread.c:422
#9 0x00007f074325bea5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f07427798dd in clone () from /lib64/libc.so.6
After moving to frame #3, I can read actual data from the svc_mon structure
(port/protocol/dp_key/port_key). I looked these up in the SB DB and found the
port_binding; it belongs to a logical port that resides on this chassis and
has a load balancer (LB) with a health check (HC) configured. So far everything
looks fine. But the svc_mon->sb_svc_mon structure seems to contain garbage
(though I may be wrong): ip is "Address 0x564f00000000 out of bounds",
logical_port == 0, and so on:
(gdb) print svc_mon->sb_svc_mon
$1 = (const struct sbrec_service_monitor *) 0x564f54db2b40
(gdb) print *svc_mon->sb_svc_mon
$2 = {header_ = {hmap_node = {hash = 94898726054728, next = 0x0}, uuid = {parts
= {0, 0, 0, 0}}, src_arcs = {prev = 0x564f54aae0d0, next = 0x0}, dst_arcs =
{prev = 0x564f7f8bd470, next = 0x564f7f8bd540}, table = 0x64, old_datum = 0xf,
parsed = 152, reparse_node = {prev = 0x0, next = 0x0}, new_datum = 0x0,
prereqs = 0x52eb8916, written = 0x171, txn_node = {hash = 1, next =
0x564f54db2db0}, map_op_written = 0x0, map_op_lists = 0x0, set_op_written = 0x0,
set_op_lists = 0x0, change_seqno = {0, 0, 0}, track_node = {prev =
0x564f00000000, next = 0x0}, updated = 0x0, tracked_old_datum = 0x0},
external_ids = {map = {buckets = 0x1, one = 0x564f54db2d90, mask = 0, n = 0}},
ip = 0x564f00000000 <Address 0x564f00000000 out of bounds>, logical_port =
0x0, options = {map = {buckets = 0x0, one = 0x0, mask = 1, n =
94898780242768}}, port = 0, protocol = 0x0, src_ip = 0x1 <Address 0x1 out of
bounds>,
src_mac = 0x564f54db2d70 "`Ջ\177OV", status = 0x0}
…
(gdb) print svc_mon->state
$8 = SVC_MON_S_ONLINE
(gdb) print svc_mon->status
$9 = SVC_MON_ST_ONLINE
(gdb) print svc_mon->protocol
$10 = SVC_MON_PROTO_TCP
This crash occurred right after the OVSDB SB connection was lost due to an
inactivity probe failure. ovn-controller was therefore reconnecting to the SB,
and my guess is that this could somehow re-initialize the SB IDL objects.
I’m not sure I can reproduce this behaviour on the latest main branch, so my
questions are: can this theoretically be connected to re-initialization of the
IDL? If so, what should be done to avoid it?
Should ovn-controller process changes while its IDL is in an inconsistent state?
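
To make the suspicion concrete, here is a toy model of the pattern I have in
mind. It is not OVN code: ToyIdl, SvcMon and send_health_check are hypothetical
names, and the sketch only assumes that cached rows can disappear when the
session is re-established. The consumer stores the row's UUID and re-resolves
it on every use, rather than keeping a direct reference to the row:

```python
import uuid


class ToyIdl:
    """Toy stand-in for an IDL row cache (hypothetical, not the OVS IDL API)."""

    def __init__(self):
        self.rows = {}  # UUID -> row contents

    def insert(self, ip, port):
        row_id = uuid.uuid4()
        self.rows[row_id] = {"ip": ip, "port": port}
        return row_id

    def clear(self):
        # Models the assumption that cached rows are dropped on reconnect.
        self.rows.clear()

    def get(self, row_id):
        # Returns None when the row is no longer in the cache.
        return self.rows.get(row_id)


class SvcMon:
    """Local monitor state that keeps the row's UUID, not the row itself."""

    def __init__(self, row_id):
        self.row_id = row_id


def send_health_check(idl, mon):
    row = idl.get(mon.row_id)  # re-resolve on every use
    if row is None:
        return None  # row is gone: skip this monitor instead of using stale data
    return f"probe {row['ip']}:{row['port']}"
```

With this shape, a monitor whose row vanished after the cache was cleared is
simply skipped until it is re-synced. Whether something similar is appropriate
for pinctrl is exactly what I’m unsure about.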
Any help is appreciated.
Regards,
Vladislav Odintsov