Hi OVS maintainers and others,

In our deployments, which run OVS 3.2.2, we saw a crash in ovs-vswitchd when ovn-controller was upgraded from 23.09 to 24.09. The crash happens exactly when ovn-controller restarts and connects to ovs-vswitchd. We have configured ovn-ofctrl-wait-before-clear="300000", so ovn-controller doesn't delete the existing OpenFlow rules as soon as it connects to ovs-vswitchd.

In 24.09, ovn-controller configures the flow_table prefixes for each table [1], which was not the case in 23.09. We couldn't reproduce the issue; it was seen only once, on one compute node. The other compute nodes in the deployment didn't hit this crash during the upgrade. I wonder whether we are hitting the same issue that this commit tried to fix [2].

From the coredump analysis, the crash seems to occur in the `check_tries` function at `../lib/classifier.c:1585`. It is trying to read `ctx->trie->field`, and `ctx->trie` is NULL (see `trie_ctx[2]` in the gdb output further down).

Any pointers on what could be going on here?

[1] - 8d64f2b7dc ("controller: Fix IPv6 dp flow explosion by setting flow table prefixes.")
      https://github.com/ovn-org/ovn/commit/8d64f2b7dcef24d175151ba5e0732281cdeb6d54
[2] - a6117059904 ("classifier: Prevent tries vs n_tries race leading to NULL dereference.")
      https://github.com/openvswitch/ovs/commit/a6117059904bb692039c926221964dd6d49b3bfd

Below is the core dump backtrace:

------------------------------------------------------------------------------------
warning: Section `.reg-xstate/13332' in core file too small.
#0 0x0000562dcf0432fc in check_tries (trie_ctx=trie_ctx@entry=0x7f70cc845aa0, n_tries=n_tries@entry=3, field_plen=field_plen@entry=0x562ddbe82de8, range_map=..., flow=flow@entry=0x7f70cc84ab20, wc=0x7f70cc84aee0) at ../lib/classifier.c:1585
1585 = ovsrcu_get(struct mf_field *, &ctx->trie->field);
[Current thread is 1 (Thread 0x7f70cc851640 (LWP 13332))]
Missing separate debuginfos, use: dnf debuginfo-install bzip2-libs-1.0.8-8.el9.x86_64 libacl-2.3.1-3.el9.x86_64 libattr-2.5.1-3.el9.x86_64 libevent-2.1.12-6.el9.x86_64 libnghttp2-1.43.0-5.el9.x86_64 libxml2-2.9.13-2.el9.x86_64 libzstd-1.5.1-2.el9.x86_64 lz4-libs-1.9.3-5.el9.x86_64 python3-libs-3.9.14-1.el9_1.1.x86_64 sssd-client-2.7.3-4.el9_1.1.x86_64 xz-libs-5.2.5-8.el9_0.x86_64
(gdb) bt
#0 0x0000562dcf0432fc in check_tries (trie_ctx=trie_ctx@entry=0x7f70cc845aa0, n_tries=n_tries@entry=3, field_plen=field_plen@entry=0x562ddbe82de8, range_map=..., flow=flow@entry=0x7f70cc84ab20, wc=0x7f70cc84aee0) at ../lib/classifier.c:1585
#1 0x0000562dcf04b196 in find_match_wc (wc=0x7f70cc84aee0, n_tries=3, trie_ctx=0x7f70cc845aa0, flow=0x7f70cc84ab20, version=327971, subtable=0x562ddbe82d80) at ../lib/classifier.c:1704
#2 classifier_lookup__ (cls=<optimized out>, version=<optimized out>, flow=<optimized out>, wc=<optimized out>, allow_conjunctive_matches=<optimized out>) at ../lib/classifier.c:975
#3 0x0000562dcf008f17 in classifier_lookup (wc=0x7f70cc84aee0, flow=0x7f70cc84ab20, version=327971, cls=<optimized out>) at ../lib/classifier.c:1169
#4 rule_dpif_lookup_in_table (ofproto=0x562dd837cc30, ofproto=0x562dd837cc30, wc=0x7f70cc84aee0, flow=0x7f70cc84ab20, table_id=10 '\n', version=327971) at ../ofproto/ofproto-dpif.c:4393
#5 rule_dpif_lookup_from_table (ofproto=0x562dd837cc30, version=327971, flow=0x7f70cc84ab20, wc=<optimized out>, stats=<optimized out>, table_id=0x7f70cc84a5d0 "\n", in_port=65535, may_packet_in=false, honor_table_miss=false, xcache=0x7f70b4010590) at ../ofproto/ofproto-dpif.c:4500
#6 0x0000562dcf0327f1 in xlate_table_action (ctx=ctx@entry=0x7f70cc84a290, in_port=in_port@entry=65535, table_id=<optimized out>, may_packet_in=<optimized out>, honor_table_miss=<optimized out>, with_ct_orig=<optimized out>, is_last_action=false, xlator=0x562dcf034390 <do_xlate_actions>) at ../ofproto/ofproto-dpif-xlate.c:4650
#7 0x0000562dcf03635d in xlate_table_action (xlator=<optimized out>, is_last_action=<optimized out>, with_ct_orig=<optimized out>, honor_table_miss=<optimized out>, may_packet_in=<optimized out>, table_id=<optimized out>, in_port=<optimized out>, ctx=<optimized out>) at ../ofproto/ofproto-dpif-xlate.c:7397
#8 do_xlate_actions (ofpacts=<optimized out>, ofpacts_len=<optimized out>, ctx=<optimized out>, is_last_action=<optimized out>, group_bucket_action=<optimized out>) at ../ofproto/ofproto-dpif-xlate.c:7397
#9 0x0000562dcf03274f in xlate_recursively (actions_xlator=0x562dcf034390 <do_xlate_actions>, is_last_action=false, deepens=<optimized out>, rule=0x562dec06f8e0, ctx=0x7f70cc84a290) at ../ofproto/ofproto-dpif-xlate.c:4548
#10 xlate_table_action (ctx=ctx@entry=0x7f70cc84a290, in_port=in_port@entry=65535, table_id=<optimized out>, may_packet_in=<optimized out>, honor_table_miss=<optimized out>, with_ct_orig=<optimized out>, is_last_action=false, xlator=0x562dcf034390 <do_xlate_actions>) at ../ofproto/ofproto-dpif-xlate.c:4677
#11 0x0000562dcf03635d in xlate_table_action (xlator=<optimized out>, is_last_action=<optimized out>, with_ct_orig=<optimized out>, honor_table_miss=<optimized out>, may_packet_in=<optimized out>, table_id=<optimized out>, in_port=<optimized out>, ctx=<optimized out>) at ../ofproto/ofproto-dpif-xlate.c:7397
...
...
#128 0x0000562dcf03274f in xlate_recursively (actions_xlator=0x562dcf034390 <do_xlate_actions>, is_last_action=true, deepens=<optimized out>, rule=0x562deed5d5b0, ctx=0x7f70cc84a290) at ../ofproto/ofproto-dpif-xlate.c:4548
#129 xlate_table_action (ctx=ctx@entry=0x7f70cc84a290, in_port=in_port@entry=2004, table_id=<optimized out>, may_packet_in=<optimized out>, honor_table_miss=<optimized out>, with_ct_orig=<optimized out>, is_last_action=true, xlator=0x562dcf034390 <do_xlate_actions>) at ../ofproto/ofproto-dpif-xlate.c:4677
#130 0x0000562dcf03635d in xlate_table_action (xlator=<optimized out>, is_last_action=<optimized out>, with_ct_orig=<optimized out>, honor_table_miss=<optimized out>, may_packet_in=<optimized out>, table_id=<optimized out>, in_port=<optimized out>, ctx=<optimized out>) at ../ofproto/ofproto-dpif-xlate.c:7397
#131 do_xlate_actions (ofpacts=<optimized out>, ofpacts_len=<optimized out>, ctx=<optimized out>, is_last_action=<optimized out>, group_bucket_action=<optimized out>) at ../ofproto/ofproto-dpif-xlate.c:7397
#132 0x0000562dcf03274f in xlate_recursively (actions_xlator=0x562dcf034390 <do_xlate_actions>, is_last_action=true, deepens=<optimized out>, rule=0x562deed520d0, ctx=0x7f70cc84a290) at ../ofproto/ofproto-dpif-xlate.c:4548
#133 xlate_table_action (ctx=ctx@entry=0x7f70cc84a290, in_port=in_port@entry=2004, table_id=<optimized out>, may_packet_in=<optimized out>, honor_table_miss=<optimized out>, with_ct_orig=<optimized out>, is_last_action=true, xlator=0x562dcf034390 <do_xlate_actions>) at ../ofproto/ofproto-dpif-xlate.c:4677
#134 0x0000562dcf03635d in xlate_table_action (xlator=<optimized out>, is_last_action=<optimized out>, with_ct_orig=<optimized out>, honor_table_miss=<optimized out>, may_packet_in=<optimized out>, table_id=<optimized out>, in_port=<optimized out>, ctx=<optimized out>) at ../ofproto/ofproto-dpif-xlate.c:7397
#135 do_xlate_actions (ofpacts=<optimized out>, ofpacts_len=<optimized out>, ctx=<optimized out>, is_last_action=<optimized out>, group_bucket_action=<optimized out>) at ../ofproto/ofproto-dpif-xlate.c:7397
#136 0x0000562dcf03acc0 in xlate_actions (xin=0x7f70cc84ab10, xout=0x7f70cc84b440) at ../ofproto/ofproto-dpif-xlate.c:8256
#137 0x0000562dcf025cc4 in xlate_key (udpif=udpif@entry=0x562dd83058b0, key=<optimized out>, len=<optimized out>, push=push@entry=0x7f70cc84aec0, ctx=ctx@entry=0x7f70cc84b420) at ../ofproto/ofproto-dpif-upcall.c:2218
#138 0x0000562dcf02665a in xlate_ukey (ukey=0x7f7094098ae0, ukey=0x7f7094098ae0, ctx=0x7f70cc84b420, tcp_flags=<optimized out>, udpif=0x562dd83058b0) at ../ofproto/ofproto-dpif-upcall.c:2233
#139 revalidate_ukey__ (udpif=udpif@entry=0x562dd83058b0, ukey=ukey@entry=0x7f7094098ae0, tcp_flags=<optimized out>, odp_actions=odp_actions@entry=0x7f70cc84b890, recircs=recircs@entry=0x7f70cc84b970, xcache=<optimized out>) at ../ofproto/ofproto-dpif-upcall.c:2282
#140 0x0000562dcf026a96 in revalidate_ukey (udpif=<optimized out>, ukey=0x7f7094098ae0, stats=0x7f70cc84b870, odp_actions=0x7f70cc84b890, reval_seq=5277717917, recircs=<optimized out>) at ../ofproto/ofproto-dpif-upcall.c:2419
#141 0x0000562dcf24fe7b in revalidate.isra.0 (revalidator=<optimized out>, revalidator=<optimized out>) at ../ofproto/ofproto-dpif-upcall.c:2882
#142 0x0000562dcf027505 in udpif_revalidator (arg=0x562dd8399950) at ../ofproto/ofproto-dpif-upcall.c:1015
#143 0x0000562dcf11ff13 in ovsthread_wrapper (aux_=<optimized out>) at ../lib/ovs-thread.c:423
#144 0x00007f70d0717802 in start_thread () from /lib64/libc.so.6
#145 0x00007f70d06b7450 in clone3 () from /lib64/libc.so.6
(gdb) print n_tries
$1 = 3
(gdb) print field_plen[0]
$2 = 0
(gdb) print field_plen[1]
$3 = 0
(gdb) print field_plen[2]
$4 = 128
(gdb) print trie_ctx
$5 = (struct trie_ctx *) 0x7f70cc845aa0
(gdb) print trie_ctx[0]
$6 = {trie = 0x562dd837e8b0, lookup_done = false, maskbits = 32624, match_plens = {ipv6 = {__in6_u = {__u6_addr8 = "\r\000\000\000\000\000\000\000\000\016ҲR\252\021f", __u6_addr16 = {13, 0, 0, 0, 3584, 45778, 43602, 26129}, __u6_addr32 = {13, 0, 3000110592, 1712433746}}}, be32 = 13}}
(gdb) print trie_ctx[1]
$7 = {trie = 0x562dd837e8c0, lookup_done = false, maskbits = 32624, match_plens = {ipv6 = {__in6_u = {__u6_addr8 = "\000\000\000\000\000\000\000\000\f\000\000\000\000\000\000", __u6_addr16 = {0, 0, 0, 0, 12, 0, 0, 0}, __u6_addr32 = {0, 0, 12, 0}}}, be32 = 0}}
(gdb) print trie_ctx[2]
$8 = {trie = 0x0, lookup_done = false, maskbits = 2155551583, match_plens = {ipv6 = {__in6_u = {__u6_addr8 = "\000\000\000\000\000\b\000\000\000\000\000\000\000\000\000", __u6_addr16 = {0, 0, 2048, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 2048, 0, 0}}}, be32 = 0}}
------------------------------------------------------------------------------------

Thanks
Numan

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev