Hi OVS maintainers and others,

In our deployments running OVS 3.2.2, we saw a crash in ovs-vswitchd
when ovn-controller was upgraded from 23.09 to 24.09.  The crash
happened exactly when ovn-controller restarted and connected to
ovs-vswitchd.  We have configured
ovn-ofctrl-wait-before-clear="300000", so ovn-controller doesn't
delete the existing OpenFlow rules as soon as it connects to
ovs-vswitchd.

In 24.09, ovn-controller configures the flow_table prefixes for each
table [1], which was not the case in 23.09.

We couldn't reproduce the issue; it was seen only once, on a single
compute node.  Other compute nodes in the deployment didn't hit this
crash during the upgrade.

I wonder whether we are hitting the same issue that commit [2] tried
to fix.

From the coredump analysis, the crash occurs in the `check_tries`
function at `../lib/classifier.c:1585`.  It is trying to read
`ctx->trie->field`, but the trie pointer itself appears to be NULL
(`trie_ctx[2].trie` is 0x0 in the gdb output below).
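
To make the suspected failure mode concrete, here is a minimal
standalone sketch.  It is not the OVS code: every `sketch_` name is
made up for illustration, and the NULL check is only one possible
defensive measure, not necessarily what [2] does.  It models a lookup
that was handed n_tries == 3 while the third context's trie pointer is
still NULL, as in the core dump, and skips that entry instead of
dereferencing it.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_TRIES 3      /* Assumption made for this sketch only. */

/* Hypothetical stand-ins for the real classifier structures. */
struct sketch_field { int id; };
struct sketch_trie { const struct sketch_field *field; };
struct sketch_trie_ctx { const struct sketch_trie *trie; };

/* Counts the contexts whose trie and field can be read safely.
 * A context with a NULL trie pointer is skipped; dereferencing it
 * without this check is the kind of access that faults. */
static size_t
sketch_count_usable(const struct sketch_trie_ctx ctx[], size_t n_tries)
{
    size_t usable = 0;

    assert(n_tries <= SKETCH_MAX_TRIES);
    for (size_t i = 0; i < n_tries; i++) {
        const struct sketch_trie *trie = ctx[i].trie;

        if (trie && trie->field) {
            usable++;
        }
    }
    return usable;
}

/* Mirrors the state seen in gdb: n_tries == 3, the first two trie
 * pointers are set, the third one is NULL. */
static size_t
sketch_demo_core_dump_case(void)
{
    static const struct sketch_field f0 = { 0 }, f1 = { 1 };
    static const struct sketch_trie t0 = { &f0 }, t1 = { &f1 };
    const struct sketch_trie_ctx ctx[SKETCH_MAX_TRIES] = {
        { &t0 }, { &t1 }, { NULL },
    };

    return sketch_count_usable(ctx, SKETCH_MAX_TRIES);
}
```

In this sketch only two of the three contexts are usable.  Without the
NULL check, the third iteration would read through a NULL pointer,
which is consistent with what the backtrace shows at line 1585.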

Any pointers on what could be going on here?

[1] - 8d64f2b7dc ("controller: Fix IPv6 dp flow explosion by setting
      flow table prefixes.")
      https://github.com/ovn-org/ovn/commit/8d64f2b7dcef24d175151ba5e0732281cdeb6d54

[2] - a6117059904 ("classifier: Prevent tries vs n_tries race leading
      to NULL dereference.")
      https://github.com/openvswitch/ovs/commit/a6117059904bb692039c926221964dd6d49b3bfd


Below is the core dump backtrace:
------------------------------------------------------------------------------------
warning: Section `.reg-xstate/13332' in core file too small.
#0  0x0000562dcf0432fc in check_tries
(trie_ctx=trie_ctx@entry=0x7f70cc845aa0, n_tries=n_tries@entry=3,
field_plen=field_plen@entry=0x562ddbe82de8, range_map=...,
flow=flow@entry=0x7f70cc84ab20, wc=0x7f70cc84aee0) at
../lib/classifier.c:1585
1585                 = ovsrcu_get(struct mf_field *, &ctx->trie->field);
[Current thread is 1 (Thread 0x7f70cc851640 (LWP 13332))]
Missing separate debuginfos, use: dnf debuginfo-install
bzip2-libs-1.0.8-8.el9.x86_64 libacl-2.3.1-3.el9.x86_64
libattr-2.5.1-3.el9.x86_64 libevent-2.1.12-6.el9.x86_64
libnghttp2-1.43.0-5.el9.x86_64 libxml2-2.9.13-2.el9.x86_64
libzstd-1.5.1-2.el9.x86_64 lz4-libs-1.9.3-5.el9.x86_64
python3-libs-3.9.14-1.el9_1.1.x86_64
sssd-client-2.7.3-4.el9_1.1.x86_64 xz-libs-5.2.5-8.el9_0.x86_64
(gdb) bt
#0  0x0000562dcf0432fc in check_tries
(trie_ctx=trie_ctx@entry=0x7f70cc845aa0, n_tries=n_tries@entry=3,
field_plen=field_plen@entry=0x562ddbe82de8, range_map=...,
flow=flow@entry=0x7f70cc84ab20, wc=0x7f70cc84aee0) at
../lib/classifier.c:1585
#1  0x0000562dcf04b196 in find_match_wc (wc=0x7f70cc84aee0, n_tries=3,
trie_ctx=0x7f70cc845aa0, flow=0x7f70cc84ab20, version=327971,
subtable=0x562ddbe82d80) at ../lib/classifier.c:1704
#2  classifier_lookup__ (cls=<optimized out>, version=<optimized out>,
flow=<optimized out>, wc=<optimized out>,
allow_conjunctive_matches=<optimized out>) at ../lib/classifier.c:975
#3  0x0000562dcf008f17 in classifier_lookup (wc=0x7f70cc84aee0,
flow=0x7f70cc84ab20, version=327971, cls=<optimized out>) at
../lib/classifier.c:1169
#4  rule_dpif_lookup_in_table (ofproto=0x562dd837cc30,
ofproto=0x562dd837cc30, wc=0x7f70cc84aee0, flow=0x7f70cc84ab20,
table_id=10 '\n', version=327971) at ../ofproto/ofproto-dpif.c:4393
#5  rule_dpif_lookup_from_table (ofproto=0x562dd837cc30,
version=327971, flow=0x7f70cc84ab20, wc=<optimized out>,
stats=<optimized out>, table_id=0x7f70cc84a5d0 "\n", in_port=65535,
may_packet_in=false, honor_table_miss=false, xcache=0x7f70b4010590) at
../ofproto/ofproto-dpif.c:4500
#6  0x0000562dcf0327f1 in xlate_table_action
(ctx=ctx@entry=0x7f70cc84a290, in_port=in_port@entry=65535,
table_id=<optimized out>, may_packet_in=<optimized out>,
honor_table_miss=<optimized out>, with_ct_orig=<optimized out>,
is_last_action=false, xlator=0x562dcf034390 <do_xlate_actions>) at
../ofproto/ofproto-dpif-xlate.c:4650
#7  0x0000562dcf03635d in xlate_table_action (xlator=<optimized out>,
is_last_action=<optimized out>, with_ct_orig=<optimized out>,
honor_table_miss=<optimized out>, may_packet_in=<optimized out>,
table_id=<optimized out>, in_port=<optimized out>, ctx=<optimized
out>) at ../ofproto/ofproto-dpif-xlate.c:7397
#8  do_xlate_actions (ofpacts=<optimized out>, ofpacts_len=<optimized
out>, ctx=<optimized out>, is_last_action=<optimized out>,
group_bucket_action=<optimized out>) at
../ofproto/ofproto-dpif-xlate.c:7397
#9  0x0000562dcf03274f in xlate_recursively
(actions_xlator=0x562dcf034390 <do_xlate_actions>,
is_last_action=false, deepens=<optimized out>, rule=0x562dec06f8e0,
ctx=0x7f70cc84a290) at ../ofproto/ofproto-dpif-xlate.c:4548
#10 xlate_table_action (ctx=ctx@entry=0x7f70cc84a290,
in_port=in_port@entry=65535, table_id=<optimized out>,
may_packet_in=<optimized out>, honor_table_miss=<optimized out>,
with_ct_orig=<optimized out>, is_last_action=false,
xlator=0x562dcf034390 <do_xlate_actions>) at
../ofproto/ofproto-dpif-xlate.c:4677
#11 0x0000562dcf03635d in xlate_table_action (xlator=<optimized out>,
is_last_action=<optimized out>, with_ct_orig=<optimized out>,
honor_table_miss=<optimized out>, may_packet_in=<optimized out>,
table_id=<optimized out>, in_port=<optimized out>, ctx=<optimized
out>) at ../ofproto/ofproto-dpif-xlate.c:7397
...
...
#128 0x0000562dcf03274f in xlate_recursively
(actions_xlator=0x562dcf034390 <do_xlate_actions>,
is_last_action=true, deepens=<optimized out>, rule=0x562deed5d5b0,
ctx=0x7f70cc84a290) at ../ofproto/ofproto-dpif-xlate.c:4548
#129 xlate_table_action (ctx=ctx@entry=0x7f70cc84a290,
in_port=in_port@entry=2004, table_id=<optimized out>,
may_packet_in=<optimized out>, honor_table_miss=<optimized out>,
with_ct_orig=<optimized out>, is_last_action=true,
xlator=0x562dcf034390 <do_xlate_actions>) at
../ofproto/ofproto-dpif-xlate.c:4677
#130 0x0000562dcf03635d in xlate_table_action (xlator=<optimized out>,
is_last_action=<optimized out>, with_ct_orig=<optimized out>,
honor_table_miss=<optimized out>, may_packet_in=<optimized out>,
table_id=<optimized out>, in_port=<optimized out>, ctx=<optimized
out>) at ../ofproto/ofproto-dpif-xlate.c:7397
#131 do_xlate_actions (ofpacts=<optimized out>, ofpacts_len=<optimized
out>, ctx=<optimized out>, is_last_action=<optimized out>,
group_bucket_action=<optimized out>) at
../ofproto/ofproto-dpif-xlate.c:7397
#132 0x0000562dcf03274f in xlate_recursively
(actions_xlator=0x562dcf034390 <do_xlate_actions>,
is_last_action=true, deepens=<optimized out>, rule=0x562deed520d0,
ctx=0x7f70cc84a290) at ../ofproto/ofproto-dpif-xlate.c:4548
#133 xlate_table_action (ctx=ctx@entry=0x7f70cc84a290,
in_port=in_port@entry=2004, table_id=<optimized out>,
may_packet_in=<optimized out>, honor_table_miss=<optimized out>,
with_ct_orig=<optimized out>, is_last_action=true,
xlator=0x562dcf034390 <do_xlate_actions>) at
../ofproto/ofproto-dpif-xlate.c:4677
#134 0x0000562dcf03635d in xlate_table_action (xlator=<optimized out>,
is_last_action=<optimized out>, with_ct_orig=<optimized out>,
honor_table_miss=<optimized out>, may_packet_in=<optimized out>,
table_id=<optimized out>, in_port=<optimized out>, ctx=<optimized
out>) at ../ofproto/ofproto-dpif-xlate.c:7397
#135 do_xlate_actions (ofpacts=<optimized out>, ofpacts_len=<optimized
out>, ctx=<optimized out>, is_last_action=<optimized out>,
group_bucket_action=<optimized out>) at
../ofproto/ofproto-dpif-xlate.c:7397
#136 0x0000562dcf03acc0 in xlate_actions (xin=0x7f70cc84ab10,
xout=0x7f70cc84b440) at ../ofproto/ofproto-dpif-xlate.c:8256
#137 0x0000562dcf025cc4 in xlate_key
(udpif=udpif@entry=0x562dd83058b0, key=<optimized out>, len=<optimized
out>, push=push@entry=0x7f70cc84aec0, ctx=ctx@entry=0x7f70cc84b420) at
../ofproto/ofproto-dpif-upcall.c:2218
#138 0x0000562dcf02665a in xlate_ukey (ukey=0x7f7094098ae0,
ukey=0x7f7094098ae0, ctx=0x7f70cc84b420, tcp_flags=<optimized out>,
udpif=0x562dd83058b0) at ../ofproto/ofproto-dpif-upcall.c:2233
#139 revalidate_ukey__ (udpif=udpif@entry=0x562dd83058b0,
ukey=ukey@entry=0x7f7094098ae0, tcp_flags=<optimized out>,
odp_actions=odp_actions@entry=0x7f70cc84b890,
recircs=recircs@entry=0x7f70cc84b970, xcache=<optimized out>) at
../ofproto/ofproto-dpif-upcall.c:2282
#140 0x0000562dcf026a96 in revalidate_ukey (udpif=<optimized out>,
ukey=0x7f7094098ae0, stats=0x7f70cc84b870, odp_actions=0x7f70cc84b890,
reval_seq=5277717917, recircs=<optimized out>) at
../ofproto/ofproto-dpif-upcall.c:2419
#141 0x0000562dcf24fe7b in revalidate.isra.0 (revalidator=<optimized
out>, revalidator=<optimized out>) at
../ofproto/ofproto-dpif-upcall.c:2882

#142 0x0000562dcf027505 in udpif_revalidator (arg=0x562dd8399950) at
../ofproto/ofproto-dpif-upcall.c:1015
#143 0x0000562dcf11ff13 in ovsthread_wrapper (aux_=<optimized out>) at
../lib/ovs-thread.c:423
#144 0x00007f70d0717802 in start_thread () from /lib64/libc.so.6
#145 0x00007f70d06b7450 in clone3 () from /lib64/libc.so.6


(gdb) print n_tries
$1 = 3
(gdb) print field_plen[0]
$2 = 0
(gdb) print field_plen[1]
$3 = 0
(gdb) print field_plen[2]
$4 = 128
(gdb) print trie_ctx
$5 = (struct trie_ctx *) 0x7f70cc845aa0
(gdb) print trie_ctx[0]
$6 = {trie = 0x562dd837e8b0, lookup_done = false, maskbits = 32624,
match_plens = {ipv6 = {__in6_u = {__u6_addr8 =
"\r\000\000\000\000\000\000\000\000\016ҲR\252\021f", __u6_addr16 =
{13, 0, 0, 0, 3584, 45778, 43602, 26129}, __u6_addr32 = {13, 0,
3000110592, 1712433746}}}, be32 = 13}}
(gdb) print trie_ctx[1]
$7 = {trie = 0x562dd837e8c0, lookup_done = false, maskbits = 32624,
match_plens = {ipv6 = {__in6_u = {__u6_addr8 =
"\000\000\000\000\000\000\000\000\f\000\000\000\000\000\000",
__u6_addr16 = {0, 0, 0, 0, 12, 0, 0, 0}, __u6_addr32 = {0, 0, 12,
0}}}, be32 = 0}}
(gdb) print trie_ctx[2]
$8 = {trie = 0x0, lookup_done = false, maskbits = 2155551583,
match_plens = {ipv6 = {__in6_u = {__u6_addr8 =
"\000\000\000\000\000\b\000\000\000\000\000\000\000\000\000",
__u6_addr16 = {0, 0, 2048, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 2048, 0,
0}}}, be32 = 0}}

------------------------------------------------------------------------------------

Thanks
Numan
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
