Hi Aron,
Here is the stack trace of the failure / segmentation fault:
#0 0x00007f33b423c1c1 in ofproto_flow_mod_learn_finish (ofm=0x230db00,
orig_ofproto=orig_ofproto@entry=0x1d39220) at ofproto/ofproto-provider.h:509
#1 0x00007f33b4241e9a in ofproto_dpif_xcache_execute (ofproto=0x1d39200,
xcache=<optimized out>, stats=0x7fff8e76eb10) at ofproto/ofproto-dpif.c:4966
#2 0x00007f33b4247723 in nxt_resume (ofproto_=0x1d39220, pin=0x7fff8e76f7b0)
at ofproto/ofproto-dpif.c:5811
#3 0x00007f33b42308dd in handle_nxt_resume (ofconn=ofconn@entry=0x20689a0,
oh=<optimized out>) at ofproto/ofproto.c:3851
#4 0x00007f33b423f429 in handle_single_part_openflow (type=OFPTYPE_NXT_RESUME,
oh=<optimized out>, ofconn=0x20689a0) at ofproto/ofproto.c:8823
#5 handle_openflow (ofconn=0x20689a0, msgs=0x7fff8e7707a0) at
ofproto/ofproto.c:8959
#6 0x00007f33b422cce0 in ofconn_run (handle_openflow=0x7f33b423e560
<handle_openflow>, ofconn=0x20689a0) at ofproto/connmgr.c:1329
#7 connmgr_run (mgr=0x1cfc140,
handle_openflow=handle_openflow@entry=0x7f33b423e560 <handle_openflow>) at
ofproto/connmgr.c:356
#8 0x00007f33b4236f70 in ofproto_run (p=0x1d39220) at ofproto/ofproto.c:1990
#9 0x00000000004101fc in bridge_run__ () at vswitchd/bridge.c:3287
#10 0x000000000041615e in bridge_run () at vswitchd/bridge.c:3346
#11 0x000000000040be05 in main (argc=<optimized out>, argv=<optimized out>) at
vswitchd/ovs-vswitchd.c:132
The learn.ofm structure appears to be valid at this point, except that
learn_adds_rule and old_rules new_rules fields does not appear to be
initialized:
(gdb) p entry->learn.ofm[0]
$3 = {
temp_rule = 0x15fb520,
criteria = {
table_id = 1 '\001',
cr = {
node = {
prev = 0x15bd570,
next = {
p = 0x15bd570
}
},
priority = 31,
cls_match = {
p = 0x0
},
match = {
{
{
flow = 0x21b77a0,
mask = 0x21b77d0
},
flows = {0x21b77a0, 0x21b77d0}
},
tun_md = 0x0
}
},
version = 18446744073709551614,
cookie = 0,
cookie_mask = 0,
out_port = 65535,
out_group = 4294967295,
include_hidden = false,
include_readonly = false
},
conjs = 0x0,
n_conjs = 0,
command = 2,
modify_cookie = true,
modify_may_add_flow = true,
modify_keep_counts = true,
event = 492535614,
table_id = 1 '\001',
version = 492535619,
learn_adds_rule = 2,
old_rules = {
collection = {
objs = 0x1,
n = 492535617,
capacity = 5,
stub = {0x1d5b7f41, 0x30, 0x0, 0x0, 0x0}
}
},
new_rules = {
collection = {
objs = 0x0,
n = 0,
capacity = 0,
stub = {0x0, 0x0, 0x0, 0x600000000, 0x0}
}
}
}
In this particular case the following line in ofproto_flow_mod_learn_finish()
results in NULL pointer that later is de-referenced in subsequent if statement.
struct rule *rule = rule_collection_rules(&ofm->new_rules)[0];
As far as I can tell, presently the flag initialization only exists in one
place: inside packet_xlate(). Though it does not appear to me that
packet_xlate() is guaranteed be invoked after the structure is allocated inside
xlate_learn_action().
The old and new rules collections appears to be initialized by the following
call chain, but it is not guaranteed that this call chain is always /
unconditionally executed.
ofproto_flow_mod_learn() /
ofproto_flow_mod_learn_start() /
ofproto_flow_mod_start() /
rule_collection_init()
I did notice that the learn_adds_rule flag was introduced by 1f4a89336682
change that you’ve mentioned, though I’m not convinced that the flag is
guaranteed to be always initialized.
Thank you,
— Mike.
On Oct 29, 2024, at 6:59 AM, Aaron Conole <[email protected]> wrote:
!-------------------------------------------------------------------|
CAUTION: External Email
|-------------------------------------------------------------------!
Mike Ovsiannikov <[email protected]> writes:
Initialize learn_adds_rule in order to prevent
crash in ofproto_flow_mod_learn_finish() due to
being invoked with uninitialized rules
collections.
Signed-off-by: Mike Ovsiannikov <[email protected]>
---
Hi Mike,
I think this should have:
Fixes: 1f4a89336682 ("ofproto: Refactor packet_out handling.")
Also, I don't see many details on how we have uninitialized rules
collections. Is it possible to include a stack trace / steps to
reproduce the problem? Could it be that there is a race somewhere?
Is it during add/remove of some group? We shouldn't get into this case
without at least a valid rule, I think (at least it seems like all paths
need to cross the XC_LEARN / or a learn xlate test).
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev