Mike Ovsiannikov <[email protected]> writes: > Hi Aron, > > Here is the stack trace of the failure / segmentation fault: > > #0 0x00007f33b423c1c1 in ofproto_flow_mod_learn_finish (ofm=0x230db00, > orig_ofproto=orig_ofproto@entry=0x1d39220) at ofproto/ofproto-provider.h:509 > #1 0x00007f33b4241e9a in ofproto_dpif_xcache_execute (ofproto=0x1d39200, > xcache=<optimized out>, stats=0x7fff8e76eb10) at ofproto/ofproto-dpif.c:4966 > #2 0x00007f33b4247723 in nxt_resume (ofproto_=0x1d39220, pin=0x7fff8e76f7b0) > at > ofproto/ofproto-dpif.c:5811 > #3 0x00007f33b42308dd in handle_nxt_resume (ofconn=ofconn@entry=0x20689a0, > oh=<optimized out>) at ofproto/ofproto.c:3851 > #4 0x00007f33b423f429 in handle_single_part_openflow > (type=OFPTYPE_NXT_RESUME, > oh=<optimized out>, ofconn=0x20689a0) at ofproto/ofproto.c:8823 > #5 handle_openflow (ofconn=0x20689a0, msgs=0x7fff8e7707a0) at > ofproto/ofproto.c:8959 > #6 0x00007f33b422cce0 in ofconn_run (handle_openflow=0x7f33b423e560 > <handle_openflow>, ofconn=0x20689a0) at ofproto/connmgr.c:1329 > #7 connmgr_run (mgr=0x1cfc140, > handle_openflow=handle_openflow@entry=0x7f33b423e560 <handle_openflow>) at > ofproto/connmgr.c:356 > #8 0x00007f33b4236f70 in ofproto_run (p=0x1d39220) at ofproto/ofproto.c:1990 > #9 0x00000000004101fc in bridge_run__ () at vswitchd/bridge.c:3287 > #10 0x000000000041615e in bridge_run () at vswitchd/bridge.c:3346 > #11 0x000000000040be05 in main (argc=<optimized out>, argv=<optimized out>) at > vswitchd/ovs-vswitchd.c:132 > > The learn.ofm structure appears to be valid at this point, except that > learn_adds_rule and > old_rules new_rules fields does not appear to be initialized: > > (gdb) p entry->learn.ofm[0] > $3 = { > temp_rule = 0x15fb520, > criteria = { > table_id = 1 '\001', > cr = { > node = { > prev = 0x15bd570, > next = { > p = 0x15bd570 > } > }, > priority = 31, > cls_match = { > p = 0x0 > }, > match = { > { > { > flow = 0x21b77a0, > mask = 0x21b77d0 > }, > flows = {0x21b77a0, 0x21b77d0} > }, > tun_md = 0x0 > } > }, > version = 18446744073709551614, > cookie = 0, > cookie_mask = 0, > out_port = 65535, > out_group = 4294967295, > include_hidden = false, > include_readonly = false > }, > conjs = 0x0, > n_conjs = 0, > command = 2, > modify_cookie = true, > modify_may_add_flow = true, > modify_keep_counts = true, > event = 492535614, > table_id = 1 '\001', > version = 492535619, > learn_adds_rule = 2, > old_rules = { > collection = { > objs = 0x1, > n = 492535617, > capacity = 5, > stub = {0x1d5b7f41, 0x30, 0x0, 0x0, 0x0} > } > }, > new_rules = { > collection = { > objs = 0x0, > n = 0, > capacity = 0, > stub = {0x0, 0x0, 0x0, 0x600000000, 0x0} > } > } > } > > In this particular case the following line in ofproto_flow_mod_learn_finish() > results in NULL > pointer that later is de-referenced in subsequent if statement. > struct rule *rule = rule_collection_rules(&ofm->new_rules)[0]; > > As far as I can tell, presently the flag initialization only exists in one > place: inside packet_xlate > (). Though it does not appear to me that packet_xlate() is guaranteed be > invoked after the > structure is allocated inside xlate_learn_action(). > The old and new rules collections appears to be initialized by the following > call chain, but it is > not guaranteed that this call chain is always / unconditionally executed. > ofproto_flow_mod_learn() / > ofproto_flow_mod_learn_start() / > ofproto_flow_mod_start() / > rule_collection_init() > > I did notice that the learn_adds_rule flag was introduced by 1f4a89336682 > change that > you’ve mentioned, though I’m not convinced that the flag is guaranteed to be > always > initialized.
Thanks for the details above. You're correct - it isn't being initialized properly. However, this looks like it is happening with a specific flow controller setup using some kind of continuation chain. If you have that flow chain, maybe we can also emulate it in a test. I'm actually concerned that we may have other uninitialized (or incorrectly initialized) details. > Thank you, > — Mike. > > On Oct 29, 2024, at 6:59 AM, Aaron Conole <[email protected]> wrote: > > !-------------------------------------------------------------------| > CAUTION: External Email > > |-------------------------------------------------------------------! > > Mike Ovsiannikov <[email protected]> writes: > > Initialize learn_adds_rule in order to prevent > crash in ofproto_flow_mod_learn_finish() due to > being invoked with uninitialized rules > collections. > > Signed-off-by: Mike Ovsiannikov <[email protected]> > > --- > > Hi Mike, > > I think this should have: > > Fixes: 1f4a89336682 ("ofproto: Refactor packet_out handling.") > > Also, I don't see many details on how we have uninitialized rules > collections. Is it possible to include a stack trace / steps to > reproduce the problem? Could it be that there is a race somewhere? > Is it during add/remove of some group? We shouldn't get into this case > without at least a valid rule, I think (at least it seems like all paths > need to cross the XC_LEARN / or a learn xlate test). _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
