Mike Ovsiannikov <[email protected]> writes:

> Hi Aron, 
>
> Here is the stack trace of the failure / segmentation fault:
>
> #0  0x00007f33b423c1c1 in ofproto_flow_mod_learn_finish (ofm=0x230db00,
> orig_ofproto=orig_ofproto@entry=0x1d39220) at ofproto/ofproto-provider.h:509
> #1  0x00007f33b4241e9a in ofproto_dpif_xcache_execute (ofproto=0x1d39200,
> xcache=<optimized out>, stats=0x7fff8e76eb10) at ofproto/ofproto-dpif.c:4966
> #2  0x00007f33b4247723 in nxt_resume (ofproto_=0x1d39220, pin=0x7fff8e76f7b0) 
> at
> ofproto/ofproto-dpif.c:5811
> #3  0x00007f33b42308dd in handle_nxt_resume (ofconn=ofconn@entry=0x20689a0,
> oh=<optimized out>) at ofproto/ofproto.c:3851
> #4  0x00007f33b423f429 in handle_single_part_openflow 
> (type=OFPTYPE_NXT_RESUME,
> oh=<optimized out>, ofconn=0x20689a0) at ofproto/ofproto.c:8823
> #5  handle_openflow (ofconn=0x20689a0, msgs=0x7fff8e7707a0) at 
> ofproto/ofproto.c:8959
> #6  0x00007f33b422cce0 in ofconn_run (handle_openflow=0x7f33b423e560
> <handle_openflow>, ofconn=0x20689a0) at ofproto/connmgr.c:1329
> #7  connmgr_run (mgr=0x1cfc140,
> handle_openflow=handle_openflow@entry=0x7f33b423e560 <handle_openflow>) at
> ofproto/connmgr.c:356
> #8  0x00007f33b4236f70 in ofproto_run (p=0x1d39220) at ofproto/ofproto.c:1990
> #9  0x00000000004101fc in bridge_run__ () at vswitchd/bridge.c:3287
> #10 0x000000000041615e in bridge_run () at vswitchd/bridge.c:3346
> #11 0x000000000040be05 in main (argc=<optimized out>, argv=<optimized out>) at
> vswitchd/ovs-vswitchd.c:132
>
> The learn.ofm structure appears to be valid at this point, except that 
> learn_adds_rule and
> old_rules new_rules fields does not appear to be initialized:
>
> (gdb) p entry->learn.ofm[0]
> $3 = {
>   temp_rule = 0x15fb520, 
>   criteria = {
>     table_id = 1 '\001', 
>     cr = {
>       node = {
>         prev = 0x15bd570, 
>         next = {
>           p = 0x15bd570
>         }
>       }, 
>       priority = 31, 
>       cls_match = {
>         p = 0x0
>       }, 
>       match = {
>         {
>           {
>             flow = 0x21b77a0, 
>             mask = 0x21b77d0
>           }, 
>           flows = {0x21b77a0, 0x21b77d0}
>         }, 
>         tun_md = 0x0
>       }
>     }, 
>     version = 18446744073709551614, 
>     cookie = 0, 
>     cookie_mask = 0, 
>     out_port = 65535, 
>     out_group = 4294967295, 
>     include_hidden = false, 
>     include_readonly = false
>   }, 
>   conjs = 0x0, 
>   n_conjs = 0, 
>   command = 2, 
>   modify_cookie = true, 
>   modify_may_add_flow = true, 
>   modify_keep_counts = true, 
>   event = 492535614, 
>   table_id = 1 '\001', 
>   version = 492535619, 
>   learn_adds_rule = 2, 
>   old_rules = {
>     collection = {
>       objs = 0x1, 
>       n = 492535617, 
>       capacity = 5, 
>       stub = {0x1d5b7f41, 0x30, 0x0, 0x0, 0x0}
>     }
>   }, 
>   new_rules = {
>     collection = {
>       objs = 0x0, 
>       n = 0, 
>       capacity = 0, 
>       stub = {0x0, 0x0, 0x0, 0x600000000, 0x0}
>     }
>   }
> }
>
> In this particular case the following line in ofproto_flow_mod_learn_finish() 
>  results in NULL
> pointer that later is de-referenced in subsequent if statement.
> struct rule *rule = rule_collection_rules(&ofm->new_rules)[0];
>
> As far as I can tell, presently the flag initialization only exists in one 
> place: inside packet_xlate
> (). Though it does not appear to me that packet_xlate() is guaranteed be 
> invoked after the
> structure is allocated inside xlate_learn_action().
> The old and new rules collections appears to be initialized by the following 
> call chain, but it is
> not  guaranteed that this call chain is always / unconditionally executed.
>  ofproto_flow_mod_learn() /
>   ofproto_flow_mod_learn_start() /
>    ofproto_flow_mod_start() /
>     rule_collection_init()
>
> I did notice that the learn_adds_rule flag was introduced by 1f4a89336682 
> change that
> you’ve mentioned, though I’m not convinced that the flag is guaranteed to be 
> always
> initialized.

Thanks for the details above.  You're correct - it isn't being
initialized properly.  However, this looks like it is happening with a
specific flow controller setup using some kind of continuation chain.
If you have that flow chain, maybe we can also emulate it in a test.
I'm actually concerned that we may have other uninitialized (or
incorrectly initialized) details.

> Thank you,
>  — Mike.
>
>  On Oct 29, 2024, at 6:59 AM, Aaron Conole <[email protected]> wrote:
>
>  !-------------------------------------------------------------------|
>   CAUTION: External Email
>
>  |-------------------------------------------------------------------!
>
>  Mike Ovsiannikov <[email protected]> writes:
>
>  Initialize learn_adds_rule in order to prevent
>  crash in ofproto_flow_mod_learn_finish() due to
>  being invoked with uninitialized rules
>  collections.
>
>  Signed-off-by: Mike Ovsiannikov <[email protected]>
>
>  ---
>
>  Hi Mike,
>
>  I think this should have:
>
>  Fixes: 1f4a89336682 ("ofproto: Refactor packet_out handling.")
>
>  Also, I don't see many details on how we have uninitialized rules
>  collections.  Is it possible to include a stack trace / steps to
>  reproduce the problem?  Could it be that there is a race somewhere?
>  Is it during add/remove of some group?  We shouldn't get into this case
>  without at least a valid rule, I think (at least it seems like all paths
>  need to cross the XC_LEARN / or a learn xlate test).

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to