Hi Aron,

Here is the stack trace of the failure / segmentation fault:

#0  0x00007f33b423c1c1 in ofproto_flow_mod_learn_finish (ofm=0x230db00, 
orig_ofproto=orig_ofproto@entry=0x1d39220) at ofproto/ofproto-provider.h:509
#1  0x00007f33b4241e9a in ofproto_dpif_xcache_execute (ofproto=0x1d39200, 
xcache=<optimized out>, stats=0x7fff8e76eb10) at ofproto/ofproto-dpif.c:4966
#2  0x00007f33b4247723 in nxt_resume (ofproto_=0x1d39220, pin=0x7fff8e76f7b0) 
at ofproto/ofproto-dpif.c:5811
#3  0x00007f33b42308dd in handle_nxt_resume (ofconn=ofconn@entry=0x20689a0, 
oh=<optimized out>) at ofproto/ofproto.c:3851
#4  0x00007f33b423f429 in handle_single_part_openflow (type=OFPTYPE_NXT_RESUME, 
oh=<optimized out>, ofconn=0x20689a0) at ofproto/ofproto.c:8823
#5  handle_openflow (ofconn=0x20689a0, msgs=0x7fff8e7707a0) at 
ofproto/ofproto.c:8959
#6  0x00007f33b422cce0 in ofconn_run (handle_openflow=0x7f33b423e560 
<handle_openflow>, ofconn=0x20689a0) at ofproto/connmgr.c:1329
#7  connmgr_run (mgr=0x1cfc140, 
handle_openflow=handle_openflow@entry=0x7f33b423e560 <handle_openflow>) at 
ofproto/connmgr.c:356
#8  0x00007f33b4236f70 in ofproto_run (p=0x1d39220) at ofproto/ofproto.c:1990
#9  0x00000000004101fc in bridge_run__ () at vswitchd/bridge.c:3287
#10 0x000000000041615e in bridge_run () at vswitchd/bridge.c:3346
#11 0x000000000040be05 in main (argc=<optimized out>, argv=<optimized out>) at 
vswitchd/ovs-vswitchd.c:132

The learn.ofm structure appears to be valid at this point, except that 
learn_adds_rule and old_rules new_rules fields does not appear to be 
initialized:

(gdb) p entry->learn.ofm[0]
$3 = {
  temp_rule = 0x15fb520,
  criteria = {
    table_id = 1 '\001',
    cr = {
      node = {
        prev = 0x15bd570,
        next = {
          p = 0x15bd570
        }
      },
      priority = 31,
      cls_match = {
        p = 0x0
      },
      match = {
        {
          {
            flow = 0x21b77a0,
            mask = 0x21b77d0
          },
          flows = {0x21b77a0, 0x21b77d0}
        },
        tun_md = 0x0
      }
    },
    version = 18446744073709551614,
    cookie = 0,
    cookie_mask = 0,
    out_port = 65535,
    out_group = 4294967295,
    include_hidden = false,
    include_readonly = false
  },
  conjs = 0x0,
  n_conjs = 0,
  command = 2,
  modify_cookie = true,
  modify_may_add_flow = true,
  modify_keep_counts = true,
  event = 492535614,
  table_id = 1 '\001',
  version = 492535619,
  learn_adds_rule = 2,
  old_rules = {
    collection = {
      objs = 0x1,
      n = 492535617,
      capacity = 5,
      stub = {0x1d5b7f41, 0x30, 0x0, 0x0, 0x0}
    }
  },
  new_rules = {
    collection = {
      objs = 0x0,
      n = 0,
      capacity = 0,
      stub = {0x0, 0x0, 0x0, 0x600000000, 0x0}
    }
  }
}

In this particular case the following line in ofproto_flow_mod_learn_finish()  
results in NULL pointer that later is de-referenced in subsequent if statement.
struct rule *rule = rule_collection_rules(&ofm->new_rules)[0];

As far as I can tell, presently the flag initialization only exists in one 
place: inside packet_xlate(). Though it does not appear to me that 
packet_xlate() is guaranteed be invoked after the structure is allocated inside 
xlate_learn_action().
The old and new rules collections appears to be initialized by the following 
call chain, but it is not  guaranteed that this call chain is always / 
unconditionally executed.
 ofproto_flow_mod_learn() /
  ofproto_flow_mod_learn_start() /
   ofproto_flow_mod_start() /
    rule_collection_init()

I did notice that the learn_adds_rule flag was introduced by 1f4a89336682 
change that you’ve mentioned, though I’m not convinced that the flag is 
guaranteed to be always initialized.

Thank you,
 — Mike.

On Oct 29, 2024, at 6:59 AM, Aaron Conole <[email protected]> wrote:

!-------------------------------------------------------------------|
 CAUTION: External Email

|-------------------------------------------------------------------!

Mike Ovsiannikov <[email protected]> writes:

Initialize learn_adds_rule in order to prevent
crash in ofproto_flow_mod_learn_finish() due to
being invoked with uninitialized rules
collections.

Signed-off-by: Mike Ovsiannikov <[email protected]>

---

Hi Mike,

I think this should have:

Fixes: 1f4a89336682 ("ofproto: Refactor packet_out handling.")

Also, I don't see many details on how we have uninitialized rules
collections.  Is it possible to include a stack trace / steps to
reproduce the problem?  Could it be that there is a race somewhere?
Is it during add/remove of some group?  We shouldn't get into this case
without at least a valid rule, I think (at least it seems like all paths
need to cross the XC_LEARN / or a learn xlate test).


_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to