On 2/19/21 1:09 AM, William Tu wrote: > On Wed, Feb 17, 2021 at 1:09 PM Yifeng Sun <[email protected]> wrote: >> >> ovs-vswitchd could crash under these circumstances: >> 1. When one bridge is being destroyed, ofproto_destroy() is called and >> connmgr pointer of its ofproto struct is nullified. This ofproto struct is >> deallocated through 'ovsrcu_postpone(ofproto_destroy_defer__, p);'. >> 2. Before RCU enters quiesce state to actually free this ofproto struct, >> revalidator thread calls udpif_revalidator(), which could handle >> a learn flow and calls ofproto_flow_mod_learn(), it later calls >> ofmonitor_report() and ofproto struct's connmgr pointer is accessed. >> >> The crash stack trace is shown below: >> >> 0 ofmonitor_report (mgr=0x0, rule=rule@entry=0x7fa4ac067c30, >> event=event@entry=NXFME_ADDED, >> reason=reason@entry=OFPRR_IDLE_TIMEOUT, abbrev_ofconn=0x0, abbrev_xid=0, >> old_actions=old_actions@entry=0x0) >> at ofproto/connmgr.c:2160 >> 1 0x00007fa4d6803495 in add_flow_finish (ofproto=0x55d9075d4ab0, >> ofm=<optimized out>, req=req@entry=0x0) >> at ofproto/ofproto.c:5221 >> 2 0x00007fa4d68036af in modify_flows_finish (req=0x0, ofm=0x7fa4980753f0, >> ofproto=0x55d9075d4ab0) >> at ofproto/ofproto.c:5823 >> 3 ofproto_flow_mod_finish (ofproto=0x55d9075d4ab0, >> ofm=ofm@entry=0x7fa4980753f0, req=req@entry=0x0) >> at ofproto/ofproto.c:8088 >> 4 0x00007fa4d680372d in ofproto_flow_mod_learn_finish >> (ofm=ofm@entry=0x7fa4980753f0, >> orig_ofproto=orig_ofproto@entry=0x0) at ofproto/ofproto.c:5439 >> 5 0x00007fa4d68072f9 in ofproto_flow_mod_learn (ofm=0x7fa4980753f0, >> keep_ref=keep_ref@entry=true, >> limit=<optimized out>, below_limitp=below_limitp@entry=0x0) at >> ofproto/ofproto.c:5499 >> 6 0x00007fa4d6835d33 in xlate_push_stats_entry (entry=0x7fa498012448, >> stats=stats@entry=0x7fa4d2701a10, >> offloaded=offloaded@entry=false) at >> ofproto/ofproto-dpif-xlate-cache.c:127 >> 7 0x00007fa4d6835e3a in xlate_push_stats (xcache=<optimized out>, >> stats=stats@entry=0x7fa4d2701a10, >> offloaded=offloaded@entry=false) at >> ofproto/ofproto-dpif-xlate-cache.c:181 >> 8 0x00007fa4d6822046 in revalidate_ukey (udpif=udpif@entry=0x55d90760b240, >> ukey=ukey@entry=0x7fa4b0191660, >> stats=stats@entry=0x7fa4d2705118, >> odp_actions=odp_actions@entry=0x7fa4d2701b50, >> reval_seq=reval_seq@entry=5655486242, >> recircs=recircs@entry=0x7fa4d2701b40, offloaded=false) >> at ofproto/ofproto-dpif-upcall.c:2294 >> 9 0x00007fa4d6825aee in revalidate (revalidator=0x55d90769dd00) at >> ofproto/ofproto-dpif-upcall.c:2683 >> 10 0x00007fa4d6825cf3 in udpif_revalidator (arg=0x55d90769dd00) at >> ofproto/ofproto-dpif-upcall.c:936 >> 11 0x00007fa4d6259c9f in ovsthread_wrapper (aux_=<optimized out>) at >> lib/ovs-thread.c:423 >> 12 0x00007fa4d582cea5 in start_thread () from /usr/lib64/libpthread.so.0 >> 13 0x00007fa4d504b96d in clone () from /usr/lib64/libc.so.6 >> >> At the time of crash, the involved ofproto was already deallocated: >> >> (gdb) print *ofproto >> $1 = ..., name = 0x55d907602820 "nsx-managed", ..., ports = {..., >> one = 0x0, mask = 63, n = 0}, ..., connmgr = 0x0, ... >> >> This patch fixes it. >> >> VMware-BZ: #2700626 >> Signed-off-by: Yifeng Sun <[email protected]> >> --- >> v1->v2: Add check for ofmonitor_flush, thanks William. >> > > LGTM, thanks. > Acked-by: William Tu < [email protected]>
Thanks! Applied to master and backported down to 2.12. Best regards, Ilya Maximets. _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
