On Wed, Oct 17, 2018 at 7:45 PM Eelco Chaudron <echau...@redhat.com> wrote:
> On 17 Oct 2018, at 14:03, nusid...@redhat.com wrote:
>
> > From: Numan Siddique <nusid...@redhat.com>
> >
> > We see the below trace when a port is added to a bridge and the configured
> > controller is down
> >
> > 0x00007fb002f8b207 in raise () from /lib64/libc.so.6
> > 0x00007fb002f8c8f8 in abort () from /lib64/libc.so.6
> > 0x00007fb004953026 in ofputil_protocol_to_ofp_version () from /lib64/libopenvswitch-2.10.so.0
> > 0x00007fb00494e38e in ofputil_encode_port_status () from /lib64/libopenvswitch-2.10.so.0
> > 0x00007fb004ef1c5b in connmgr_send_port_status () from /lib64/libofproto-2.10.so.0
> > 0x00007fb004efa9f4 in ofport_install () from /lib64/libofproto-2.10.so.0
> > 0x00007fb004efbfb2 in update_port () from /lib64/libofproto-2.10.so.0
> > 0x00007fb004efc7f9 in ofproto_port_add () from /lib64/libofproto-2.10.so.0
> > 0x0000556d540a3f95 in bridge_add_ports__ ()
> > 0x0000556d540a5a47 in bridge_reconfigure ()
> > 0x0000556d540a9199 in bridge_run ()
> > 0x0000556d540a02a5 in main ()
> >
>
> I have a similar crash with the following backtrace:
>
> #0  0x00007f3c6524b207 in raise () from /lib64/libc.so.6
> #1  0x00007f3c6524c8f8 in abort () from /lib64/libc.so.6
> #2  0x00007f3c66c06cb7 in ofputil_encode_flow_removed (fr=fr@entry=0x7f3c59ff9b80, protocol=<optimized out>)
>     at lib/ofp-monitor.c:293
> #3  0x00007f3c671b1db3 in connmgr_send_flow_removed (mgr=mgr@entry=0x56197f5a4800, fr=fr@entry=0x7f3c59ff9b80)
>     at ofproto/connmgr.c:1702
> #4  0x00007f3c671b7464 in ofproto_rule_send_removed (rule=0x56197f69db80) at ofproto/ofproto.c:5729
> #5  0x00007f3c671bdc3d in rule_destroy_cb (rule=0x56197f69db80) at ofproto/ofproto.c:2839
> #6  0x00007f3c66c1e88e in ovsrcu_call_postponed () at lib/ovs-rcu.c:342
> #7  0x00007f3c66c1ea94 in ovsrcu_postpone_thread (arg=<optimized out>) at lib/ovs-rcu.c:357
> #8  0x00007f3c66c20d2f in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:354
> #9  0x00007f3c66000dd5 in start_thread () from /lib64/libpthread.so.0
> #10 0x00007f3c65313b3d in clone () from /lib64/libc.so.6
>
> > When connmgr detects that the connection to the controller is down, it
> > resets the ofconn's protocol to 'OFPUTIL_P_NONE' and that's why we
> > see the above abort. This patch fixes the issue by also checking the
> > connection status before sending the port status in the
> > connmgr_send_port_status().
>
> Same issue, in my case the connection is in S_BACKOFF state.
>
> >
> > The issue can be reproduced by running the test added in this patch
> > without the fix.
> >
> > Signed-off-by: Numan Siddique <nusid...@redhat.com>
> > ---
> >  ofproto/connmgr.c |  3 ++-
> >  tests/bridge.at   | 21 +++++++++++++++++++++
> >  2 files changed, 23 insertions(+), 1 deletion(-)
> >
> > diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c
> > index f78b4c5ff..02ba75938 100644
> > --- a/ofproto/connmgr.c
> > +++ b/ofproto/connmgr.c
> > @@ -1624,7 +1624,8 @@ connmgr_send_port_status(struct connmgr *mgr, struct ofconn *source,
> >      ps.reason = reason;
> >      ps.desc = *pp;
> >      LIST_FOR_EACH (ofconn, node, &mgr->all_conns) {
> > -        if (ofconn_receives_async_msg(ofconn, OAM_PORT_STATUS, reason)) {
> > +        if (ofconn_receives_async_msg(ofconn, OAM_PORT_STATUS, reason) &&
> > +            rconn_is_connected(ofconn->rconn)) {
> >              struct ofpbuf *msg;
>
> I could add a similar fix in connmgr_send_flow_removed(). However, I was
> wondering why this problem is surfacing now, did anything change that
> would start to trigger this issue?
>

You are right. It is probably better to figure out why
ofconn_receives_async_msg() now returns true when the connection to the
controller is lost, whereas it returned false earlier, and to fix the
problem in the right place.

I can see the issue with master, branch 2.10, v2.10.0 and branch 2.9.
However, I don't see the issue with v2.9.2.

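To make the failure mode concrete, here is a minimal, self-contained sketch of what both backtraces have in common. It is not OVS code: every sketch_* name is a hypothetical stand-in, and the version lookup returns 0 where the real ofputil_protocol_to_ofp_version() aborts. It only illustrates why a connection whose protocol was reset to "none" must be filtered out before encoding, either by an explicit connected check as in the patch or inside the "receives async msg" predicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real OVS types; illustration only. */
enum sketch_protocol {
    SKETCH_P_NONE = 0,          /* Protocol after the connection drops. */
    SKETCH_P_OF10 = 1 << 0,
    SKETCH_P_OF13 = 1 << 1,
};

struct sketch_conn {
    enum sketch_protocol protocol;
    bool connected;
    bool wants_port_status;     /* Models the async-message filter. */
};

/* Maps a protocol to its OpenFlow wire version.  Returns 0 for
 * SKETCH_P_NONE, the case in which the real code aborts. */
static int
sketch_protocol_to_version(enum sketch_protocol protocol)
{
    switch (protocol) {
    case SKETCH_P_OF10: return 0x01;
    case SKETCH_P_OF13: return 0x04;
    case SKETCH_P_NONE: return 0;
    }
    return 0;
}

/* Walks 'conns' and returns how many messages could be encoded.  Each
 * connection that reaches the encoder without a valid protocol is counted
 * in '*n_would_abort'.  'check_connected' toggles the guard that the patch
 * adds. */
static size_t
sketch_send_port_status(const struct sketch_conn *conns, size_t n,
                        bool check_connected, size_t *n_would_abort)
{
    size_t n_sent = 0;

    assert(n_would_abort != NULL);
    for (size_t i = 0; i < n; i++) {
        const struct sketch_conn *c = &conns[i];

        if (!c->wants_port_status) {
            continue;
        }
        if (check_connected && !c->connected) {
            continue;           /* The guard: skip dead connections. */
        }
        if (sketch_protocol_to_version(c->protocol) == 0) {
            (*n_would_abort)++; /* Real code would abort() here. */
        } else {
            n_sent++;
        }
    }
    return n_sent;
}
```

With one live OpenFlow 1.3 connection and one disconnected connection that still passes the filter, the unguarded walk reaches the encoder with no protocol once, while the guarded walk never does. The same reasoning would apply to the flow-removed path.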
Thanks
Numan

> >              /* Before 1.5, OpenFlow specified that OFPT_PORT_MOD should not
> > diff --git a/tests/bridge.at b/tests/bridge.at
> > index 1c3618563..ee398bdb1 100644
> > --- a/tests/bridge.at
> > +++ b/tests/bridge.at
> > @@ -79,3 +79,24 @@ AT_CHECK([ovs-vsctl --columns=status list controller | dnl
> >  OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
> >  OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> >  AT_CLEANUP
> > +
> > +AT_SETUP([bridge - add port after stopping controller])
> > +OVS_VSWITCHD_START
> > +
> > +dnl Start ovs-testcontroller
> > +ovs-testcontroller --detach punix:controller --pidfile=ovs-testcontroller.pid
> > +OVS_WAIT_UNTIL([test -e controller])
> > +
> > +AT_CHECK([ovs-vsctl set-controller br0 unix:controller])
> > +AT_CHECK([ovs-vsctl add-port br0 p1 -- set Interface p1 type=internal], [0], [ignore])
> > +AT_CHECK([ovs-appctl -t ovs-vswitchd version], [0], [ignore])
> > +
> > +# Now kill the ovs-testcontroller
> > +kill `cat ovs-testcontroller.pid`
> > +OVS_WAIT_UNTIL([! test -e controller])
> > +AT_CHECK([ovs-vsctl --no-wait add-port br0 p2 -- set Interface p2 type=internal], [0], [ignore])
> > +AT_CHECK([ovs-appctl -t ovs-vswitchd version], [0], [ignore])
> > +
> > +OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
> > +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> > +AT_CLEANUP
> > --
> > 2.17.2
> >
> > _______________________________________________
> > dev mailing list
> > d...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev