On 17 Oct 2018, at 14:03, nusid...@redhat.com wrote:

From: Numan Siddique <nusid...@redhat.com>

We see the trace below when a port is added to a bridge and the configured
controller is down:

0x00007fb002f8b207 in raise () from /lib64/libc.so.6
0x00007fb002f8c8f8 in abort () from /lib64/libc.so.6
0x00007fb004953026 in ofputil_protocol_to_ofp_version () from /lib64/libopenvswitch-2.10.so.0
0x00007fb00494e38e in ofputil_encode_port_status () from /lib64/libopenvswitch-2.10.so.0
0x00007fb004ef1c5b in connmgr_send_port_status () from /lib64/libofproto-2.10.so.0
0x00007fb004efa9f4 in ofport_install () from /lib64/libofproto-2.10.so.0
0x00007fb004efbfb2 in update_port () from /lib64/libofproto-2.10.so.0
0x00007fb004efc7f9 in ofproto_port_add () from /lib64/libofproto-2.10.so.0
0x0000556d540a3f95 in bridge_add_ports__ ()
0x0000556d540a5a47 in bridge_reconfigure ()
0x0000556d540a9199 in bridge_run ()
0x0000556d540a02a5 in main ()


I have a similar crash with the following backtrace:

#0  0x00007f3c6524b207 in raise () from /lib64/libc.so.6
#1  0x00007f3c6524c8f8 in abort () from /lib64/libc.so.6
#2  0x00007f3c66c06cb7 in ofputil_encode_flow_removed (fr=fr@entry=0x7f3c59ff9b80, protocol=<optimized out>)
    at lib/ofp-monitor.c:293
#3  0x00007f3c671b1db3 in connmgr_send_flow_removed (mgr=mgr@entry=0x56197f5a4800, fr=fr@entry=0x7f3c59ff9b80)
    at ofproto/connmgr.c:1702
#4  0x00007f3c671b7464 in ofproto_rule_send_removed (rule=0x56197f69db80) at ofproto/ofproto.c:5729
#5  0x00007f3c671bdc3d in rule_destroy_cb (rule=0x56197f69db80) at ofproto/ofproto.c:2839
#6  0x00007f3c66c1e88e in ovsrcu_call_postponed () at lib/ovs-rcu.c:342
#7  0x00007f3c66c1ea94 in ovsrcu_postpone_thread (arg=<optimized out>) at lib/ovs-rcu.c:357
#8  0x00007f3c66c20d2f in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:354
#9  0x00007f3c66000dd5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f3c65313b3d in clone () from /lib64/libc.so.6

When connmgr detects that the connection to the controller is down, it
resets the ofconn's protocol to 'OFPUTIL_P_NONE', and that is why we
see the above abort. This patch fixes the issue by also checking the
connection status before sending the port status in
connmgr_send_port_status().

Same issue, in my case the connection is in S_BACKOFF state.
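
To make the failure mode concrete, here is a minimal standalone sketch of
the guard discussed above. It does not use the real OVS types: every
mock_* and MOCK_* name is a hypothetical stand-in chosen for illustration,
and the assert() only mimics an encoder that aborts when handed "no
protocol".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not the real OVS types. */
enum mock_protocol { MOCK_P_NONE = 0, MOCK_P_OF13 = 1 };

struct mock_conn {
    bool connected;              /* Stands in for the connection status check. */
    bool wants_port_status;      /* Stands in for the async message filter. */
    enum mock_protocol protocol; /* Reset to MOCK_P_NONE on disconnect. */
};

/* Mimics an encoder that cannot handle "no protocol" and aborts instead. */
static int
mock_encode_port_status(enum mock_protocol protocol)
{
    assert(protocol != MOCK_P_NONE);
    return (int) protocol;
}

/* Encodes a port status message for each eligible connection and returns
 * how many were encoded.  Connections that are not connected are skipped,
 * so the encoder never sees MOCK_P_NONE. */
static size_t
mock_send_port_status(const struct mock_conn *conns, size_t n)
{
    size_t sent = 0;

    for (size_t i = 0; i < n; i++) {
        if (conns[i].wants_port_status && conns[i].connected) {
            mock_encode_port_status(conns[i].protocol);
            sent++;
        }
    }
    return sent;
}
```

Dropping the `connected` test from the condition would pass MOCK_P_NONE
to the encoder for a disconnected entry, which is the shape of the abort
in both backtraces.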

The issue can be reproduced by running the test added in this patch
without the fix.

Signed-off-by: Numan Siddique <nusid...@redhat.com>
---
 ofproto/connmgr.c |  3 ++-
 tests/bridge.at   | 21 +++++++++++++++++++++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c
index f78b4c5ff..02ba75938 100644
--- a/ofproto/connmgr.c
+++ b/ofproto/connmgr.c
@@ -1624,7 +1624,8 @@ connmgr_send_port_status(struct connmgr *mgr, struct ofconn *source,
     ps.reason = reason;
     ps.desc = *pp;
     LIST_FOR_EACH (ofconn, node, &mgr->all_conns) {
-        if (ofconn_receives_async_msg(ofconn, OAM_PORT_STATUS, reason)) {
+        if (ofconn_receives_async_msg(ofconn, OAM_PORT_STATUS, reason) &&
+            rconn_is_connected(ofconn->rconn)) {
             struct ofpbuf *msg;

I could add a similar fix in connmgr_send_flow_removed(). However, I am
wondering why this problem is surfacing now. Did anything change that
would start to trigger this issue?

             /* Before 1.5, OpenFlow specified that OFPT_PORT_MOD should not
diff --git a/tests/bridge.at b/tests/bridge.at
index 1c3618563..ee398bdb1 100644
--- a/tests/bridge.at
+++ b/tests/bridge.at
@@ -79,3 +79,24 @@ AT_CHECK([ovs-vsctl --columns=status list controller | dnl
 OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
 OVS_APP_EXIT_AND_WAIT([ovsdb-server])
 AT_CLEANUP
+
+AT_SETUP([bridge - add port after stopping controller])
+OVS_VSWITCHD_START
+
+dnl Start ovs-testcontroller
+ovs-testcontroller --detach punix:controller --pidfile=ovs-testcontroller.pid
+OVS_WAIT_UNTIL([test -e controller])
+
+AT_CHECK([ovs-vsctl set-controller br0 unix:controller])
+AT_CHECK([ovs-vsctl add-port br0 p1 -- set Interface p1 type=internal], [0], [ignore])
+AT_CHECK([ovs-appctl -t ovs-vswitchd version], [0], [ignore])
+
+# Now kill the ovs-testcontroller
+kill `cat ovs-testcontroller.pid`
+OVS_WAIT_UNTIL([! test -e controller])
+AT_CHECK([ovs-vsctl --no-wait add-port br0 p2 -- set Interface p2 type=internal], [0], [ignore])
+AT_CHECK([ovs-appctl -t ovs-vswitchd version], [0], [ignore])
+
+OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
+OVS_APP_EXIT_AND_WAIT([ovsdb-server])
+AT_CLEANUP
--
2.17.2

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev