Re: [ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND
On Tue, May 16, 2017 at 12:07:38PM +0800, ychen wrote: > I can reproduce this problem with the script supported by vguntaka in both > ovs version 2.5 and ovs version 2.6. > > 1. Add bridge > > ovs-vsctl add-br br0 > > > > 2. Add vm port > > ovs-vsctl add-port br0 tap0 – set interface tap0 type=internal > > > > ip netns add ns0 > > ip link set dev tap0 netns ns0 > > ip netns exec ns0 ip link set dev tap0 up > > ip netns exec ns0 ip addr add dev tap0 1.1.1.2/24 > > ip netns exec ns0 ip route add default via 1.1.1.1 > > ip netns exec ns0 ip neigh add 1.1.1.1 lladdr 00:00:00:00:11:11 dev tap0 > > > > 3. Send packet > > Ip netns exec ns0 ping 2.2.2.2 > > > > 4. Add flows (make sure packet is always sending) > > while true > > do > > ovs-ofctl add-flow br0 > "priority=200,table=123,idle_timeout=1,in_port=1,actions=controller" > > ovs-ofctl add-flow br0 > "priority=200,table=123,idle_timeout=1,in_port=2,actions=controller" > > ovs-ofctl add-flow br0 > "priority=200,table=123,idle_timeout=1,in_port=3,actions=controller" > > ovs-ofctl add-flow br0 > "priority=200,table=123,idle_timeout=1,in_port=4,actions=controller" > > ovs-ofctl del-flows br0 > > done > > > > waiting about 1 minute or 2 minute, there is error “OFPT_ERROR (xid=0x4): > OFPFMFC_BAD_COMMAND” printed in console. Thanks for working on debugging this. Can you get a backtrace from GDB, or an error report from valgrind? ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND
I can reproduce this problem with the script supported by vguntaka in both ovs version 2.5 and ovs version 2.6. 1. Add bridge ovs-vsctl add-br br0 2. Add vm port ovs-vsctl add-port br0 tap0 – set interface tap0 type=internal ip netns add ns0 ip link set dev tap0 netns ns0 ip netns exec ns0 ip link set dev tap0 up ip netns exec ns0 ip addr add dev tap0 1.1.1.2/24 ip netns exec ns0 ip route add default via 1.1.1.1 ip netns exec ns0 ip neigh add 1.1.1.1 lladdr 00:00:00:00:11:11 dev tap0 3. Send packet Ip netns exec ns0 ping 2.2.2.2 4. Add flows (make sure packet is always sending) while true do ovs-ofctl add-flow br0 "priority=200,table=123,idle_timeout=1,in_port=1,actions=controller" ovs-ofctl add-flow br0 "priority=200,table=123,idle_timeout=1,in_port=2,actions=controller" ovs-ofctl add-flow br0 "priority=200,table=123,idle_timeout=1,in_port=3,actions=controller" ovs-ofctl add-flow br0 "priority=200,table=123,idle_timeout=1,in_port=4,actions=controller" ovs-ofctl del-flows br0 done waiting about 1 minute or 2 minute, there is error “OFPT_ERROR (xid=0x4): OFPFMFC_BAD_COMMAND” printed in console. Also, I noticed that when using Openflow13, the error disappeared, like this: ovs-ofctl add-flow br0 "priority=200,table=123,idle_timeout=1,in_port=1,actions=controller" –O openflow13 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND
Confirmed, can be easily reproduced using described method. Using ovs 2.6.2 On 01/23/2017 11:58 PM, Vidyasagara Guntaka via dev wrote: Hi Ben, We could reproduce this with the latest version 2.6.1. When we compiled the code, we removed -O2 from CFLAGS. This seems to make it happen more frequently. With the following script running, the error starts happening within a few seconds and then continues to happen after every few seconds. In summary, our suspicion is that having no controller set and having no NORMAL processing flow seems to trigger the stack that was pointed out in the gdb session more often and hence we hit this race condition very easily using the following script. (Even if there is a default NORMAL processing flow entry, after the first iteration of the script below, that will be deleted). Also, a few things about the setup - just in case: * enp5s0 belongs to the physical interface on this hypervisor. * vport0 and vport1 belong to tap interfaces corresponding to two VMs running on this hypervisor. * When the script was running, we had pings issued from the VMs so that packets make it to the bridge br0. Here is a small script that makes it happen on multiple of our hypervisors: while true do ovs-ofctl add-flow br0 "priority=200,table=123,idle_timeout=1,in_port=1,actions=controller" ovs-ofctl add-flow br0 "priority=200,table=123,idle_timeout=1,in_port=2,actions=controller" ovs-ofctl add-flow br0 "priority=200,table=123,idle_timeout=1,in_port=3,actions=controller" ovs-ofctl add-flow br0 "priority=200,table=123,idle_timeout=1,in_port=4,actions=controller" ovs-ofctl del-flows br0 done Here is our bridge br0 setup: [root@deepspace ~]# ifconfig br0 br0: flags=4163mtu 1500 inet 192.168.2.142 netmask 255.255.255.0 broadcast 192.168.2.255 inet6 fe80::213:3bff:fe0f:1301 prefixlen 64 scopeid 0x20 ether 00:13:3b:0f:13:01 txqueuelen 1000 (Ethernet) RX packets 89417814 bytes 12088012200 (11.2 GiB) RX errors 0 dropped 82 overruns 0 frame 0 TX packets 32330647 bytes 3168352394 (2.9 GiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 [root@deepspace ~]# ovs-vsctl show 54f89e00-edd2-486e-9626-6d11c7d8b0b6 Bridge "br0" Port "vport1" Interface "vport1" Port "br0" Interface "br0" type: internal Port vtep Interface vtep type: vxlan options: {key=flow, remote_ip="192.168.1.141"} Port "vport0" Interface "vport0" Port "enp5s0" Interface "enp5s0" ovs_version: “2.6.1" [root@deepspace ~]# ovs-ofctl show br0 OFPT_FEATURES_REPLY (xid=0x2): dpid:00133b0f1301 n_tables:254, n_buffers:256 capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst 1(enp5s0): addr:00:13:3b:0f:13:01 config: 0 state: 0 current:1GB-FD AUTO_NEG advertised: 10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-HD 1GB-FD COPPER AUTO_NEG AUTO_PAUSE AUTO_PAUSE_ASYM supported: 10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-HD 1GB-FD COPPER AUTO_NEG speed: 1000 Mbps now, 1000 Mbps max 2(vport0): addr:fe:00:00:00:00:03 config: 0 state: 0 current:10MB-FD COPPER speed: 10 Mbps now, 0 Mbps max 3(vport1): addr:fe:00:00:00:00:04 config: 0 state: 0 current:10MB-FD COPPER speed: 10 Mbps now, 0 Mbps max 4(vtep): addr:aa:97:d2:a9:19:ed config: 0 state: 0 speed: 0 Mbps now, 0 Mbps max LOCAL(br0): addr:00:13:3b:0f:13:01 config: 0 state: 0 speed: 0 Mbps now, 0 Mbps max OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0 [root@deepspace ~]# ovs-ofctl --version ovs-ofctl (Open vSwitch) 2.6.1 OpenFlow versions 0x1:0x4 Please let us know if you need anything else to reproduce this. Thanks, Sagar. On Jan 18, 2017, at 1:19 PM, Ben Pfaff wrote: If you can come up with simple reproduction instructions that work for me, I'm happy to track this down. It's probably something very simple. On Tue, Jan 17, 2017 at 08:50:20AM -0800, Vidyasagara Guntaka wrote: This issue happened on our in-use systems and we were trying to find a way to move forward avoiding this issue so that we do not have to upgrade OVS on thousands of our hypervisors causing down time. Our debugging did help us avoid the issue for now by installing an explicit rule to to drop packets when there is no match and this issue is not seen over many hours of test runs. We will definitely run this test with latest version. But, will need more time since we are busy with our release related activities. Regards, Sagar. On Tue, Jan 17, 2017 at 8:42 AM, Ben Pfaff
Re: [ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND
Hi Ben, We could reproduce this with the latest version 2.6.1. When we compiled the code, we removed -O2 from CFLAGS. This seems to make it happen more frequently. With the following script running, the error starts happening within a few seconds and then continues to happen after every few seconds. In summary, our suspicion is that having no controller set and having no NORMAL processing flow seems to trigger the stack that was pointed out in the gdb session more often and hence we hit this race condition very easily using the following script. (Even if there is a default NORMAL processing flow entry, after the first iteration of the script below, that will be deleted). Also, a few things about the setup - just in case: * enp5s0 belongs to the physical interface on this hypervisor. * vport0 and vport1 belong to tap interfaces corresponding to two VMs running on this hypervisor. * When the script was running, we had pings issued from the VMs so that packets make it to the bridge br0. Here is a small script that makes it happen on multiple of our hypervisors: while true do ovs-ofctl add-flow br0 "priority=200,table=123,idle_timeout=1,in_port=1,actions=controller" ovs-ofctl add-flow br0 "priority=200,table=123,idle_timeout=1,in_port=2,actions=controller" ovs-ofctl add-flow br0 "priority=200,table=123,idle_timeout=1,in_port=3,actions=controller" ovs-ofctl add-flow br0 "priority=200,table=123,idle_timeout=1,in_port=4,actions=controller" ovs-ofctl del-flows br0 done Here is our bridge br0 setup: [root@deepspace ~]# ifconfig br0 br0: flags=4163mtu 1500 inet 192.168.2.142 netmask 255.255.255.0 broadcast 192.168.2.255 inet6 fe80::213:3bff:fe0f:1301 prefixlen 64 scopeid 0x20 ether 00:13:3b:0f:13:01 txqueuelen 1000 (Ethernet) RX packets 89417814 bytes 12088012200 (11.2 GiB) RX errors 0 dropped 82 overruns 0 frame 0 TX packets 32330647 bytes 3168352394 (2.9 GiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 [root@deepspace ~]# ovs-vsctl show 54f89e00-edd2-486e-9626-6d11c7d8b0b6 Bridge "br0" Port "vport1" Interface "vport1" Port "br0" Interface "br0" type: internal Port vtep Interface vtep type: vxlan options: {key=flow, remote_ip="192.168.1.141"} Port "vport0" Interface "vport0" Port "enp5s0" Interface "enp5s0" ovs_version: “2.6.1" [root@deepspace ~]# ovs-ofctl show br0 OFPT_FEATURES_REPLY (xid=0x2): dpid:00133b0f1301 n_tables:254, n_buffers:256 capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst 1(enp5s0): addr:00:13:3b:0f:13:01 config: 0 state: 0 current:1GB-FD AUTO_NEG advertised: 10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-HD 1GB-FD COPPER AUTO_NEG AUTO_PAUSE AUTO_PAUSE_ASYM supported: 10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-HD 1GB-FD COPPER AUTO_NEG speed: 1000 Mbps now, 1000 Mbps max 2(vport0): addr:fe:00:00:00:00:03 config: 0 state: 0 current:10MB-FD COPPER speed: 10 Mbps now, 0 Mbps max 3(vport1): addr:fe:00:00:00:00:04 config: 0 state: 0 current:10MB-FD COPPER speed: 10 Mbps now, 0 Mbps max 4(vtep): addr:aa:97:d2:a9:19:ed config: 0 state: 0 speed: 0 Mbps now, 0 Mbps max LOCAL(br0): addr:00:13:3b:0f:13:01 config: 0 state: 0 speed: 0 Mbps now, 0 Mbps max OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0 [root@deepspace ~]# ovs-ofctl --version ovs-ofctl (Open vSwitch) 2.6.1 OpenFlow versions 0x1:0x4 Please let us know if you need anything else to reproduce this. Thanks, Sagar. > On Jan 18, 2017, at 1:19 PM, Ben Pfaff wrote: > > If you can come up with simple reproduction instructions that work for > me, I'm happy to track this down. It's probably something very simple. > > On Tue, Jan 17, 2017 at 08:50:20AM -0800, Vidyasagara Guntaka wrote: >> This issue happened on our in-use systems and we were trying to find a way >> to move forward avoiding this issue so that we do not have to upgrade OVS >> on thousands of our hypervisors causing down time. Our debugging did help >> us avoid the issue for now by installing an explicit rule to to drop >> packets when there is no match and this issue is not seen over many hours >> of test runs. >> >> We will definitely run this test with latest version. But, will need more >> time since we are busy with our release related activities. >> >> Regards, >> Sagar. >> >> On Tue, Jan 17, 2017 at 8:42 AM, Ben Pfaff wrote: >> >>> It would be more helpful to have a simple reproduction case. >>> >>>
Re: [ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND
If you can come up with simple reproduction instructions that work for me, I'm happy to track this down. It's probably something very simple. On Tue, Jan 17, 2017 at 08:50:20AM -0800, Vidyasagara Guntaka wrote: > This issue happened on our in-use systems and we were trying to find a way > to move forward avoiding this issue so that we do not have to upgrade OVS > on thousands of our hypervisors causing down time. Our debugging did help > us avoid the issue for now by installing an explicit rule to to drop > packets when there is no match and this issue is not seen over many hours > of test runs. > > We will definitely run this test with latest version. But, will need more > time since we are busy with our release related activities. > > Regards, > Sagar. > > On Tue, Jan 17, 2017 at 8:42 AM, Ben Pfaffwrote: > > > It would be more helpful to have a simple reproduction case. > > > > Why haven't you tried a newer version from branch-2.5? > > > > On Tue, Jan 17, 2017 at 07:59:05AM -0800, Vidyasagara Guntaka wrote: > > > Hi Ben, > > > > > > Here i is more debug information related to this incident (still using > > version 2.5.0): > > > > > > Summary : > > > > > > We think that there is some race condition involved in processing OF > > Controller connections and Packet miss processing in ovs-vswitchd. > > > > > > Reasoning : > > > > > > Please consider the following GDB Debug Session: > > > > > > Breakpoint 1, ofconn_set_protocol (ofconn=0x16d5810, > > protocol=OFPUTIL_P_OF10_STD) at ofproto/connmgr.c:999 > > > (gdb) f 2 > > > #2 0x0045f586 in connmgr_wants_packet_in_on_miss > > (mgr=0x16a6de0) at ofproto/connmgr.c:1613 > > > 1613 enum ofputil_protocol protocol = > > ofconn_get_protocol(ofconn); > > > (gdb) p *ofconn > > > $2 = {node = {prev = 0x16a6e18, next = 0x16a6e18}, hmap_node = {hash = > > 0, next = 0x0}, connmgr = 0x16a6de0, rconn = 0x16edb50, type = > > OFCONN_SERVICE, band = OFPROTO_IN_BAND, enable_async_msgs = true, > > > role = OFPCR12_ROLE_EQUAL, protocol = OFPUTIL_P_OF10_STD_TID, > > packet_in_format = NXPIF_OPENFLOW10, packet_in_counter = 0x167a170, > > schedulers = {0x0, 0x0}, pktbuf = 0x0, miss_send_len = 0, > > > controller_id = 0, reply_counter = 0x1673190, master_async_config = > > {3, 7, 7, 0, 0, 0}, slave_async_config = {0, 7, 0, 0, 0, 0}, n_add = 0, > > n_delete = 0, n_modify = 0, > > > first_op = -9223372036854775808, last_op = -9223372036854775808, > > next_op_report = 9223372036854775807, op_backoff = -9223372036854775808, > > monitors = {buckets = 0x16d58f0, one = 0x0, mask = 0, > > > n = 0}, monitor_paused = 0, monitor_counter = 0x16759f0, updates = > > {prev = 0x16d5918, next = 0x16d5918}, sent_abbrev_update = false, bundles = > > {buckets = 0x16d5938, one = 0x0, mask = 0, n = 0}} > > > (gdb) bt > > > #0 ofconn_set_protocol (ofconn=0x16d5810, protocol=OFPUTIL_P_OF10_STD) > > at ofproto/connmgr.c:999 > > > #1 0x0045e194 in ofconn_get_protocol (ofconn=0x16d5810) at > > ofproto/connmgr.c:982 > > > #2 0x0045f586 in connmgr_wants_packet_in_on_miss > > (mgr=0x16a6de0) at ofproto/connmgr.c:1613 > > > #3 0x00435261 in rule_dpif_lookup_from_table > > (ofproto=0x16a6880, version=323, flow=0x7f2ace7f86e8, wc=0x7f2ace7f84b0, > > stats=0x0, table_id=0x7f2ace7f7eda "", in_port=28, may_packet_in=true, > > > honor_table_miss=true) at ofproto/ofproto-dpif.c:3973 > > > #4 0x00457ecf in xlate_actions (xin=0x7f2ace7f86e0, > > xout=0x7f2ace7f8010) at ofproto/ofproto-dpif-xlate.c:5188 > > > #5 0x004481b1 in revalidate_ukey (udpif=0x16a7300, > > ukey=0x7f2ab80060e0, stats=0x7f2ace7f94e0, odp_actions=0x7f2ace7f8a40, > > reval_seq=585728, recircs=0x7f2ace7f8a30) > > > at ofproto/ofproto-dpif-upcall.c:1866 > > > #6 0x00448fb2 in revalidate (revalidator=0x1691990) at > > ofproto/ofproto-dpif-upcall.c:2186 > > > #7 0x0044593e in udpif_revalidator (arg=0x1691990) at > > ofproto/ofproto-dpif-upcall.c:862 > > > #8 0x0050b93d in ovsthread_wrapper (aux_=0x16f4560) at > > lib/ovs-thread.c:340 > > > #9 0x7f2ad75c2184 in start_thread () from /lib/x86_64-linux-gnu/ > > libpthread.so.0 > > > #10 0x7f2ad6de137d in clone () from /lib/x86_64-linux-gnu/libc.so.6 > > > (gdb) f 1 > > > #1 0x0045e194 in ofconn_get_protocol (ofconn=0x16d5810) at > > ofproto/connmgr.c:982 > > > 982 ofconn_set_protocol(CONST_CAST(struct ofconn *, > > ofconn), > > > (gdb) l > > > 977 { > > > 978 if (ofconn->protocol == OFPUTIL_P_NONE && > > > 979 rconn_is_connected(ofconn->rconn)) { > > > 980 int version = rconn_get_version(ofconn->rconn); > > > 981 if (version > 0) { > > > 982 ofconn_set_protocol(CONST_CAST(struct ofconn *, > > ofconn), > > > 983 ofputil_protocol_from_ofp_ > > version(version)); > > > 984 } > > > 985 } > > > 986 > > > (gdb) p *ofconn > > > $3 = {node = {prev =
Re: [ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND
This issue happened on our in-use systems and we were trying to find a way to move forward avoiding this issue so that we do not have to upgrade OVS on thousands of our hypervisors causing down time. Our debugging did help us avoid the issue for now by installing an explicit rule to to drop packets when there is no match and this issue is not seen over many hours of test runs. We will definitely run this test with latest version. But, will need more time since we are busy with our release related activities. Regards, Sagar. On Tue, Jan 17, 2017 at 8:42 AM, Ben Pfaffwrote: > It would be more helpful to have a simple reproduction case. > > Why haven't you tried a newer version from branch-2.5? > > On Tue, Jan 17, 2017 at 07:59:05AM -0800, Vidyasagara Guntaka wrote: > > Hi Ben, > > > > Here i is more debug information related to this incident (still using > version 2.5.0): > > > > Summary : > > > > We think that there is some race condition involved in processing OF > Controller connections and Packet miss processing in ovs-vswitchd. > > > > Reasoning : > > > > Please consider the following GDB Debug Session: > > > > Breakpoint 1, ofconn_set_protocol (ofconn=0x16d5810, > protocol=OFPUTIL_P_OF10_STD) at ofproto/connmgr.c:999 > > (gdb) f 2 > > #2 0x0045f586 in connmgr_wants_packet_in_on_miss > (mgr=0x16a6de0) at ofproto/connmgr.c:1613 > > 1613 enum ofputil_protocol protocol = > ofconn_get_protocol(ofconn); > > (gdb) p *ofconn > > $2 = {node = {prev = 0x16a6e18, next = 0x16a6e18}, hmap_node = {hash = > 0, next = 0x0}, connmgr = 0x16a6de0, rconn = 0x16edb50, type = > OFCONN_SERVICE, band = OFPROTO_IN_BAND, enable_async_msgs = true, > > role = OFPCR12_ROLE_EQUAL, protocol = OFPUTIL_P_OF10_STD_TID, > packet_in_format = NXPIF_OPENFLOW10, packet_in_counter = 0x167a170, > schedulers = {0x0, 0x0}, pktbuf = 0x0, miss_send_len = 0, > > controller_id = 0, reply_counter = 0x1673190, master_async_config = > {3, 7, 7, 0, 0, 0}, slave_async_config = {0, 7, 0, 0, 0, 0}, n_add = 0, > n_delete = 0, n_modify = 0, > > first_op = -9223372036854775808, last_op = -9223372036854775808, > next_op_report = 9223372036854775807, op_backoff = -9223372036854775808, > monitors = {buckets = 0x16d58f0, one = 0x0, mask = 0, > > n = 0}, monitor_paused = 0, monitor_counter = 0x16759f0, updates = > {prev = 0x16d5918, next = 0x16d5918}, sent_abbrev_update = false, bundles = > {buckets = 0x16d5938, one = 0x0, mask = 0, n = 0}} > > (gdb) bt > > #0 ofconn_set_protocol (ofconn=0x16d5810, protocol=OFPUTIL_P_OF10_STD) > at ofproto/connmgr.c:999 > > #1 0x0045e194 in ofconn_get_protocol (ofconn=0x16d5810) at > ofproto/connmgr.c:982 > > #2 0x0045f586 in connmgr_wants_packet_in_on_miss > (mgr=0x16a6de0) at ofproto/connmgr.c:1613 > > #3 0x00435261 in rule_dpif_lookup_from_table > (ofproto=0x16a6880, version=323, flow=0x7f2ace7f86e8, wc=0x7f2ace7f84b0, > stats=0x0, table_id=0x7f2ace7f7eda "", in_port=28, may_packet_in=true, > > honor_table_miss=true) at ofproto/ofproto-dpif.c:3973 > > #4 0x00457ecf in xlate_actions (xin=0x7f2ace7f86e0, > xout=0x7f2ace7f8010) at ofproto/ofproto-dpif-xlate.c:5188 > > #5 0x004481b1 in revalidate_ukey (udpif=0x16a7300, > ukey=0x7f2ab80060e0, stats=0x7f2ace7f94e0, odp_actions=0x7f2ace7f8a40, > reval_seq=585728, recircs=0x7f2ace7f8a30) > > at ofproto/ofproto-dpif-upcall.c:1866 > > #6 0x00448fb2 in revalidate (revalidator=0x1691990) at > ofproto/ofproto-dpif-upcall.c:2186 > > #7 0x0044593e in udpif_revalidator (arg=0x1691990) at > ofproto/ofproto-dpif-upcall.c:862 > > #8 0x0050b93d in ovsthread_wrapper (aux_=0x16f4560) at > lib/ovs-thread.c:340 > > #9 0x7f2ad75c2184 in start_thread () from /lib/x86_64-linux-gnu/ > libpthread.so.0 > > #10 0x7f2ad6de137d in clone () from /lib/x86_64-linux-gnu/libc.so.6 > > (gdb) f 1 > > #1 0x0045e194 in ofconn_get_protocol (ofconn=0x16d5810) at > ofproto/connmgr.c:982 > > 982 ofconn_set_protocol(CONST_CAST(struct ofconn *, > ofconn), > > (gdb) l > > 977 { > > 978 if (ofconn->protocol == OFPUTIL_P_NONE && > > 979 rconn_is_connected(ofconn->rconn)) { > > 980 int version = rconn_get_version(ofconn->rconn); > > 981 if (version > 0) { > > 982 ofconn_set_protocol(CONST_CAST(struct ofconn *, > ofconn), > > 983 ofputil_protocol_from_ofp_ > version(version)); > > 984 } > > 985 } > > 986 > > (gdb) p *ofconn > > $3 = {node = {prev = 0x16a6e18, next = 0x16a6e18}, hmap_node = {hash = > 0, next = 0x0}, connmgr = 0x16a6de0, rconn = 0x16edb50, type = > OFCONN_SERVICE, band = OFPROTO_IN_BAND, enable_async_msgs = true, > > role = OFPCR12_ROLE_EQUAL, protocol = OFPUTIL_P_OF10_STD_TID, > packet_in_format = NXPIF_OPENFLOW10, packet_in_counter = 0x167a170, > schedulers = {0x0, 0x0}, pktbuf = 0x0, miss_send_len = 0, > > controller_id = 0, reply_counter =
Re: [ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND
It would be more helpful to have a simple reproduction case. Why haven't you tried a newer version from branch-2.5? On Tue, Jan 17, 2017 at 07:59:05AM -0800, Vidyasagara Guntaka wrote: > Hi Ben, > > Here i is more debug information related to this incident (still using > version 2.5.0): > > Summary : > > We think that there is some race condition involved in processing OF > Controller connections and Packet miss processing in ovs-vswitchd. > > Reasoning : > > Please consider the following GDB Debug Session: > > Breakpoint 1, ofconn_set_protocol (ofconn=0x16d5810, > protocol=OFPUTIL_P_OF10_STD) at ofproto/connmgr.c:999 > (gdb) f 2 > #2 0x0045f586 in connmgr_wants_packet_in_on_miss (mgr=0x16a6de0) at > ofproto/connmgr.c:1613 > 1613 enum ofputil_protocol protocol = ofconn_get_protocol(ofconn); > (gdb) p *ofconn > $2 = {node = {prev = 0x16a6e18, next = 0x16a6e18}, hmap_node = {hash = 0, > next = 0x0}, connmgr = 0x16a6de0, rconn = 0x16edb50, type = OFCONN_SERVICE, > band = OFPROTO_IN_BAND, enable_async_msgs = true, > role = OFPCR12_ROLE_EQUAL, protocol = OFPUTIL_P_OF10_STD_TID, > packet_in_format = NXPIF_OPENFLOW10, packet_in_counter = 0x167a170, > schedulers = {0x0, 0x0}, pktbuf = 0x0, miss_send_len = 0, > controller_id = 0, reply_counter = 0x1673190, master_async_config = {3, 7, > 7, 0, 0, 0}, slave_async_config = {0, 7, 0, 0, 0, 0}, n_add = 0, n_delete = > 0, n_modify = 0, > first_op = -9223372036854775808, last_op = -9223372036854775808, > next_op_report = 9223372036854775807, op_backoff = -9223372036854775808, > monitors = {buckets = 0x16d58f0, one = 0x0, mask = 0, > n = 0}, monitor_paused = 0, monitor_counter = 0x16759f0, updates = {prev > = 0x16d5918, next = 0x16d5918}, sent_abbrev_update = false, bundles = > {buckets = 0x16d5938, one = 0x0, mask = 0, n = 0}} > (gdb) bt > #0 ofconn_set_protocol (ofconn=0x16d5810, protocol=OFPUTIL_P_OF10_STD) at > ofproto/connmgr.c:999 > #1 0x0045e194 in ofconn_get_protocol (ofconn=0x16d5810) at > ofproto/connmgr.c:982 > #2 0x0045f586 in connmgr_wants_packet_in_on_miss (mgr=0x16a6de0) at > ofproto/connmgr.c:1613 > #3 0x00435261 in rule_dpif_lookup_from_table (ofproto=0x16a6880, > version=323, flow=0x7f2ace7f86e8, wc=0x7f2ace7f84b0, stats=0x0, > table_id=0x7f2ace7f7eda "", in_port=28, may_packet_in=true, > honor_table_miss=true) at ofproto/ofproto-dpif.c:3973 > #4 0x00457ecf in xlate_actions (xin=0x7f2ace7f86e0, > xout=0x7f2ace7f8010) at ofproto/ofproto-dpif-xlate.c:5188 > #5 0x004481b1 in revalidate_ukey (udpif=0x16a7300, > ukey=0x7f2ab80060e0, stats=0x7f2ace7f94e0, odp_actions=0x7f2ace7f8a40, > reval_seq=585728, recircs=0x7f2ace7f8a30) > at ofproto/ofproto-dpif-upcall.c:1866 > #6 0x00448fb2 in revalidate (revalidator=0x1691990) at > ofproto/ofproto-dpif-upcall.c:2186 > #7 0x0044593e in udpif_revalidator (arg=0x1691990) at > ofproto/ofproto-dpif-upcall.c:862 > #8 0x0050b93d in ovsthread_wrapper (aux_=0x16f4560) at > lib/ovs-thread.c:340 > #9 0x7f2ad75c2184 in start_thread () from > /lib/x86_64-linux-gnu/libpthread.so.0 > #10 0x7f2ad6de137d in clone () from /lib/x86_64-linux-gnu/libc.so.6 > (gdb) f 1 > #1 0x0045e194 in ofconn_get_protocol (ofconn=0x16d5810) at > ofproto/connmgr.c:982 > 982 ofconn_set_protocol(CONST_CAST(struct ofconn *, ofconn), > (gdb) l > 977 { > 978 if (ofconn->protocol == OFPUTIL_P_NONE && > 979 rconn_is_connected(ofconn->rconn)) { > 980 int version = rconn_get_version(ofconn->rconn); > 981 if (version > 0) { > 982 ofconn_set_protocol(CONST_CAST(struct ofconn *, ofconn), > 983 > ofputil_protocol_from_ofp_version(version)); > 984 } > 985 } > 986 > (gdb) p *ofconn > $3 = {node = {prev = 0x16a6e18, next = 0x16a6e18}, hmap_node = {hash = 0, > next = 0x0}, connmgr = 0x16a6de0, rconn = 0x16edb50, type = OFCONN_SERVICE, > band = OFPROTO_IN_BAND, enable_async_msgs = true, > role = OFPCR12_ROLE_EQUAL, protocol = OFPUTIL_P_OF10_STD_TID, > packet_in_format = NXPIF_OPENFLOW10, packet_in_counter = 0x167a170, > schedulers = {0x0, 0x0}, pktbuf = 0x0, miss_send_len = 0, > controller_id = 0, reply_counter = 0x1673190, master_async_config = {3, 7, > 7, 0, 0, 0}, slave_async_config = {0, 7, 0, 0, 0, 0}, n_add = 0, n_delete = > 0, n_modify = 0, > first_op = -9223372036854775808, last_op = -9223372036854775808, > next_op_report = 9223372036854775807, op_backoff = -9223372036854775808, > monitors = {buckets = 0x16d58f0, one = 0x0, mask = 0, > n = 0}, monitor_paused = 0, monitor_counter = 0x16759f0, updates = {prev > = 0x16d5918, next = 0x16d5918}, sent_abbrev_update = false, bundles = > {buckets = 0x16d5938, one = 0x0, mask = 0, n = 0}} > (gdb) p ofconn > $4 = (const struct ofconn *) 0x16d5810 > (gdb) c > Continuing. > [Thread 0x7f2ad79f5980 (LWP 20165) exited] > >
Re: [ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND
Hi Ben, Here i is more debug information related to this incident (still using version 2.5.0): Summary : We think that there is some race condition involved in processing OF Controller connections and Packet miss processing in ovs-vswitchd. Reasoning : Please consider the following GDB Debug Session: Breakpoint 1, ofconn_set_protocol (ofconn=0x16d5810, protocol=OFPUTIL_P_OF10_STD) at ofproto/connmgr.c:999 (gdb) f 2 #2 0x0045f586 in connmgr_wants_packet_in_on_miss (mgr=0x16a6de0) at ofproto/connmgr.c:1613 1613enum ofputil_protocol protocol = ofconn_get_protocol(ofconn); (gdb) p *ofconn $2 = {node = {prev = 0x16a6e18, next = 0x16a6e18}, hmap_node = {hash = 0, next = 0x0}, connmgr = 0x16a6de0, rconn = 0x16edb50, type = OFCONN_SERVICE, band = OFPROTO_IN_BAND, enable_async_msgs = true, role = OFPCR12_ROLE_EQUAL, protocol = OFPUTIL_P_OF10_STD_TID, packet_in_format = NXPIF_OPENFLOW10, packet_in_counter = 0x167a170, schedulers = {0x0, 0x0}, pktbuf = 0x0, miss_send_len = 0, controller_id = 0, reply_counter = 0x1673190, master_async_config = {3, 7, 7, 0, 0, 0}, slave_async_config = {0, 7, 0, 0, 0, 0}, n_add = 0, n_delete = 0, n_modify = 0, first_op = -9223372036854775808, last_op = -9223372036854775808, next_op_report = 9223372036854775807, op_backoff = -9223372036854775808, monitors = {buckets = 0x16d58f0, one = 0x0, mask = 0, n = 0}, monitor_paused = 0, monitor_counter = 0x16759f0, updates = {prev = 0x16d5918, next = 0x16d5918}, sent_abbrev_update = false, bundles = {buckets = 0x16d5938, one = 0x0, mask = 0, n = 0}} (gdb) bt #0 ofconn_set_protocol (ofconn=0x16d5810, protocol=OFPUTIL_P_OF10_STD) at ofproto/connmgr.c:999 #1 0x0045e194 in ofconn_get_protocol (ofconn=0x16d5810) at ofproto/connmgr.c:982 #2 0x0045f586 in connmgr_wants_packet_in_on_miss (mgr=0x16a6de0) at ofproto/connmgr.c:1613 #3 0x00435261 in rule_dpif_lookup_from_table (ofproto=0x16a6880, version=323, flow=0x7f2ace7f86e8, wc=0x7f2ace7f84b0, stats=0x0, table_id=0x7f2ace7f7eda "", in_port=28, may_packet_in=true, honor_table_miss=true) at ofproto/ofproto-dpif.c:3973 #4 0x00457ecf in xlate_actions (xin=0x7f2ace7f86e0, xout=0x7f2ace7f8010) at ofproto/ofproto-dpif-xlate.c:5188 #5 0x004481b1 in revalidate_ukey (udpif=0x16a7300, ukey=0x7f2ab80060e0, stats=0x7f2ace7f94e0, odp_actions=0x7f2ace7f8a40, reval_seq=585728, recircs=0x7f2ace7f8a30) at ofproto/ofproto-dpif-upcall.c:1866 #6 0x00448fb2 in revalidate (revalidator=0x1691990) at ofproto/ofproto-dpif-upcall.c:2186 #7 0x0044593e in udpif_revalidator (arg=0x1691990) at ofproto/ofproto-dpif-upcall.c:862 #8 0x0050b93d in ovsthread_wrapper (aux_=0x16f4560) at lib/ovs-thread.c:340 #9 0x7f2ad75c2184 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #10 0x7f2ad6de137d in clone () from /lib/x86_64-linux-gnu/libc.so.6 (gdb) f 1 #1 0x0045e194 in ofconn_get_protocol (ofconn=0x16d5810) at ofproto/connmgr.c:982 982 ofconn_set_protocol(CONST_CAST(struct ofconn *, ofconn), (gdb) l 977 { 978 if (ofconn->protocol == OFPUTIL_P_NONE && 979 rconn_is_connected(ofconn->rconn)) { 980 int version = rconn_get_version(ofconn->rconn); 981 if (version > 0) { 982 ofconn_set_protocol(CONST_CAST(struct ofconn *, ofconn), 983 ofputil_protocol_from_ofp_version(version)); 984 } 985 } 986 (gdb) p *ofconn $3 = {node = {prev = 0x16a6e18, next = 0x16a6e18}, hmap_node = {hash = 0, next = 0x0}, connmgr = 0x16a6de0, rconn = 0x16edb50, type = OFCONN_SERVICE, band = OFPROTO_IN_BAND, enable_async_msgs = true, role = OFPCR12_ROLE_EQUAL, protocol = OFPUTIL_P_OF10_STD_TID, packet_in_format = NXPIF_OPENFLOW10, packet_in_counter = 0x167a170, schedulers = {0x0, 0x0}, pktbuf = 0x0, miss_send_len = 0, controller_id = 0, reply_counter = 0x1673190, master_async_config = {3, 7, 7, 0, 0, 0}, slave_async_config = {0, 7, 0, 0, 0, 0}, n_add = 0, n_delete = 0, n_modify = 0, first_op = -9223372036854775808, last_op = -9223372036854775808, next_op_report = 9223372036854775807, op_backoff = -9223372036854775808, monitors = {buckets = 0x16d58f0, one = 0x0, mask = 0, n = 0}, monitor_paused = 0, monitor_counter = 0x16759f0, updates = {prev = 0x16d5918, next = 0x16d5918}, sent_abbrev_update = false, bundles = {buckets = 0x16d5938, one = 0x0, mask = 0, n = 0}} (gdb) p ofconn $4 = (const struct ofconn *) 0x16d5810 (gdb) c Continuing. [Thread 0x7f2ad79f5980 (LWP 20165) exited] >From the above GDB Session, ovs-vswitchd is in the middle of processing a >packet miss that was read from the data path. The break point was set inside ofconn_set_protocol inside so that we hit if protocol was already set to other than OFPUTIL_P_NONE and now is being set to OFPUTIL_P_OF10_STD. Yes, we modified the code in ofconn_set_protocol with this if
Re: [ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND
Thanks for the quick follow up Ben, So we'll indeed try against latest versions to rule out the possibility of a bug that has been fixed already although I could not find any commit with such mention. We'll report back here. At this moment, we can reproduce over and over within minutes. We've nailed down -- or at least we think -- to something related to race condition or memory overwrite between the time connection negotiation happens and the flow mod packet arrives. We've been able to run the stress test for hours when we used --flow-format=NXM+table_id as an argument to ovs-ofctl but eventually, we've hit the same error. Sagar is spending more time debugging this issue so maybe he'll be able to provide more information. On Fri, Jan 13, 2017 at 1:45 PM, Ben Pfaffwrote: > On Thu, Jan 12, 2017 at 03:54:42PM -0500, Samuel Jean via dev wrote: > > It seems that shelling out to ovs-ofctl very quickly can lead to bug > where > > it reports an OFPT_ERROR. > > > > We were able to constantly reproduce within minutes of running the above > > flow modifications on Unbutu. > > > > Any help, hints or guidance would be appreciated. I'd be happy to pursue > > some debugging that would be required to nail down the issue here. > > Thanks for the bug report and especially for the detailed reproduction > advice. > > I've now tried running this reproduction case against Open vSwitch from > latest master and against the latest versions from the 2.6.x and 2.5.x > branches, and I can't see any failures even after letting the script run > for a few minutes. > > Maybe you should try 2.5.1 or the latest from branch-2.5 and see if it > fixes the problem? And if not, then we'll have to figure out what's > different between your setup and mine. > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND
On Thu, Jan 12, 2017 at 03:54:42PM -0500, Samuel Jean via dev wrote: > It seems that shelling out to ovs-ofctl very quickly can lead to bug where > it reports an OFPT_ERROR. > > We were able to constantly reproduce within minutes of running the above > flow modifications on Unbutu. > > Any help, hints or guidance would be appreciated. I'd be happy to pursue > some debugging that would be required to nail down the issue here. Thanks for the bug report and especially for the detailed reproduction advice. I've now tried running this reproduction case against Open vSwitch from latest master and against the latest versions from the 2.6.x and 2.5.x branches, and I can't see any failures even after letting the script run for a few minutes. Maybe you should try 2.5.1 or the latest from branch-2.5 and see if it fixes the problem? And if not, then we'll have to figure out what's different between your setup and mine. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND
Hi, It seems that shelling out to ovs-ofctl very quickly can lead to bug where it reports an OFPT_ERROR. We were able to constantly reproduce within minutes of running the above flow modifications on Unbutu. Any help, hints or guidance would be appreciated. I'd be happy to pursue some debugging that would be required to nail down the issue here. Best regards, Sam Jean # cat ./ofpcrash.sh #!/bin/sh ovs-ofctl add-flow br0 'priority=100,table=25,idle_timeout=0,actions=resubmit(,35)' || exit 1 ovs-ofctl add-flow br0 'priority=100,table=35,idle_timeout=0,actions=resubmit(,45)' || exit 1 ovs-ofctl add-flow br0 'priority=100,table=45,idle_timeout=0,actions=resubmit(,50)' || exit 1 ovs-ofctl add-flow br0 'priority=100,table=50,idle_timeout=0,actions=resubmit(,65)' || exit 1 ovs-ofctl add-flow br0 'priority=100,table=65,idle_timeout=0,actions=output:1' || exit 1 ovs-ofctl add-flow br0 'priority=3000,ip,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00,table=0,idle_timeout=0,actions=drop' || exit 1 ovs-ofctl add-flow br0 'priority=1000,ip,in_port=1,dl_dst=0c:01:00:12:cf:01,table=0,idle_timeout=0,cookie=20250774994944,actions=resubmit(,25)' || exit 1 # while true; do ./ofpcrash.sh || break; done OFPT_ERROR (xid=0x4): OFPFMFC_BAD_COMMAND OFPT_FLOW_MOD (xid=0x4): (***truncated to 64 bytes from 88***) 01 0e 00 58 00 00 00 04-00 38 20 ff 00 00 00 00 |...X.8 .| 0010 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 || 0020 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 || 0030 00 00 00 00 00 00 00 00-32 00 00 00 00 00 00 64 |2..d| # ovs-ofctl --version ovs-ofctl (Open vSwitch) 2.5.0 Compiled Sep 15 2016 12:55:18 OpenFlow versions 0x1:0x4 # uname -srmvp Linux 3.13.0-100-generic #147-Ubuntu SMP Tue Oct 18 16:48:51 UTC 2016 x86_64 x86_64 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev