No, we didn’t. We downgraded our environment back to OVS 2.3.3, then actively migrated everything from OVS to linuxbridge in Neutron. I talked to Ben Pfaff at an OpenStack conference about this, and he mentioned that there might be a bug around this in LACP/bonding with certain switch configs. If you can consistently reproduce the issue, I believe he was willing to help do some debugging to try to flush out the bug.
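
For anyone trying to reproduce this consistently, here is a minimal sketch of one way to count MAC-move notifications per host from saved switch logs. It assumes the log lines have the same shape as the Nexus `%FWM-6-MAC_MOVE_NOTIFICATION` lines quoted further down in this thread; the names `count_mac_moves` and `SAMPLE_LOG` are hypothetical and only for illustration.

```python
import re
from collections import Counter

# Matches the Nexus MAC-move lines quoted in this thread (assumed format).
MAC_MOVE_RE = re.compile(
    r"%FWM-6-MAC_MOVE_NOTIFICATION: Host "
    r"(?P<mac>[0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4}) "
    r"in vlan (?P<vlan>\d+) is flapping between port "
    r"(?P<src>\S+) and port (?P<dst>\S+)"
)


def count_mac_moves(lines):
    """Return a Counter keyed by (mac, vlan) with the number of move notifications."""
    counts = Counter()
    for line in lines:
        match = MAC_MOVE_RE.search(line)
        if match:
            counts[(match.group("mac"), int(match.group("vlan")))] += 1
    return counts


# The two switch log lines from the original report, used as sample input.
SAMPLE_LOG = [
    "2017 Feb 23 12:11:20 lfassi0114-02 %FWM-6-MAC_MOVE_NOTIFICATION: "
    "Host fa16.3ead.e6cf in vlan 413 is flapping between port Po19 and port Po22",
    "2017 Feb 23 12:11:21 lfassi0114-02 %FWM-6-MAC_MOVE_NOTIFICATION: "
    "Host fa16.3ead.e6cf in vlan 413 is flapping between port Po22 and port Po19",
]

if __name__ == "__main__":
    # Print hosts ordered by how often the switch saw them move.
    for (mac, vlan), moves in count_mac_moves(SAMPLE_LOG).most_common():
        print(f"{mac} vlan {vlan}: {moves} moves")
```

Running this over the switch logs for a given time window should show which MACs flap most often, which may help narrow down a traffic pattern that triggers the behavior reliably.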

From: Ray Li <[email protected]>
Date: Thursday, October 18, 2018 at 12:51 PM
To: "[email protected]" <[email protected]>
Cc: "Kris G. Lindgren" <[email protected]>, "[email protected]" <[email protected]>
Subject: Re: [ovs-discuss] OVS 2.5.1 in an LACP bond does not correctly handle unicast flooding

Hi Kris, did you ever fix this issue? We're also seeing similar problems running balance-tcp bonds on OVS 2.5.0.

Thanks,
Ray

On Mon, Feb 27, 2017 at 4:37 AM O'Reilly, Darragh <[email protected]> wrote:

Hi,

I’m also running Neutron provider networks on OVS 2.5 (DPDK) with LACP (balance-tcp), but I do not see this problem.

OVS should not output a packet to the bundle it came in on:
https://github.com/openvswitch/ovs/blob/branch-2.5/ofproto/ofproto-dpif-xlate.c#L2321

I have no idea why it could be happening. But it does remind me of a problem with buggy NIC firmware in a blade system that was reflecting some outbound packets back in and confusing the OVS learning tables.

Try looking at:
watch -n1 "ovs-appctl fdb/show br-ext"

Do the OVS logs have anything?

You could try a Linux bond and see if it makes a difference.

Darragh.

From: [email protected] [mailto:[email protected]] On Behalf Of Kris G. Lindgren
Sent: 25 February 2017 17:36
To: [email protected]
Subject: [ovs-discuss] OVS 2.5.1 in an LACP bond does not correctly handle unicast flooding

We recently upgraded from OVS 2.3.3 to OVS 2.5.1. After upgrading, we started getting MACs for VMs and HVs learned on ports that they were not connected to. After a long investigation we were able to see that OVS does not correctly handle unicast flooding: OVS floods traffic that is not destined to a local MAC back out one of the bond members.

In the switch we see:

2017 Feb 23 12:11:20 lfassi0114-02 %FWM-6-MAC_MOVE_NOTIFICATION: Host fa16.3ead.e6cf in vlan 413 is flapping between port Po19 and port Po22
2017 Feb 23 12:11:21 lfassi0114-02 %FWM-6-MAC_MOVE_NOTIFICATION: Host fa16.3ead.e6cf in vlan 413 is flapping between port Po22 and port Po19

On the host connected to Po22 (which is not where fa16.3ead.e6cf lives) we see:

12:11:20.374794 fa:16:3e:ad:e6:cf > 00:00:0c:9f:f0:01, ethertype 802.1Q (0x8100), length 64: vlan 413, p 0, ethertype ARP, Request who-has 10.198.39.254 tell 10.198.38.178, length 46
12:11:20.374941 fa:16:3e:ad:e6:cf > 00:00:0c:9f:f0:01, ethertype 802.1Q (0x8100), length 64: vlan 413, p 0, ethertype ARP, Request who-has 10.198.39.254 tell 10.198.38.178, length 46
12:11:20.376145 00:00:0c:9f:f0:01 > fa:16:3e:ad:e6:cf, ethertype 802.1Q (0x8100), length 64: vlan 413, p 0, ethertype ARP, Reply 10.198.39.254 is-at 00:00:0c:9f:f0:01, length 46
12:11:21.374628 fa:16:3e:ad:e6:cf > 00:00:0c:9f:f0:01, ethertype 802.1Q (0x8100), length 64: vlan 413, p 0, ethertype ARP, Request who-has 10.198.39.254 tell 10.198.38.178, length 46
12:11:21.375057 00:00:0c:9f:f0:01 > fa:16:3e:ad:e6:cf, ethertype 802.1Q (0x8100), length 64: vlan 413, p 0, ethertype ARP, Reply 10.198.39.254 is-at 00:00:0c:9f:f0:01, length 46
12:11:22.374578 fa:16:3e:ad:e6:cf > 00:00:0c:9f:f0:01, ethertype 802.1Q (0x8100), length 64: vlan 413, p 0, ethertype ARP, Request who-has 10.198.39.254 tell 10.198.38.178, length 46

By using a span port in the network that spans only traffic sent from the server, we were also able to see that traffic destined to 00:00:0c:9f:f0:01 was sent back out. In this case 00:00:0c:9f:f0:01 is the virtual MAC of the HSRP gateway. Under Cisco Nexus 3k (I assume other Nexus products as well), when configured with VPC/LACP/HSRP, any traffic destined to the virtual MAC of the HSRP gateway that ends up on the non-active HSRP side will get flooded to all ports on the non-active side.
This is done so that the ARP packet is seen by the active side. This is how this config from Cisco has worked since day one. We have also seen this happen in bursts, where the switch will see 26k+ MAC moves in a minute, go into defense mode, and stop MAC learning. We haven’t been able to specifically catch a large storm event, but given the way OVS is handling unicast flooding of ARP packets, we have no reason to believe it won’t treat unicast flooding of other traffic the exact same way.

Under OVS 2.3.3 the unicast flooding was handled correctly: the packets were dropped and not flooded back out the bond member. Changing the bonding mode from balance-slb to active-backup or balance-tcp makes no difference; the unicast traffic is still flooded back out the bond.

Our OVS config is as follows:

ovs-vsctl:
ac83a7ff-0157-437c-bfba-8c038ec77c74
    Bridge br-ext
        Port br-ext
            Interface br-ext
                type: internal
        Port "bond0"
            Interface "p3p1"
            Interface "p3p2"
        Port "mgmt0"
            Interface "mgmt0"
                type: internal
        Port "ext-vlan-215"
            tag: 215
            Interface "ext-vlan-215"
                type: patch
                options: {peer="br215-ext"}
    Bridge br-int
        fail_mode: secure
        Port "int-br215"
            Interface "int-br215"
                type: patch
                options: {peer="phy-br215"}
        Port "qvo99ae272d-f8"
            tag: 1
            Interface "qvo99ae272d-f8"
        Port "qvo1d5492c0-df"
            tag: 1
            Interface "qvo1d5492c0-df"
        Port br-int
            Interface br-int
                type: internal
        Port "qvo6b7f3219-90"
            tag: 1
            Interface "qvo6b7f3219-90"
        Port "qvo3b4f81ed-f4"
            tag: 1
            Interface "qvo3b4f81ed-f4"
    Bridge "br215"
        Port "br215"
            Interface "br215"
                type: internal
        Port "phy-br215"
            Interface "phy-br215"
                type: patch
                options: {peer="int-br215"}
        Port "br215-ext"
            Interface "br215-ext"
                type: patch
                options: {peer="ext-vlan-215"}
    ovs_version: "2.5.1"

# ovs-appctl bond/show
---- bond0 ----
bond_mode: balance-slb
bond may use recirculation: no, Recirc-ID : -1
bond-hash-basis: 0
updelay: 0 ms
downdelay: 0 ms
next rebalance: 2426 ms
lacp_status: negotiated
active slave mac: 00:8c:fa:eb:2b:74(p3p1)

slave p3p1: enabled
    active slave
    may_enable: true
    hash 140: 154 kB load

slave p3p2: enabled
    may_enable: true
    hash 199: 69 kB load
    hash 220: 40 kB load
    hash 234: 21 kB load

# ovs-appctl lacp/show
---- bond0 ----
    status: active negotiated
    sys_id: 00:8c:fa:eb:2b:74
    sys_priority: 65534
    aggregation key: 9
    lacp_time: slow

slave: p3p1: current attached
    port_id: 9
    port_priority: 65535
    may_enable: true

    actor sys_id: 00:8c:fa:eb:2b:74
    actor sys_priority: 65534
    actor port_id: 9
    actor port_priority: 65535
    actor key: 9
    actor state: activity aggregation synchronized collecting distributing

    partner sys_id: 02:1c:73:87:60:cd
    partner sys_priority: 32768
    partner port_id: 52
    partner port_priority: 32768
    partner key: 52
    partner state: activity aggregation synchronized collecting distributing

slave: p3p2: current attached
    port_id: 10
    port_priority: 65535
    may_enable: true

    actor sys_id: 00:8c:fa:eb:2b:74
    actor sys_priority: 65534
    actor port_id: 10
    actor port_priority: 65535
    actor key: 9
    actor state: activity aggregation synchronized collecting distributing

    partner sys_id: 02:1c:73:87:60:cd
    partner sys_priority: 32768
    partner port_id: 32820
    partner port_priority: 32768
    partner key: 52
    partner state: activity aggregation synchronized collecting distributing

# ovs-ofctl dump-flows br-ext
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=713896.614s, table=0, n_packets=1369078301, n_bytes=130805436786, idle_age=0, hard_age=65534, priority=0 actions=NORMAL

# ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
 cookie=0xb367eed8ac0e9e7d, duration=713933.475s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=10,icmp6,in_port=2,icmp_type=136 actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713932.943s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=10,icmp6,in_port=3,icmp_type=136 actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713929.414s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=10,icmp6,in_port=5,icmp_type=136 actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713928.888s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=10,icmp6,in_port=4,icmp_type=136 actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713933.280s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=10,arp,in_port=2 actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713932.660s, table=0, n_packets=149398, n_bytes=6274716, idle_age=4, hard_age=65534, priority=10,arp,in_port=3 actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713929.218s, table=0, n_packets=102577, n_bytes=4308234, idle_age=7, hard_age=65534, priority=10,arp,in_port=5 actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713928.620s, table=0, n_packets=61321, n_bytes=2575482, idle_age=8, hard_age=65534, priority=10,arp,in_port=4 actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713935.656s, table=0, n_packets=1274428312, n_bytes=105873932966, idle_age=0, hard_age=65534, priority=3,in_port=1,vlan_tci=0x0000 actions=mod_vlan_vid:1,NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713945.070s, table=0, n_packets=7817, n_bytes=707680, idle_age=65534, hard_age=65534, priority=2,in_port=1 actions=drop
 cookie=0xb367eed8ac0e9e7d, duration=713945.999s, table=0, n_packets=82510417, n_bytes=17955154731, idle_age=0, hard_age=65534, priority=0 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713945.936s, table=23, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=0 actions=drop
 cookie=0xb367eed8ac0e9e7d, duration=713933.544s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,icmp6,in_port=2,icmp_type=136,nd_target=fe80::f816:3eff:fe49:4dff actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713933.009s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,icmp6,in_port=3,icmp_type=136,nd_target=fe80::f816:3eff:fec7:82b9 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713929.482s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,icmp6,in_port=5,icmp_type=136,nd_target=fe80::f816:3eff:fe07:d92e actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713928.951s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,icmp6,in_port=4,icmp_type=136,nd_target=fe80::f816:3eff:fe17:9919 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713933.410s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,arp,in_port=2,arp_spa=10.26.87.153 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713933.344s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,arp,in_port=2,arp_spa=10.26.52.87 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713932.877s, table=24, n_packets=149394, n_bytes=6274548, idle_age=4, hard_age=65534, priority=2,arp,in_port=3,arp_spa=10.26.53.163 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713932.807s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,arp,in_port=3,arp_spa=10.26.85.208 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713932.728s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,arp,in_port=3,arp_spa=10.26.85.209 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713929.349s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,arp,in_port=5,arp_spa=10.26.85.218 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713929.284s, table=24, n_packets=102573, n_bytes=4308066, idle_age=7, hard_age=65534, priority=2,arp,in_port=5,arp_spa=10.26.53.86 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713928.817s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,arp,in_port=4,arp_spa=10.26.87.99 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713928.752s, table=24, n_packets=61317, n_bytes=2575314, idle_age=8, hard_age=65534, priority=2,arp,in_port=4,arp_spa=10.26.53.197 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713928.686s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,arp,in_port=4,arp_spa=198.71.248.104 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713945.871s, table=24, n_packets=16, n_bytes=672, idle_age=65534, hard_age=65534, priority=0 actions=drop

___________________________________________________________________
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
