We are also wondering: if a packet arrives with a destination MAC that is not local to OVS, how will OVS treat it under the “normal” flow? Specifically, what happens if the destination MAC of the arriving packet is the gateway of the server? From the packet captures, what we appear to be seeing is that the switch does a unicast flood of a packet that was received on the bonded interface, where 00:00:0c:9f:f0:01 is the MAC of the gateway and fa:16:3e:ad:e6:cf is the MAC address of a VM that is not local to the server. Looking at network spans, it looks like under OVS 2.5.1 this packet is flooded back out the bond port (the port that it came in on), which is causing the switch to learn the MAC on the new server's port.
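For reference, this is roughly what we would expect from a plain MAC-learning switch, which is our understanding of what the “normal” action is meant to behave like. The Python below is only an illustrative model and not OVS code: the LearningSwitch class is made up for this example, the port names are taken from br-ext, and VLANs and bonding are ignored.

```python
class LearningSwitch:
    """Toy model of a MAC-learning switch (hypothetical, not OVS code)."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> port it was last seen on

    def process(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame would be sent out of."""
        # Learn (or re-learn) the source MAC on the ingress port.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood, but never back out the ingress port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Destination lives behind the ingress port: drop.
            return []
        return [out_port]


sw = LearningSwitch(["br-ext", "bond0", "mgmt0", "ext-vlan-215"])
# Frame from the non-local VM to the gateway MAC, received on the bond.
flooded = sw.process("bond0", "fa:16:3e:ad:e6:cf", "00:00:0c:9f:f0:01")
print(flooded)
```

In this model the frame is flooded to every port except bond0, which is the behavior we assumed we had before the upgrade.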
I don’t have any captures of the previous behavior, but my guess is that the traffic was only flooded internally and was not resent out of the bond port. This also happens when we change from using balance-slb to active-backup.

___________________________________________________________________
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: <[email protected]> on behalf of "Kris G. Lindgren" <[email protected]>
Date: Thursday, February 23, 2017 at 6:08 PM
To: "[email protected]" <[email protected]>
Subject: [ovs-discuss] OVS 2.3.3 to OVS 2.5.1 Upgrade - now seeing random connectivity issues

Hello all,

We are trying to track down a problem that started after a recent OVS update in our OpenStack environment. We updated from OVS 2.3.3 to OVS 2.5.1, and since then we have been having problems with servers and VMs dropping off the network. In the switches we see a bunch of #mac_move notifications, sometimes up to 26k per second, which causes the switch to go into defense mode and disable MAC learning. Most of the time we see only a few MAC moves per minute, when we should be seeing exactly 0. Our networking team believes that what is happening is a temporary loop in the network, or that the HVs are somehow forwarding broadcast packets that they shouldn’t be.
The only thing that we see around the time is the following:

2017 Feb 23 12:11:20 lfassi0114-02 %FWM-6-MAC_MOVE_NOTIFICATION: Host fa16.3ead.e6cf in vlan 413 is flapping between port Po19 and port Po22
2017 Feb 23 12:11:21 lfassi0114-02 %FWM-6-MAC_MOVE_NOTIFICATION: Host fa16.3ead.e6cf in vlan 413 is flapping between port Po22 and port Po19

TCPDUMPS:

12:11:20.374794 fa:16:3e:ad:e6:cf > 00:00:0c:9f:f0:01, ethertype 802.1Q (0x8100), length 64: vlan 413, p 0, ethertype ARP, Request who-has 10.198.39.254 tell 10.198.38.178, length 46
12:11:20.374941 fa:16:3e:ad:e6:cf > 00:00:0c:9f:f0:01, ethertype 802.1Q (0x8100), length 64: vlan 413, p 0, ethertype ARP, Request who-has 10.198.39.254 tell 10.198.38.178, length 46
12:11:20.376145 00:00:0c:9f:f0:01 > fa:16:3e:ad:e6:cf, ethertype 802.1Q (0x8100), length 64: vlan 413, p 0, ethertype ARP, Reply 10.198.39.254 is-at 00:00:0c:9f:f0:01, length 46
12:11:21.374628 fa:16:3e:ad:e6:cf > 00:00:0c:9f:f0:01, ethertype 802.1Q (0x8100), length 64: vlan 413, p 0, ethertype ARP, Request who-has 10.198.39.254 tell 10.198.38.178, length 46
12:11:21.375057 00:00:0c:9f:f0:01 > fa:16:3e:ad:e6:cf, ethertype 802.1Q (0x8100), length 64: vlan 413, p 0, ethertype ARP, Reply 10.198.39.254 is-at 00:00:0c:9f:f0:01, length 46
12:11:22.374578 fa:16:3e:ad:e6:cf > 00:00:0c:9f:f0:01, ethertype 802.1Q (0x8100), length 64: vlan 413, p 0, ethertype ARP, Request who-has 10.198.39.254 tell 10.198.38.178, length 46

Output of ovs-vsctl:

ac83a7ff-0157-437c-bfba-8c038ec77c74
    Bridge br-ext
        Port br-ext
            Interface br-ext
                type: internal
        Port "bond0"
            Interface "p3p1"
            Interface "p3p2"
        Port "mgmt0"
            Interface "mgmt0"
                type: internal
        Port "ext-vlan-215"
            tag: 215
            Interface "ext-vlan-215"
                type: patch
                options: {peer="br215-ext"}
    Bridge br-int
        fail_mode: secure
        Port "int-br215"
            Interface "int-br215"
                type: patch
                options: {peer="phy-br215"}
        Port "qvo99ae272d-f8"
            tag: 1
            Interface "qvo99ae272d-f8"
        Port "qvo1d5492c0-df"
            tag: 1
            Interface "qvo1d5492c0-df"
        Port br-int
            Interface br-int
                type: internal
        Port "qvo6b7f3219-90"
            tag: 1
            Interface "qvo6b7f3219-90"
        Port "qvo3b4f81ed-f4"
            tag: 1
            Interface "qvo3b4f81ed-f4"
    Bridge "br215"
        Port "br215"
            Interface "br215"
                type: internal
        Port "phy-br215"
            Interface "phy-br215"
                type: patch
                options: {peer="int-br215"}
        Port "br215-ext"
            Interface "br215-ext"
                type: patch
                options: {peer="ext-vlan-215"}
    ovs_version: "2.5.1"

# ovs-appctl bond/show
---- bond0 ----
bond_mode: balance-slb
bond may use recirculation: no, Recirc-ID : -1
bond-hash-basis: 0
updelay: 0 ms
downdelay: 0 ms
next rebalance: 2426 ms
lacp_status: negotiated
active slave mac: 00:8c:fa:eb:2b:74(p3p1)

slave p3p1: enabled
    active slave
    may_enable: true
    hash 140: 154 kB load

slave p3p2: enabled
    may_enable: true
    hash 199: 69 kB load
    hash 220: 40 kB load
    hash 234: 21 kB load

# ovs-appctl lacp/show
---- bond0 ----
    status: active negotiated
    sys_id: 00:8c:fa:eb:2b:74
    sys_priority: 65534
    aggregation key: 9
    lacp_time: slow

slave: p3p1: current attached
    port_id: 9
    port_priority: 65535
    may_enable: true

    actor sys_id: 00:8c:fa:eb:2b:74
    actor sys_priority: 65534
    actor port_id: 9
    actor port_priority: 65535
    actor key: 9
    actor state: activity aggregation synchronized collecting distributing

    partner sys_id: 02:1c:73:87:60:cd
    partner sys_priority: 32768
    partner port_id: 52
    partner port_priority: 32768
    partner key: 52
    partner state: activity aggregation synchronized collecting distributing

slave: p3p2: current attached
    port_id: 10
    port_priority: 65535
    may_enable: true

    actor sys_id: 00:8c:fa:eb:2b:74
    actor sys_priority: 65534
    actor port_id: 10
    actor port_priority: 65535
    actor key: 9
    actor state: activity aggregation synchronized collecting distributing

    partner sys_id: 02:1c:73:87:60:cd
    partner sys_priority: 32768
    partner port_id: 32820
    partner port_priority: 32768
    partner key: 52
    partner state: activity aggregation synchronized collecting distributing

The server is connected to a Nexus 3000 switch with vPC enabled; we are configured as LACP with balance-slb mode.
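In case it helps anyone compare numbers, the MAC_MOVE_NOTIFICATION lines from the switch log can be tallied per host and per second with a short script. This is only a rough sketch: the regular expression is written against the two sample lines in this message and may need adjusting for other log formats, and the function name tally_mac_moves is made up for this example.

```python
import re
from collections import Counter

# Pattern matched against the switch log format shown in this message.
MAC_MOVE = re.compile(
    r"^(?P<ts>\d{4} \w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ "
    r"%FWM-6-MAC_MOVE_NOTIFICATION: Host (?P<mac>\S+) in vlan (?P<vlan>\d+) "
    r"is flapping between port (?P<a>\S+) and port (?P<b>\S+)"
)


def tally_mac_moves(lines):
    """Count MAC move notifications per (MAC, VLAN) and per timestamp second."""
    per_host = Counter()
    per_second = Counter()
    for line in lines:
        m = MAC_MOVE.search(line.strip())
        if m is None:
            continue  # not a MAC move line
        per_host[(m["mac"], int(m["vlan"]))] += 1
        per_second[m["ts"]] += 1
    return per_host, per_second


sample = [
    "2017 Feb 23 12:11:20 lfassi0114-02 %FWM-6-MAC_MOVE_NOTIFICATION: "
    "Host fa16.3ead.e6cf in vlan 413 is flapping between port Po19 and port Po22",
    "2017 Feb 23 12:11:21 lfassi0114-02 %FWM-6-MAC_MOVE_NOTIFICATION: "
    "Host fa16.3ead.e6cf in vlan 413 is flapping between port Po22 and port Po19",
]
per_host, per_second = tally_mac_moves(sample)
print(per_host.most_common(5))
print(per_second.most_common(5))
```

Sorting per_second by count is one way to find the bursts where the switch goes into defense mode.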
Mgmt0 has the HV’s management IP assigned to it. We create the br<vlan> bridges and add the patch ports between br-ext and br<vlan>. The Neutron openvswitch agent configures br-int and adds the patch ports between br<vlan> and br-int, along with creating any tap devices. The configured OpenFlow entries for each bridge are as follows:

# ovs-ofctl dump-flows br-ext
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=713896.614s, table=0, n_packets=1369078301, n_bytes=130805436786, idle_age=0, hard_age=65534, priority=0 actions=NORMAL

# ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
 cookie=0xb367eed8ac0e9e7d, duration=713933.475s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=10,icmp6,in_port=2,icmp_type=136 actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713932.943s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=10,icmp6,in_port=3,icmp_type=136 actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713929.414s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=10,icmp6,in_port=5,icmp_type=136 actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713928.888s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=10,icmp6,in_port=4,icmp_type=136 actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713933.280s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=10,arp,in_port=2 actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713932.660s, table=0, n_packets=149398, n_bytes=6274716, idle_age=4, hard_age=65534, priority=10,arp,in_port=3 actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713929.218s, table=0, n_packets=102577, n_bytes=4308234, idle_age=7, hard_age=65534, priority=10,arp,in_port=5 actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713928.620s, table=0, n_packets=61321, n_bytes=2575482, idle_age=8, hard_age=65534, priority=10,arp,in_port=4 actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713935.656s, table=0, n_packets=1274428312, n_bytes=105873932966, idle_age=0, hard_age=65534, priority=3,in_port=1,vlan_tci=0x0000 actions=mod_vlan_vid:1,NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713945.070s, table=0, n_packets=7817, n_bytes=707680, idle_age=65534, hard_age=65534, priority=2,in_port=1 actions=drop
 cookie=0xb367eed8ac0e9e7d, duration=713945.999s, table=0, n_packets=82510417, n_bytes=17955154731, idle_age=0, hard_age=65534, priority=0 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713945.936s, table=23, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=0 actions=drop
 cookie=0xb367eed8ac0e9e7d, duration=713933.544s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,icmp6,in_port=2,icmp_type=136,nd_target=fe80::f816:3eff:fe49:4dff actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713933.009s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,icmp6,in_port=3,icmp_type=136,nd_target=fe80::f816:3eff:fec7:82b9 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713929.482s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,icmp6,in_port=5,icmp_type=136,nd_target=fe80::f816:3eff:fe07:d92e actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713928.951s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,icmp6,in_port=4,icmp_type=136,nd_target=fe80::f816:3eff:fe17:9919 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713933.410s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,arp,in_port=2,arp_spa=10.26.87.153 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713933.344s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,arp,in_port=2,arp_spa=10.26.52.87 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713932.877s, table=24, n_packets=149394, n_bytes=6274548, idle_age=4, hard_age=65534, priority=2,arp,in_port=3,arp_spa=10.26.53.163 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713932.807s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,arp,in_port=3,arp_spa=10.26.85.208 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713932.728s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,arp,in_port=3,arp_spa=10.26.85.209 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713929.349s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,arp,in_port=5,arp_spa=10.26.85.218 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713929.284s, table=24, n_packets=102573, n_bytes=4308066, idle_age=7, hard_age=65534, priority=2,arp,in_port=5,arp_spa=10.26.53.86 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713928.817s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,arp,in_port=4,arp_spa=10.26.87.99 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713928.752s, table=24, n_packets=61317, n_bytes=2575314, idle_age=8, hard_age=65534, priority=2,arp,in_port=4,arp_spa=10.26.53.197 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713928.686s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=2,arp,in_port=4,arp_spa=198.71.248.104 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713945.871s, table=24, n_packets=16, n_bytes=672, idle_age=65534, hard_age=65534, priority=0 actions=drop

Have any changes been made to LACP, or to how OVS handles NORMAL flows, MAC learning, and flooding, that would cause it to flood a packet back out to the switch on the port on which it was received? That’s the only thing that we can think of that is causing this to happen.

___________________________________________________________________
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy
