Hello all,

I'm trying to track down a problem that started after a recent OVS update in 
our OpenStack environment.  We updated from OVS 2.3.3 to OVS 2.5.1, and since 
then servers and VMs have been dropping off the network.  On the switches we 
see a bunch of mac_move notifications, sometimes up to 26k per second, which 
causes the switch to go into defense mode and disable MAC learning.  Most of 
the time we see only a few MAC moves per minute, when we should be seeing 
exactly 0.  Our networking team believes that either there is a temporary loop 
in the network or the HVs are somehow forwarding broadcast packets that they 
shouldn’t be.  The only thing that we see around that time is the following:
2017 Feb 23 12:11:20 lfassi0114-02 %FWM-6-MAC_MOVE_NOTIFICATION: Host fa16.3ead.e6cf in vlan 413 is flapping between port Po19 and port Po22
2017 Feb 23 12:11:21 lfassi0114-02 %FWM-6-MAC_MOVE_NOTIFICATION: Host fa16.3ead.e6cf in vlan 413 is flapping between port Po22 and port Po19
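
To put numbers on "a few per minute" versus "26k per second", the switch log can be tallied per second and per MAC. The following is a minimal sketch, assuming the log lines have exactly the FWM-6-MAC_MOVE_NOTIFICATION layout shown above; the function name count_mac_moves is made up for illustration and is not part of any switch or OVS tooling.

```python
import re
from collections import Counter

# Matches the MAC move lines quoted above. \s+ is used between words so that
# lines re-wrapped by a mail client still match.
MAC_MOVE = re.compile(
    r"(?P<ts>\d{4}\s+\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<switch>\S+)\s+"
    r"%FWM-6-MAC_MOVE_NOTIFICATION:\s+Host\s+(?P<mac>[0-9a-f.]+)\s+in\s+vlan\s+"
    r"(?P<vlan>\d+)\s+is\s+flapping\s+between\s+port\s+(?P<src>\S+)\s+and\s+"
    r"port\s+(?P<dst>\S+)"
)


def count_mac_moves(log_text):
    """Count MAC move notifications keyed by (timestamp, mac, vlan).

    The switch timestamps have one-second resolution, so each key is
    effectively a per-second bucket for one MAC on one VLAN.
    """
    moves = Counter()
    for m in MAC_MOVE.finditer(log_text):
        ts = " ".join(m["ts"].split())  # collapse any wrapped whitespace
        moves[(ts, m["mac"], int(m["vlan"]))] += 1
    return moves
```

Feeding it a saved copy of the switch log and printing moves.most_common(10) would show which MACs and which seconds account for the bursts.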

TCPDUMPS:
12:11:20.374794 fa:16:3e:ad:e6:cf > 00:00:0c:9f:f0:01, ethertype 802.1Q (0x8100), length 64: vlan 413, p 0, ethertype ARP, Request who-has 10.198.39.254 tell 10.198.38.178, length 46
12:11:20.374941 fa:16:3e:ad:e6:cf > 00:00:0c:9f:f0:01, ethertype 802.1Q (0x8100), length 64: vlan 413, p 0, ethertype ARP, Request who-has 10.198.39.254 tell 10.198.38.178, length 46
12:11:20.376145 00:00:0c:9f:f0:01 > fa:16:3e:ad:e6:cf, ethertype 802.1Q (0x8100), length 64: vlan 413, p 0, ethertype ARP, Reply 10.198.39.254 is-at 00:00:0c:9f:f0:01, length 46
12:11:21.374628 fa:16:3e:ad:e6:cf > 00:00:0c:9f:f0:01, ethertype 802.1Q (0x8100), length 64: vlan 413, p 0, ethertype ARP, Request who-has 10.198.39.254 tell 10.198.38.178, length 46
12:11:21.375057 00:00:0c:9f:f0:01 > fa:16:3e:ad:e6:cf, ethertype 802.1Q (0x8100), length 64: vlan 413, p 0, ethertype ARP, Reply 10.198.39.254 is-at 00:00:0c:9f:f0:01, length 46
12:11:22.374578 fa:16:3e:ad:e6:cf > 00:00:0c:9f:f0:01, ethertype 802.1Q (0x8100), length 64: vlan 413, p 0, ethertype ARP, Request who-has 10.198.39.254 tell 10.198.38.178, length 46
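
In the capture above, the first two ARP requests are identical and only 147 microseconds apart, which would be consistent with the same frame being seen twice at the capture point. Where the capture was taken is not stated here, so that reading is an assumption. A minimal sketch for flagging such back-to-back duplicates in tcpdump text output follows; parse_records and find_duplicates are hypothetical helper names, and the 1 ms window is an arbitrary choice.

```python
import re

# tcpdump record timestamp as printed above, e.g. "12:11:20.374794 ".
TS = re.compile(r"(\d\d):(\d\d):(\d\d)\.(\d{6}) ")


def parse_records(text):
    """Split tcpdump text into (microseconds_since_midnight, frame_text).

    Whitespace is flattened first so records wrapped by a mail client are
    rejoined before splitting on timestamps.
    """
    flat = " ".join(text.split())
    starts = list(TS.finditer(flat))
    records = []
    for i, m in enumerate(starts):
        end = starts[i + 1].start() if i + 1 < len(starts) else len(flat)
        h, mi, s, us = (int(g) for g in m.groups())
        micros = ((h * 60 + mi) * 60 + s) * 1_000_000 + us
        records.append((micros, flat[m.end():end].strip()))
    return records


def find_duplicates(records, window_us=1000):
    """Return (gap_us, frame_text) for consecutive identical frames.

    Only adjacent records are compared, so a duplicate separated from its
    original by an unrelated frame is not reported.
    """
    dups = []
    for (t1, f1), (t2, f2) in zip(records, records[1:]):
        if f1 == f2 and t2 - t1 <= window_us:
            dups.append((t2 - t1, f1))
    return dups
```

Running find_duplicates(parse_records(capture_text)) over a longer capture from each bond member separately might help show whether a frame leaves on one link and comes back on the other.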

Output of ovs-vsctl:
ac83a7ff-0157-437c-bfba-8c038ec77c74
    Bridge br-ext
        Port br-ext
            Interface br-ext
                type: internal
        Port "bond0"
            Interface "p3p1"
            Interface "p3p2"
        Port "mgmt0"
            Interface "mgmt0"
                type: internal
        Port "ext-vlan-215"
            tag: 215
            Interface "ext-vlan-215"
                type: patch
                options: {peer="br215-ext"}
    Bridge br-int
        fail_mode: secure
        Port "int-br215"
            Interface "int-br215"
                type: patch
                options: {peer="phy-br215"}
        Port "qvo99ae272d-f8"
            tag: 1
            Interface "qvo99ae272d-f8"
        Port "qvo1d5492c0-df"
            tag: 1
            Interface "qvo1d5492c0-df"
        Port br-int
            Interface br-int
                type: internal
        Port "qvo6b7f3219-90"
            tag: 1
            Interface "qvo6b7f3219-90"
        Port "qvo3b4f81ed-f4"
            tag: 1
            Interface "qvo3b4f81ed-f4"
    Bridge "br215"
        Port "br215"
            Interface "br215"
                type: internal
        Port "phy-br215"
            Interface "phy-br215"
                type: patch
                options: {peer="int-br215"}
        Port "br215-ext"
            Interface "br215-ext"
                type: patch
                options: {peer="ext-vlan-215"}
    ovs_version: "2.5.1"

# ovs-appctl bond/show
---- bond0 ----
bond_mode: balance-slb
bond may use recirculation: no, Recirc-ID : -1
bond-hash-basis: 0
updelay: 0 ms
downdelay: 0 ms
next rebalance: 2426 ms
lacp_status: negotiated
active slave mac: 00:8c:fa:eb:2b:74(p3p1)

slave p3p1: enabled
                active slave
                may_enable: true
                hash 140: 154 kB load

slave p3p2: enabled
                may_enable: true
                hash 199: 69 kB load
                hash 220: 40 kB load
                hash 234: 21 kB load

# ovs-appctl lacp/show
---- bond0 ----
                status: active negotiated
                sys_id: 00:8c:fa:eb:2b:74
                sys_priority: 65534
                aggregation key: 9
                lacp_time: slow

slave: p3p1: current attached
                port_id: 9
                port_priority: 65535
                may_enable: true

                actor sys_id: 00:8c:fa:eb:2b:74
                actor sys_priority: 65534
                actor port_id: 9
                actor port_priority: 65535
                actor key: 9
                actor state: activity aggregation synchronized collecting distributing

                partner sys_id: 02:1c:73:87:60:cd
                partner sys_priority: 32768
                partner port_id: 52
                partner port_priority: 32768
                partner key: 52
                partner state: activity aggregation synchronized collecting distributing

slave: p3p2: current attached
                port_id: 10
                port_priority: 65535
                may_enable: true

                actor sys_id: 00:8c:fa:eb:2b:74
                actor sys_priority: 65534
                actor port_id: 10
                actor port_priority: 65535
                actor key: 9
                actor state: activity aggregation synchronized collecting distributing

                partner sys_id: 02:1c:73:87:60:cd
                partner sys_priority: 32768
                partner port_id: 32820
                partner port_priority: 32768
                partner key: 52
                partner state: activity aggregation synchronized collecting distributing

The server is connected to a Nexus 3000 switch with vPC enabled; the bond is 
configured for LACP in balance-slb mode.  mgmt0 has the HV’s management IP 
assigned to it.  We create the br<vlan> bridges and add the patch ports between 
br-ext and br<vlan>.  The Neutron openvswitch agent configures br-int and adds 
the patch ports between br<vlan> and br-int, along with creating any tap 
devices.
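
For reference, the per-VLAN wiring between br-ext and br<vlan> can be sketched as the ovs-vsctl invocations below. The exact commands used in this environment are not shown in this post, so this is a reconstruction from the ovs-vsctl output above, not the actual provisioning code; the function name vlan_bridge_commands is hypothetical. It only builds argument lists and does not run anything.

```python
def vlan_bridge_commands(vlan, ext_bridge="br-ext"):
    """Build ovs-vsctl argument lists for one br<vlan> bridge.

    Port names follow the pattern visible in the ovs-vsctl output above:
    "ext-vlan-<vlan>" on br-ext (access port, tag=<vlan>) patched to
    "br<vlan>-ext" on br<vlan>.
    """
    bridge = f"br{vlan}"
    ext_side = f"ext-vlan-{vlan}"   # patch port living on br-ext
    vlan_side = f"{bridge}-ext"     # patch port living on br<vlan>
    return [
        ["ovs-vsctl", "add-br", bridge],
        ["ovs-vsctl", "add-port", ext_bridge, ext_side, f"tag={vlan}",
         "--", "set", "Interface", ext_side,
         "type=patch", f"options:peer={vlan_side}"],
        ["ovs-vsctl", "add-port", bridge, vlan_side,
         "--", "set", "Interface", vlan_side,
         "type=patch", f"options:peer={ext_side}"],
    ]
```

For VLAN 215 this yields the br215, ext-vlan-215, and br215-ext names seen above; each list could be passed to subprocess.run on a test host.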

The configured openflow entries for each bridge are as follows:
# ovs-ofctl dump-flows br-ext
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=713896.614s, table=0, n_packets=1369078301, 
n_bytes=130805436786, idle_age=0, hard_age=65534, priority=0 actions=NORMAL

# ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
 cookie=0xb367eed8ac0e9e7d, duration=713933.475s, table=0, n_packets=0, 
n_bytes=0, idle_age=65534, hard_age=65534, 
priority=10,icmp6,in_port=2,icmp_type=136 actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713932.943s, table=0, n_packets=0, 
n_bytes=0, idle_age=65534, hard_age=65534, 
priority=10,icmp6,in_port=3,icmp_type=136 actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713929.414s, table=0, n_packets=0, 
n_bytes=0, idle_age=65534, hard_age=65534, 
priority=10,icmp6,in_port=5,icmp_type=136 actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713928.888s, table=0, n_packets=0, 
n_bytes=0, idle_age=65534, hard_age=65534, 
priority=10,icmp6,in_port=4,icmp_type=136 actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713933.280s, table=0, n_packets=0, 
n_bytes=0, idle_age=65534, hard_age=65534, priority=10,arp,in_port=2 
actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713932.660s, table=0, n_packets=149398, 
n_bytes=6274716, idle_age=4, hard_age=65534, priority=10,arp,in_port=3 
actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713929.218s, table=0, n_packets=102577, 
n_bytes=4308234, idle_age=7, hard_age=65534, priority=10,arp,in_port=5 
actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713928.620s, table=0, n_packets=61321, 
n_bytes=2575482, idle_age=8, hard_age=65534, priority=10,arp,in_port=4 
actions=resubmit(,24)
 cookie=0xb367eed8ac0e9e7d, duration=713935.656s, table=0, 
n_packets=1274428312, n_bytes=105873932966, idle_age=0, hard_age=65534, 
priority=3,in_port=1,vlan_tci=0x0000 actions=mod_vlan_vid:1,NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713945.070s, table=0, n_packets=7817, 
n_bytes=707680, idle_age=65534, hard_age=65534, priority=2,in_port=1 
actions=drop
 cookie=0xb367eed8ac0e9e7d, duration=713945.999s, table=0, n_packets=82510417, 
n_bytes=17955154731, idle_age=0, hard_age=65534, priority=0 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713945.936s, table=23, n_packets=0, 
n_bytes=0, idle_age=65534, hard_age=65534, priority=0 actions=drop
 cookie=0xb367eed8ac0e9e7d, duration=713933.544s, table=24, n_packets=0, 
n_bytes=0, idle_age=65534, hard_age=65534, 
priority=2,icmp6,in_port=2,icmp_type=136,nd_target=fe80::f816:3eff:fe49:4dff 
actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713933.009s, table=24, n_packets=0, 
n_bytes=0, idle_age=65534, hard_age=65534, 
priority=2,icmp6,in_port=3,icmp_type=136,nd_target=fe80::f816:3eff:fec7:82b9 
actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713929.482s, table=24, n_packets=0, 
n_bytes=0, idle_age=65534, hard_age=65534, 
priority=2,icmp6,in_port=5,icmp_type=136,nd_target=fe80::f816:3eff:fe07:d92e 
actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713928.951s, table=24, n_packets=0, 
n_bytes=0, idle_age=65534, hard_age=65534, 
priority=2,icmp6,in_port=4,icmp_type=136,nd_target=fe80::f816:3eff:fe17:9919 
actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713933.410s, table=24, n_packets=0, 
n_bytes=0, idle_age=65534, hard_age=65534, 
priority=2,arp,in_port=2,arp_spa=10.26.87.153 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713933.344s, table=24, n_packets=0, 
n_bytes=0, idle_age=65534, hard_age=65534, 
priority=2,arp,in_port=2,arp_spa=10.26.52.87 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713932.877s, table=24, n_packets=149394, 
n_bytes=6274548, idle_age=4, hard_age=65534, 
priority=2,arp,in_port=3,arp_spa=10.26.53.163 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713932.807s, table=24, n_packets=0, 
n_bytes=0, idle_age=65534, hard_age=65534, 
priority=2,arp,in_port=3,arp_spa=10.26.85.208 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713932.728s, table=24, n_packets=0, 
n_bytes=0, idle_age=65534, hard_age=65534, 
priority=2,arp,in_port=3,arp_spa=10.26.85.209 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713929.349s, table=24, n_packets=0, 
n_bytes=0, idle_age=65534, hard_age=65534, 
priority=2,arp,in_port=5,arp_spa=10.26.85.218 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713929.284s, table=24, n_packets=102573, 
n_bytes=4308066, idle_age=7, hard_age=65534, 
priority=2,arp,in_port=5,arp_spa=10.26.53.86 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713928.817s, table=24, n_packets=0, 
n_bytes=0, idle_age=65534, hard_age=65534, 
priority=2,arp,in_port=4,arp_spa=10.26.87.99 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713928.752s, table=24, n_packets=61317, 
n_bytes=2575314, idle_age=8, hard_age=65534, 
priority=2,arp,in_port=4,arp_spa=10.26.53.197 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713928.686s, table=24, n_packets=0, 
n_bytes=0, idle_age=65534, hard_age=65534, 
priority=2,arp,in_port=4,arp_spa=198.71.248.104 actions=NORMAL
 cookie=0xb367eed8ac0e9e7d, duration=713945.871s, table=24, n_packets=16, 
n_bytes=672, idle_age=65534, hard_age=65534, priority=0 actions=drop
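
Most of the entries above have n_packets=0, so it can help to reduce the dump to the flows that are actually being hit. A minimal sketch, assuming the ovs-ofctl dump-flows text layout shown above (each entry starting with "cookie=" and ending with "<match> actions=<actions>"); active_flows is a made-up helper name, not an OVS tool.

```python
import re

FIELD_TABLE = re.compile(r"table=(\d+)")
FIELD_PACKETS = re.compile(r"n_packets=(\d+)")
FIELD_MATCH_ACTIONS = re.compile(r"(\S+) actions=(\S+)")


def active_flows(dump_text):
    """Return flows with n_packets > 0, busiest first.

    The text is flattened before splitting on "cookie=" so that entries
    wrapped across several lines are handled as a single flow.
    """
    flat = " ".join(dump_text.split())
    flows = []
    for chunk in flat.split("cookie=")[1:]:  # [0] is the reply header
        table = FIELD_TABLE.search(chunk)
        packets = FIELD_PACKETS.search(chunk)
        tail = FIELD_MATCH_ACTIONS.search(chunk)
        if not (table and packets and tail):
            continue  # skip anything that does not look like a flow entry
        n_packets = int(packets.group(1))
        if n_packets > 0:
            flows.append({
                "table": int(table.group(1)),
                "n_packets": n_packets,
                "match": tail.group(1),
                "actions": tail.group(2),
            })
    return sorted(flows, key=lambda f: f["n_packets"], reverse=True)
```

Comparing two snapshots taken a few seconds apart during an incident would show which flows' counters jump, for example whether the NORMAL flows or the table 24 drop flow are absorbing the burst.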

Have any changes been made to LACP or to how OVS handles NORMAL flows (MAC 
learning and flooding) that would cause it to flood a packet back out to the 
switch from which it was received?  That’s the only thing that we can think of 
that would cause this to happen.


___________________________________________________________________
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy