Hello,

does anyone have an idea what could be causing the following failure? In 
summary: guest VMs connected to a tenant network are receiving bogus ARP 
responses. These responses map unused IP addresses to the MAC addresses of 
virtual bridge devices (qbr*/qvb*) that belong to other ports on the same 
compute host.

We are using the Kilo openvswitch-agent with the ML2 plugin.

Please have a look at the following example. A VM with the fixed-ip 
192.168.1.15 reports the following ARP cache:

   root@michael-test2:~# arp
   Address                  HWtype  HWaddress           Flags Mask            Iface
   host-192-168-1-2.openst  ether   fa:16:3e:de:ab:ea   C                     eth0
   192.168.1.13             ether   a6:b2:dc:d8:39:c1   C                     eth0
   192.168.1.119                    (incomplete)                              eth0
   host-192-168-1-20.opens  ether   fa:16:3e:76:43:ce   C                     eth0
   host-192-168-1-19.opens  ether   fa:16:3e:0d:a6:0b   C                     eth0
   host-192-168-1-1.openst  ether   fa:16:3e:2a:81:ff   C                     eth0
   192.168.1.14             ether   0e:bf:04:b7:ed:52   C                     eth0

Please note that neither 192.168.1.13 nor 192.168.1.14 is in use in this 
subnet. The displayed MAC addresses a6:b2:dc:d8:39:c1 and 0e:bf:04:b7:ed:52 
actually belong to the qbr* and qvb* devices of other instances, living on 
their respective hypervisor hosts!
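To spot such entries in a larger ARP cache, a small parser can help. The Python sketch below assumes the deployment uses Neutron's default `base_mac` of fa:16:3e:00:00:00, so every genuine port MAC starts with fa:16:3e (all the legitimate entries above do). Any complete entry outside that range is reported. The sample input is the cache shown above.

```python
# Sketch: flag ARP cache entries whose MAC is outside the Neutron port MAC range.
# ASSUMPTION: neutron.conf uses the default base_mac "fa:16:3e:00:00:00";
# change NEUTRON_PREFIX if your deployment uses a different base_mac.
NEUTRON_PREFIX = "fa:16:3e"

ARP_OUTPUT = """\
Address                  HWtype  HWaddress           Flags Mask            Iface
host-192-168-1-2.openst  ether   fa:16:3e:de:ab:ea   C                     eth0
192.168.1.13             ether   a6:b2:dc:d8:39:c1   C                     eth0
192.168.1.119                    (incomplete)                              eth0
host-192-168-1-20.opens  ether   fa:16:3e:76:43:ce   C                     eth0
host-192-168-1-19.opens  ether   fa:16:3e:0d:a6:0b   C                     eth0
host-192-168-1-1.openst  ether   fa:16:3e:2a:81:ff   C                     eth0
192.168.1.14             ether   0e:bf:04:b7:ed:52   C                     eth0
"""


def suspicious_entries(arp_output, prefix=NEUTRON_PREFIX):
    """Return (address, mac) pairs whose MAC does not start with prefix."""
    result = []
    for line in arp_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Complete entries: address, HWtype, HWaddress, Flags, Iface.
        # Incomplete entries have no "ether" field and are skipped.
        if len(fields) >= 3 and fields[1] == "ether":
            address, mac = fields[0], fields[2].lower()
            if not mac.startswith(prefix):
                result.append((address, mac))
    return result


for address, mac in suspicious_entries(ARP_OUTPUT):
    print(f"{address} -> {mac} (not a Neutron port MAC)")
```

On the cache above this reports exactly the two unused addresses, 192.168.1.13 and 192.168.1.14.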

Looking at 0e:bf:04:b7:ed:52, for example, yields

   # ip link list | grep -C1 -e 0e:bf:04:b7:ed:52
   59: qbr9ac24ac1-e1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
       link/ether 0e:bf:04:b7:ed:52 brd ff:ff:ff:ff:ff:ff
   60: qvo9ac24ac1-e1: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UP mode DEFAULT group default qlen 1000
   --
   61: qvb9ac24ac1-e1: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master qbr9ac24ac1-e1 state UP mode DEFAULT group default qlen 1000
       link/ether 0e:bf:04:b7:ed:52 brd ff:ff:ff:ff:ff:ff
   62: tap9ac24ac1-e1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master qbr9ac24ac1-e1 state UNKNOWN mode DEFAULT group default qlen 500

on the compute node. Running tcpdump on qbr9ac24ac1-e1 on the host and 
triggering a fresh ARP lookup on the guest VM results in

   # tcpdump -i qbr9ac24ac1-e1 -vv -l | grep ARP
   tcpdump: WARNING: qbr9ac24ac1-e1: no IPv4 address assigned
   tcpdump: listening on qbr9ac24ac1-e1, link-type EN10MB (Ethernet), capture size 65535 bytes
   14:00:32.089726 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.1.14 tell 192.168.1.15, length 28
   14:00:32.089740 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.14 is-at 0e:bf:04:b7:ed:52 (oui Unknown), length 28
   14:00:32.090141 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.14 is-at 7a:a5:71:63:47:94 (oui Unknown), length 28
   14:00:32.090160 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.14 is-at 02:f9:33:d5:04:0d (oui Unknown), length 28
   14:00:32.090168 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.14 is-at 9a:a0:46:e4:03:06 (oui Unknown), length 28
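On a busier bridge the duplicate replies can be picked out of a longer capture mechanically. This Python sketch groups ARP replies by IP address and keeps only addresses claimed by more than one MAC. The regular expression is an assumption based on the `tcpdump -vv` line format in the capture above; other tcpdump versions or flags may print replies differently.

```python
import re
from collections import defaultdict

# Matches reply lines as printed by `tcpdump -vv` above, e.g.
#   "... Reply 192.168.1.14 is-at 0e:bf:04:b7:ed:52 (oui Unknown), length 28"
# ASSUMPTION: the pattern is derived from this one capture only.
REPLY_RE = re.compile(
    r"Reply (\d{1,3}(?:\.\d{1,3}){3}) is-at ([0-9a-f]{2}(?::[0-9a-f]{2}){5})",
    re.IGNORECASE,
)

CAPTURE = """\
14:00:32.089726 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.1.14 tell 192.168.1.15, length 28
14:00:32.089740 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.14 is-at 0e:bf:04:b7:ed:52 (oui Unknown), length 28
14:00:32.090141 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.14 is-at 7a:a5:71:63:47:94 (oui Unknown), length 28
14:00:32.090160 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.14 is-at 02:f9:33:d5:04:0d (oui Unknown), length 28
14:00:32.090168 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.14 is-at 9a:a0:46:e4:03:06 (oui Unknown), length 28
"""


def conflicting_replies(capture):
    """Return {ip: set_of_macs} for every IP answered by more than one MAC."""
    answers = defaultdict(set)
    for line in capture.splitlines():
        match = REPLY_RE.search(line)
        if match:  # requests and non-ARP lines do not match and are ignored
            ip, mac = match.groups()
            answers[ip].add(mac.lower())
    return {ip: macs for ip, macs in answers.items() if len(macs) > 1}


for ip, macs in sorted(conflicting_replies(CAPTURE).items()):
    print(f"{ip} claimed by {len(macs)} MACs: {', '.join(sorted(macs))}")
```

Feeding it the output of the tcpdump command above (without the `grep`) should work the same way, since non-reply lines are skipped.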

As you can see, four different devices claim to own the unused IP address! 
Looking them up in Neutron shows that each of them is related to an existing 
port on the subnet, and that each port is a different one:

   # neutron port-list | grep -e 47fbb8b5-55 -e 46647cca-32 -e e9e2d7c3-7e -e 9ac24ac1-e1
   | 46647cca-3293-42ea-8ec2-0834e19422fa | | fa:16:3e:7d:9c:45 | {"subnet_id": "25dbbdc0-f438-4f89-8663-1772f9c7ef36", "ip_address": "192.168.1.8"} |
   | 47fbb8b5-5549-46e4-850e-bd382375e0f8 | | fa:16:3e:fa:df:32 | {"subnet_id": "25dbbdc0-f438-4f89-8663-1772f9c7ef36", "ip_address": "192.168.1.7"} |
   | 9ac24ac1-e157-484e-b6a2-a1dded4731ac | | fa:16:3e:2a:80:6b | {"subnet_id": "25dbbdc0-f438-4f89-8663-1772f9c7ef36", "ip_address": "192.168.1.15"} |
   | e9e2d7c3-7e58-4bc2-a25f-d48e658b2d56 | | fa:16:3e:0d:a6:0b | {"subnet_id": "25dbbdc0-f438-4f89-8663-1772f9c7ef36", "ip_address": "192.168.1.19"} |
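The short IDs in the grep above work because, with the hybrid iptables firewall plugging, the tap/qbr/qvb/qvo device names appear to consist of a three-letter prefix followed by the first 11 characters of the port UUID (keeping them within the kernel's 15-character interface name limit). Assuming that naming scheme, this Python sketch resolves a device name to its Neutron port and fixed IP, using the port list above as hard-coded sample data.

```python
# Sketch: resolve hybrid-plug device names (tap*/qbr*/qvb*/qvo*) to Neutron ports.
# ASSUMPTION: device name = 3-letter prefix + first 11 characters of the port UUID,
# e.g. "qbr9ac24ac1-e1" -> port "9ac24ac1-e157-...".
HYBRID_PREFIXES = ("tap", "qbr", "qvb", "qvo")

# Port ID -> fixed IP, copied from the `neutron port-list` output above.
PORTS = {
    "46647cca-3293-42ea-8ec2-0834e19422fa": "192.168.1.8",
    "47fbb8b5-5549-46e4-850e-bd382375e0f8": "192.168.1.7",
    "9ac24ac1-e157-484e-b6a2-a1dded4731ac": "192.168.1.15",
    "e9e2d7c3-7e58-4bc2-a25f-d48e658b2d56": "192.168.1.19",
}


def port_for_device(device, ports=PORTS):
    """Return (port_id, fixed_ip) for a hybrid-plug device name, or None."""
    prefix, fragment = device[:3], device[3:]
    if prefix not in HYBRID_PREFIXES or len(fragment) != 11:
        return None  # not a name produced by the assumed scheme
    matches = [pid for pid in ports if pid.startswith(fragment)]
    if len(matches) != 1:
        return None  # unknown or ambiguous fragment
    return matches[0], ports[matches[0]]


for dev in ("qbr9ac24ac1-e1", "qvb9ac24ac1-e1", "tap9ac24ac1-e1"):
    print(dev, "->", port_for_device(dev))
```

In practice the port data would come from the Neutron API rather than a literal dictionary; the lookup logic stays the same.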


Impact: Linux guests don't seem to suffer from the bogus ARP entries, so the 
problem may go unnoticed in a pure Linux environment. Windows guests do, 
however. They verify an IP address offered by DHCP via ARP and reject the IP 
configuration if a conflict is detected. In the example above, any Windows VM 
offered 192.168.1.13 or 192.168.1.14 will fail to configure its network 
interface. This is actually how we noticed the issue.

Cheers!
Michael
_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to     : [email protected]
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
