Public bug reported:

Summary:
When a VM has a Floating IP, any attempt to reach a routed network results in an ARP request being sent for the destination address instead of the traffic being forwarded to the gateway.

Description:
I have two VMs:

$ openstack server list -f yaml
- Flavor: ''
  ID: f875fc7c-f743-4234-8ccb-c03f6ae66289
  Image: Fedora_32
  Name: fedora_no_fip
  Networks: infra_external=172.20.10.201
  Status: ACTIVE
- Flavor: ''
  ID: 4dd45015-9ad6-4388-b458-3128cbdd784b
  Image: Fedora_32
  Name: fedora_test
  Networks: infra_internal=192.168.10.102, 172.20.10.107
  Status: ACTIVE

The VM without the FIP can reach everything without problems, for example when pinging 1.1.1.1:

[root@overcloud-novacompute-1 ~]# tcpdump -i any host 172.20.10.201 -nevvv
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
22:52:13.970470  P fa:16:3e:47:ee:dd ethertype IPv4 (0x0800), length 100: (tos 0x0, ttl 64, id 59289, offset 0, flags [DF], proto ICMP (1), length 84)
    172.20.10.201 > 1.1.1.1: ICMP echo request, id 1, seq 36, length 64
22:52:13.978619  P 00:e0:67:15:cc:2f ethertype 802.1Q (0x8100), length 104: vlan 4, p 0, ethertype IPv4, (tos 0x0, ttl 56, id 38296, offset 0, flags [none], proto ICMP (1), length 84)
    1.1.1.1 > 172.20.10.201: ICMP echo reply, id 1, seq 36, length 64

However, when I try the same from the VM with the Floating IP, I can see that an ARP request is sent for 1.1.1.1 itself:

[root@overcloud-novacompute-1 ~]# tcpdump -i any host 172.20.10.107 -nevvv
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
22:55:42.779383  B fa:16:3e:d7:80:3a ethertype 802.1Q (0x8100), length 48: vlan 4, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 1.1.1.1 tell 172.20.10.107, length 28
22:55:42.779476 Out fa:16:3e:d7:80:3a ethertype ARP (0x0806), length 44: Ethernet (len 6), IPv4 (len 4), Request who-has 1.1.1.1 tell 172.20.10.107, length 28
22:55:42.779510 Out fa:16:3e:d7:80:3a ethertype ARP (0x0806), length 44: Ethernet (len 6), IPv4 (len 4), Request who-has 1.1.1.1 tell 172.20.10.107, length 28
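The second capture is unexpected because 1.1.1.1 is not inside infra_external_subnet (172.20.0.0/16), so a router should resolve the MAC address of its next hop (172.20.0.254) rather than that of the destination itself. The following Python sketch illustrates that reasoning for triage only: `arp_target` does a longest-prefix-match lookup over a hypothetical routing table for infra_r1, and `offlink_arp_requests` scans tcpdump output for ARP requests whose target is not on any connected subnet. Both function names and the route table are illustrative assumptions: the /24 prefix for infra_internal is not shown in this report, the default route via 172.20.0.254 is assumed rather than taken from the router output, and the sketch models generic IP forwarding, not OVN's logical flows.

```python
import ipaddress
import re

# Hypothetical routing table for infra_r1 (illustrative, not read from OVN).
# Connected routes have next hop None; the default route via the external
# subnet's gateway_ip is an ASSUMPTION, since `routes: []` in
# `openstack router show` only lists extra routes.
ROUTES = [
    (ipaddress.ip_network("192.168.10.0/24"), None),  # infra_internal (prefix assumed)
    (ipaddress.ip_network("172.20.0.0/16"), None),    # infra_external_subnet
    (ipaddress.ip_network("0.0.0.0/0"), ipaddress.ip_address("172.20.0.254")),
]

# Matches the ARP request portion of a `tcpdump -nevvv` line.
ARP_REQUEST = re.compile(
    r"Request who-has (\d+\.\d+\.\d+\.\d+) tell (\d+\.\d+\.\d+\.\d+)"
)


def arp_target(destination, routes=ROUTES):
    """Return the address a router should ARP for to reach `destination`.

    Longest-prefix match selects the route; a connected route means the
    destination itself is resolved, otherwise the route's next hop is.
    """
    dst = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routes if dst in net]
    if not matches:
        raise LookupError(f"no route to {dst}")
    _, hop = max(matches, key=lambda route: route[0].prefixlen)
    return str(dst if hop is None else hop)


def offlink_arp_requests(lines, routes=ROUTES):
    """Return (target, sender) pairs for ARP requests whose target is not on
    a connected subnet, i.e. where a next hop should have been resolved."""
    connected = [net for net, hop in routes if hop is None]
    suspicious = []
    for line in lines:
        match = ARP_REQUEST.search(line)
        if match is None:
            continue
        target = ipaddress.ip_address(match.group(1))
        if not any(target in net for net in connected):
            suspicious.append((str(target), match.group(2)))
    return suspicious


capture = [
    "22:55:42.779476 Out fa:16:3e:d7:80:3a ethertype ARP (0x0806), length 44: "
    "Ethernet (len 6), IPv4 (len 4), Request who-has 1.1.1.1 tell 172.20.10.107, length 28",
]
print(arp_target("1.1.1.1"))          # prints 172.20.0.254 (routed: resolve the gateway)
print(offlink_arp_requests(capture))  # prints [('1.1.1.1', '172.20.10.107')]
```

With this table, `arp_target("172.20.0.254")` returns the gateway address itself because it is on-link, which matches the observation that pinging the gateway works; only the request for 1.1.1.1 is flagged.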
The router has the external network set as its gateway:

$ openstack router show infra_r1 -f yaml
admin_state_up: true
availability_zone_hints: null
availability_zones: null
created_at: '2020-05-27T11:43:43Z'
description: ''
external_gateway_info:
  enable_snat: true
  external_fixed_ips:
  - ip_address: 172.20.10.118
    subnet_id: bf21b56a-65c4-49fb-b345-b804c0429167
  network_id: 2561f8db-e1c8-4185-9056-0883686a8a53
flavor_id: null
id: 15c1b81d-b833-4d34-b622-4c6a0bd6c0d7
interfaces_info:
- ip_address: 192.168.10.1
  port_id: 65a28088-761c-461c-912c-7d0a3781ab6b
  subnet_id: 27382151-dbcc-4356-a080-47e181414e0b
location:
  cloud: ''
  project:
    domain_id: null
    domain_name: Default
    id: 0e446e02e899455193635c877772fae7
    name: admin
  region_name: regionOne
  zone: null
name: infra_r1
project_id: 0e446e02e899455193635c877772fae7
revision_number: 3
routes: []
status: ACTIVE
tags: []
updated_at: '2020-05-27T11:44:05Z'

The reproducer for me has been:

1. Deploy OpenStack with OVN DVR (using TripleO, so with the default settings from https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/environments/services/neutron-ovn-dvr-ha.yaml).

2. Create an external VLAN network:

   $ openstack network show infra_external -f yaml
   admin_state_up: true
   availability_zone_hints: []
   availability_zones: []
   created_at: '2020-05-27T11:43:24Z'
   description: ''
   dns_domain: ''
   id: 2561f8db-e1c8-4185-9056-0883686a8a53
   ipv4_address_scope: null
   ipv6_address_scope: null
   is_default: false
   is_vlan_transparent: null
   location:
     cloud: ''
     project:
       domain_id: null
       domain_name: Default
       id: 0e446e02e899455193635c877772fae7
       name: admin
     region_name: regionOne
     zone: null
   mtu: 9000
   name: infra_external
   port_security_enabled: true
   project_id: 0e446e02e899455193635c877772fae7
   provider:network_type: vlan
   provider:physical_network: datacentre
   provider:segmentation_id: 4
   qos_policy_id: null
   revision_number: 2
   router:external: true
   segments: null
   shared: false
   status: ACTIVE
   subnets:
   - bf21b56a-65c4-49fb-b345-b804c0429167
   tags: []
   updated_at: '2020-05-27T11:43:30Z'

3. Create a subnet on it with the following details:

   $ openstack subnet show infra_external_subnet -f yaml
   allocation_pools:
   - end: 172.20.10.250
     start: 172.20.10.70
   cidr: 172.20.0.0/16
   created_at: '2020-05-27T11:43:30Z'
   description: ''
   dns_nameservers:
   - 8.8.8.8
   dns_publish_fixed_ip: null
   enable_dhcp: true
   gateway_ip: 172.20.0.254
   host_routes: []
   id: bf21b56a-65c4-49fb-b345-b804c0429167
   ip_version: 4
   ipv6_address_mode: null
   ipv6_ra_mode: null
   location:
     cloud: ''
     project:
       domain_id: null
       domain_name: Default
       id: 0e446e02e899455193635c877772fae7
       name: admin
     region_name: regionOne
     zone: null
   name: infra_external_subnet
   network_id: 2561f8db-e1c8-4185-9056-0883686a8a53
   prefix_length: null
   project_id: 0e446e02e899455193635c877772fae7
   revision_number: 0
   segment_id: null
   service_types: []
   subnetpool_id: null
   tags: []
   updated_at: '2020-05-27T11:43:30Z'

4. Create an internal network and a router, with infra_external set as the router's gateway (output shown above).

5. Create two VMs: one with a FIP and one attached directly to infra_external.

6. Try to ping any address that must be routed via the gateway of infra_external_subnet (gateway_ip: 172.20.0.254).

I can ping the gateway itself without problems; the issue only appears when traffic has to be routed via 172.20.0.254.

Versions:

$ cat /etc/redhat-release
CentOS Linux release 8.1.1911 (Core)

# rpm -qa | grep ovn
ovn-20.03.0-2.el8.x86_64
puppet-ovn-17.0.0-0.20200515234945.1d4c0ad.el8.noarch
ovn-host-20.03.0-2.el8.x86_64

$ rpm -qa | grep tripleo-heat-templates
openstack-tripleo-heat-templates-12.2.1-0.20200504123937.29a7fb8.el8.noarch

For the containers, I'm just using current-tripleo, but let me know if there is anything more specific I can get for you:

# podman image list | egrep 'ovn|neutron'
docker.io/tripleomaster/centos-binary-nova-novncproxy    current-tripleo  544acd4346da  9 days ago  1.22 GB
docker.io/tripleomaster/centos-binary-neutron-server     current-tripleo  f19e459a94fd  9 days ago  1.19 GB
docker.io/tripleomaster/centos-binary-ovn-northd         current-tripleo  8291433d7448  9 days ago  852 MB
docker.io/tripleomaster/centos-binary-ovn-northd         pcmklatest       8291433d7448  9 days ago  852 MB
docker.io/tripleomaster/centos-binary-ovn-controller     current-tripleo  e8efc9a55bb2  9 days ago  734 MB

I'll share some ovn-trace outputs in the comments, as this report is already getting lengthy.

Expected results:
OVN should not send an ARP request for a destination on a routed network; it should forward the traffic to the gateway instead.

Severity for me is not very high, as this is just a home lab, but it could be a problem if there is a wider issue.

** Affects: neutron
     Importance: Undecided
         Status: New

** Tags: ovn

** Attachment added: "Logic Flows"
   https://bugs.launchpad.net/bugs/1881041/+attachment/5377591/+files/logic_flows

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1881041

Title:
  OVN Router sending ARP instead of sending traffic to the gateway

Status in neutron:
  New