Public bug reported:

**Environment**

Queens
OVSGTW DVR Mode: dvr_snat
CMP DVR Mode: dvr
No L3 HA

Use Case: Centralized FIPs (aka floating IPs associated with unbound ports)
https://object-storage-ca-ymq-1.vexxhost.net/swift/v1/6e4619c416ff4bd19e1c087f27a43eea/www-assets-prod/presentation-media/Neutron-Port-Binding-and-Impact-of-unbound-ports-on-DVR-Routers-with-FloatingIP.pdf

**How to reproduce**

1. Create a VM as usual

2. Create a new port for the extra IP address and add that address as an allowed address pair on the VM port

openstack port list --server <server_name>  # Get the VM port ID
openstack port create --security-group <sec_group> --fixed-ip subnet=<subnet>,ip-address=<ip_address> --network <network_name> <port_name>
openstack port set --allowed-address ip-address=<ip_address> <server_port>

3. Associate a floating IP with the new port

openstack floating ip set --port <port_name> <floating_ip>

4. Inside the VM, add the new IP address as an alias on its interface

ip addr add <ip_address>/24 dev ens3

5. Identify which gateway node hosts the centralized FIP

neutron l3-agent-list-hosting-router <router>

6. Perform a manual failover

neutron l3-agent-router-remove <hosting-l3-agent> <router>
neutron l3-agent-router-add <new-l3-agent> <router>

Alternatively, trigger an automatic failover by shutting down the hosting gateway node:

shutdown -h now  # on the hosting gateway node

7. Confirm that the router has failed over to the new node

neutron l3-agent-list-hosting-router <router>

**Expected Result**

Connectivity to the floating IP address recovers automatically.

**Actual Result**

Connectivity does not recover. The issue reproduces 100% of the time.

**How to recover**

Restart neutron-l3-agent on the new hosting node after the failover.
Connectivity recovers within a few seconds.

systemctl restart neutron-l3-agent

**Additional information**

After failover, the SNAT namespace is missing the sysctl settings that
should be applied when the namespace is created. We have also confirmed
that setting them manually fixes the issue.

https://github.com/openstack/neutron/blob/stable/queens/neutron/agent/l3/namespaces.py#L91-L107
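The manual fix can be scripted. Below is a minimal Python sketch, assuming the SNAT namespace is named `snat-<router_id>` (as in the output below) and assuming the target values are the ones we read from the linked stable/queens `namespaces.py`; confirm them against your own branch. The helpers `snat_fix_commands` and `run_commands` are hypothetical, not part of neutron. By default it only prints the commands; actually running them requires root on the gateway node.

```python
import shlex
import subprocess

# ASSUMPTION: values we believe Namespace.create() applies in stable/queens.
# Verify against the linked namespaces.py before relying on them.
SNAT_SYSCTLS = {
    "net.ipv4.ip_forward": "1",
    "net.ipv4.conf.all.arp_ignore": "1",
    "net.ipv4.conf.all.arp_announce": "2",
    "net.ipv6.conf.all.forwarding": "1",
}


def snat_fix_commands(router_id, settings=SNAT_SYSCTLS):
    """Build one `ip netns exec snat-<router_id> sysctl -w key=value` argv per setting."""
    ns = f"snat-{router_id}"
    return [
        ["ip", "netns", "exec", ns, "sysctl", "-w", f"{key}={value}"]
        for key, value in settings.items()
    ]


def run_commands(commands, dry_run=True):
    """Print each command; execute it (needs root) only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_commands(snat_fix_commands("8737216a-b561-434f-a023-1d9cae2ce04e"))
```

Keeping `dry_run=True` as the default lets the commands be reviewed before they touch a production gateway node.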

The following are the sysctl values in the SNAT namespace after failover:
---
root@gtw03:~# ip netns exec snat-8737216a-b561-434f-a023-1d9cae2ce04e sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 0
root@gtw03:~# ip netns exec snat-8737216a-b561-434f-a023-1d9cae2ce04e sysctl net.ipv4.conf.all.arp_ignore
net.ipv4.conf.all.arp_ignore = 0
root@gtw03:~# ip netns exec snat-8737216a-b561-434f-a023-1d9cae2ce04e sysctl net.ipv4.conf.all.arp_announce
net.ipv4.conf.all.arp_announce = 0
root@gtw03:~# ip netns exec snat-8737216a-b561-434f-a023-1d9cae2ce04e sysctl net.ipv6.conf.all.forwarding
net.ipv6.conf.all.forwarding = 1
root@gtw03:~#
---
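To spot the wrong values quickly on other gateway nodes, the output above can be compared against the expected settings. This is a minimal Python sketch; the expected values are our assumption from reading the linked stable/queens `namespaces.py`, and `parse_sysctl` / `find_mismatches` are hypothetical helper names. Shell prompt lines are ignored, so a pasted terminal session can be fed in directly.

```python
import re

# ASSUMPTION: expected values taken from our reading of the linked
# stable/queens namespaces.py; confirm against your own branch.
EXPECTED = {
    "net.ipv4.ip_forward": "1",
    "net.ipv4.conf.all.arp_ignore": "1",
    "net.ipv4.conf.all.arp_announce": "2",
    "net.ipv6.conf.all.forwarding": "1",
}

# Matches sysctl's "key = value" lines; prompts and other lines do not match.
SYSCTL_LINE = re.compile(r"^\s*([\w.]+)\s*=\s*(\S+)\s*$")


def parse_sysctl(text):
    """Return {key: value} for every 'key = value' line in text."""
    values = {}
    for line in text.splitlines():
        match = SYSCTL_LINE.match(line)
        if match:
            values[match.group(1)] = match.group(2)
    return values


def find_mismatches(observed, expected=EXPECTED):
    """Return {key: (observed, expected)} for keys that are missing or differ."""
    return {
        key: (observed.get(key), want)
        for key, want in expected.items()
        if observed.get(key) != want
    }


if __name__ == "__main__":
    sample = """net.ipv4.ip_forward = 0
net.ipv4.conf.all.arp_ignore = 0
net.ipv4.conf.all.arp_announce = 0
net.ipv6.conf.all.forwarding = 1"""
    for key, (have, want) in find_mismatches(parse_sysctl(sample)).items():
        print(f"{key}: {have} (expected {want})")
```

With the values from gtw03, this flags the three IPv4 settings and leaves `net.ipv6.conf.all.forwarding` alone.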

We believe this is caused by the following commit, which performs this
initialization only when neutron-l3-agent starts:
https://github.com/openstack/neutron/commit/9d5e80e935049d08e0fcefc0c823fb67c793a51b
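The suspected failure mode also explains why a restart recovers connectivity. The toy model below is a hypothetical Python sketch of that hypothesis only: `ToyL3Agent` is not neutron code, and it is not derived from the contents of the commit above. If sysctl initialization runs only on the agent's startup path, a namespace created at runtime (for example when a router is scheduled to this node during failover) keeps kernel defaults until the next restart re-runs that path.

```python
# HYPOTHETICAL toy model of the suspected bug; not neutron code.
SETTINGS = {"net.ipv4.ip_forward": "1"}  # stand-in for the full sysctl set


class ToyL3Agent:
    def __init__(self, init_only_at_start):
        self.init_only_at_start = init_only_at_start
        self.namespaces = {}  # namespace name -> sysctls applied to it

    def _create_namespace(self, router_id):
        name = f"snat-{router_id}"
        self.namespaces[name] = {}  # fresh namespace: kernel defaults only
        if not self.init_only_at_start:
            # Correct behaviour: initialise every namespace on creation.
            self.namespaces[name].update(SETTINGS)
        return name

    def start(self, router_ids):
        """Agent (re)start: recreate each router namespace and initialise it."""
        self.namespaces.clear()
        for router_id in router_ids:
            name = self._create_namespace(router_id)
            self.namespaces[name].update(SETTINGS)

    def router_added(self, router_id):
        """Router scheduled to this agent at runtime, e.g. after a failover."""
        self._create_namespace(router_id)


if __name__ == "__main__":
    agent = ToyL3Agent(init_only_at_start=True)
    agent.start([])              # gateway node boots with no routers
    agent.router_added("r1")     # failover lands the router here
    print(agent.namespaces)      # namespace left with defaults
    agent.start(["r1"])          # "systemctl restart neutron-l3-agent"
    print(agent.namespaces)      # now initialised
```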

** Affects: neutron
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/1881995

Title:
  Centralized SNAT failover does not recover until "systemctl restart
  neutron-l3-agent" on transferred node

