Public bug reported:
* We have the following setup:
- DVR, with HA disabled
- 3 controller nodes (network nodes)
- multiple computes
- public floating IP addresses
* What we are trying to accomplish:
- Attach floating IPs
* Versions:
- neutron Pike (11.0.3)
- same behavior on neutron Queens (12.0.3)
- OpenStack environment deployed via kolla-ansible
* The problem is:
When we create an instance on a VXLAN tenant network and attach a floating
IP, the floating IP is set up correctly in the qrouter namespace (as iptables
DNAT/SNAT rules) on the compute node where the instance is running. However,
sometimes the floating IP is also set up in the SNAT namespace on the network
node, where centralized SNAT is handled.
As a result, access to the instance (ping/ssh) fails, because request
traffic goes through the SNAT namespace on the network node instead of
through the qrouter namespace on the compute node.
It looks like a race condition: a few times, even though the FIP was set
up in both the SNAT namespace and the qrouter namespace, we could still log
in to the instance and saw traffic going only through the fip/qrouter
namespaces on the compute node.
* What we expect:
FIPs should not be set up in the SNAT namespace on the network node, so that
we can reach our instances via ssh or ping.
* This is an example:
Floating IP is: X.Y.Z.169
1. Instance is active with a FIP
root@service001:/opt/kolla-configs# openstack server list --all | grep X.Y.Z.169
| 6d789f9f-4fc0-4725-9a26-a35f90ab1d2c | APP_server | ACTIVE | asdadasd=10.10.10.7, X.Y.Z.169 |
2. We use a distributed router without HA
root@service001:/opt/kolla-configs# openstack router list --project
6898232aaee84941ab0de4f259771840
+--------------------------------------+----------------+--------+-------+-------------+-------+----------------------------------+
| ID                                   | Name           | Status | State | Distributed | HA    | Project                          |
+--------------------------------------+----------------+--------+-------+-------------+-------+----------------------------------+
| ee69bc58-1347-45de-abf6-4667d974fc9d | asdadasdad     | ACTIVE | UP    | True        | False | 6898232aaee84941ab0de4f259771840 |
+--------------------------------------+----------------+--------+-------+-------------+-------+----------------------------------+
3. We can see in the qrouter namespace on the compute node that the DNAT/SNAT
rules were set up correctly.
root@compute006:~# ip netns exec qrouter-ee69bc58-1347-45de-abf6-4667d974fc9d iptables -L -t nat -n -v | grep X.Y.Z.169
    0     0 DNAT       all  --  rfp-ee69bc58-1 *       0.0.0.0/0            X.Y.Z.169            to:10.10.10.7
    0     0 SNAT       all  --  *      *       10.10.10.7           0.0.0.0/0            to:X.Y.Z.169
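The check in step 3 can be automated when triaging many routers. The sketch below is ours, not part of Neutron: a small helper that scans `iptables -L -t nat -n -v` output for the DNAT/SNAT rule pair of a given FIP (203.0.113.169, a documentation address, stands in for the redacted X.Y.Z.169):

```python
import re

def nat_rules_for_fip(iptables_output, fip):
    """Return the NAT targets (DNAT/SNAT) of rules mentioning `fip`
    in `iptables -L -t nat -n -v` output."""
    targets = []
    for line in iptables_output.splitlines():
        if fip in line:
            # The target name is the third column; a word match is enough here.
            match = re.search(r'\b(DNAT|SNAT)\b', line)
            if match:
                targets.append(match.group(1))
    return targets

# Output shaped like the qrouter namespace dump in step 3:
sample = (
    "    0     0 DNAT  all  --  rfp-ee69bc58-1 *  0.0.0.0/0   203.0.113.169  to:10.10.10.7\n"
    "    0     0 SNAT  all  --  *  *  10.10.10.7  0.0.0.0/0   to:203.0.113.169\n"
)
print(nat_rules_for_fip(sample, "203.0.113.169"))  # ['DNAT', 'SNAT']
```

A healthy qrouter namespace should yield exactly one DNAT and one SNAT entry per FIP; an empty result means the rules never landed on that compute.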
4. However, in the centralized SNAT namespace on the network node, the FIP is
also configured as an address on the interface qg-50f7f260-49:
root@controller001:~# ip netns exec snat-ee69bc58-1347-45de-abf6-4667d974fc9d ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2658: sg-c2a20a44-1b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1
    link/ether fa:16:3e:ec:47:4b brd ff:ff:ff:ff:ff:ff
    inet 10.10.10.14/24 brd 10.10.10.255 scope global sg-c2a20a44-1b
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:feec:474b/64 scope link
       valid_lft forever preferred_lft forever
2663: qg-50f7f260-49: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1
    link/ether fa:16:3e:e6:a3:7b brd ff:ff:ff:ff:ff:ff
    inet X.Y.Z.27/24 brd X.Y.Z.255 scope global qg-50f7f260-49
       valid_lft forever preferred_lft forever
    inet X.Y.Z.32/32 brd X.Y.Z.32 scope global qg-50f7f260-49
       valid_lft forever preferred_lft forever
    inet X.Y.Z.169/32 brd X.Y.Z.169 scope global qg-50f7f260-49
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fee6:a37b/64 scope link
       valid_lft forever preferred_lft forever
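Detecting the stray /32 in step 4 can likewise be scripted. This is a sketch of ours (helper name and sample addresses are illustrative; 203.0.113.x stands in for the redacted X.Y.Z.x) that parses `ip a` output and reports which interfaces carry a given FIP as a /32:

```python
import re

def ifaces_with_fip(ip_addr_output, fip):
    """Return interface names from `ip a` output that carry `fip` as a /32."""
    ifaces = []
    current = None
    for line in ip_addr_output.splitlines():
        # Interface header lines look like "2663: qg-50f7f260-49: <...>".
        head = re.match(r'\s*\d+:\s+([^:@\s]+)', line)
        if head:
            current = head.group(1)
        elif current and re.search(r'\binet\s+' + re.escape(fip) + r'/32\b', line):
            ifaces.append(current)
    return ifaces

# Output shaped like the snat namespace dump in step 4:
sample = """\
2663: qg-50f7f260-49: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
    inet 203.0.113.27/24 brd 203.0.113.255 scope global qg-50f7f260-49
    inet 203.0.113.169/32 brd 203.0.113.169 scope global qg-50f7f260-49
"""
print(ifaces_with_fip(sample, "203.0.113.169"))  # ['qg-50f7f260-49']
```

For a DVR router with HA disabled, a non-empty result inside the snat namespace is exactly the broken state described above: the FIP should live only on the compute node's qrouter/fip namespaces.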
5. AFAIK, the FIP shouldn't be set up in the SNAT namespace of the
network node, yet the logs show it being set up in both places:
neutron-server.log (network node)
2018-10-06 01:43:38.295 42 DEBUG neutron.db.l3_hamode_db
[req-ef937e08-7509-4623-9ac2-012734286462 - - - - -]
neutron.services.l3_router.l3_router_plugin.L3RouterPlugin method
_process_sync_ha_data called
with arguments
...
{'router_id': u'ee69bc58-1347-45de-abf6-4667d974fc9d', 'status': u'DOWN',
'description': u'', 'tags': [], 'updated_at': '2018-10-06T01:43:35Z',
'dns_domain': '', 'floating_network_id':
u'2f310092-1c75-4cb6-9758-edb13fb96d60', 'host': u'compute006',
'fixed_ip_address': u'10.10.10.7', 'floating_ip_address': u'X.Y.Z.169',
'revision_number': 0, 'port_id': u'95c43db6-9da0-4baa-ab78-020f09ce864d', 'id':
u'5882da22-aedc-4ffc-8ea7-36119d4841fd', 'dest_host': None, 'dns_name': '',
'created_at': '2018-10-06T01:43:35Z', 'tenant_id':
u'6898232aaee84941ab0de4f259771840', 'fixed_ip_address_scope': None,
'project_id': u'6898232aaee84941ab0de4f259771840'}
neutron-l3-agent.log (network node)
2018-10-06 01:43:48.207 14 DEBUG neutron.agent.linux.utils [-] Running command:
['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns',
'exec', 'snat-ee69bc58-1347-45de-abf6-4667d974fc9d', 'ip', '-4', 'addr', 'add',
'X.Y.Z.169/32', 'scope', 'global', 'dev', 'qg-50f7f260-49', 'brd', 'X.Y.Z.169']
create_process
/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:92
2018-10-06 01:43:50.555 14 DEBUG neutron.agent.linux.utils [-] Running command:
['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns',
'exec', 'snat-ee69bc58-1347-45de-abf6-4667d974fc9d', 'arping', '-U', '-I',
'qg-50f7f260-49', '-c', '1', '-w', '1.5', 'X.Y.Z.169'] create_process
/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:92
neutron-l3-agent.log (compute node)
2018-10-06 01:43:47.850 14 DEBUG neutron.agent.linux.utils [-] Running command:
['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns',
'exec', 'fip-2f310092-1c75-4cb6-9758-edb13fb96d60', 'ip', '-4', 'route',
'replace', 'X.Y.Z.169/32', 'via', '169.254.118.42', 'dev', 'fpr-ee69bc58-1']
create_process
/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:92
2018-10-06 01:43:48.516 14 DEBUG neutron.agent.linux.utils [-] Running command:
['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns',
'exec', 'fip-2f310092-1c75-4cb6-9758-edb13fb96d60', 'arping', '-U', '-I',
'fg-331574bc-69', '-c', '1', '-w', '1.5', 'X.Y.Z.169'] create_process
/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:92
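The rootwrap debug lines above give a way to confirm from the logs alone which agent plumbed the FIP. The following is our own sketch (function name is ours; the sample line mirrors the network-node log above, with 203.0.113.169 as a placeholder for X.Y.Z.169) that flags l3-agent log lines where a FIP /32 was added inside a snat-* namespace:

```python
def snat_fip_address_adds(log_text, fip):
    """Return l3-agent rootwrap log lines that add `fip`/32 in a snat-* namespace.

    Matches the quoted argv-list format shown in the debug logs, e.g.
    ['... 'exec', 'snat-<router-id>', 'ip', '-4', 'addr', 'add', '<fip>/32', ...].
    """
    hits = []
    for line in log_text.splitlines():
        if "'snat-" in line and "'add'" in line and "'%s/32'" % fip in line:
            hits.append(line)
    return hits

sample = (
    "['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', "
    "'exec', 'snat-ee69bc58-1347-45de-abf6-4667d974fc9d', 'ip', '-4', 'addr', 'add', "
    "'203.0.113.169/32', 'scope', 'global', 'dev', 'qg-50f7f260-49', 'brd', '203.0.113.169']"
)
print(len(snat_fip_address_adds(sample, "203.0.113.169")))  # 1
```

Any hit on a network node, while the compute-node log simultaneously shows the fip-namespace route being installed, is the double-plumbing race reported here.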
I've noticed this happens especially when we deploy instances and attach
FIPs via Heat templates. Restarting the L3 agent sometimes helps to reduce
this condition, but eventually we encounter the issue again.
** Affects: neutron
Importance: Undecided
Status: New
** Tags: l3-dvr-backlog
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1796491
Title:
DVR Floating IP setup in the SNAT namespace of the network node and
also in the qrouter namespace in the compute node
Status in neutron:
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1796491/+subscriptions
--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp