** Changed in: neutron
Status: In Progress => Won't Fix
--
https://bugs.launchpad.net/bugs/1636466
Title:
HA router interface points to wrong host after network disruption
Status in neutron:
Won't Fix
Bug description:
If the overlay network of a network node is down for a while, the slave node
of an HA router stops receiving VRRP packets and promotes itself to master.
The L3 agent on that node then sets the ha_state of its router instance to
active and updates the port bindings of the router interfaces to its own host.
After the network recovers, one of the two master nodes of the HA router is
demoted to slave. If the demoted node is the one that was the slave before the
disruption, the L3 agent sets the ha_state of its router instance to standby
but does not update the port bindings of the router interfaces back to the
host of the original master node. Packets sent to the router then go to the
slave node, because l2pop uses the stale port bindings.
Because both keepalived instances are configured with the same priority (50),
the probability of hitting this problem in a two network node scenario is 50%.
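The sequence described above can be sketched as a toy model. This is only an illustration of the reported behavior, not neutron code: the class `HaRouterModel` and the function `run_disruption` are hypothetical names, and the rule "a report of active rebinds the interface, a report of standby does not" is taken from the description in this report.

```python
from dataclasses import dataclass


@dataclass
class HaRouterModel:
    """Hypothetical server-side view of one HA router."""
    ha_state: dict        # host -> "active" or "standby"
    interface_host: str   # binding:host_id of the router interface port

    def report_state(self, host, state):
        # Behavior as described in the report: becoming active rebinds the
        # interface port to the reporting host, becoming standby only
        # changes ha_state and leaves the binding untouched.
        self.ha_state[host] = state
        if state == "active":
            self.interface_host = host

    def active_hosts(self):
        return sorted(h for h, s in self.ha_state.items() if s == "active")


def run_disruption(demoted_after_recovery):
    # Initial state from the report: host2 is master, host1 is slave.
    router = HaRouterModel({"host1": "standby", "host2": "active"}, "host2")
    router.report_state("host1", "active")                  # overlay down on host1
    router.report_state(demoted_after_recovery, "standby")  # overlay restored
    return router


for demoted in ("host1", "host2"):
    r = run_disruption(demoted)
    stale = r.interface_host not in r.active_hosts()
    print(demoted, r.active_hosts(), r.interface_host, "stale" if stale else "ok")
```

In this model the binding ends up stale only when the node demoted after recovery is host1, the previous slave, which matches the case described above.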
How to reproduce:
- two network nodes: host1, host2.
- create an HA router (router1), a network (network1) and a subnet (subnet1),
then add an interface on subnet1 to router1.
$ neutron l3-agent-list-hosting-router subnet1
+--------------------------------------+------------+----------------+-------+----------+
| id                                   | host       | admin_state_up | alive | ha_state |
+--------------------------------------+------------+----------------+-------+----------+
| 3a3b8d27-e5b4-42c0-9433-2ba8b6be98c2 | host1      | True           | :-)   | standby  |
| 4eba4a33-1452-4f4e-8874-a8eff2f4f357 | host2      | True           | :-)   | active   |
+--------------------------------------+------------+----------------+-------+----------+
$ neutron router-port-list subnet1 -c id -c binding:host_id -c fixed_ips
+--------------------------------------+-----------------+--------------------------------------------------------------------------------------+
| id                                   | binding:host_id | fixed_ips                                                                            |
+--------------------------------------+-----------------+--------------------------------------------------------------------------------------+
| 00a89bc5-a589-4c37-9db0-a7b439c4dee9 | host1           | {"subnet_id": "6bb7aced-6b8f-448f-813d-d1bc91b9ee2d", "ip_address": "169.254.192.6"} |
| b83590b2-0bf9-4fe7-b29f-0d37c92a9b3a | host2           | {"subnet_id": "75e30064-a625-4267-8cbf-20d1a7b6e952", "ip_address": "192.168.10.1"}  |
| ca2a66e0-5525-4302-b00f-0e703dbb48e2 | host2           | {"subnet_id": "6bb7aced-6b8f-448f-813d-d1bc91b9ee2d", "ip_address": "169.254.192.1"} |
+--------------------------------------+-----------------+--------------------------------------------------------------------------------------+
- disconnect host1 from the overlay network and wait until the
l3-agent-list-hosting-router API shows that both ha_state values of router1
are active.
$ neutron l3-agent-list-hosting-router subnet1
+--------------------------------------+------------+----------------+-------+----------+
| id                                   | host       | admin_state_up | alive | ha_state |
+--------------------------------------+------------+----------------+-------+----------+
| 3a3b8d27-e5b4-42c0-9433-2ba8b6be98c2 | host1      | True           | :-)   | active   |
| 4eba4a33-1452-4f4e-8874-a8eff2f4f357 | host2      | True           | :-)   | active   |
+--------------------------------------+------------+----------------+-------+----------+
$ neutron router-port-list subnet1 -c id -c binding:host_id -c fixed_ips
+--------------------------------------+-----------------+--------------------------------------------------------------------------------------+
| id                                   | binding:host_id | fixed_ips                                                                            |
+--------------------------------------+-----------------+--------------------------------------------------------------------------------------+
| 00a89bc5-a589-4c37-9db0-a7b439c4dee9 | host1           | {"subnet_id": "6bb7aced-6b8f-448f-813d-d1bc91b9ee2d", "ip_address": "169.254.192.6"} |
| b83590b2-0bf9-4fe7-b29f-0d37c92a9b3a | host1           | {"subnet_id": "75e30064-a625-4267-8cbf-20d1a7b6e952", "ip_address": "192.168.10.1"}  |
| ca2a66e0-5525-4302-b00f-0e703dbb48e2 | host2           | {"subnet_id": "6bb7aced-6b8f-448f-813d-d1bc91b9ee2d", "ip_address": "169.254.192.1"} |
+--------------------------------------+-----------------+--------------------------------------------------------------------------------------+
- restore the overlay network of host1 and wait until one ha_state of router1
turns to standby. There is a 50% probability that the port binding of the
router1 interface is inconsistent with the host hosting the active node. In
that case instances in subnet1 can't reach the router interface.
$ neutron l3-agent-list-hosting-router subnet1
+--------------------------------------+------------+----------------+-------+----------+
| id                                   | host       | admin_state_up | alive | ha_state |
+--------------------------------------+------------+----------------+-------+----------+
| 3a3b8d27-e5b4-42c0-9433-2ba8b6be98c2 | host1      | True           | :-)   | standby  |
| 4eba4a33-1452-4f4e-8874-a8eff2f4f357 | host2      | True           | :-)   | active   |
+--------------------------------------+------------+----------------+-------+----------+
$ neutron router-port-list subnet1 -c id -c binding:host_id -c fixed_ips
+--------------------------------------+-----------------+--------------------------------------------------------------------------------------+
| id                                   | binding:host_id | fixed_ips                                                                            |
+--------------------------------------+-----------------+--------------------------------------------------------------------------------------+
| 00a89bc5-a589-4c37-9db0-a7b439c4dee9 | host1           | {"subnet_id": "6bb7aced-6b8f-448f-813d-d1bc91b9ee2d", "ip_address": "169.254.192.6"} |
| b83590b2-0bf9-4fe7-b29f-0d37c92a9b3a | host1           | {"subnet_id": "75e30064-a625-4267-8cbf-20d1a7b6e952", "ip_address": "192.168.10.1"}  |
| ca2a66e0-5525-4302-b00f-0e703dbb48e2 | host2           | {"subnet_id": "6bb7aced-6b8f-448f-813d-d1bc91b9ee2d", "ip_address": "169.254.192.1"} |
+--------------------------------------+-----------------+--------------------------------------------------------------------------------------+
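When repeating the steps above, the final check can be scripted against the saved CLI output. The following is a rough sketch, not part of neutron: `parse_table` and `stale_interface_ports` are hypothetical helpers, the sample tables are compacted copies of the last two outputs above, each port is assumed to have a single fixed IP, and the HA network is assumed to use 169.254.192.0/18 (the addresses of the HA ports in this report fall in that range). HA ports are skipped because they are bound per host by design.

```python
import ipaddress
import json

# Assumption: HA ports live in this range; adjust if l3_ha_net_cidr differs.
HA_NET = ipaddress.ip_network("169.254.192.0/18")


def parse_table(text):
    """Turn a '|'-delimited CLI table into a list of dicts keyed by header."""
    rows = []
    for line in text.strip().splitlines():
        line = line.strip()
        if line.startswith("|"):  # skip the +----+ border lines
            rows.append([cell.strip() for cell in line.strip("|").split("|")])
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def stale_interface_ports(agent_table, port_table):
    """Return ids of non-HA router ports bound to a host that is not active."""
    active = {a["host"] for a in parse_table(agent_table)
              if a["ha_state"] == "active"}
    stale = []
    for port in parse_table(port_table):
        ip = ipaddress.ip_address(json.loads(port["fixed_ips"])["ip_address"])
        if ip in HA_NET:
            continue  # HA port, expected to stay on its own host
        if port["binding:host_id"] not in active:
            stale.append(port["id"])
    return stale


AGENTS = """
| id | host | admin_state_up | alive | ha_state |
| 3a3b8d27-e5b4-42c0-9433-2ba8b6be98c2 | host1 | True | :-) | standby |
| 4eba4a33-1452-4f4e-8874-a8eff2f4f357 | host2 | True | :-) | active |
"""

PORTS = """
| id | binding:host_id | fixed_ips |
| 00a89bc5-a589-4c37-9db0-a7b439c4dee9 | host1 | {"subnet_id": "6bb7aced-6b8f-448f-813d-d1bc91b9ee2d", "ip_address": "169.254.192.6"} |
| b83590b2-0bf9-4fe7-b29f-0d37c92a9b3a | host1 | {"subnet_id": "75e30064-a625-4267-8cbf-20d1a7b6e952", "ip_address": "192.168.10.1"} |
| ca2a66e0-5525-4302-b00f-0e703dbb48e2 | host2 | {"subnet_id": "6bb7aced-6b8f-448f-813d-d1bc91b9ee2d", "ip_address": "169.254.192.1"} |
"""

print(stale_interface_ports(AGENTS, PORTS))
```

For the outputs shown in this report, the only port flagged is the 192.168.10.1 interface, which is bound to host1 while host2 is the active node.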
Expected behavior:
- updating the ha_state of an HA router to standby should trigger an update of
the port bindings of the router interfaces to the host whose ha_state is active.
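One way to picture the expected behavior is a small decision function. This is a sketch under assumptions, not a proposed patch: `plan_rebinds` is a hypothetical name, and the choice to do nothing unless exactly one host is active is an assumption added here so that a split-brain state (both nodes active) does not cause bindings to flip back and forth.

```python
def plan_rebinds(ha_state, interface_bindings):
    """Return {port_id: host} for interface ports that should be rebound.

    ha_state:            host -> "active" or "standby"
    interface_bindings:  port_id -> current binding:host_id
    """
    active = [host for host, state in ha_state.items() if state == "active"]
    if len(active) != 1:
        # Assumption: with zero or several active hosts there is no single
        # correct target, so leave the bindings alone.
        return {}
    target = active[0]
    return {port: target
            for port, host in interface_bindings.items()
            if host != target}


# State after recovery, taken from the last outputs in this report.
print(plan_rebinds(
    {"host1": "standby", "host2": "active"},
    {"b83590b2-0bf9-4fe7-b29f-0d37c92a9b3a": "host1"},
))
```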
Affected versions:
Can be reproduced on the master branch; mitaka and newton are probably affected as well.