Public bug reported:

If the overlay network of a network node is down for a while, the slave node of
an HA router can't receive VRRP packets, so it promotes itself to master. The
L3 agent then updates the ha_state of its router binding to active and updates
the port bindings of the router interfaces to its own host.
After the network recovers, one of the two master nodes of the HA router is
demoted to slave. If the demoted node is the previous slave node, the L3 agent
updates the ha_state of its router binding to standby but does not update the
port bindings of the router interfaces back to the host of the original master
node. Packets sent to the router are then delivered to the slave node because
l2pop uses the incorrect port bindings.
Because both keepalived instances are configured with the same priority (50),
the probability of hitting this problem in a two-network-node deployment is
50%.
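
The sequence above can be illustrated with a small toy model. This is only a
sketch of the reported behavior, not Neutron code: ToyRouter and report_state
are hypothetical names, and the assumption that host1 starts as the slave node
is made purely for illustration.

```python
# Toy model of the failure sequence; not Neutron code.
# ToyRouter and report_state are hypothetical names used for illustration.


class ToyRouter:
    def __init__(self, master, backup):
        # Per-host HA state, as reported by each L3 agent.
        self.ha_state = {master: "active", backup: "standby"}
        # Host that the router interface ports are bound to (what l2pop uses).
        self.port_binding_host = master

    def report_state(self, host, state):
        """Mimic the reported behavior: only a transition to active moves
        the port binding; a transition to standby leaves it untouched."""
        self.ha_state[host] = state
        if state == "active":
            self.port_binding_host = host


# Assumption: host2 is the original master, host1 the original slave.
router = ToyRouter(master="host2", backup="host1")

# host1 loses the overlay network, misses VRRP packets, and promotes itself.
router.report_state("host1", "active")

# The network recovers and host1 (the previous slave) is demoted again.
router.report_state("host1", "standby")

active_hosts = [h for h, s in router.ha_state.items() if s == "active"]
print("active:", active_hosts, "port binding:", router.port_binding_host)
```

In this model the only active node ends up being host2 while the port binding
still points at host1, which is the inconsistency described above.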

How to reproduce:
- Use two network nodes: host1 and host2.
- Create an HA router (router1), a network (network1), and a subnet (subnet1),
then add an interface of subnet1 to router1.
- Disconnect host1 from the overlay network and wait until the
l3-agent-list-hosting-router API shows that both ha_state values of router1 are
active.
- Restore the overlay network of host1 and wait until one ha_state of router1
turns to standby. There is a 50% probability that the port binding of the
router1 interface is inconsistent with the host hosting the active node, in
which case instances in subnet1 can't reach the router interface.

Expected behavior:
- Updating the ha_state of an HA router to standby should trigger an update of
the port bindings of the router interfaces to the host whose ha_state is active.
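
A minimal sketch of the expected behavior, assuming the HA states are available
as a simple host-to-state mapping. The function on_ha_state_change and its
arguments are hypothetical and do not correspond to any actual Neutron API; the
choice to rebind only when exactly one host is active is also an assumption.

```python
# Sketch of the expected handling; not Neutron code.
# on_ha_state_change and its parameters are hypothetical.


def on_ha_state_change(ha_state, port_binding_host, host, new_state):
    """Return the updated (ha_state, port_binding_host) pair.

    ha_state maps host name -> "active" or "standby".
    """
    updated = dict(ha_state)
    updated[host] = new_state

    if new_state == "active":
        # Existing behavior: a newly active host takes over the binding.
        return updated, host

    # Expected behavior: a transition to standby also re-evaluates the
    # binding and moves it to the host that is still active.
    active = [h for h, s in updated.items() if s == "active"]
    if len(active) == 1:
        return updated, active[0]

    # Assumption: with zero or several active hosts, leave the binding alone.
    return updated, port_binding_host


# Both nodes are active after the disruption and the binding points at host1.
states = {"host1": "active", "host2": "active"}
new_states, binding = on_ha_state_change(states, "host1", "host1", "standby")
print(new_states, binding)
```

Here the demotion of host1 moves the binding to host2, the remaining active
node.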

Affected versions:
Reproducible on the master branch; mitaka and newton are presumably affected as
well.

** Affects: neutron
     Importance: Undecided
     Assignee: Quan Tian (tianquan23)
         Status: In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1636466

Title:
  HA router interface points to wrong host after network disruption

Status in neutron:
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1636466/+subscriptions
