Public bug reported:
When a neutron router is in backup state on a network node (another network
node is "active" for this router) and that backup node is rebooted,
connectivity to the router's gateway port may break.
This is caused by a race between the L3 agent and the OVS agent, and it is
easier to reproduce when many routers are in backup state on that node.
I tested it with 10 routers, all in backup state. In that case 1 or 2
routers lost connectivity after the host was rebooted.
This happens because when the L3 agent adds an interface to the router, it
checks whether the interface has any IPv6 link-local addresses. If it does,
the agent flushes those addresses and adds them to the keepalived config, so
that keepalived can manage them like any other IP address on this interface.
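The selection step described above can be sketched with the Python standard
library. This is only an illustration and not the L3 agent's code: the
function name split_link_local and the sample addresses are assumptions.

```python
import ipaddress


def split_link_local(cidrs):
    """Split interface addresses into (IPv6 link-local, everything else)."""
    link_local, other = [], []
    for cidr in cidrs:
        iface = ipaddress.ip_interface(cidr)
        # Only IPv6 link-local (fe80::/10) addresses are candidates for
        # being flushed and handed over to keepalived.
        if iface.version == 6 and iface.ip.is_link_local:
            link_local.append(cidr)
        else:
            other.append(cidr)
    return link_local, other


# Hypothetical addresses found on a router interface.
addrs = ["fe80::f816:3eff:fe12:3456/64", "2001:db8::10/64", "192.0.2.10/24"]
to_keepalived, untouched = split_link_local(addrs)
```

Here to_keepalived holds only the fe80:: address. Those are the addresses
whose removal from the interface triggers the MLDv2 traffic.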
The problem is that when an IPv6 address is removed from the interface,
MLDv2 packets are sent to unsubscribe from the multicast group. If those
packets leave the host, e.g. to a ToR switch, the switch learns that the MAC
address of the gw port is on the wrong host (the rebooted one instead of the
one where the router is in master state).
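The effect on the switch can be shown with a toy model of source-MAC
learning. It is a simplified sketch, not the behaviour of any particular
switch: the LearningSwitch class, the MAC address and the port names are all
made up for illustration.

```python
class LearningSwitch:
    """Toy L2 switch that remembers the last port each source MAC came from."""

    def __init__(self):
        self.mac_table = {}

    def receive(self, src_mac, in_port):
        # Learning step: the newest frame always wins.
        self.mac_table[src_mac] = in_port

    def port_for(self, dst_mac):
        return self.mac_table.get(dst_mac)


GW_MAC = "fa:16:3e:aa:bb:cc"  # hypothetical MAC of the qg- port

tor = LearningSwitch()
tor.receive(GW_MAC, "port-to-master-node")   # normal traffic from the master
before = tor.port_for(GW_MAC)

# An MLDv2 packet with the same source MAC leaks from the rebooted backup node.
tor.receive(GW_MAC, "port-to-backup-node")
after = tor.port_for(GW_MAC)
```

After the leaked packet, traffic for the gateway MAC is forwarded to the
backup node, which matches the broken ingress connectivity described here.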
Those MLDv2 packets aren't sent to the wire for every router, only for some
of them, because of the race.
Basically, a new qg-XXX port is created in br-int by the L3 agent with
DEAD_VLAN_TAG (4095), and then both agents, L3 and OVS, configure it. If the
L3 agent flushes the IPv6 addresses from this interface BEFORE the OVS agent
sets the proper tag (local_vlan_id) for the port, all is fine because the
MLDv2 packets are dropped. But if the L3 agent flushes them AFTER the tag is
changed, the MLDv2 packets are sent to the wire and break ingress
connectivity.
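The two orderings can be replayed with a small model. It is a sketch of the
race as described in this report, not agent code: the event names, the replay
function and the local_vlan_id value of 1 are assumptions; only DEAD_VLAN_TAG
(4095) comes from the description.

```python
DEAD_VLAN_TAG = 4095


def replay(events, local_vlan_id=1):
    """Return True if MLDv2 packets would reach the wire for this ordering."""
    tag = DEAD_VLAN_TAG  # the port starts in the dead VLAN
    leaked = False
    for event in events:
        if event == "ovs_set_tag":
            tag = local_vlan_id
        elif event == "l3_flush_ipv6":
            # Packets sent while the port is still in the dead VLAN are
            # dropped; with a real tag they leave the host.
            if tag != DEAD_VLAN_TAG:
                leaked = True
        else:
            raise ValueError(f"unknown event: {event}")
    return leaked


flush_first = replay(["l3_flush_ipv6", "ovs_set_tag"])  # harmless ordering
tag_first = replay(["ovs_set_tag", "l3_flush_ipv6"])    # breaks connectivity
```

Only the second ordering leaks, which is why just some of the routers on the
rebooted node are affected.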
** Affects: neutron
Importance: Medium
Assignee: Slawek Kaplonski (slaweq)
Status: New
** Tags: l3-ha
--
https://bugs.launchpad.net/bugs/1859832
Title:
L3 HA connectivity to GW port can be broken after reboot of backup
node
Status in neutron:
New