Public bug reported:

L3 HA did not work with l2pop at all, and that was fixed here:
https://bugs.launchpad.net/neutron/+bug/1365476 via 
https://review.openstack.org/#/c/141114/.

However, the solution is suboptimal because it assumes the control plane is
operational for failover to work correctly.
Without l2pop, L3 HA can fail over successfully even if the database, the
messaging server, neutron-server and the destination L3 agent are all down.
With l2pop, all four are needed. This is because, for failover to work, the
destination L3 agent notices that a router has transitioned to master and
notifies neutron-server via RPC. Neutron-server then updates the
'binding:host' value of all of the internal router ports to point to the
target node, and the l2pop code is executed to update the L2 agents.
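The dependency argument above can be made concrete with a toy model. This is
purely illustrative: the component names, `required_components` and
`failover_possible` are hypothetical and do not correspond to neutron code.
The model assumes that, without l2pop, the keepalived/VRRP transition on the
destination node is the only thing failover needs. With l2pop, the
binding update must also pass through the L3 agent, RPC, neutron-server and
the database.

```python
from typing import FrozenSet, Set

# Hypothetical component names, used only for this illustration.
DATA_PLANE: FrozenSet[str] = frozenset({"keepalived_on_destination"})
CONTROL_PLANE: FrozenSet[str] = frozenset(
    {"database", "messaging_server", "neutron_server", "destination_l3_agent"}
)


def required_components(l2pop_enabled: bool) -> FrozenSet[str]:
    """Components that must be alive for an L3 HA failover to complete."""
    if l2pop_enabled:
        # The L3 agent reports the new master over RPC, neutron-server updates
        # 'binding:host' in the database, and l2pop notifies the L2 agents,
        # so the whole control plane sits on the failover path.
        return DATA_PLANE | CONTROL_PLANE
    # Without l2pop, the VRRP transition alone moves traffic to the new master.
    return DATA_PLANE


def failover_possible(l2pop_enabled: bool, alive: Set[str]) -> bool:
    """True if every required component is in the set of alive components."""
    return required_components(l2pop_enabled) <= alive


if __name__ == "__main__":
    # Control plane down, data plane up.
    alive = {"keepalived_on_destination"}
    print(failover_possible(l2pop_enabled=False, alive=alive))  # True
    print(failover_possible(l2pop_enabled=True, alive=alive))   # False
```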

Instead, I'd like failover to rely solely on the data plane, regardless
of whether l2pop is on or off. One such solution would be something
similar to patch set 9 of the patch:
https://review.openstack.org/#/c/141114/9//COMMIT_MSG. The idea is to
tell l2pop to treat HA router ports as replicated ports (which they
are), so that tunnel endpoints would be created to all nodes that
host replicas of the router, and the destination MAC address of the port
would not be learned via l2pop but via the fallback regular MAC
learning mechanism. This means that we lose some of the advantages of
l2pop, but I think it is essential to the correct operation of L3 HA.
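The proposed behaviour can be sketched as follows. This is a hypothetical
model, not the l2pop driver or the code from patch set 9. `Port`,
`fdb_entries` and `LearningBridge` are illustrative names. `fdb_entries`
decides what a remote L2 agent is told about a port, and `LearningBridge`
stands in for the fallback MAC-learning path on one agent's tunnel bridge.
For an HA router port, the sketch emits flood entries towards every node
hosting a replica and no unicast entry for the port's MAC. After failover, a
frame sent from the new master (for example a gratuitous ARP) is therefore
enough to re-point the MAC, with no call to neutron-server.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

FLOOD = "flood"  # marker for a flooding (broadcast/unknown-unicast) entry


@dataclass
class Port:
    mac: str
    hosts: List[str]  # hosts the port is bound to (all replicas for HA ports)
    ha_router_port: bool = False


def fdb_entries(port: Port) -> Set[Tuple[str, str]]:
    """Return (destination, tunnel_host) entries a remote L2 agent would get.

    Hypothetical model: a regular port (exactly one host) gets a flood entry
    plus one unicast entry pinning its MAC to that host. An HA router port is
    treated as replicated: flood entries towards every replica host and no
    unicast entry, so its MAC has to be learned from the data plane.
    """
    if port.ha_router_port:
        return {(FLOOD, host) for host in port.hosts}
    (host,) = port.hosts  # raises ValueError unless bound to exactly one host
    return {(FLOOD, host), (port.mac, host)}


@dataclass
class LearningBridge:
    """Toy tunnel bridge: static (l2pop) entries, learned entries, flooding."""
    static: Dict[str, str] = field(default_factory=dict)   # mac -> host
    learned: Dict[str, str] = field(default_factory=dict)  # mac -> host
    flood_hosts: Set[str] = field(default_factory=set)

    def install(self, entries: Set[Tuple[str, str]]) -> None:
        for dest, host in entries:
            if dest == FLOOD:
                self.flood_hosts.add(host)
            else:
                self.static[dest] = host

    def receive(self, src_mac: str, from_host: str) -> None:
        # Learn the source MAC from any frame arriving over a tunnel.
        self.learned[src_mac] = from_host

    def output_hosts(self, dst_mac: str) -> Set[str]:
        # Assumption of this sketch: static entries win over learned ones.
        if dst_mac in self.static:
            return {self.static[dst_mac]}
        if dst_mac in self.learned:
            return {self.learned[dst_mac]}
        return set(self.flood_hosts)  # unknown destination: flood


if __name__ == "__main__":
    router_port = Port("fa:16:3e:00:00:01", ["node-a", "node-b"], ha_router_port=True)
    bridge = LearningBridge()
    bridge.install(fdb_entries(router_port))
    print(sorted(bridge.output_hosts(router_port.mac)))  # ['node-a', 'node-b']
    bridge.receive(router_port.mac, "node-a")  # node-a is master
    print(bridge.output_hosts(router_port.mac))  # {'node-a'}
    bridge.receive(router_port.mac, "node-b")  # failover: node-b sends a GARP
    print(bridge.output_hosts(router_port.mac))  # {'node-b'}
```

In this model a static entry takes priority over a learned one. That is why a
unicast entry pinned to the old master keeps steering traffic there until the
control plane rewrites it. Dropping that entry for HA router ports trades some
of l2pop's flood reduction for a failover path that needs only the data plane.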

** Affects: neutron
     Importance: Medium
         Status: New


** Tags: l2-pop l3-ha

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1522980

Title:
  L3 HA integration with l2pop assumes control plane is operational for
  fail over

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team