Public bug reported:

Before an HA router is used by an agent:
1) The HA network should be created
2) A vr_id has to be allocated
3) The HA router should be able to create a sufficient number of ports on the
HA network

If the scheduler (from an rpc worker) processes the HA router (as the router is
available in the DB) before these resources are created, then the following
races (between api and rpc workers) can happen:
1) Race for creating the HA network
2) vr_id not available for the agent, so it can't spawn the HA proxy process
3) If creating router ports in the api worker fails, the router is deleted. So
the rpc worker will race with that deletion while it is binding the router's
HA ports to the agent.


To avoid this, the l3 scheduler should skip this router (while syncing for the
agent) if the above resources are not yet created.
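
The skip condition above can be sketched in Python as follows. This is only an
illustration: HaRouter, ha_resources_ready and routers_to_sync are hypothetical
names, not neutron's API, and min_ha_ports is an assumed stand-in for
"sufficient number of ports".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HaRouter:
    # Hypothetical stand-in for a router record; not a neutron model.
    id: str
    ha_network_id: Optional[str] = None
    vr_id: Optional[int] = None
    ha_port_ids: List[str] = field(default_factory=list)


def ha_resources_ready(router: HaRouter, min_ha_ports: int = 2) -> bool:
    """True only when all three prerequisites from the report are met."""
    return (
        router.ha_network_id is not None             # 1) HA network exists
        and router.vr_id is not None                 # 2) vr_id allocated
        and len(router.ha_port_ids) >= min_ha_ports  # 3) enough HA ports
    )


def routers_to_sync(routers: List[HaRouter], min_ha_ports: int = 2) -> List[HaRouter]:
    """Return only routers the scheduler may hand to an agent."""
    return [r for r in routers if ha_resources_ready(r, min_ha_ports)]
```

A router that is still missing its vr_id, for example, would simply be left out
of the sync result until a later pass finds it complete.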

To facilitate this, a new status ("ALLOCATING") is proposed for HA routers in
https://review.openstack.org/#/c/257059/
In this patch, the router is first created with its status set to ALLOCATING.
Once all the above resources are created, its status is changed to ACTIVE.
Proper checks are added (in the code) to skip using a router if its status is
ALLOCATING.
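
A rough sketch of that status lifecycle, assuming a plain dict as a stand-in
for the DB. The functions create_ha_router and schedulable_routers and the
allocate_resources callback are hypothetical and do not reflect the code in
the review.

```python
ALLOCATING = "ALLOCATING"
ACTIVE = "ACTIVE"


def create_ha_router(db, router_id, allocate_resources):
    """Create the router as ALLOCATING, then flip it to ACTIVE.

    db is a dict (router id -> router dict) standing in for the DB;
    allocate_resources is a hypothetical callback that creates the HA
    network, vr_id and HA ports for the router.
    """
    db[router_id] = {"id": router_id, "status": ALLOCATING}
    try:
        allocate_resources(db[router_id])
    except Exception:
        # As described in the report, a failed allocation deletes the router.
        del db[router_id]
        raise
    db[router_id]["status"] = ACTIVE
    return db[router_id]


def schedulable_routers(db):
    """Routers the scheduler may use: anything not still ALLOCATING."""
    return [r for r in db.values() if r["status"] != ALLOCATING]
```

While allocate_resources is running, the router is already visible in db but
schedulable_routers does not return it.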
So with this patch:
1) We are creating a new router status
2) We have to carefully identify where the router can be accessed before its
resources are created
3) We have to consider how the code behaves (during its access to the router)
when the status transitions from ALLOCATING to ACTIVE
Alternatively, if we are able to create the HA router's resources before HA
router creation, we can avoid a new status and new checks, while keeping the
same functionality as https://review.openstack.org/#/c/257059/.
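
The alternative ordering might look like the sketch below. It assumes the
resources can be allocated without an existing router record, which the report
does not confirm; create_ha_router_resources_first and allocate_resources are
hypothetical names, and a dict again stands in for the DB.

```python
def create_ha_router_resources_first(db, router_id, allocate_resources):
    """Allocate HA resources first, then make the router visible.

    allocate_resources is a hypothetical callback returning a dict of
    the allocated resources. If it raises, nothing has been written to
    db, so no other worker can ever see a half-built router.
    """
    resources = allocate_resources(router_id)
    # The router appears in the DB only once it is complete.
    db[router_id] = {"id": router_id, "status": "ACTIVE", **resources}
    return db[router_id]
```

With this ordering there is no intermediate state for the scheduler to check,
which is why no extra status would be needed.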

** Affects: neutron
     Importance: Undecided
     Assignee: venkata anil (anil-venkata)
         Status: New

** Changed in: neutron
     Assignee: (unassigned) => venkata anil (anil-venkata)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1566194

Title:
  Make sure resources for HA router exist before the router creation

Status in neutron:
  New


