Reviewed:  https://review.openstack.org/357966
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=3e4c0ae223de776732f70626b387fba4de2e9c3f
Submitter: Jenkins
Branch:    master

commit 3e4c0ae223de776732f70626b387fba4de2e9c3f
Author: John Schwarz <[email protected]>
Date:   Fri Aug 19 15:23:36 2016 +0100

    Revert "Add ALLOCATING state to routers"

    This reverts commit 9c3c19f07ce52e139d431aec54341c38a183f0b7.

    Following the merge of Ie98d5e3760cdb17450aea546f4b61f5ba14baf1c, the
    creation of new router uses RouterL3AgentBinding and its new
    binding_index attribute to ensure correctness of the resources. As
    such, the ALLOCATING state (which was used to do just that) is no
    longer needed and can be removed.

    Closes-Bug: #1609738
    Change-Id: Ib04e08df13ef4e6b94bd588854a5795163e2a617

** Changed in: neutron
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1609738

Title:
  l3-ha: a router can be stuck in the ALLOCATING state

Status in neutron:
  Fix Released

Bug description:
  The scenario is a simple one: during the creation of a router, the
  server that deals with the request crashes after creating the router
  with the ALLOCATING state [1] but before it's changed to ACTIVE [2].

  In this case, the router will be "stuck" in the ALLOCATING state, and
  the only admin action to change the router back to ACTIVE (and allow
  it to be scheduled to agents) is:

  1. set admin-state-up to False
  2. set ha to False
  3. set ha to True
  4. set admin-state-up to True

  That is, a full migration of the HA router to legacy and back to HA is
  required. This will trigger the code in [3] and will fix this issue.

  However, these 4 steps aren't intuitive at all - why should a user
  re-set the router as HA to solve a weird state of the router? Skipping
  steps 2 and 3 (only re-setting the admin-state-up) won't work because,
  as mentioned before, the scheduling happens on steps 2 and 3 (i.e.
  when the router is set to ha=False it's unscheduled, and when it's set
  to ha=True it is scheduled as if it's a new router).
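The failure and the 4-step workaround described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `ToyRouter`, `TOY_AGENTS`, and every function below are hypothetical names invented for illustration and are not Neutron code or Neutron's API. The model only encodes what the report says: a crash can leave the router in ALLOCATING, toggling admin-state-up alone changes nothing, and scheduling happens when ha is switched off and on again.

```python
from dataclasses import dataclass, field
from typing import List

ALLOCATING = "ALLOCATING"
ACTIVE = "ACTIVE"
TOY_AGENTS = ["agent-1", "agent-2"]  # hypothetical L3 agents


@dataclass
class ToyRouter:
    """Hypothetical stand-in for an HA router record; not Neutron's model."""
    ha: bool = True
    admin_state_up: bool = True
    status: str = ALLOCATING
    agents: List[str] = field(default_factory=list)


def create_ha_router(crash_before_active: bool = False) -> ToyRouter:
    # The router is first stored with the ALLOCATING status.
    router = ToyRouter()
    if crash_before_active:
        # Simulated server crash: the switch to ACTIVE never happens.
        return router
    router.status = ACTIVE
    router.agents = list(TOY_AGENTS)
    return router


def set_admin_state_up(router: ToyRouter, value: bool) -> None:
    # As in the report, this alone neither schedules nor fixes the status.
    router.admin_state_up = value


def set_ha(router: ToyRouter, value: bool) -> None:
    router.ha = value
    if not value:
        router.agents = []  # ha=False: the router is unscheduled
    else:
        # ha=True: the router is set up and scheduled like a new router.
        router.status = ACTIVE
        router.agents = list(TOY_AGENTS)


def full_workaround(router: ToyRouter) -> None:
    # The 4 steps from the bug description, in order.
    set_admin_state_up(router, False)
    set_ha(router, False)
    set_ha(router, True)
    set_admin_state_up(router, True)


stuck = create_ha_router(crash_before_active=True)
set_admin_state_up(stuck, False)
set_admin_state_up(stuck, True)
print(stuck.status, stuck.agents)  # still ALLOCATING and unscheduled
full_workaround(stuck)
print(stuck.status, stuck.agents)  # ACTIVE and scheduled in this model
```

In this model, skipping steps 2 and 3 leaves the router untouched, which is the unintuitive behaviour the report complains about.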
  In fact, this means that the problem is more severe: if the server
  crashed in the middle of setting up the resources of an HA router, all
  4 steps must be done to ensure the router is made valid again.

  The proposed solution is to add a new state, such that if
  admin-state-up is changed to False then the router's status will be
  changed to "DOWN" (as opposed to the current "ACTIVE", which doesn't
  make much sense since admin-state-up is False). This will help
  mitigate the "stuck ALLOCATING status" portion of the problem.

  In addition to changing the status, we will need to change the logic
  such that a router is unscheduled on admin-state-up=False and
  scheduled on admin-state-up=True. This will let us skip steps 2 and 3
  and go straight for re-setting the admin-state-up attribute of a
  router, which is more intuitive.

  [1]: https://github.com/openstack/neutron/blob/ff5b38071e7e134baa0dc7a52280f9bcbc06efaf/neutron/db/l3_hamode_db.py#L469
  [2]: https://github.com/openstack/neutron/blob/ff5b38071e7e134baa0dc7a52280f9bcbc06efaf/neutron/db/l3_hamode_db.py#L485
  [3]: https://github.com/openstack/neutron/blob/ff5b38071e7e134baa0dc7a52280f9bcbc06efaf/neutron/db/l3_hamode_db.py#L570
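The behaviour proposed in the bug description (status "DOWN" on admin-state-up=False, scheduling tied to admin-state-up) could look roughly like the sketch below. It models only the proposal in the description, not the fix that was eventually committed, which removed the ALLOCATING state instead. All names here (`ToyRouter`, `set_admin_state_up`, the agent list) are hypothetical, and the step that marks the router ACTIVE after scheduling is an assumption the report implies but does not spell out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyRouter:
    """Hypothetical router record left behind by a crashed server."""
    admin_state_up: bool = True
    status: str = "ALLOCATING"
    agents: List[str] = field(default_factory=list)


def set_admin_state_up(router: ToyRouter, value: bool,
                       available_agents: List[str]) -> None:
    router.admin_state_up = value
    if not value:
        # Proposed: unschedule the router and report it as DOWN.
        router.agents = []
        router.status = "DOWN"
    else:
        # Proposed: schedule the router when it is brought back up.
        router.agents = list(available_agents)
        # Assumption: a scheduled router is then reported as ACTIVE.
        router.status = "ACTIVE"


stuck = ToyRouter()
agents = ["agent-1", "agent-2"]  # hypothetical L3 agents
set_admin_state_up(stuck, False, agents)
print(stuck.status, stuck.agents)
set_admin_state_up(stuck, True, agents)
print(stuck.status, stuck.agents)
```

With this logic, re-setting admin-state-up is enough to recover a stuck router in the model, so the ha=False/ha=True round trip is no longer needed.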

