Re: [openstack-dev] [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

Matt Riedemann Mon, 22 May 2017 12:14:39 -0700

On 5/22/2017 1:50 PM, Matt Riedemann wrote:

On 5/22/2017 12:54 PM, Jay Pipes wrote:
Hi Ops,
I need your feedback on a very important direction we would like to pursue. I realize that there were Forum sessions about this topic at the summit in Boston and that there were some decisions that were reached.
I'd like to revisit that decision and explain why I'd like your support for getting rid of the automatic reschedule behaviour entirely in Nova for Pike.
== The current situation and why it sucks ==
Nova currently attempts to "reschedule" instances when any of the following events occur:
a) the "claim resources" process that occurs on the nova-compute worker results in the chosen compute node exceeding its own capacity
b) in between the time a compute node was chosen by the scheduler, another process launched an instance that would violate an affinity constraint
c) an "unknown" exception occurs during the spawn process. In practice, this really only is seen when the Ironic baremetal node that was chosen by the scheduler turns out to be unreliable (IPMI issues, BMC failures, etc) and wasn't able to launch the instance. [1]
The logic for handling these reschedules makes the Nova conductor, scheduler and compute worker code very complex. With the new cellsv2 architecture in Nova, child cells are not able to communicate with the Nova scheduler (and thus "ask for a reschedule").
To be clear, they are able to communicate, and do, as long as you configure them to be able to do so. The long-term goal is that you don't have to configure them to be able to do so, so we're trying to design and work in that mode toward that goal.
We (the Nova team) would like to get rid of the automated rescheduling behaviour that Nova currently exposes because we could eliminate a large amount of complexity (which leads to bugs) from the already-complicated dance of communication that occurs between internal Nova components.
== What we would like to do ==
With the move of the resource claim to the Nova scheduler [2], we can entirely eliminate the a) class of Reschedule causes.
This leaves class b) and c) causes of Rescheduling.
For class b) causes, we should be able to solve this issue when the placement service understands affinity/anti-affinity (maybe Queens/Rocky). Until then, we propose that instead of raising a Reschedule when an affinity constraint was last-minute violated due to a racing scheduler decision, that we simply set the instance to an ERROR state.
Personally, I have only ever seen anti-affinity/affinity use cases in relation to NFV deployments, and in every NFV deployment of OpenStack there is a VNFM or MANO solution that is responsible for the orchestration of instances belonging to various service function chains. I think it is reasonable to expect the MANO system to be responsible for attempting a re-launch of an instance that was set to ERROR due to a last-minute affinity violation.
**Operators, do you agree with the above?**
Finally, for class c) Reschedule causes, I do not believe that we should be attempting automated rescheduling when "unknown" errors occur. I just don't believe this is something Nova should be doing.
I recognize that large Ironic users expressed their concerns about IPMI/BMC communication being unreliable and not wanting to have users manually retry a baremetal instance launch. But, on this particular point, I'm of the opinion that Nova just do one thing and do it well. Nova isn't an orchestrator, nor is it intending to be a "just continually try to get me to this eventual state" system like Kubernetes.
If we removed Reschedule for class c) failures entirely, large Ironic deployers would have to train users to manually retry a failed launch or would need to write a simple retry mechanism into whatever client/UI that they expose to their users.
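For illustration only, a client-side retry of the kind described above might look roughly like the sketch below. It assumes a hypothetical `launch` callable provided by the client/UI that returns the final instance status as a string ("ACTIVE" or "ERROR"); `launch_with_retry` and its parameters are made-up names, not part of any Nova or Ironic API.

```python
import time


def launch_with_retry(launch, max_attempts=3, delay_seconds=0.0, sleep=time.sleep):
    """Call launch() until it reports ACTIVE or the attempts run out.

    `launch` is a hypothetical callable supplied by the client/UI. It is
    assumed to return the final instance status as a string, for example
    "ACTIVE" or "ERROR". Nothing here is a Nova API.

    Returns a tuple of (attempts_used, last_status).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = None
    for attempt in range(1, max_attempts + 1):
        status = launch()
        if status == "ACTIVE":
            return attempt, status
        if attempt < max_attempts:
            # Wait before the next try; no wait after the final attempt.
            sleep(delay_seconds)
    return max_attempts, status
```

The caller decides what to do when the last status is still "ERROR", for example surfacing the failure to the user instead of retrying forever.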
**Ironic operators, would the above decision force you to abandon Nova as the multi-tenant BMaaS facility?**
Thanks in advance for your consideration and feedback.

Best,
-jay
[1] This really does not occur with any frequency for hypervisor virt drivers, since the exceptions those hypervisors throw are caught by the nova-compute worker and handled without raising a Reschedule.
Are you sure about that?
https://github.com/openstack/nova/blob/931c3f48188e57e71aa6518d5253e1a5bd9a27c0/nova/compute/manager.py#L2041-L2049
The compute manager handles anything non-specific that leaks up from the virt driver.spawn() method and reschedules it. Think ProcessExecutionError when vif plugging fails in the libvirt driver because the command blew up for some reason (sudo on the host is wrong?). I'm not saying it should, as I'm guessing most of these types of failures are due to misconfiguration, but it is how things currently work today.
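As a rough, simplified sketch of the pattern described above (not the actual code at the link): a recognised failure propagates unchanged, while any other exception that leaks up from the spawn call is converted into a reschedule request. `BuildAbort`, `Reschedule` and `build_on_host` are hypothetical names chosen for this illustration.

```python
class BuildAbort(Exception):
    """Hypothetical: a recognised failure that should not be rescheduled."""


class Reschedule(Exception):
    """Hypothetical: ask the caller to try the build on another host."""


def build_on_host(spawn):
    """Run spawn() and classify any failure.

    `spawn` stands in for the virt driver's spawn() call.
    """
    try:
        spawn()
    except BuildAbort:
        # Recognised failure: propagate as-is, no reschedule.
        raise
    except Exception as exc:
        # Anything non-specific that leaks up becomes a reschedule request.
        raise Reschedule(str(exc)) from exc
    return "ACTIVE"
```

Under this shape, a misconfiguration that raises a generic error on every host would be retried on each one, which is the concern raised above.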
[2] http://specs.openstack.org/openstack/nova-specs/specs/pike/approved/placement-claims.html
_______________________________________________
OpenStack-operators mailing list
openstack-operat...@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

Not to sound like we don't have a united front here, but I want to restate the concern I expressed this morning when talking about this.


I'm not an operator and don't have the background or experience there.

The 95% number thrown around at the summit was made up, as far as I know. There is no published data that I'm aware of which says someone tested reschedules at scale and in 95% of cases they were due to the situation described in (a) above.

We're less than three weeks from the p-2 milestone. Feature freeze is July 27. That is plenty of time (ideally) to get this code done and merged. However, I don't want to underestimate the number of weird things that are going to come out of this pretty large change in how things work, especially when multiple cells and quotas changes are happening.

Therefore I'm on the side of being conservative here and allowing reschedules within a cell for now. I think long-term it'd be a good idea to disable reschedules by default for new installs, and people that really need them (or feel more secure by having them) can turn them on. But I'd rather see that gradually phased out once we see how things are working for a while (at least a release).

Yes that means possible duplication and technical debt, but I think we've always accepted some of that, at least temporarily, for large changes so we can ease the transition.


--

Thanks,

Matt

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
