On 27/09/13 17:58, Clint Byrum wrote:
Excerpts from Zane Bitter's message of 2013-09-27 06:58:40 -0700:
On 27/09/13 08:58, Mike Spreitzer wrote:
I have begun to draft some specifics about the sorts of policies that
might be added to infrastructure to inform a smart unified placement
engine.  These are cast as an extension to Heat templates.  See
https://wiki.openstack.org/wiki/Heat/PolicyExtension.  Comments solicited.

These are not the kinds of specifics that are of any help at all in
figuring out how (or, indeed, whether) to incorporate holistic
scheduling into OpenStack.

I agree that the things in that page are a wet dream of logical deployment
fun. However, I think one can target just a few of the basic ones,
and see a real achievable case forming. I think I grasp Mike's ideas,
so I'll respond to your concerns with what I think. Note that it is
highly likely I've gotten some of this wrong.

Thanks for having a crack at this, Clint. However, I think your example is not apposite, because it doesn't actually require any holistic scheduling. You can easily do anti-colocation of a bunch of servers just using scheduler hints to the Nova API (stick one in each zone until you run out of zones). This just requires Heat to expose the scheduler hints portion of the Nova API. To my mind this stuff is so basic that it falls squarely in the category of what you said in a previous thread:

There is
definitely a need for Heat to be able to communicate to the API's any
placement details that can be communicated. However, Heat should not
actually be "scheduling" anything.

But in any event, most of your answers appear to be predicated on this very simple case, not on a holistic scheduler. I think you are vastly underestimating the complexity of the problem.

What Mike is proposing is something more sophisticated, whereby you can solve for the optimal scheduling of resources of different types across different APIs. There may be a case for including this in Heat, but it needs to be made, and IMO it needs to be made by answering these kinds of questions at a similar level of detail to the symmetric dyadic primitives wiki page.

BTW there is one more question I should add:

- Who will implement and maintain this service/feature, and the associated changes to existing services?

- What would a holistic scheduling service look like? A standalone
service? Part of heat-engine?

I see it as a preprocessor of sorts for the current infrastructure engine.
It would take the logical expression of the cluster and either turn
it into actual deployment instructions or respond to the user that it
cannot succeed. Ideally it would just extend the same Heat API.

- How will the scheduling service reserve slots for resources in advance
of them being created? How will those reservations be accounted for and
- In the event that slots are reserved but those reservations are not
taken up, what will happen?

I don't see the word "reserve" in Mike's proposal, and I don't think this
is necessary for the more basic models like Collocation and Anti-Collocation.

Right, but we're not talking about only the basic models. Reservations are very much needed according to my understanding of the proposal, because the whole point is to co-ordinate across multiple services in a way that is impossible to do atomically.

Reservations would of course make the scheduling decisions more likely to
succeed, but it isn't necessary if we do things optimistically. If the
stack create or update fails, we can retry with better parameters.
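
A rough sketch of that optimistic approach, with no reservations at all: try candidate placements in order and move on when one fails. Everything named here (PlacementError, create_with_retries, the create_stack callable) is hypothetical and only illustrates the control flow. It says nothing about how "better parameters" would be chosen.

```python
class PlacementError(Exception):
    """Raised when a stack cannot be created with the given placement."""


def create_with_retries(create_stack, candidate_placements):
    """Try each candidate placement in turn; return the first success.

    create_stack is any callable that takes a placement and either
    returns a result or raises PlacementError.
    """
    failures = []
    for placement in candidate_placements:
        try:
            return create_stack(placement)
        except PlacementError as exc:
            # Nothing was reserved, so there is nothing to release here;
            # just remember why this attempt failed and try the next one.
            failures.append((placement, str(exc)))
    raise PlacementError("all candidate placements failed: %r" % (failures,))


def fake_create_stack(placement):
    """Stand-in for a real stack create: pretend zone-A is full."""
    if placement == "zone-A":
        raise PlacementError("zone-A has no capacity")
    return "stack created in %s" % placement


result = create_with_retries(fake_create_stack, ["zone-A", "zone-B"])
```

Note that this only works when a failed attempt is cheap to undo, which is exactly what is in question once several services are involved.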

- Once scheduled, how will resources be created in their proper slots as
part of a Heat template?

In goes a Heat template (sorry for not using HOT.. still learning it. ;)

     Type: Some::Defined::ProviderType
     Type: OS::Heat::HACluster
       ClusterSize: 3
       MaxPerAZ: 1
       PlacementStrategy: anti-collocation
       Resources: [ ServerTemplate ]

And if we have at least 2 AZ's available, it feeds to the heat engine:

     Type: Some::Defined::ProviderType
         availability-zone: zone-A
     Type: Some::Defined::ProviderType
         availability-zone: zone-B
     Type: Some::Defined::ProviderType
         availability-zone: zone-A

If not, holistic scheduler says back "I don't have enough AZ's to
satisfy MaxPerAZ".
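
As a sketch of that preprocessor step, the expansion could look something like the following. This assumes MaxPerAZ is a hard cap (under that reading, three servers with a cap of 1 would need three zones, or a cap of 2 with two zones). The names expand_cluster and SchedulingError are made up for illustration, and the "Properties" nesting in the output is an assumed layout.

```python
class SchedulingError(Exception):
    """Raised when the logical cluster cannot be placed."""


def expand_cluster(cluster_size, max_per_az, zones,
                   provider_type="Some::Defined::ProviderType"):
    """Expand a logical cluster into one resource per server.

    Servers are spread round-robin over the zones. Because the total
    capacity is checked first, no zone ends up over max_per_az.
    """
    capacity = len(zones) * max_per_az
    if cluster_size > capacity:
        raise SchedulingError("I don't have enough AZ's to satisfy MaxPerAZ")

    resources = []
    for index in range(cluster_size):
        zone = zones[index % len(zones)]
        resources.append({
            "Type": provider_type,
            # Assumed layout for the per-server placement property.
            "Properties": {"availability-zone": zone},
        })
    return resources


expanded = expand_cluster(cluster_size=3, max_per_az=2,
                          zones=["zone-A", "zone-B"])
```

With two zones and a cap of 2 this yields zone-A, zone-B, zone-A, matching the output shown above; with a cap of 1 it raises SchedulingError instead.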

Now, if Nova grows anti-affinity under the covers that it can manage
directly, a later version can just spit out:

     Type: Some::Defined::ProviderType
         instance-group: 0
         affinity-type: anti
     Type: Some::Defined::ProviderType
         instance-group: 1
         affinity-type: anti
     Type: Some::Defined::ProviderType
         instance-group: 0
         affinity-type: anti

The point is that the user cares about their servers not being in the
same failure domain, not how that happens.

- What about when the user calls the APIs directly? (i.e. does their own
orchestration - either hand-rolled or using their own standalone Heat.)

This has come up with autoscaling too. "Undefined" - that's not your stack.

Well, when we have the new autoscaling service you'll still be able to create an autoscaling group using your own standalone Heat engine. If the provider has a scheduling service, why shouldn't you be able to use that with your own standalone Heat engine too?

- How and from where will the scheduling service obtain the utilisation
data needed to perform the scheduling? What mechanism will segregate
this information from the end user?

I do think this is a big missing piece. Right now it is spread out
all over the place. Keystone at least has regions, so that could be
incorporated now. I briefly dug through the other API's and don't see
a way to enumerate AZ's or cells. Perhaps it is hiding in extensions?

I don't think this must be segregated from end users. An API for "show
me the placement decisions I can make" seems useful for anybody trying
to automate deployments. Anyway, probably best to keep it decentralized
and just make it so that each service can respond with lists of arguments
to their API that are likely to succeed.

I think you're thinking about the very simplest case still (e.g. list of AZs - we have that already). To implement a completely general scheduling service you're going to need data down to the level of e.g. which machines are overcommitted and by how much. Good luck convincing public cloud providers to make this available through a user-facing API. The unintended consequences only _begin_ with pathological user behaviour, and end somewhere in the realm of lawsuits, financial reporting and competitive analysis.

As Mike pointed out downthread, the scheduler primarily serves the cloud provider's interest. That means the raw input data is at best (when compared to the actual scheduler output) a record of exactly how much the provider does or does not care about users, and at worst a basis for users building their own scheduler that serves only their own interest.

So the scheduler service needs some privileged access to the internals of each service. Heat is unprivileged (it just calls public APIs - you can run your own locally). How to resolve that mismatch is a key question if scheduling is to become part of Heat.


OpenStack-dev mailing list
