On 06/05/2017 05:22 PM, Ed Leafe wrote:
> Another proposal involved a change to how placement responds to the
> scheduler. Instead of just returning the UUIDs of the compute nodes
> that satisfy the required resources, it would include a whole bunch
> of additional information in a structured response. A straw man
> example of such a response is here:
> https://etherpad.openstack.org/p/placement-allocations-straw-man.
> This was referred to as "Plan B".

Actually, this was Plan "C". Plan "B" was to modify the return of the GET /resource_providers Placement REST API endpoint.

> The main feature of this approach
> is that part of that response would be the JSON dict for the
> allocation call, containing the specific resource provider UUID for
> each resource. This way, when the scheduler selects a host

Important clarification is needed here. The proposal is to have the scheduler actually select *more than just the compute host*. The scheduler would select the host, any sharing providers, and any child providers within the host that actually contain the resources/traits that the request demands.
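
To make that concrete, here is a purely hypothetical sketch of the allocation dict the scheduler would hand back for a single selection. The UUIDs are invented, and the key names follow the straw-man etherpad rather than any settled API:

  {
      "allocations": [
          {
              "resource_provider": {"uuid": "8e54a4a5-0f99-4b35-86b7-63bb0ccfa4cf"},
              "resources": {"VCPU": 2, "MEMORY_MB": 2048}
          },
          {
              "resource_provider": {"uuid": "d9fe7848-2e57-4c39-b34d-3a74e1f2c893"},
              "resources": {"DISK_GB": 100}
          },
          {
              "resource_provider": {"uuid": "f0a21a4b-1a33-4b12-9c41-7b9ecf1b2a56"},
              "resources": {"SRIOV_NET_VF": 1}
          }
      ]
  }

The first entry is the compute host itself, the second a sharing provider (say, a shared storage pool), and the third a child provider within the host (say, an SR-IOV physical function).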

> , it would
> simply pass that dict back to the /allocations call, and placement
> would be able to do the allocations directly against that
> information.

> There was another issue raised: simply providing the host UUIDs
> didn't give the scheduler enough information in order to run its
> filters and weighers. Since the scheduler uses those UUIDs to
> construct HostState objects, the specific missing information was
> never completely clarified, so I'm just including this aspect of the
> conversation for completeness. It is orthogonal to the question of
> how to allocate when the resource provider is not "simple".

The specific missing information includes, but is not limited to, the following:

* Whether a resource can be provided by a sharing provider, by a "local provider", or by either. For example, assume a compute node that is associated with a shared storage pool via an aggregate but that also has local disk for instances. The Placement API currently returns just the compute host UUID, with no indication of whether the host has local disk to consume from, shared disk to consume from, or both. The scheduler is the thing that must weigh these alternatives: the placement API presents the choices, and the scheduler picks one based on its sorting/weighing algorithms, as in the sketch just below.
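
As a hypothetical illustration of that choice (invented UUIDs, same straw-man format as above), placement could hand the scheduler two alternative allocation requests for the very same host and let the weighers pick between them:

  Alternative 1 -- consume local disk from the compute host:

  {
      "allocations": [
          {
              "resource_provider": {"uuid": "8e54a4a5-0f99-4b35-86b7-63bb0ccfa4cf"},
              "resources": {"VCPU": 1, "MEMORY_MB": 512, "DISK_GB": 10}
          }
      ]
  }

  Alternative 2 -- consume shared disk from the storage pool:

  {
      "allocations": [
          {
              "resource_provider": {"uuid": "8e54a4a5-0f99-4b35-86b7-63bb0ccfa4cf"},
              "resources": {"VCPU": 1, "MEMORY_MB": 512}
          },
          {
              "resource_provider": {"uuid": "d9fe7848-2e57-4c39-b34d-3a74e1f2c893"},
              "resources": {"DISK_GB": 10}
          }
      ]
  }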

It is imperative to remember the reason *why* we decided (way back in Portland at the Nova mid-cycle last year) to keep sorting/weighing in the Nova scheduler. The reason is because operators (and some developers) insisted on being able to weigh the possible choices in ways that "could not be pre-determined". In other words, folks wanted to keep the existing uber-flexibility and customizability that the scheduler weighers (and home-grown weigher plugins) currently allow, including being able to sort possible compute hosts by such things as the average thermal temperature of the power supply the hardware was connected to over the last five minutes (I kid you friggin not.)

* Which SR-IOV physical function should provide an SRIOV_NET_VF resource to an instance. Imagine a situation where a compute host has 4 SR-IOV physical functions, each having some traits representing hardware offload support and each having an inventory of 8 SRIOV_NET_VF. Currently the scheduler absolutely has the information to pick one of these SR-IOV physical functions to assign to a workload. What the scheduler does *not* have, however, is a way to tell the Placement API to consume an SRIOV_NET_VF from that particular physical function. Why? Because the scheduler doesn't know that a particular physical function even *is* a resource provider in the placement API. *Something* needs to inform the scheduler that the physical function is a resource provider and has a particular UUID to identify it. This is precisely what the proposed GET /allocation_requests HTTP response data provides to the scheduler. A sketch follows below.
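
Again purely as a hypothetical sketch (invented UUIDs and trait names), the provider information accompanying the allocation requests might describe each physical function so the scheduler can weigh and then target one of them. Something like this fragment:

  "provider_summaries": {
      "f0a21a4b-1a33-4b12-9c41-7b9ecf1b2a56": {
          "resources": {"SRIOV_NET_VF": {"capacity": 8, "used": 3}},
          "traits": ["CUSTOM_HW_OFFLOAD_FOO"]
      },
      "a3c2ef12-9b44-4f0c-8d15-2e61c8a0d7b4": {
          "resources": {"SRIOV_NET_VF": {"capacity": 8, "used": 8}},
          "traits": ["CUSTOM_HW_OFFLOAD_BAR"]
      }
  }

The scheduler could then emit an allocation request naming the first PF's UUID directly, which is exactly the linkage it lacks today.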

> My current feeling is that we got ourselves into our existing mess of
> ugly, convoluted code when we tried to add these complex
> relationships into the resource tracker and the scheduler. We set out
> to create the placement engine to bring some sanity back to how we
> think about things we need to virtualize.

Sorry, I completely disagree with your assessment of why the placement engine exists. We didn't create it to bring some sanity back to how we think about things we need to virtualize. We created it to add consistency and structure to the representation of resources in the system.

I don't believe that exposing this structured representation of resources is a bad thing or that it is leaking "implementation details" out of the placement API. It's not an implementation detail that a resource provider is a child of another or that a different resource provider is supplying some resource to a group of other providers. That's simply an accurate representation of the underlying data structures.

> I would really hate to see
> us make the same mistake again, by adding a good deal of complexity
> to handle a few non-simple cases. What I would like to avoid, no
> matter what the eventual solution chosen, is representing this
> complexity in multiple places. Currently the only two candidates for
> this logic are the placement engine, which knows about these
> relationships already, or the compute service itself, which has to
> handle the management of these complex virtualized resources.

The compute service will need to know about the hierarchies of providers on a particular compute node. That isn't complexity. It's simply an accurate representation of the underlying data structures. Instead of random dicts of key/value pairs and different serialized JSON blobs for each particular class of resources, we now have a single, consistent way of describing the providers of those resources.
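
For instance (hypothetical name and invented UUIDs, with a parent field along the lines of the nested-resource-providers work), an SR-IOV physical function on a compute node would just be another provider record pointing at its parent:

  {
      "uuid": "f0a21a4b-1a33-4b12-9c41-7b9ecf1b2a56",
      "name": "compute-0:enp2s0f0",
      "parent_provider_uuid": "8e54a4a5-0f99-4b35-86b7-63bb0ccfa4cf"
  }

The same record shape describes the compute node itself (with a null parent), a NUMA cell, or a shared storage pool. That is the consistency I'm talking about.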

> I don't know the answer. I'm hoping that we can have a discussion
> that might uncover a clear approach, or, at the very least, one that
> is less murky than the others.

I really like Dan's idea of returning a list of HTTP request bodies for POST /allocations/{consumer_uuid} calls along with a list of provider information that the scheduler can use in its sorting/weighing algorithms.
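
A rough sketch of what that response could look like, combining both pieces (again with invented UUIDs; the exact shape is what the review below is hashing out):

  {
      "allocation_requests": [
          {
              "allocations": [
                  {
                      "resource_provider": {"uuid": "8e54a4a5-0f99-4b35-86b7-63bb0ccfa4cf"},
                      "resources": {"VCPU": 1, "MEMORY_MB": 512, "DISK_GB": 10}
                  }
              ]
          }
      ],
      "provider_summaries": {
          "8e54a4a5-0f99-4b35-86b7-63bb0ccfa4cf": {
              "resources": {
                  "VCPU": {"capacity": 64, "used": 7},
                  "MEMORY_MB": {"capacity": 65536, "used": 4096},
                  "DISK_GB": {"capacity": 2000, "used": 300}
              }
          }
      }
  }

Each entry in "allocation_requests" would be a ready-to-send body for POST /allocations/{consumer_uuid}, and "provider_summaries" would carry the per-provider data the filters and weighers need.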

We've put this straw-man proposal here:

https://review.openstack.org/#/c/471927/

I'm hoping to keep the conversation going there.

Best,
-jay
