On 06/05/2017 05:22 PM, Ed Leafe wrote:
> Another proposal involved a change to how placement responds to the
> scheduler. Instead of just returning the UUIDs of the compute nodes
> that satisfy the required resources, it would include a whole bunch
> of additional information in a structured response. A straw man
> example of such a response is here:
> https://etherpad.openstack.org/p/placement-allocations-straw-man.
> This was referred to as "Plan B".

Actually, this was Plan "C". Plan "B" was to modify the return of the GET /resource_providers Placement REST API endpoint.

> The main feature of this approach
> is that part of that response would be the JSON dict for the
> allocation call, containing the specific resource provider UUID for
> each resource. This way, when the scheduler selects a host

Important clarification is needed here. The proposal is to have the scheduler actually select *more than just the compute host*. The scheduler would select the host, any sharing providers, and any child providers within the host that actually contain the resources/traits that the request demands.
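
To make that concrete, here is a purely hypothetical sketch of the allocation dict the scheduler would hand back for a single selection. The UUIDs are invented, and the key names follow the straw-man etherpad rather than any settled API:

  {
      "allocations": [
          {
              "resource_provider": {"uuid": "8e54a4a5-0f99-4b35-86b7-63bb0ccfa4cf"},
              "resources": {"VCPU": 2, "MEMORY_MB": 2048}
          },
          {
              "resource_provider": {"uuid": "d9fe7848-2e57-4c39-b34d-3a74e1f2c893"},
              "resources": {"DISK_GB": 100}
          },
          {
              "resource_provider": {"uuid": "f0a21a4b-1a33-4b12-9c41-7b9ecf1b2a56"},
              "resources": {"SRIOV_NET_VF": 1}
          }
      ]
  }

The first entry is the compute host itself, the second a sharing provider (say, a shared storage pool), and the third a child provider within the host (say, an SR-IOV physical function).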

> , it would
> simply pass that dict back to the /allocations call, and placement
> would be able to do the allocations directly against that
> information.

> There was another issue raised: simply providing the host UUIDs
> didn't give the scheduler enough information in order to run its
> filters and weighers. Since the scheduler uses those UUIDs to
> construct HostState objects, the specific missing information was
> never completely clarified, so I'm just including this aspect of the
> conversation for completeness. It is orthogonal to the question of
> how to allocate when the resource provider is not "simple".

The specific missing information includes, but is not limited to, the following:

* Whether a resource can be provided by a sharing provider, by a "local provider", or by either. For example, assume a compute node that is associated with a shared storage pool via an aggregate but that also has local disk for instances. The Placement API currently returns just the compute host UUID, with no indication of whether the host has local disk to consume from, shared disk to consume from, or both. The scheduler is the thing that must weigh these alternatives: the placement API presents the choices, and the scheduler picks one based on its sorting/weighing algorithms, as in the sketch just below.
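
As a hypothetical illustration of that choice (invented UUIDs, same straw-man format as above), placement could hand the scheduler two alternative allocation requests for the very same host and let the weighers pick between them:

  Alternative 1 -- consume local disk from the compute host:

  {
      "allocations": [
          {
              "resource_provider": {"uuid": "8e54a4a5-0f99-4b35-86b7-63bb0ccfa4cf"},
              "resources": {"VCPU": 1, "MEMORY_MB": 512, "DISK_GB": 10}
          }
      ]
  }

  Alternative 2 -- consume shared disk from the storage pool:

  {
      "allocations": [
          {
              "resource_provider": {"uuid": "8e54a4a5-0f99-4b35-86b7-63bb0ccfa4cf"},
              "resources": {"VCPU": 1, "MEMORY_MB": 512}
          },
          {
              "resource_provider": {"uuid": "d9fe7848-2e57-4c39-b34d-3a74e1f2c893"},
              "resources": {"DISK_GB": 10}
          }
      ]
  }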

It is imperative to remember the reason *why* we decided (way back in Portland at the Nova mid-cycle last year) to keep sorting/weighing in the Nova scheduler. The reason is because operators (and some developers) insisted on being able to weigh the possible choices in ways that "could not be pre-determined". In other words, folks wanted to keep the existing uber-flexibility and customizability that the scheduler weighers (and home-grown weigher plugins) currently allow, including being able to sort possible compute hosts by such things as the average thermal temperature of the power supply the hardware was connected to over the last five minutes (I kid you friggin not.)

* Which SR-IOV physical function should provide an SRIOV_NET_VF resource to an instance. Imagine a situation where a compute host has 4 SR-IOV physical functions, each having some traits representing hardware offload support and each having an inventory of 8 SRIOV_NET_VF. Currently the scheduler absolutely has the information to pick one of these SR-IOV physical functions to assign to a workload. What the scheduler does *not* have, however, is a way to tell the Placement API to consume an SRIOV_NET_VF from that particular physical function. Why? Because the scheduler doesn't know that a particular physical function even *is* a resource provider in the placement API. *Something* needs to inform the scheduler that the physical function is a resource provider and has a particular UUID to identify it. This is precisely what the proposed GET /allocation_requests HTTP response data provides to the scheduler. A sketch follows below.
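
Again purely as a hypothetical sketch (invented UUIDs and trait names), the provider information accompanying the allocation requests might describe each physical function so the scheduler can weigh and then target one of them. Something like this fragment:

  "provider_summaries": {
      "f0a21a4b-1a33-4b12-9c41-7b9ecf1b2a56": {
          "resources": {"SRIOV_NET_VF": {"capacity": 8, "used": 3}},
          "traits": ["CUSTOM_HW_OFFLOAD_FOO"]
      },
      "a3c2ef12-9b44-4f0c-8d15-2e61c8a0d7b4": {
          "resources": {"SRIOV_NET_VF": {"capacity": 8, "used": 8}},
          "traits": ["CUSTOM_HW_OFFLOAD_BAR"]
      }
  }

The scheduler could then emit an allocation request naming the first PF's UUID directly, which is exactly the linkage it lacks today.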

> My current feeling is that we got ourselves into our existing mess of
> ugly, convoluted code when we tried to add these complex
> relationships into the resource tracker and the scheduler. We set out
> to create the placement engine to bring some sanity back to how we
> think about things we need to virtualize.

Sorry, I completely disagree with your assessment of why the placement engine exists. We didn't create it to bring some sanity back to how we think about things we need to virtualize. We created it to add consistency and structure to the representation of resources in the system.

I don't believe that exposing this structured representation of resources is a bad thing or that it is leaking "implementation details" out of the placement API. It's not an implementation detail that a resource provider is a child of another or that a different resource provider is supplying some resource to a group of other providers. That's simply an accurate representation of the underlying data structures.

> I would really hate to see
> us make the same mistake again, by adding a good deal of complexity
> to handle a few non-simple cases. What I would like to avoid, no
> matter what the eventual solution chosen, is representing this
> complexity in multiple places. Currently the only two candidates for
> this logic are the placement engine, which knows about these
> relationships already, or the compute service itself, which has to
> handle the management of these complex virtualized resources.

The compute service will need to know about the hierarchies of providers on a particular compute node. That isn't complexity. It's simply an accurate representation of the underlying data structures. Instead of random dicts of key/value pairs and different serialized JSON blobs for each particular class of resources, we now have a single, consistent way of describing the providers of those resources.
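
For instance (hypothetical name and invented UUIDs, with a parent field along the lines of the nested-resource-providers work), an SR-IOV physical function on a compute node would just be another provider record pointing at its parent:

  {
      "uuid": "f0a21a4b-1a33-4b12-9c41-7b9ecf1b2a56",
      "name": "compute-0:enp2s0f0",
      "parent_provider_uuid": "8e54a4a5-0f99-4b35-86b7-63bb0ccfa4cf"
  }

The same record shape describes the compute node itself (with a null parent), a NUMA cell, or a shared storage pool. That is the consistency I'm talking about.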

> I don't know the answer. I'm hoping that we can have a discussion
> that might uncover a clear approach, or, at the very least, one that
> is less murky than the others.

I really like Dan's idea of returning a list of HTTP request bodies for POST /allocations/{consumer_uuid} calls along with a list of provider information that the scheduler can use in its sorting/weighing algorithms.
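
A rough sketch of what that response could look like, combining both pieces (again with invented UUIDs; the exact shape is what the review below is hashing out):

  {
      "allocation_requests": [
          {
              "allocations": [
                  {
                      "resource_provider": {"uuid": "8e54a4a5-0f99-4b35-86b7-63bb0ccfa4cf"},
                      "resources": {"VCPU": 1, "MEMORY_MB": 512, "DISK_GB": 10}
                  }
              ]
          }
      ],
      "provider_summaries": {
          "8e54a4a5-0f99-4b35-86b7-63bb0ccfa4cf": {
              "resources": {
                  "VCPU": {"capacity": 64, "used": 7},
                  "MEMORY_MB": {"capacity": 65536, "used": 4096},
                  "DISK_GB": {"capacity": 2000, "used": 300}
              }
          }
      }
  }

Each entry in "allocation_requests" would be a ready-to-send body for POST /allocations/{consumer_uuid}, and "provider_summaries" would carry the per-provider data the filters and weighers need.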

We've put this straw-man proposal here:

https://review.openstack.org/#/c/471927/

I'm hoping to keep the conversation going there.

Best,
-jay
