We had a very lively discussion this morning during the Scheduler subteam 
meeting, which was continued in a Google hangout. The subject was how to handle 
claiming resources when the Resource Provider is not "simple". By "simple", I 
mean a compute node that provides all of the resources itself, as contrasted 
with a compute node that uses shared storage for disk space, or one that has
complex nested relationships with things such as PCI devices or NUMA nodes. The
current situation is as follows:

a) scheduler gets a request with certain resource requirements (RAM, disk, CPU, 
etc.)
b) scheduler passes these resource requirements to placement, which returns a 
list of hosts (compute nodes) that can satisfy the request.
c) scheduler runs these through some filters and weighers to get a list ordered 
by best "fit"
d) scheduler then tries to claim the resources by posting allocations for
those resources against the selected host to placement (a rough sketch of
this call follows below)
e) once the allocation succeeds, scheduler returns that host to conductor to
have the VM built

(some details for edge cases left out for clarity of the overall process)
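
To make step d) concrete, here is a minimal sketch of the kind of allocation
body the scheduler sends when it claims. The UUIDs and amounts are
hypothetical, and the exact payload shape depends on the placement API
microversion in use, so take this as illustrative only:

    # Sketch of step d): claim resources for an instance (the "consumer")
    # by writing allocations against the selected compute node. All names
    # and amounts are hypothetical; the real payload shape depends on the
    # placement API microversion.
    compute_node_uuid = "compute-1-uuid"   # hypothetical
    instance_uuid = "instance-uuid"        # hypothetical

    claim = {
        "allocations": [
            {
                "resource_provider": {"uuid": compute_node_uuid},
                "resources": {"VCPU": 2, "MEMORY_MB": 2048, "DISK_GB": 20},
            },
        ],
    }
    # This body is sent to /allocations/{instance_uuid}; if placement
    # accepts it, the resources are claimed.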

The problem we discussed comes into play when the compute node isn't the actual 
provider of the resources. The easiest example to consider is when the computes 
are associated with a shared storage provider. The placement query is smart 
enough to know that even if the compute node doesn't have enough local disk, it 
will get it from the shared storage, so it will return that host in step b) 
above. If the scheduler then chooses that host and tries to claim it, it will
pass the resources and the compute node UUID back to placement to make the
allocations. This is the point where the current code falls short: somehow,
placement needs to know to allocate the requested disk against the shared
storage provider, and not against the compute node.
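
A hedged sketch of the mismatch, with hypothetical UUIDs: the claim the
scheduler would naturally construct differs from the one placement actually
needs only in which provider absorbs the DISK_GB:

    # Hypothetical UUIDs, for illustration only.
    COMPUTE_NODE_UUID = "compute-1-uuid"
    SHARED_STORAGE_UUID = "shared-store-1-uuid"

    # What the scheduler naively sends: everything allocated against the
    # compute node, including disk space that node doesn't actually own.
    naive_claim = {
        "allocations": [
            {
                "resource_provider": {"uuid": COMPUTE_NODE_UUID},
                "resources": {"VCPU": 2, "MEMORY_MB": 2048, "DISK_GB": 20},
            },
        ],
    }

    # What placement actually needs recorded: DISK_GB against the shared
    # storage provider, everything else against the compute node.
    correct_claim = {
        "allocations": [
            {
                "resource_provider": {"uuid": COMPUTE_NODE_UUID},
                "resources": {"VCPU": 2, "MEMORY_MB": 2048},
            },
            {
                "resource_provider": {"uuid": SHARED_STORAGE_UUID},
                "resources": {"DISK_GB": 20},
            },
        ],
    }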

One proposal is to essentially reuse in placement, at allocation time, the
same logic that was used to include that host among those matching the
requirements. In other words, when placement tries to allocate the requested
amount of disk, it would determine that the host is in a shared storage
aggregate, and be smart enough to allocate against that provider instead.
This was referred to in our discussion as "Plan A".
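
As a rough illustration of what Plan A would mean inside placement, here is a
self-contained toy version. The data structures and function are entirely
hypothetical, and a real capacity check would also account for existing
usage, reserved amounts, and allocation ratios:

    # Toy sketch of Plan A: at allocation time, placement itself decides
    # which provider should absorb each resource class. All names and
    # data structures here are hypothetical.
    INVENTORY = {
        "compute-1": {"VCPU": 16, "MEMORY_MB": 65536},  # no local disk
        "shared-store-1": {"DISK_GB": 10000},
    }
    # Sharing relationships derived from aggregate membership.
    SHARING = {"compute-1": ["shared-store-1"]}

    def resolve_provider(node, resource_class, amount):
        """Pick the provider an allocation should be written against."""
        if INVENTORY.get(node, {}).get(resource_class, 0) >= amount:
            return node
        # Fall back to providers that share resources with this node
        # via an aggregate (e.g. a shared storage pool).
        for provider in SHARING.get(node, []):
            if INVENTORY.get(provider, {}).get(resource_class, 0) >= amount:
                return provider
        raise ValueError("no provider can satisfy %s" % resource_class)

    # resolve_provider("compute-1", "DISK_GB", 20) -> "shared-store-1"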

Another proposal involved a change to how placement responds to the scheduler. 
Instead of just returning the UUIDs of the compute nodes that satisfy the 
required resources, it would include a whole bunch of additional information in 
a structured response. A straw man example of such a response is here: 
https://etherpad.openstack.org/p/placement-allocations-straw-man. This was 
referred to as "Plan B". The main feature of this approach is that part of that 
response would be the JSON dict for the allocation call, containing the 
specific resource provider UUID for each resource. This way, when the scheduler
selects a host, it would simply pass that dict back in the /allocations call,
and placement would be able to make the allocations directly from that
information.
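
For illustration only, and not a reproduction of the actual etherpad straw
man, the shape of such a response might look like this, with hypothetical
UUIDs:

    # Paraphrased shape of a Plan B response (see the etherpad for the
    # real straw man). Placement returns, per candidate host, a
    # ready-to-use allocation body that already names the specific
    # provider for each resource.
    candidates = [
        {
            "compute_node_uuid": "compute-1-uuid",  # hypothetical
            "allocations": [
                {
                    "resource_provider": {"uuid": "compute-1-uuid"},
                    "resources": {"VCPU": 2, "MEMORY_MB": 2048},
                },
                {
                    "resource_provider": {"uuid": "shared-store-1-uuid"},
                    "resources": {"DISK_GB": 20},
                },
            ],
        },
        # ...one entry per candidate host...
    ]
    # The scheduler picks a candidate and hands its "allocations" list
    # straight back in the /allocations call, so no provider-resolution
    # logic has to live in the scheduler at all.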

There was another issue raised: simply providing the host UUIDs doesn't give
the scheduler enough information to run its filters and weighers. Since the
scheduler already uses those UUIDs to construct HostState objects, though,
exactly what information is missing was never completely clarified, so I'm
including this aspect of the conversation only for completeness. It is
orthogonal to the question of how to allocate when the resource provider is
not "simple".

My current feeling is that we got ourselves into our existing mess of ugly, 
convoluted code when we tried to add these complex relationships into the 
resource tracker and the scheduler. We set out to create the placement engine 
to bring some sanity back to how we think about things we need to virtualize. I 
would really hate to see us make the same mistake again, by adding a good deal 
of complexity to handle a few non-simple cases. What I would like to avoid, no
matter which solution is eventually chosen, is representing this complexity in
multiple places. Currently the only two candidates for this logic are the
placement engine, which already knows about these relationships, and the
compute service itself, which has to handle the management of these complex
virtualized resources.

I don't know the answer. I'm hoping that we can have a discussion that might 
uncover a clear approach, or, at the very least, one that is less murky than 
the others.


-- Ed Leafe




