On 05/30/2016 11:22 PM, Cheng, Yingxin wrote:
Hi, cdent:

This problem arises because the RT (resource tracker) only knows that it
must consume DISK resources on its host; it does not know exactly which
resource provider to record that consumption against. That is to say,
the RT still has to *find* the correct resource provider in step 4. Step
4 is what ultimately produces the problem you encountered: the RT can
find two resource providers offering DISK_GB, but it doesn't know which
one is right.
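For example (hypothetical provider names and numbers, just to illustrate
the ambiguity):

    # Hypothetical inventories as the RT might see them: both providers
    # expose DISK_GB, so there is no way to tell which one should
    # receive a new instance's 20G disk allocation.
    inventories = {
        "rp-uuid-compute-node": {"VCPU": 8, "MEMORY_MB": 16384, "DISK_GB": 100},
        "rp-uuid-shared-storage": {"DISK_GB": 2000},
    }
    disk_providers = [rp for rp, inv in inventories.items() if "DISK_GB" in inv]
    assert len(disk_providers) == 2  # ambiguous: which one is "right"?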

The problem is that the RT has to choose a resource provider when, per
step 4, it finds more than one. However, the scheduler must already have
known which resource provider to choose when it made its placement
decision, yet it does not send that information to the compute node.
That is also to say, there is a missing step in the
generic-resource-pools blueprint beyond "improve the filter scheduler so
that it can make correct decisions with generic resource pools": the
scheduler should tell the compute node's RT not only about the resource
consumptions on the compute-node resource provider, but also where
shared resources are to be consumed, i.e. the IDs of their related
resource providers.
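For illustration, the scheduler could send something like this (a
made-up structure, not an existing Nova RPC payload):

    # Made-up payload: the scheduler tells the RT not just *what* to
    # consume but *where*, keyed by resource-provider UUID.
    placement_decision = {
        "instance_uuid": "9c1a2b3c-0000-4000-8000-000000000001",
        "allocations": {
            "rp-uuid-compute-node": {"VCPU": 2, "MEMORY_MB": 2048},
            "rp-uuid-shared-storage": {"DISK_GB": 20},
        },
    }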

Well, that is the problem with not having the scheduler actually do the claiming of resources on a provider. :(

At this time, the compute node (specifically, its resource tracker) is the thing that does the actual claim of the resources in a request against the resource inventories it understands for itself.
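Roughly like this (a simplified sketch, not the actual ResourceTracker
code):

    # Simplified sketch of today's behavior: the claim is tested and
    # recorded against inventory the compute node believes is its own.
    class ComputeResourcesUnavailable(Exception):
        pass

    def claim(inventory, usage, requested):
        # First pass: verify every requested resource class fits locally.
        for rc, amount in requested.items():
            if inventory.get(rc, 0) - usage.get(rc, 0) < amount:
                raise ComputeResourcesUnavailable(rc)
        # Second pass: record the consumption -- always locally, because
        # the RT has no notion of another provider owning its disk.
        for rc, amount in requested.items():
            usage[rc] = usage.get(rc, 0) + amount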

This is why, even though the scheduler "makes a placement decision" for things like which NUMA cell/node a workload will be placed on [1], that decision is promptly forgotten about and ignored, and the compute node makes a totally different decision [2] when claiming NUMA topology resources after it receives the instance request containing NUMA topology requests. :(
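The pattern in [1] and [2] boils down to something like this (simplified
pseudologic; fit_fn stands in for the real fitting routine):

    def numa_filter_passes(host_topology, requested_topology, fit_fn):
        # Scheduler side: a fitted placement is computed, but only the
        # yes/no answer survives; the placement itself is discarded.
        fitted = fit_fn(host_topology, requested_topology)
        return fitted is not None

    def numa_claim(host_topology, requested_topology, fit_fn):
        # Compute side, later: the fit is recomputed from scratch, so it
        # can legitimately land on a different cell than the filter did.
        return fit_fn(host_topology, requested_topology)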

Is this silly, and should, IMHO, the scheduler *actually* do the claim of resources on a provider? Yes; see [3], which still needs a spec pushed.

Is this going to change any time soon? Unfortunately, no.

Unfortunately, a compute node isn't aware that it may be consuming resources from a shared storage pool, which is what Step #4 is all about: making the compute node aware that it is using a shared storage pool if it is indeed using one. I'll answer Chris' email directly with more details.
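Very roughly, Step #4 would give the RT something like this (totally
hypothetical structures, nothing that exists in Nova today):

    # The RT discovers, e.g. via an aggregate association, that its
    # DISK_GB actually lives on a shared provider, so disk claims are
    # recorded there instead of against the local provider.
    aggregates = {
        "agg-shared-storage": ["rp-uuid-compute-node", "rp-uuid-shared-storage"],
    }

    def disk_provider_for(compute_rp, aggregates):
        for members in aggregates.values():
            if compute_rp in members:
                shared = [rp for rp in members if rp != compute_rp]
                if shared:
                    return shared[0]  # record DISK_GB consumption here
        return compute_rp  # no shared pool: claim disk locally as today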

Best,
-jay

[1] https://github.com/openstack/nova/blob/83cd67cd89ba58243d85db8e82485bda6fd00fde/nova/scheduler/filters/numa_topology_filter.py#L81
[2] https://github.com/openstack/nova/blob/83cd67cd89ba58243d85db8e82485bda6fd00fde/nova/compute/claims.py#L215
[3] https://blueprints.launchpad.net/nova/+spec/resource-providers-scheduler-claims


