Eric, Response inline.
On 1/29/18, 10:27 AM, "Eric Fried" <openst...@fried.cc> wrote:

>We had some lively discussion in #openstack-nova today, which I'll try
>to summarize here.
>
>First of all, the hierarchy:
>
>                controller (n-cond)
>                 /              \
>        cluster/n-cpu        cluster/n-cpu
>          /        \            /      \
>     res. pool  res. pool     ...      ...
>       /   \      /   \
>     host  host  ...   ...
>     / \    / \
>   ...  ... inst inst
>
>Important points:
>
>(1) Instances do indeed get deployed to individual hosts, BUT vCenter
>can and does move them around within a cluster independent of nova-isms
>like live migration.
>
>(2) VMWare wants the ability to specify that an instance should be
>deployed to a specific resource pool.
>
>(3) VMWare accounts for resources at the level of the resource pool (not
>host).
>
>(4) Hosts can move fluidly among resource pools.
>
>(5) Conceptually, VMWare would like you not to see or think about the
>'host' layer at all.
>
>(6) It has been suggested that resource pools may be best represented
>via aggregates. But to satisfy (2), this would require support for
>doing allocation requests that specify one (e.g. porting the GET
>/resource_providers?member_of=<agg> queryparam to GET
>/allocation_candidates, and the corresponding flavor enhancements). And
>doing so would mean getting past our reluctance up to this point of
>exposing aggregates by name/ID to users.
>
>Here are some possible models:
>
>(A) Today's model, where the cluster/n-cpu is represented as a single
>provider owning all resources. This requires some creative finagling of
>inventory fields to ensure that a resource request might actually be
>satisfied by a single host under this broad umbrella. (An example cited
>was to set VCPU's max_unit to whatever one host could provide.) It is
>not clear to me if/how resource pools have been represented in this
>model thus far, or if/how it is currently possible to (2) target an
>instance to a specific one. I also don't see how anything we've done
>with traits or aggregates would help with that aspect in this model.
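To make the "creative finagling" in model (A) concrete, here is a minimal sketch, assuming the cluster is reported as one provider whose VCPU total is the sum of its hosts and whose max_unit is capped at the largest single host. The Inventory fields mirror placement's inventory record (total, reserved, min_unit, max_unit, step_size, allocation_ratio); the host sizes and the fits() helper are hypothetical, written only to show how the checks interact.

```python
from dataclasses import dataclass


@dataclass
class Inventory:
    """Mirrors the fields of a placement inventory record for one resource class."""
    total: int
    reserved: int = 0
    min_unit: int = 1
    max_unit: int = 1
    step_size: int = 1
    allocation_ratio: float = 1.0

    def capacity(self) -> int:
        # Usable capacity: (total - reserved) * allocation_ratio.
        return int((self.total - self.reserved) * self.allocation_ratio)

    def fits(self, amount: int, used: int = 0) -> bool:
        # A single allocation must respect the per-allocation bounds and step...
        if not self.min_unit <= amount <= self.max_unit:
            return False
        if amount % self.step_size != 0:
            return False
        # ...and must not push usage past capacity.
        return used + amount <= self.capacity()


# Hypothetical cluster with three hosts of 16, 16 and 32 VCPUs.
host_vcpus = [16, 16, 32]

# Model (A): one provider for the whole cluster. The total is the sum of the
# hosts, but max_unit is capped at what the largest single host could provide.
cluster_vcpu = Inventory(total=sum(host_vcpus), max_unit=max(host_vcpus))

print(cluster_vcpu.capacity())  # 64
print(cluster_vcpu.fits(32))    # True: one host could hold it
print(cluster_vcpu.fits(48))    # False: fits the total, but no single host
```

The limitation shows up once hosts are partly used: max_unit only says the largest host could take the request when empty. It does not track free space per host, so a request can still pass this check while no single host has room for it.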
>
>(B) Representing each host as a root provider, each owning its own
>actual inventory, each possessing a CUSTOM_RESOURCE_POOL_X trait
>indicating which pool it belongs to at the moment; or representing pools
>via aggregates as in (6). This model breaks because of (1), unless we
>give virt drivers some mechanism to modify allocations (e.g. via POST
>/allocations) without doing an actual migration.
>
>(C) Representing each resource pool as a root provider which presents
>the collective inventory of all its hosts. Each could possess its own
>unique CUSTOM_RESOURCE_POOL_X trait. Or we could possibly adapt
>whatever mechanism Ironic uses when it targets a particular baremetal
>node. Or we could use aggregates as in (6), where each aggregate is
>associated with just one provider. This one breaks down because we
>don't currently have a way for nova to know that, when an instance's
>resources were allocated from the provider corresponding to resource
>pool X, that means we should schedule the instance to (nova, n-cpu) host
>Y. There may be some clever solution for this involving aggregates (NOT
>sharing providers!), but it has not been thought through. It also
>entails the same "creative finagling of inventory" described in (A).
>
>(D) Using actual nested resource providers: the "cluster" is the
>(inventory-less) root provider, and each resource pool is a child of the
>cluster. This is closest to representing the real logical hierarchy,
>and is desirable for that reason. The drawback is that you then MUST
>use some mechanism to ensure allocations are never spread across pools.
>If your request *always* targets a specific resource pool, that works.
>Otherwise, you would have to use a numbered request group, as described
>below. It also entails the same "creative finagling of inventory"
>described in (A).

I think the nested resource provider model is the better option for
another reason: every resource pool can have its own limits.
So it is important to track the allocations/usage and ensure that the
scheduler can throw an error if there are insufficient resources in the
vCenter resource pool. NOTE: a vCenter cluster, which is the compute node,
might have more capacity left, but the resource pool's limit could still
prevent placing a VM in that pool. And yes, the request would always
target a specific resource pool.

>
>(E) Take (D) a step further by adding each 'host' as a child of its
>respective resource pool. No "creative finagling", but same "moving
>allocations" issue as (B).

This might not work because a resource pool is a logical construct, and a
vCenter cluster may also have no resource pools under it at all. VMs can
be placed on a vCenter cluster with or without a resource pool.

>
>I'm sure I've missed/misrepresented things. Please correct and refine
>as necessary.
>
>Thanks,
>Eric

Thanks,
Giri

>
>On 01/27/2018 12:23 PM, Eric Fried wrote:
>> Rado-
>>
>> [+dev ML. We're getting pretty general here; maybe others will get
>> some use out of this.]
>>
>>> is there a way to make the scheduler allocate only from one specific RP
>>
>> "...one specific RP" - is that Resource Provider or Resource Pool?
>>
>> And are we talking about scheduling an instance to a specific
>> compute node, or are we talking about making sure that all the requested
>> resources are pulled from the same compute node (but it could be any one
>> of several compute nodes)? Or just limiting the scheduler to any node in
>> a specific resource pool?
>>
>> To make sure I'm fully grasping the VMWare-specific
>> ratios/relationships between resource pools and compute nodes, I have
>> been assuming:
>>
>> controller 1:many compute "host" (where n-cpu runs)
>> compute "host" 1:many resource pool
>> resource pool 1:many compute "node" (where instances can be scheduled)
>> compute "node" 1:many instance
>>
>> (I don't know if this "host" vs "node" terminology is correct, but
>> I'm going to keep pretending it is for the purposes of this note.)
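Giri's point that a resource pool's own limit can block a placement even when the cluster still has capacity can be sketched roughly as follows. This is a hypothetical model, not driver or placement code: the ResourcePool and Cluster classes, the MEMORY_MB-only accounting, and the direct_used_mb field (for VMs placed on the cluster outside any pool) are assumptions for illustration. In model (D), each pool's limit would presumably surface as the inventory of that pool's child provider.

```python
from dataclasses import dataclass, field


@dataclass
class ResourcePool:
    """A vCenter resource pool with its own MEMORY_MB limit (hypothetical model)."""
    name: str
    limit_mb: int
    used_mb: int = 0


@dataclass
class Cluster:
    """A vCenter cluster, which the VMware driver reports as one compute node."""
    capacity_mb: int
    direct_used_mb: int = 0  # VMs placed on the cluster outside any pool
    pools: dict = field(default_factory=dict)

    def used_mb(self) -> int:
        # Cluster usage covers pooled and unpooled VMs alike.
        return self.direct_used_mb + sum(p.used_mb for p in self.pools.values())

    def can_place(self, pool_name: str, memory_mb: int) -> bool:
        pool = self.pools[pool_name]
        # The cluster as a whole must have room...
        cluster_ok = self.used_mb() + memory_mb <= self.capacity_mb
        # ...and, independently, so must the targeted pool.
        pool_ok = pool.used_mb + memory_mb <= pool.limit_mb
        return cluster_ok and pool_ok


cluster = Cluster(capacity_mb=65536, direct_used_mb=8192)
cluster.pools["pool-x"] = ResourcePool("pool-x", limit_mb=16384, used_mb=12288)
cluster.pools["pool-y"] = ResourcePool("pool-y", limit_mb=32768)

print(cluster.used_mb())                  # 20480
print(cluster.can_place("pool-x", 8192))  # False: pool-x limit would be exceeded
print(cluster.can_place("pool-y", 8192))  # True
```

The cluster has plenty of free memory in both calls; only the per-pool check distinguishes them, which is why per-pool usage has to be tracked somewhere the scheduler can see it.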
>>
>> In particular, if that last line is true, then you do *not* want
>> multiple compute "nodes" in the same provider tree.
>>
>>> if no custom trait is specified in the request?
>>
>> I am not aware of anything current or planned that will allow you to
>> specify an aggregate you want to deploy from; so the only way I'm aware
>> of that you could pin a request to a resource pool is to create a custom
>> trait for that resource pool, tag all compute nodes in the pool with
>> that trait, and specify that trait in your flavor. This way you don't
>> use nested-ness at all. And in this model, there's also no need to
>> create resource providers corresponding to resource pools - their
>> sole manifestation is via traits.
>>
>> (Bonus: this model will work with what we've got merged in Queens -
>> we didn't quiiite finish the piece of NRP that makes them work for
>> allocation candidates, but we did merge trait support. We're also
>> *mostly* there with aggregates, but I wouldn't want to rely on them
>> working perfectly and we're not claiming full support for them.)
>>
>> To be explicit, in the model I'm suggesting, your compute "host",
>> within update_provider_tree, would create new_root()s for each compute
>> "node". So the "tree" isn't really a tree - it's a flat list of
>> computes, of which one happens to correspond to the `nodename` and
>> represents the compute "host". (I assume deploys can happen to the
>> compute "host" just like they can to a compute "node"? If not, just
>> give that guy no inventory and he'll be avoided.) It would then
>> update_traits(node, ['CUSTOM_RPOOL_X']) for each. It would also
>> update_inventory() for each as appropriate.
>>
>> Now on your deploys, to get scheduled to a particular resource pool,
>> you would have to specify required=CUSTOM_RPOOL_X in your flavor.
>>
>> That's it. You never use new_child(). There are no providers
>> corresponding to pools. There are no aggregates.
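The flat-roots-plus-traits model described above can be sketched with plain Python stand-ins. RootProvider, pool_trait() and candidates() are hypothetical names; a real driver would call ProviderTree methods such as new_root(), update_traits() and update_inventory() instead. The one rule taken from placement itself is that custom trait names must start with CUSTOM_ and contain only uppercase letters, digits and underscores.

```python
import re
from dataclasses import dataclass, field

# Placement only accepts custom traits that start with CUSTOM_ and use
# uppercase letters, digits and underscores.
CUSTOM_TRAIT_RE = re.compile(r"^CUSTOM_[A-Z0-9_]+$")


def pool_trait(pool_name: str) -> str:
    """Hypothetical helper: derive a custom trait name from a resource pool name."""
    trait = "CUSTOM_RPOOL_" + re.sub(r"[^A-Z0-9_]", "_", pool_name.upper())
    if not CUSTOM_TRAIT_RE.match(trait):
        raise ValueError(f"not a valid custom trait: {trait}")
    return trait


@dataclass
class RootProvider:
    """Stand-in for one compute "node" registered as its own root provider."""
    name: str
    inventory: dict  # resource class -> free amount
    traits: set = field(default_factory=set)


def candidates(providers, request, required_trait):
    """Providers that carry the trait and can satisfy the whole request alone."""
    return [
        p.name
        for p in providers
        if required_trait in p.traits
        and all(p.inventory.get(rc, 0) >= amount for rc, amount in request.items())
    ]


x, y = pool_trait("pool-x"), pool_trait("pool-y")
providers = [
    RootProvider("node1", {"VCPU": 8, "MEMORY_MB": 16384}, {x}),
    RootProvider("node2", {"VCPU": 2, "MEMORY_MB": 4096}, {x}),
    RootProvider("node3", {"VCPU": 16, "MEMORY_MB": 65536}, {y}),
]
request = {"VCPU": 4, "MEMORY_MB": 8192}

print(x)                                  # CUSTOM_RPOOL_POOL_X
print(candidates(providers, request, x))  # ['node1']
```

Because every compute node is its own root, each candidate (ignoring sharing providers) comes from exactly one node, which is what this model relies on. In a Queens flavor, the required trait would be expressed as an extra spec of the form trait:CUSTOM_RPOOL_X=required.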
>>
>> Are we making progress, or am I confused/confusing?
>>
>> Eric
>>
>>
>> On 01/27/2018 01:50 AM, Radoslav Gerganov wrote:
>>>
>>> +Chris
>>>
>>>
>>> Hi Eric,
>>>
>>> Thanks a lot for sending this. I must admit that I am still trying to
>>> catch up with how the scheduler (will) work when there are nested RPs,
>>> traits, etc. I thought mostly about the case when we use a custom
>>> trait to force allocations only from one resource pool. However, if
>>> no trait is specified then we can end up in the situation that you
>>> describe (allocating different resources from different resource
>>> pools) and this is not what we want. If we go with the model that you
>>> propose, is there a way to make the scheduler allocate only from one
>>> specific RP if no custom trait is specified in the request?
>>>
>>> Thanks,
>>>
>>> Rado
>>>
>>>
>>> ------------------------------------------------------------------------
>>> *From:* Eric Fried <openst...@fried.cc>
>>> *Sent:* Friday, January 26, 2018 10:20 PM
>>> *To:* Radoslav Gerganov
>>> *Cc:* Jay Pipes
>>> *Subject:* VMWare's resource pool / cluster and nested resource providers
>>>
>>> Rado-
>>>
>>> It occurred to me just now that the model you described to me [1]
>>> isn't going to work, unless there's something I really misunderstood.
>>>
>>> The problem is that the placement API will think it can allocate
>>> resources from anywhere in the tree for a given allocation request
>>> (unless you always use a single numbered request group [2] in your
>>> flavors, which doesn't sound like a clean plan).
>>>
>>> So if you have *any* model where multiple compute nodes reside in the
>>> same provider tree, and I come along with a request for say
>>> VCPU:1,MEMORY_MB:2048,DISK_GB:512, placement will happily give you a
>>> candidate with the VCPU from compute10, the memory from compute5, and
>>> the disk from compute7. I'm only guessing that this isn't a viable way
>>> to boot an instance.
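The failure mode Eric describes can be reproduced with a toy allocator. This is a simplified sketch, not placement's algorithm (placement enumerates all valid combinations rather than taking the first fit): unnumbered_candidate() lets each resource class come from any provider in the tree, while numbered_group_candidates() requires one provider to satisfy the whole group, as a single numbered request group would. The tree contents and function names are hypothetical.

```python
# Free inventory per provider in one hypothetical provider tree.
TREE = {
    "compute5": {"VCPU": 0, "MEMORY_MB": 4096, "DISK_GB": 100},
    "compute7": {"VCPU": 0, "MEMORY_MB": 0, "DISK_GB": 1024},
    "compute10": {"VCPU": 4, "MEMORY_MB": 1024, "DISK_GB": 0},
    "compute12": {"VCPU": 8, "MEMORY_MB": 8192, "DISK_GB": 1024},
}

REQUEST = {"VCPU": 1, "MEMORY_MB": 2048, "DISK_GB": 512}


def unnumbered_candidate(tree, request):
    """First-fit per resource class: each class may land on a different provider."""
    allocation = {}
    for rc, amount in request.items():
        for name, inventory in tree.items():
            if inventory.get(rc, 0) >= amount:
                allocation[rc] = name
                break
        else:
            return None  # no provider in the tree has enough of this class
    return allocation


def numbered_group_candidates(tree, request):
    """One numbered group: a single provider must satisfy every resource class."""
    return [
        name
        for name, inventory in tree.items()
        if all(inventory.get(rc, 0) >= amount for rc, amount in request.items())
    ]


print(unnumbered_candidate(TREE, REQUEST))
# {'VCPU': 'compute10', 'MEMORY_MB': 'compute5', 'DISK_GB': 'compute7'}
print(numbered_group_candidates(TREE, REQUEST))
# ['compute12']
```

Per the granular request spec [2], the single-group form of this request would be written with numbered parameters such as resources1=VCPU:1,MEMORY_MB:2048,DISK_GB:512.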
>>>
>>> I go back to my earlier suggestion: I think you need to create the
>>> compute nodes as root providers in your ProviderTree, and find some
>>> other way to mark the resource pool associations. You could do it with
>>> custom traits (CUSTOM_RESOURCE_POOL_X, ..._Y, etc.); or you could do it
>>> with aggregates (an aggregate maps to a resource pool; associate all the
>>> compute providers in a given pool with its aggregate uuid).
>>>
>>> Thanks,
>>> Eric
>>>
>>> [1] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-01-26.log.html#t2018-01-26T14:40:44
>>> [2] https://specs.openstack.org/openstack/nova-specs/specs/queens/approved/granular-resource-requests.html#numbered-request-groups
>>
>>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>