Re: [openstack-dev] [nova][placement] Placement requests and caching in the resource tracker

Matt Riedemann Fri, 02 Nov 2018 13:34:35 -0700

On 11/2/2018 2:22 PM, Eric Fried wrote:

Based on a (long) discussion yesterday [1] I have put up a patch [2]
whereby you can set [compute]resource_provider_association_refresh to
zero and the resource tracker will never* refresh the report client's
provider cache. Philosophically, we're removing the "healing" aspect of
the resource tracker's periodic and trusting that placement won't
diverge from whatever's in our cache. (If it does, it's because the op
hit the CLI, in which case they should SIGHUP - see below.)

*except:
- When we initially create the compute node record and bootstrap its
resource provider.
- When the virt driver's update_provider_tree makes a change,
update_from_provider_tree reflects them in the cache as well as pushing
them back to placement.
- If update_from_provider_tree fails, the cache is cleared and gets
rebuilt on the next periodic.
- If you send SIGHUP to the compute process, the cache is cleared.

This should dramatically reduce the number of calls to placement from
the compute service. Like, to nearly zero, unless something is actually
changing.
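
To make the proposed behavior concrete, here is a minimal sketch of a cache whose refresh interval can be set to zero. It is not the code from the patch: `ProviderCache`, `fetch`, and `clock` are hypothetical names, and `fetch` stands in for whatever GETs the report client would send to placement. The only parts taken from the description above are that zero means "never refresh once populated" and that clearing the cache (SIGHUP, or a failed update) forces a rebuild on the next periodic.

```python
import time


class ProviderCache:
    """Toy stand-in for the report client's provider cache (hypothetical)."""

    def __init__(self, refresh_interval, fetch, clock=time.monotonic):
        # refresh_interval plays the role of
        # [compute]resource_provider_association_refresh; 0 is assumed to
        # mean "never refresh once the cache has been populated".
        self.refresh_interval = refresh_interval
        self._fetch = fetch      # callable standing in for GETs to placement
        self._clock = clock      # injectable so the behavior can be tested
        self._data = None
        self._last_refresh = None

    def _stale(self):
        if self._data is None:
            return True          # initial bootstrap, or the cache was cleared
        if self.refresh_interval == 0:
            return False         # periodic refresh disabled
        return self._clock() - self._last_refresh >= self.refresh_interval

    def get(self):
        """Called from the periodic; contacts placement only when stale."""
        if self._stale():
            self._data = self._fetch()
            self._last_refresh = self._clock()
        return self._data

    def clear(self):
        """Called on SIGHUP or after a failed update; forces a rebuild."""
        self._data = None
```

With `refresh_interval=0`, repeated `get()` calls reuse the first result until `clear()` is called, which is the "nearly zero calls unless something changes" behavior described above.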

Can I get some initial feedback as to whether this is worth polishing up
into something real? (It will probably need a bp/spec if so.)

[1] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-11-01.log.html#t2018-11-01T17:32:03
[2] https://review.openstack.org/#/c/614886/

==========
Background
==========
In the Queens release, our friends at CERN noticed a serious spike in
the number of requests to placement from compute nodes, even in a
stable-state cloud. Given that we were in the process of adding a ton of
infrastructure to support sharing and nested providers, this was not
unexpected. Roughly, what was previously:

@periodic_task:
    GET /resource_providers/$compute_uuid
    GET /resource_providers/$compute_uuid/inventories

became more like:

@periodic_task:
    # In Queens/Rocky, this would still just return the compute RP
    GET /resource_providers?in_tree=$compute_uuid
    # In Queens/Rocky, this would return nothing
    GET /resource_providers?member_of=...&required=MISC_SHARES...
    for each provider returned above:  # i.e. just one in Q/R
        GET /resource_providers/$compute_uuid/inventories
        GET /resource_providers/$compute_uuid/traits
        GET /resource_providers/$compute_uuid/aggregates

In a cloud the size of CERN's, the load wasn't acceptable. But at the
time, CERN worked around the problem by disabling refreshing entirely.
(The fact that this seems to have worked for them is an encouraging sign
for the proposed code change.)
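
As a rough back-of-the-envelope check on the two request patterns above, the per-periodic counts can be tallied like this. The counts (2 before; 2 plus 3 per provider after) come straight from the listings; the node count and periodic interval used below are made-up illustrative values, not CERN's actual figures.

```python
def requests_before():
    # One GET for the provider plus one GET for its inventories.
    return 2


def requests_after(num_providers=1):
    # Two listing calls (in_tree and sharing providers), then three GETs
    # (inventories, traits, aggregates) for each provider returned.
    return 2 + 3 * num_providers


def fleet_requests_per_hour(per_periodic, compute_nodes, periodic_interval_s):
    # Every compute node runs the periodic independently.
    runs_per_hour = 3600 / periodic_interval_s
    return per_periodic * compute_nodes * runs_per_hour


# Hypothetical deployment: 10000 compute nodes, periodic every 60 seconds.
before = fleet_requests_per_hour(requests_before(), 10000, 60)
after = fleet_requests_per_hour(requests_after(1), 10000, 60)
print(f"before: {before:.0f}/hour, after: {after:.0f}/hour")
```

Even with a single provider per compute node, the steady-state request volume goes up by a factor of 2.5 under these assumptions, and it grows with every nested or sharing provider added to the tree.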

We're not actually making use of most of that information, but it sets
the stage for things that we're working on in Stein and beyond, like
multiple VGPU types, bandwidth resource providers, accelerators, NUMA,
etc., so removing/reducing the amount of information we look at isn't
really an option strategically.

A few random points from the long discussion that should probably be re-posed here for wider thought:

* There was probably a lot of discussion about why we needed to do this caching and stuff in the compute in the first place. What has changed that we no longer need to aggressively refresh the cache on every periodic? I thought initially it came up because people really wanted the compute to be fully self-healing to any external changes, including hot plugging resources like disk on the host to automatically reflect those changes in inventory. Similarly, external user/service interactions with the placement API which would then be automatically picked up by the next periodic run - is that no longer a desire, and/or how was the decision made previously that simply requiring a SIGHUP in that case wasn't sufficient/desirable?

* I believe I made the point yesterday that we should probably not refresh by default, and let operators opt-in to that behavior if they really need it, i.e. they are frequently making changes to the environment, potentially by some external service (I could think of vCenter doing this to reflect changes from vCenter back into nova/placement), but I don't think that should be the assumed behavior by most and our defaults should reflect the "normal" use case.

* I think I've noted a few times now that we don't actually use the provider aggregates information (yet) in the compute service. Nova host aggregate membership is mirrored to placement since Rocky [1] but that happens in the API, not the compute. The only thing I can think of that relied on resource provider aggregate information in the compute is the shared storage providers concept, but that's not supported (yet) [2]. So do we need to keep retrieving aggregate information when nothing in compute uses it yet?

* Similarly, why do we need to get traits on each periodic? The only in-tree virt driver I'm aware of that *reports* traits is the libvirt driver for CPU features [3]. Otherwise I think the idea behind getting the latest traits is so the virt driver doesn't overwrite any traits set externally on the compute node root resource provider. I think that still stands and is probably OK, even though we have generations now which should keep us from overwriting if we don't have the latest traits, but I wanted to bring it up since it's related to the "why do we need provider aggregates in the compute?" question.

* Regardless of what we do, I think we should probably *at least* make that refresh associations config allow 0 to disable it so CERN (and others) can avoid the need to continually forward-port code to disable it.
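
On the point above about generations guarding against overwrites, here is a toy in-memory model of the idea. It is not placement's code: `FakeProvider`, `put_traits`, and `GenerationConflict` are hypothetical names, and the trait names are only examples. What it illustrates is that a writer must present the generation it last saw, and a write based on a stale view is rejected rather than silently clobbering traits set by someone else.

```python
class GenerationConflict(Exception):
    """Raised when the caller's view of the provider is out of date."""


class FakeProvider:
    """In-memory stand-in for a resource provider record (hypothetical)."""

    def __init__(self):
        self.generation = 0
        self.traits = set()

    def put_traits(self, traits, generation):
        # Reject the write if the caller's generation does not match ours;
        # the caller is then expected to re-read before trying again.
        if generation != self.generation:
            raise GenerationConflict(
                f"expected generation {self.generation}, got {generation}")
        self.traits = set(traits)
        self.generation += 1
        return self.generation


provider = FakeProvider()

# The compute service writes its traits and caches the new generation.
cached_gen = provider.put_traits({"HW_CPU_X86_AVX2"}, generation=0)

# An external actor adds a custom trait; the provider generation moves on.
provider.put_traits({"HW_CPU_X86_AVX2", "CUSTOM_EXTERNAL"}, cached_gen)

# The compute service, still holding the old generation, tries to write.
try:
    provider.put_traits({"HW_CPU_X86_AVX2"}, cached_gen)
except GenerationConflict:
    print("stale cache detected; refresh before retrying")
```

In this model a compute service with refreshing disabled would only learn about the external change when it next tries to write, which is the trade-off being discussed.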

[1] https://specs.openstack.org/openstack/nova-specs/specs/rocky/implemented/placement-mirror-host-aggregates.html

[2] https://bugs.launchpad.net/nova/+bug/1784020

[3] https://specs.openstack.org/openstack/nova-specs/specs/rocky/implemented/report-cpu-features-as-traits.html


--

Thanks,

Matt

