Re: [openstack-dev] [nova][placement] Placement requests and caching in the resource tracker

Jay Pipes Sun, 04 Nov 2018 04:02:33 -0800

On 11/02/2018 03:22 PM, Eric Fried wrote:

All-

Based on a (long) discussion yesterday [1] I have put up a patch [2]
whereby you can set [compute]resource_provider_association_refresh to
zero and the resource tracker will never* refresh the report client's
provider cache. Philosophically, we're removing the "healing" aspect of
the resource tracker's periodic and trusting that placement won't
diverge from whatever's in our cache. (If it does, it's because the op
hit the CLI, in which case they should SIGHUP - see below.)

*except:
- When we initially create the compute node record and bootstrap its
resource provider.
- When the virt driver's update_provider_tree makes a change,
update_from_provider_tree reflects them in the cache as well as pushing
them back to placement.
- If update_from_provider_tree fails, the cache is cleared and gets
rebuilt on the next periodic.
- If you send SIGHUP to the compute process, the cache is cleared.

This should dramatically reduce the number of calls to placement from
the compute service. Like, to nearly zero, unless something is actually
changing.
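
The refresh policy described above can be sketched roughly as follows.
This is a toy model, not the code in the patch: ProviderCache, its
methods, and fake_fetch are hypothetical names invented for the
illustration, and the real report client tracks far more state.

```python
import time


class ProviderCache:
    """Toy stand-in for the report client's provider cache (hypothetical)."""

    def __init__(self, refresh_interval, fetch):
        # Seconds between refreshes; 0 means "never refresh periodically".
        self.refresh_interval = refresh_interval
        # Callable that would ask placement for the provider data.
        self._fetch = fetch
        self._data = None
        self._fetched_at = None

    def clear(self):
        """Drop the cache, e.g. from a SIGHUP handler or after a failed update."""
        self._data = None
        self._fetched_at = None

    def _stale(self, now):
        if self._data is None:
            return True   # initial bootstrap, or the cache was cleared
        if self.refresh_interval == 0:
            return False  # periodic "healing" is disabled
        return now - self._fetched_at > self.refresh_interval

    def get(self, now=None):
        now = time.monotonic() if now is None else now
        if self._stale(now):
            self._data = self._fetch()
            self._fetched_at = now
        return self._data


calls = []


def fake_fetch():
    # Records each simulated round trip to placement.
    calls.append(1)
    return {"fetch_number": len(calls)}


cache = ProviderCache(refresh_interval=0, fetch=fake_fetch)
for tick in range(5):
    cache.get(now=tick * 1000.0)  # five periodics, one fetch in total

cache.clear()                     # what a SIGHUP would trigger
cache.get(now=6000.0)             # rebuilt on the next periodic
print(len(calls))
```

With the interval at zero, only the bootstrap and the explicit clear
cause a fetch; a non-zero interval would bring back the time-based
refresh.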

Can I get some initial feedback as to whether this is worth polishing up
into something real? (It will probably need a bp/spec if so.)

[1]
http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-11-01.log.html#t2018-11-01T17:32:03
[2] https://review.openstack.org/#/c/614886/

==========
Background
==========
In the Queens release, our friends at CERN noticed a serious spike in
the number of requests to placement from compute nodes, even in a
stable-state cloud. Given that we were in the process of adding a ton of
infrastructure to support sharing and nested providers, this was not
unexpected. Roughly, what was previously:

 @periodic_task:
     GET /resource_providers/$compute_uuid
     GET /resource_providers/$compute_uuid/inventories

became more like:

 @periodic_task:
     # In Queens/Rocky, this would still just return the compute RP
     GET /resource_providers?in_tree=$compute_uuid
     # In Queens/Rocky, this would return nothing
     GET /resource_providers?member_of=...&required=MISC_SHARES...
     for each provider returned above:  # i.e. just one in Q/R
         GET /resource_providers/$compute_uuid/inventories
         GET /resource_providers/$compute_uuid/traits
         GET /resource_providers/$compute_uuid/aggregates

In a cloud the size of CERN's, the load wasn't acceptable. But at the
time, CERN worked around the problem by disabling refreshing entirely.
(The fact that this seems to have worked for them is an encouraging sign
for the proposed code change.)
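
To put rough numbers on the two periodic listings above, here is a
back-of-the-envelope count. It only tallies the GETs shown in those
listings; the function names are made up for the illustration and the
real request pattern may differ in detail.

```python
def requests_before():
    # One GET for the provider plus one GET for its inventories.
    return 2


def requests_after(num_providers):
    # Two list queries (in_tree and sharing providers), then
    # inventories, traits, and aggregates for each provider returned.
    return 2 + 3 * num_providers


# One provider per compute node is the Queens/Rocky case; larger trees
# (nested or sharing providers) grow the count linearly.
for n in (1, 4):
    print(n, requests_before(), requests_after(n))
```

Even with a single provider, each periodic run issues five requests
where it used to issue two, and that is multiplied by every compute
node in the deployment.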

We're not actually making use of most of that information, but it sets
the stage for things that we're working on in Stein and beyond, like
multiple VGPU types, bandwidth resource providers, accelerators, NUMA,
etc., so removing/reducing the amount of information we look at isn't
really an option strategically.

I support your idea of getting rid of the periodic refresh of the cache
in the scheduler report client. Much of that was added in order to
emulate the original way the resource tracker worked.

Most of the behaviour in the original resource tracker (and some of the
code still in there for dealing with (surprise!) PCI passthrough devices
and NUMA topology) was due to doing allocations on the compute node (the
whole claims stuff). We needed to always be syncing the state of the
compute_nodes and pci_devices table in the cell database with whatever
usage information was being created/modified on the compute nodes [0].

All of the "healing" code that's in the resource tracker was basically
to deal with "soft delete", migrations that didn't complete or work
properly, and, again, to handle allocations becoming out-of-sync because
the compute nodes were responsible for allocating (as opposed to the
current situation we have where the placement service -- via the
scheduler's call to claim_resources() -- is responsible for allocating
resources [1]).

Now that we have generation markers protecting both providers and
consumers, we can rely on those generations to signal to the scheduler
report client that it needs to pull fresh information about a provider
or consumer. So, there's really no need to automatically and blindly
refresh any more.
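
A minimal sketch of that generation-driven refresh, assuming a write
carrying a stale generation is rejected with a conflict. FakePlacement,
CachingClient, and GenerationConflict are hypothetical stand-ins, not
the real placement API or report client.

```python
class GenerationConflict(Exception):
    """Raised when a write carries a stale provider generation."""


class FakePlacement:
    """In-memory stand-in for one resource provider in placement."""

    def __init__(self):
        self.generation = 0
        self.inventory = {}

    def get_provider(self):
        return {"generation": self.generation,
                "inventory": dict(self.inventory)}

    def put_inventory(self, generation, inventory):
        # Writes must name the generation the caller last saw.
        if generation != self.generation:
            raise GenerationConflict()
        self.inventory = dict(inventory)
        self.generation += 1
        return self.generation


class CachingClient:
    """Keeps a cached view and refreshes only when a write conflicts."""

    def __init__(self, placement):
        self.placement = placement
        self.cached = placement.get_provider()
        self.refreshes = 0

    def set_inventory(self, inventory):
        try:
            gen = self.placement.put_inventory(
                self.cached["generation"], inventory)
        except GenerationConflict:
            # The cache is stale: pull fresh data and retry once.
            # A second conflict propagates to the caller.
            self.cached = self.placement.get_provider()
            self.refreshes += 1
            gen = self.placement.put_inventory(
                self.cached["generation"], inventory)
        self.cached = {"generation": gen, "inventory": dict(inventory)}


placement = FakePlacement()
client = CachingClient(placement)
client.set_inventory({"VCPU": 8})   # cache is current, no refresh needed

# Someone changes the provider out of band (e.g. an operator via the CLI).
placement.put_inventory(placement.generation, {"VCPU": 16})

client.set_inventory({"VCPU": 8})   # conflict, one refresh, then success
print(client.refreshes, placement.generation)
```

The point of the sketch is that the refresh happens only when a write
actually collides with an out-of-band change, rather than on a timer.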


Best,
-jay

[0] We always need to be syncing those tables because those tables,
unlike the placement database's data modeling, couple both inventory AND
usage in the same table structure...

[1] again, except for PCI devices and NUMA topology, because of the
tight coupling in place with the different resource trackers those types
of resources use...
