On Fri, May 25, 2018 at 12:19 AM, Matt Riedemann
> I've written a nova-manage placement heal_allocations CLI  which was a
> TODO from the PTG in Dublin as a step toward getting existing
> CachingScheduler users to roll off that (which is deprecated).
> During the CERN cells v1 upgrade talk it was pointed out that CERN was
> able to go from placement-per-cell to centralized placement in Ocata
> because the nova-computes in each cell would automatically recreate the
> allocations in Placement in a periodic task, but that code is gone once
> you're upgraded to Pike or later.
> In various other talks during the summit this week, we've talked about
> things during upgrades where, for instance, if placement is down for some
> reason during an upgrade, a user deletes an instance and the allocation
> doesn't get cleaned up from placement so it's going to continue counting
> against resource usage on that compute node even though the server instance
> in nova is gone. So this CLI could be expanded to help clean up situations
> like that, e.g. provide it a specific server ID and the CLI can figure out
> if it needs to clean things up in placement.
> So there are plenty of things we can build into this, but the patch is
> already quite large. I expect we'll also be backporting this to stable
> branches to help operators upgrade/fix allocation issues. It already has
> several things listed in a code comment inline about things to build into
> this later.
> My question is, is this good enough for a first iteration or is there
> something severely missing before we can merge this, like the automatic
> marker tracking mentioned in the code (that will probably be a non-trivial
> amount of code to add). I could really use some operator feedback on this
> to just take a look at what it already is capable of and if it's not going
> to be useful in this iteration, let me know what's missing and I can add
> that in to the patch.
>  https://review.openstack.org/#/c/565886/
It does sound for me a good way to help operators.
That said, given I'm now working on using Nested Resource Providers for
VGPU inventories, I wonder about a possible upgrade problem with VGPU
allocations. Given that :
- in Queens, VGPU inventories are for the root RP (ie. the compute node
- in Rocky, VGPU inventories will be for children RPs (ie. against a
specific VGPU type), then
if we have VGPU allocations in Queens, when upgrading to Rocky, we should
maybe recreate the allocations to a specific other inventory ?
Hope you see the problem with upgrading by creating nested RPs ?
> OpenStack-operators mailing list
OpenStack Development Mailing List (not for usage questions)