Re: [Openstack-operators] [nova] Need some feedback on the proposed heal_allocations CLI
On 5/28/2018 7:31 AM, Sylvain Bauza wrote:
> That said, given I'm now working on using Nested Resource Providers for VGPU
> inventories, I wonder about a possible upgrade problem with VGPU allocations.
> Given that:
>
> - in Queens, VGPU inventories are on the root RP (i.e. the compute node RP), but
> - in Rocky, VGPU inventories will be on child RPs (i.e. against a specific VGPU type),
>
> then if we have VGPU allocations in Queens, when upgrading to Rocky, should we
> maybe recreate the allocations against a different inventory?

For how the heal_allocations CLI works today, if the instance has any
allocations in placement, it skips that instance. So this scenario wouldn't be
a problem.

> Hope you see the problem with upgrading by creating nested RPs?

Yes, the CLI doesn't attempt to have any knowledge about nested resource
providers; it just takes the flavor embedded in the instance and creates
allocations against the compute node provider using the flavor. It has no
explicit knowledge about granular request groups or more advanced features
like that.

--

Thanks,

Matt
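For illustration, here is a minimal sketch of the behaviour Matt describes,
assuming hypothetical helper names (placement.get_allocations_for_consumer,
placement.put_allocations, instance.flavor); this is not the actual code in
the patch under review, and swap/other flavor-derived resources are elided:

    # Minimal sketch only -- helper names are hypothetical, not real nova code.

    def heal_allocations_for_instance(placement, instance, compute_node_rp_uuid):
        # If the instance already has *any* allocations in placement, skip it;
        # the CLI does not try to reconcile or reshape existing allocations
        # (e.g. it won't move a VGPU allocation from a root RP to a child RP).
        if placement.get_allocations_for_consumer(instance.uuid):
            return 'skipped'

        # Otherwise build allocations purely from the embedded flavor and write
        # them against the root (compute node) provider; there is no awareness
        # of nested providers or granular request groups.
        resources = {
            'VCPU': instance.flavor.vcpus,
            'MEMORY_MB': instance.flavor.memory_mb,
            'DISK_GB': instance.flavor.root_gb + instance.flavor.ephemeral_gb,
        }
        placement.put_allocations(instance.uuid, compute_node_rp_uuid, resources)
        return 'healed'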
Re: [Openstack-operators] [nova] Need some feedback on the proposed heal_allocations CLI
On Fri, May 25, 2018 at 12:19 AM, Matt Riedemann wrote:
> I've written a nova-manage placement heal_allocations CLI [1] which was a
> TODO from the PTG in Dublin as a step toward getting existing
> CachingScheduler users to roll off that (which is deprecated).
>
> During the CERN cells v1 upgrade talk it was pointed out that CERN was able
> to go from placement-per-cell to centralized placement in Ocata because the
> nova-computes in each cell would automatically recreate the allocations in
> Placement in a periodic task, but that code is gone once you're upgraded to
> Pike or later.
>
> In various other talks during the summit this week, we've talked about
> things during upgrades where, for instance, if placement is down for some
> reason during an upgrade and a user deletes an instance, the allocation
> doesn't get cleaned up from placement, so it's going to continue counting
> against resource usage on that compute node even though the server instance
> in nova is gone. So this CLI could be expanded to help clean up situations
> like that, e.g. provide it a specific server ID and the CLI can figure out
> if it needs to clean things up in placement.
>
> So there are plenty of things we can build into this, but the patch is
> already quite large. I expect we'll also be backporting this to stable
> branches to help operators upgrade/fix allocation issues. It already has
> several things listed in an inline code comment about things to build into
> this later.
>
> My question is: is this good enough for a first iteration, or is there
> something severely missing before we can merge this, like the automatic
> marker tracking mentioned in the code (which will probably be a non-trivial
> amount of code to add)? I could really use some operator feedback on this:
> just take a look at what it is already capable of, and if it's not going to
> be useful in this iteration, let me know what's missing and I can add that
> to the patch.
>
> [1] https://review.openstack.org/#/c/565886/

It does sound to me like a good way to help operators.

That said, given I'm now working on using Nested Resource Providers for VGPU
inventories, I wonder about a possible upgrade problem with VGPU allocations.
Given that:

- in Queens, VGPU inventories are on the root RP (i.e. the compute node RP), but
- in Rocky, VGPU inventories will be on child RPs (i.e. against a specific VGPU type),

then if we have VGPU allocations in Queens, when upgrading to Rocky, should we
maybe recreate the allocations against a different inventory?

Hope you see the problem with upgrading by creating nested RPs?
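For readers less familiar with the resource provider modelling, the concern is
roughly the following difference in allocation shape; the provider UUIDs and
resource amounts below are made up for illustration:

    # Illustrative only: provider names and amounts are invented.

    # Queens: VGPU inventory lives on the root (compute node) provider, so an
    # instance's allocations all land on a single provider.
    queens_allocations = {
        'compute-node-rp-uuid': {
            'resources': {'VCPU': 2, 'MEMORY_MB': 4096, 'VGPU': 1},
        },
    }

    # Rocky: VGPU inventory moves to a child provider per VGPU type, so the
    # equivalent allocations would need to be split across two providers.
    rocky_allocations = {
        'compute-node-rp-uuid': {
            'resources': {'VCPU': 2, 'MEMORY_MB': 4096},
        },
        'vgpu-type-child-rp-uuid': {
            'resources': {'VGPU': 1},
        },
    }

An instance that keeps its Queens-era allocation against the root RP after the
inventory moves would be consuming VGPU from a provider that no longer has that
inventory, which is the upgrade question raised above.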
[Openstack-operators] [nova] Need some feedback on the proposed heal_allocations CLI
I've written a nova-manage placement heal_allocations CLI [1] which was a
TODO from the PTG in Dublin as a step toward getting existing
CachingScheduler users to roll off that (which is deprecated).

During the CERN cells v1 upgrade talk it was pointed out that CERN was able
to go from placement-per-cell to centralized placement in Ocata because the
nova-computes in each cell would automatically recreate the allocations in
Placement in a periodic task, but that code is gone once you're upgraded to
Pike or later.

In various other talks during the summit this week, we've talked about things
during upgrades where, for instance, if placement is down for some reason
during an upgrade and a user deletes an instance, the allocation doesn't get
cleaned up from placement, so it's going to continue counting against
resource usage on that compute node even though the server instance in nova
is gone. So this CLI could be expanded to help clean up situations like that,
e.g. provide it a specific server ID and the CLI can figure out if it needs
to clean things up in placement.

So there are plenty of things we can build into this, but the patch is
already quite large. I expect we'll also be backporting this to stable
branches to help operators upgrade/fix allocation issues. It already has
several things listed in an inline code comment about things to build into
this later.

My question is: is this good enough for a first iteration, or is there
something severely missing before we can merge this, like the automatic
marker tracking mentioned in the code (which will probably be a non-trivial
amount of code to add)? I could really use some operator feedback on this:
just take a look at what it is already capable of, and if it's not going to
be useful in this iteration, let me know what's missing and I can add that
to the patch.

[1] https://review.openstack.org/#/c/565886/

--

Thanks,

Matt
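As a concrete example of the possible extension mentioned above (passing a
specific server ID so the CLI can clean up leaked allocations), here is a
rough sketch of what that could look like; none of this is in the current
patch, and the helper names are hypothetical:

    # Hypothetical sketch of the possible future extension described above;
    # not part of the patch under review, helper names are made up.

    def cleanup_leaked_allocations(placement, nova_api, server_id):
        allocations = placement.get_allocations_for_consumer(server_id)
        if not allocations:
            return 'nothing to clean up'
        if nova_api.instance_exists(server_id):
            # The server still exists in nova, so its allocations are legitimate.
            return 'instance exists; leaving allocations alone'
        # The server was deleted while placement was unreachable, so its
        # allocations were leaked and keep counting against the compute node.
        placement.delete_allocations_for_consumer(server_id)
        return 'deleted leaked allocations'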