Sylvain-

On 05/31/2018 02:41 PM, Sylvain Bauza wrote:
>
>
> On Thu, May 31, 2018 at 8:26 PM, Eric Fried <openst...@fried.cc> wrote:
>
> > 1. Make everything perform the pivot on compute node start (which can be
> > re-used by a CLI tool for the offline case)
> > 2. Make everything default to non-nested inventory at first, and provide
> > a way to migrate a compute node and its instances one at a time (in
> > place) to roll through.
>
> I agree that it sure would be nice to do ^ rather than requiring the
> "slide puzzle" thing.
>
> But how would this be accomplished, in light of the current "separation
> of responsibilities" drawn at the virt driver interface, whereby the
> virt driver isn't supposed to talk to placement directly, or know
> anything about allocations?  Here's a first pass:
>
>
> What we usually do is to implement, either at the compute service level
> or at the virt driver level, some init_host() method that will reconcile
> what you want.
> For example, we could just imagine a non-virt-specific method (and I
> like that because it's non-virt-specific) - i.e. called by compute's
> init_host() - that would look up the compute root RP inventories and see
> whether one or more inventories tied to specific resource classes have
> to be moved from the root RP and attached to a child RP.
> The only subtlety that would require a virt-specific update is the name
> of the child RP (as both Xen and libvirt plan to use the child RP name
> as the vGPU type identifier), but that's an implementation detail that a
> possible virt driver update by the resource tracker would reconcile.
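(For illustration only, the kind of init_host()-time reconciliation you're
describing might look roughly like the sketch below.  Every name in it -
the client object, all of its methods, and the virt-supplied mapping of
resource class to child RP name - is hypothetical, not an existing nova or
placement interface.)

    # Sketch: a generic pass, callable from compute's init_host(), that
    # moves inventory of selected resource classes from the root RP to
    # named child RPs.  The virt driver supplies only the mapping of
    # resource class -> child RP name (e.g. the vGPU type name).
    def reconcile_nested_providers(client, root_rp_uuid, child_rp_names):
        inventories = client.get_inventories(root_rp_uuid)
        for rc, child_name in child_rp_names.items():
            if rc not in inventories:
                continue
            # Create the child RP under the root if it isn't there yet.
            child_uuid = client.ensure_child_provider(root_rp_uuid, child_name)
            # Put the inventory on the child, move any existing
            # allocations over, then drop it from the root.
            client.set_inventory(child_uuid, {rc: inventories[rc]})
            client.move_allocations(rc, root_rp_uuid, child_uuid)
            client.delete_inventory(root_rp_uuid, rc)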
The question was rhetorical; my suggestion (below) was an attempt at
designing exactly what you've described.  Let me know if I can
explain/clarify it further.  I'm looking for feedback as to whether it's
a viable approach.

> The virt driver, via the return value from update_provider_tree, tells
> the resource tracker that "inventory of resource class A on provider B
> has moved to provider C" for all applicable AxBxC.  E.g.
>
>     [ { 'from_resource_provider': <cn_rp_uuid>,
>         'moved_resources': [VGPU: 4],
>         'to_resource_provider': <gpu_rp1_uuid>
>       },
>       { 'from_resource_provider': <cn_rp_uuid>,
>         'moved_resources': [VGPU: 4],
>         'to_resource_provider': <gpu_rp2_uuid>
>       },
>       { 'from_resource_provider': <cn_rp_uuid>,
>         'moved_resources': [
>             SRIOV_NET_VF: 2,
>             NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND: 1000,
>             NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND: 1000,
>         ],
>         'to_resource_provider': <gpu_rp2_uuid>
>       }
>     ]
>
> As today, the resource tracker takes the updated provider tree and
> invokes [1] the report client method update_from_provider_tree [2] to
> flush the changes to placement.  But now update_from_provider_tree also
> accepts the return value from update_provider_tree and, for each "move":
>
> - Creates provider C (as described in the provider_tree) if it doesn't
>   already exist.
> - Creates/updates provider C's inventory as described in the
>   provider_tree (without yet updating provider B's inventory).  This
>   ought to create the inventory of resource class A on provider C.
> - Discovers allocations of rc A on rp B and POSTs to move them to rp C*.
> - Updates provider B's inventory.
>
> (*There's a hole here: if we're splitting a glommed-together inventory
> across multiple new child providers, as with the VGPUs in the example,
> we don't know which allocations to put where.  The virt driver should
> know which instances own which specific inventory units, and would be
> able to report that info within the data structure.  That's getting
> kinda close to the virt driver mucking with allocations, but maybe it
> fits well enough into this model to be acceptable?)
>
> Note that the return value from update_provider_tree is optional, and
> only used when the virt driver is indicating a "move" of this ilk.  If
> it's None/[], then the RT/update_from_provider_tree flow is the same as
> it is today.
>
> If we can do it this way, we don't need a migration tool.  In fact, we
> don't even need to restrict provider tree "reshaping" to release
> boundaries.  As long as the virt driver understands its own data model
> migrations and reports them properly via update_provider_tree, it can
> shuffle its tree around whenever it wants.
>
> Thoughts?
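To make that flow a bit more concrete, here is a rough sketch of how
update_from_provider_tree might consume such a "moves" list.  It's purely
illustrative: the helper methods (_ensure_provider, _set_inventory_for,
_move_allocations, _flush_provider_tree) and the provider_tree.data(...)
accessor are hypothetical stand-ins, not the real report client code.

    def update_from_provider_tree(self, context, provider_tree, moves=None):
        for move in (moves or []):
            src = move['from_resource_provider']
            dst = move['to_resource_provider']
            # Create provider C (as described in the provider_tree) if it
            # doesn't already exist.
            self._ensure_provider(context, provider_tree, dst)
            # Create/update provider C's inventory as described in the
            # provider_tree, without yet touching provider B's inventory.
            self._set_inventory_for(context, dst,
                                    provider_tree.data(dst).inventory)
            # Discover allocations of the moved resource classes on B and
            # POST to move them over to C.
            self._move_allocations(context, move['moved_resources'], src, dst)
        # Only after all moves are applied do we flush the remaining
        # provider/inventory updates (including shrinking B's inventory),
        # i.e. what update_from_provider_tree already does today.
        self._flush_provider_tree(context, provider_tree)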
> -efried
>
> [1]
> https://github.com/openstack/nova/blob/8753c9a38667f984d385b4783c3c2fc34d7e8e1b/nova/compute/resource_tracker.py#L890
> [2]
> https://github.com/openstack/nova/blob/8753c9a38667f984d385b4783c3c2fc34d7e8e1b/nova/scheduler/client/report.py#L1341

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev