Re: [openstack-dev] [nova][placement] update_provider_tree design updates
> ...then there's no way I can know ahead of time what all those might be.
> (In particular, if I want to support new devices without updating my
> code.) I.e. I *can't* write the corresponding
> provider_tree.remove_trait(...) condition. Maybe that never becomes a
> real problem because we'll never need to remove a dynamic trait. Or
> maybe we can tolerate "leakage". Or maybe we do something
> clever-but-ugly with namespacing (if
> trait.startswith('CUSTOM_DEV_VENDORID_')...). We're consciously kicking
> this can down the road.
>
> And note that this "dynamic" problem is likely to be a much larger
> portion (possibly all) of the domain when we're talking about aggregates.
>
> Then there's ironic, which is currently set up to get its traits blindly
> from Inspector. So Inspector not only needs to maintain the "owned
> traits" list (with all the same difficulties as above), but it must also
> either a) communicate that list to ironic virt so the latter can manage
> the add/remove logic; or b) own the add/remove logic and communicate the
> individual traits with a +/- on them so virt knows whether to add or
> remove them.

Just a nit, Ironic doesn't necessarily get its traits from inspector. Ironic gets them from *some* API client, which may be an operator, or inspector, or something else. Inspector is totally optional.

Anyway, I'm inclined to kick this can down the road a bit, as you mention. I imagine that the ideal situation is for Ironic to remove traits from placement on the fly when they are removed in Ironic. Any other traits that nova-compute knows about (but Ironic doesn't), nova-compute can manage the removal the same way as another virt driver.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova][placement] update_provider_tree design updates
Excellent and astute questions, both of which came up in the discussion, but which I neglected to mention. (I had to miss *something*, right?) See inline.

On 03/15/2018 02:29 PM, Chris Dent wrote:
> On Thu, 15 Mar 2018, Eric Fried wrote:
>
>> One of the takeaways from the Queens retrospective [1] was that we
>> should be summarizing discussions that happen in person/hangout/IRC/etc.
>> to the appropriate mailing list for the benefit of those who weren't
>> present (or paying attention :P ). This is such a summary.
>
> Thank you _very_ much for doing this. I've got two questions within.
>
>> ...which we discussed earlier this week in IRC [4][5]. We concluded:
>>
>> - Compute is the source of truth for any and all traits it could ever
>> assign, which will be a subset of what's in os-traits, plus whatever
>> CUSTOM_ traits it stakes a claim to. If an outside agent sets a trait
>> that's in that list, compute can legitimately remove it. If an outside
>> agent removes a trait that's in that list, compute can reassert it.
>
> Where does that list come from? Or more directly how does Compute
> stake the claim for "mine"?

One piece of the list should come from the traits associated with the compute driver capabilities [2]. Likewise anything else in the future that's within compute but outside of virt. In other words, we're declaring that it doesn't make sense for an operator to e.g. set the "has_imagecache" trait on a compute if the compute doesn't do that itself. The message being that you can't turn on a capability by setting a trait.

Beyond that, each virt driver is going to be responsible for figuring out its own list. Thinking this through with my PowerVM hat on, it won't actually be as hard as it initially sounded - though it will require more careful accounting. Essentially, the driver is going to ask the platform questions and get responses in its own language; then map those responses to trait names.
So we'll be writing blocks like:

    if sys_caps.can_modify_io:
        provider_tree.add_trait(nodename, "CUSTOM_LIVE_RESIZE_CAPABLE")
    else:
        provider_tree.remove_trait(nodename, "CUSTOM_LIVE_RESIZE_CAPABLE")

And, for some subset of the "owned" traits, we should be able to maintain a dict such that this works:

    # trait_map maps a platform feature name to its trait name
    for feature in trait_map:
        if feature in sys_features:
            provider_tree.add_trait(nodename, trait_map[feature])
        else:
            provider_tree.remove_trait(nodename, trait_map[feature])

BUT what about *dynamic* features? If I have code like (don't kill me):

    vendor_id_trait = 'CUSTOM_DEV_VENDORID_' + slugify(io_device.vendor_id)
    provider_tree.add_trait(io_dev_rp, vendor_id_trait)

...then there's no way I can know ahead of time what all those might be. (In particular, if I want to support new devices without updating my code.) I.e. I *can't* write the corresponding provider_tree.remove_trait(...) condition. Maybe that never becomes a real problem because we'll never need to remove a dynamic trait. Or maybe we can tolerate "leakage". Or maybe we do something clever-but-ugly with namespacing (if trait.startswith('CUSTOM_DEV_VENDORID_')...). We're consciously kicking this can down the road.

And note that this "dynamic" problem is likely to be a much larger portion (possibly all) of the domain when we're talking about aggregates.

Then there's ironic, which is currently set up to get its traits blindly from Inspector. So Inspector not only needs to maintain the "owned traits" list (with all the same difficulties as above), but it must also either a) communicate that list to ironic virt so the latter can manage the add/remove logic; or b) own the add/remove logic and communicate the individual traits with a +/- on them so virt knows whether to add or remove them.

> How does an outside agent know what Compute has claimed? Presumably
> they want to know that so they can avoid wastefully doing something
> that's going to get clobbered?

Yup [11].
It was deemed that we don't need an API/CLI to discover those lists (assuming that would even be possible). The reasoning was two-pronged:

- We'll document that there are traits "owned" by nova and attempts to set/unset them will be frustrated. You can't find out which ones they are except when a manually-set/-unset trait magically dis-/re-appears.
- It probably won't be an issue because outside agents will be setting traits based on some specific thing they want to do, and the documentation for that thing will specify traits that are known not to interfere with those in nova's wheelhouse.

> [2] https://review.openstack.org/#/c/538498/
[11] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-03-12.log.html#t2018-03-12T16:26:29
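For what it's worth, the clever-but-ugly namespacing idea mentioned above could look something like the sketch below. This is only an illustration, not a proposal: plain Python sets stand in for whatever the provider actually reports, and the function name reconcile_prefixed_traits is made up (it is not part of ProviderTree or nova). The assumption is that compute treats every trait under a prefix it has claimed as its own, so anything under that prefix that the driver did not compute on this pass is considered stale, while everything outside the prefix is left alone.

```python
# Hypothetical helper for illustration; not part of nova or ProviderTree.
def reconcile_prefixed_traits(existing, desired, owned_prefix):
    """Return the trait set to write back to the provider.

    existing: traits currently on the provider (from any source)
    desired: traits under owned_prefix that the driver computed this pass
    owned_prefix: namespace compute claims, e.g. 'CUSTOM_DEV_VENDORID_'
    """
    stray = {t for t in desired if not t.startswith(owned_prefix)}
    if stray:
        raise ValueError('traits outside owned namespace: %s' % sorted(stray))
    # Traits outside the namespace belong to someone else: keep them as-is.
    foreign = {t for t in existing if not t.startswith(owned_prefix)}
    # Within the namespace, only what the driver reported this pass survives.
    return foreign | set(desired)


existing = {'CUSTOM_DEV_VENDORID_1234', 'CUSTOM_OPERATOR_GOLD', 'HW_CPU_X86_AVX2'}
desired = {'CUSTOM_DEV_VENDORID_ABCD'}
result = reconcile_prefixed_traits(existing, desired, 'CUSTOM_DEV_VENDORID_')
print(sorted(result))
```

The stale CUSTOM_DEV_VENDORID_1234 drops out without the driver ever having to name it, which is the whole appeal. The cost is that the prefix itself becomes part of the "owned" list that outside agents have to stay away from.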
Re: [openstack-dev] [nova][placement] update_provider_tree design updates
On Thu, 15 Mar 2018, Eric Fried wrote:

> One of the takeaways from the Queens retrospective [1] was that we
> should be summarizing discussions that happen in person/hangout/IRC/etc.
> to the appropriate mailing list for the benefit of those who weren't
> present (or paying attention :P ). This is such a summary.

Thank you _very_ much for doing this. I've got two questions within.

> ...which we discussed earlier this week in IRC [4][5]. We concluded:
>
> - Compute is the source of truth for any and all traits it could ever
> assign, which will be a subset of what's in os-traits, plus whatever
> CUSTOM_ traits it stakes a claim to. If an outside agent sets a trait
> that's in that list, compute can legitimately remove it. If an outside
> agent removes a trait that's in that list, compute can reassert it.

Where does that list come from? Or more directly how does Compute stake the claim for "mine"?

How does an outside agent know what Compute has claimed? Presumably they want to know that so they can avoid wastefully doing something that's going to get clobbered?

--
Chris Dent ٩◔̯◔۶ https://anticdent.org/
freenode: cdent tw: @anticdent
[openstack-dev] [nova][placement] update_provider_tree design updates
One of the takeaways from the Queens retrospective [1] was that we should be summarizing discussions that happen in person/hangout/IRC/etc. to the appropriate mailing list for the benefit of those who weren't present (or paying attention :P ). This is such a summary.

As originally conceived, ComputeDriver.update_provider_tree was intended to be the sole source of truth for traits and aggregates on resource providers under its purview.

Then came the idea of reflecting compute driver capabilities as traits [2], which would be done outside of update_provider_tree, but still within the bounds of nova compute.

Then Friday discussions at the PTG [3] brought to light the fact that we need to honor traits set by outside agents (operators, other services like neutron, etc.), effectively merging those with whatever the virt driver sets. Concerns were raised about how to reconcile overlaps, and in particular how compute (via update_provider_tree or otherwise) can know if a trait is safe to *remove*. At the PTG, we agreed we need to do this, but deferred the details.

...which we discussed earlier this week in IRC [4][5]. We concluded:

- Compute is the source of truth for any and all traits it could ever assign, which will be a subset of what's in os-traits, plus whatever CUSTOM_ traits it stakes a claim to. If an outside agent sets a trait that's in that list, compute can legitimately remove it. If an outside agent removes a trait that's in that list, compute can reassert it.
- Anything outside of that list of compute-owned traits is fair game for outside agents to set/unset. Compute won't mess with those, ever.
- Compute (and update_provider_tree) will therefore need to know what that list comprises. Furthermore, it must take care to use merging logic such that it only sets/unsets traits it "owns".
- To facilitate this on the compute side, ProviderTree will get new methods to add/remove provider traits. (Technically, it could all be done via update_traits [6], which replaces the entire set of traits on a provider, but then every update_provider_tree implementation would have to write the same kind of merging logic.)
- For operators, we'll need OSC affordance for setting/unsetting provider traits.

And finally:

- Everything above *also* applies to provider aggregates. NB: Here there be tygers. Unlike traits, the comprehensive list of which can conceivably be known a priori (even including CUSTOM_*s), aggregate UUIDs are by their nature unique and likely generated dynamically. Knowing that you "own" an aggregate UUID is relatively straightforward when you need to set it; but to know you can/must unset it, you need to have kept a record of having set it in the first place. A record that persists e.g. across compute service restarts. Can/should virt drivers write a file? If so, we better make sure it works across upgrades. And so on. Ugh. For the time being, we're kinda punting on this issue until it actually becomes a problem IRL.

And now for the moment you've all been awaiting with bated breath:

- Delta [7] to the update_provider_tree spec [8].
- Patch for ProviderTree methods to add/remove traits/aggregates [9].
- Patch modifying the update_provider_tree docstring, and adding devref content for update_provider_tree [10].

Please feel free to email or reach out in #openstack-nova if you have any questions.
Thanks,
efried

[1] https://etherpad.openstack.org/p/nova-queens-retrospective (L122 as of this writing)
[2] https://review.openstack.org/#/c/538498/
[3] https://etherpad.openstack.org/p/nova-ptg-rocky (L496-502 aotw)
[4] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-03-12.log.html#t2018-03-12T16:02:08
[5] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-03-12.log.html#t2018-03-12T19:20:23
[6] https://github.com/openstack/nova/blob/5f38500df6a8e1665b968c3e98b804e0fdfefc63/nova/compute/provider_tree.py#L494
[7] https://review.openstack.org/552122
[8] http://specs.openstack.org/openstack/nova-specs/specs/rocky/approved/update-provider-tree.html
[9] https://review.openstack.org/553475
[10] https://review.openstack.org/553476
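P.S. The "merging logic" in the conclusions above boils down to a bit of set arithmetic, roughly as sketched here. Take it as an illustration under assumptions rather than what the patches do: traits are modeled as plain Python sets, and merge_owned_traits plus the CUSTOM_RACK_42 trait are invented for the example. The "owned" set is assumed to be known up front, which is exactly the part that gets hard for dynamic traits and aggregates.

```python
# Hypothetical helper: a sketch of the merging rule, not nova code.
def merge_owned_traits(current, owned, asserted):
    """Compute the trait set a provider should end up with.

    current: traits on the provider right now (from any source)
    owned: every trait compute could ever assign
    asserted: the subset of owned traits that apply right now
    """
    owned = set(owned)
    asserted = set(asserted)
    if not asserted <= owned:
        raise ValueError('compute may only assert traits it owns')
    # Traits outside the owned list belong to outside agents; never touch them.
    outside = set(current) - owned
    # Owned traits are set or unset purely according to what compute asserts.
    return outside | asserted


owned = {'HW_CPU_X86_AVX2', 'CUSTOM_LIVE_RESIZE_CAPABLE'}
current = {'CUSTOM_LIVE_RESIZE_CAPABLE', 'CUSTOM_RACK_42'}
asserted = {'HW_CPU_X86_AVX2'}
merged = merge_owned_traits(current, owned, asserted)
print(sorted(merged))
```

Here CUSTOM_LIVE_RESIZE_CAPABLE goes away because compute owns it and no longer asserts it, HW_CPU_X86_AVX2 is (re)asserted, and the operator's CUSTOM_RACK_42 is carried through untouched. The same shape would apply to aggregates, provided the "owned" set of UUIDs were recorded somewhere.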