Re: [openstack-dev] [nova][placement] update_provider_tree design updates

2018-03-16 Thread Jim Rollenhagen
>
> ...then there's no way I can know ahead of time what all those might be.
>  (In particular, if I want to support new devices without updating my
> code.)  I.e. I *can't* write the corresponding
> provider_tree.remove_trait(...) condition.  Maybe that never becomes a
> real problem because we'll never need to remove a dynamic trait.  Or
> maybe we can tolerate "leakage".  Or maybe we do something
> clever-but-ugly with namespacing (if
> trait.startswith('CUSTOM_DEV_VENDORID_')...).  We're consciously kicking
> this can down the road.
>
> And note that this "dynamic" problem is likely to be a much larger
> portion (possibly all) of the domain when we're talking about aggregates.
>
> Then there's ironic, which is currently set up to get its traits blindly
> from Inspector.  So Inspector not only needs to maintain the "owned
> traits" list (with all the same difficulties as above), but it must also
> either a) communicate that list to ironic virt so the latter can manage
> the add/remove logic; or b) own the add/remove logic and communicate the
> individual traits with a +/- on them so virt knows whether to add or
> remove them.


Just a nit: Ironic doesn't necessarily get its traits from Inspector.
Ironic gets them from *some* API client, which may be an operator, or
Inspector, or something else. Inspector is entirely optional.

Anyway, I'm inclined to kick this can down the road a bit, as you mention.
I imagine the ideal situation is for Ironic to remove traits from placement
on the fly when they are removed in Ironic. Any other traits that
nova-compute knows about (but Ironic doesn't), nova-compute can manage
removing the same way as any other virt driver does.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][placement] update_provider_tree design updates

2018-03-15 Thread Eric Fried
Excellent and astute questions, both of which came up in the discussion,
but I neglected to mention.  (I had to miss *something*, right?)

See inline.

On 03/15/2018 02:29 PM, Chris Dent wrote:
> On Thu, 15 Mar 2018, Eric Fried wrote:
> 
>> One of the takeaways from the Queens retrospective [1] was that we
>> should be summarizing discussions that happen in person/hangout/IRC/etc.
>> to the appropriate mailing list for the benefit of those who weren't
>> present (or paying attention :P ).  This is such a summary.
> 
> Thank you _very_ much for doing this. I've got two questions within.
> 
>> ...which we discussed earlier this week in IRC [4][5].  We concluded:
>>
>> - Compute is the source of truth for any and all traits it could ever
>> assign, which will be a subset of what's in os-traits, plus whatever
>> CUSTOM_ traits it stakes a claim to.  If an outside agent sets a trait
>> that's in that list, compute can legitimately remove it.  If an outside
>> agent removes a trait that's in that list, compute can reassert it.
> 
> Where does that list come from? Or more directly how does Compute
> stake the claim for "mine"?

One piece of the list should come from the traits associated with the
compute driver capabilities [2], and likewise anything else in the future
that's within compute but outside of virt.  In other words, we're
declaring that it doesn't make sense for an operator to e.g. set the
"has_imagecache" trait on a compute if the compute doesn't do that
itself.  The message: you can't turn on a capability by setting a trait.

Beyond that, each virt driver is going to be responsible for figuring
out its own list.  Thinking this through with my PowerVM hat on, it
won't actually be as hard as it initially sounded - though it will
require more careful accounting.  Essentially, the driver is going to
ask the platform questions and get responses in its own language; then
map those responses to trait names.  So we'll be writing blocks like:

    if sys_caps.can_modify_io:
        provider_tree.add_trait(nodename, "CUSTOM_LIVE_RESIZE_CAPABLE")
    else:
        provider_tree.remove_trait(nodename, "CUSTOM_LIVE_RESIZE_CAPABLE")

And, for some subset of the "owned" traits, we should be able to
maintain a dict such that this works:

    for feature, trait in trait_map.items():
        if feature in sys_features:
            provider_tree.add_trait(nodename, trait)
        else:
            provider_tree.remove_trait(nodename, trait)

BUT what about *dynamic* features?  If I have code like (don't kill me):

    vendor_id_trait = 'CUSTOM_DEV_VENDORID_' + slugify(io_device.vendor_id)
    provider_tree.add_trait(io_dev_rp, vendor_id_trait)

...then there's no way I can know ahead of time what all those might be.
 (In particular, if I want to support new devices without updating my
code.)  I.e. I *can't* write the corresponding
provider_tree.remove_trait(...) condition.  Maybe that never becomes a
real problem because we'll never need to remove a dynamic trait.  Or
maybe we can tolerate "leakage".  Or maybe we do something
clever-but-ugly with namespacing (if
trait.startswith('CUSTOM_DEV_VENDORID_')...).  We're consciously kicking
this can down the road.
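For what it's worth, here's one possible shape of that namespacing hack, with a stand-in slugify() and hypothetical helper names (none of this is settled API): compare the dynamic traits we would generate today against whatever currently sits under the reserved prefix, and treat anything stale as removable.

```python
import re

PREFIX = 'CUSTOM_DEV_VENDORID_'


def slugify(value):
    # Minimal stand-in for whatever real slugifier the driver uses:
    # uppercase, and collapse non-alphanumeric runs to underscores.
    return re.sub(r'[^A-Z0-9]+', '_', str(value).upper()).strip('_')


def reconcile_vendor_traits(current_traits, vendor_ids):
    """Return (to_add, to_remove) for one device provider.

    Anything under our reserved prefix that we no longer generate
    is presumed stale, so it's safe(ish) to remove -- at the cost
    of declaring the whole prefix off-limits to outside agents.
    """
    desired = {PREFIX + slugify(v) for v in vendor_ids}
    owned_now = {t for t in current_traits if t.startswith(PREFIX)}
    return desired - owned_now, owned_now - desired
```

The obvious downside is the one above: this only works if we reserve the entire CUSTOM_DEV_VENDORID_ namespace for the driver, which is exactly the "clever-but-ugly" part.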

And note that this "dynamic" problem is likely to be a much larger
portion (possibly all) of the domain when we're talking about aggregates.

Then there's ironic, which is currently set up to get its traits blindly
from Inspector.  So Inspector not only needs to maintain the "owned
traits" list (with all the same difficulties as above), but it must also
either a) communicate that list to ironic virt so the latter can manage
the add/remove logic; or b) own the add/remove logic and communicate the
individual traits with a +/- on them so virt knows whether to add or
remove them.
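Option (b) could be as simple as a signed-trait wire format. To make the idea concrete (this is purely a sketch, not anything Inspector or Ironic actually implements):

```python
def parse_signed_traits(entries):
    """Split entries like '+CUSTOM_FOO' / '-CUSTOM_BAR' into the sets
    of traits the virt driver should add and remove on the provider."""
    to_add, to_remove = set(), set()
    for entry in entries:
        sign, trait = entry[0], entry[1:]
        if sign == '+':
            to_add.add(trait)
        elif sign == '-':
            to_remove.add(trait)
        else:
            raise ValueError('expected +/- prefix, got %r' % entry)
    return to_add, to_remove
```

With that, virt never needs to know the full "owned traits" list; it just applies whatever deltas the owner hands it.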

> How does an outside agent know what Compute has claimed? Presumably
> they want to know that so they can avoid wastefully doing something
> that's going to get clobbered?

Yup [11].  It was deemed that we don't need an API/CLI to discover those
lists (assuming that would even be possible).  The reasoning was
two-pronged:
- We'll document that there are traits "owned" by nova and attempts to
set/unset them will be frustrated.  You can't find out which ones they
are except when a manually-set/-unset trait magically dis-/re-appears.
- It probably won't be an issue because outside agents will be setting
traits based on some specific thing they want to do, and the
documentation for that thing will specify traits that are known not to
interfere with those in nova's wheelhouse.

> [2] https://review.openstack.org/#/c/538498/
[11]
http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-03-12.log.html#t2018-03-12T16:26:29


Re: [openstack-dev] [nova][placement] update_provider_tree design updates

2018-03-15 Thread Chris Dent

On Thu, 15 Mar 2018, Eric Fried wrote:


> One of the takeaways from the Queens retrospective [1] was that we
> should be summarizing discussions that happen in person/hangout/IRC/etc.
> to the appropriate mailing list for the benefit of those who weren't
> present (or paying attention :P ).  This is such a summary.

Thank you _very_ much for doing this. I've got two questions within.

> ...which we discussed earlier this week in IRC [4][5].  We concluded:
>
> - Compute is the source of truth for any and all traits it could ever
> assign, which will be a subset of what's in os-traits, plus whatever
> CUSTOM_ traits it stakes a claim to.  If an outside agent sets a trait
> that's in that list, compute can legitimately remove it.  If an outside
> agent removes a trait that's in that list, compute can reassert it.


Where does that list come from? Or more directly how does Compute
stake the claim for "mine"?

How does an outside agent know what Compute has claimed? Presumably
they want to know that so they can avoid wastefully doing something
that's going to get clobbered?

--
Chris Dent   ٩◔̯◔۶   https://anticdent.org/
freenode: cdent tw: @anticdent


[openstack-dev] [nova][placement] update_provider_tree design updates

2018-03-15 Thread Eric Fried
One of the takeaways from the Queens retrospective [1] was that we
should be summarizing discussions that happen in person/hangout/IRC/etc.
to the appropriate mailing list for the benefit of those who weren't
present (or paying attention :P ).  This is such a summary.

As originally conceived, ComputeDriver.update_provider_tree was intended
to be the sole source of truth for traits and aggregates on resource
providers under its purview.

Then came the idea of reflecting compute driver capabilities as traits
[2], which would be done outside of update_provider_tree, but still
within the bounds of nova compute.

Then Friday discussions at the PTG [3] brought to light the fact that we
need to honor traits set by outside agents (operators, other services
like neutron, etc.), effectively merging those with whatever the virt
driver sets.  Concerns were raised about how to reconcile overlaps, and
in particular how compute (via update_provider_tree or otherwise) can
know if a trait is safe to *remove*.  At the PTG, we agreed we need to
do this, but deferred the details.

...which we discussed earlier this week in IRC [4][5].  We concluded:

- Compute is the source of truth for any and all traits it could ever
assign, which will be a subset of what's in os-traits, plus whatever
CUSTOM_ traits it stakes a claim to.  If an outside agent sets a trait
that's in that list, compute can legitimately remove it.  If an outside
agent removes a trait that's in that list, compute can reassert it.
- Anything outside of that list of compute-owned traits is fair game for
outside agents to set/unset.  Compute won't mess with those, ever.
- Compute (and update_provider_tree) will therefore need to know what
that list comprises.  Furthermore, it must take care to use merging
logic such that it only sets/unsets traits it "owns".
- To facilitate this on the compute side, ProviderTree will get new
methods to add/remove provider traits.  (Technically, it could all be
done via update_traits [6], which replaces the entire set of traits on a
provider, but then every update_provider_tree implementation would have
to write the same kind of merging logic.)
- For operators, we'll need OSC affordance for setting/unsetting
provider traits.
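The merging logic in the third bullet boils down to set arithmetic. A sketch, with illustrative names rather than the actual ProviderTree API:

```python
def merge_traits(current, owned, asserted):
    """Compute the trait set to write back to placement.

    current  -- traits placement reports on the provider today
    owned    -- every trait this compute could ever assign
    asserted -- the owned traits it currently wants set (a subset
                of `owned`)

    Traits outside `owned` are left exactly as outside agents set
    them; within `owned`, the driver's word is law.
    """
    return (current - owned) | (asserted & owned)
```

This is the logic every update_provider_tree implementation would otherwise have to hand-roll around update_traits [6], which is why ProviderTree is growing add/remove methods instead.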

And finally:
- Everything above *also* applies to provider aggregates.  NB: Here
there be tygers.  Unlike traits, the comprehensive list of which can
conceivably be known a priori (even including CUSTOM_*s), aggregate
UUIDs are by their nature unique and likely generated dynamically.
Knowing that you "own" an aggregate UUID is relatively straightforward
when you need to set it; but to know you can/must unset it, you need to
have kept a record of having set it in the first place.  A record that
persists e.g. across compute service restarts.  Can/should virt drivers
write a file?  If so, we better make sure it works across upgrades.  And
so on.  Ugh.  For the time being, we're kinda punting on this issue
until it actually becomes a problem IRL.
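To make the record-keeping concern concrete, one deliberately naive shape for such a record is a JSON file of aggregate UUIDs the driver knows it created. The path and format here are pure invention, just to illustrate what would have to survive restarts and upgrades:

```python
import json
import os

RECORD_PATH = '/var/lib/nova/owned_aggregates.json'  # hypothetical


def load_owned_aggregates(path=RECORD_PATH):
    """Aggregate UUIDs this driver set in a previous life, if any."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return set(json.load(f))


def save_owned_aggregates(uuids, path=RECORD_PATH):
    """Persist the set so a restarted compute can still unset them."""
    with open(path, 'w') as f:
        json.dump(sorted(uuids), f)
```

Even this toy version raises the questions in the paragraph above: who owns the file across upgrades, what happens if it's lost, and whether a file is the right store at all.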

And now for the moment you've all been awaiting with bated breath:
- Delta [7] to the update_provider_tree spec [8].
- Patch for ProviderTree methods to add/remove traits/aggregates [9].
- Patch modifying the update_provider_tree docstring, and adding devref
content for update_provider_tree [10].

Please feel free to email or reach out in #openstack-nova if you have
any questions.

Thanks,
efried

[1] https://etherpad.openstack.org/p/nova-queens-retrospective (L122 as
of this writing)
[2] https://review.openstack.org/#/c/538498/
[3] https://etherpad.openstack.org/p/nova-ptg-rocky (L496-502 aotw)
[4]
http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-03-12.log.html#t2018-03-12T16:02:08
[5]
http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-03-12.log.html#t2018-03-12T19:20:23
[6]
https://github.com/openstack/nova/blob/5f38500df6a8e1665b968c3e98b804e0fdfefc63/nova/compute/provider_tree.py#L494
[7] https://review.openstack.org/552122
[8]
http://specs.openstack.org/openstack/nova-specs/specs/rocky/approved/update-provider-tree.html
[9] https://review.openstack.org/553475
[10] https://review.openstack.org/553476

