Rob/Jay,

The use of the OSOps Working Group and its repos is a great way to address this. If any of you are coming to the Summit, please take a look at the Etherpad we have created. [1] This could be a great discussion topic for the working sessions, where we can brainstorm how we could help with this.

Joe

[1] https://etherpad.openstack.org/p/AUS-ops-OSOps

On Fri, Apr 22, 2016 at 4:02 PM, Robert Starmer <rob...@kumul.us> wrote:

> Maybe a result of the discussion can be a set of models (let's not go so
> far as to call them best practices yet :) for how maintenance can be done
> at scale, perhaps solidifying the descriptions Jay has above with the user
> stories Tomi described in his initial note. This seems like an achievable
> outcome from a working session, and the output even has a target: either
> scriptable workflows that could end up in the OSOps repository, or user
> stories that can be mapped to the PM working group.
>
> R
>
> On Fri, Apr 22, 2016 at 12:47 PM, Jay Pipes <jaypi...@gmail.com> wrote:
>
>> On 04/14/2016 05:14 AM, Juvonen, Tomi (Nokia - FI/Espoo) wrote:
>> <snip>
>>
>>> As admin I want to know when the host is ready for the actions to be
>>> done by the admin during the maintenance, meaning physical resources
>>> are emptied.
>>
>> You are equating "host maintenance mode" with the end result of a call to
>> `nova host-evacuate-live`. The two are not the same.
>>
>> "host maintenance mode" typically just refers to taking a Nova compute
>> node out of consideration for placing new workloads on that compute node.
>> Putting a Nova compute node into host maintenance mode is as simple as
>> calling `nova service-disable $hostname nova-compute`.
>>
>> Depending on what you need to perform on the compute node that is in host
>> maintenance mode, you *may* want to migrate the workloads from that compute
>> node to some other compute node that isn't in host maintenance mode. The
>> `nova host-evacuate $hostname` and `nova host-evacuate-live $hostname`
>> commands in the Nova CLI [1] can be used to migrate or live-migrate all
>> workloads off the target compute node.
>>
>> Live migration will reduce the disruption that tenant workloads (data
>> plane) experience during the workload migration.
>> However, research at Mirantis has shown that libvirt/KVM/QEMU live
>> migration performed against workloads with even a medium rate of memory
>> page dirtying can easily never complete. Solutions like auto-converge and
>> xbzrle compression have minimal effect on this, unfortunately. Pausing a
>> workload manually is typically what is done to force the live migration
>> to complete.
>>
>> [1] Note that these are commands in the Nova CLI tool
>> (python-novaclient). Neither a host-evacuate nor a host-evacuate-live REST
>> API call exists in the Compute API. This fact alone should suggest to folks
>> that the appropriate place to put logic associated with performing host
>> maintenance tasks should be *outside* of Nova entirely...
>>
>>> As owner of a server I want to prepare for maintenance to minimize
>>> downtime, keep capacity on needed level and switch HA service to server
>>> not affected by maintenance.
>>
>> This isn't an appropriate use case, IMHO. HA control planes should, by
>> their very nature, be established across various failure domains. The whole
>> *point* of having an HA service is so that you don't need to "prepare" for
>> some maintenance event (planned or unplanned).
>>
>> All HA control planes worth their salt will be able to notify some
>> external listener of a partition in the cluster. This HA control plane is
>> the responsibility of the tenant, not the infrastructure (i.e. Nova). I
>> really do not want to add coupling between infrastructure control plane
>> services and tenant control plane services.
>>
>>> As owner of a server I want to know when my servers will be down because
>>> of host maintenance as it might be servers are not moved to another host.
>>
>> See above. As an owner of a server involved in an HA cluster, it is *the
>> server owner's* responsibility to set things up so that the cluster
>> rebalances, handles redirected load, or does the custom thing that they
>> want.
>> This isn't, IMHO, the domain of the NFVi but rather a much
>> higher-level NFVO orchestration layer.
>>
>>> As owner of a server I want to know if host is to be totally removed, so
>>> instead of keeping my servers on host during maintenance, I want to move
>>> them to somewhere else.
>>
>> This isn't something the owner of a server even knows about in a cloud
>> environment. Owners of a server don't (and shouldn't) know which compute
>> node they are on, nor should they know that a host is having a planned or
>> unplanned host maintenance event.
>>
>> The infrastructure owner (cloud deployer/operator) is responsible for
>> doing the needful and performing a [live] migration of workloads off of a
>> failing host or a host that is undergoing a cold upgrade. The tenant
>> doesn't know anything about these things, and shouldn't.
>>
>>> As owner of a server I want to send acknowledgement to be ready for host
>>> maintenance and I want to state if servers are to be moved or kept on
>>> host.
>>
>> This is describing some virtual inventory management or CMDB
>> functionality that isn't in scope for infrastructure services like Nova.
>> Perhaps it's worth looking into how something like Remedy can manage your
>> virtual inventory in this manner, but I don't see this being in the
>> OpenStack realm really...
>>
>> FWIW, this is the same objection I had to Tacker joining the OpenStack
>> Big Tent. It is essentially a monolithic, purpose-built-for-Telco
>> application that orchestrates VNFs at layers way above the OpenStack
>> deployment.
>>
>> Best,
>> -jay
>>
>>> Removal and creating of server is in owner's control already. Optionally
>>> server configuration data could hold information about automatic actions
>>> to be done when host is going down unexpectedly or in controlled manner.
>>> Also actions at the same if down permanently or only temporarily.
>>> Still, this needs acknowledgement from the server owner, as he needs
>>> time for an application-level controlled HA service switchover.
>>>
>>> Br,
>>> Tomi
>>>
>>> _______________________________________________
>>> OpenStack-operators mailing list
>>> OpenStack-operators@lists.openstack.org
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
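
As a starting point for the kind of scriptable workflow Rob mentions, here is a minimal sketch of the two steps Jay describes in the quoted thread: disable the compute service, then optionally move workloads off the node. Only the three `nova` subcommands come from the thread. The function names `maintenance_plan` and `run_plan`, their parameters, and the hostname `compute-01` are hypothetical, and the script defaults to a dry run so nothing is executed unless asked.

```python
import shlex
import subprocess
from typing import List


def maintenance_plan(hostname: str, evacuate: bool = True,
                     live: bool = True) -> List[List[str]]:
    """Build the nova CLI calls for taking one compute node out of service.

    Function and parameter names are hypothetical; only the nova
    subcommands themselves are taken from the thread.
    """
    # Step 1: host maintenance mode, i.e. stop placing new workloads here.
    plan = [["nova", "service-disable", hostname, "nova-compute"]]
    if evacuate:
        # Step 2 (optional): move existing workloads off the node.
        subcommand = "host-evacuate-live" if live else "host-evacuate"
        plan.append(["nova", subcommand, hostname])
    return plan


def run_plan(plan: List[List[str]], dry_run: bool = True) -> List[str]:
    """Return each command as a shell-quoted string; execute unless dry_run."""
    rendered = []
    for argv in plan:
        rendered.append(" ".join(shlex.quote(arg) for arg in argv))
        if not dry_run:
            # Needs python-novaclient installed and credentials in the environment.
            subprocess.run(argv, check=True)
    return rendered


if __name__ == "__main__":
    # "compute-01" is a placeholder hostname.
    for line in run_plan(maintenance_plan("compute-01")):
        print(line)
```

Building the command list separately from running it keeps the logic outside of Nova, as Jay suggests, and lets an operator review the plan before anything touches the cloud.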