On Tue, Mar 15, 2016 at 4:04 AM Roman Prykhodchenko <m...@romcheg.me> wrote:
> Fuelers,
>
> I would like to continue the series of "Getting rid of …" emails. This
> time I’d like to talk about statuses of clusters.
>
> The issue with that attribute is that it is not actually related to the
> real world very much and represents nothing. A few months ago I proposed
> to make it more real-world-like [1] by replacing a simple string with an
> aggregated value. However, after task-based deployment was introduced,
> even that approach lost its connection to the real world.
>
> My idea is to get rid of that attribute on the cluster and start working
> with the status of every single node in it. Nevertheless, we only have
> tasks that are executed on nodes now, so we cannot apply the "status"
> term to them. What if we replace that with a sort of boolean value called
> maintenance_mode (or similar) that we will use to tell whether the node
> is operational or not? After that we will be able to use an aggregated
> property for the cluster and check whether there are any nodes that still
> have tasks in progress on them.

Yes, we still need an operations attribute. I'm not sure a bool is enough, but you are quite correct: setting the status of the cluster, after operational == True, based on the result of a specific node failing is in practice invalid. At the same time, operational == True does not necessarily mean the deployment succeeded; it is more along the lines of "deployment validated", which may mean further tests passing, like OSTF, or the operator wanting to do more testing of their own prior to changing the state.

As we venture into the LCM flow, we actually need the status of each component, in addition to the general status of the cluster, to determine the proper course of action for the next operation.

For example, nova-compute: if the cluster is not operational, then we can provision compute nodes and have them enabled, i.e. active in the scheduler, automatically.
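To make the aggregated-property idea concrete, here is a minimal sketch; the names (Node, maintenance_mode, cluster_busy) are illustrative only, not Fuel's actual models or API. The cluster-level view is derived from per-node flags instead of being stored as a separate status string:

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Illustrative stand-in for a Fuel node record."""
    name: str
    maintenance_mode: bool  # True while tasks are still running on the node


def cluster_busy(nodes):
    """Derived cluster property: busy if any node is under maintenance."""
    return any(n.maintenance_mode for n in nodes)


nodes = [Node("node-1", False), Node("node-2", True)]
print(cluster_busy(nodes))  # True: node-2 still has tasks in progress
```

The point of deriving the value rather than storing it is that it can never drift out of sync with the nodes, which is exactly the failure mode described above.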
However, if the cluster is operational, a new compute node must be disabled, or otherwise blocked from the default scheduler, until the node has been validated. In this case the interpretation of "operational" is quite simple.

For example, Ceph: here we care less about the status of the cluster (slightly; this example ignores Ceph's impact on nova-compute) and more about the status of the service. In the case that we deploy ceph-osds while there are fewer than replica-factor OSD hosts online (3), we can provision the OSDs similarly to nova-compute, in that we can bring them all online and active, and data could be placed on them immediately (more or less). But if the Ceph status is operational, then we have to take a different action: the OSDs have to be brought in disabled and gradually (probably by the operator) have their data weight increased, so they don't clog the network with peering traffic, which causes the clients many woes.

> Thoughts, ideas?
>
> References:
>
> 1. https://blueprints.launchpad.net/fuel/+spec/complex-cluster-status
>
> - romcheg

-- 
Andrew Woodward
Mirantis
Fuel Community Ambassador
Ceph Community
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev