I don't think we should be flipping states for instances on a potentially downed compute node. We definitely should not set an instance to ERROR. I think a timestamp for the last power state check would be nice, and probably good enough.
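To make that concrete, here is a rough sketch of the idea, not anything Nova does today. The names `InstanceView`, `last_power_state_check`, `describe` and the 10-minute threshold are all made up for illustration. The point is only that the recorded state is left alone and the age of the last check is shown next to it:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class InstanceView:
    """Hypothetical read-only view of an instance; not a Nova object."""
    name: str
    vm_state: str                     # last state recorded; never flipped here
    last_power_state_check: datetime  # when that state was last confirmed


def describe(instance: InstanceView, now: datetime,
             stale_after: timedelta = timedelta(minutes=10)) -> str:
    """Return the recorded state plus how old the last check is."""
    age = now - instance.last_power_state_check
    # Assumed threshold: older than stale_after counts as stale.
    note = "stale" if age > stale_after else "fresh"
    return (f"{instance.name}: {instance.vm_state} "
            f"(last checked {int(age.total_seconds())}s ago, {note})")


now = datetime(2014, 6, 24, 17, 30, tzinfo=timezone.utc)
vm = InstanceView("web-1", "ACTIVE", now - timedelta(minutes=25))
print(describe(vm, now))
```

The instance still reads ACTIVE; the user just gets to see that nobody has confirmed it for a while and can decide what to do with that.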
- Chris

> On Jun 24, 2014, at 5:17 PM, Joe Gordon <[email protected]> wrote:
>
>> On Tue, Jun 24, 2014 at 5:12 PM, Joe Gordon <[email protected]> wrote:
>>
>>> On Tue, Jun 24, 2014 at 4:16 PM, Ahmed RAHAL <[email protected]> wrote:
>>> On 2014-06-24 17:38, Joe Gordon wrote:
>>>>
>>>> On Jun 24, 2014 2:31 PM, "Russell Bryant" <[email protected]> wrote:
>>>
>>>> > There be dragons here. Just because Nova doesn't see the node reporting
>>>> > in, doesn't mean the VMs aren't actually still running. I think this
>>>> > needs to be left to logic outside of Nova.
>>>> >
>>>> > For example, if your deployment monitoring really does think the host is
>>>> > down, you want to make sure it's *completely* dead before taking further
>>>> > action such as evacuating the host. You certainly don't want to risk
>>>> > having the VM running on two different hosts. This is just a business I
>>>> > don't think Nova should be getting into.
>>>>
>>>> I agree nova shouldn't take any actions. But I don't think leaving an
>>>> instance as 'active' is right either. I was thinking of moving the instance
>>>> to an error state (maybe an unknown state would be more accurate) and letting
>>>> the user deal with it, versus just letting the user deal with everything.
>>>> Since nova knows something *may* be wrong, shouldn't we convey that to
>>>> the user? (I'm not 100% sure we should, myself.)
>>>
>>> I have seen compute nodes go down from a management perspective (say,
>>> nova-compute disappeared) while the VMs were just fine. Reporting on the
>>> state may be misleading. The 'unknown' state would fit, but nothing lets us
>>> presume the VMs are non-functional or impacted.
>>
>> Nothing lets us presume the opposite either. We don't know if the instance
>> is still up.
>>
>>>
>>> As far as an operator is concerned, a compute node not responding is
>>> reason enough to check the situation.
>>>
>>> To go further on other comments related to customer feedback, there are
>>> many reasons a customer may think his VM is down, so showing him 'useful
>>> information' in some cases will only trigger more anxiety.
>>> Besides, people will start hammering the API to check 'state' instead of
>>> using proper monitoring.
>>> But state is already reported if the customer shuts down a VM, so ...
>>>
>>> Currently, compute node state reporting is done by the nova-compute
>>> process itself, reporting back with a timestamp to the database (through
>>> conductor, if I recall correctly). It's more like a watchdog than a
>>> reporting system.
>>> For VMs (assuming we find it useful) the same kind of process could occur:
>>> nova-compute reporting back all states with timestamps for all the VMs it
>>> hosts. This should then be optional, as I already sense scaling/performance
>>> issues here (ceilometer, anyone?).
>>>
>>> Finally, assuming the customer had access to this 'unknown' state
>>> information, what would he be able to do with it? Usually he has no lever
>>> to 'evacuate' or 'recover' the VM. All he could do is spawn another
>>> instance to replace the lost one. But only if the VM really is currently
>>> unavailable, which is information he must get from other sources.
>>
>> If I was a user, and my instance went to an 'UNKNOWN' state, I would check
>> if it's still operating, and if not delete it and start another instance.
>
> The alternative is how things work today: if a nova-compute goes down we
> don't change any instance states, and the user is responsible for making sure
> their instance is still operating even if the instance is set to ACTIVE.
>
>>
>>>
>>> So, I see how the state reporting could be useful information, but am not
>>> sure that nova Status is the right place for it.
>>>
>>> Ahmed.
>>>
>>> _______________________________________________
>>> OpenStack-dev mailing list
>>> [email protected]
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
