On Thu, Oct 16, 2014 at 7:03 PM, Adam Lawson wrote:
>
> Be forewarned; here's my two cents before I've had my morning coffee.
>
> It would seem to me that if we were seeking some level of resiliency against
> host failures (if a host fails, evacuate the instances that were hosted on it
> to a host that isn't broken), host HA is a good approach.
> The ultimate goal of course is instance HA, but the task of monitoring
> individual instances and determining what constitutes "down" seems like a
> much more complex task than detecting when a compute node is down. I know
> that requiring the presence of agents probably needs some more
> brain-cycles, since we can't expect additional bytes consuming memory on each
> individual VM.
What Russell is suggesting, though, is actually a very feasible approach for compute node HA today and per-instance HA tomorrow.

> Additionally, I'm not really hung up on the 'how', as we all realize there are
> several ways to skin that cat, so long as that 'how' is leveraged via tools
> over which we have control and direct influence. Reason being, we may not
> want to leverage features as important as this on tools that change outside
> our control and subsequently shift the foundation of the feature we
> implemented that was based on how the product USED to work. Basically, if
> Pacemaker does what we need then cool, but it seems that a feature
> should be built upon a bedrock of programs over which we have a
> direct influence.

That almost sounds a bit like "let's always build a better wheel, because control". I'm not sure if that's indeed the intention, but if it is, then that seems like a bad idea to me.

> This is why Nagios may be able to do it but it's a hack at best. I'm not
> saying Nagios isn't good or the hack doesn't work, but in the context of an
> OpenStack solution, we can't require a single external tool for a feature
> like host or VM HA. Are we suggesting that we tell people who want HA, "go
> use Nagios"? Call me a purist, but if we're going to implement a feature, it
> should be our community implementing it, because we have some of the best
> minds on staff. ; )

Anyone who thinks that having a monitoring solution page people, so that a human wakes up and restarts the service, constitutes HA needs to be doused in a bucket of ice water. :)

Cheers,
Florian
