On Thu, Oct 16, 2014 at 7:03 PM, Adam Lawson <alaw...@aqorn.com> wrote:
> Be forewarned; here's my two cents before I've had my morning coffee.
> It would seem to me that if we were seeking some level of resiliency against 
> host failures (if a host fails, evacuate the instances that were hosted on it 
> to a host that isn't broken), it would seem that host HA is a good approach. 
> The ultimate goal of course is instance HA but the task of monitoring 
> individual instances and determining what constitutes "down" seems like a 
> much more complex task than detecting when a compute node is down. I know 
> that requiring the presence of agents should probably need some more 
> brain-cycles since we can't expect additional bytes consuming memory on each 
> individual VM.

What Russell is suggesting, though, is actually a very feasible
approach for compute node HA today and per-instance HA tomorrow.

> Additionally, I'm not really hung up on the 'how' as we all realize there
> are several ways to skin that cat, so long as that 'how' is leveraged via tools
> over which we have control and direct influence. Reason being, we may not 
> want to leverage features as important as this on tools that change outside 
> our control and subsequently shift the foundation of the feature we
> implemented that was based on how the product USED to work. Basically if 
> Pacemaker does what we need then cool but it seems that implementing a 
> feature should be built upon a bedrock of programs over which we have a 
> direct influence.

That almost sounds a bit like "let's always build a better wheel,
because control". I'm not sure if that's indeed the intention, but if
it is then that seems like a bad idea to me.

> This is why Nagios may be able to do it but it's a hack at best. I'm not 
> saying Nagios isn't good or the hack doesn't work but in the context of an
> OpenStack solution, we can't require a single external tool for a feature
> like host or VM HA. Are we suggesting that we tell people who want HA - "go 
> use Nagios"? Call me a purist but if we're going to implement a feature, it 
> should be our community implementing it because we have some of the best 
> minds on staff. ; )

Anyone who thinks that having a monitoring solution to page people and
then waking up a human to restart the service constitutes HA needs to
be doused in a bucket of ice water. :)


OpenStack-dev mailing list