On Thu, Dec 6, 2012 at 9:24 PM, Lars Marowsky-Bree <l...@suse.com> wrote:
> On 2012-12-06T20:04:20, Andrew Beekhof <and...@beekhof.net> wrote:
>
>> >> Does that make sense though?
>> >> You've not achieved anything a restart wouldn't have done.
>> >> The choice to move the VM should be up to the VM.
>> > If the fail-count of a nagios resource reaches its own
>> > migration-threshold, the colocated VM should migrate with it anyway,
>> > shouldn't it?
>>
>> But moving a nagios resource makes no sense.
>
> Exactly; we would want to move the container/parent.
>
>> Because its running inside the guest, which would have already moved
>> if it was the right thing to do.
>
> No, that's not a given. The VM might be "healthy" (as in, the kernel is
> running), but a service being monitored within it may not have
> sufficient resources/CPU/IO/network or even connectivity problems on a
> given host, to the point where trying to restart it on another
> hypervisor makes sense.

But any failures of the nagios agents would count against the VM's
migration-threshold.
So if moving were the right thing to do, it would have done it already.

>
> But migration-threshold on the nagios primitive combined with a
> mandatory colocation constraint will take care of that already, if an
> admin wants to configure such.
>
> I agree that, for the most part, people will not do that but keep
> restarting VMs.
>
>> > I like the concept of "failure-delegate". If we introduce it, it sounds
>> > more like a resource's meta/op attribute to me, rather than into order
>> > constraint or group. What do you think?
>> Yes. It would be a resource meta attribute.
>
> Hmmm. OK, I think I see where this is going.
>
> We already have on-fail settings. How would these play together?

Good question.
My initial thought was that it would be up to on-fail settings in the VM.

> Would it even make sense to have on-fail="restart-container"? (Or a
> nicer wording.)
>
> Hmmm. That might work. We allow a "container" to be specified as a meta
> attribute.
>
> If set, on-fail would default to restart container for most actions. But
> admins could actually modify it - say, they might want to set
> monitor on-fail="ignore" to just get notified. And when we move forward
> to whiteboxes, we could have start/monitor/promote/demote
> on-fail="restart" (like now) and stop on-fail="restart-container".
>
> That appears reasonably neat?

It does actually.
I wasn't originally thinking it was necessary but it makes sense now
that you point it out.

>
> Regards,
> Lars
>
> --
> Architect Storage/HA
> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer,
> HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes."
> -- Oscar Wilde
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
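
To make the on-fail proposal from this thread concrete, here is a rough sketch of how the defaults might resolve. It only models the idea as discussed (a "container" meta attribute changes the default on-fail, and an explicit admin setting still wins). The function name, its parameters, and the `whitebox` flag are hypothetical and are not Pacemaker code; the thread says "most actions" without listing exceptions, so the sketch assumes the container default applies to every action.

```python
from typing import Optional


def effective_on_fail(action: str,
                      container: Optional[str] = None,
                      configured: Optional[str] = None,
                      whitebox: bool = False) -> Optional[str]:
    """Hypothetical model of the proposed on-fail defaults.

    action     -- operation name, e.g. "start", "stop", "monitor"
    container  -- value of the resource's "container" meta attribute, if any
    configured -- on-fail value the admin set explicitly, if any
    whitebox   -- assumed flag for the future "whitebox" case in the thread
    """
    # An explicit admin setting always wins, e.g. monitor on-fail="ignore".
    if configured is not None:
        return configured

    # No container: the existing cluster defaults apply (not modeled here).
    if container is None:
        return None

    if whitebox:
        # Whitebox idea from the thread: restart the resource itself for
        # start/monitor/promote/demote, but restart the container if a
        # stop fails.
        return "restart-container" if action == "stop" else "restart"

    # Assumption: "most actions" is treated as "all actions" in this sketch.
    return "restart-container"


if __name__ == "__main__":
    for act in ("start", "monitor", "stop"):
        print(act, effective_on_fail(act, container="vm1"))
```

The explicit setting is checked first so that an admin who only wants notification can set monitor on-fail="ignore" without touching the container default.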