On Wed, Dec 12, 2012 at 4:53 AM, David Vossel <dvos...@redhat.com> wrote:
> ----- Original Message -----
>> From: "Yan Gao" <y...@suse.com>
>> To: pacemaker@oss.clusterlabs.org
>> Sent: Tuesday, December 11, 2012 1:23:03 AM
>> Subject: Re: [Pacemaker] Enable remote monitoring
>>
>> Hi,
>> Here's the latest code:
>> https://github.com/gao-yan/pacemaker/commit/4d58026c2171c42385c85162a0656c44b37fa7e8
>>
>> Now:
>> - container-type:
>>   * black - ordering, colocating
>>   * white - ordering
>> Neither of them is probed so far.
>
> I think for the sake of this implementation we should ignore the whitebox use
> case for now. There are aspects of the whitebox use case that I'm just not
> sure about yet, and I don't want to hold you all up trying to define that. I
> don't mind re-approaching this container concept and expanding it to the
> whitebox use case later on, building on what you have here. I'm in favor of
> removing the "container-type" and letting the blackbox use case be the default
> for now, and I'll go in and do our whitebox bits later. It feels like we are
> at least headed in the right direction with all of this now.
>
>> - on-fail defaults to "restart-container" for most actions,
>> except for the stop op (Not sure what it means if a stop fails. A nagios
>> daemon cannot be terminated? Should it always return success?),
>
> A nagios "stop" action should always return success. The nagios agent
> doesn't even need a stop function; the lrmd can know to treat a "stop" as a
> (no-op for stop) + (cancel all recurring actions).
The lrmd shouldn't need to do this, IIRC. The crmd will request that all
recurring ops be canceled before firing off the stop action.

> In this case, if the nagios agent doesn't stop successfully, it is because of
> an lrmd failure, which should result in a fencing action, I'd imagine.
>
>> still
>> defaults to "fence" for it for now.
>>
>> - Failures of resources count against the container's
>> migration-threshold.
>
> What happens if someone wants to clear the container's failcount? Do we need
> to add some logic to go in and clear all the child resources' failures as
> well to make this happen correctly?
>
> -- Vossel
>
>> - Also support grouping a container with its resources.
>>
>> Please help take a look, and correct me if I missed anything after
>> the tons of discussions. :-)
>>
>> Regards,
>> Gao,Yan
>> --
>> Gao,Yan <y...@suse.com>
>> Software Engineer
>> China Server Team, SUSE.
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started:
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org