Re: [Pacemaker] Enable remote monitoring

Andrew Beekhof Tue, 11 Dec 2012 18:37:21 -0800

On Wed, Dec 12, 2012 at 4:53 AM, David Vossel <dvos...@redhat.com> wrote:
> ----- Original Message -----
>> From: "Yan Gao" <y...@suse.com>
>> To: pacemaker@oss.clusterlabs.org
>> Sent: Tuesday, December 11, 2012 1:23:03 AM
>> Subject: Re: [Pacemaker] Enable remote monitoring
>>
>> Hi,
>> Here's the latest code:
>> https://github.com/gao-yan/pacemaker/commit/4d58026c2171c42385c85162a0656c44b37fa7e8
>>
>>
>> Now:
>> - container-type:
>>   * black - ordering, colocating
>>   * white - ordering
>>   Neither of them is probed so far.
>
> I think for the sake of this implementation we should ignore the whitebox use 
> case for now.  There are aspects of the whitebox use case that I'm just not 
> sure about yet, and I don't want to hold you all up trying to define that. I 
> don't mind re-approaching this container concept and expanding it to the 
> whitebox use case later on, building on what you have here.  I'm in favor of 
> removing "container-type" and letting the blackbox use case be the default 
> for now, and I'll go in and do our whitebox bits later.  It feels like we are 
> at least headed in the right direction with all of this now.
>
>>
>> - on-fail defaults to "restart-container" for most actions,
>>   except for the stop op (Not sure what it means if a stop fails. A nagios
>>   daemon cannot be terminated? Should it always return success?),
>
> A nagios "stop" action should always return success.  The nagios agent 
> doesn't even need a stop function; the lrmd can know to treat a "stop" as a 
> (no-op for stop) + (cancel all recurring actions).


The lrmd shouldn't need to do this iirc.
The crmd will request all recurring ops be canceled before firing off
the stop action.
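
A rough sketch of that ordering, for illustration only. The Executor and
Controller classes and their methods are hypothetical stand-ins for the lrmd
and crmd, not the real interfaces; the only thing taken from the real world is
that OCF success is return code 0.

```python
# Illustrative model only: class and method names are hypothetical,
# not Pacemaker's actual crmd/lrmd interfaces.

OCF_SUCCESS = 0  # OCF return code for success


class Executor:
    """Stand-in for the lrmd: tracks recurring ops and runs actions."""

    def __init__(self):
        self.recurring = set()  # entries are (resource, op, interval_ms)
        self.log = []

    def schedule(self, rsc, op, interval_ms):
        self.recurring.add((rsc, op, interval_ms))
        self.log.append(f"schedule {rsc} {op} {interval_ms}")

    def cancel(self, rsc, op, interval_ms):
        self.recurring.discard((rsc, op, interval_ms))
        self.log.append(f"cancel {rsc} {op} {interval_ms}")

    def execute(self, rsc, op):
        self.log.append(f"exec {rsc} {op}")
        # Assumption: for a nagios-style check there is no daemon to
        # terminate, so every action here (including "stop") reports success.
        return OCF_SUCCESS


class Controller:
    """Stand-in for the crmd: cancels recurring ops, then requests the stop."""

    def __init__(self, executor):
        self.executor = executor

    def stop_resource(self, rsc):
        # Cancel every recurring op for this resource first ...
        for key in sorted(k for k in self.executor.recurring if k[0] == rsc):
            self.executor.cancel(*key)
        # ... and only then fire off the stop action.
        return self.executor.execute(rsc, "stop")
```

With this split the executor needs no special-casing of "stop": by the time
the stop arrives, nothing recurring is left for that resource.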

> In this case, if the nagios agent doesn't stop successfully, it is because of 
> an lrmd failure, which should result in a fencing action, I'd imagine.
>
>> still defaults to "fence" for it for now.
>>
>> - Failures of resources count against container's
>> migration-threshold.
>
> What happens if someone wants to clear the container's failcount? Do we need 
> to add some logic to go in and clear all the child resources' failures as 
> well to make this happen correctly?
>
> -- Vossel
>
>> - Also support grouping container with its resources.
>>
>> Please help take a look, and correct me if I missed anything after the
>> tons of discussions. :-)
>>
>> Regards,
>>   Gao,Yan
>> --
>> Gao,Yan <y...@suse.com>
>> Software Engineer
>> China Server Team, SUSE.
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started:
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>

