----- Original Message -----
> From: "Yan Gao" <y...@suse.com>
> To: pacemaker@oss.clusterlabs.org
> Sent: Monday, January 21, 2013 11:28:40 PM
> Subject: Re: [Pacemaker] Enable remote monitoring
> 
> Hi,
> Here's the code for supporting nagios plugins in lrmd:
> 
> https://github.com/gao-yan/pacemaker/commits/nagios
> 
> A new resource class "nagios" is introduced.
> 
> Actions:
> 
> - probe: A resource defined for a resource container is not probed. (We
> can also add a condition in pengine to just avoid probing a nagios class
> resource.)

Yeah, I think the pengine should know never to probe a nagios script, 
regardless of whether it is in a container or not.

> - start: Invokes the nagios plugin with specified parameters (Maps the
> instance attributes to the long options of the nagios plugin). If it
> returns non-OK, re-invokes it after some delay (delay = start_timeout /
> 10), until it returns OK or exceeds the start timeout.

I made a comment about this on the patch.  Shouldn't the cmd->timeout value be 
updated each time it is re-scheduled to account for time already spent?
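
To make the concern concrete, here is a rough sketch in Python, not the lrmd C 
code. The function name and the millisecond units are made up for illustration; 
only the delay = start_timeout / 10 rule comes from the description above. The 
point is that each re-scheduled attempt should get only what is left of the 
original start timeout:

```python
def next_attempt(start_timeout_ms, elapsed_ms):
    """Return (delay_ms, remaining_timeout_ms) for the next retry,
    or None when the start timeout is used up.

    Hypothetical helper for illustration; not taken from the patch.
    """
    # Retry delay as described in the thread: start_timeout / 10.
    delay_ms = start_timeout_ms // 10
    # Budget left for the next attempt after time already spent and the delay.
    remaining_ms = start_timeout_ms - elapsed_ms - delay_ms
    if remaining_ms <= 0:
        return None  # no budget left for another attempt
    return delay_ms, remaining_ms
```

With a 20s start timeout and 5s already spent, the next attempt would be 
scheduled after 2s and allowed at most 13s, rather than a fresh 20s.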

> 
> - monitor: Recurring invocation of the nagios plugin with specified
> parameters.
> 
> - stop: Nothing special is done. The recurring monitor is canceled
> anyway.
> 
> - metadata: Reads the corresponding metadata from an XML file in
> NAGIOS_METADATA_DIR.
> 
> (As we know, nagios plugins don't support metadata. The current plan is
> to generate the corresponding metadata according to the help of the
> plugins, and put them into NAGIOS_METADATA_DIR for use -- Dejan already
> has progress on this. Thanks, Dejan!)
> 
> 
> For nagios plugins, the exit codes are:
> 
> STATE_OK        = 0,
> STATE_WARNING   = 1,
> STATE_CRITICAL  = 2,
> STATE_UNKNOWN   = 3,
> STATE_DEPENDENT = 4,
> 
> AFAICS, STATE_OK should map to PCMK_EXECRA_OK, and the others should all
> belong to PCMK_EXECRA_UNKNOWN_ERROR. Well, apparently, there's no code
> to express "NOT_RUNNING" in nagios plugins. I think it should be fine,
> since there's no probe.
> 
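
For reference, the mapping proposed above amounts to something like the 
following. This is a Python sketch, not the actual code; the function name is 
hypothetical, and the PCMK_EXECRA_* codes are returned by name only so that no 
numeric values are assumed:

```python
# Nagios plugin exit codes, as listed in the quoted message.
STATE_OK = 0
STATE_WARNING = 1
STATE_CRITICAL = 2
STATE_UNKNOWN = 3
STATE_DEPENDENT = 4


def nagios_rc_to_execra(rc):
    """Map a nagios plugin exit code to the name of a PCMK_EXECRA_* code.

    Hypothetical helper: only STATE_OK counts as success, everything else
    (including codes outside the list above) is treated as an unknown error.
    """
    if rc == STATE_OK:
        return "PCMK_EXECRA_OK"
    return "PCMK_EXECRA_UNKNOWN_ERROR"
```
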
> Any suggestions are appreciated!

This mostly looks like what I expected.  I'm letting the whole re-scheduling of 
the start operation roll around in my head a bit.  It almost seems like that 
functionality belongs in the service library...  retry executing this action 
until either the timeout is hit or some target return code is encountered.  Any 
thoughts on that?
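
Something along these lines is what I have in mind, sketched in Python just to 
show the shape. Nothing here is an existing services library API; the name 
retry_until and all of its parameters are invented, and the clock and sleep 
arguments exist only so the loop can be exercised without really waiting:

```python
import time


def retry_until(action, target_rc, timeout_s, delay_s,
                clock=time.monotonic, sleep=time.sleep):
    """Run action() until it returns target_rc or timeout_s has elapsed.

    Returns the last return code seen. Hypothetical helper, not an
    existing API.
    """
    deadline = clock() + timeout_s
    while True:
        rc = action()
        if rc == target_rc:
            return rc
        # Give up if waiting delay_s would reach or pass the deadline.
        if clock() + delay_s >= deadline:
            return rc
        sleep(delay_s)
```

A caller such as the nagios start operation would then pass the plugin 
invocation as action, the OK code as target_rc, and start_timeout / 10 as 
delay_s, and the retry policy would live in one place.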

-- Vossel

> Thanks,
>   Gao,Yan
> 
> --
> Gao,Yan <y...@suse.com>
> Software Engineer
> China Server Team, SUSE.
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 
