----- Original Message -----
> From: "Andrew Beekhof" <and...@beekhof.net>
> To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
> Sent: Tuesday, February 5, 2013 2:29:11 AM
> Subject: Re: [Pacemaker] Enable remote monitoring
>
> On Fri, Feb 1, 2013 at 3:37 PM, Gao,Yan <y...@suse.com> wrote:
> > Hi Andrew,
> >
> > On 01/31/13 14:35, Andrew Beekhof wrote:
> >>
> >> On 24/01/2013, at 3:36 AM, David Vossel <dvos...@redhat.com> wrote:
> >>
> >>>
> >>>
> >>> ----- Original Message -----
> >>>> From: "Yan Gao" <y...@suse.com>
> >>>> To: pacemaker@oss.clusterlabs.org
> >>>> Sent: Monday, January 21, 2013 11:28:40 PM
> >>>> Subject: Re: [Pacemaker] Enable remote monitoring
> >>>>
> >>>> Hi,
> >>>> Here's the code for supporting nagios plugins in lrmd:
> >>>>
> >>>> https://github.com/gao-yan/pacemaker/commits/nagios
> >>>>
> >>>> A new resource class "nagios" is introduced.
> >>>>
> >>>> Actions:
> >>>>
> >>>> - probe: A resource defined for a resource container is not
> >>>>   probed. (We can also add a condition in pengine to just avoid
> >>>>   probing a nagios class resource.)
> >>>
> >>> Yeah, I think the pengine should know to never probe a nagios
> >>> script, regardless of whether it is involved in a container or not.
> >>>
> >>>> - start: Invokes the nagios plugin with the specified parameters
> >>>>   (maps the instance attributes to the long options of the nagios
> >>>>   plugin). If it returns non-OK, re-invokes it after some delay
> >>>>   (delay = start_timeout / 10), until it returns OK or exceeds the
> >>>>   start timeout.
> >>>
> >>> I made a comment about this on the patch. Shouldn't the
> >>> cmd->timeout value be updated each time it is re-scheduled to
> >>> account for time already spent?
> >>>
> >>>>
> >>>> - monitor: Recurring invocation of the nagios plugin with the
> >>>>   specified parameters.
> >>>>
> >>>> - stop: Nothing special is done. The recurring monitor is
> >>>>   canceled anyway.
> >>>>
> >>>> - metadata: Reads the corresponding metadata from an XML file in
> >>>>   NAGIOS_METADATA_DIR.
> >>>>
> >>>> (As we know, nagios plugins don't support metadata. The current
> >>>> plan is to generate the corresponding metadata from the help
> >>>> output of the plugins and put it into NAGIOS_METADATA_DIR for
> >>>> use. Dejan already has progress on this. Thanks, Dejan!)
> >>>>
> >>>>
> >>>> For nagios plugins, the exit codes are:
> >>>>
> >>>> STATE_OK = 0,
> >>>> STATE_WARNING = 1,
> >>>> STATE_CRITICAL = 2,
> >>>> STATE_UNKNOWN = 3,
> >>>> STATE_DEPENDENT = 4,
> >>>>
> >>>> AFAICS, STATE_OK should map to PCMK_EXECRA_OK, and the others
> >>>> should all map to PCMK_EXECRA_UNKNOWN_ERROR. Well, apparently
> >>>> there's no code to express "NOT_RUNNING" in nagios plugins. I
> >>>> think that should be fine, since there's no probe.
> >>>>
> >>>> Any suggestions are appreciated!
> >>>
> >>> This mostly looks like what I expected. I'm letting the whole
> >>> re-scheduling of the start operation roll around in my head a
> >>> bit. It almost seems like that functionality belongs in the
> >>> service library... retry executing this action until either the
> >>> timeout is hit or some target return code is encountered. Any
> >>> thoughts on that?
> >>
> >> Who the what now?
> >> Why do start ops need to be rescheduled?
> > It's very likely that the "start" of the container returns before
> > the services inside are started. Abusing start-delay is not
> > preferred. The idea is, in the start operation of the nagios
> > resource, to repeatedly monitor the service until it returns OK or
> > exceeds the start timeout.
>
> I thought both stop and start were a no-op and only monitor did
> anything?
> Did we move on from that (I can see why we might, my memory is just a
> little hazy on the subject)?
Start is the first monitor. This gives us the distinction between start fail (never worked) and monitor fail (something went wrong afterwards).

-- Vossel

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
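The start behaviour discussed in this thread (start acts as the first monitor, re-invoking the plugin every start_timeout / 10 until it returns OK or the start timeout is used up), together with the proposed exit-code mapping, can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the code on the nagios branch: `run_plugin` is a hypothetical callback that runs the plugin once, `nagios_start` and `map_nagios_rc` are illustrative names, and the numeric values given to PCMK_EXECRA_OK and PCMK_EXECRA_UNKNOWN_ERROR are assumed to follow the OCF numbering. Handing each attempt only the time that remains reflects the cmd->timeout question raised above, not necessarily what the patch does.

```python
import time
from typing import Callable

# Nagios plugin exit codes, as listed in the thread.
STATE_OK = 0
STATE_WARNING = 1
STATE_CRITICAL = 2
STATE_UNKNOWN = 3
STATE_DEPENDENT = 4

# Assumption: local stand-ins for Pacemaker's result codes, using the
# OCF numbering (0 = success, 1 = generic/unknown error).
PCMK_EXECRA_OK = 0
PCMK_EXECRA_UNKNOWN_ERROR = 1


def map_nagios_rc(rc: int) -> int:
    """STATE_OK maps to OK; every other code maps to an unknown error."""
    return PCMK_EXECRA_OK if rc == STATE_OK else PCMK_EXECRA_UNKNOWN_ERROR


def nagios_start(run_plugin: Callable[[float], int],
                 start_timeout: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> int:
    """Treat start as the first monitor: retry until OK or the timeout expires.

    run_plugin(timeout) is hypothetical: it runs the plugin once, allowing
    at most `timeout` seconds, and returns the plugin's raw exit code.
    """
    deadline = clock() + start_timeout
    delay = start_timeout / 10  # retry interval proposed in the thread
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            return PCMK_EXECRA_UNKNOWN_ERROR  # start timed out
        # Give each attempt only the time still left; provided run_plugin
        # honours its timeout, retries cannot exceed start_timeout overall.
        rc = map_nagios_rc(run_plugin(remaining))
        if rc == PCMK_EXECRA_OK:
            return rc
        # Wait before retrying, but never sleep past the deadline.
        sleep(min(delay, max(deadline - clock(), 0.0)))
```

Passing `clock` and `sleep` in as parameters keeps the retry logic testable with a fake clock. Whether such a loop should live in the nagios class or in the service library, as suggested earlier in the thread, is a separate design question the sketch does not settle.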