----- Original Message -----
> From: "Yan Gao" <y...@suse.com>
> To: pacemaker@oss.clusterlabs.org
> Sent: Thursday, January 31, 2013 10:37:53 PM
> Subject: Re: [Pacemaker] Enable remote monitoring
>
> Hi Andrew,
>
> On 01/31/13 14:35, Andrew Beekhof wrote:
> >
> > On 24/01/2013, at 3:36 AM, David Vossel <dvos...@redhat.com> wrote:
> >
> >>
> >> ----- Original Message -----
> >>> From: "Yan Gao" <y...@suse.com>
> >>> To: pacemaker@oss.clusterlabs.org
> >>> Sent: Monday, January 21, 2013 11:28:40 PM
> >>> Subject: Re: [Pacemaker] Enable remote monitoring
> >>>
> >>> Hi,
> >>> Here's the code for supporting nagios plugins in lrmd:
> >>>
> >>> https://github.com/gao-yan/pacemaker/commits/nagios
> >>>
> >>> A new resource class "nagios" is introduced.
> >>>
> >>> Actions:
> >>>
> >>> - probe: A resource defined for a resource container is not probed.
> >>>   (We can also add a condition in pengine to just avoid probing a
> >>>   nagios class resource.)
> >>
> >> Yeah, I think the pengine should know to never probe a nagios
> >> script, regardless of whether it is involved in a container or not.
> >>
> >>> - start: Invokes the nagios plugin with the specified parameters
> >>>   (maps the instance attributes to the long options of the nagios
> >>>   plugin). If it returns non-OK, re-invokes it after some delay
> >>>   (delay = start_timeout / 10), until it returns OK or exceeds the
> >>>   start timeout.
> >>
> >> I made a comment about this on the patch. Shouldn't the
> >> cmd->timeout value be updated each time it is re-scheduled to
> >> account for time already spent?
> >>
> >>>
> >>> - monitor: Recurring invocation of the nagios plugin with the
> >>>   specified parameters.
> >>>
> >>> - stop: Nothing special is done. The recurring monitor is canceled
> >>>   anyway.
> >>>
> >>> - metadata: Reads the corresponding metadata from an XML file in
> >>>   NAGIOS_METADATA_DIR.
> >>>
> >>>   (As we know, nagios plugins don't support metadata. The current
> >>>   plan is to generate the corresponding metadata according to the
> >>>   help of the plugins, and put them into NAGIOS_METADATA_DIR for
> >>>   use -- Dejan already has progress on this. Thanks, Dejan!)
> >>>
> >>>
> >>> For nagios plugins, the exit codes are:
> >>>
> >>> STATE_OK = 0,
> >>> STATE_WARNING = 1,
> >>> STATE_CRITICAL = 2,
> >>> STATE_UNKNOWN = 3,
> >>> STATE_DEPENDENT = 4,
> >>>
> >>> AFAICS, STATE_OK should map to PCMK_EXECRA_OK, and the others
> >>> should all belong to PCMK_EXECRA_UNKNOWN_ERROR. Well, apparently,
> >>> there's no code to express "NOT_RUNNING" in nagios plugins. I think
> >>> it should be fine, since there's no probe.
> >>>
> >>> Any suggestions are appreciated!
> >>
> >> This mostly looks like what I expected. I'm letting the whole
> >> re-scheduling of the start operation roll around in my head a
> >> bit. It almost seems like that functionality belongs in the
> >> service library... retry executing this action until either the
> >> timeout is hit or some target return code is encountered. Any
> >> thoughts on that?
> >
> > Who the what now?
> > Why do start ops need to be rescheduled?
> It's very likely that the "start" of the container returns before the
> services inside are started. Abusing start-delay is not preferred.
> The idea is that the start operation of the nagios resource repeatedly
> monitors the service until it returns OK or the start timeout is
> exceeded.
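
To make the retry semantics discussed above concrete, here is a minimal Python sketch. It is not lrmd's C code and not the code in the nagios branch. It retries a check every start_timeout / 10 until the check returns STATE_OK or the start timeout expires, gives each attempt only the time still remaining (so time already spent is accounted for), and maps plugin exit codes as proposed in the quoted mail. The function names, the numeric values of the PCMK_EXECRA_* stand-ins, and the sorted option order are assumptions made purely for illustration.

```python
import subprocess
import time

# Nagios plugin exit codes, as listed in the thread.
STATE_OK = 0
STATE_UNKNOWN = 3

# Stand-ins for the lrmd result codes named in the thread. The numeric
# values are illustrative assumptions, not copied from Pacemaker headers.
PCMK_EXECRA_OK = 0
PCMK_EXECRA_UNKNOWN_ERROR = 1


def map_nagios_rc(nagios_rc):
    """STATE_OK maps to OK; every other plugin state is an unknown error."""
    return PCMK_EXECRA_OK if nagios_rc == STATE_OK else PCMK_EXECRA_UNKNOWN_ERROR


def build_argv(plugin, params):
    """Map instance attributes to long options: {"port": 22} -> --port 22.

    Sorting is only here to make the result deterministic (an assumption).
    """
    argv = [plugin]
    for name, value in sorted(params.items()):
        argv += ["--" + name, str(value)]
    return argv


def run_plugin(argv, timeout):
    """Run the plugin once; a timeout or exec failure counts as non-OK."""
    try:
        return subprocess.run(argv, timeout=timeout,
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL).returncode
    except (subprocess.TimeoutExpired, OSError):
        return STATE_UNKNOWN


def nagios_start(check, start_timeout, clock=time.monotonic, sleep=time.sleep):
    """Call check(remaining) until it returns STATE_OK or time runs out.

    `check` receives the seconds left before the deadline, so each retry
    runs with a timeout reduced by the time already spent.
    """
    delay = start_timeout / 10.0
    deadline = clock() + start_timeout
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            return PCMK_EXECRA_UNKNOWN_ERROR
        if map_nagios_rc(check(remaining)) == PCMK_EXECRA_OK:
            return PCMK_EXECRA_OK
        # Never sleep past the deadline.
        sleep(min(delay, max(0.0, deadline - clock())))
```

A caller would pass something like lambda remaining: run_plugin(build_argv(plugin, params), remaining) as the check. The clock and sleep parameters exist only so the loop can be exercised without real waiting.
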

It is likely I'll have to do something similar for my whitebox use case with the lrmd connection resources.

-- Vossel

> The latest code for supporting nagios plugin in lrmd is in:
> https://github.com/gao-yan/pacemaker/commits/nagios
>
> And the code for supporting container in policy engine is still in:
> https://github.com/ClusterLabs/pacemaker/pull/195
>
> Thanks,
> Gao,Yan
> --
> Gao,Yan <y...@suse.com>
> Software Engineer
> China Server Team, SUSE.
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org