Re: [Pacemaker] Enable remote monitoring

David Vossel Thu, 06 Dec 2012 15:23:24 -0800

----- Original Message -----
> From: "Yan Gao" <y...@suse.com>
> To: pacemaker@oss.clusterlabs.org
> Sent: Thursday, December 6, 2012 12:28:06 PM
> Subject: Re: [Pacemaker] Enable remote monitoring
> 
> Hi,
> 
> On 12/06/12 19:42, Lars Marowsky-Bree wrote:
> > On 2012-12-06T22:25:40, Andrew Beekhof <and...@beekhof.net> wrote:
> > 
> >> But any failures of the nagios agents would count against the VM's
> >> migration-threshold.
> >> So if moving were the right thing to do, it would have done it
> >> already.
> > 
> > OK. I think this was due to me still being stuck on the workings of an
> > order constraint, but of course if the failures are instead attributed
> > to the container, this would happen automatically already. True.
> > 
> > (Incidentally, I like "attribute", "ascribe" better than "delegate"
> > because to me, they better fit what's going on, if we stuck with
> > "delegate-failures". Just saying. ;-)
> > 
> >>> We already have on-fail settings. How would these play together?
> >> Good question. My initial thought was that it would be up to on-fail
> >> settings in the VM.
> > 
> > I'd prefer to keep that separate (as proposed below). Because if an
> > action of the *VM* really fails, I may want an admin to look into it
> > (why could the bloody hypervisor not start/stop it?), which is different
> > from restarting the VM if one of the resources within it needs that.
> > 
> >>> Would it even make sense to have on-fail="restart-container"? (Or a
> >>> nicer wording.)
> >>>
> >>> Hmmm. That might work. We allow a "container" to be specified as a
> >>> meta attribute.
> >>>
> >>> If set, on-fail would default to restart container for most actions.
> >>> But admins could actually modify it - say, they might want to set
> >>> monitor on-fail="ignore" to just get notified. And when we move
> >>> forward to whiteboxes, we could have start/monitor/promote/demote
> >>> on-fail="restart" (like now) and stop on-fail="restart-container".
> >>>
> >>> That appears reasonably neat?
> >> It does actually.
> >> I wasn't originally thinking it was necessary but it makes sense now
> >> that you point it out.
> > 
> > Yes, I think I like this too now.
> I like it too. Here comes the drafted code:
> https://github.com/gao-yan/pacemaker/commit/4f7b80baa42f3801c1fb8186aef076877f34dfea
> 
> It works in my simple test. Although resource failures are not yet
> counted against the container's migration-threshold, it shows the
> basic idea. I'd appreciate it if you could take a look first. It's very
> likely I'm really on the right track this time. ;-)


I've thought about your implementation some more.  Have we discussed the 
possibility of implicitly setting the order constraint internally when the 
container attribute is set?  Also, it seems like now that we are mapping a 
resource to a container resource in the meta-attributes, we could find a 
shortcut to build the colocation relationship there as well.

What about something like this for the meta-attributes?

container="vm"
    Internally this implies 'on-fail=restart-container' and
    'order start vm then start rsc'.
with-container="true"
    If container is set, also colocate this rsc with the container.

With something like the above, we can fully express the container and child 
relationship without multiple (or indeed any) resource and colocation 
constraint sets.
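
To make the proposed semantics concrete, here is a rough sketch of how the two
meta-attributes might expand into implied settings. This is illustration only,
not Pacemaker code: the function name expand_container_meta, the dictionary
layout, and the tuple shapes are all hypothetical, and it assumes
with-container defaults to "false" when unset.

```python
def expand_container_meta(rsc_id, meta):
    """Expand the proposed meta-attributes into the settings they imply.

    Hypothetical helper: none of these names or structures come from
    Pacemaker itself; they only mirror the proposal in this mail.
    """
    implied = {"on_fail": None, "order": None, "colocation": None}

    container = meta.get("container")
    if not container:
        # No container set: nothing is implied, with-container is ignored.
        return implied

    # container="vm" implies on-fail=restart-container ...
    implied["on_fail"] = "restart-container"
    # ... and 'order start vm then start rsc'.
    implied["order"] = ("start", container, "start", rsc_id)

    # with-container="true" additionally colocates rsc with the container.
    # Assumption: the attribute defaults to "false" when absent.
    if str(meta.get("with-container", "false")).lower() == "true":
        implied["colocation"] = (rsc_id, container)

    return implied
```

For example, expand_container_meta("rsc", {"container": "vm",
"with-container": "true"}) would yield the on-fail default, the start ordering
and the colocation in one go, while leaving out with-container would drop only
the colocation.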

Anyway, just an idea... I much prefer the container meta-attribute idea 
and the failure-delegate idea over the restart-origin one now.  restart-origin 
seemed good at first, but it doesn't fully express what we are doing; 
these other ideas seem to represent the relationship between the 
resources better.  Great discussion, everyone :)

-- Vossel



> > 
> > Uhm. Would "container" imply ordering + colocation, or would we still
> > need them grouped (resource_set'ed, whatever)?
> > 
> > My, design is hard. ;-)
> :-)
> 
> 
> Regards,
>   Gao,Yan
> --
> Gao,Yan <y...@suse.com>
> Software Engineer
> China Server Team, SUSE.
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 

