Re: [ClusterLabs] kind=Optional order constraint not working at startup

Auer, Jens Thu, 22 Sep 2016 01:41:05 -0700

Hi,

> >> shared_fs has to wait for the DRBD promotion, but the other resources
> >> have no such limitation, so they are free to start before shared_fs.
> > Isn't there an implicit limitation by the ordering constraint? I have
> > drbd_promote < shared_fs < snmpAgent-clone, and I would expect this to be a
> > transitive relationship.
> 
> Yes, but shared_fs < snmpAgent-clone is optional, so snmpAgent-clone is free
> to start without it.
I was probably confused by the description in the manual. It says
"* Optional - Only applies if both resources are starting and/or stopping."
(from the Red Hat HA documentation). I assumed this meant that, for example,
when all resources are started together as the cluster starts, the constraint
holds.


> > What is the meaning of "transition"? Is there any way I can force resource
> > actions into transitions?
> 
> A transition is simply the cluster's response to the current cluster state,
> as directed by the configuration. The easiest way to think of it is as the
> "steps" described above.
> 
> If the configuration says a service should be running, but the service is
> not currently running, then the cluster will schedule a start action (if
> possible, considering constraints, etc.). The set of all such actions that
> can be scheduled together at one time is a "transition".
> 
> You can't really control transitions; you can only control the
> configuration, and transitions result from configuration + state.
> 
> The only way to force actions to take place in a certain order is to use
> mandatory constraints.
> 
> The problem here is that you want the constraint to be mandatory only at
> "start-up". But there really is no such thing. Consider the case where the
> cluster stays up, and for whatever maintenance purpose, you stop all the
> resources, then start them again later. Is that the same as start-up or
> not? What if you restart all but one resource?
I think start-up is just a special case of what I consider a dependency for
starting a resource.
My current understanding is that a mandatory constraint means "if you
start/stop resource A, then you have to start/stop resource B". An optional
constraint says that the constraint only holds when you start/stop both
resources together in a single transition. What I want to express is more
like a dependency: "don't start resource A before resource B has been started
at all; state changes of resource B should not impact resource A". I realize
this is kind of odd, but if A can tolerate outages of its dependency B (e.g.
by reconnecting), it makes sense. In principle this is what an optional
constraint does, but without being restricted to a single transition.
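To make the difference concrete, here is a toy model in Python of how the two kinds of order constraint affect the start actions planned for one transition. It is not Pacemaker's scheduler, only a sketch under stated assumptions: a mandatory order drops the "then" resource from the transition if the "first" resource is neither running nor starting in it; any order only sequences two resources that are both starting in the same transition; and the hypothetical `not_ready` set stands in for whatever keeps a resource out of the first transition (here, the DRBD promotion). `plan_transition` and `Order` are made-up names; stop actions are not modelled.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass(frozen=True)
class Order:
    first: str
    then: str
    kind: str  # "Mandatory" or "Optional"


def plan_transition(wanted, running, orders, not_ready=frozenset()):
    """Return the start actions of one transition, in execution order.

    Toy model only: wanted = resources the configuration says should run,
    running = resources already active, not_ready = resources that cannot
    be scheduled in this transition for reasons outside the order
    constraints (e.g. a promotion that only happens later).
    """
    candidates = set(wanted) - set(running) - set(not_ready)

    # Mandatory orders: drop 'then' if 'first' is neither running nor
    # starting in this transition. Repeat until stable so chains work.
    changed = True
    while changed:
        changed = False
        for o in orders:
            if (o.kind == "Mandatory" and o.then in candidates
                    and o.first not in running and o.first not in candidates):
                candidates.discard(o.then)
                changed = True

    # Any order sequences two resources only when both start in this same
    # transition; an optional order never removes a candidate.
    sorter = TopologicalSorter({r: set() for r in candidates})
    for o in orders:
        if o.first in candidates and o.then in candidates:
            sorter.add(o.then, o.first)
    return list(sorter.static_order())


orders = [
    Order("drbd_promote", "shared_fs", "Mandatory"),
    Order("shared_fs", "snmpAgent-clone", "Optional"),
]
wanted = {"drbd_promote", "shared_fs", "snmpAgent-clone"}

# Start-up: the promotion is not ready yet, so shared_fs is dropped and the
# optional order has nothing left to hold snmpAgent-clone back.
print(plan_transition(wanted, set(), orders, not_ready={"drbd_promote"}))
# -> ['snmpAgent-clone']
# Next transition: the promotion and shared_fs start, in order.
print(plan_transition(wanted, {"snmpAgent-clone"}, orders))
# -> ['drbd_promote', 'shared_fs']
# If everything could start in one transition, both orders would apply.
print(plan_transition(wanted, set(), orders))
# -> ['drbd_promote', 'shared_fs', 'snmpAgent-clone']
```

In this model the optional order is never violated; it simply never gets a chance to apply, because shared_fs and snmpAgent-clone are never scheduled in the same transition.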

> I can imagine one possible (but convoluted) way to do something like this,
> using node attributes and rules:
> 
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm140521751827232
> 
> With a rule, you can specify a location constraint that applies, not to a
> particular node, but to any node with a particular value of a particular
> node attribute.
> 
> You would need a custom resource agent that sets a node attribute. Let's say
> it takes three parameters: the node attribute name, the value to set when
> starting (or do nothing), and the value to set when stopping (or do
> nothing). (That might actually be a good idea for a new ocf:pacemaker:
> agent.)
> 
> You'd have an instance of this resource grouped with shared-fs that would
> set the attribute to some magic value when started (say, "1"). You'd have
> another instance grouped with snmpAgent-clone that would set it differently
> when stopped ("0"). Then, you'd have a location constraint for
> snmpAgent-clone with a rule that says it is only allowed on nodes with the
> attribute set to "1".
> 
> With that, snmpAgent-clone would be unable to start until shared-fs had
> started at least once. shared-fs could stop without affecting
> snmpAgent-clone. If snmpAgent-clone stopped, it would reset, so it would
> require shared-fs again.
> 
> I haven't thought through all possible scenarios, but I think it would give
> the behavior you want.
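The attribute "latch" proposed above can be sketched as a small Python simulation. This only models the intended behaviour; it is not a resource agent and uses no Pacemaker API. `AttrSetter` stands in for the hypothetical custom agent with its three parameters (attribute name, value on start, value on stop), the attribute name `shared_fs_seen` is made up, the location rule is reduced to one equality check, and grouping is modelled by running the setter's action whenever the resource it is grouped with starts or stops.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AttrSetter:
    """Hypothetical agent: sets a node attribute on start and/or stop."""
    name: str
    on_start: Optional[str] = None  # None means "do nothing on start"
    on_stop: Optional[str] = None   # None means "do nothing on stop"


@dataclass
class Node:
    attrs: dict = field(default_factory=dict)
    active: set = field(default_factory=set)


ATTR = "shared_fs_seen"  # hypothetical attribute name

# One setter instance grouped with each resource, as in the proposal.
SETTERS = {
    "shared-fs": AttrSetter(ATTR, on_start="1"),
    "snmpAgent-clone": AttrSetter(ATTR, on_stop="0"),
}

# Location rule: snmpAgent-clone may only run where the attribute is "1".
RULES = {"snmpAgent-clone": (ATTR, "1")}


def start(node: Node, rsc: str) -> bool:
    rule = RULES.get(rsc)
    if rule is not None and node.attrs.get(rule[0]) != rule[1]:
        return False  # banned on this node by the location rule
    node.active.add(rsc)
    setter = SETTERS.get(rsc)
    if setter is not None and setter.on_start is not None:
        node.attrs[setter.name] = setter.on_start
    return True


def stop(node: Node, rsc: str) -> None:
    node.active.discard(rsc)
    setter = SETTERS.get(rsc)
    if setter is not None and setter.on_stop is not None:
        node.attrs[setter.name] = setter.on_stop


node = Node()
print(start(node, "snmpAgent-clone"))    # False: shared-fs never started
print(start(node, "shared-fs"))          # True, attribute becomes "1"
print(start(node, "snmpAgent-clone"))    # True
stop(node, "shared-fs")                  # leaves the attribute alone
print("snmpAgent-clone" in node.active)  # True: still running
stop(node, "snmpAgent-clone")            # resets the attribute to "0"
print(start(node, "snmpAgent-clone"))    # False until shared-fs starts again
```

The asymmetry is the point of the design: only shared-fs can set the latch and only snmpAgent-clone can clear it, so outages of shared-fs never propagate to snmpAgent-clone.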
That sounds interesting... I think we will explore a solution that can accept
restarting our resources. We only used the cloned resource set because we
want our processes up and running to minimize outage during a failover.
Currently, the second server is a passive backup that has everything up and
running, ready to take over. After the file system switches over, it resyncs
and is then ready to go. We can probably accept the additional time needed to
start the resources completely, but we have to explore this.

Thanks,

  Jens


_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
