On Tue, May 5, 2009 at 5:26 PM, Eliot Gable <ega...@broadvox.net> wrote:
> I have determined that this appears to be something with resource stickiness.
> With the failure-timeout set to 5s, it was timing out the failure and
> switching back to the preferred node so fast that it could not be migrated to
> the other node. With failure-timeout set to 30s, the migration occurs such
> that node2 becomes master.

right, you need failure-timeout to be greater than the time taken for
your resource to move

> When the first cluster-recheck-interval fires after the failure-timeout
> expires, the cluster switches back to node1 as the Master. So, the
> stickiness seems to be off. I am using this rsc_default setup, which I
> built based on your documentation:
>
> <rsc_defaults>
>   <meta_attributes id="off-hours" score="2">
>     <rule id="off-hour-rule" score="2">
>       <date_expression id="four-am-to-five-am" operation="date_spec">
>         <date_spec id="off-hour-date-spec" hours="4-5" weekdays="1-7"/>
>       </date_expression>
>     </rule>
>     <nvpair id="off-stickiness" name="resource-stickiness" value="0"/>
>   </meta_attributes>
>   <meta_attributes id="core-hours" score="1">
>     <nvpair id="core-stickiness" name="resource-stickiness" value="INFINITY"/>
>   </meta_attributes>
> </rsc_defaults>
>
> The goal was to keep everything infinitely sticky except from 4am-5am when
> resources would be allowed to switch back to their preferred node. Did I make
> a mistake here?

It looks about right.
Can you create a bugzilla entry for this please?

_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
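
For reference, the behaviour the quoted rsc_defaults is meant to produce can be sketched in a few lines of Python. This is only a model of the stated goal, not Pacemaker's implementation: the names ATTRIBUTE_SETS and effective_stickiness are hypothetical, and it assumes that hours="4-5" matches hours 4 and 5 inclusive and that the highest-scored set whose rule matches supplies the value.

```python
from datetime import datetime

INFINITY = float("inf")

# Hypothetical model of the two meta_attributes sets quoted above.
# Each entry is (score, rule predicate, attributes); a set without a
# rule is modelled as always matching.
ATTRIBUTE_SETS = [
    # "off-hours": score 2, date_spec hours="4-5" weekdays="1-7"
    # (assumed inclusive on both ends)
    (2,
     lambda now: 4 <= now.hour <= 5 and 1 <= now.isoweekday() <= 7,
     {"resource-stickiness": 0}),
    # "core-hours": score 1, no rule
    (1,
     lambda now: True,
     {"resource-stickiness": INFINITY}),
]


def effective_stickiness(now):
    """Return the stickiness from the highest-scored matching set."""
    ordered = sorted(ATTRIBUTE_SETS, key=lambda s: s[0], reverse=True)
    for _score, matches, attrs in ordered:
        if matches(now) and "resource-stickiness" in attrs:
            return attrs["resource-stickiness"]
    return 0  # assumed fallback when no set applies


if __name__ == "__main__":
    print(effective_stickiness(datetime(2009, 5, 5, 4, 30)))   # inside the window
    print(effective_stickiness(datetime(2009, 5, 5, 17, 26)))  # outside the window
```

If the cluster moves the resource back to node1 at a time when this model yields INFINITY, that is the mismatch worth describing in the bugzilla entry.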