On 26. 11. 18 at 21:41, Ken Gaillot wrote:
On Mon, 2018-11-26 at 14:24 +0200, Klecho wrote:
Hi again,

I just ran a simple "parallel shutdown" test, and its strange result
confirms the problem I described.

I created a few dummy resources, each taking 60s to stop, with no
constraints at all. Then I issued "stop" to each of them, one by one.

No stop operation was attempted for any of the others until the first
resource had stopped.

When the first resource stopped, all the others stopped at the same
moment, 120s after the stop commands were issued.

This confirms that if many resources (VMs) need to be stopped and the
first one starts some update (with a long stop timeout set), no stop
attempt is made for the others at all until the first one has finished
stopping.

Why is this so and is there a way to avoid it?

It has to do with pacemaker's concept of a "transition".

When an interesting event happens (like your first stop), pacemaker
calculates what actions need to be taken and then does them. A
transition may be interrupted between actions by a new event, but any
event already begun must complete before a new transition can begin.

What happened here is that when you stopped the first resource, a
transition was created with that one stop, and that stop was initiated.
When the later stops came in, they would cause a new transition, but
that first stop has to complete before that transition can begin.
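To make this concrete, here is a toy model in Python. It is a simplified sketch under stated assumptions, not Pacemaker's actual scheduler: each new transition picks up every stop request that has arrived, and it cannot begin until all actions already started by the previous transition have finished. The names `StopRequest` and `simulate` are hypothetical.

```python
"""Toy model of serialized transitions (simplified; NOT Pacemaker's code).

Assumption: a new transition cannot begin while actions from the previous
transition are still in flight; when it begins, it includes every request
that has arrived so far.
"""
from dataclasses import dataclass


@dataclass
class StopRequest:
    resource: str
    issued_at: float  # seconds after the first command
    duration: float   # how long the stop action takes


def simulate(requests):
    """Return {resource: completion time} under the one-transition-at-a-time rule."""
    pending = sorted(requests, key=lambda r: r.issued_at)
    done = {}
    now = 0.0
    while pending:
        # The next transition starts once the previous one is finished AND
        # at least one new request is waiting.
        start = max(now, pending[0].issued_at)
        batch = [r for r in pending if r.issued_at <= start]
        pending = [r for r in pending if r.issued_at > start]
        # Actions in one transition run in parallel; the transition ends when
        # the slowest of them completes.
        for r in batch:
            done[r.resource] = start + r.duration
        now = max(done[r.resource] for r in batch)
    return done


if __name__ == "__main__":
    one_by_one = [StopRequest("vm1", 0, 60), StopRequest("vm2", 1, 60),
                  StopRequest("vm3", 2, 60)]
    print(simulate(one_by_one))
    batched = [StopRequest(name, 0, 60) for name in ("vm1", "vm2", "vm3")]
    print(simulate(batched))
```

With stops issued at 0s, 1s and 2s, the model finishes vm1 at 60s and the other two at 120s, which mirrors the test described above; issuing all three stops together finishes each of them at 60s.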

There are a few ways around this:

* Shutdown will stop all resources on its own, so you could skip the
stopping altogether.

* If you prefer to ensure all the resources stop successfully before
you start the shutdown, you could batch all the "stop" changes into one
file and apply that to the config. A stop command sets the resource's
target-role meta-attribute to Stopped. Normally, this is applied
directly to the live configuration, so it takes effect immediately.
However, crm and pcs both offer ways to batch commands in a file and
then apply them all at once.

With pcs 0.9.157 and newer, you can simply specify several resources in a single "pcs resource disable" command. It has the same effect as batching all the stop changes into a file, but it is much easier to use.


* Or, you could set the node(s) to standby mode as a transient
attribute (using attrd_updater). That would cause all resources to move
off those nodes (and stop if there are no nodes remaining). Transient
node attributes are erased every time a node leaves the cluster, so it
would only have effect until shutdown; when the node rejoined, it would
be in regular mode.
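The batching and standby options above can be sketched as command lines. The Python below only builds argument lists (each could be run with `subprocess.run(cmd, check=True)`); the resource names, the node name and the file name `stop-all.xml` are hypothetical placeholders, and the exact options should be checked against the pcs and attrd_updater versions actually installed.

```python
"""Build (but do not run) command lines for the options discussed above.

Assumptions: pcs and Pacemaker's attrd_updater are installed; resource,
node and file names are hypothetical placeholders.
"""
import shlex


def disable_in_one_command(resources):
    # pcs 0.9.157+: several resource ids in one "resource disable" call,
    # so all target-role=Stopped changes reach the cluster together.
    return ["pcs", "resource", "disable", *resources]


def disable_via_cib_file(resources, cib_file="stop-all.xml"):
    # Older pcs: dump the CIB to a file, edit the offline copy with -f,
    # then push everything back to the live cluster in one step.
    cmds = [["pcs", "cluster", "cib", cib_file]]
    cmds += [["pcs", "-f", cib_file, "resource", "disable", r] for r in resources]
    cmds.append(["pcs", "cluster", "cib-push", cib_file])
    return cmds


def transient_standby(node):
    # Transient node attribute: erased when the node leaves the cluster,
    # so the node rejoins in regular (non-standby) mode.
    return ["attrd_updater", "--name", "standby", "--update", "on", "--node", node]


if __name__ == "__main__":
    vms = ["vm1", "vm2", "vm3"]
    print(shlex.join(disable_in_one_command(vms)))
    for cmd in disable_via_cib_file(vms):
        print(shlex.join(cmd))
    print(shlex.join(transient_standby("node1")))
```

The single-command form is simplest when the installed pcs supports it; the file-based form achieves the same one-shot update on older versions.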


On 11/20/18 12:40 PM, Klechomir wrote:
Hi list,
Bumped onto the following issue lately:

When multiple VMs are shut down one right after another and the
shutdown of the first VM takes a long time, the others aren't shut
down at all until the first one stops.

"batch-limit" doesn't seem to affect this.
Any suggestions why this could happen?

Best regards,
Klecho
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
