>>> Ken Gaillot <[email protected]> wrote on 25.02.2020 at 23:30 in message
<29058_1582669837_5E55A00B_29058_3341_1_f8e8426d0c2cf098f88fb6330e8a80586f03043a [email protected]>:
> Hi all,
>
> We are a couple of months away from starting the release cycle for
> Pacemaker 2.0.4. I'll highlight some new features between now and then.
>
> First we have shutdown locks. This is a narrow use case that I don't
> expect a lot of interest in, but it helps give pacemaker feature parity
> with proprietary HA systems, which can help users feel more comfortable
> switching to pacemaker and open source.
>
> The use case is a large organization with few cluster experts and many
> junior system administrators who reboot hosts for OS updates during
> planned maintenance windows, without any knowledge of what the host
> does. The cluster runs services that have a preferred node and take a
> very long time to start.
>
> In this scenario, pacemaker's default behavior of moving the service to
> a failover node when the node shuts down, and moving it back when the
> node comes back up, results in needless downtime compared to just
> leaving the service down for the few minutes needed for a reboot.
>
> The goal could be accomplished with existing pacemaker features.
> Maintenance mode wouldn't work because the node is being rebooted. But
> you could figure out what resources are active on the node, and use a
> location constraint with a rule to ban them on all other nodes before
> shutting down. That's a lot of work for something the cluster can
> figure out automatically.
>
> Pacemaker 2.0.4 will offer a new cluster property, shutdown-lock,
> defaulting to false to keep the current behavior. If shutdown-lock is
> set to true, any resources active on a node when it is cleanly shut
> down will be "locked" to the node (kept down rather than recovered
> elsewhere). Once the node comes back up and rejoins the cluster, they
> will be "unlocked" (free to move again if circumstances warrant).
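
For reference, if shutdown-lock behaves like an ordinary cluster property,
enabling it should be a one-liner along these lines (a sketch only, untested
against 2.0.4):

    # turn on shutdown locks cluster-wide; assumes the usual
    # crm_attribute handling of cluster properties
    crm_attribute --name shutdown-lock --update true

    # check the current value
    crm_attribute --name shutdown-lock --query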
I'm not very happy with the wording: what about a per-resource feature
"tolerate-downtime" that specifies how long a resource may be down without
triggering any action from the cluster? I think that would be more useful
than a global setting. Maybe complement the per-resource feature with a
per-node attribute of the same name. I also think it's very important to
specify and document this mode by comparing it to maintenance mode.

Regards,
Ulrich

> An additional cluster property, shutdown-lock-limit, allows you to set
> a timeout for the locks so that if the node doesn't come back within
> that time, the resources are free to be recovered elsewhere. This
> defaults to no limit.
>
> If you decide while the node is down that you need the resource to be
> recovered, you can manually clear a lock with "crm_resource --refresh"
> specifying both --node and --resource.
>
> There are some limitations using shutdown locks with Pacemaker Remote
> nodes, so I'd avoid that with the upcoming release, though it is
> possible.
> --
> Ken Gaillot <[email protected]>
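
P.S. Putting the pieces above together, a reboot-tolerant setup plus the
manual unlock would presumably look like this (sketch only; property and
option names are taken from the description above, exact syntax untested):

    # give up a lock if the node stays down longer than 30 minutes
    # (interval syntax assumed to match other pacemaker time options)
    crm_attribute --name shutdown-lock-limit --update 30min

    # while the node is down, manually release the lock for one resource
    # so it can be recovered elsewhere; my_db and node1 are placeholders
    crm_resource --refresh --resource my_db --node node1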
