>>> Ken Gaillot <[email protected]> wrote on 12.04.2022 at 17:22 in message <[email protected]>:
> Hi all,
> 
> I'm hoping to have the first release candidate for 2.1.3 ready next
> week.
> 
> Pacemaker has long had a feature to monitor node health (CPU usage,
> SMART drive errors, etc.) and move resources off degraded nodes:
> 
> https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/singlehtml/index.html#tracking-node-health

Great, I wanted to ask a question about it anyway:
Is the node health attribute stored in the CIB, or is it transient (i.e., reset when the node is restarted)?

Some comments on the docs:
The "yellow" state: could it also mean that the node is becoming healthy again (coming from red)?
The "Node Health Strategy" section could benefit from a better explanation. For example: "Assign the value of ..." Assign to whom/what?
It's very hard to find out what "progressive" really does. I think a configuration example with a sample scenario (node health changes) would be very helpful.

> 
> The 2.1.3 release will add a couple of features to make this more
> useful.
> 
> First, you can now exempt particular resources from health-related
> bans, using the new "allow-unhealthy-nodes" resource meta-attribute.

If that's a resource attribute, then the name is poorly chosen (IMHO). In times like these I'd almost suggest naming it "immune-against-node-health=red" or so (OK, just a joke).

> 
> This is particularly helpful for the health monitoring agents
> themselves. Without the new option, health agents get moved off

Especially if the health state can improve again.

> degraded nodes, which means the cluster can't detect if the degraded
> condition goes away. Users had to manually clear the health attributes
> to allow resources to move back to the node. Now, you can set
> allow-unhealthy-nodes=true on your health agent resources, so they can
> continue detecting changes in the health status.
> 
> Second, crm_mon will indicate when a node's health is yellow or red,
> like:
> 
> * Node List:
>   * Node node1: online (health is RED)

For compatibility I'd prefer a new option to display those, or at least a new item; maybe like:
----
Node Health:
  * Node: h16: green
...
----
or
---
Node Attributes:
  * Node h16: green
---

> 
> Previously, you would see that the node is not running any resources,
> but not know why, unless you thought to check every node health
> attribute.

Not being able to explain itself is definitely a bad thing for any artificial intelligence ;-)

Regards,
Ulrich

> -- 
> Ken Gaillot <[email protected]>
> 
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
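
To illustrate the question of what "progressive" really does, here is a minimal Python sketch. It assumes the strategy semantics described in the Pacemaker Explained node-health section: each #health-* attribute value (red/yellow/green, or a plain number) is mapped to a score, the scores are summed per node, and only "progressive" adds the node-health-base cluster option and uses the node-health-red/yellow/green options for the colors. The function names, the example attributes #health-cpu and #health-smart, and the option values are hypothetical and chosen for illustration only. This is not Pacemaker code, and the values are not Pacemaker defaults. Only the none, migrate-on-red, only-green and progressive strategies are modeled.

```python
# Hypothetical sketch (not Pacemaker source): how a node-health-strategy could
# turn a node's #health-* attributes into one score, following the strategy
# descriptions in Pacemaker Explained.

INFINITY = 1_000_000  # Pacemaker treats +/-1000000 as +/-INFINITY


def add_scores(a: int, b: int) -> int:
    """Add two scores; -INFINITY wins over everything, results are clamped."""
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    return max(-INFINITY, min(INFINITY, a + b))


def parse_score(text: str) -> int:
    """Parse a numeric attribute value such as "-50" or "-INFINITY"."""
    t = text.strip().upper()
    if t in ("INFINITY", "+INFINITY"):
        return INFINITY
    if t == "-INFINITY":
        return -INFINITY
    return max(-INFINITY, min(INFINITY, int(t)))


def color_score(value: str, strategy: str, options: dict[str, int]) -> int:
    """Map one attribute value (red/yellow/green or a number) to a score."""
    if strategy == "migrate-on-red":
        table = {"red": -INFINITY, "yellow": 0, "green": 0}
    elif strategy == "only-green":
        table = {"red": -INFINITY, "yellow": -INFINITY, "green": 0}
    else:  # progressive: the node-health-<color> cluster options decide
        table = {c: options[f"node-health-{c}"] for c in ("red", "yellow", "green")}
    color = value.strip().lower()
    if color in table:
        return table[color]
    return parse_score(value)  # assumption: numeric values are used as-is


def node_health_score(attrs: dict[str, str], strategy: str,
                      options: dict[str, int]) -> int:
    """Combine all #health-* attributes of one node into a single score."""
    if strategy == "none":
        return 0  # health attributes are not tracked at all
    if strategy not in ("migrate-on-red", "only-green", "progressive"):
        raise ValueError(f"strategy not modeled in this sketch: {strategy}")
    # Only "progressive" starts from the node-health-base cluster option.
    total = options.get("node-health-base", 0) if strategy == "progressive" else 0
    for name, value in attrs.items():
        if name.startswith("#health-"):
            total = add_scores(total, color_score(value, strategy, options))
    return total


# Sample scenario with assumed option values (not Pacemaker defaults).
opts = {"node-health-base": 20, "node-health-green": 0,
        "node-health-yellow": -10, "node-health-red": -INFINITY}
# CPU turns yellow: 20 - 10 + 0 = 10, still positive.
print(node_health_score({"#health-cpu": "yellow", "#health-smart": "green"},
                        "progressive", opts))
# SMART turns red as well: the total becomes -INFINITY (-1000000).
print(node_health_score({"#health-cpu": "yellow", "#health-smart": "red"},
                        "progressive", opts))
```

Run as-is, the two calls print 10 and -1000000. With "migrate-on-red" the same yellow CPU would contribute 0, and with "only-green" it would already yield -INFINITY. The sketch does not model how the scheduler applies this score to individual resources, nor the allow-unhealthy-nodes exemption described in the announcement.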
