On Wednesday 21 December 2022 at 16:59:16, Antony Stone wrote:
> Hi.
>
> I'm implementing fencing on a 7-node cluster as described recently:
> https://lists.clusterlabs.org/pipermail/users/2022-December/030714.html
>
> I'm using external/ssh for the time being, and it works if I test it using:
>
> stonith -t external/ssh -p "nodeA nodeB nodeC" -T reset nodeB
>
> However, when it's supposed to be invoked because a node has got stuck, I
> simply find syslog full of the following (one from each of the other six
> nodes in the cluster):
>
> pacemaker-fenced[3262]: notice: Operation reboot of nodeB by <no-one> for
> pacemaker-controld.26852@nodeA.93b391b2: No such device
>
> I have defined seven stonith resources, one for rebooting each machine, and
> I can see from "crm status" that they have been assigned randomly amongst
> the other servers, usually one per server, so that looks good.
>
> The main things that puzzle me about the log message are:
>
> a) why does it say "<no-one>"? Is this more like "anyone", meaning that
> no-one in particular is required to do this task, provided that at least
> someone does it? Does this indicate a configuration problem?

PS: I've just noticed that I'm also getting log entries immediately afterwards:

pacemaker-controld[3264]: notice: Peer nodeB was not terminated (reboot) by <anyone> on behalf of pacemaker-controld.26852: No such device

> b) what is this "device" referred to? I'm using "external/ssh" so there is
> no actual Stonith device for power-cycling hardware machines - am I
> supposed to define some sort of dummy device somewhere?
>
> For clarity, this is what I have added to my cluster configuration to set
> this up:
>
> primitive reboot_nodeA stonith:external/ssh params hostlist="nodeA"
> location only_nodeA reboot_nodeA -inf: nodeA
>
> ...repeated for all seven nodes.
>
> I also have "stonith-enabled=yes" in the cib-bootstrap-options.
>
> Ideas, anyone?
>
> Thanks,
>
> Antony.

--
Normal people think "If it ain't broke, don't fix it".
Engineers think "If it ain't broke, it doesn't have enough features yet".

Please reply to the list; please *don't* CC me.
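
For reference, the "repeated for all seven nodes" configuration quoted above can be sketched as a small shell loop. This is only an illustration: the message names nodeA, nodeB and nodeC, so nodeD through nodeG are assumed names, and gen_fence_config is a hypothetical helper that just prints the primitive/location pairs without applying anything to the cluster.

```shell
#!/bin/sh
# Print one external/ssh fencing primitive per node, plus a location
# constraint that keeps each resource off the node it is meant to reboot.
# gen_fence_config is a hypothetical helper name, not part of crmsh.
gen_fence_config() {
    for node in "$@"; do
        printf 'primitive reboot_%s stonith:external/ssh params hostlist="%s"\n' "$node" "$node"
        printf 'location only_%s reboot_%s -inf: %s\n' "$node" "$node" "$node"
    done
}

# nodeD..nodeG are assumed names; only nodeA, nodeB and nodeC appear above.
gen_fence_config nodeA nodeB nodeC nodeD nodeE nodeF nodeG
```

The printed lines match the two quoted for nodeA and can be compared against the running configuration before anything is changed.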