On Tue, 2019-09-03 at 10:09 +0200, Marco Marino wrote:
> Hi, I have a problem with fencing on a two node cluster. It seems
> that randomly the cluster cannot complete monitor operation for fence
> devices. In log I see:
> crmd[8206]: error: Result of monitor operation for fence-node2 on
> ld2.mydomain.it: Timed Out
> As attachment there is
> - /var/log/messages for node1 (only the important part)
> - /var/log/messages for node2 (only the important part) <-- Problem
> starts here
> - pcs status
> - pcs stonith show (for both fence devices)
>
> I think it could be a timeout problem, so how can I see timeout value
> for monitor operation in stonith devices?
> Please, someone can help me with this problem?
> Furthermore, how can I fix the state of fence devices without
> downtime?
>
> Thank you

How to investigate depends on whether this is an occasional monitor
failure, or happens every time the device start is attempted. From the
status you attached, I'm guessing it's at start.

In that case, my next step (since you've already verified ipmitool
works directly) would be to run the fence agent manually using the same
arguments used in the cluster configuration.

Check the man page for the fence agent, looking at the section for
"Stdin Parameters". These are what's used in the cluster configuration,
so make a note of what values you've configured. Then run the fence
agent like this:

    echo -e "action=status\nPARAMETER=VALUE\nPARAMETER=VALUE\n..." | /path/to/agent

where PARAMETER=VALUE entries are what you have configured in the
cluster. If the problem isn't obvious from that, you can try adding a
debug_file parameter.

-- 
Ken Gaillot <kgail...@redhat.com>
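
The manual invocation described in the reply above can be wrapped in a small shell helper, which may be handy when repeating the test with different parameters. This is only a sketch: the function name run_fence_status is made up for illustration, and the agent path and PARAMETER=VALUE pairs remain placeholders to be replaced with the values from your own cluster configuration. It uses printf rather than echo -e, since echo -e is not portable across shells.

```shell
# run_fence_status AGENT PARAMETER=VALUE...
# (hypothetical helper name, not part of any fence agent package)
# Sends "action=status" followed by one PARAMETER=VALUE per line to the
# agent's standard input, as in the echo pipeline above.
run_fence_status() {
    agent=$1
    shift
    {
        printf 'action=status\n'
        for kv in "$@"; do
            printf '%s\n' "$kv"
        done
    } | "$agent"
}

# Example (placeholders, substitute your configured values):
#   run_fence_status /path/to/agent PARAMETER=VALUE PARAMETER=VALUE
#   echo "agent exit status: $?"
```

The function returns the agent's own exit status, so a non-zero result points at the agent or the device rather than at the cluster. A debug_file=/some/path entry can be passed like any other PARAMETER=VALUE pair if the agent supports it.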