On Mon, 2017-07-24 at 11:51 +0000, Tomer Azran wrote:
> Hello,
>
> We built a pacemaker cluster with 2 physical servers.
> We configured DRBD in Master\Slave setup, a floating IP and file
> system mount in Active\Passive mode.
> We configured two STONITH devices (fence_ipmilan), one for each
> server.
>
> We are trying to simulate a situation where the Master server crashes
> with no power.
> We pulled both of the PSU cables and the server becomes offline
> (UNCLEAN).
> The resources that the Master used to hold are now in Started (UNCLEAN)
> state.
> The state is unclean since the STONITH failed (the STONITH device is
> located on the server (Intel RMM4 - IPMI), which uses the same power
> supply).
>
> The problem is that now the cluster does not release the resources
> that the Master holds, and the service goes down.
>
> Is there any way to overcome this situation?
> We tried to add a qdevice but got the same results.
>
> We are using pacemaker 1.1.15 on CentOS 7.3
>
> Thanks,
> Tomer.

This is a limitation of using IPMI as the only fence device, when the IPMI shares power with the main system. The way around it is to use a fallback fence device, for example a switched power unit or sbd (watchdog). Pacemaker lets you specify a fencing "topology" with multiple devices -- level 1 would be the IPMI, and level 2 would be the fallback device.

qdevice helps with quorum, which would let one side attempt to fence the other, but it doesn't affect whether the fencing succeeds. With a two-node cluster, you can use qdevice to get quorum, or you can use corosync's two_node option.

-- 
Ken Gaillot <kgail...@redhat.com>
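
To illustrate why a second topology level helps in the pulled-power case, here is a minimal Python sketch of how fencing levels are tried in order. It is only a toy model, not Pacemaker code: the function names (ipmi_fence, pdu_fence, fence_with_topology) and the node name "node1" are made up for the example, and the devices are stand-ins that simply return success or failure. It assumes the usual topology behaviour, where levels are attempted in ascending order and every device in a level must succeed for that level to count.

```python
from typing import Callable, Dict, List

# A fence device is modeled as a callable that returns True if it managed
# to power off the target node. These are stand-ins, not real fence agents.
FenceDevice = Callable[[str], bool]


def ipmi_fence(target: str) -> bool:
    """Hypothetical IPMI device: unreachable because it lost power with the host."""
    return False


def pdu_fence(target: str) -> bool:
    """Hypothetical switched power unit on an independent power feed."""
    return True


def fence_with_topology(target: str, levels: Dict[int, List[FenceDevice]]) -> int:
    """Try levels in ascending order; return the level that succeeded, or 0.

    Every device in a level must succeed for that level to count, and an
    empty level never counts as a success.
    """
    for level in sorted(levels):
        devices = levels[level]
        if devices and all(device(target) for device in devices):
            return level
    return 0


# Level 1 is the IPMI, level 2 is the fallback device.
topology = {1: [ipmi_fence], 2: [pdu_fence]}
# The configuration from the original report: IPMI only.
ipmi_only = {1: [ipmi_fence]}

print(fence_with_topology("node1", topology))   # prints 2: the fallback level fences the node
print(fence_with_topology("node1", ipmi_only))  # prints 0: fencing fails, resources stay UNCLEAN
```

With only the IPMI level, no level can succeed once the server has lost power, so the surviving node never gets confirmation that it is safe to take over the resources. With the fallback level present, the failure of level 1 simply moves the attempt on to level 2.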