On Sat, May 22, 2010 at 1:13 AM, Dean Patterson <[email protected]> wrote:
> We are using the following to create a 2-node highly-available cluster:
>
> Disk device - fusion-io cards (PCIe SSD's)
> DRBD/Corosync/Pacemaker
>
> [r...@motest16 log]# rpm -qa | egrep "drbd|corosync|pacemaker"
> drbd-pacemaker-8.3.7-1
> drbd-8.3.7-1
> drbd-bash-completion-8.3.7-1
> drbd-xen-8.3.7-1
> drbd-km-debuginfo-8.3.7-12
> corosynclib-1.2.1-1.el5
> drbd-utils-8.3.7-1
> drbd-udev-8.3.7-1
> drbd-km-2.6.18_164.15.1.0.1.el5-8.3.7-12
> corosynclib-1.2.1-1.el5
> pacemaker-1.0.8-6.el5
> drbd-debuginfo-8.3.7-1
> drbd-heartbeat-8.3.7-1
> corosync-1.2.1-1.el5
> pacemaker-libs-1.0.8-6.el5
>
> [r...@motest16 log]# uname -r
> 2.6.18-164.15.1.0.1.el5
>
> Terminology:
> Pacemaker - Master/Slave
> DRBD - Primary/Secondary
>
[snip]
> ############################### TEST CASE #2 ###############################
> OVERVIEW: Using dd /dev/zero to test the switchover of drbd/pacemaker and it
> fails. And pacemaker
> does not switchover the master/slave indicating an issue with the
> corosync/pacemaker layer.

The cluster can't start fsFusion somewhere else until it has safely stopped on motest17.
Unfortunately the stop action failed (by timing out) and, since stonith was disabled, there was no way for the cluster to complete the recovery.

Step 1, increase the timeouts.
Step 2, enable stonith (and add a stonith device).

[snip]
> fsFusion (ocf::heartbeat:Filesystem): Started motest17.apple.com
> (unmanaged) FAILED
>
> Failed actions:
> fsFusion_stop_0 (node=motest17.apple.com, call=54, rc=-2,
> status=Timed Out): unknown exec error

_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais
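A minimal sketch of those two steps with the crm shell that ships with Pacemaker 1.0, assuming the nodes have IPMI-capable management boards. The stonith resource names, the choice of the external/ipmi plugin, the IP addresses, the credentials and the 240s timeout are all hypothetical placeholders, not values from this thread. Pick a stop timeout comfortably longer than the worst case for flushing and unmounting the filesystem while dd is still writing.

```shell
# Sketch only: st-motest16/st-motest17, the IPMI addresses (192.0.2.x),
# userid/passwd and the 240s value are hypothetical placeholders.

# Step 1: raise the default timeout for operations that do not set their own.
# If fsFusion already has an explicit "op stop ... timeout=" entry, change that
# instead, e.g. with: crm configure edit fsFusion
crm configure op_defaults timeout=240s

# Step 2a: one IPMI stonith device per node (cluster-glue external/ipmi plugin).
crm configure primitive st-motest16 stonith:external/ipmi \
    params hostname=motest16.apple.com ipaddr=192.0.2.16 \
    userid=admin passwd=secret interface=lan
crm configure primitive st-motest17 stonith:external/ipmi \
    params hostname=motest17.apple.com ipaddr=192.0.2.17 \
    userid=admin passwd=secret interface=lan

# Step 2b: never run a node's stonith device on the node it is meant to fence.
crm configure location l-st-motest16 st-motest16 -inf: motest16.apple.com
crm configure location l-st-motest17 st-motest17 -inf: motest17.apple.com

# Step 2c: turn fencing on.
crm configure property stonith-enabled=true

# Finally, clear the recorded stop failure so fsFusion is managed again.
crm resource cleanup fsFusion
```

With stonith enabled, the default reaction to a failed stop is to fence the node instead of blocking, so a stop that still times out would end with motest17 being fenced and fsFusion started on the surviving node, rather than being left "unmanaged FAILED".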
