On Sat, May 22, 2010 at 1:13 AM, Dean Patterson <[email protected]> wrote:
> We are using the following to create a 2-node highly-available cluster:
>
> Disk device - fusion-io cards (PCIe SSD's)
> DRBD/Corosync/Pacemaker
>
> [r...@motest16 log]# rpm -qa | egrep "drbd|corosync|pacemaker"
> drbd-pacemaker-8.3.7-1
> drbd-8.3.7-1
> drbd-bash-completion-8.3.7-1
> drbd-xen-8.3.7-1
> drbd-km-debuginfo-8.3.7-12
> corosynclib-1.2.1-1.el5
> drbd-utils-8.3.7-1
> drbd-udev-8.3.7-1
> drbd-km-2.6.18_164.15.1.0.1.el5-8.3.7-12
> corosynclib-1.2.1-1.el5
> pacemaker-1.0.8-6.el5
> drbd-debuginfo-8.3.7-1
> drbd-heartbeat-8.3.7-1
> corosync-1.2.1-1.el5
> pacemaker-libs-1.0.8-6.el5
>
> [r...@motest16 log]# uname -r
> 2.6.18-164.15.1.0.1.el5
>
> Terminology:
> Pacemaker - Master/Slave
> DRBD      - Primary/Secondary
>

[snip]

> ############################### TEST CASE #2 ###############################
> OVERVIEW: Using dd /dev/zero to test the switchover of drbd/pacemaker and it
> fails. And pacemaker does not switchover the master/slave indicating an issue
> with the corosync/pacemaker layer.

The cluster can't start fsFusion somewhere else until it has been safely
stopped on motest17.
Unfortunately, the stop action failed (it timed out), and since stonith
was disabled, the cluster had no way to complete the recovery.

Step 1: increase the operation timeouts, in particular for the fsFusion stop.
Step 2: enable stonith (and add a stonith device).
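
As a rough sketch of both steps in crm shell syntax: the fragment below is
illustrative only. The Filesystem params (device, directory, fstype), the
timeout values, the choice of the external/ipmi stonith plugin, the node name
motest16.apple.com, and every address and credential are assumptions or
placeholders, since your actual configuration was not part of this mail.
Substitute whatever fencing hardware you really have.

```
# Step 1: longer timeouts on the filesystem resource.
# device/directory/fstype are placeholders; the timeouts are example values.
primitive fsFusion ocf:heartbeat:Filesystem \
    params device="/dev/drbd0" directory="/mnt/fusion" fstype="ext3" \
    op start interval="0" timeout="120s" \
    op stop interval="0" timeout="240s"

# Step 2: one stonith resource per node (assumes IPMI-capable hardware;
# ipaddr/userid/passwd are placeholders).
primitive st-motest16 stonith:external/ipmi \
    params hostname="motest16.apple.com" ipaddr="192.0.2.16" \
           userid="admin" passwd="secret" interface="lan" \
    op monitor interval="60s"
primitive st-motest17 stonith:external/ipmi \
    params hostname="motest17.apple.com" ipaddr="192.0.2.17" \
           userid="admin" passwd="secret" interface="lan" \
    op monitor interval="60s"

# A node should not run the device that is meant to fence it.
location l-st-motest16 st-motest16 -inf: motest16.apple.com
location l-st-motest17 st-motest17 -inf: motest17.apple.com

property stonith-enabled="true"
```

If you save this to a file, "crm configure load update <file>" should merge
it into the running configuration. With fencing in place, a stop that times
out gets the node fenced, and the cluster can then start fsFusion on the
surviving node instead of leaving the resource unmanaged.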

[snip]

> fsFusion        (ocf::heartbeat:Filesystem):    Started motest17.apple.com (unmanaged) FAILED
>
> Failed actions:   fsFusion_stop_0 (node=motest17.apple.com, call=54, rc=-2, status=Timed Out): unknown exec error
_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais