All Concerned: I have been getting slapped around all day with this problem - I can't solve it.
The system is only half done - I have not yet implemented the nfs portion - but the drbd part is not yet cooperating with corosync. It appears to be working OK, but when I stop corosync on the DC, the other node does not take over the drbd filesystem. Here is how I am setting things up....

Configure quorum and stonith (following http://docs.homelinux.org/doku.php?id=create_high-available_drbd_device_with_pacemaker):

  property no-quorum-policy="ignore"
  property stonith-enabled="false"

On wms1, configure the DRBD resource:

  primitive drbd_drbd0 ocf:linbit:drbd \
    params drbd_resource="drbd0" \
    op monitor interval="30s"

Configure the DRBD Master/Slave set:

  ms ms_drbd_drbd0 drbd_drbd0 \
    meta master-max="1" master-node-max="1" \
    clone-max="2" clone-node-max="1" \
    notify="true"

Configure the filesystem mountpoint:

  primitive fs_ftpdata ocf:heartbeat:Filesystem \
    params device="/dev/drbd0" \
    directory="/mnt/drbd0" fstype="ext3"

When I check the status on the DC....

  [root@wms2 ~]# crm
  crm(live)# status
  ============
  Last updated: Wed May 30 23:58:43 2012
  Last change: Wed May 30 23:52:42 2012 via cibadmin on wms1
  Stack: openais
  Current DC: wms2 - partition with quorum
  Version: 1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558
  2 Nodes configured, 2 expected votes
  3 Resources configured.
  ============

  Online: [ wms1 wms2 ]

   Master/Slave Set: ms_drbd_drbd0 [drbd_drbd0]
       Masters: [ wms2 ]
       Slaves: [ wms1 ]
   fs_ftpdata (ocf::heartbeat:Filesystem): Started wms2

  [root@wms2 ~]# mount -l | grep drbd
  /dev/drbd0 on /mnt/drbd0 type ext3 (rw)

So I stop corosync on wms2 - but on the other node...

  [root@wms1 ~]# crm
  crm(live)# status
  ============
  Last updated: Thu May 31 00:12:17 2012
  Last change: Wed May 30 23:52:42 2012 via cibadmin on wms1
  Stack: openais
  Current DC: wms1 - partition WITHOUT quorum
  Version: 1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558
  2 Nodes configured, 2 expected votes
  3 Resources configured.
  ============

  Online: [ wms1 ]
  OFFLINE: [ wms2 ]

   Master/Slave Set: ms_drbd_drbd0 [drbd_drbd0]
       Masters: [ wms1 ]
       Stopped: [ drbd_drbd0:1 ]

So DRBD gets promoted on wms1, but fs_ftpdata never starts and /dev/drbd0 is not mounted? Any ideas?
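One thing I notice is that my configuration has no constraints tying fs_ftpdata to the DRBD master, so maybe pacemaker has no reason to mount the filesystem on whichever node gets promoted? This is just my untested guess at the crm syntax (the constraint names fs_on_drbd_master and fs_after_drbd are ones I made up):

  # my guess: colocate the filesystem with the DRBD master,
  # and only start it after the promote has happened
  colocation fs_on_drbd_master inf: fs_ftpdata ms_drbd_drbd0:Master
  order fs_after_drbd inf: ms_drbd_drbd0:promote fs_ftpdata:start

If that is the right direction, I assume the same sort of constraints would eventually cover the nfs piece once I add it.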
I tailed /var/log/cluster/corosync.log on wms1 and get this....

  May 31 00:02:36 wms1 attrd: [1266]: WARN: attrd_cib_callback: Update 22 for master-drbd_drbd0:0=5 failed: Remote node did not respond
  May 31 00:03:06 wms1 attrd: [1266]: WARN: attrd_cib_callback: Update 25 for master-drbd_drbd0:0=5 failed: Remote node did not respond
  May 31 00:03:10 wms1 crmd: [1268]: WARN: cib_rsc_callback: Resource update 15 failed: (rc=-41) Remote node did not respond
  May 31 00:03:36 wms1 attrd: [1266]: WARN: attrd_cib_callback: Update 28 for master-drbd_drbd0:0=5 failed: Remote node did not respond
  May 31 00:04:06 wms1 attrd: [1266]: WARN: attrd_cib_callback: Update 31 for master-drbd_drbd0:0=5 failed: Remote node did not respond
  May 31 00:04:10 wms1 attrd: [1266]: WARN: attrd_cib_callback: Update 34 for master-drbd_drbd0:0=5 failed: Remote node did not respond
  May 31 00:04:10 wms1 attrd: [1266]: WARN: attrd_cib_callback: Update 37 for master-drbd_drbd0:0=5 failed: Remote node did not respond
  May 31 00:04:10 wms1 attrd: [1266]: WARN: attrd_cib_callback: Update 40 for master-drbd_drbd0:0=5 failed: Remote node did not respond
  May 31 00:08:02 wms1 cib: [1257]: info: cib_stats: Processed 58 operations (0.00us average, 0% utilization) in the last 10min
  May 31 00:08:02 wms1 cib: [1264]: info: cib_stats: Processed 117 operations (256.00us average, 0% utilization) in the last 10min

And on wms2:

  [root@wms2 ~]# tail /var/log/cluster/corosync.log
  May 31 00:02:16 corosync [pcmk ] info: update_member: Node wms2 now has process list: 00000000000000000000000000000002 (2)
  May 31 00:02:16 corosync [pcmk ] notice: pcmk_shutdown: Shutdown complete
  May 31 00:02:16 corosync [SERV ] Service engine unloaded: Pacemaker Cluster Manager 1.1.6
  May 31 00:02:16 corosync [SERV ] Service engine unloaded: corosync extended virtual synchrony service
  May 31 00:02:16 corosync [SERV ] Service engine unloaded: corosync configuration service
  May 31 00:02:16 corosync [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01
  May 31 00:02:16 corosync [SERV ] Service engine unloaded: corosync cluster config database access v1.01
  May 31 00:02:16 corosync [SERV ] Service engine unloaded: corosync profile loading service
  May 31 00:02:16 corosync [SERV ] Service engine unloaded: corosync cluster quorum service v0.1
  May 31 00:02:16 corosync [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:1858.

The example that I am working from talks about doing the following....

  group services fs_drbd0

But this fails miserably - services being undefined?
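For what it is worth, my reading of the crm syntax is that "group" defines a new group, so "services" should not need to exist beforehand - which makes me suspect it actually fails because there is no primitive called fs_drbd0 in my configuration (mine is named fs_ftpdata). Against my own resource names I would have guessed something like this, again untested:

  # my guess: a group named "services", starting with just the filesystem;
  # the nfs resources would be appended here once I define them
  group services fs_ftpdata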
--
Steven Silk
CSC
303 497 3112