On 5/30/2012 8:16 PM, Steven Silk wrote:
All Concerned;

I have been getting slapped around all day with this problem - I can't solve it.

The system is only half done - I have not yet implemented the NFS portion - but the DRBD part is not yet cooperating with corosync.

It appears to be working OK - but when I stop corosync on the DC, the other node never takes over and mounts /dev/drbd0.

Here is how I am setting things up....


Configure quorum and stonith

property no-quorum-policy="ignore"
property stonith-enabled="false"
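
(These go in via the crm shell - assuming the usual crm shell that ships with pacemaker 1.1, something like the following should apply them:)

  crm configure property no-quorum-policy="ignore"
  crm configure property stonith-enabled="false"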

On wms1, configure the DRBD resource

primitive drbd_drbd0 ocf:linbit:drbd \
                    params drbd_resource="drbd0" \
                    op monitor interval="30s"

Configure DRBD Master/Slave

ms ms_drbd_drbd0 drbd_drbd0 \
                    meta master-max="1" master-node-max="1" \
                         clone-max="2" clone-node-max="1" \
                         notify="true"

Configure filesystem mountpoint

primitive fs_ftpdata ocf:heartbeat:Filesystem \
                    params device="/dev/drbd0" \
                    directory="/mnt/drbd0" fstype="ext3"

When I check the status on the DC....

[root@wms2 ~]# crm
crm(live)# status
============
Last updated: Wed May 30 23:58:43 2012
Last change: Wed May 30 23:52:42 2012 via cibadmin on wms1
Stack: openais
Current DC: wms2 - partition with quorum
Version: 1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558
2 Nodes configured, 2 expected votes
3 Resources configured.
============

Online: [ wms1 wms2 ]

 Master/Slave Set: ms_drbd_drbd0 [drbd_drbd0]
     Masters: [ wms2 ]
     Slaves: [ wms1 ]
 fs_ftpdata    (ocf::heartbeat:Filesystem):    Started wms2

[root@wms2 ~]# mount -l | grep drbd

/dev/drbd0 on /mnt/drbd0 type ext3 (rw)

So I stop corosync on the DC (wms2) - but on the other node...

[root@wms1 ~]# crm
crm(live)# status
============
Last updated: Thu May 31 00:12:17 2012
Last change: Wed May 30 23:52:42 2012 via cibadmin on wms1
Stack: openais
Current DC: wms1 - partition WITHOUT quorum
Version: 1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558
2 Nodes configured, 2 expected votes
3 Resources configured.
============

Online: [ wms1 ]
OFFLINE: [ wms2 ]

 Master/Slave Set: ms_drbd_drbd0 [drbd_drbd0]
     Masters: [ wms1 ]
     Stopped: [ drbd_drbd0:1 ]

fs_ftpdata never starts, and /dev/drbd0 is not mounted on wms1?
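
One check I can still do directly on wms1 is to ask DRBD itself what it thinks, independently of pacemaker - assuming the resource really is named drbd0, something like:

  cat /proc/drbd
  drbdadm role drbd0

If DRBD reports Primary there but nothing is mounted, that would point at pacemaker never starting fs_ftpdata rather than at DRBD itself.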

Any ideas?

I tailed /var/log/cluster/corosync.log on wms1 and got this....

May 31 00:02:36 wms1 attrd: [1266]: WARN: attrd_cib_callback: Update 22 for master-drbd_drbd0:0=5 failed: Remote node did not respond
May 31 00:03:06 wms1 attrd: [1266]: WARN: attrd_cib_callback: Update 25 for master-drbd_drbd0:0=5 failed: Remote node did not respond
May 31 00:03:10 wms1 crmd: [1268]: WARN: cib_rsc_callback: Resource update 15 failed: (rc=-41) Remote node did not respond
May 31 00:03:36 wms1 attrd: [1266]: WARN: attrd_cib_callback: Update 28 for master-drbd_drbd0:0=5 failed: Remote node did not respond
May 31 00:04:06 wms1 attrd: [1266]: WARN: attrd_cib_callback: Update 31 for master-drbd_drbd0:0=5 failed: Remote node did not respond
May 31 00:04:10 wms1 attrd: [1266]: WARN: attrd_cib_callback: Update 34 for master-drbd_drbd0:0=5 failed: Remote node did not respond
May 31 00:04:10 wms1 attrd: [1266]: WARN: attrd_cib_callback: Update 37 for master-drbd_drbd0:0=5 failed: Remote node did not respond
May 31 00:04:10 wms1 attrd: [1266]: WARN: attrd_cib_callback: Update 40 for master-drbd_drbd0:0=5 failed: Remote node did not respond
May 31 00:08:02 wms1 cib: [1257]: info: cib_stats: Processed 58 operations (0.00us average, 0% utilization) in the last 10min
May 31 00:08:02 wms1 cib: [1264]: info: cib_stats: Processed 117 operations (256.00us average, 0% utilization) in the last 10min

[root@wms2 ~]# tail /var/log/cluster/corosync.log
May 31 00:02:16 corosync [pcmk  ] info: update_member: Node wms2 now has process list: 00000000000000000000000000000002 (2)
May 31 00:02:16 corosync [pcmk  ] notice: pcmk_shutdown: Shutdown complete
May 31 00:02:16 corosync [SERV  ] Service engine unloaded: Pacemaker Cluster Manager 1.1.6
May 31 00:02:16 corosync [SERV  ] Service engine unloaded: corosync extended virtual synchrony service
May 31 00:02:16 corosync [SERV  ] Service engine unloaded: corosync configuration service
May 31 00:02:16 corosync [SERV  ] Service engine unloaded: corosync cluster closed process group service v1.01
May 31 00:02:16 corosync [SERV  ] Service engine unloaded: corosync cluster config database access v1.01
May 31 00:02:16 corosync [SERV  ] Service engine unloaded: corosync profile loading service
May 31 00:02:16 corosync [SERV  ] Service engine unloaded: corosync cluster quorum service v0.1
May 31 00:02:16 corosync [MAIN  ] Corosync Cluster Engine exiting with status 0 at main.c:1858.



The example that I am working from says to do the following....


group services fs_drbd0

In your case I would think it would be:
  group ms_drbd_drbd0 fs_drbd0

In my very limited experience, I have found it useful to also give pacemaker some additional information:
  colocation co_drbd_fs inf: fs_drbd0 ms_drbd_drbd0:Master
  order o_drbd_fs inf: ms_drbd_drbd0:promote fs_drbd0:start

I hope this helps!


But this fails miserably... it complains that 'services' is undefined?
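
My best guess is that the resource names just need adapting - the example defines fs_drbd0, but my filesystem resource is called fs_ftpdata. As a sketch, using the names from my configuration above, the constraints would presumably read:

  colocation co_fs_on_drbd_master inf: fs_ftpdata ms_drbd_drbd0:Master
  order o_drbd_before_fs inf: ms_drbd_drbd0:promote fs_ftpdata:start

i.e. keep fs_ftpdata on whichever node holds the DRBD master, and only start it after the promote has happened.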

--
Steven Silk
CSC
303 497 3112




--
Sincerely,
  Matthew O'Connor

-----------------------------------------------------------------
Sr. Software Engineer
PGP/GPG Key: 0x55F981C4
Fingerprint: E5DC A0F8 5A40 E4DA 2CE6 B5A2 014C 2CBF 55F9 81C4

Engineering and Computer Simulations, Inc.
11825 High Tech Ave Suite 250
Orlando, FL 32817

Tel:   407-823-9991 x315
Fax:   407-823-8299
Email: m...@ecsorl.com
Web:   www.ecsorl.com
-----------------------------------------------------------------
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
