Hi!

We have set up a 2-node Pacemaker cluster using SLES 11 SP1 + HA-Extension. Each machine has two DRBD resources; one is called 'mysql' and the other 'wwwdata'. The mysql resource carries an XFS filesystem; wwwdata uses an OCFS2 1.4 FS. Our goal is an active/standby MySQL cluster with the databases on the XFS filesystem. The OCFS2 FS is supposed to store data created by scripts that access the MySQL server database.

The primitive resources are set up as follows:
----- snip -----
primitive p_controld ocf:pacemaker:controld \
        op start interval="0" timeout="90s" \
        op stop interval="0" timeout="100s"
primitive p_drbd_mysql ocf:linbit:drbd \
        params drbd_resource="mysql" \
        op monitor interval="20" role="Master" timeout="20" \
        op monitor interval="30" role="Slave" timeout="20" \
        op notify interval="0" timeout="90" \
        op start interval="0" timeout="240s" \
        op stop interval="0" timeout="100s"
primitive p_drbd_wwwdata ocf:linbit:drbd \
        params drbd_resource="wwwdata" \
        op monitor interval="20" role="Master" timeout="20" \
        op monitor interval="30" role="Slave" timeout="20" \
        op notify interval="0" timeout="90" \
        op start interval="0" timeout="240s" \
        op stop interval="0" timeout="360s"
primitive p_fs_mysql ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/mysql" directory="/data/mysql" fstype="xfs" options="rw,noatime" \
        op start interval="0" timeout="90s" \
        op stop interval="0" timeout="100s" \
        meta is-managed="true"
primitive p_fs_wwwdata ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/wwwdata" directory="/data/www" fstype="ocfs2" options="rw,noatime,noacl,nouser_xattr,commit=30,data=writeback" \
        op start interval="0" timeout="90s" \
        op stop interval="0" timeout="300s"
primitive p_ip_float_cluster ocf:heartbeat:IPaddr2 \
        params ip="1.2.3.4" nic="bond0" cidr_netmask="24" flush_routes="true" \
        meta target-role="Started"
primitive p_o2cb ocf:ocfs2:o2cb \
        op monitor interval="120s" \
        op start interval="0" timeout="90s" \
        op stop interval="0" timeout="100s" \
        meta target-role="Started"
----- snip -----
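
(For reference: when we experiment with the monitor timeouts, we change them in-place via the crm shell. A sketch, assuming the crmsh version shipped with SLES 11 SP1; the 60-second value below is purely illustrative, not a setting we have validated:)

----- snip -----
# Open the primitive's definition in $EDITOR and adjust the op line,
# e.g. change the Master monitor to something like:
#   op monitor interval="20" role="Master" timeout="60"
crm configure edit p_drbd_wwwdata

# Sanity-check the resulting CIB before relying on it:
crm configure verify
----- snip -----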

The problem with this setup is that the DRBD monitor operations seem to time out under high I/O load, triggering a failover attempt. One node then gets STONITH'd, because the file system is still busy with the very operation that caused the timeout in the first place. For example, this is what happened yesterday when I ran a "chmod -R" on a directory tree containing about 4.5 million rather small files on the OCFS2 FS:

----- snip -----
Jul 11 11:06:14 node01 lrmd: [25011]: info: rsc:p_drbd_mysql:0:39: monitor
Jul 11 11:06:14 node01 lrmd: [25011]: info: rsc:p_drbd_wwwdata:0:38: monitor
Jul 11 11:06:29 node01 mysql[6665]: INFO: MySQL monitor succeeded
Jul 11 11:07:37 node01 lrmd: [25011]: WARN: p_drbd_wwwdata:0:monitor process (PID 6776) timed out (try 1). Killing with signal SIGTERM (15).
Jul 11 11:07:37 node01 lrmd: [25011]: WARN: operation monitor[38] on ocf::drbd::p_drbd_wwwdata:0 for client 25014, its parameters: CRM_meta_clone=[0] CRM_meta_role=[Master] CRM_meta_notify_slave_resource=[ ] CRM_meta_notify_active_resource=[ ] CRM_meta_notify_demote_uname=[ ] drbd_resource=[wwwdata] CRM_meta_notify_inactive_resource=[p_drbd_wwwdata:0 p_drbd_wwwdata:1 ] CRM_meta_master_node_max=[1] CRM_meta_notify_stop_resource=[ ] CRM_meta_notify_master_resource=[ ] CRM_meta_clone_node_max=[1] CRM_meta_notify=[true] CRM_meta_notify_demote_resource=[: pid [6776] timed out
Jul 11 11:07:37 node01 crmd: [25014]: ERROR: process_lrm_event: LRM operation p_drbd_wwwdata:0_monitor_20000 (38) Timed Out (timeout=20000ms)
Jul 11 11:07:37 node01 crmd: [25014]: info: process_graph_event: Detected action p_drbd_wwwdata:0_monitor_20000 from a different transition: 11 vs. 135
Jul 11 11:07:37 node01 crmd: [25014]: info: abort_transition_graph: process_graph_event:477 - Triggered transition abort (complete=1, tag=lrm_rsc_op, id=p_drbd_wwwdata:0_monitor_20000, magic=2:-2;15:11:8:6f0304c9-522b-4582-a26b-cffe24afe9e2, cib=0.349.10) : Old event
Jul 11 11:07:37 node01 crmd: [25014]: WARN: update_failcount: Updating failcount for p_drbd_wwwdata:0 on node01 after failed monitor: rc=-2 (update=value++, time=1310375257)
----- snip -----

The chmod would have taken a few minutes to complete, but it shouldn't have had any larger impact on the rest of the system. Increasing the monitor timeout indefinitely doesn't look like the way to go here.
Is there a way to ensure that the monitor operations return within a reasonable time frame even under high I/O load?
Or is there something fundamentally flawed in our setup?
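
One workaround we are considering is running such bulk jobs at reduced I/O priority so that DRBD and the monitor operations get served first. A sketch (note: ionice's "idle" class only has an effect with the CFQ I/O scheduler, and the path below is just an example):

----- snip -----
# Run the bulk chmod in the idle I/O class and at lowest CPU priority,
# so it yields to DRBD metadata I/O and the resource agents' monitor calls.
ionice -c 3 nice -n 19 chmod -R g+w /data/www/example-tree
----- snip -----

But that only mitigates this particular trigger; it doesn't make the monitor operations themselves robust against load.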

Thanks in advance!


--
Sebastian Kaps

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
