Hi!

We have set up a 2-node Pacemaker cluster using SLES 11 SP1 + HA-Extension. Each machine has two DRBD resources; one is called 'mysql' and the other 'wwwdata'. The mysql resource carries an XFS filesystem; wwwdata uses an OCFS2 1.4 FS. Our goal is an active/standby MySQL cluster with the databases on the XFS filesystem. The OCFS2 FS is supposed to store data created by scripts that access the MySQL server database.

The primitive resources are set up as follows:
----- snip -----
primitive p_controld ocf:pacemaker:controld \
        op start interval="0" timeout="90s" \
        op stop interval="0" timeout="100s"
primitive p_drbd_mysql ocf:linbit:drbd \
        params drbd_resource="mysql" \
        op monitor interval="20" role="Master" timeout="20" \
        op monitor interval="30" role="Slave" timeout="20" \
        op notify interval="0" timeout="90" \
        op start interval="0" timeout="240s" \
        op stop interval="0" timeout="100s"
primitive p_drbd_wwwdata ocf:linbit:drbd \
        params drbd_resource="wwwdata" \
        op monitor interval="20" role="Master" timeout="20" \
        op monitor interval="30" role="Slave" timeout="20" \
        op notify interval="0" timeout="90" \
        op start interval="0" timeout="240s" \
        op stop interval="0" timeout="360s"
primitive p_fs_mysql ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/mysql" directory="/data/mysql" fstype="xfs" options="rw,noatime" \
        op start interval="0" timeout="90s" \
        op stop interval="0" timeout="100s" \
        meta is-managed="true"
primitive p_fs_wwwdata ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/wwwdata" directory="/data/www" fstype="ocfs2" options="rw,noatime,noacl,nouser_xattr,commit=30,data=writeback" \
        op start interval="0" timeout="90s" \
        op stop interval="0" timeout="300s"
primitive p_ip_float_cluster ocf:heartbeat:IPaddr2 \
        params ip="1.2.3.4" nic="bond0" cidr_netmask="24" flush_routes="true" \
        meta target-role="Started"
primitive p_o2cb ocf:ocfs2:o2cb \
        op monitor interval="120s" \
        op start interval="0" timeout="90s" \
        op stop interval="0" timeout="100s" \
        meta target-role="Started"
----- snip -----
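
(For reference: when we experiment with the monitor timeouts, we change them in-place via the crm shell. A sketch, assuming the crmsh version shipped with SLES 11 SP1; the 60-second value below is purely illustrative, not a setting we have validated:)

----- snip -----
# Open the primitive's definition in $EDITOR and adjust the op line,
# e.g. change the Master monitor to something like:
#   op monitor interval="20" role="Master" timeout="60"
crm configure edit p_drbd_wwwdata

# Sanity-check the resulting CIB before relying on it:
crm configure verify
----- snip -----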

The problem with this setup is that the DRBD monitor operations seem to time out under high I/O load, triggering a failover attempt. One node then gets STONITH'd, because the file system is still busy with the very operation that caused the timeout in the first place. For example, this is what happened yesterday when I ran a "chmod -R" on a directory tree containing about 4.5 million rather small files on the OCFS2 FS:

----- snip -----
Jul 11 11:06:14 node01 lrmd: [25011]: info: rsc:p_drbd_mysql:0:39: monitor
Jul 11 11:06:14 node01 lrmd: [25011]: info: rsc:p_drbd_wwwdata:0:38: monitor
Jul 11 11:06:29 node01 mysql[6665]: INFO: MySQL monitor succeeded
Jul 11 11:07:37 node01 lrmd: [25011]: WARN: p_drbd_wwwdata:0:monitor process (PID 6776) timed out (try 1). Killing with signal SIGTERM (15).
Jul 11 11:07:37 node01 lrmd: [25011]: WARN: operation monitor[38] on ocf::drbd::p_drbd_wwwdata:0 for client 25014, its parameters: CRM_meta_clone=[0] CRM_meta_role=[Master] CRM_meta_notify_slave_resource=[ ] CRM_meta_notify_active_resource=[ ] CRM_meta_notify_demote_uname=[ ] drbd_resource=[wwwdata] CRM_meta_notify_inactive_resource=[p_drbd_wwwdata:0 p_drbd_wwwdata:1 ] CRM_meta_master_node_max=[1] CRM_meta_notify_stop_resource=[ ] CRM_meta_notify_master_resource=[ ] CRM_meta_clone_node_max=[1] CRM_meta_notify=[true] CRM_meta_notify_demote_resource=[: pid [6776] timed out
Jul 11 11:07:37 node01 crmd: [25014]: ERROR: process_lrm_event: LRM operation p_drbd_wwwdata:0_monitor_20000 (38) Timed Out (timeout=20000ms)
Jul 11 11:07:37 node01 crmd: [25014]: info: process_graph_event: Detected action p_drbd_wwwdata:0_monitor_20000 from a different transition: 11 vs. 135
Jul 11 11:07:37 node01 crmd: [25014]: info: abort_transition_graph: process_graph_event:477 - Triggered transition abort (complete=1, tag=lrm_rsc_op, id=p_drbd_wwwdata:0_monitor_20000, magic=2:-2;15:11:8:6f0304c9-522b-4582-a26b-cffe24afe9e2, cib=0.349.10) : Old event
Jul 11 11:07:37 node01 crmd: [25014]: WARN: update_failcount: Updating failcount for p_drbd_wwwdata:0 on node01 after failed monitor: rc=-2 (update=value++, time=1310375257)
----- snip -----

The chmod would have taken a few minutes to complete, but it shouldn't have had any larger impact on the rest of the system. Increasing the monitor timeout indefinitely doesn't look like the way to go here.
Is there a way to ensure that the monitor operations return within a reasonable time frame even under high I/O load?
Or is there something fundamentally flawed in our setup?
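
One workaround we are considering is running such bulk jobs at reduced I/O priority so that DRBD and the monitor operations get served first. A sketch (note: ionice's "idle" class only has an effect with the CFQ I/O scheduler, and the path below is just an example):

----- snip -----
# Run the bulk chmod in the idle I/O class and at lowest CPU priority,
# so it yields to DRBD metadata I/O and the resource agents' monitor calls.
ionice -c 3 nice -n 19 chmod -R g+w /data/www/example-tree
----- snip -----

But that only mitigates this particular trigger; it doesn't make the monitor operations themselves robust against load.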

Thanks in advance!


--
Sebastian Kaps

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
