Re: [Pacemaker] DRBD monitor time out in high I/O situations

Lars Marowsky-Bree Tue, 12 Jul 2011 03:17:05 -0700

On 2011-07-12T10:37:47, Sebastian Kaps <sebastian.k...@imail.de> wrote:


Hi Sebastian,

> Our goal is to create an Active/Standby MySQL cluster with the databases
> being on the XFS filesystem. The OCFS2 FS is supposed to store data that
> is created by scripts that access the MySQL server database.

That sounds perfectly viable.

> The problem with the setup is that the DRBD monitor operation seems
> to time out in situations with high I/O load,

That shouldn't happen, obviously. The question is why it does; do you
see high network traffic during these times? How's the performance of
DRBD in general?

Is DRBD's backing device on the same local disk as the system itself? If
so, then they might impact each other.

> triggering a Failover-attempt followed by one node getting STONITH'd
> since the file system is still busy running the operation that caused
> this in the first place.

Well, in theory, the Filesystem RA should kill everything still using
the mount before trying to umount, so assuming you have the constraints
in place as well, the STONITH shouldn't happen either.

> ----- snip -----
> Jul 11 11:06:14 node01 lrmd: [25011]: info: rsc:p_drbd_mysql:0:39: monitor
> Jul 11 11:06:14 node01 lrmd: [25011]: info: rsc:p_drbd_wwwdata:0:38: monitor
> Jul 11 11:06:29 node01 mysql[6665]: INFO: MySQL monitor succeeded
> Jul 11 11:07:37 node01 lrmd: [25011]: WARN: p_drbd_wwwdata:0:monitor process (PID 6776) timed out (try 1).  Killing with signal SIGTERM (15).

drbd's monitor operation is not that heavy-weight; I can't immediately
see why the IO load on the file system it hosts should affect it so
badly.

As a work-around, increasing the timeout is fine - gather some
statistics on how long the monitor actually takes to complete, both in
normal operation and under load, and then tune the timeout accordingly.
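
A minimal sketch of how such statistics could be gathered, assuming
Python is available on the node. The helper names (time_command,
summarize, suggest_timeout) and the safety factor of 2 are illustrative
assumptions, not part of DRBD or Pacemaker; the command to time is
whatever probe you choose and is passed in as an argument list.

```python
import math
import statistics
import subprocess
import time


def time_command(argv, runs=5):
    """Run argv `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.monotonic()
        # The exit status is ignored here; only the elapsed time matters.
        subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.monotonic() - start)
    return durations


def summarize(durations):
    """Return the minimum, median and maximum of the measured durations."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


def suggest_timeout(durations, safety_factor=2.0):
    """Slowest observed run times a safety factor, rounded up to whole seconds."""
    return math.ceil(max(durations) * safety_factor)
```

Run it once on an idle system and once under the I/O load that triggers
the problem, then compare the two summaries before picking a value. The
safety factor is only a starting point, not a recommendation.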

You can either file a support ticket with Novell/SUSE (for addressing
the DRBD slowdown), or if you want to continue to pursue the community
angle, the drbd mailing lists are a better place for this than
pacemaker - it's not a pacemaker issue.

Good luck!


Regards,
    Lars

-- 
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer,
HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
