On 2011-07-12T10:37:47, Sebastian Kaps <sebastian.k...@imail.de> wrote:
Hi Sebastian,

> Our goal is to create an Active/Standby MySQL cluster with the databases being
> on the XFS filesystem. The OCFS2 FS is supposed to store data that is created by
> scripts that access the MySQL server database.

That sounds perfectly viable.

> The problem with the setup is that the DRBD monitor operation seem
> to time out in situations with high I/O load,

That shouldn't happen, obviously. The question is why it does; do you
see high network traffic during these times? How's the performance of
DRBD in general?

Is DRBD's backing device on the same local disk as the system itself? If
so, then they might impact each other.

> triggering a Failover-attempt followed by one node getting STONITH'd
> since the file system is still busy running
> the operation that caused this in the first place.

Well, in theory, the Filesystem RA should kill everything before trying
to umount, so assuming you have constraints as well, at least the
STONITH shouldn't happen, either.

> ----- snip -----
> Jul 11 11:06:14 node01 lrmd: [25011]: info: rsc:p_drbd_mysql:0:39: monitor
> Jul 11 11:06:14 node01 lrmd: [25011]: info: rsc:p_drbd_wwwdata:0:38: monitor
> Jul 11 11:06:29 node01 mysql[6665]: INFO: MySQL monitor succeeded
> Jul 11 11:07:37 node01 lrmd: [25011]: WARN: p_drbd_wwwdata:0:monitor process (PID 6776) timed out (try 1). Killing with signal SIGTERM (15).

drbd's monitor operation is not that heavy-weight; I can't immediately
see why the IO load on the file system it hosts should affect it so
badly.

As a work-around, increasing the timeout is fine - gather some
statistics as to how long the monitor actually takes to complete in
normal operation and under load, and then tune the timeout accordingly.

You can either file a support ticket with Novell/SUSE (for addressing
the DRBD slowdown), or if you want to continue to pursue the community
angle, the drbd mailing lists are a better place for this than pacemaker
- it's not a pacemaker issue.

Good luck!

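For gathering those timings, something along these lines might do as a
starting point. It is only a sketch, not how the drbd RA measures
anything: it runs a command of your choosing repeatedly, records the
wall-clock time of each run, and prints min/median/max. The function
names, the safety factor of 2, and the idea of basing a timeout on the
slowest observed run are all my assumptions; which command best stands
in for the monitor on your system is for you to decide.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=20, pause=1.0):
    """Run cmd `runs` times; return each run's wall-clock duration in seconds."""
    durations = []
    for i in range(runs):
        start = time.monotonic()
        # Output and exit status are ignored; only the elapsed time matters here.
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        durations.append(time.monotonic() - start)
        if pause and i < runs - 1:
            time.sleep(pause)
    return durations


def summarize(durations, safety_factor=2.0):
    """Summarize the samples; safety_factor is an arbitrary assumption."""
    worst = max(durations)
    return {
        "runs": len(durations),
        "min": min(durations),
        "median": statistics.median(durations),
        "max": worst,
        # Heuristic only: slowest observed run times a safety margin.
        "suggested_timeout": worst * safety_factor,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # The command to time is taken from the command line.
    stats = summarize(time_command(sys.argv[1:]))
    for key, value in stats.items():
        print(f"{key}: {value:.3f}" if isinstance(value, float) else f"{key}: {value}")
```

Run it once while the system is idle and once under the I/O load that
triggers the problem, e.g. "python3 time_monitor.py drbdadm role wwwdata"
(script name and DRBD resource name are hypothetical here), and compare
the numbers before settling on a timeout for the monitor op.
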
Regards,
    Lars

-- 
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker