Re: [Pacemaker] Frequent SBD triggered server reboots

andrea cuozzo Fri, 03 May 2013 03:25:48 -0700

Hi Emmanuel,


>> Look at these parameters: fast_io_fail_tmo and dev_loss_tmo. Maybe your
>> watchdog timeout is too low and it took more than 10 seconds.

From "show config" in multipathd -k I can see the values are:

fast_io_fail_tmo 5
dev_loss_tmo 10

I'm recreating the SBD partition on one of the clusters, using 20 seconds
for the watchdog timeout and 40 seconds for msgwait. I'm also enabling extra
logging for SBD latency (SBD_OPTS="-W -v"), for the QLogic HBA
(ql2xextended_error_logging=1), and for SCSI operations (echo 9411 >
/proc/sys/dev/scsi/logging_level). That way I should at least see something
more on the syslog console if/when the servers get rebooted by the watchdog.
Above all, I want to see whether, during the 20-second countdown, Oracle
(which is also being monitored) is actively using its partition (on the same
LUN as the SBD partition) or is stuck on I/O.
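
For readers wondering what the value 9411 enables, here is a minimal sketch that splits a SCSI logging_level value into its per-facility levels. It assumes the usual layout of ten 3-bit fields, in the order given in the kernel's drivers/scsi/scsi_logging.h. The helper name decode_scsi_logging_level is made up for this illustration and is not an existing tool.

```shell
#!/bin/sh
# Sketch: split a SCSI logging_level value into 3-bit per-facility levels.
# Assumption: field order follows drivers/scsi/scsi_logging.h (error at
# bits 0-2, timeout at 3-5, scan at 6-8, and so on).
# decode_scsi_logging_level is a hypothetical helper, not an existing command.
decode_scsi_logging_level() {
    level=$1
    shift_bits=0
    for name in error timeout scan mlqueue mlcomplete \
                llqueue llcomplete hlqueue hlcomplete ioctl; do
        # Extract the 3-bit field for this facility.
        value=$(( (level >> shift_bits) & 7 ))
        if [ "$value" -ne 0 ]; then
            echo "$name=$value"
        fi
        shift_bits=$((shift_bits + 3))
    done
}

decode_scsi_logging_level 9411
```

Under that assumed layout, 9411 decodes to error=3, scan=3, mlqueue=2 and mlcomplete=2, i.e. error handling, device scanning, and mid-layer queueing and completion.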

Thanks,

andrea

------------------------------

Message: 6
Date: Fri, 3 May 2013 09:52:19 +0200
From: emmanuel segura <emi2f...@gmail.com>
To: The Pacemaker cluster resource manager
        <pacemaker@oss.clusterlabs.org>
Subject: Re: [Pacemaker] Frequent SBD triggered server reboots
Message-ID:
        <cae7pj3cx2ii5wh8f_zyb6m6pm0ynrnuiogyy-b9+eiv2530...@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hello Andrea

When you use multipath -v3, look at these parameters: fast_io_fail_tmo and
dev_loss_tmo. Maybe your watchdog timeout is too low and it took more than
10 seconds.
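
The concern above can be sketched as a small sanity check, using the values reported in the reply at the top of this thread (fast_io_fail_tmo 5, dev_loss_tmo 10, watchdog 20, msgwait 40). It rests on two assumptions: that the worst-case multipath delay is bounded by the larger of the two multipath timeouts, and that msgwait should be at least twice the watchdog timeout, as the sbd man page suggests. check_sbd_timeouts is a hypothetical helper, not part of sbd or multipath-tools, and it only handles numeric values (not "infinity").

```shell
#!/bin/sh
# Sketch: compare SBD timeouts against multipath path-failure timeouts.
# check_sbd_timeouts is a hypothetical helper written for this illustration.
check_sbd_timeouts() {
    cst_fast_io_fail=$1   # fast_io_fail_tmo in seconds (multipathd "show config")
    cst_dev_loss=$2       # dev_loss_tmo in seconds
    cst_watchdog=$3       # SBD watchdog timeout in seconds (sbd -1)
    cst_msgwait=$4        # SBD msgwait timeout in seconds (sbd -4)
    cst_rc=0

    # Assumption: the larger multipath timeout bounds the worst-case stall.
    cst_worst=$cst_fast_io_fail
    if [ "$cst_dev_loss" -gt "$cst_worst" ]; then
        cst_worst=$cst_dev_loss
    fi

    # The watchdog must outlast a path failure, or the node reboots first.
    if [ "$cst_watchdog" -le "$cst_worst" ]; then
        echo "WARN: watchdog ${cst_watchdog}s <= multipath worst case ${cst_worst}s"
        cst_rc=1
    fi

    # Rule of thumb: msgwait should be at least twice the watchdog timeout.
    if [ "$cst_msgwait" -lt $((2 * cst_watchdog)) ]; then
        echo "WARN: msgwait ${cst_msgwait}s < 2 x watchdog ${cst_watchdog}s"
        cst_rc=1
    fi

    if [ "$cst_rc" -eq 0 ]; then
        echo "OK: watchdog=${cst_watchdog}s msgwait=${cst_msgwait}s multipath_worst_case=${cst_worst}s"
    fi
    return "$cst_rc"
}

check_sbd_timeouts 5 10 20 40
```

With those values the check passes. A watchdog timeout of 10 seconds or less would be flagged, because a single path failure could then outlast the countdown.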

Thanks




