Hi Emmanuel,
>> look at the parameters fast_io_fail_tmo and dev_loss_tmo; maybe your
>> watchdog timeout is too low and it took more than 10 seconds

From multipathd -k, "show config" reports these values:

    fast_io_fail_tmo 5
    dev_loss_tmo 10

I'm recreating the SBD partition on one of the clusters, using 20 seconds
for watchdog and 40 seconds for msgwait, with the following logging enabled:

  - SBD latency: SBD_OPTS="-W -v"
  - QLogic HBA: ql2xextended_error_logging=1
  - SCSI operations: echo 9411 > /proc/sys/dev/scsi/logging_level

That way I should at least see something more on the syslog console if/when
the servers get rebooted by the watchdog. Above all, I want to see whether,
during the 20-second countdown, Oracle (which is being monitored too) is
actively using its partition (on the same LUN as the SBD partition) or is
stuck on I/O access.

Thanks,
andrea

------------------------------

Message: 6
Date: Fri, 3 May 2013 09:52:19 +0200
From: emmanuel segura <emi2f...@gmail.com>
To: The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
Subject: Re: [Pacemaker] Frequent SBD triggered server reboots
Message-ID: <cae7pj3cx2ii5wh8f_zyb6m6pm0ynrnuiogyy-b9+eiv2530...@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hello Andrea

When you use multipath -v3, look at the parameters fast_io_fail_tmo and
dev_loss_tmo; maybe your watchdog timeout is too low and it took more than
10 seconds.

Thanks

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org