Thanks Emmanuel, using the -v3 switch I can see that multipath is querying my local /dev/sda disk too, which is unneeded, so I blacklisted /dev/sda in multipath.conf and reloaded multipath; it might not be related to my problem, but it was a mistake indeed. Thanks.
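For reference, the stanza I mean is something along these lines (matching by wwid instead of devnode would be more robust, since sdX names can change across boots):

    # /etc/multipath.conf
    blacklist {
        devnode "^sda$"
    }

followed by a "reconfigure" from the multipathd -k interactive console to make it effective.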
My vision of all the timeouts involved in my scenario is:

- each HBA has 1 second (qlport_down_retry=1) to handle a port-down event and report it upwards;
- multipathd has at most 5 seconds (polling_interval 5) to notice a path failure and do its job;
- sbd attempts to read its partition every second (Timeout (loop): 1) and has 10 seconds (Timeout (watchdog): 10) before the watchdog reboots the server (I'm assuming sbd doesn't feed the watchdog unless the read attempt is successful).

(Where each of these values is set is sketched at the end of this message.)

I can see this working as expected by using the multipathd -k interactive console to fail and reinstate paths, and by reading the corresponding multipathd messages in the syslog about paths being lost and reinstated. What makes me think my problem might not be multipath-related is that there is no sign of port-down or path-lost messages in the syslog when the problem happens; there is just the sbd delay countdown.

andrea

Date: Thu, 2 May 2013 20:18:04 +0200
From: emmanuel segura <emi2f...@gmail.com>
To: The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
Subject: Re: [Pacemaker] Frequent SBD triggered server reboots
Message-ID: <cae7pj3dgq4ksovr8svfk0ypregc67yg7prrjtvqop2ba8vc...@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

if you think your problems are related to multipath timeouts, try using multipath -v3 and look carefully at the sbd timeout

Thanks
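P.S. For completeness, this is roughly where each of the three timeouts listed above comes from; the file paths and the sbd device name below are examples rather than literal copies of my configuration:

    # QLogic HBA driver option (e.g. in /etc/modprobe.d/qla2xxx.conf)
    options qla2xxx qlport_down_retry=1

    # /etc/multipath.conf, defaults section
    defaults {
        polling_interval 5
    }

    # sbd on-disk header, as shown by e.g. "sbd -d /dev/mapper/sbd_part dump"
    Timeout (watchdog) : 10
    Timeout (loop)     : 1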