Re: [Linux-HA] stonith-ng reboot returned 1

2011-07-07 Thread Lars Marowsky-Bree
On 2011-07-06T15:06:01, Craig Lesle craig.le...@bruden.com wrote: Interesting that st_timeout does not show 75 seconds on any try and looks rather random, like it's calculated. ... right. I hadn't noticed that before. So what's happening is that, in pacemaker's fencing/remote.c, the

Re: [Linux-HA] stonith-ng reboot returned 1

2011-07-07 Thread Craig Lesle
... right. I hadn't noticed that before. So what's happening is that, in pacemaker's fencing/remote.c, the stonith-timeout specified is divided up in 10% for _querying_ the list of nodes a given stonith device can retrieve, and 90% for then performing an actual operation. (Compare
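A minimal sketch of the split described above, assuming the 10% / 90% division is applied directly to the configured stonith-timeout. The function name is hypothetical and the exact rounding used in fencing/remote.c is not shown in this thread, so treat the numbers as approximate.

```python
def split_stonith_timeout(stonith_timeout_s):
    """Return (query_budget, operation_budget) in seconds.

    Assumption: 10% of stonith-timeout is reserved for querying the
    stonith devices and the remaining 90% for the fencing operation,
    as described in the thread. Pacemaker's own rounding may differ.
    """
    query_budget = stonith_timeout_s / 10
    operation_budget = stonith_timeout_s - query_budget
    return query_budget, operation_budget


query, operation = split_stonith_timeout(100)
print(f"query: {query:.0f}s, operation: {operation:.0f}s")
```

Under that assumption, the stonith-timeout=100s from this thread would leave about 90s for the operation itself, which is no more than the sbd msgwait of 90.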

Re: [Linux-HA] stonith-ng reboot returned 1

2011-07-07 Thread Lars Marowsky-Bree
On 2011-07-07T05:40:23, Craig Lesle craig.le...@bruden.com wrote: Interesting. It would seem more intuitive for remote.c to add 10% to the specified value in order to get its querying overhead accounted for. Now that I know about the query tax, will verify stonith-timeout is set to a
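Working backwards from the query tax, a rough way to pick a value: if only 90% of stonith-timeout is available to the operation, the configured value has to be at least the needed operation time divided by 0.9. The sketch below is illustrative only; the function name and the safety margin are assumptions, not anything pacemaker or sbd provides.

```python
def min_stonith_timeout(msgwait_s, margin_s=0):
    """Smallest whole-second stonith-timeout whose 90% share still
    covers msgwait plus an optional safety margin.

    Assumption: the operation receives exactly 90% of stonith-timeout.
    Inputs are whole seconds; the result is rounded up.
    """
    needed = msgwait_s + margin_s
    # Integer ceiling of needed / 0.9, i.e. ceil(needed * 10 / 9).
    return (needed * 10 + 8) // 9


print(min_stonith_timeout(90))      # no margin at all
print(min_stonith_timeout(90, 10))  # hypothetical 10s margin
```

With msgwait at 90 and no margin this gives 100, the value already configured in this thread; any extra margin pushes the required stonith-timeout above that.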

Re: [Linux-HA] stonith-ng reboot returned 1

2011-07-07 Thread Andrew Beekhof
On Thu, Jul 7, 2011 at 5:40 PM, Lars Marowsky-Bree l...@suse.de wrote: On 2011-07-06T15:06:01, Craig Lesle craig.le...@bruden.com wrote: Interesting that st_timeout does not show 75 seconds on any try and looks rather random, like it's calculated. ... right. I hadn't noticed that before.

Re: [Linux-HA] stonith-ng reboot returned 1

2011-07-06 Thread Lars Marowsky-Bree
On 2011-07-05T17:10:16, Craig Lesle craig.le...@bruden.com wrote: Hi Craig, Timeout (msgwait) : 90 stonith-timeout=100s \ You may want to increase stonith-timeout a bit further, to increase the difference between msgwait (the time that sbd will wait before returning) and the

Re: [Linux-HA] stonith-ng reboot returned 1

2011-07-06 Thread Craig Lesle
Hi Lars, Timeout (msgwait) : 90 stonith-timeout=100s \ You may want to increase stonith-timeout a bit further, to increase the difference between msgwait (the time that sbd will wait before returning) and the cluster allowed time for that. But 10s should not be too little,

[Linux-HA] stonith-ng reboot returned 1

2011-07-05 Thread Craig Lesle
Greetings. Let me set the scene. Two-node cluster, SUSE11 SP1 updated/patched as of 7/1/2011. The sbd device and timeout values used are as follows. capep01:~ # sbd -d /dev/disk/by-id/scsi-36001438005de94646747 dump ==Dumping header on disk