> Hi,
> 
> On Tue, Dec 18, 2012 at 10:58:18AM +0000, James Harper wrote:
> > For the following failure:
> >
> > Failed actions:
> >     p_lvm_iscsi:0_monitor_10000 (node=bitvs6, call=57, rc=-2,
> > status=Timed Out): unknown exec error
> >
> > Is this the ra itself returning a "Timed Out" error, or is it the
> > cluster software determining that the ra is taking too long and so
> > killing it and declaring it failed? stonith kicks in
> 
> The latter.
> 
> > shortly after this happens so tracking it down is a bit of a pain.
> 
> Is it expected? Normally, a monitor failing should cause a resource restart.
> If a resource fails to stop, it may be a resource agent bug.
> 
> > It happens any time the system gets loaded (eg when making a config
> > change)
> 
> What kind of change?
> 
> > and I can't seem to put my finger on what is causing it.
> 
> Which resource is that? Which version of resource agents do you run?
> 

Any CIB change drives the system load up for 10-20 seconds, and then operations
start timing out, even though I have set the timeouts well in excess of the time
it actually takes for Pacemaker to mark the resource as timed out.
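
To make the distinction from the quoted answer concrete, here is a rough sketch
in Python. It is not Pacemaker's code; run_monitor and the two stand-in commands
are hypothetical names invented for illustration. It only shows the general
pattern: when the supervisor's timer expires, the supervisor kills the child and
reports the timeout itself, so no exit code from the agent is ever seen.

```python
import subprocess
import sys


def run_monitor(cmd, timeout_s):
    """Run a monitor-style command under a supervisor-enforced timeout.

    Returns (status, rc). rc is None when the supervisor gave up, because
    the child was killed before it could return an exit code of its own.
    """
    try:
        proc = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child here.
        return ("Timed Out", None)
    return ("complete", proc.returncode)


# Hypothetical stand-ins for a slow and a fast monitor action.
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
fast = [sys.executable, "-c", "raise SystemExit(7)"]
```

Calling run_monitor(slow, 0.5) yields a "Timed Out" status with no agent exit
code, while run_monitor(fast, 10) yields "complete" with the agent's own code.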

All packages are from Debian wheezy.

James

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
