Hi Tim,

On 11/17/2010 9:33 PM, Tim Serong wrote:
Hi Ron,

On 11/18/2010 at 11:26 AM, Ron Kerry <rke...@sgi.com> wrote:
 > I have noted a problem that exists in both SLE11-HAE and SLE11-HAE-SP1
 > distributions with the "probe" operation that takes place when openais is
 > first started on a node to determine whether a resource is actively running
 > or not.
 >
 > Nov 17 17:47:07 gto2 lrmd: [13475]: debug: on_msg_perform_op: add an
 > operation operation monitor[2]
 > on ocf::cxfs::CXFS for client 13478, its parameters:
 > crm_feature_set=[3.0.2]
 > volnames=[dmfhome,dmfjrnls,dmfspool,dmftmp,diskmsp,data]
 > CRM_meta_timeout=[20000] to the operation list.
 > Nov 17 17:47:07 gto2 corosync[13452]: [TOTEM ] mcasted message added to
 > pending queue
 > Nov 17 17:47:07 gto2 crmd: [13478]: info: te_rsc_command: Initiating action
 > 12: monitor
 > CXFS_monitor_0 on gto3
 > Nov 17 17:47:07 gto2 lrmd: [13475]: info: rsc:CXFS:2: probe
 >
 > Note that the timeout for this operation is 20s (20000ms). Note also that it
 > is the monitor operation for the resource that is actually called. The
 > monitor operation timeout for this resource is set to 60s. Even manually
 > defining a "probe" operation for the resource with a longer timeout is not
 > effective. The timeout that is being used for this operation is the cluster
 > default operation timeout.

A probe is a special case of the monitor op, with an interval of 0.
Try configuring it like this:

primitive CXFS ocf:sgi:cxfs \
    op monitor interval="60s" timeout="60s" \
    op start timeout="600s" \
    op stop timeout="600s" \
    op monitor interval="0" timeout="600s"

The timeout of 600s on the monitor op with the interval of zero should
thus be used when doing the probe. The timeout of 60s should be used
on the recurring monitor op with the 60s interval.
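
To make the matching rule concrete, here is a rough Python sketch of how a timeout could be chosen by operation name and interval, falling back to the cluster default. This is only a simplified model for illustration, not Pacemaker's actual code: the OPS list, DEFAULT_TIMEOUT and pick_timeout() are hypothetical names, and start/stop are assumed to have an interval of 0 since none is given in the configuration above.

```python
# Hypothetical, simplified model of the op definitions in the primitive above.
# Intervals and timeouts are in seconds; this is not Pacemaker's real logic.
OPS = [
    {"name": "monitor", "interval": 60, "timeout": 60},
    {"name": "start", "interval": 0, "timeout": 600},   # interval assumed 0
    {"name": "stop", "interval": 0, "timeout": 600},    # interval assumed 0
    {"name": "monitor", "interval": 0, "timeout": 600},  # the probe
]

# Cluster default operation timeout mentioned in the thread (20s).
DEFAULT_TIMEOUT = 20


def pick_timeout(ops, name, interval, default=DEFAULT_TIMEOUT):
    """Return the timeout of the first op matching name and interval.

    Falls back to the default when no op definition matches.
    """
    for op in ops:
        if op["name"] == name and op["interval"] == interval:
            return op["timeout"]
    return default
```

In this model, pick_timeout(OPS, "monitor", 0) selects the probe's 600s entry, while dropping the interval="0" monitor op from the list makes the same lookup fall back to the 20s default, which matches the behaviour Ron originally reported.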


This works like a charm!


Nov 18 06:27:36 prod lrmd: [4565]: debug: on_msg_perform_op: add an operation
operation monitor[2] on ocf::cxfs::CXFS for client 4568, its parameters:
CRM_meta_op_target_rc=[7] CRM_meta_start_delay=[0]
volnames=[lun3s0,lun3s1,lun3s2,lun3s3,lun3s4,lun0s0,lun0s1,lun2s0]
CRM_meta_timeout=[600000] crm_feature_set=[3.0.1] CRM_meta_name=[monitor]  to
the operation list.

The probe operation timeout is now 600s, even though my cluster default
operation timeout is set to 20s.
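
For anyone who wants to check the same thing in their own logs, a small Python sketch along these lines could pull the value out of an lrmd debug line. The helper name meta_timeout_seconds() is made up for this example, and it assumes the CRM_meta_timeout=[...] field is logged in milliseconds, as in the excerpts in this thread.

```python
import re


def meta_timeout_seconds(log_text):
    """Return CRM_meta_timeout from an lrmd log line, converted to seconds.

    Assumes the value is logged in milliseconds, as in the excerpts above.
    Returns None when the field is not present.
    """
    match = re.search(r"CRM_meta_timeout=\[(\d+)\]", log_text)
    if match is None:
        return None
    return int(match.group(1)) / 1000
```

Fed the log line above, this would report 600 seconds; fed the line from the original report, it would report 20 seconds.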

Thanks again!   - Ron

--

Ron Kerry         rke...@sgi.com

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
