I have noticed a problem in both the SLE11-HAE and SLE11-HAE-SP1 distributions with the "probe" operation that runs when openais is first started on a node to determine whether a resource is already active.

Nov 17 17:47:07 gto2 lrmd: [13475]: debug: on_msg_perform_op: add an operation operation monitor[2] on ocf::cxfs::CXFS for client 13478, its parameters: crm_feature_set=[3.0.2] volnames=[dmfhome,dmfjrnls,dmfspool,dmftmp,diskmsp,data] CRM_meta_timeout=[20000]  to the operation list.
Nov 17 17:47:07 gto2 corosync[13452]:  [TOTEM ] mcasted message added to pending queue
Nov 17 17:47:07 gto2 crmd: [13478]: info: te_rsc_command: Initiating action 12: monitor CXFS_monitor_0 on gto3
Nov 17 17:47:07 gto2 lrmd: [13475]: info: rsc:CXFS:2: probe

Note that the timeout for this operation is 20s (20000ms), and that it is the resource's monitor operation that is actually invoked. The monitor operation timeout for this resource is set to 60s, yet it is ignored. Even manually defining a "probe" operation for the resource with a longer timeout has no effect. The timeout actually used for this operation is the cluster default operation timeout.

<primitive class="ocf" id="CXFS" provider="sgi" type="cxfs">
  <operations id="CXFS-operations">
    <op id="CXFS-op-monitor-60s" interval="60s" name="monitor" on-fail="restart" timeout="60s"/>
    <op id="CXFS-op-start-0" interval="0" name="start" on-fail="restart" requires="fencing" timeout="600s"/>
    <op id="CXFS-op-stop-0" interval="0" name="stop" on-fail="fence" timeout="600s"/>
    <op id="CXFS-op-probe-0" interval="0" name="probe" timeout="600s"/>
  </operations>
  <instance_attributes id="CXFS-instance_attributes">
    <nvpair id="CXFS-instance_attributes-volnames" name="volnames" value="dmfhome,dmfjrnls,dmfspool,dmftmp,diskmsp,data"/>
  </instance_attributes>
  <meta_attributes id="CXFS-meta_attributes">
    <nvpair id="CXFS-meta_attributes-resource-stickiness" name="resource-stickiness" value="1"/>
    <nvpair id="CXFS-meta_attributes-migration-threshold" name="migration-threshold" value="1"/>
  </meta_attributes>
</primitive>
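As the log above shows, Pacemaker implements the probe as a one-shot monitor action (CXFS_monitor_0, i.e. interval 0). One workaround that may apply here, which I have not verified on these exact package levels, is to define a zero-interval monitor op rather than a "probe" op, on the theory that the probe should then pick up that op's timeout:

```xml
<!-- Sketch only; the id is my own choice. The key attributes are
     name="monitor" and interval="0", which is how Pacemaker names probes. -->
<op id="CXFS-op-monitor-0" interval="0" name="monitor" timeout="60s"/>
```

If probes honored such an op, each resource could carry its own probe timeout instead of falling back to the cluster-wide default.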

It seems to me that this special "probe" operation should use either the monitor operation timeout for each specific resource or a separately defined probe operation timeout for each specific resource. Always falling back to the cluster default operation timeout is just wrong.
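In the meantime, since the cluster default appears to be the only value the probe honors, a stopgap (assuming the crm shell shipped with SLE11-HAE-SP1 and the standard default-action-timeout cluster property; I have not confirmed this avoids the 20s probe timeout) would be to raise that default, at the cost of widening the timeout for every operation that does not set its own:

```
crm configure property default-action-timeout="60s"
```

This obviously does not fix the underlying problem of per-resource probe timeouts being ignored.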

These are the rpm levels in my test cluster (SLE11-HAE-SP1), where I can reproduce this problem:

libopenais3-1.1.3-0.2.3
cluster-glue-1.0.6-0.3.7
drbd-heartbeat-8.3.8.1-0.2.9
libgssglue-devel-0.1-6.22
libglue2-1.0.6-0.3.7
openais-1.1.3-0.2.3
pacemaker-mgmt-2.0.0-0.3.10
libpacemaker3-1.1.2-0.6.1
pacemaker-mgmt-client-2.0.0-0.3.10
pacemaker-1.1.2-0.6.1
libgssglue1-0.1-6.22
drbd-pacemaker-8.3.8.1-0.2.9

I have two questions:
  1. Is this a known problem in the community, and has it already
     been fixed? If so, by what mod?
  2. Is there somewhere I can go to keyword-search a database of
     known problems/fixes, so that I need not bother the community
     with these sorts of questions?

--

Ron Kerry         rke...@sgi.com

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
