Hi,

This is my first post to this list, so if I'm doing this wrong, please be patient. I am using pacemaker-1.1.2-0.2.1 on SLES 11 SP1. Thanks in advance for any help anyone can give me.


I have configured an HA group with a number of primitives. As each primitive is added to the group, we do a simple manual relocation of the group from one node to the other and back again, to verify the configuration as we go. I had done this for 7 primitives.

After I added the latest primitive (tmf), I did a relocation:

crm_resource -r dmfGroup -M -H node2

This results in the newly added primitive (tmf), which gets stopped first, failing because its recurring monitor operation does not get cancelled the way the other primitives' monitor ops do. I have added a snippet from the log below. Notice that the VirtualIP and dskmspfs primitives get their monitor ops cancelled, but tmf does not. There is a process_pe_message error during what appears to be the tmf primitive processing, but crm_verify reports nothing wrong with tmf's configuration no matter how many -V flags I use. The tmf resource works fine until it is time to relocate.
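For reference, this is roughly how I have been checking the configuration (a sketch; the verbosity just goes up with each extra -V):

```shell
# Validate the live CIB; repeat -V for progressively more verbose output
crm_verify -L -VVV
```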

Does anyone have any clues as to where I should look next?

Here is tmf's xml:

       <primitive class="ocf" id="tmf" provider="sgi" type="tmf">
         <operations id="tmf-operations">
           <op id="tmf-op-monitor-30s" interval="30s" name="monitor" on-fail="restart" start-delay="480s" timeout="30s"/>
           <op id="tmf-op-start-0" interval="0" name="start" on-fail="restart" requires="fencing" timeout="480s"/>
           <op id="tmf-op-stop-0" interval="0" name="stop" on-fail="fence" timeout="480s"/>
         </operations>
         <instance_attributes id="tmf-instance_attributes">
           <nvpair id="tmf-instance_attributes-devgrpnames" name="devgrpnames" value="drives"/>
           <nvpair id="tmf-instance_attributes-mindevsup" name="mindevsup" value="1"/>
           <nvpair id="tmf-instance_attributes-devtimeout" name="devtimeout" value="240"/>
           <nvpair id="tmf-instance_attributes-loader_names" name="loader_names" value="sl3000"/>
           <nvpair id="tmf-instance_attributes-loader_hosts" name="loader_hosts" value="acsls1"/>
           <nvpair id="tmf-instance_attributes-loader_users" name="loader_users" value="root,acssa"/>
           <nvpair id="tmf-instance_attributes-loader_passwords" name="loader_passwords" value="weasel!,dog3ear"/>
         </instance_attributes>
         <meta_attributes id="tmf-meta_attributes">
           <nvpair id="tmf-meta_attributes-resource-stickiness" name="resource-stickiness" value="1"/>
           <nvpair id="tmf-meta_attributes-migration-threshold" name="migration-threshold" value="1"/>
         </meta_attributes>
       </primitive>
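Possibly relevant: the tmf op definitions above use on-fail="fence" and requires="fencing", and the pengine entries in the log below complain that on_fail=fence makes no sense with stonith-enabled=false. This is how I would check the current value of that property (a sketch, assuming the stock crm_attribute tool is on the path):

```shell
# Query the cluster-wide stonith-enabled property from crm_config
crm_attribute --type crm_config --name stonith-enabled --query
```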

Here's the log:

Sep 21 10:35:45 pry crm_resource: [14746]: info: Invoked: crm_resource -r dmfGroup -M -H punch
Sep 21 10:35:45 pry cib: [5597]: info: cib_process_request: Operation complete: op cib_delete for section constraints (origin=local/crm_resource/3, version=0.85.17): ok (rc=0)
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff: - <cib admin_epoch="0" epoch="85" num_updates="17" />
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff: + <cib admin_epoch="0" epoch="86" num_updates="1" >
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff: + <configuration >
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff: + <constraints >
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff: + <rsc_location id="cli-prefer-dmfGroup" rsc="dmfGroup" __crm_diff_marker__="added:top" >
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff: + <rule id="cli-prefer-rule-dmfGroup" score="INFINITY" boolean-op="and" >
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff: + <expression id="cli-prefer-expr-dmfGroup" attribute="#uname" operation="eq" value="punch" type="string" />
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff: + </rule>
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff: + </rsc_location>
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff: + </constraints>
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff: + </configuration>
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff: + </cib>
Sep 21 10:35:45 pry cib: [5597]: info: cib_process_request: Operation complete: op cib_modify for section constraints (origin=local/crm_resource/4, version=0.86.1): ok (rc=0)
Sep 21 10:35:45 pry crmd: [5601]: info: abort_transition_graph: need_abort:59 - Triggered transition abort (complete=1) : Non-status change
Sep 21 10:35:46 pry crmd: [5601]: info: need_abort: Aborting on change to admin_epoch
Sep 21 10:35:46 pry crmd: [5601]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Sep 21 10:35:46 pry crmd: [5601]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
Sep 21 10:35:46 pry crmd: [5601]: info: do_pe_invoke: Query 258: Requesting the current CIB: S_POLICY_ENGINE
Sep 21 10:35:46 pry kernel: cib(5597): unaligned access to 0x2000000001801b6d, ip=0x200000000042e830
Sep 21 10:35:46 pry kernel: cib(5597): unaligned access to 0x2000000001801b7d, ip=0x2000000000430dd0
Sep 21 10:35:46 pry kernel: cib(5597): unaligned access to 0x2000000001801b6d, ip=0x2000000000431021
Sep 21 10:35:46 pry kernel: cib(5597): unaligned access to 0x2000000001801b6d, ip=0x2000000000431241
Sep 21 10:35:46 pry kernel: cib(5597): unaligned access to 0x2000000001801b6d, ip=0x2000000000431401
Sep 21 10:35:46 pry crmd: [5601]: info: do_pe_invoke_callback: Invoking the PE: query=258, ref=pe_calc-dc-1285083346-211, seq=156, quorate=1
Sep 21 10:35:46 pry pengine: [5600]: info: unpack_config: Startup probes: enabled
Sep 21 10:35:46 pry pengine: [5600]: notice: unpack_config: On loss of CCM Quorum: Ignore
Sep 21 10:35:46 pry pengine: [5600]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Sep 21 10:35:46 pry pengine: [5600]: info: unpack_domains: Unpacking domains
Sep 21 10:35:46 pry pengine: [5600]: info: determine_online_status: Node pry is online
Sep 21 10:35:46 pry pengine: [5600]: info: determine_online_status: Node punch is online
Sep 21 10:35:46 pry pengine: [5600]: notice: group_print: Resource Group: dmfGroup
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print: local_xvm (ocf::sgi:lxvm): Started pry
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print: dmfusr1fs (ocf::heartbeat:Filesystem): Started pry
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print: dmfusr2fs (ocf::heartbeat:Filesystem): Started pry
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print: homefs (ocf::heartbeat:Filesystem): Started pry
Sep 21 10:35:46 pry cib: [14747]: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-86.raw
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print: journalfs (ocf::heartbeat:Filesystem): Started pry
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print: spoolfs (ocf::heartbeat:Filesystem): Started pry
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print: tmpfs (ocf::heartbeat:Filesystem): Started pry
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print: movefs (ocf::heartbeat:Filesystem): Started pry
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print: dskmspfs (ocf::heartbeat:Filesystem): Started pry
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print: VirtualIP (ocf::heartbeat:IPaddr2): Started pry
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print: tmf (ocf::sgi:tmf): Started pry
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start recurring monitor (30s) for local_xvm on punch
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start recurring monitor (20s) for dmfusr1fs on punch
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start recurring monitor (20s) for dmfusr2fs on punch
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start recurring monitor (20s) for homefs on punch
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start recurring monitor (20s) for journalfs on punch
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start recurring monitor (20s) for spoolfs on punch
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start recurring monitor (20s) for tmpfs on punch
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start recurring monitor (20s) for movefs on punch
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start recurring monitor (20s) for dskmspfs on punch
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start recurring monitor (10s) for VirtualIP on punch
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start recurring monitor (30s) for tmf on punch
Sep 21 10:35:46 pry cib: [14747]: info: write_cib_contents: Wrote version 0.86.0 of the CIB to disk (digest: 58aef84a6b3b4a14ca7ed17f167b6fcd)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource local_xvm (Started pry -> punch)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource dmfusr1fs (Started pry -> punch)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource dmfusr2fs (Started pry -> punch)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource homefs (Started pry -> punch)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource journalfs (Started pry -> punch)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource spoolfs (Started pry -> punch)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource tmpfs (Started pry -> punch)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource movefs (Started pry -> punch)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource dskmspfs (Started pry -> punch)
Sep 21 10:35:46 pry cib: [14747]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.3aJ5F5 (digest: /var/lib/heartbeat/crm/cib.TlYNDx)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource VirtualIP (Started pry -> punch)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource tmf (Started pry -> punch)
Sep 21 10:35:46 pry crmd: [5601]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Sep 21 10:35:46 pry crmd: [5601]: info: unpack_graph: Unpacked transition 52: 38 actions in 38 synapses
Sep 21 10:35:46 pry crmd: [5601]: info: do_te_invoke: Processing graph 52 (ref=pe_calc-dc-1285083346-211) derived from /var/lib/pengine/pe-input-110.bz2
Sep 21 10:35:46 pry crmd: [5601]: info: te_pseudo_action: Pseudo action 51 fired and confirmed
Sep 21 10:35:46 pry crmd: [5601]: info: te_rsc_command: Initiating action 46: stop tmf_stop_0 on pry (local)
Sep 21 10:35:46 pry crmd: [5601]: info: do_lrm_rsc_op: Performing key=46:52:0:bb6ff40d-ddaf-4b6f-9d79-e06944f1dac5 op=tmf_stop_0 )
Sep 21 10:35:46 pry lrmd: [5598]: info: rsc:tmf:74: stop
Sep 21 10:35:46 pry pengine: [5600]: info: process_pe_message: Transition 52: PEngine Input stored in: /var/lib/pengine/pe-input-110.bz2
Sep 21 10:35:46 pry pengine: [5600]: info: process_pe_message: Configuration ERRORs found during PE processing. Please run "crm_verify -L" to identify issues.
Sep 21 10:35:46 pry TapeReleased[8042]: |$(413) pci0002:00:01.1/fc/500104f000acc66d-500104f000acc66e/lun0, pid=14775, device released
Sep 21 10:35:46 pry tmf[14748]: INFO: stop_all_tmf_resources(): successfully stopped all devices
Sep 21 10:35:46 pry tmf[14748]: INFO: stop_tmf(): tmf successfully stopped
Sep 21 10:35:46 pry crmd: [5601]: info: process_lrm_event: LRM operation tmf_stop_0 (call=74, rc=0, cib-update=259, confirmed=true) ok
Sep 21 10:35:46 pry crmd: [5601]: info: match_graph_event: Action tmf_stop_0 (46) confirmed on pry (rc=0)
Sep 21 10:35:46 pry crmd: [5601]: info: te_rsc_command: Initiating action 43: stop VirtualIP_stop_0 on pry (local)
Sep 21 10:35:46 pry lrmd: [5598]: info: cancel_op: operation monitor[68] on ocf::IPaddr2::VirtualIP for client 5601, its parameters: CRM_meta_interval=[10000] cidr_netmask=[24] nic=[eth0] broadcast=[128.162.246.255] crm_feature_set=[3.0.2] ip=[128.162.246.8] CRM_meta_on_fail=[restart] CRM_meta_name=[monitor] CRM_meta_start_delay=[90000] CRM_meta_timeout=[110000] CRM_meta_requires=[fencing] cancelled
Sep 21 10:35:46 pry crmd: [5601]: info: do_lrm_rsc_op: Performing key=43:52:0:bb6ff40d-ddaf-4b6f-9d79-e06944f1dac5 op=VirtualIP_stop_0 )
Sep 21 10:35:46 pry lrmd: [5598]: info: rsc:VirtualIP:75: stop
Sep 21 10:35:46 pry crmd: [5601]: info: process_lrm_event: LRM operation VirtualIP_monitor_10000 (call=68, status=1, cib-update=0, confirmed=true) Cancelled
Sep 21 10:35:47 pry IPaddr2[14796]: INFO: IP status = ok, IP_CIP=
Sep 21 10:35:47 pry IPaddr2[14796]: INFO: ip -f inet addr delete 128.162.246.8/24 dev eth0
Sep 21 10:35:47 pry crmd: [5601]: info: process_lrm_event: LRM operation VirtualIP_stop_0 (call=75, rc=0, cib-update=260, confirmed=true) ok
Sep 21 10:35:47 pry crmd: [5601]: info: match_graph_event: Action VirtualIP_stop_0 (43) confirmed on pry (rc=0)
Sep 21 10:35:47 pry crmd: [5601]: info: te_rsc_command: Initiating action 40: stop dskmspfs_stop_0 on pry (local)
Sep 21 10:35:47 pry lrmd: [5598]: info: cancel_op: operation monitor[66] on ocf::Filesystem::dskmspfs for client 5601, its parameters: CRM_meta_requires=[fencing] fstype=[xfs] device=[/dev/lxvm/tp9400_11_12_dmfbases5] crm_feature_set=[3.0.2] options=[rw,dmapi,mtpt=/dmf/dskmsp_store] directory=[/dmf/dskmsp_store] CRM_meta_on_fail=[restart] CRM_meta_name=[monitor] CRM_meta_start_delay=[60000] CRM_meta_interval=[20000] CRM_meta_timeout=[100000] cancelled
Sep 21 10:35:47 pry crmd: [5601]: info: do_lrm_rsc_op: Performing key=40:52:0:bb6ff40d-ddaf-4b6f-9d79-e06944f1dac5 op=dskmspfs_stop_0 )
Sep 21 10:35:47 pry lrmd: [5598]: info: rsc:dskmspfs:76: stop
Sep 21 10:35:47 pry crmd: [5601]: info: process_lrm_event: LRM operation dskmspfs_monitor_20000 (call=66, status=1, cib-update=0, confirmed=true) Cancelled


--
        Phil Armstrong       p...@sgi.com
        Phone: 651-683-5561  VNET 233-5561


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
