Hi,

This is my first post to this list, so if I'm doing this wrong, please be patient. I am using pacemaker-1.1.2-0.2.1 on SLES 11 SP1. Thanks in advance for any help anyone can give me.


I have configured an HA group with a number of primitives. As each primitive is added to the group, we do a simple manual relocation of the group from one node to the other and back again, to verify the configuration as we go. I had done this for 7 primitives.

After I added the latest primitive (tmf), I did a relocation:

crm_resource -r dmfGroup -M -H node2

This results in the newly added primitive (tmf), which gets stopped first, failing because its recurring monitor operation does not get cancelled the way the other primitives' monitor ops do. I have added a snippet from the log below. Notice that the VirtualIP and dskmspfs primitives get their monitor ops cancelled, but tmf does not. There is a process_pe_message error during what appears to be the tmf primitive processing, but crm_verify reports nothing wrong with tmf's configuration no matter how many -V flags I use. The tmf resource works fine until it is time to relocate.
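For reference, this is roughly how I have been checking the configuration (a sketch; the verbosity just goes up with each extra -V):

```shell
# Validate the live CIB; repeat -V for progressively more verbose output
crm_verify -L -VVV
```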

Does anyone have any clues as to where I should look next?

Here is tmf's xml:

       <primitive class="ocf" id="tmf" provider="sgi" type="tmf">
         <operations id="tmf-operations">
           <op id="tmf-op-monitor-30s" interval="30s" name="monitor" on-fail="restart" start-delay="480s" timeout="30s"/>
           <op id="tmf-op-start-0" interval="0" name="start" on-fail="restart" requires="fencing" timeout="480s"/>
           <op id="tmf-op-stop-0" interval="0" name="stop" on-fail="fence" timeout="480s"/>
         </operations>
         <instance_attributes id="tmf-instance_attributes">
           <nvpair id="tmf-instance_attributes-devgrpnames" name="devgrpnames" value="drives"/>
           <nvpair id="tmf-instance_attributes-mindevsup" name="mindevsup" value="1"/>
           <nvpair id="tmf-instance_attributes-devtimeout" name="devtimeout" value="240"/>
           <nvpair id="tmf-instance_attributes-loader_names" name="loader_names" value="sl3000"/>
           <nvpair id="tmf-instance_attributes-loader_hosts" name="loader_hosts" value="acsls1"/>
           <nvpair id="tmf-instance_attributes-loader_users" name="loader_users" value="root,acssa"/>
           <nvpair id="tmf-instance_attributes-loader_passwords" name="loader_passwords" value="weasel!,dog3ear"/>
         </instance_attributes>
         <meta_attributes id="tmf-meta_attributes">
           <nvpair id="tmf-meta_attributes-resource-stickiness" name="resource-stickiness" value="1"/>
           <nvpair id="tmf-meta_attributes-migration-threshold" name="migration-threshold" value="1"/>
         </meta_attributes>
       </primitive>
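Possibly relevant: the tmf op definitions above use on-fail="fence" and requires="fencing", and the pengine entries in the log below complain that on_fail=fence makes no sense with stonith-enabled=false. This is how I would check the current value of that property (a sketch, assuming the stock crm_attribute tool is on the path):

```shell
# Query the cluster-wide stonith-enabled property from crm_config
crm_attribute --type crm_config --name stonith-enabled --query
```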

Here's the log:

Sep 21 10:35:45 pry crm_resource: [14746]: info: Invoked: crm_resource -r dmfGroup -M -H punch
Sep 21 10:35:45 pry cib: [5597]: info: cib_process_request: Operation complete: op cib_delete for section constraints (origin=local/crm_resource/3, version=0.85.17): ok (rc=0)
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff: - <cib admin_epoch="0" epoch="85" num_updates="17" />
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff: + <cib admin_epoch="0" epoch="86" num_updates="1" >
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff: + <configuration >
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff: + <constraints >
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff: + <rsc_location id="cli-prefer-dmfGroup" rsc="dmfGroup" __crm_diff_marker__="added:top" >
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff: + <rule id="cli-prefer-rule-dmfGroup" score="INFINITY" boolean-op="and" >
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff: + <expression id="cli-prefer-expr-dmfGroup" attribute="#uname" operation="eq" value="punch" type="string" />
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff: + </rule>
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff: + </rsc_location>
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff: + </constraints>
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff: + </configuration>
Sep 21 10:35:45 pry cib: [5597]: info: log_data_element: cib:diff: + </cib>
Sep 21 10:35:45 pry cib: [5597]: info: cib_process_request: Operation complete: op cib_modify for section constraints (origin=local/crm_resource/4, version=0.86.1): ok (rc=0)
Sep 21 10:35:45 pry crmd: [5601]: info: abort_transition_graph: need_abort:59 - Triggered transition abort (complete=1) : Non-status change
Sep 21 10:35:46 pry crmd: [5601]: info: need_abort: Aborting on change to admin_epoch
Sep 21 10:35:46 pry crmd: [5601]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Sep 21 10:35:46 pry crmd: [5601]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
Sep 21 10:35:46 pry crmd: [5601]: info: do_pe_invoke: Query 258: Requesting the current CIB: S_POLICY_ENGINE
Sep 21 10:35:46 pry kernel: cib(5597): unaligned access to 0x2000000001801b6d, ip=0x200000000042e830
Sep 21 10:35:46 pry kernel: cib(5597): unaligned access to 0x2000000001801b7d, ip=0x2000000000430dd0
Sep 21 10:35:46 pry kernel: cib(5597): unaligned access to 0x2000000001801b6d, ip=0x2000000000431021
Sep 21 10:35:46 pry kernel: cib(5597): unaligned access to 0x2000000001801b6d, ip=0x2000000000431241
Sep 21 10:35:46 pry kernel: cib(5597): unaligned access to 0x2000000001801b6d, ip=0x2000000000431401
Sep 21 10:35:46 pry crmd: [5601]: info: do_pe_invoke_callback: Invoking the PE: query=258, ref=pe_calc-dc-1285083346-211, seq=156, quorate=1
Sep 21 10:35:46 pry pengine: [5600]: info: unpack_config: Startup probes: enabled
Sep 21 10:35:46 pry pengine: [5600]: notice: unpack_config: On loss of CCM Quorum: Ignore
Sep 21 10:35:46 pry pengine: [5600]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Sep 21 10:35:46 pry pengine: [5600]: info: unpack_domains: Unpacking domains
Sep 21 10:35:46 pry pengine: [5600]: info: determine_online_status: Node pry is online
Sep 21 10:35:46 pry pengine: [5600]: info: determine_online_status: Node punch is online
Sep 21 10:35:46 pry pengine: [5600]: notice: group_print: Resource Group: dmfGroup
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print: local_xvm (ocf::sgi:lxvm): Started pry
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print: dmfusr1fs (ocf::heartbeat:Filesystem): Started pry
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print: dmfusr2fs (ocf::heartbeat:Filesystem): Started pry
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print: homefs (ocf::heartbeat:Filesystem): Started pry
Sep 21 10:35:46 pry cib: [14747]: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-86.raw
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print: journalfs (ocf::heartbeat:Filesystem): Started pry
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print: spoolfs (ocf::heartbeat:Filesystem): Started pry
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print: tmpfs (ocf::heartbeat:Filesystem): Started pry
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print: movefs (ocf::heartbeat:Filesystem): Started pry
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print: dskmspfs (ocf::heartbeat:Filesystem): Started pry
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print: VirtualIP (ocf::heartbeat:IPaddr2): Started pry
Sep 21 10:35:46 pry pengine: [5600]: notice: native_print: tmf (ocf::sgi:tmf): Started pry
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start recurring monitor (30s) for local_xvm on punch
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start recurring monitor (20s) for dmfusr1fs on punch
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start recurring monitor (20s) for dmfusr2fs on punch
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start recurring monitor (20s) for homefs on punch
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start recurring monitor (20s) for journalfs on punch
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start recurring monitor (20s) for spoolfs on punch
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start recurring monitor (20s) for tmpfs on punch
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start recurring monitor (20s) for movefs on punch
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start recurring monitor (20s) for dskmspfs on punch
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start recurring monitor (10s) for VirtualIP on punch
Sep 21 10:35:46 pry pengine: [5600]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
Sep 21 10:35:46 pry pengine: [5600]: notice: RecurringOp: Start recurring monitor (30s) for tmf on punch
Sep 21 10:35:46 pry cib: [14747]: info: write_cib_contents: Wrote version 0.86.0 of the CIB to disk (digest: 58aef84a6b3b4a14ca7ed17f167b6fcd)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource local_xvm (Started pry -> punch)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource dmfusr1fs (Started pry -> punch)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource dmfusr2fs (Started pry -> punch)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource homefs (Started pry -> punch)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource journalfs (Started pry -> punch)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource spoolfs (Started pry -> punch)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource tmpfs (Started pry -> punch)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource movefs (Started pry -> punch)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource dskmspfs (Started pry -> punch)
Sep 21 10:35:46 pry cib: [14747]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.3aJ5F5 (digest: /var/lib/heartbeat/crm/cib.TlYNDx)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource VirtualIP (Started pry -> punch)
Sep 21 10:35:46 pry pengine: [5600]: notice: LogActions: Move resource tmf (Started pry -> punch)
Sep 21 10:35:46 pry crmd: [5601]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Sep 21 10:35:46 pry crmd: [5601]: info: unpack_graph: Unpacked transition 52: 38 actions in 38 synapses
Sep 21 10:35:46 pry crmd: [5601]: info: do_te_invoke: Processing graph 52 (ref=pe_calc-dc-1285083346-211) derived from /var/lib/pengine/pe-input-110.bz2
Sep 21 10:35:46 pry crmd: [5601]: info: te_pseudo_action: Pseudo action 51 fired and confirmed
Sep 21 10:35:46 pry crmd: [5601]: info: te_rsc_command: Initiating action 46: stop tmf_stop_0 on pry (local)
Sep 21 10:35:46 pry crmd: [5601]: info: do_lrm_rsc_op: Performing key=46:52:0:bb6ff40d-ddaf-4b6f-9d79-e06944f1dac5 op=tmf_stop_0 )
Sep 21 10:35:46 pry lrmd: [5598]: info: rsc:tmf:74: stop
Sep 21 10:35:46 pry pengine: [5600]: info: process_pe_message: Transition 52: PEngine Input stored in: /var/lib/pengine/pe-input-110.bz2
Sep 21 10:35:46 pry pengine: [5600]: info: process_pe_message: Configuration ERRORs found during PE processing. Please run "crm_verify -L" to identify issues.
Sep 21 10:35:46 pry TapeReleased[8042]: |$(413) pci0002:00:01.1/fc/500104f000acc66d-500104f000acc66e/lun0, pid=14775, device released
Sep 21 10:35:46 pry tmf[14748]: INFO: stop_all_tmf_resources(): successfully stopped all devices
Sep 21 10:35:46 pry tmf[14748]: INFO: stop_tmf(): tmf successfully stopped
Sep 21 10:35:46 pry crmd: [5601]: info: process_lrm_event: LRM operation tmf_stop_0 (call=74, rc=0, cib-update=259, confirmed=true) ok
Sep 21 10:35:46 pry crmd: [5601]: info: match_graph_event: Action tmf_stop_0 (46) confirmed on pry (rc=0)
Sep 21 10:35:46 pry crmd: [5601]: info: te_rsc_command: Initiating action 43: stop VirtualIP_stop_0 on pry (local)
Sep 21 10:35:46 pry lrmd: [5598]: info: cancel_op: operation monitor[68] on ocf::IPaddr2::VirtualIP for client 5601, its parameters: CRM_meta_interval=[10000] cidr_netmask=[24] nic=[eth0] broadcast=[128.162.246.255] crm_feature_set=[3.0.2] ip=[128.162.246.8] CRM_meta_on_fail=[restart] CRM_meta_name=[monitor] CRM_meta_start_delay=[90000] CRM_meta_timeout=[110000] CRM_meta_requires=[fencing] cancelled
Sep 21 10:35:46 pry crmd: [5601]: info: do_lrm_rsc_op: Performing key=43:52:0:bb6ff40d-ddaf-4b6f-9d79-e06944f1dac5 op=VirtualIP_stop_0 )
Sep 21 10:35:46 pry lrmd: [5598]: info: rsc:VirtualIP:75: stop
Sep 21 10:35:46 pry crmd: [5601]: info: process_lrm_event: LRM operation VirtualIP_monitor_10000 (call=68, status=1, cib-update=0, confirmed=true) Cancelled
Sep 21 10:35:47 pry IPaddr2[14796]: INFO: IP status = ok, IP_CIP=
Sep 21 10:35:47 pry IPaddr2[14796]: INFO: ip -f inet addr delete 128.162.246.8/24 dev eth0
Sep 21 10:35:47 pry crmd: [5601]: info: process_lrm_event: LRM operation VirtualIP_stop_0 (call=75, rc=0, cib-update=260, confirmed=true) ok
Sep 21 10:35:47 pry crmd: [5601]: info: match_graph_event: Action VirtualIP_stop_0 (43) confirmed on pry (rc=0)
Sep 21 10:35:47 pry crmd: [5601]: info: te_rsc_command: Initiating action 40: stop dskmspfs_stop_0 on pry (local)
Sep 21 10:35:47 pry lrmd: [5598]: info: cancel_op: operation monitor[66] on ocf::Filesystem::dskmspfs for client 5601, its parameters: CRM_meta_requires=[fencing] fstype=[xfs] device=[/dev/lxvm/tp9400_11_12_dmfbases5] crm_feature_set=[3.0.2] options=[rw,dmapi,mtpt=/dmf/dskmsp_store] directory=[/dmf/dskmsp_store] CRM_meta_on_fail=[restart] CRM_meta_name=[monitor] CRM_meta_start_delay=[60000] CRM_meta_interval=[20000] CRM_meta_timeout=[100000] cancelled
Sep 21 10:35:47 pry crmd: [5601]: info: do_lrm_rsc_op: Performing key=40:52:0:bb6ff40d-ddaf-4b6f-9d79-e06944f1dac5 op=dskmspfs_stop_0 )
Sep 21 10:35:47 pry lrmd: [5598]: info: rsc:dskmspfs:76: stop
Sep 21 10:35:47 pry crmd: [5601]: info: process_lrm_event: LRM operation dskmspfs_monitor_20000 (call=66, status=1, cib-update=0, confirmed=true) Cancelled


--
        Phil Armstrong       p...@sgi.com
        Phone: 651-683-5561  VNET 233-5561


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
