Hi,

While testing Pacemaker a little while ago, I ran into the following problem.

* corosync 1.2.1
* Pacemaker-1-0-8463260ff667
* Reusable-Cluster-Components-c447fc25e119
* Cluster-Resource-Agents-f92935082277

The problem is that a monitor failure for the prmFsPostgreSQLDB3-2 resource is reported after the resource has already been stopped. The problem does not occur every time.

[r...@srv02 ~]# crm_mon -1
============
Last updated: Tue Apr 27 11:19:26 2010
Stack: openais
Current DC: srv02 - partition with quorum
Version: 1.0.8-8463260ff6674ad568e6bd3b2a84f03edce257f3
4 Nodes configured, 4 expected votes
13 Resources configured.
============

Online: [ srv02 srv03 srv04 ]
OFFLINE: [ srv01 ]

 Resource Group: grpStonith1
     prmStonithN1       (stonith:external/ssh): Started srv04
 Resource Group: grpStonith2
     prmStonithN2       (stonith:external/ssh): Started srv03
 Resource Group: grpStonith3
     prmStonithN3       (stonith:external/ssh): Started srv02
 Resource Group: grpStonith4
     prmStonithN4       (stonith:external/ssh): Started srv03
 Clone Set: clnUMgroup01
     Started: [ srv04 ]
     Stopped: [ clnUmResource:0 ]
 Clone Set: clnDiskd1
     Started: [ srv02 srv03 srv04 ]
     Stopped: [ clnPrmDiskd1:2 ]
 Clone Set: clnG3dummy1
     Started: [ srv02 srv03 srv04 ]
     Stopped: [ clnG3dummy01:0 ]
 Clone Set: clnG3dummy2
     Started: [ srv02 srv03 srv04 ]
     Stopped: [ clnG3dummy02:0 ]

Failed actions:
    prmFsPostgreSQLDB3-2_monitor_10000 (node=srv03, call=49, rc=7, status=complete): not running
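As a side note, until the underlying bug is fixed I suppose the stale failed action (and the fail count it raised) has to be cleared by hand so that it does not influence placement. A minimal example, assuming the 1.0-series option names shown here (-U/-H for the node; newer tools use -N):

# show the fail count that the late monitor result left behind
crm_failcount -G -r prmFsPostgreSQLDB3-2 -U srv03

# clear the failed action and reset the fail count on srv03
crm_resource -C -r prmFsPostgreSQLDB3-2 -H srv03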
I think the cause is that the removal of the monitor operation from lrmd's operation list somehow does not go well; possibly it is triggered by the handling of a postponed monitor. The real problem is that the fail count is incremented by a monitor failure that is reported after the resource has already stopped, and this may cause strange resource placement.
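Since the failure seems to require that the stop request arrives while a monitor is still in flight, the race window is narrow, which would explain why it does not always occur. In case it helps with reproduction, here is a rough sketch of one way the window could be widened; SlowDummy is a hypothetical wrapper I am making up for illustration, and the OCF path may differ on your installation:

# Wrapper agent: behaves exactly like Dummy, but holds the monitor
# action open long enough for a stop to be scheduled while the
# monitor is still running inside lrmd.
cat > /usr/lib/ocf/resource.d/heartbeat/SlowDummy <<'EOF'
#!/bin/sh
if [ "$1" = "monitor" ]; then
    sleep 15   # keep the monitor in flight well past the stop request
fi
exec /usr/lib/ocf/resource.d/heartbeat/Dummy "$@"
EOF
chmod 755 /usr/lib/ocf/resource.d/heartbeat/SlowDummy

Stopping a resource of this type while its monitor is sleeping should take lrmd through the same cancel_op / "flush delayed" / "postponing all ops" sequence that appears in the log below.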
The relevant part of the srv03 log is below. Note that the stop of prmFsPostgreSQLDB3-2 completes at 11:09:20 (call=60), but the cancelled monitor (call=49) still comes back at 11:09:31 with rc=7 and the fail count is incremented:

-----------------------------
Apr 27 11:09:19 srv03 lrmd: [24383]: info: rsc:prmFsPostgreSQLDB3-3:59: stop
Apr 27 11:09:19 srv03 crmd: [24386]: info: process_lrm_event: LRM operation prmFsPostgreSQLDB3-3_monitor_10000 (call=51, status=1, cib-update=0, confirmed=true) Cancelled
Apr 27 11:09:19 srv03 lrmd: [24383]: info: Managed prmFsPostgreSQLDB3-3:stop process 26210 exited with return code 0.
Apr 27 11:09:19 srv03 crmd: [24386]: info: process_lrm_event: LRM operation prmFsPostgreSQLDB3-3_stop_0 (call=59, rc=0, cib-update=65, confirmed=true) ok
Apr 27 11:09:20 srv03 lrmd: [24383]: info: cancel_op: operation monitor[49] on ocf::Dummy::prmFsPostgreSQLDB3-2 for client 24386, its parameters: CRM_meta_interval=[10000] CRM_meta_on_fail=[restart] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] CRM_meta_name=[monitor] cancelled
Apr 27 11:09:20 srv03 lrmd: [24383]: info: flush_op: process for operation monitor[49] on ocf::Dummy::prmFsPostgreSQLDB3-2 for client 24386 still running, flush delayed
Apr 27 11:09:20 srv03 crmd: [24386]: info: do_lrm_rsc_op: Performing key=88:8:0:2587fade-309b-454d-8a81-19e8884439ec op=prmFsPostgreSQLDB3-2_stop_0 )
Apr 27 11:09:20 srv03 lrmd: [24383]: info: perform_op:2873: operation monitor[49] on ocf::Dummy::prmFsPostgreSQLDB3-2 for client 24386, its parameters: CRM_meta_interval=[10000] CRM_meta_on_fail=[restart] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] CRM_meta_name=[monitor] for rsc is already running.
Apr 27 11:09:20 srv03 lrmd: [24383]: info: perform_op:2883: postponing all ops on resource prmFsPostgreSQLDB3-2 by 1000 ms
Apr 27 11:09:20 srv03 lrmd: [24383]: info: rsc:prmFsPostgreSQLDB3-2:60: stop
Apr 27 11:09:20 srv03 lrmd: [24383]: info: Managed prmFsPostgreSQLDB3-2:stop process 26225 exited with return code 0.
Apr 27 11:09:20 srv03 crmd: [24386]: info: process_lrm_event: LRM operation prmFsPostgreSQLDB3-2_stop_0 (call=60, rc=0, cib-update=66, confirmed=true) ok
Apr 27 11:09:20 srv03 lrmd: [24383]: info: cancel_op: operation monitor[47] on ocf::Dummy::prmFsPostgreSQLDB3-1 for client 24386, its parameters: CRM_meta_interval=[10000] CRM_meta_on_fail=[restart] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] CRM_meta_name=[monitor] cancelled
Apr 27 11:09:20 srv03 crmd: [24386]: info: do_lrm_rsc_op: Performing key=87:8:0:2587fade-309b-454d-8a81-19e8884439ec op=prmFsPostgreSQLDB3-1_stop_0 )
Apr 27 11:09:20 srv03 lrmd: [24383]: info: rsc:prmFsPostgreSQLDB3-1:61: stop
Apr 27 11:09:20 srv03 crmd: [24386]: info: process_lrm_event: LRM operation prmFsPostgreSQLDB3-1_monitor_10000 (call=47, status=1, cib-update=0, confirmed=true) Cancelled
Apr 27 11:09:20 srv03 lrmd: [24383]: info: Managed prmFsPostgreSQLDB3-1:stop process 26233 exited with return code 0.
Apr 27 11:09:20 srv03 crmd: [24386]: info: process_lrm_event: LRM operation prmFsPostgreSQLDB3-1_stop_0 (call=61, rc=0, cib-update=67, confirmed=true) ok
Apr 27 11:09:21 srv03 lrmd: [24383]: info: cancel_op: operation monitor[45] on ocf::Dummy::prmExPostgreSQLDB3 for client 24386, its parameters: CRM_meta_interval=[10000] CRM_meta_on_fail=[restart] CRM_meta_timeout=[90000] crm_feature_set=[3.0.1] CRM_meta_name=[monitor] cancelled
Apr 27 11:09:21 srv03 crmd: [24386]: info: do_lrm_rsc_op: Performing key=86:8:0:2587fade-309b-454d-8a81-19e8884439ec op=prmExPostgreSQLDB3_stop_0 )
Apr 27 11:09:21 srv03 lrmd: [24383]: info: rsc:prmExPostgreSQLDB3:62: stop
Apr 27 11:09:21 srv03 crmd: [24386]: info: process_lrm_event: LRM operation prmExPostgreSQLDB3_monitor_10000 (call=45, status=1, cib-update=0, confirmed=true) Cancelled
Apr 27 11:09:21 srv03 lrmd: [24383]: info: Managed prmExPostgreSQLDB3:stop process 26241 exited with return code 0.
Apr 27 11:09:21 srv03 crmd: [24386]: info: process_lrm_event: LRM operation prmExPostgreSQLDB3_stop_0 (call=62, rc=0, cib-update=68, confirmed=true) ok
Apr 27 11:09:31 srv03 crmd: [24386]: info: process_lrm_event: LRM operation prmFsPostgreSQLDB3-2_monitor_10000 (call=49, rc=7, cib-update=69, confirmed=false) not running
Apr 27 11:09:31 srv03 attrd: [24384]: info: attrd_ais_dispatch: Update relayed from srv01
Apr 27 11:09:31 srv03 attrd: [24384]: info: find_hash_entry: Creating hash entry for fail-count-prmFsPostgreSQLDB3-2
Apr 27 11:09:31 srv03 attrd: [24384]: info: attrd_local_callback: Expanded fail-count-prmFsPostgreSQLDB3-2=value++ to 1
Apr 27 11:09:31 srv03 attrd: [24384]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-prmFsPostgreSQLDB3-2 (1)
Apr 27 11:09:31 srv03 attrd: [24384]: info: attrd_perform_update: Sent update 52: fail-count-prmFsPostgreSQLDB3-2=1
-----------------------------

I have not confirmed whether this problem still occurs with the most recent version. It may also be a problem that somebody has already reported.

Best Regards,
Hideo Yamauchi.