Hi Andrei,

Thanks for your quick reply. I still need some help; my follow-up questions are inline below.
On Wed, Jun 6, 2018 at 11:58 AM, Andrei Borzenkov <arvidj...@gmail.com> wrote:
> On 06.06.2018 04:27, Albert Weng wrote:
> > Hi All,
> >
> > I have created an active/passive Pacemaker cluster on RHEL 7.
> >
> > Here is my environment:
> > clustera : 192.168.11.1 (passive)
> > clusterb : 192.168.11.2 (master)
> > clustera-ilo4 : 192.168.11.10
> > clusterb-ilo4 : 192.168.11.11
> >
> > Cluster resource status:
> > cluster_fs started on clusterb
> > cluster_vip started on clusterb
> > cluster_sid started on clusterb
> > cluster_listnr started on clusterb
> >
> > Both cluster nodes are online.
> >
> > I found that my corosync.log contains many records like the ones below:
> >
> > clustera pengine: info: determine_online_status_fencing: Node clusterb is active
> > clustera pengine: info: determine_online_status: Node clusterb is online
> > clustera pengine: info: determine_online_status_fencing: Node clustera is active
> > clustera pengine: info: determine_online_status: Node clustera is online
> >
> > clustera pengine: warning: unpack_rsc_op_failure: Processing failed op start for cluster_sid on clustera: unknown error (1)
> > => Question: Why does pengine keep trying to start cluster_sid on the passive node? How do I fix it?
>
> Pacemaker does not have a concept of a "passive" or "master" node - it is up to you to decide that when you configure resource placement. By default Pacemaker will attempt to spread resources across all eligible nodes. You can influence node selection by using constraints. See
> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_deciding_which_nodes_a_resource_can_run_on.html
> for details.
>
> But in any case - all your resources MUST be capable of running on both nodes, otherwise the cluster makes no sense. If one resource A depends on something that another resource B provides and can only be started together with resource B (and after it is ready), you must tell Pacemaker that by using resource colocation and ordering constraints. See the same document for details.
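Just to confirm I understand: the four resources are already members of the group "cluster", which I believe already implies colocation and ordering between its members. If they were not grouped, would explicit constraints along these lines be what you mean? (Only a rough sketch with pcs, reusing the resource names above - please correct me if the syntax or scores are off.)

    pcs constraint colocation add cluster_sid with cluster_fs INFINITY
    pcs constraint colocation add cluster_listnr with cluster_sid INFINITY
    pcs constraint order cluster_fs then cluster_sid
    pcs constraint order cluster_sid then cluster_listnr

If the group already covers that, then I suppose my real problem is not placement but the failed start of cluster_sid on clustera, as below.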
> > clustera pengine: info: native_print: ipmi-fence-clustera (stonith:fence_ipmilan): Started clustera
> > clustera pengine: info: native_print: ipmi-fence-clusterb (stonith:fence_ipmilan): Started clustera
> > clustera pengine: info: group_print: Resource Group: cluster
> > clustera pengine: info: native_print: cluster_fs (ocf::heartbeat:Filesystem): Started clusterb
> > clustera pengine: info: native_print: cluster_vip (ocf::heartbeat:IPaddr2): Started clusterb
> > clustera pengine: info: native_print: cluster_sid (ocf::heartbeat:oracle): Started clusterb
> > clustera pengine: info: native_print: cluster_listnr (ocf::heartbeat:oralsnr): Started clusterb
> > clustera pengine: info: get_failcount_full: cluster_sid has failed INFINITY times on clustera
> >
> > clustera pengine: warning: common_apply_stickiness: Forcing cluster_sid away from clustera after 1000000 failures (max=1000000)
> > => Question: Did too many failed attempts cause the resource to be forbidden from starting on clustera?
>
> Yes.

How do I find out the root cause of the 1000000 failures? Which log will contain the error message?

> > A couple of days ago clusterb was fenced (STONITH) for an unknown reason, and only "cluster_fs" and "cluster_vip" moved to clustera successfully; "cluster_sid" and "cluster_listnr" went to the "STOP" state.
> > As in the messages below - is it related to "op start for cluster_sid on clustera..."?
>
> Yes. Node clustera is now marked as being incapable of running the resource, so if node clusterb fails, the resource cannot be started anywhere.

How can I fix this? I need some hints for troubleshooting.
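Would something along these lines be a sensible way to dig into the start failure and then clear the failcount once the root cause is fixed? (Just a sketch assuming pcs on RHEL 7 and the default log locations on my nodes - the paths may differ.)

    # on clustera: check the failure count and search the logs for the oracle RA errors
    pcs resource failcount show cluster_sid
    grep -i cluster_sid /var/log/cluster/corosync.log
    grep -iE 'oracle|cluster_sid' /var/log/messages

    # run the start operation by hand to see what the oracle resource agent reports
    pcs resource debug-start cluster_sid --full

    # after the underlying problem is fixed, clear the failure history
    pcs resource cleanup cluster_sid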
> > clustera pengine: warning: unpack_rsc_op_failure: Processing failed op start for cluster_sid on clustera: unknown error (1)
> > clustera pengine: info: native_print: ipmi-fence-clustera (stonith:fence_ipmilan): Started clustera
> > clustera pengine: info: native_print: ipmi-fence-clusterb (stonith:fence_ipmilan): Started clustera
> > clustera pengine: info: group_print: Resource Group: cluster
> > clustera pengine: info: native_print: cluster_fs (ocf::heartbeat:Filesystem): Started clusterb (UNCLEAN)
> > clustera pengine: info: native_print: cluster_vip (ocf::heartbeat:IPaddr2): Started clusterb (UNCLEAN)
> > clustera pengine: info: native_print: cluster_sid (ocf::heartbeat:oracle): Started clusterb (UNCLEAN)
> > clustera pengine: info: native_print: cluster_listnr (ocf::heartbeat:oralsnr): Started clusterb (UNCLEAN)
> > clustera pengine: info: get_failcount_full: cluster_sid has failed INFINITY times on clustera
> > clustera pengine: warning: common_apply_stickiness: Forcing cluster_sid away from clustera after 1000000 failures (max=1000000)
> > clustera pengine: info: rsc_merge_weights: cluster_fs: Rolling back scores from cluster_sid
> > clustera pengine: info: rsc_merge_weights: cluster_vip: Rolling back scores from cluster_sid
> > clustera pengine: info: rsc_merge_weights: cluster_sid: Rolling back scores from cluster_listnr
> > clustera pengine: info: native_color: Resource cluster_sid cannot run anywhere
> > clustera pengine: info: native_color: Resource cluster_listnr cannot run anywhere
> > clustera pengine: warning: custom_action: Action cluster_fs_stop_0 on clusterb is unrunnable (offline)
> > clustera pengine: info: RecurringOp: Start recurring monitor (20s) for cluster_fs on clustera
> > clustera pengine: warning: custom_action: Action cluster_vip_stop_0 on clusterb is unrunnable (offline)
> > clustera pengine: info: RecurringOp: Start recurring monitor (10s) for cluster_vip on clustera
> > clustera pengine: warning: custom_action: Action cluster_sid_stop_0 on clusterb is unrunnable (offline)
> > clustera pengine: warning: custom_action: Action cluster_sid_stop_0 on clusterb is unrunnable (offline)
> > clustera pengine: warning: custom_action: Action cluster_listnr_stop_0 on clusterb is unrunnable (offline)
> > clustera pengine: warning: custom_action: Action cluster_listnr_stop_0 on clusterb is unrunnable (offline)
> > clustera pengine: warning: stage6: Scheduling Node clusterb for STONITH
> > clustera pengine: info: native_stop_constraints: cluster_fs_stop_0 is implicit after clusterb is fenced
> > clustera pengine: info: native_stop_constraints: cluster_vip_stop_0 is implicit after clusterb is fenced
> > clustera pengine: info: native_stop_constraints: cluster_sid_stop_0 is implicit after clusterb is fenced
> > clustera pengine: info: native_stop_constraints: cluster_listnr_stop_0 is implicit after clusterb is fenced
> > clustera pengine: info: LogActions: Leave ipmi-fence-db01 (Started clustera)
> > clustera pengine: info: LogActions: Leave ipmi-fence-db02 (Started clustera)
> > clustera pengine: notice: LogActions: Move cluster_fs (Started clusterb -> clustera)
> > clustera pengine: notice: LogActions: Move cluster_vip (Started clusterb -> clustera)
> > clustera pengine: notice: LogActions: Stop cluster_sid (clusterb)
> > clustera pengine: notice: LogActions: Stop cluster_listnr (clusterb)
> > clustera pengine: warning: process_pe_message: Calculated Transition 26821: /var/lib/pacemaker/pengine/pe-warn-7.bz2
> > clustera crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> > clustera crmd: info: do_te_invoke: Processing graph 26821 (ref=pe_calc-dc-1526868653-26882) derived from /var/lib/pacemaker/pengine/pe-warn-7.bz2
> > clustera crmd: notice: te_fence_node: Executing reboot fencing operation (23) on clusterb (timeout=60000)
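Also, if it helps with the diagnosis: I still have the pe-warn file mentioned above. I assume I can replay the transition that scheduled the fencing with something like the command below (a sketch; I have not confirmed the exact options on my Pacemaker version).

    crm_simulate -S -x /var/lib/pacemaker/pengine/pe-warn-7.bz2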
> > Thanks ~~~~

--
Kind regards,
Albert Weng

_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org