[ClusterLabs] Questions regarding crm_mon tool
Greetings,

We are implementing an HA cluster solution and, as part of it, are using crm_mon. Part of my job is to write training materials for certain things. I am running into a problem defining some of the information that is printed when running the crm_mon command. One of my principal engineers responded with this comment:

crm_mon
* online -> explain the different states online, offline, pending and standby (there might be more)
* last updated/change (lookup what these mean but i think one of them shows when the cluster last had a config change or maybe changed states)

My dilemma is that I can't seem to find this in the help system or anywhere else. I'm hoping that you can assist me with this.

Thanks in advance,

Brian K. Vagnini
Training & Documentation Coordinator
Tallahassee office 850-270-0387
Slack (@bkvagnini)
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
[ClusterLabs] Keep printing "Sent 0 CPG messages" in corosync.log
corosync.log has been printing the following messages for several days. What is wrong with the corosync cluster? The CPU load is currently not high.

Cluster version information:

[root@paas-controller-172-167-40-24:~]$ rpm -q corosync
corosync-2.4.0-9.el7_4.2.x86_64
[root@paas-controller-172-167-40-24:~]$ rpm -q pacemaker
pacemaker-1.1.16-12.el7_4.2.x86_64

Sep 30 01:23:27 [128232] paas-controller-172-21-0-2 cib: info: crm_cs_flush: Sent 0 CPG messages (13 remaining, last=363): Try again (6)
Sep 30 01:23:28 [128234] paas-controller-172-21-0-2 attrd: info: crm_cs_flush: Sent 0 CPG messages (13 remaining, last=14519): Try again (6)
Sep 30 01:23:28 [128232] paas-controller-172-21-0-2 cib: info: crm_cs_flush: Sent 0 CPG messages (13 remaining, last=363): Try again (6)
Sep 30 01:23:28 [128234] paas-controller-172-21-0-2 attrd: info: crm_cs_flush: Sent 0 CPG messages (13 remaining, last=14519): Try again (6)
Sep 30 01:23:28 [128232] paas-controller-172-21-0-2 cib: info: crm_cs_flush: Sent 0 CPG messages (13 remaining, last=363): Try again (6)
Sep 30 01:23:28 [128234] paas-controller-172-21-0-2 attrd: info: crm_cs_flush: Sent 0 CPG messages (13 remaining, last=14519): Try again (6)
Sep 30 01:23:28 [128232] paas-controller-172-21-0-2 cib: info: crm_cs_flush: Sent 0 CPG messages (13 remaining, last=363): Try again (6)
Sep 30 01:23:28 [127667] paas-controller-172-21-0-2 corosync warning [MAIN ] timer_function_scheduler_timeout Corosync main process was not scheduled for 10470.3652 ms (threshold is 2400. ms). Consider token timeout increase.
Sep 30 01:23:29 [128234] paas-controller-172-21-0-2 attrd: info: crm_cs_flush: Sent 0 CPG messages (13 remaining, last=14519): Try again (6)
Sep 30 01:23:29 [128232] paas-controller-172-21-0-2 cib: info: crm_cs_flush: Sent 0 CPG messages (13 remaining, last=363): Try again (6)
Sep 30 01:23:29 [128234] paas-controller-172-21-0-2 attrd: info: crm_cs_flush: Sent 0 CPG messages (13 remaining, last=14519): Try again (6)
Sep 30 01:23:29 [128232] paas-controller-172-21-0-2 cib: info: crm_cs_flush: Sent 0 CPG messages (13 remaining, last=363): Try again (6)
Sep 30 01:23:29 [128234] paas-controller-172-21-0-2 attrd: info: crm_cs_flush: Sent 0 CPG messages (13 remaining, last=14519): Try again (6)
Sep 30 01:23:29 [128232] paas-controller-172-21-0-2 cib: info: crm_cs_flush: Sent 0 CPG messages (13 remaining, last=363): Try again (6)
Sep 30 01:23:29 [127667] paas-controller-172-21-0-2 corosync notice [TOTEM ] pause_flush Process pause detected for 8760 ms, flushing membership messages.
Sep 30 01:23:30 [128234] paas-controller-172-21-0-2 attrd: info: crm_cs_flush: Sent 0 CPG messages (13 remaining, last=14519): Try again (6)
Sep 30 01:23:30 [128232] paas-controller-172-21-0-2 cib: info: crm_cs_flush: Sent 0 CPG messages (13 remaining, last=363): Try again (6)
Sep 30 01:23:30 [128234] paas-controller-172-21-0-2 attrd: info: crm_cs_flush: Sent 0 CPG messages (13 remaining, last=14519): Try again (6)
Sep 30 01:23:30 [128232] paas-controller-172-21-0-2 cib: info: crm_cs_flush: Sent 0 CPG messages (13 remaining, last=363): Try again (6)
Sep 30 01:23:30 [128234] paas-controller-172-21-0-2 attrd: info: crm_cs_flush: Sent 0 CPG messages (13 remaining, last=14519): Try again (6)
Sep 30 01:23:30 [128232] paas-controller-172-21-0-2 cib: info: crm_cs_flush: Sent 0 CPG messages (13 remaining, last=363): Try again (6)
Sep 30 01:23:31 [128234] paas-controller-172-21-0-2 attrd: info: crm_cs_flush: Sent 0 CPG messages (13 remaining, last=14519): Try again (6)
Sep 30 01:23:31 [128232] paas-controller-172-21-0-2 cib: info: crm_cs_flush: Sent 0 CPG messages (13 remaining, last=363): Try again (6)
Sep 30 01:23:31 [128234] paas-controller-172-21-0-2 attrd: info: crm_cs_flush: Sent 0 CPG messages (13 remaining, last=14519): Try again (6)
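For anyone triaging an excerpt like the one above, here is a minimal Python sketch that condenses it: it counts the "Sent 0 CPG messages" lines per daemon and collects the pause durations that corosync itself reports. The regular expressions are derived only from the lines quoted in this post, the daemon names are limited to the two seen there (cib and attrd), and the names `summarize`, `FLUSH_RE`, `PAUSE_RE` and `SAMPLE` are hypothetical helpers, not part of Pacemaker or Corosync. Other versions may format these lines differently.

```python
import re
from collections import Counter

# Assumption: line layout as quoted in this post; only cib and attrd are matched.
FLUSH_RE = re.compile(
    r"\[(?P<pid>\d+)\].*?(?P<daemon>cib|attrd):\s+info:\s+crm_cs_flush:\s+"
    r"Sent (?P<sent>\d+) CPG messages\s+\((?P<remaining>\d+) remaining"
)

# Matches both corosync messages in the excerpt that report a pause in ms.
PAUSE_RE = re.compile(
    r"(?:not scheduled for|Process pause detected for) (?P<ms>\d+(?:\.\d+)?) ms"
)


def summarize(lines):
    """Return (stalled, backlog, pauses) for an iterable of log lines.

    stalled: Counter of (daemon, pid) -> number of flushes that sent 0 messages
    backlog: dict of (daemon, pid) -> most recent "remaining" queue length
    pauses:  list of pause durations in milliseconds, in order of appearance
    """
    stalled = Counter()
    backlog = {}
    pauses = []
    for line in lines:
        flush = FLUSH_RE.search(line)
        if flush:
            key = (flush["daemon"], int(flush["pid"]))
            if int(flush["sent"]) == 0:
                stalled[key] += 1
            backlog[key] = int(flush["remaining"])
            continue
        pause = PAUSE_RE.search(line)
        if pause:
            pauses.append(float(pause["ms"]))
    return stalled, backlog, pauses


# Four lines taken from the excerpt in this post.
SAMPLE = [
    "Sep 30 01:23:27 [128232] paas-controller-172-21-0-2 cib: info: crm_cs_flush: "
    "Sent 0 CPG messages (13 remaining, last=363): Try again (6)",
    "Sep 30 01:23:28 [128234] paas-controller-172-21-0-2 attrd: info: crm_cs_flush: "
    "Sent 0 CPG messages (13 remaining, last=14519): Try again (6)",
    "Sep 30 01:23:28 [127667] paas-controller-172-21-0-2 corosync warning [MAIN ] "
    "timer_function_scheduler_timeout Corosync main process was not scheduled for "
    "10470.3652 ms (threshold is 2400. ms). Consider token timeout increase.",
    "Sep 30 01:23:29 [127667] paas-controller-172-21-0-2 corosync notice [TOTEM ] "
    "pause_flush Process pause detected for 8760 ms, flushing membership messages.",
]

if __name__ == "__main__":
    stalled, backlog, pauses = summarize(SAMPLE)
    for (daemon, pid), count in sorted(stalled.items()):
        print(f"{daemon}[{pid}]: {count} stalled flushes, backlog {backlog[(daemon, pid)]}")
    print("pauses (ms):", pauses)
```

To run it against a real file, pass an open file object instead of `SAMPLE`, for example `summarize(open(path))` with `path` pointing at your own corosync.log. The script only tallies what the log already says; it does not diagnose why the corosync main process was not scheduled.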
[ClusterLabs] The crmd process exited: Generic Pacemaker error (201)
Version information:

[root@paas-controller-172-167-40-24:~]$ rpm -q corosync
corosync-2.4.0-9.el7_4.2.x86_64
[root@paas-controller-172-167-40-24:~]$ rpm -q pacemaker
pacemaker-1.1.16-12.el7_4.2.x86_64

The crmd process exited with error code 201. The pacemakerd process tried to fork it 100 times, which exceeded the threshold, and after that crmd stayed down for good.

I have two questions. First, why does the crmd process exit? Second, can I configure the threshold for the number of retries?

Thank you very much!

Here is the log of the last attempt to fork the crmd process:

Sep 08 18:10:09 [28446] paas-controller-172-167-40-24 pacemakerd: error: pcmk_child_exit: The crmd process (83749) exited: Generic Pacemaker error (201)
Sep 08 18:10:09 [28446] paas-controller-172-167-40-24 pacemakerd: notice: pcmk_process_exit: Respawning failed child process: crmd
Sep 08 18:10:09 [28446] paas-controller-172-167-40-24 pacemakerd: info: start_child: Using uid=189 and group=189 for process crmd
Sep 08 18:10:09 [28446] paas-controller-172-167-40-24 pacemakerd: info: start_child: Forked child 88033 for process crmd
Sep 08 18:10:09 [28446] paas-controller-172-167-40-24 pacemakerd: info: mcp_cpg_deliver: Ignoring process list sent by peer for local node
Sep 08 18:10:09 [28446] paas-controller-172-167-40-24 pacemakerd: info: mcp_cpg_deliver: Ignoring process list sent by peer for local node
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: crm_log_init: Changed active directory to /var/lib/pacemaker/cores
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: main: CRM Git Version: 1.1.16-12.el7_4.2 (94ff4df)
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: do_log: Input I_STARTUP received in state S_STARTING from crmd_init
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: get_cluster_type: Verifying cluster type: 'corosync'
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: get_cluster_type: Assuming an active 'corosync' cluster
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: do_cib_control: CIB connection established
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: crm_get_peer: Created entry ebd1fc7d-5c48-4c81-85ec-bad8a3f6fcb1/0x7fe04dec49a0 for node 172.167.40.24/167040024 (1 total)
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: crm_get_peer: Node 167040024 is now known as 172.167.40.24
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: peer_update_callback: 172.167.40.24 is now in unknown state
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: crm_get_peer: Node 167040024 has uuid 167040024
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: crm_update_peer_proc: cluster_connect_cpg: Node 172.167.40.24[167040024] - corosync-cpg is now online
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: peer_update_callback: Client 172.167.40.24/peer now has status [online] (DC=, changed=400)
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: init_cs_connection_once: Connection to 'corosync': established
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: notice: cluster_connect_quorum: Quorum acquired
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: do_ha_control: Connected to the cluster
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: lrmd_ipc_connect: Connecting to lrmd
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: do_lrm_control: LRM connection established
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: do_started: Delaying start, no membership data (0010)
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: do_started: Delaying start, no membership data (0010)
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: pcmk_quorum_notification: Quorum retained membership=4 members=1
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: notice: crm_update_peer_state_iter: Node 172.167.40.24 state is now member nodeid=167040024 previous=unknown source=pcmk_quorum_notification
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: peer_update_callback: Cluster node 172.167.40.24 is now member (was in unknown state)
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24 crmd: info: do_started: Delaying start, Config not read