[ClusterLabs] Questions regarding crm_mon tool

2018-09-29 Thread Brian Vagnini
Greetings,
We are implementing an HA cluster solution and, as part of it, are using 
crm_mon. Part of my job is to document training materials for certain things. I 
am having trouble defining some of the information that is output when running 
the crm_mon command.

One of my principal engineers responded with this comment:

crm_mon

  *   online -> explain the different states: online, offline, pending and 
standby (there might be more)
  *   last updated/change -> look up what these mean, but I think one of them 
shows when the cluster last had a configuration change, or maybe changed state
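For the training notes, it may help to show where these fields come from. The sketch below is a minimal illustration, not crm_mon's own logic: it parses a hand-written, abbreviated sample modeled on `crm_mon --as-xml` output from Pacemaker 1.1. Our understanding is that "Last updated" is when that crm_mon snapshot was produced, while "Last change" is when the CIB configuration was last written, together with the user, client and node that wrote it; please verify this against `man crm_mon` for your version. The helpers `summarize()` and `node_state()`, the sample XML, and the precedence used to collapse the node flags into a single label are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated sample modeled on `crm_mon --as-xml` (Pacemaker 1.1).
# Real output contains many more elements and attributes.
SAMPLE = """<crm_mon version="1.1.16">
  <summary>
    <last_update time="Sat Sep 29 10:00:00 2018" />
    <last_change time="Fri Sep 28 16:42:10 2018" user="root"
                 client="cibadmin" origin="node1" />
  </summary>
  <nodes>
    <node name="node1" online="true" standby="false" pending="false" />
    <node name="node2" online="true" standby="true" pending="false" />
    <node name="node3" online="false" standby="false" pending="true" />
    <node name="node4" online="false" standby="false" pending="false" />
  </nodes>
</crm_mon>"""


def _flag(node, name):
    """True if the boolean XML attribute is set to "true"."""
    return node.get(name) == "true"


def node_state(node):
    """Collapse a node's boolean attributes into one label.

    Assumed precedence for training purposes (pending, then standby,
    then online/offline); this is not crm_mon's exact logic.
    """
    if _flag(node, "pending"):
        return "pending"
    if _flag(node, "standby"):
        return "standby" if _flag(node, "online") else "offline (standby)"
    return "online" if _flag(node, "online") else "offline"


def summarize(xml_text):
    root = ET.fromstring(xml_text)
    summary = root.find("summary")
    change = summary.find("last_change")
    return {
        # When this status snapshot was generated.
        "last_updated": summary.find("last_update").get("time"),
        # When the CIB configuration was last written, and by whom/what/where.
        "last_change": (change.get("time"), change.get("user"),
                        change.get("client"), change.get("origin")),
        # Only direct children of <nodes>: resource entries may also
        # contain <node> elements, which must not be counted here.
        "nodes": {n.get("name"): node_state(n)
                  for n in root.findall("nodes/node")},
    }


if __name__ == "__main__":
    for key, value in summarize(SAMPLE).items():
        print(key, value)
```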
My dilemma is that I can't find this information in the help system or anywhere 
else. I'm hoping that you can assist me with this.

Thanks in advance,

Brian K. Vagnini
Training & Documentation Coordinator
Tallahassee office
850-270-0387
Slack (@bkvagnini)

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Keep printing "Sent 0 CPG messages" in corosync.log

2018-09-29 Thread lkxjtu


corosync.log has been printing the following messages for several days. What is 
wrong with the corosync cluster? The CPU load is not high at the moment.

Cluster version information:
[root@paas-controller-172-167-40-24:~]$ rpm -q corosync
corosync-2.4.0-9.el7_4.2.x86_64
[root@paas-controller-172-167-40-24:~]$ rpm -q pacemaker
pacemaker-1.1.16-12.el7_4.2.x86_64



Sep 30 01:23:27 [128232] paas-controller-172-21-0-2 cib: info: 
crm_cs_flush: Sent 0 CPG messages  (13 remaining, last=363): Try again (6)
Sep 30 01:23:28 [128234] paas-controller-172-21-0-2  attrd: info: 
crm_cs_flush: Sent 0 CPG messages  (13 remaining, last=14519): Try again (6)
Sep 30 01:23:28 [128232] paas-controller-172-21-0-2 cib: info: 
crm_cs_flush: Sent 0 CPG messages  (13 remaining, last=363): Try again (6)
Sep 30 01:23:28 [128234] paas-controller-172-21-0-2  attrd: info: 
crm_cs_flush: Sent 0 CPG messages  (13 remaining, last=14519): Try again (6)
Sep 30 01:23:28 [128232] paas-controller-172-21-0-2 cib: info: 
crm_cs_flush: Sent 0 CPG messages  (13 remaining, last=363): Try again (6)
Sep 30 01:23:28 [128234] paas-controller-172-21-0-2  attrd: info: 
crm_cs_flush: Sent 0 CPG messages  (13 remaining, last=14519): Try again (6)
Sep 30 01:23:28 [128232] paas-controller-172-21-0-2 cib: info: 
crm_cs_flush: Sent 0 CPG messages  (13 remaining, last=363): Try again (6)
Sep 30 01:23:28 [127667] paas-controller-172-21-0-2 corosync warning [MAIN  ] 
timer_function_scheduler_timeout Corosync main process was not scheduled for 
10470.3652 ms (threshold is 2400. ms). Consider token timeout increase.
Sep 30 01:23:29 [128234] paas-controller-172-21-0-2  attrd: info: 
crm_cs_flush: Sent 0 CPG messages  (13 remaining, last=14519): Try again (6)
Sep 30 01:23:29 [128232] paas-controller-172-21-0-2 cib: info: 
crm_cs_flush: Sent 0 CPG messages  (13 remaining, last=363): Try again (6)
Sep 30 01:23:29 [128234] paas-controller-172-21-0-2  attrd: info: 
crm_cs_flush: Sent 0 CPG messages  (13 remaining, last=14519): Try again (6)
Sep 30 01:23:29 [128232] paas-controller-172-21-0-2 cib: info: 
crm_cs_flush: Sent 0 CPG messages  (13 remaining, last=363): Try again (6)
Sep 30 01:23:29 [128234] paas-controller-172-21-0-2  attrd: info: 
crm_cs_flush: Sent 0 CPG messages  (13 remaining, last=14519): Try again (6)
Sep 30 01:23:29 [128232] paas-controller-172-21-0-2 cib: info: 
crm_cs_flush: Sent 0 CPG messages  (13 remaining, last=363): Try again (6)
Sep 30 01:23:29 [127667] paas-controller-172-21-0-2 corosync notice  [TOTEM ] 
pause_flush Process pause detected for 8760 ms, flushing membership messages.
Sep 30 01:23:30 [128234] paas-controller-172-21-0-2  attrd: info: 
crm_cs_flush: Sent 0 CPG messages  (13 remaining, last=14519): Try again (6)
Sep 30 01:23:30 [128232] paas-controller-172-21-0-2 cib: info: 
crm_cs_flush: Sent 0 CPG messages  (13 remaining, last=363): Try again (6)
Sep 30 01:23:30 [128234] paas-controller-172-21-0-2  attrd: info: 
crm_cs_flush: Sent 0 CPG messages  (13 remaining, last=14519): Try again (6)
Sep 30 01:23:30 [128232] paas-controller-172-21-0-2 cib: info: 
crm_cs_flush: Sent 0 CPG messages  (13 remaining, last=363): Try again (6)
Sep 30 01:23:30 [128234] paas-controller-172-21-0-2  attrd: info: 
crm_cs_flush: Sent 0 CPG messages  (13 remaining, last=14519): Try again (6)
Sep 30 01:23:30 [128232] paas-controller-172-21-0-2 cib: info: 
crm_cs_flush: Sent 0 CPG messages  (13 remaining, last=363): Try again (6)
Sep 30 01:23:31 [128234] paas-controller-172-21-0-2  attrd: info: 
crm_cs_flush: Sent 0 CPG messages  (13 remaining, last=14519): Try again (6)
Sep 30 01:23:31 [128232] paas-controller-172-21-0-2 cib: info: 
crm_cs_flush: Sent 0 CPG messages  (13 remaining, last=363): Try again (6)
Sep 30 01:23:31 [128234] paas-controller-172-21-0-2  attrd: info: 
crm_cs_flush: Sent 0 CPG messages  (13 remaining, last=14519): Try again (6)
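To gauge how persistent the stall is, one option is to tally the retries per daemon and pull out corosync's scheduler-pause warnings. As far as we know, the `(6)` at the end of each message is corosync's `CS_ERR_TRY_AGAIN` return code: corosync did not accept the queued CPG messages at that moment, so Pacemaker keeps retrying. The helper `scan()` and its regular expressions below are hypothetical, assume one log entry per line (as in the raw corosync.log rather than the wrapped excerpt above), and only summarize the symptom; they do not diagnose the cause.

```python
import re
from collections import Counter

# Pacemaker's crm_cs_flush retry message, in the format of the log above.
FLUSH_RE = re.compile(
    r"\[(?P<pid>\d+)\]\s+\S+\s+(?P<daemon>\w+):\s+info:\s+crm_cs_flush:\s+"
    r"Sent 0 CPG messages\s+\((?P<remaining>\d+) remaining.*Try again \(6\)"
)
# corosync's "main process was not scheduled" warning.
PAUSE_RE = re.compile(r"not scheduled for (?P<ms>[\d.]+) ms")


def scan(lines):
    """Count stalled CPG flushes per daemon and collect pause durations (ms)."""
    stalls = Counter()
    pauses = []
    for line in lines:
        m = FLUSH_RE.search(line)
        if m:
            stalls[m.group("daemon")] += 1
            continue
        m = PAUSE_RE.search(line)
        if m:
            pauses.append(float(m.group("ms")))
    return stalls, pauses


if __name__ == "__main__":
    import sys
    # Usage: python scan_cpg.py < /var/log/cluster/corosync.log (path assumed)
    stalls, pauses = scan(sys.stdin)
    print("stalled flushes per daemon:", dict(stalls))
    print("scheduler pauses (ms):", pauses)
```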


[ClusterLabs] The crmd process exited: Generic Pacemaker error (201)

2018-09-29 Thread lkxjtu


Version information
[root@paas-controller-172-167-40-24:~]$ rpm -q corosync
corosync-2.4.0-9.el7_4.2.x86_64
[root@paas-controller-172-167-40-24:~]$ rpm -q pacemaker
pacemaker-1.1.16-12.el7_4.2.x86_64

The crmd process exited with error code 201. The pacemakerd process respawned 
crmd 100 times until it exceeded the retry threshold, after which crmd stayed 
down permanently. Here is the log of the last attempt to fork the crmd process.

I have two questions. First, why does the crmd process exit? Second, can I set 
the threshold for the number of retries? Thank you very much!
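The behaviour described above (restart a failed child, give up after a fixed number of attempts) can be illustrated with a bounded respawn loop. This is a hypothetical sketch of the general policy, not pacemakerd's code; `supervise()` and its `max_respawn` parameter are invented for the example, and the thread does not establish whether pacemakerd's own limit is configurable. It also shows why raising the limit alone would not help: a child that always exits with 201 simply fails more times before the supervisor gives up.

```python
import subprocess
import sys


def supervise(argv, max_respawn):
    """Run argv, restarting it after each non-zero exit, at most max_respawn times.

    Hypothetical illustration of a bounded respawn policy (not pacemakerd).
    Returns (last_exit_code, restarts_used).
    """
    restarts = 0
    while True:
        code = subprocess.run(argv).returncode
        if code == 0:
            return code, restarts
        if restarts >= max_respawn:
            # Limit reached: stop restarting for good.
            return code, restarts
        restarts += 1


if __name__ == "__main__":
    # A child that always fails with exit status 201, like crmd in the log.
    failing_child = [sys.executable, "-c", "import sys; sys.exit(201)"]
    print(supervise(failing_child, max_respawn=3))
```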



Sep 08 18:10:09 [28446] paas-controller-172-167-40-24 pacemakerd:    error: 
pcmk_child_exit: The crmd process (83749) exited: Generic Pacemaker error 
(201)
Sep 08 18:10:09 [28446] paas-controller-172-167-40-24 pacemakerd:   notice: 
pcmk_process_exit:  Respawning failed child process: crmd
Sep 08 18:10:09 [28446] paas-controller-172-167-40-24 pacemakerd: info: 
start_child: Using uid=189 and group=189 for process crmd
Sep 08 18:10:09 [28446] paas-controller-172-167-40-24 pacemakerd: info: 
start_child: Forked child 88033 for process crmd
Sep 08 18:10:09 [28446] paas-controller-172-167-40-24 pacemakerd: info: 
mcp_cpg_deliver: Ignoring process list sent by peer for local node
Sep 08 18:10:09 [28446] paas-controller-172-167-40-24 pacemakerd: info: 
mcp_cpg_deliver: Ignoring process list sent by peer for local node
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24   crmd: info: 
crm_log_init:   Changed active directory to /var/lib/pacemaker/cores
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24   crmd: info: 
main:   CRM Git Version: 1.1.16-12.el7_4.2 (94ff4df)
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24   crmd: info: 
do_log: Input I_STARTUP received in state S_STARTING from crmd_init
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24   crmd: info: 
get_cluster_type:   Verifying cluster type: 'corosync'
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24   crmd: info: 
get_cluster_type:   Assuming an active 'corosync' cluster
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24   crmd: info: 
do_cib_control: CIB connection established
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24   crmd:   notice: 
crm_cluster_connect: Connecting to cluster infrastructure: corosync
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24   crmd: info: 
crm_get_peer:   Created entry 
ebd1fc7d-5c48-4c81-85ec-bad8a3f6fcb1/0x7fe04dec49a0 for node 
172.167.40.24/167040024 (1 total)
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24   crmd: info: 
crm_get_peer:   Node 167040024 is now known as 172.167.40.24
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24   crmd: info: 
peer_update_callback:   172.167.40.24 is now in unknown state
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24   crmd: info: 
crm_get_peer:   Node 167040024 has uuid 167040024
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24   crmd: info: 
crm_update_peer_proc:   cluster_connect_cpg: Node 172.167.40.24[167040024] 
- corosync-cpg is now online
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24   crmd: info: 
peer_update_callback:   Client 172.167.40.24/peer now has status [online] 
(DC=, changed=400)
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24   crmd: info: 
init_cs_connection_once: Connection to 'corosync': established
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24   crmd:   notice: 
cluster_connect_quorum: Quorum acquired
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24   crmd: info: 
do_ha_control:  Connected to the cluster
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24   crmd: info: 
lrmd_ipc_connect:   Connecting to lrmd
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24   crmd: info: 
do_lrm_control: LRM connection established
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24   crmd: info: 
do_started: Delaying start, no membership data (0010)
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24   crmd: info: 
do_started: Delaying start, no membership data (0010)
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24   crmd: info: 
pcmk_quorum_notification:   Quorum retained  membership=4 members=1
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24   crmd:   notice: 
crm_update_peer_state_iter: Node 172.167.40.24 state is now member  
nodeid=167040024 previous=unknown source=pcmk_quorum_notification
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24   crmd: info: 
peer_update_callback:   Cluster node 172.167.40.24 is now member (was in 
unknown state)
Sep 08 18:10:09 [88033] paas-controller-172-167-40-24   crmd: info: 
do_started: Delaying start, Config not read
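The excerpt stops at "Delaying start, Config not read", so it does not show what crmd did right before exiting. When going through the full log, a small filter can list every child exit reported by pacemakerd and the most recent reason crmd gave for delaying its start. The sketch below is hypothetical (`startup_report()` and its patterns are invented for this example), assumes one log entry per line as in the raw log file, and matches only the message texts visible in this thread.

```python
import re

# pacemakerd's child-exit message, e.g.
# "The crmd process (83749) exited: Generic Pacemaker error (201)"
EXIT_RE = re.compile(
    r"The (?P<child>\w+) process \((?P<pid>\d+)\) exited: "
    r"(?P<reason>.+) \((?P<code>\d+)\)"
)
# crmd's startup-delay message, e.g. "do_started: Delaying start, Config not read"
DELAY_RE = re.compile(r"do_started:\s+Delaying start, (?P<why>.+)$")


def startup_report(lines):
    """Return (list of (child, exit_code, reason), last 'Delaying start' reason)."""
    exits, last_delay = [], None
    for line in lines:
        m = EXIT_RE.search(line)
        if m:
            exits.append((m.group("child"), int(m.group("code")), m.group("reason")))
        m = DELAY_RE.search(line)
        if m:
            last_delay = m.group("why").strip()
    return exits, last_delay


if __name__ == "__main__":
    import sys
    exits, last_delay = startup_report(sys.stdin)
    print("child exits:", exits)
    print("last startup delay reason:", last_delay)
```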