Hi All,

I have built the following environment.

 * RHEL7.3@KVM
 * libqb-1.0.2
 * corosync 2.4.4
 * pacemaker 2.0-rc4

Start the cluster and load a crm file containing 180 Dummy resources.
Node 3 will not start.

--------------
[root@rh73-01 ~]# crm_mon -1
Stack: corosync
Current DC: rh73-01 (version 2.0.0-3aa2fced22) - partition with quorum
Last updated: Thu May 17 18:44:39 2018
Last change: Thu May 17 18:44:18 2018 by root via cibadmin on rh73-01

2 nodes configured
180 resources configured

Online: [ rh73-01 rh73-02 ]

Active resources:

 Resource Group: grpJOS1
     prmDummy1   (ocf::pacemaker:Dummy): Started rh73-01
(snip)
     prmDummy140 (ocf::pacemaker:Dummy): Started rh73-01
(snip)
     prmDummy160 (ocf::pacemaker:Dummy): Started rh73-02
--------------

Run crm_resource -R after 120 resources have started on the cluster.

--------------
[root@rh73-01 ~]# crm_resource -R
Waiting for 1 replies from the controller. OK
--------------

I tried the following 3 patterns.

*******************
Pattern 1) When /etc/sysconfig/pacemaker is set as follows.

--------------@/etc/sysconfig/pacemaker
PCMK_logfacility=local1
PCMK_logpriority=info
--------------

After a while, crmd on the DC node fails to recover and is restarted (pacemaker-controld below has a different PID and start time from the other daemons).

[root@rh73-01 ~]# ps -ef |grep pace
root      6751     1  0 18:43 ?        00:00:00 /usr/sbin/pacemakerd -f
haclust+  6752  6751  2 18:43 ?        00:00:16 /usr/libexec/pacemaker/pacemaker-based
root      6753  6751  0 18:43 ?        00:00:01 /usr/libexec/pacemaker/pacemaker-fenced
root      6754  6751  0 18:43 ?        00:00:02 /usr/libexec/pacemaker/pacemaker-execd
haclust+  6755  6751  0 18:43 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-attrd
haclust+  6756  6751  0 18:43 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-schedulerd
haclust+ 20478  6751  0 18:50 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-controld
root     25552  1302  0 18:52 pts/0    00:00:00 grep --color=auto pace

Pattern 2) In order to avoid problems, I made the following settings.

--------------@/etc/sysconfig/pacemaker
PCMK_logfacility=local1
PCMK_logpriority=info
PCMK_cib_timeout=120
PCMK_ipc_buffer=262144
--------------@crm file.
(snip)
property cib-bootstrap-options: \
    cluster-ipc-limit=2000 \
(snip)
--------------

Just as in pattern 1, after a while crmd on the DC node fails to recover and is restarted.

[root@rh73-01 ~]# ps -ef | grep pace
root      3840     1  0 18:57 ?        00:00:00 /usr/sbin/pacemakerd -f
haclust+  3841  3840  3 18:57 ?        00:00:16 /usr/libexec/pacemaker/pacemaker-based
root      3842  3840  0 18:57 ?        00:00:01 /usr/libexec/pacemaker/pacemaker-fenced
root      3843  3840  0 18:57 ?        00:00:01 /usr/libexec/pacemaker/pacemaker-execd
haclust+  3844  3840  0 18:57 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-attrd
haclust+  3845  3840  0 18:57 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-schedulerd
haclust+  6221  3840  0 19:00 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-controld
root     17974  1302  0 19:05 pts/0    00:00:00 grep --color=auto pace

Pattern 3) In order to avoid problems, I made the following settings.
I made only the value of PCMK_ipc_buffer smaller than the default.

--------------@/etc/sysconfig/pacemaker
PCMK_logfacility=local1
PCMK_logpriority=info
PCMK_ipc_buffer=20480
--------------

Even after a while, crmd is not restarted and the cluster's resources are configured.

[root@rh73-01 ~]# ps -ef | grep pace
root     23511     1  0 19:08 ?        00:00:00 /usr/sbin/pacemakerd -f
haclust+ 23512 23511 16 19:08 ?        00:00:19 /usr/libexec/pacemaker/pacemaker-based
root     23513 23511  0 19:08 ?        00:00:01 /usr/libexec/pacemaker/pacemaker-fenced
root     23514 23511  0 19:08 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-execd
haclust+ 23515 23511  0 19:08 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-attrd
haclust+ 23516 23511  3 19:08 ?        00:00:04 /usr/libexec/pacemaker/pacemaker-schedulerd
haclust+ 23517 23511 11 19:08 ?        00:00:13 /usr/libexec/pacemaker/pacemaker-controld
root     28430  1302  0 19:10 pts/0    00:00:00 grep --color=auto pace
*******************

This problem seems to happen with Pacemaker-1.1.18.
If PCMK_fail_fast=yes is set, this crmd restart will cause the node to reboot.

If PCMK_ipc_buffer is made small, crmd does not restart and works properly.
If it is made larger, crmd restarts, so something may be wrong with Pacemaker.

Is there something wrong with Pacemaker?
What settings are correct when the number of resources is large?

 * This issue is registered in the following Bugzilla entry.
   - https://bugs.clusterlabs.org/show_bug.cgi?id=5349

Best Regards,
Hideo Yamauchi.

_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org