[ClusterLabs] [Problem and Question] If there are too many resources, pacemaker-controld restarts when re-Probe is executed.

renayama19661014 Thu, 17 May 2018 13:46:44 -0700

Hi All,


I have built the following environment.
 * RHEL7.3@KVM
 * libqb-1.0.2
 * corosync 2.4.4
 * pacemaker 2.0-rc4

Start up the cluster and load crm files that define 180 Dummy resources.
Node 3 will not start.

--------------
[root@rh73-01 ~]# crm_mon -1                                    
Stack: corosync
Current DC: rh73-01 (version 2.0.0-3aa2fced22) - partition with quorum
Last updated: Thu May 17 18:44:39 2018
Last change: Thu May 17 18:44:18 2018 by root via cibadmin on rh73-01
 2 nodes configured
180 resources configured
 Online: [ rh73-01 rh73-02 ]
 Active resources:
 Resource Group: grpJOS1
 prmDummy1  (ocf::pacemaker:Dummy): Started rh73-01
(snip)

 prmDummy140        (ocf::pacemaker:Dummy): Started rh73-01
(snip)
 prmDummy160        (ocf::pacemaker:Dummy): Started rh73-02

--------------
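
For anyone who wants to reproduce the load, the following is a minimal sketch of how a crm file with many Dummy resources could be generated. The actual crm files are not shown in this post, so the monitor interval, the group size, and all group names other than grpJOS1 are assumptions made for illustration.

```python
def make_crm_config(count, per_group=60, prefix="prmDummy", group_prefix="grpJOS"):
    """Return crm shell text defining `count` Dummy primitives split into groups.

    The op settings and the group layout are illustrative assumptions,
    not the configuration used in the original report.
    """
    names = [f"{prefix}{i}" for i in range(1, count + 1)]
    # One primitive definition per resource.
    lines = [
        f"primitive {name} ocf:pacemaker:Dummy op monitor interval=10s"
        for name in names
    ]
    # Put consecutive resources into groups of `per_group` members.
    for number, start in enumerate(range(0, count, per_group), start=1):
        members = " ".join(names[start:start + per_group])
        lines.append(f"group {group_prefix}{number} {members}")
    return "\n".join(lines) + "\n"
```

Writing the result of `make_crm_config(180)` to a file and loading it with `crm configure load update <file>` should give a comparable number of resources, although the resource layout will differ from the one in the original report.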

Execute crm_resource -R after 120 resources have started on the cluster.
--------------
[root@rh73-01 ~]# crm_resource -R      
Waiting for 1 replies from the controller. OK
--------------

I tried the following 3 patterns.

*******************

Pattern 1) When /etc/sysconfig/pacemaker is set as follows.
--------------@/etc/sysconfig/pacemaker
PCMK_logfacility=local1
PCMK_logpriority=info
--------------

After a while, crmd on the DC node fails and is restarted (pacemaker-controld has a new PID and a later start time than the other daemons).

[root@rh73-01 ~]# ps -ef |grep pace
root      6751     1  0 18:43 ?        00:00:00 /usr/sbin/pacemakerd -f
haclust+  6752  6751  2 18:43 ?        00:00:16 /usr/libexec/pacemaker/pacemaker-based
root      6753  6751  0 18:43 ?        00:00:01 /usr/libexec/pacemaker/pacemaker-fenced
root      6754  6751  0 18:43 ?        00:00:02 /usr/libexec/pacemaker/pacemaker-execd
haclust+  6755  6751  0 18:43 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-attrd
haclust+  6756  6751  0 18:43 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-schedulerd
haclust+ 20478  6751  0 18:50 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-controld
root     25552  1302  0 18:52 pts/0    00:00:00 grep --color=auto pace
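
The restart shows up in the listing above as a child of pacemakerd whose STIME is later than that of its parent. A minimal sketch of that check follows. It assumes `ps -ef` column order and HH:MM start times (processes started on an earlier day are ignored), and the function names are made up for this example.

```python
import os

# Cleaned-up copy of the pattern 1 listing from this post.
SAMPLE = """\
root      6751     1  0 18:43 ?        00:00:00 /usr/sbin/pacemakerd -f
haclust+  6752  6751  2 18:43 ?        00:00:16 /usr/libexec/pacemaker/pacemaker-based
root      6753  6751  0 18:43 ?        00:00:01 /usr/libexec/pacemaker/pacemaker-fenced
root      6754  6751  0 18:43 ?        00:00:02 /usr/libexec/pacemaker/pacemaker-execd
haclust+  6755  6751  0 18:43 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-attrd
haclust+  6756  6751  0 18:43 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-schedulerd
haclust+ 20478  6751  0 18:50 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-controld
root     25552  1302  0 18:52 pts/0    00:00:00 grep --color=auto pace
"""


def to_minutes(stime):
    """Convert an HH:MM STIME field to minutes since midnight, else None."""
    parts = stime.split(":")
    if len(parts) != 2 or not all(p.isdigit() for p in parts):
        return None
    return int(parts[0]) * 60 + int(parts[1])


def restarted_daemons(ps_output, slack_minutes=1):
    """List (name, pid, stime) for pacemakerd children started late."""
    rows = []
    for line in ps_output.splitlines():
        fields = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD
        if len(fields) == 8 and fields[1].isdigit() and fields[2].isdigit():
            rows.append(fields)
    parent = next(
        (r for r in rows if r[7].split()[0].endswith("/pacemakerd")), None
    )
    if parent is None:
        return []
    parent_start = to_minutes(parent[4])
    late = []
    for r in rows:
        if r[2] != parent[1]:
            continue  # not a direct child of pacemakerd
        start = to_minutes(r[4])
        if parent_start is None or start is None:
            continue  # start time not in HH:MM form
        if start - parent_start > slack_minutes:
            late.append((os.path.basename(r[7].split()[0]), int(r[1]), r[4]))
    return late
```

Calling `restarted_daemons(SAMPLE)` reports only pacemaker-controld (PID 20478, started 18:50). The one-minute slack is an arbitrary allowance for daemons that start just after a minute boundary.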



Pattern 2) In order to avoid problems, I made the following settings.
--------------@/etc/sysconfig/pacemaker
PCMK_logfacility=local1
PCMK_logpriority=info
PCMK_cib_timeout=120
PCMK_ipc_buffer=262144
--------------@crm file.
(snip)
property cib-bootstrap-options: \
        cluster-ipc-limit=2000 \
(snip)
-------------- 

Just like pattern 1, after a while, crmd on the DC node fails and is restarted.

[root@rh73-01 ~]# ps -ef | grep pace
root      3840     1  0 18:57 ?        00:00:00 /usr/sbin/pacemakerd -f
haclust+  3841  3840  3 18:57 ?        00:00:16 /usr/libexec/pacemaker/pacemaker-based
root      3842  3840  0 18:57 ?        00:00:01 /usr/libexec/pacemaker/pacemaker-fenced
root      3843  3840  0 18:57 ?        00:00:01 /usr/libexec/pacemaker/pacemaker-execd
haclust+  3844  3840  0 18:57 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-attrd
haclust+  3845  3840  0 18:57 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-schedulerd
haclust+  6221  3840  0 19:00 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-controld
root     17974  1302  0 19:05 pts/0    00:00:00 grep --color=auto pace



Pattern 3) In order to avoid problems, I made the following settings. This time 
I only set PCMK_ipc_buffer, to a value smaller than the default.
--------------@/etc/sysconfig/pacemaker
PCMK_logfacility=local1
PCMK_logpriority=info
PCMK_ipc_buffer=20480
-------------- 

Even after a while, crmd does not restart, and the resources of the cluster are 
configured normally.

[root@rh73-01 ~]# ps -ef | grep pace
root     23511     1  0 19:08 ?        00:00:00 /usr/sbin/pacemakerd -f
haclust+ 23512 23511 16 19:08 ?        00:00:19 /usr/libexec/pacemaker/pacemaker-based
root     23513 23511  0 19:08 ?        00:00:01 /usr/libexec/pacemaker/pacemaker-fenced
root     23514 23511  0 19:08 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-execd
haclust+ 23515 23511  0 19:08 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-attrd
haclust+ 23516 23511  3 19:08 ?        00:00:04 /usr/libexec/pacemaker/pacemaker-schedulerd
haclust+ 23517 23511 11 19:08 ?        00:00:13 /usr/libexec/pacemaker/pacemaker-controld
root     28430  1302  0 19:10 pts/0    00:00:00 grep --color=auto pace


******************* 
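
To put the three patterns side by side, here is a small sketch that reads the /etc/sysconfig/pacemaker fragments quoted above and pairs the PCMK_ipc_buffer value with the observed result. The helper names are hypothetical. Pattern 1 leaves PCMK_ipc_buffer unset, so the built-in default applies there; the sketch makes no assumption about what that default is.

```python
def parse_sysconfig(text):
    """Parse KEY=VALUE lines, skipping blanks and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


# (sysconfig fragment, was pacemaker-controld restarted?) as reported above.
PATTERNS = {
    1: ("PCMK_logfacility=local1\nPCMK_logpriority=info\n", True),
    2: (
        "PCMK_logfacility=local1\nPCMK_logpriority=info\n"
        "PCMK_cib_timeout=120\nPCMK_ipc_buffer=262144\n",
        True,
    ),
    3: (
        "PCMK_logfacility=local1\nPCMK_logpriority=info\n"
        "PCMK_ipc_buffer=20480\n",
        False,
    ),
}


def summarize(patterns):
    """Return (pattern, ipc_buffer or None if unset, restarted) tuples."""
    rows = []
    for number, (text, restarted) in sorted(patterns.items()):
        value = parse_sysconfig(text).get("PCMK_ipc_buffer")
        rows.append((number, int(value) if value is not None else None, restarted))
    return rows
```

`summarize(PATTERNS)` shows that only the run with the smallest buffer (20480) completed without a controller restart.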

This problem also seems to happen with Pacemaker-1.1.18. If PCMK_fail_fast=yes 
is set, this crmd restart will cause the node to reboot. 

If PCMK_ipc_buffer is made small, crmd does not restart and the cluster works 
properly.

If it is made larger, crmd restarts. Is there something wrong with Pacemaker? 

If the number of resources is large, what kind of setting is correct? 


* This issue is registered in the following Bugzilla entry.
- https://bugs.clusterlabs.org/show_bug.cgi?id=5349


Best Regards,
Hideo Yamauchi.
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
