On 03/07/2017 04:13 PM, Jehan-Guillaume de Rorthais wrote:
> Hi,
>
> Occasionally, I find my cluster with one pending action not being executed for
> some minutes (I guess until the "PEngine Recheck Timer" elapses).
>
> Running "crm_simulate -SL" shows the pending actions.
>
> I'm still confused about how it can happen, why it happens and how to avoid
> this.
It's most likely a bug in the crmd, which schedules PE runs.

> Earlier today, I started my test cluster with 3 nodes and a master/slave
> resource[1], all with positive master scores (1001, 1000 and 990), and the
> cluster kept the promote action as a pending action for 15 minutes.
>
> You will find attached the first 3 pengine inputs executed after the
> cluster startup.
>
> What are the consequences if I set cluster-recheck-interval to 30s, for
> instance?

The cluster would consume more CPU and I/O continually recalculating the
cluster state.

It would be nice to have some guidelines for cluster-recheck-interval based
on real-world usage, but for now it's just gut feeling.

The cluster automatically recalculates when something "interesting" happens:
a node comes or goes, a monitor fails, a node attribute changes, etc. The
cluster-recheck-interval is (1) a failsafe for buggy situations like this,
and (2) the maximum granularity of many time-based checks such as rules. I
would personally use at least 5 minutes, though less is probably reasonable,
especially with simple configurations (few nodes, resources and constraints).

> Thanks in advance for your insights :)
>
> Regards,
>
> [1] here is the setup:
> http://dalibo.github.io/PAF/Quick_Start-CentOS-7.html#cluster-resource-creation-and-management

Feel free to open a bug report and include some logs around the time of the
incident (most importantly from the DC).

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
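The trade-off described in the reply can be put in rough numbers. The Python sketch below is a simplified, hypothetical model, not Pacemaker code. It assumes the recheck timer only matters when nothing else triggers a recalculation. Under that assumption, a stalled pending action or a time-based rule waits at most one interval, and an otherwise idle cluster runs the scheduler about once per interval. `parse_interval` and `recheck_profile` are illustrative helpers, and they understand only a few duration suffixes.

```python
def parse_interval(value: str) -> int:
    """Convert a duration such as '30s', '5min' or '1h' to seconds.

    Simplified for this sketch: Pacemaker itself accepts more formats.
    """
    for suffix, factor in (("min", 60), ("s", 1), ("h", 3600)):
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * factor
    raise ValueError(f"unsupported duration in this sketch: {value!r}")


def recheck_profile(value: str) -> dict:
    """Rough effect of a given cluster-recheck-interval (hypothetical model)."""
    seconds = parse_interval(value)
    return {
        "interval_s": seconds,
        # A transition the crmd failed to schedule waits at most one interval
        # before the recheck timer forces a new scheduler run.
        "worst_case_delay_s": seconds,
        # Timer-driven scheduler runs per day on a cluster where nothing else happens.
        "idle_runs_per_day": 86400 // seconds,
    }


if __name__ == "__main__":
    for setting in ("30s", "5min", "15min"):  # 15min is Pacemaker's default
        p = recheck_profile(setting)
        print(f"{setting:>6}: worst-case delay {p['worst_case_delay_s']}s, "
              f"about {p['idle_runs_per_day']} timer-driven runs/day")
```

Going from the 15min default to 30s multiplies the timer-driven scheduler runs by 30 (96 vs 2880 per day). It also cuts the worst-case wait for a stuck action from 15 minutes to 30 seconds. The actual CPU and I/O cost of each run depends on the size of the configuration. On a cluster managed with pcs, the property itself is changed with `pcs property set cluster-recheck-interval=5min`.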