On 09/02/2015 01:14 AM, Nicolas S. wrote:
> Hello,
>
> I'm writing to this mailing list because I'm having a little trouble with my cluster.
>
> I'm running a 3-node CentOS 7 cluster. Resources, stonith and all are configured.
> For a couple of days my backups have been taking a little more time and one node is
> getting high load. At a certain point it's fenced by the other nodes.
> Of course I'm thinking of correcting that backup issue, but for the moment
> it's not done.
>
> I tried to find in the docs a general property to make the nodes wait a
> little longer before fencing the node, but I didn't really find it (or I
> misread the docs). It seems that stonith-timeout isn't the solution.
>
> Is there a best practice or such a property?
>
> Thanks to all.
>
> Regards,
>
> Nicolas.

Check your logs to determine what is causing the fencing, then increase the timeout for that. The logs on the node that is DC at the time will be most useful (if you're investigating after the fact, just look for the node with the most verbose logs around that time).

As Ulrich suggested, most likely a monitor operation for one or more resources is timing out, and increasing the monitor timeouts will help. Of course, that may also increase the time needed to detect a real failure, so set it back after fixing the underlying issue. But it could be something else, such as corosync, so the logs are the key.

stonith-timeout controls how long the fencing operation itself may take. Once that starts ticking, the fencing has already been started, which is why it doesn't help you here.

If you want to get really fancy, you could use rules to automatically move some or all resources off the node during backup times:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Pacemaker_Explained/index.html#idm140225138998432
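
If the logs do point to a monitor operation timing out, raising the timeout and setting it back later could be sketched roughly as follows. This is only an illustration, not something from the thread: it assumes the cluster is managed with pcs, and the resource name "my_resource" and the interval and timeout values are made up. The helper only builds the command line and does not run anything; check the exact syntax against "pcs resource update --help" on your version first.

```python
import shlex
from typing import List


def monitor_timeout_cmd(resource_id: str, interval: str, timeout: str) -> List[str]:
    """Build (but do not run) a pcs command that sets a resource's monitor timeout.

    Assumes the 'pcs resource update <id> op monitor ...' form; verify it
    against your installed pcs version before using it.
    """
    return [
        "pcs", "resource", "update", resource_id,
        "op", "monitor", f"interval={interval}", f"timeout={timeout}",
    ]


# Hypothetical resource name and values, purely for illustration.
raise_cmd = monitor_timeout_cmd("my_resource", "30s", "120s")
restore_cmd = monitor_timeout_cmd("my_resource", "30s", "30s")

# Print the commands so they can be reviewed before being run by hand.
print(shlex.join(raise_cmd))
print(shlex.join(restore_cmd))
```

Printing the commands rather than executing them keeps the sketch safe to try on any machine; the second command is there as a reminder to put the original timeout back once the backup load problem is fixed.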
