Hi list,

I recently had some trouble with a dual-node MySQL cluster, which runs in master-slave mode with the Percona resource manager. While analyzing what happened to the cluster, I found the following in syslog (there was network trouble and the cluster lost disk/iSCSI access on both nodes; this excerpt is from the former master trying to start up again once connectivity was restored):
Jan 6 07:26:49 infante pengine: [3839]: notice: get_failcount: Failcount for MasterSlave_mysql on infante has expired (limit was 60s)
Jan 6 07:26:49 infante pengine: [3839]: notice: get_failcount: Failcount for MasterSlave_mysql on infante has expired (limit was 60s)
Jan 6 07:26:49 infante pengine: [3839]: WARN: common_apply_stickiness: Forcing p-stonith-ingstad away from infante after 1000000 failures (max=1000000)
Jan 6 07:26:49 infante pengine: [3839]: notice: LogActions: Start prim_mysql:0#011(infante)
Jan 6 07:26:49 infante pengine: [3839]: notice: LogActions: Start prim_mysql:1#011(ingstad)

I don't understand it: if this means that the stonith devices have failed a million times, why is it trying to start the mysql resource? It's against Pacemaker policy to start resources on a cluster without working stonith devices, isn't it?

-- 
Frank Van Damme

Make everything as simple as possible, but not simpler. - Albert Einstein
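
PS: for reference, these are generic Pacemaker commands for checking the settings those log messages refer to; only the resource and node names come from the log above, the rest is a sketch of a typical setup, so adjust as needed:

# Check whether fencing enforcement is enabled cluster-wide
crm_attribute --type crm_config --name stonith-enabled --query

# One-shot cluster status including per-resource fail counts; the
# "has expired (limit was 60s)" messages relate to a failure-timeout
# of 60s on the resource
crm_mon -1 --failcounts

# 1000000 is Pacemaker's INFINITY score; clearing the failed operations
# and fail count for the stonith resource lifts the ban on that node
crm resource cleanup p-stonith-ingstad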