Hi list,

I recently had some trouble with a dual-node MySQL cluster, which runs in master-slave mode with the Percona resource manager. While analyzing what happened to the cluster, I found the following in syslog (there was network trouble and the cluster lost disk/iSCSI access on both nodes; this excerpt is from the former master trying to start up again once connectivity was restored):
Jan 6 07:26:49 infante pengine: [3839]: notice: get_failcount: Failcount for MasterSlave_mysql on infante has expired (limit was 60s)
Jan 6 07:26:49 infante pengine: [3839]: notice: get_failcount: Failcount for MasterSlave_mysql on infante has expired (limit was 60s)
Jan 6 07:26:49 infante pengine: [3839]: WARN: common_apply_stickiness: Forcing p-stonith-ingstad away from infante after 1000000 failures (max=1000000)
Jan 6 07:26:49 infante pengine: [3839]: notice: LogActions: Start prim_mysql:0#011(infante)
Jan 6 07:26:49 infante pengine: [3839]: notice: LogActions: Start prim_mysql:1#011(ingstad)

I don't understand it: if this means that the stonith devices have failed a million times, why is it trying to start the mysql resource? It's against Pacemaker policy to start resources on a cluster without working stonith devices, isn't it?

-- 
Frank Van Damme

Make everything as simple as possible, but not simpler. - Albert Einstein
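
PS: for reference, these are generic Pacemaker commands for checking the settings those log messages refer to; only the resource and node names come from the log above, the rest is a sketch of a typical setup, so adjust as needed:

# Check whether fencing enforcement is enabled cluster-wide
crm_attribute --type crm_config --name stonith-enabled --query

# One-shot cluster status including per-resource fail counts; the
# "has expired (limit was 60s)" messages relate to a failure-timeout
# of 60s on the resource
crm_mon -1 --failcounts

# 1000000 is Pacemaker's INFINITY score; clearing the failed operations
# and fail count for the stonith resource lifts the ban on that node
crm resource cleanup p-stonith-ingstad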