Re: [Pacemaker] High load issues

Dominik Klein Fri, 05 Feb 2010 00:01:53 -0800

> But generally I believe this test case is invalid.

I might agree here that this test case does not necessarily reproduce
what happened on my production system (unfortunately I do not know for
sure what happened there; the dev who caused this just tells me he used
some stupid SQL statement and even executed it several times in
parallel), but I do not think the test case is invalid. If there is an
OOM situation on a node and the local pacemaker therefore can't do its
job anymore (I base this statement on the various lrmd "cannot allocate
memory" logs), this is a case the cluster should be able to recover from.


What I saw while doing this test was that the bad node discovered
failures on the running IP and mysql resources and scheduled the
recovery, but never managed to carry it out.

I think it was lmb who suggested "periodic health-checks" on the
pacemaker layer. If pacemaker on $good had periodically tried to talk to
pacemaker on $bad, then it might have seen that $bad does not respond
and might have done something about it. Just my theory though.
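
To make the idea a bit more concrete, here is a rough sketch in Python.
It is not Pacemaker code and does not use any Pacemaker API: the class
name, the probe() and escalate() callables and the max_misses threshold
are all made up for illustration. The assumption is that $good runs a
probe against $bad at a fixed interval and, after a few consecutive
misses, hands over to whatever recovery action is configured (fencing,
for example).

```python
import time
from typing import Callable


class PeerHealthMonitor:
    """Count consecutive failed probes of a peer and escalate once.

    Hypothetical helper for illustration only; nothing here is part of
    Pacemaker.
    """

    def __init__(
        self,
        probe: Callable[[], bool],
        escalate: Callable[[], None],
        max_misses: int = 3,
    ) -> None:
        # probe() must enforce its own timeout; a probe that blocks
        # forever would stall the monitor as well.
        self.probe = probe
        self.escalate = escalate
        self.max_misses = max_misses
        self.misses = 0
        self.escalated = False

    def check_once(self) -> bool:
        """Run one probe; return True if the peer answered."""
        try:
            ok = self.probe()
        except Exception:
            # A probe that raises counts as a miss, same as "no answer".
            ok = False

        if ok:
            # Any successful answer resets the failure state.
            self.misses = 0
            self.escalated = False
            return True

        self.misses += 1
        # Escalate only once per outage, not on every further miss.
        if self.misses >= self.max_misses and not self.escalated:
            self.escalated = True
            self.escalate()
        return False

    def run(self, interval: float, rounds: int) -> None:
        """Probe the peer `rounds` times, sleeping `interval` seconds between."""
        for _ in range(rounds):
            self.check_once()
            time.sleep(interval)
```

Requiring several misses in a row is deliberate: a single slow answer
from a loaded node should not be enough to get it shot.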

Opinions?

Regards
Dominik

_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
