18.03.2011 12:44, Lars Marowsky-Bree writes:

messages can't be lost internally. A network dropping a packet doesn't
result in lost state, because the protocols will resend it (of course).

Imagine a network where connectivity comes back for a couple of seconds and then breaks again. A node comes online, monitor operations are dispatched to it, and then connectivity fails. That node will then end up unclean (offline).

And fencing may fail in this scenario, too, because it usually relies on the network.
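
For example, a typical fencing device is reached over the same LAN that has just failed. A rough sketch of such a device in crm shell syntax (all names and parameter values here are invented):

  # IPMI-based fencing: the fencing action itself travels over the network.
  crm configure primitive fence-node1 stonith:external/ipmi \
      params hostname=node1 ipaddr=192.168.100.11 interface=lan \
             userid=admin passwd=secret \
      op monitor interval=60s

If the management network shares the fate of the cluster network, the cluster cannot even fence its way out of the situation.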

So every resource in the cluster will be impacted, whether it was installed on the failed node or not.

You are right that we rely on the cluster stack itself to be healthy on
a node; protecting against byzantine failure modes is extremely
difficult.

I understand this fully. This is why we should keep the core cluster stack to the absolutely necessary minimum. Resource agents and their dependencies are an unavoidable evil where they are really needed.

But is my proposal so difficult to implement? I feel the difficulty is more ideological than technical. Meanwhile, it would prevent the scenario described above, among others.
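
To make the proposal concrete, here is a minimal sketch in crm shell syntax. The resource and node names are invented, and the probe=off switch is purely hypothetical - nothing like it exists in the shell today - but it expresses the administrator-confirmed "never look for this resource here" statement I have in mind:

  # vm-web physically cannot exist on quorumnode, and the administrator
  # confirms it, so no probe should ever be scheduled there.
  # NOTE: probe=off is hypothetical syntax, shown only to illustrate the idea.
  crm configure location vm-web-never-on-quorum vm-web probe=off -inf: quorumnode

Everything except that hypothetical switch is ordinary location-constraint syntax; today the cluster would still probe vm-web once on quorumnode despite the -inf score.
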
If you have a RA where that doesn't work, it needs fixing. We also try
to enlarge our test coverage - feel free to get invited to a free lunch
;-) http://www.advogato.org/person/lmb/diary.html?start=110

Do you also test every RA in an environment where its dependencies cannot be met? For example, VirtualDomain depends on libvirtd being installed, running and operational. Would you find it useful to test this RA on a node which is not intended to run virtual machines, in all possible combinations of software and configuration?
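
For illustration, a probe has to accomplish roughly the following on every node. This is a much simplified sketch, not the real VirtualDomain agent; DOMAIN stands for whatever name the agent extracts from its config parameter:

  # OCF return codes: 0 = running, 7 = not running, 5 = software not installed.
  # have_binary and the OCF_* variables come from the standard ocf-shellfuncs
  # library that every heartbeat-style agent sources.
  monitor() {
      # Node without the libvirt client tools: report "not installed".
      have_binary virsh || return $OCF_ERR_INSTALLED
      if virsh domstate "$DOMAIN" 2>/dev/null | grep -q '^running'; then
          return $OCF_SUCCESS
      fi
      return $OCF_NOT_RUNNING
  }

Even this trivial check has to load the shell library, execute virsh and deliver its exit code back to the DC within the operation timeout. If libvirtd hangs or the network drops the reply, the probe becomes a failure no matter how correct the agent is.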

Which network faults do we not tolerate?

I hope I've answered this question already.

Quorum nodes - i.e., a full pacemaker node - are a broken idea, and work
around a deficiency in the cluster stack's quorum handling.

For now, a quorum node can prevent a STONITH deathmatch, and that is good enough for me. One of my clusters consists of three nodes: two for the actual services and a third for some offline services which do not need redundancy. It is natural for me to set up the third node as a quorum node, i.e. a node which will not participate in failover but may still monitor its own resources through Pacemaker. Thus, simple quorum comes almost for free in this setup.

But it could be another kind of cluster, for example one sharing a cluster filesystem. The so-called quorum node is only an example.
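
As a rough sketch of that three-node setup (node and resource names are invented), the interesting part of the configuration is just a few location constraints in crm shell syntax:

  # node1 and node2 carry the redundant services; qnode only supplies the
  # third vote and runs its own non-redundant job.
  crm configure property no-quorum-policy=stop
  # the main service group must never fail over to the quorum node:
  crm configure location svc-not-on-qnode svc-group -inf: qnode
  # the quorum node's own offline job stays off the service nodes:
  crm configure location backup-not-on-node1 backup-job -inf: node1
  crm configure location backup-not-on-node2 backup-job -inf: node2

The third vote prevents a STONITH deathmatch between node1 and node2, yet Pacemaker still probes svc-group on qnode and backup-job on node1 and node2.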

In any case, the scenario you describe doesn't just happen. You could
install a quorum node without any resource agents at all.

And it will not protect the cluster from failures caused by the network or by the quorum node itself. Remember that a missing RA is treated as "not installed", but a probe that fails to complete (because of a timeout, for example) is treated as a failure.

Andrew has suggested not running Pacemaker on the quorum node at all. That would help, but then we could not even run STONITH from that node, which is one possible drawback.

To be impartial, I would like to know what benefit you see in the
current design for the case of an asymmetric cluster or a quorum node.
Asymmetric clusters just invert the configuration syntax - they turn the
resource-to-node allocation from "default allow" to "default deny".
Depending on your requirements, that may simplify your configuration. It
doesn't mean anything else.

I mean the kind of setup which makes the asymmetric syntax practical: a setup where some resources only exist on some of the nodes.
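
For comparison, the same placement expressed both ways (made-up names again):

  # Symmetric cluster ("default allow"): forbid the exceptions.
  crm configure property symmetric-cluster=true
  crm configure location vm-web-not-on-qnode vm-web -inf: qnode

  # Asymmetric cluster ("default deny"): list the allowed nodes explicitly.
  crm configure property symmetric-cluster=false
  crm configure location vm-web-on-node1 vm-web 100: node1
  crm configure location vm-web-on-node2 vm-web 50: node2

In both variants the cluster currently probes vm-web on every node, including the ones where it can never run; that is exactly the behaviour I am questioning.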

Even if that is set, we need to verify that the resources are, indeed,
NOT running where they shouldn't be; remember, it is our job to ensure
that the configured policy is enforced. So, we probe them everywhere to
ensure they are indeed not around, and stop them if we find them.

Again, WHY do you need to verify things which the setup makes impossible? If some resource cannot, REALLY CANNOT, exist on a node, and the administrator can confirm this, why rely on the network, the cluster stack, resource agents, electricity in the power outlet, etc. to verify that 2+2 is still 4?

The quorum node is a hack that I'm not a fan of; it adds much admin
overhead (i.e., one 3rd node per cluster to manage, update, etc).
Running a full member as "just" a quorum node is a complete waste and
IMHO broken design.

You are right. I would not use a quorum node if the third node weren't there already.

What is the benefit of checking a resource's status on nodes where the
resource cannot exist? What justifies the increased resource downtime
caused by monitor failures, which are inevitable in the real world?
If you're saying that broken resource monitoring failures are
inevitable in the real world, I could just as well ask you why they
wouldn't fail on the actually active node too - and then we'd go into
recovery as well (stopping & restarting the service).

If a resource fails on the active node, it will be acted upon; that is what all this high availability is about. If the resource fails on a standby node, I have to accept this as a cost of clustering, and I should fix the RA or the configuration if applicable. But when the resource fails on a node where it cannot exist, it fails for absolutely nothing. It is an unnecessary failure which could be avoided by simple configuration.

The answer is: fix the resource agent.

I assume it is clear by now that even a perfect RA may fail to deliver its result to the DC. Even a nonexistent RA may fail, time out and cause resource disruption.

But I also feel it would be a waste of time to try to repair an RA which fails under conditions in which it is not supposed to work at all.

It is, let me state that again, completely infeasible to check (at
runtime!) against all possible _internal_ failure modes. We check some
of them, but all is just impossible. And if a component "lies" to the
rest of the stack about its state, we're out of luck. We do hope that
our stack is more reliable than the software it's told to manage, yes ;-)

I've described a failure which has happened in reality, and I've proposed a fix which would completely prevent this failure mode: keep the cluster from monitoring impossible cases. It would require some coding and some configuration to implement, but as far as I know it would not introduce any new problems. Do you have anything to say against it?


And side notes:

Testing is necessary, but it cannot be exhaustive in finite time. Some bugs escape the test environment from time to time.

My cluster survived four real network failures and failed on the fifth. How many times should I test?


--
Pavel Levshin



