18.03.2011 12:44, Lars Marowsky-Bree writes:

messages can't be lost internally. A network dropping a packet doesn't
result in lost state, because the protocols will resend it (of course).

Imagine a network where connectivity comes back for a couple of seconds and then breaks again. A node comes online, monitor operations are dispatched to it, and then connectivity fails. That node will then end up unclean (offline).

And fencing may fail in this scenario, too, because it usually relies on the network.
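
For example, a typical fencing device is reached over the same LAN that has just failed. A rough sketch of such a device in crm shell syntax (all names and parameter values here are invented):

  # IPMI-based fencing: the fencing action itself travels over the network.
  crm configure primitive fence-node1 stonith:external/ipmi \
      params hostname=node1 ipaddr=192.168.100.11 interface=lan \
             userid=admin passwd=secret \
      op monitor interval=60s

If the management network shares the fate of the cluster network, the cluster cannot even fence its way out of the situation.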

So every resource in the cluster will be impacted, whether it was installed on the failed node or not.

You are right that we rely on the cluster stack itself to be healthy on
a node; protecting against byzantine failure modes is extremely
difficult.

I understand this fully. This is why we should keep the core cluster stack to the absolutely necessary minimum. Resource agents and their dependencies are an unavoidable evil where they are really needed.

But is my proposal so difficult to implement? I feel the difficulty is more ideological than technical. Meanwhile, it would prevent the scenario described above, among others.
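
To make the proposal concrete, here is a minimal sketch in crm shell syntax. The resource and node names are invented, and the probe=off switch is purely hypothetical - nothing like it exists in the shell today - but it expresses the administrator-confirmed "never look for this resource here" statement I have in mind:

  # vm-web physically cannot exist on quorumnode, and the administrator
  # confirms it, so no probe should ever be scheduled there.
  # NOTE: probe=off is hypothetical syntax, shown only to illustrate the idea.
  crm configure location vm-web-never-on-quorum vm-web probe=off -inf: quorumnode

Everything except that hypothetical switch is ordinary location-constraint syntax; today the cluster would still probe vm-web once on quorumnode despite the -inf score.
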
If you have a RA where that doesn't work, it needs fixing. We also try
to enlarge our test coverage - feel free to get invited to a free lunch
;-) http://www.advogato.org/person/lmb/diary.html?start=110

Do you also test every RA in an environment where its dependencies cannot be met? For example, VirtualDomain depends on libvirtd being installed, running and operational. Would you find it useful to test this RA on a node which is not intended to run virtual machines, in all possible combinations of software and configuration?
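
For illustration, a probe has to accomplish roughly the following on every node. This is a much simplified sketch, not the real VirtualDomain agent; DOMAIN stands for whatever name the agent extracts from its config parameter:

  # OCF return codes: 0 = running, 7 = not running, 5 = software not installed.
  # have_binary and the OCF_* variables come from the standard ocf-shellfuncs
  # library that every heartbeat-style agent sources.
  monitor() {
      # Node without the libvirt client tools: report "not installed".
      have_binary virsh || return $OCF_ERR_INSTALLED
      if virsh domstate "$DOMAIN" 2>/dev/null | grep -q '^running'; then
          return $OCF_SUCCESS
      fi
      return $OCF_NOT_RUNNING
  }

Even this trivial check has to load the shell library, execute virsh and deliver its exit code back to the DC within the operation timeout. If libvirtd hangs or the network drops the reply, the probe becomes a failure no matter how correct the agent is.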

Which network faults do we not tolerate?

I hope I've answered this question already.

Quorum nodes - i.e., a full pacemaker node - are a broken idea, and work
around a deficiency in the cluster stack's quorum handling.

For now, a quorum node can prevent a STONITH deathmatch, and that is good enough for me. One of my clusters consists of three nodes: two for the actual services and a third for some offline services which do not need redundancy. It is natural for me to set up the third node as a quorum node, i.e. a node which will not participate in failover but may still monitor its own resources through Pacemaker. Thus, simple quorum comes almost for free in this setup.

But it could be another kind of cluster, for example one sharing a cluster filesystem. The so-called quorum node is only an example.
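
As a rough sketch of that three-node setup (node and resource names are invented), the interesting part of the configuration is just a few location constraints in crm shell syntax:

  # node1 and node2 carry the redundant services; qnode only supplies the
  # third vote and runs its own non-redundant job.
  crm configure property no-quorum-policy=stop
  # the main service group must never fail over to the quorum node:
  crm configure location svc-not-on-qnode svc-group -inf: qnode
  # the quorum node's own offline job stays off the service nodes:
  crm configure location backup-not-on-node1 backup-job -inf: node1
  crm configure location backup-not-on-node2 backup-job -inf: node2

The third vote prevents a STONITH deathmatch between node1 and node2, yet Pacemaker still probes svc-group on qnode and backup-job on node1 and node2.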

In any case, the scenario you describe doesn't just happen. You could
install a quorum node without any resource agents at all.

And it will not protect the cluster from failures caused by the network or by the quorum node itself. Remember that a missing RA is treated as "not installed", but a probe that fails to complete (because of a timeout, for example) is treated as a failure.

Andrew has suggested not running Pacemaker on the quorum node at all. That would help, but then we could not even run STONITH from that node, which is one possible drawback.

To be impartial, I would like to know what benefit you see in the
current design for the case of an asymmetric cluster or a quorum node.
Asymmetric clusters just invert the configuration syntax - they turn the
resource-to-node allocation from "default allow" to "default deny".
Depending on your requirements, that may simplify your configuration. It
doesn't mean anything else.

I mean the kind of setup which makes the asymmetric syntax practical: a setup where some resources only exist on some of the nodes.
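
For comparison, the same placement expressed both ways (made-up names again):

  # Symmetric cluster ("default allow"): forbid the exceptions.
  crm configure property symmetric-cluster=true
  crm configure location vm-web-not-on-qnode vm-web -inf: qnode

  # Asymmetric cluster ("default deny"): list the allowed nodes explicitly.
  crm configure property symmetric-cluster=false
  crm configure location vm-web-on-node1 vm-web 100: node1
  crm configure location vm-web-on-node2 vm-web 50: node2

In both variants the cluster currently probes vm-web on every node, including the ones where it can never run; that is exactly the behaviour I am questioning.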

Even if that is set, we need to verify that the resources are, indeed,
NOT running where they shouldn't be; remember, it is our job to ensure
that the configured policy is enforced. So, we probe them everywhere to
ensure they are indeed not around, and stop them if we find them.

Again, WHY do you need to verify things which the setup makes impossible? If some resource cannot, REALLY CANNOT, exist on a node, and the administrator can confirm this, why rely on the network, the cluster stack, resource agents, electricity in the power outlet, etc. to verify that 2+2 is still 4?

The quorum node is a hack that I'm not a fan of; it adds much admin
overhead (i.e., one 3rd node per cluster to manage, update, etc).
Running a full member as "just" a quorum node is a complete waste and
IMHO broken design.

You are right. I would not use a quorum node if the third node weren't there already.

What is the benefit of checking a resource's status on nodes where the
resource cannot exist? What justifies the increased resource downtime
caused by monitor failures, which are inevitable in the real world?
If you're saying that broken resource monitoring failures are
inevitable in the real world, I could just as well ask you why they
wouldn't fail on the actually active node too - and then we'd go into
recovery as well (stopping & restarting the service).

If a resource fails on the active node, it will be acted upon; that is what all this high availability is about. If the resource fails on a standby node, I have to accept this as a cost of clustering, and I should fix the RA or the configuration if applicable. But when the resource fails on a node where it cannot exist, it fails for absolutely nothing. It is an unnecessary failure which could be avoided by simple configuration.

The answer is: fix the resource agent.

I assume it is clear by now that even a perfect RA may fail to deliver its result to the DC. Even a nonexistent RA may fail, time out and cause resource disruption.

But I also feel it would be a waste of time to try to repair an RA which fails under conditions in which it is not supposed to work at all.

It is, let me state that again, completely infeasible to check (at
runtime!) against all possible _internal_ failure modes. We check some
of them, but all is just impossible. And if a component "lies" to the
rest of the stack about its state, we're out of luck. We do hope that
our stack is more reliable than the software it's told to manage, yes ;-)

I've described a failure which has happened in reality, and I've proposed a fix which would completely prevent this failure mode: keep the cluster from monitoring impossible cases. It would require some coding and some configuration to implement, but as far as I know it would not introduce any new problems. Do you have anything to say against it?


And side notes:

Testing is necessary, but it cannot be exhaustive in finite time. Some bugs escape the test environment from time to time.

My cluster survived four real network failures and failed on the fifth. How many times should I test?


--
Pavel Levshin



