Re: [ClusterLabs] sub-second error detection and failover possible

Ken Gaillot Tue, 01 Sep 2015 08:21:53 -0700

On 09/01/2015 10:07 AM, Digimer wrote:
> On 01/09/15 09:27 AM, Michael Schwartzkopff wrote:
>> Hi,
>>
>> perhaps this question was answered elsewhere, but I could not find any 
>> satisfying answer. So is it possible to set up a corosync/pacemaker cluster 
>> that detects errors and does the failover in a sub-second time span?
>>
>> If yes, how?
>>
>>
>> Kind regards,
>>
>> Michael Schwartzkopff
> 
> Corosync is what declares a node lost, so you would need to start by
> tuning it (token loss timeout and loss count). Of course, as you tighten
> this up, the chance of a transient issue causing a false declaration of
> node loss increases.
> 
> Next, you'd need a fence device that can terminate and verify the node's
> termination very, very quickly. I do not know of such a device. Part of
> this is also the time taken for the fence agent to be invoked.
> 
> Last, you'd need to have pacemaker calculate the new desired state and
> make those changes. The services being recovered would need to start
> exceptionally quickly.
> 
> In theory, it's possible I suppose. In practice, very unlikely.
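
The three stages described above (detection, fencing, recovery) run one
after another, so their times add up. A minimal sketch of that budget in
Python follows; the function name and every figure in it are hypothetical
placeholders chosen for illustration, not measurements from any cluster.

```python
def failover_budget_ms(detect_ms, fence_ms, transition_ms, start_ms):
    """Sum the sequential stages of a failover, in milliseconds."""
    return detect_ms + fence_ms + transition_ms + start_ms


# Hypothetical placeholder figures, NOT measurements:
stages = {
    "detect_ms": 300,      # corosync declares the node lost
    "fence_ms": 400,       # fence agent invoked, node termination verified
    "transition_ms": 200,  # pacemaker calculates and applies the new state
    "start_ms": 200,       # recovered service starts on the surviving node
}

total_ms = failover_budget_ms(**stages)
fits_sub_second = total_ms < 1000

print(f"total failover time: {total_ms} ms")
print(f"fits in one second: {fits_sub_second}")
```

Even with optimistic per-stage figures like these, the sum can exceed one
second, which is why every stage would have to be exceptionally fast.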


Another consideration: while pacemaker timeouts and intervals can be
specified in milliseconds, internally pacemaker frequently truncates
such values to whole seconds. I wouldn't recommend using anything less
than 2s in any configured value.
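
To illustrate why truncation matters, here is a minimal Python sketch. It
assumes truncation means plain integer division by 1000; the function name
is hypothetical, and this is not pacemaker's actual code, whose behavior
may differ from one code path to another.

```python
def truncate_to_seconds(interval_ms: int) -> int:
    """Drop the sub-second part of a non-negative millisecond value."""
    return interval_ms // 1000


# Sub-second values collapse to 0; 1500 ms loses a third of its value.
for ms in (500, 999, 1500, 2000):
    print(f"{ms} ms -> {truncate_to_seconds(ms)} s")
```

Under that assumption, a configured 500ms would be treated as 0s and 1500ms
as 1s, so only whole-second values keep their intended meaning.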

