Re: [ClusterLabs] Reliability questions on the new QDevices in uneven node count Setups

2016-07-25 Thread Christine Caulfield
On 25/07/16 16:27, Klaus Wenninger wrote:
> On 07/25/2016 04:56 PM, Thomas Lamprecht wrote:
>> Thanks for the fast reply :)
>>
>>
>> On 07/25/2016 03:51 PM, Christine Caulfield wrote:
>>> On 25/07/16 14:29, Thomas Lamprecht wrote:
>>>> Hi all,
>>>>
>>>> I'm currently testing the new features of corosync 2.4, especially
>>>> qdevices.
>>>> First tests show quite nice results, like keeping quorum on the single
>>>> node left from a three-node cluster.
>>>>
>>>> But what I'm a bit worried about is what happens if the server where
>>>> qnetd runs, or the qdevice daemon, fails: in that case the cluster
>>>> cannot afford any further loss of a node in my three-node setup, as
>>>> the expected votes are 5 and thus 3 are needed for quorum, which I
>>>> cannot reach if qnetd is not running or has failed.
>>> We're looking into ways of making this more resilient. It might be
>>> possible to cluster a qnetd (though this is not currently supported) in
>>> a separate cluster from the arbitrated one, obviously.
>>
>> Yeah, I saw that in the QDevice document; that would be a way.
>>
>> I guess the qnetd daemons would then act as a cluster of their own, as
>> they would need to communicate which node sees which qnetd daemon, so
>> that a decision about the quorate partition can be made.
>>
>> But it still ties the reliability of the cluster to that of a single
>> node, adding a dependency: failures of components outside the cluster,
>> which would otherwise have no effect on the cluster behaviour, may now
>> affect it. Couldn't that be a problem?
>>
>> I know that's a worst-case scenario, but with only one qnetd running on
>> a single (external) node it can happen, and if the reliability of the
>> node running qnetd is the same as that of each cluster node, the
>> reliability of the whole cluster in the three-node case can be modelled
>> quite simply, if I remember my introductory course on this topic
>> correctly:
>>
>> Without qnetd: 1 - ((1 - R1) * (1 - R2) * (1 - R3))
>>
>> With qnetd: (1 - ((1 - R1) * (1 - R2) * (1 - R3))) * Rqnetd
>>
>> where R1, R2, R3 are the reliabilities of the cluster nodes and Rqnetd
>> is the reliability of the node running qnetd.
>> While that's a really simplified model that doesn't quite depict
>> reality, the basic point is that the reliability of the whole cluster
>> becomes dependent on that of the node running qnetd, no?
>>
>>
> With lms and ffsplit I guess the calculation is not that simple anymore ...
>
> Correct me if I'm wrong, but I think the bottom line for understanding
> the benefits of qdevice is that classic quorum generation basically
> takes a snapshot of the situation at a certain time and derives the
> reactions from that, whereas qdevice tries to benefit from knowledge of
> the past (i.e. of how we got into the current situation).


Actually no: qdevice is totally stateless (which is why I think it will
cluster well when we get round to it). It makes the best decision it can,
based on the fact that it should have a full view of all nodes in the
cluster regardless of whether they can see each other - and if they can't
see qdevice then they don't get the vote anyway.

Chrissie

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Reliability questions on the new QDevices in uneven node count Setups

2016-07-25 Thread Klaus Wenninger
On 07/25/2016 04:56 PM, Thomas Lamprecht wrote:
> Thanks for the fast reply :)
>
>
> On 07/25/2016 03:51 PM, Christine Caulfield wrote:
>> On 25/07/16 14:29, Thomas Lamprecht wrote:
>>> Hi all,
>>>
>>> I'm currently testing the new features of corosync 2.4, especially
>>> qdevices.
>>> First tests show quite nice results, like keeping quorum on the single
>>> node left from a three-node cluster.
>>>
>>> But what I'm a bit worried about is what happens if the server where
>>> qnetd runs, or the qdevice daemon, fails: in that case the cluster
>>> cannot afford any further loss of a node in my three-node setup, as
>>> the expected votes are 5 and thus 3 are needed for quorum, which I
>>> cannot reach if qnetd is not running or has failed.
>> We're looking into ways of making this more resilient. It might be
>> possible to cluster a qnetd (though this is not currently supported) in
>> a separate cluster from the arbitrated one, obviously.
>
> Yeah, I saw that in the QDevice document; that would be a way.
>
> I guess the qnetd daemons would then act as a cluster of their own, as
> they would need to communicate which node sees which qnetd daemon, so
> that a decision about the quorate partition can be made.
>
> But it still ties the reliability of the cluster to that of a single
> node, adding a dependency: failures of components outside the cluster,
> which would otherwise have no effect on the cluster behaviour, may now
> affect it. Couldn't that be a problem?
>
> I know that's a worst-case scenario, but with only one qnetd running on
> a single (external) node it can happen, and if the reliability of the
> node running qnetd is the same as that of each cluster node, the
> reliability of the whole cluster in the three-node case can be modelled
> quite simply, if I remember my introductory course on this topic
> correctly:
>
> Without qnetd: 1 - ((1 - R1) * (1 - R2) * (1 - R3))
>
> With qnetd: (1 - ((1 - R1) * (1 - R2) * (1 - R3))) * Rqnetd
>
> where R1, R2, R3 are the reliabilities of the cluster nodes and Rqnetd
> is the reliability of the node running qnetd.
> While that's a really simplified model that doesn't quite depict
> reality, the basic point is that the reliability of the whole cluster
> becomes dependent on that of the node running qnetd, no?
>
With lms and ffsplit I guess the calculation is not that simple anymore ...

Correct me if I'm wrong, but I think the bottom line for understanding the
benefits of qdevice is that classic quorum generation basically takes a
snapshot of the situation at a certain time and derives the reactions from
that, whereas qdevice tries to benefit from knowledge of the past (i.e. of
how we got into the current situation).

>>
>> The LMS algorithm is quite smart about how it doles out its vote and can
>> handle isolation from the main qnetd provided that the main core of the
>> cluster (the majority in a split) retains quorum, but any more serious
>> changes to the cluster config will cause it to be withdrawn. So in this
>> case you should find that your 3 node cluster will continue to work in
>> the absence of the qnetd server or link, provided you don't lose any
>> nodes.
>
> Yes, I read that in the documents and saw it during testing too -
> really good work!
>
> The point of my mail was exactly the failure of qnetd itself and the
> resulting situation that the cluster then cannot afford to lose any
> node, while without qnetd it could afford to lose (n - 1) / 2 nodes.
>
> Or do I also have to enable quorum.last_man_standing together with
> quorum.wait_for_all to allow scaling down the expected votes if qnetd
> fails completely?
> I will test that.
>
> I just want to be sure that my thoughts are correct, or at least not
> completely flawed, and that qnetd as it is makes sense in an
> even-node-count cluster with the ffsplit algorithm, but not in an
> uneven-node-count cluster if the reliability of the node running qnetd
> cannot be guaranteed, i.e. by adding HA to the service (VM or
> container) running qnetd.
>
> best regards,
> Thomas
>
>>
>> In a 3 node setup obviously LMS is more appropriate than ffsplit anyway.
>>
>> Chrissie
>>
>>> So in this case I'm bound to the reliability of the server providing
>>> the qnetd service: if it fails I cannot afford to lose any other node
>>> in my three-node example, or in any other example with an uneven node
>>> count, as the qdevice vote subsystem provides (node count - 1) votes.
>>>
>>> So if I see it correctly, QDevices only make sense in the case of even
>>> node counts, maybe especially 2-node setups: if qnetd works we have
>>> one more node which may fail, and if qnetd fails we are as good as
>>> without it, as qnetd provides only one vote here.
>>>
>>> Am I missing something, or any thoughts on that?

Re: [ClusterLabs] Reliability questions on the new QDevices in uneven node count Setups

2016-07-25 Thread Thomas Lamprecht

Thanks for the fast reply :)


On 07/25/2016 03:51 PM, Christine Caulfield wrote:
> On 25/07/16 14:29, Thomas Lamprecht wrote:
>> Hi all,
>>
>> I'm currently testing the new features of corosync 2.4, especially
>> qdevices.
>> First tests show quite nice results, like keeping quorum on the single
>> node left from a three-node cluster.
>>
>> But what I'm a bit worried about is what happens if the server where
>> qnetd runs, or the qdevice daemon, fails: in that case the cluster
>> cannot afford any further loss of a node in my three-node setup, as
>> the expected votes are 5 and thus 3 are needed for quorum, which I
>> cannot reach if qnetd is not running or has failed.
>
> We're looking into ways of making this more resilient. It might be
> possible to cluster a qnetd (though this is not currently supported) in
> a separate cluster from the arbitrated one, obviously.


Yeah, I saw that in the QDevice document; that would be a way.

I guess the qnetd daemons would then act as a cluster of their own, as
they would need to communicate which node sees which qnetd daemon, so
that a decision about the quorate partition can be made.

But it still ties the reliability of the cluster to that of a single
node, adding a dependency: failures of components outside the cluster,
which would otherwise have no effect on the cluster behaviour, may now
affect it. Couldn't that be a problem?

I know that's a worst-case scenario, but with only one qnetd running on a
single (external) node it can happen, and if the reliability of the node
running qnetd is the same as that of each cluster node, the reliability
of the whole cluster in the three-node case can be modelled quite simply,
if I remember my introductory course on this topic correctly:

Without qnetd: 1 - ((1 - R1) * (1 - R2) * (1 - R3))

With qnetd: (1 - ((1 - R1) * (1 - R2) * (1 - R3))) * Rqnetd

where R1, R2, R3 are the reliabilities of the cluster nodes and Rqnetd is
the reliability of the node running qnetd.
While that's a really simplified model that doesn't quite depict reality,
the basic point is that the reliability of the whole cluster becomes
dependent on that of the node running qnetd, no?
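
Putting rough numbers on it (a quick Python sketch of that toy model;
it assumes independent failures and an illustrative R = 0.99 for every
machine, nothing more):

    # Toy availability model from above - not how votequorum computes anything.
    def up_without_qnetd(r1, r2, r3):
        # probability that at least one of the three nodes is up
        return 1 - (1 - r1) * (1 - r2) * (1 - r3)

    def up_with_qnetd(r1, r2, r3, r_qnetd):
        # same expression, but the result additionally depends on qnetd
        return up_without_qnetd(r1, r2, r3) * r_qnetd

    r = 0.99
    print(up_without_qnetd(r, r, r))   # 0.999999
    print(up_with_qnetd(r, r, r, r))   # ~0.99, dominated by Rqnetd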




> The LMS algorithm is quite smart about how it doles out its vote and can
> handle isolation from the main qnetd provided that the main core of the
> cluster (the majority in a split) retains quorum, but any more serious
> changes to the cluster config will cause it to be withdrawn. So in this
> case you should find that your 3 node cluster will continue to work in
> the absence of the qnetd server or link, provided you don't lose any nodes.


Yes, I read that in the documents and saw it during testing too - really
good work!

The point of my mail was exactly the failure of qnetd itself and the
resulting situation that the cluster then cannot afford to lose any node,
while without qnetd it could afford to lose (n - 1) / 2 nodes.

Or do I also have to enable quorum.last_man_standing together with
quorum.wait_for_all to allow scaling down the expected votes if qnetd
fails completely?

I will test that.
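
For reference, what I will be testing is roughly this in corosync.conf (a
sketch, not a verified configuration):

    quorum {
        provider: corosync_votequorum
        wait_for_all: 1
        last_man_standing: 1
    }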

I just want to be sure that my thoughts are correct, or at least not
completely flawed, and that qnetd as it is makes sense in an
even-node-count cluster with the ffsplit algorithm, but not in an
uneven-node-count cluster if the reliability of the node running qnetd
cannot be guaranteed, i.e. by adding HA to the service (VM or container)
running qnetd.

best regards,
Thomas



> In a 3 node setup obviously LMS is more appropriate than ffsplit anyway.
>
> Chrissie


>> So in this case I'm bound to the reliability of the server providing
>> the qnetd service: if it fails I cannot afford to lose any other node
>> in my three-node example, or in any other example with an uneven node
>> count, as the qdevice vote subsystem provides (node count - 1) votes.
>>
>> So if I see it correctly, QDevices only make sense in the case of even
>> node counts, maybe especially 2-node setups: if qnetd works we have one
>> more node which may fail, and if qnetd fails we are as good as without
>> it, as qnetd provides only one vote here.
>>
>> Am I missing something, or any thoughts on that?




Re: [ClusterLabs] Reliability questions on the new QDevices in uneven node count Setups

2016-07-25 Thread Christine Caulfield
On 25/07/16 14:51, Christine Caulfield wrote:
> On 25/07/16 14:29, Thomas Lamprecht wrote:
>> Hi all,
>>
>> I'm currently testing the new features of corosync 2.4, especially
>> qdevices.
>> First tests show quite nice results, like keeping quorum on the single
>> node left from a three-node cluster.
>>
>> But what I'm a bit worried about is what happens if the server where
>> qnetd runs, or the qdevice daemon, fails: in that case the cluster
>> cannot afford any further loss of a node in my three-node setup, as
>> the expected votes are 5 and thus 3 are needed for quorum, which I
>> cannot reach if qnetd is not running or has failed.
> 
> We're looking into ways of making this more resilient. It might be
> possible to cluster a qnetd (though this is not currently supported) in
> a separate cluster from the arbitrated one, obviously.
> 
> The LMS algorithm is quite smart about how it doles out its vote and can
> handle isolation from the main qnetd provided that the main core of the
> cluster (the majority in a split) retains quorum, but any more serious
> changes to the cluster config will cause it to be withdrawn. So in this
> case you should find that your 3 node cluster will continue to work in
> the absence of the qnetd server or link, provided you don't lose any nodes.
> 

I should have also said that you'll need to enable 'wait_for_all' for
this to work.

Chrissie

> In a 3 node setup obviously LMS is more appropriate than ffsplit anyway.
> 
> Chrissie
> 
>>
>> So in this case I'm bound to the reliability of the server providing
>> the qnetd service: if it fails I cannot afford to lose any other node
>> in my three-node example, or in any other example with an uneven node
>> count, as the qdevice vote subsystem provides (node count - 1) votes.
>>
>> So if I see it correctly, QDevices only make sense in the case of even
>> node counts, maybe especially 2-node setups: if qnetd works we have one
>> more node which may fail, and if qnetd fails we are as good as without
>> it, as qnetd provides only one vote here.
>>
>> Am I missing something, or any thoughts on that?




Re: [ClusterLabs] Reliability questions on the new QDevices in uneven node count Setups

2016-07-25 Thread Christine Caulfield
On 25/07/16 14:29, Thomas Lamprecht wrote:
> Hi all,
> 
> I'm currently testing the new features of corosync 2.4, especially
> qdevices.
> First tests show quite nice results, like having quorum on a single node
> left out of a three node cluster.
> 
> But what I'm a bit worrying about is what happens if the server where
> qnetd runs, or the qdevice daemon fails,
> in this case the cluster cannot afford any other loss of a node in my
> three node setup as votes expected are
> 5 and thus 3 are needed for quorum, which I cannot fulfill if the qnetd
> does not run run or failed.

We're looking into ways of making this more resilient. It might be
possible to cluster a qnetd (though this is not currently supported) in
a separate cluster from the arbitrated one, obviously.

The LMS algorithm is quite smart about how it doles out its vote and can
handle isolation from the main qnetd provided that the main core of the
cluster (the majority in a split) retains quorum, but any more serious
changes to the cluster config will cause it to be withdrawn. So in this
case you should find that your 3 node cluster will continue to work in
the absence of the qnetd server or link, provided you don't lose any nodes.

In a 3 node setup obviously LMS is more appropriate than ffsplit anyway.
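
The algorithm is selected per device in corosync.conf; a minimal sketch
(the host name is made up):

    quorum {
        provider: corosync_votequorum
        device {
            model: net
            net {
                host: qnetd.example.com
                algorithm: lms    # ffsplit is the alternative
            }
        }
    }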

Chrissie

> 
> So in this case I'm bound to the reliability of the server providing
> the qnetd service: if it fails I cannot afford to lose any other node
> in my three-node example, or in any other example with an uneven node
> count, as the qdevice vote subsystem provides (node count - 1) votes.
>
> So if I see it correctly, QDevices only make sense in the case of even
> node counts, maybe especially 2-node setups: if qnetd works we have one
> more node which may fail, and if qnetd fails we are as good as without
> it, as qnetd provides only one vote here.
>
> Am I missing something, or any thoughts on that?


[ClusterLabs] Reliability questions on the new QDevices in uneven node count Setups

2016-07-25 Thread Thomas Lamprecht

Hi all,

I'm currently testing the new features of corosync 2.4, especially qdevices.
First tests show quite nice results, like keeping quorum on the single
node left from a three-node cluster.

But what I'm a bit worried about is what happens if the server where
qnetd runs, or the qdevice daemon, fails: in that case the cluster cannot
afford any further loss of a node in my three-node setup, as the expected
votes are 5 and thus 3 are needed for quorum, which I cannot reach if
qnetd is not running or has failed.
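
To spell out the vote arithmetic (a quick sketch; as I understand the
corosync-qdevice documentation, lms gives the device node_count - 1 votes
and ffsplit gives it exactly 1):

    def quorum_math(nodes, device_votes):
        expected = nodes + device_votes
        quorum = expected // 2 + 1
        # node failures tolerated while qnetd is reachable vs. after it fails
        return quorum, expected - quorum, nodes - quorum

    print(quorum_math(3, 2))  # (3, 2, 0): lms, no node may fail once qnetd is gone
    print(quorum_math(2, 1))  # (2, 1, 0): ffsplit, same as a plain 2-node cluster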


So in this case I'm bound to the reliability of the server providing the
qnetd service: if it fails I cannot afford to lose any other node in my
three-node example, or in any other example with an uneven node count, as
the qdevice vote subsystem provides (node count - 1) votes.

So if I see it correctly, QDevices only make sense in the case of even
node counts, maybe especially 2-node setups: if qnetd works we have one
more node which may fail, and if qnetd fails we are as good as without
it, as qnetd provides only one vote here.

Am I missing something, or any thoughts on that?

best regards,
Thomas


