Re: [ClusterLabs] pcs add node command is success but node is not configured to existing cluster

2021-07-28 Thread Strahil Nikolov via Users
Firewall issue?
Did you check at the corosync level whether all nodes can reach each other?
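
For example (a quick sketch assuming corosync 2.x with firewalld; the ports are the
usual defaults, adjust to your corosync.conf), run on each node:

[root@node01 ~]# corosync-cfgtool -s           # ring status as seen by this node
[root@node01 ~]# corosync-quorumtool -s        # membership and actual vs. expected votes
[root@node01 ~]# corosync-cmapctl | grep members     # runtime member list
[root@node01 ~]# firewall-cmd --list-all       # 2224/tcp (pcsd) and 5404-5405/udp (corosync) should be open
[root@node01 ~]# firewall-cmd --permanent --add-service=high-availability && firewall-cmd --reload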

Best Regards,
Strahil Nikolov


On Wednesday, 28 July 2021 at 16:32:51 GMT+3, S Sathish S via Users wrote:

Hi Team,

We are trying to add node03 to an existing cluster. After adding it, only 2 nodes
are configured, and the corosync log also shows "Waiting for all cluster members.
Current votes: 2 expected_votes: 3". On node03, however, the pcs cluster status
output shows 3 nodes configured and no resources listed, while on node02 we have
40 resources configured, none of which are reflected on node03.

This issue occurs only on a few problematic hardware units, not on all hardware;
we don't know why this node is not joining the cluster properly.


[root@node02 ~]# pcs cluster status
Cluster Status:
Stack: corosync
Current DC: node01 (version 2.0.2-744a30d655) - partition WITHOUT quorum
Last updated: Wed Jul 28 14:58:13 2021
Last change: Wed Jul 28 14:41:41 2021 by root via cibadmin on node01
2 nodes configured
40 resources configured

PCSD Status:
  node02: Online
  node01: Online
  node03: Online
[root@node02 ~]#

Corosync log from the node-add execution:
Jul 28 11:15:05 [17598] node01 corosync notice  [TOTEM ] A new membership (10.216.x.x:42660) was formed. Members
Jul 28 11:15:05 [17598] node01 corosync notice  [QUORUM] Members[2]: 1 2
Jul 28 11:15:05 [17598] node01 corosync notice  [MAIN  ] Completed service synchronization, ready to provide service.
Jul 28 11:15:05 [17598] node01 corosync notice  [CFG   ] Config reload requested by node 1
Jul 28 11:15:05 [17598] node01 corosync notice  [TOTEM ] adding new UDPU member {10.216.x.x}
Jul 28 11:15:07 [17599] node01 corosync notice  [VOTEQ ] Waiting for all cluster members. Current votes: 2 expected_votes: 3
Jul 28 11:15:07 [17599] node01 corosync notice  [VOTEQ ] Waiting for all cluster members. Current votes: 2 expected_votes: 3
Jul 28 11:15:07 [17599] node01 corosync notice  [TOTEM ] A new membership (10.216.x.x:42664) was formed. Members
Jul 28 11:15:07 [17599] node01 corosync notice  [VOTEQ ] Waiting for all cluster members. Current votes: 2 expected_votes: 3
Jul 28 11:15:07 [17599] node01 corosync notice  [VOTEQ ] Waiting for all cluster members. Current votes: 2 expected_votes: 3
Jul 28 11:15:07 [17599] node01 corosync notice  [QUORUM] This node is within the non-primary component and will NOT provide any services.
Jul 28 11:15:07 [17599] node01 corosync notice  [QUORUM] Members[2]: 1 2
Jul 28 11:15:07 [17599] node01 corosync notice  [MAIN  ] Completed service synchronization, ready to provide service.
Jul 28 11:15:07 [17599] node01 corosync notice  [VOTEQ ] Waiting for all cluster members. Current votes: 2 expected_votes: 3
Jul 28 11:15:07 [17599] node01 corosync notice  [VOTEQ ] Waiting for all cluster members. Current votes: 2 expected_votes: 3
Jul 28 11:15:11 [17599] node01 corosync notice  [TOTEM ] A new membership (10.216.x.x:42668) was formed. Members

[root@node03 ~]# pcs cluster status
Cluster Status:
Stack: corosync
Current DC: node03 (version 2.0.2-744a30d655) - partition WITHOUT quorum
Last updated: Wed Jul 28 15:04:31 2021
Last change: Wed Jul 28 15:04:00 2021 by root via cibadmin on node03
3 nodes configured
0 resources configured

PCSD Status:
  node03: Online
  node01: Online
  node02: Online
[root@node03 ~]#

Thanks and Regards,
S Sathish S



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: Re: [EXT] Re: Two node cluster without fencing and no split brain?

2021-07-28 Thread Strahil Nikolov via Users
So far, I have never had a cluster whose nodes were directly connected to the same
switch. Usually it's nodeA -> switchA -> switchB -> nodeB, and sometimes the
connectivity between the switches goes down (for example, because of a firewall rule).

Best Regards,
Strahil Nikolov


On Wednesday, 28 July 2021 at 15:51:36 GMT+3, john tillman wrote:

> Technically you could give one vote to one node and zero to the other. 
> If they lose contact only the server with one vote would make quorum. 
> The downside is that if the server with 1 vote goes down the entire
> cluster comes to a halt.
>
>
> That said, if both nodes can reach the same switch that they are
> connected to each other through, why can't they reach each other?
>

"... why can't they reach each other?"  My question as well.

It feels like a very low probability thing to me.  Some
blockage/filtering/delay of the cluster's "quorum packets" while ping
packets were allowed through, perhaps caused by network congestion.  But
I'm not a network engineer.  Any network engineers reading this care to
comment?

Thanks for echoing my thoughts and that interesting quorum-weight idea.


>
> On 7/26/21 12:21 PM, john tillman wrote:
>> They would continue running their resources and we would have split
>> brain.
>>
>> So there is no safe way to support a two node cluster 100% of the time.
>> But when all you have are two nodes and a switch ... well, when life
>> gives
>> you lemons ...
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: Re: [EXT] Re: Two node cluster without fencing and no split brain?

2021-07-28 Thread kgaillot
On Wed, 2021-07-28 at 08:51 -0400, john tillman wrote:
> > Technically you could give one vote to one node and zero to the
> > other. 
> > If they lose contact only the server with one vote would make
> > quorum. 
> > The downside is that if the server with 1 vote goes down the entire
> > cluster comes to a halt.
> > 
> > 
> > That said, if both nodes can reach the same switch that they are
> > connected to each other through, why can't they reach each other?
> > 
> 
> "... why can't they reach each other?"  My question as well.
> 
> It feels like a very low probability thing to me.  Some
> blockage/filtering/delay of the cluster's "quorum packets" while ping
> packets were allowed through, perhaps caused by network
> congestion.  But
> I'm not a network engineer.  Any network engineers reading this care
> to
> comment?

It's not necessarily that they can't reach each other, but that one is
unresponsive. A kernel driver temporarily blocking activity, CPU or I/O
overload, or losing an essential disk drive (which won't affect
networking) can all cause a server to become unresponsive to cluster
traffic while still potentially having the ability to cause trouble if
resources are recovered elsewhere.

Having a separate network interface for fencing device access (ideally
on a separate physical card) is a good idea, so the interface is not a
single point of failure. Connecting that interface via a dedicated
switch, on a different UPS than the main switch, improves it even more.

A hardware watchdog is a good way to do fencing without all that
trouble.
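
If it helps, a rough sketch of watchdog-based (diskless) SBD fencing with pcs 0.10
(assuming a /dev/watchdog device on every node; check your distro's sbd documentation,
the options below are the commonly documented ones):

[root@node01 ~]# yum install sbd                 # on every node
[root@node01 ~]# pcs stonith sbd enable --watchdog=/dev/watchdog SBD_WATCHDOG_TIMEOUT=10
[root@node01 ~]# pcs cluster stop --all && pcs cluster start --all   # sbd needs a full cluster restart
[root@node01 ~]# pcs property set stonith-watchdog-timeout=20s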

> Thanks for echoing my thoughts and that interesting quorum-weight
> idea.
> 
> 
> > 
> > On 7/26/21 12:21 PM, john tillman wrote:
> > > They would continue running their resources and we would have
> > > split
> > > brain.
> > > 
> > > So there is no safe way to support a two node cluster 100% of the
> > > time.
> > > But when all you have are two nodes and a switch ... well, when
> > > life
> > > gives
> > > you lemons ...
> > 
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] pcs add node command is success but node is not configured to existing cluster

2021-07-28 Thread S Sathish S via Users
Hi Team,

We are trying to add node03 to an existing cluster. After adding it, only 2 nodes
are configured, and the corosync log also shows "Waiting for all cluster members.
Current votes: 2 expected_votes: 3". On node03, however, the pcs cluster status
output shows 3 nodes configured and no resources listed, while on node02 we have
40 resources configured, none of which are reflected on node03.

This issue occurs only on a few problematic hardware units, not on all hardware;
we don't know why this node is not joining the cluster properly.
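
For reference, the node was added with the usual pcs workflow; a rough sketch of the
pcs 0.10.x commands (not necessarily our exact invocation, password prompt omitted):

[root@node02 ~]# pcs host auth node03 -u hacluster
[root@node02 ~]# pcs cluster node add node03 --start --enable
[root@node02 ~]# pcs status        # node03 should then list the same 40 resources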

[root@node02 ~]# pcs cluster status
Cluster Status:
Stack: corosync
Current DC: node01 (version 2.0.2-744a30d655) - partition WITHOUT quorum
Last updated: Wed Jul 28 14:58:13 2021
Last change: Wed Jul 28 14:41:41 2021 by root via cibadmin on node01
2 nodes configured
40 resources configured

PCSD Status:
  node02: Online
  node01: Online
  node03: Online
[root@node02 ~]#

Corosync log from the node-add execution:
Jul 28 11:15:05 [17598] node01 corosync notice  [TOTEM ] A new membership (10.216.x.x:42660) was formed. Members
Jul 28 11:15:05 [17598] node01 corosync notice  [QUORUM] Members[2]: 1 2
Jul 28 11:15:05 [17598] node01 corosync notice  [MAIN  ] Completed service synchronization, ready to provide service.
Jul 28 11:15:05 [17598] node01 corosync notice  [CFG   ] Config reload requested by node 1
Jul 28 11:15:05 [17598] node01 corosync notice  [TOTEM ] adding new UDPU member {10.216.x.x}
Jul 28 11:15:07 [17599] node01 corosync notice  [VOTEQ ] Waiting for all cluster members. Current votes: 2 expected_votes: 3
Jul 28 11:15:07 [17599] node01 corosync notice  [VOTEQ ] Waiting for all cluster members. Current votes: 2 expected_votes: 3
Jul 28 11:15:07 [17599] node01 corosync notice  [TOTEM ] A new membership (10.216.x.x:42664) was formed. Members
Jul 28 11:15:07 [17599] node01 corosync notice  [VOTEQ ] Waiting for all cluster members. Current votes: 2 expected_votes: 3
Jul 28 11:15:07 [17599] node01 corosync notice  [VOTEQ ] Waiting for all cluster members. Current votes: 2 expected_votes: 3
Jul 28 11:15:07 [17599] node01 corosync notice  [QUORUM] This node is within the non-primary component and will NOT provide any services.
Jul 28 11:15:07 [17599] node01 corosync notice  [QUORUM] Members[2]: 1 2
Jul 28 11:15:07 [17599] node01 corosync notice  [MAIN  ] Completed service synchronization, ready to provide service.
Jul 28 11:15:07 [17599] node01 corosync notice  [VOTEQ ] Waiting for all cluster members. Current votes: 2 expected_votes: 3
Jul 28 11:15:07 [17599] node01 corosync notice  [VOTEQ ] Waiting for all cluster members. Current votes: 2 expected_votes: 3
Jul 28 11:15:11 [17599] node01 corosync notice  [TOTEM ] A new membership (10.216.x.x:42668) was formed. Members


[root@node03 ~]# pcs cluster status
Cluster Status:
Stack: corosync
Current DC: node03 (version 2.0.2-744a30d655) - partition WITHOUT quorum
Last updated: Wed Jul 28 15:04:31 2021
Last change: Wed Jul 28 15:04:00 2021 by root via cibadmin on node03
3 nodes configured
0 resources configured

PCSD Status:
  node03: Online
  node01: Online
  node02: Online
[root@node03 ~]#

Thanks and Regards,
S Sathish S
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: Re: [EXT] Re: Two node cluster without fencing and no split brain?

2021-07-28 Thread john tillman



> Technically you could give one vote to one node and zero to the other. 
> If they lose contact only the server with one vote would make quorum. 
> The downside is that if the server with 1 vote goes down the entire
> cluster comes to a halt.
>
>
> That said, if both nodes can reach the same switch that they are
> connected to each other through, why can't they reach each other?
>

"... why can't they reach each other?"  My question as well.

It feels like a very low probability thing to me.  Some
blockage/filtering/delay of the cluster's "quorum packets" while ping
packets were allowed through, perhaps caused by network congestion.  But
I'm not a network engineer.  Any network engineers reading this care to
comment?

Thanks for echoing my thoughts and that interesting quorum-weight idea.
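
(A rough sketch of what that quorum-weight idea might look like in corosync.conf,
per votequorum(5); untested, hypothetical node names:)

nodelist {
    node {
        ring0_addr: nodeA
        nodeid: 1
        quorum_votes: 1     # nodeA alone reaches quorum
    }
    node {
        ring0_addr: nodeB
        nodeid: 2
        quorum_votes: 0     # no vote: if nodeA is down, the cluster halts
    }
}

quorum {
    provider: corosync_votequorum
}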


>
> On 7/26/21 12:21 PM, john tillman wrote:
>> They would continue running their resources and we would have split
>> brain.
>>
>> So there is no safe way to support a two node cluster 100% of the time.
>> But when all you have are two nodes and a switch ... well, when life
>> gives
>> you lemons ...
>


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/