Re: [ClusterLabs] pcs add node command succeeds but node is not configured in existing cluster
Firewall issue? Did you check on the corosync level whether all nodes can reach each other?

Best Regards,
Strahil Nikolov

On Wednesday, 28 July 2021, 16:32:51 GMT+3, S Sathish S via Users wrote:

Hi Team,

We are trying to add node03 to an existing cluster. After adding it, we see only 2 nodes configured, and the corosync log confirms this with "Waiting for all cluster members. Current votes: 2 expected_votes: 3". However, on node03 the pcs cluster status output shows 3 nodes configured and no resources listed, while on node02 we have 40 resources configured, which are not reflected on node03. This issue occurs only on a few problematic hardware units, not on all hardware; we don't know why it is not joining the cluster.

[root@node02 ~]# pcs cluster status
Cluster Status:
 Stack: corosync
 Current DC: node01 (version 2.0.2-744a30d655) - partition WITHOUT quorum
 Last updated: Wed Jul 28 14:58:13 2021
 Last change: Wed Jul 28 14:41:41 2021 by root via cibadmin on node01
 2 nodes configured
 40 resources configured

PCSD Status:
  node02: Online
  node01: Online
  node03: Online
[root@node02 ~]#

Corosync log during the node add:

Jul 28 11:15:05 [17598] node01 corosync notice [TOTEM ] A new membership (10.216.x.x:42660) was formed. Members
Jul 28 11:15:05 [17598] node01 corosync notice [QUORUM] Members[2]: 1 2
Jul 28 11:15:05 [17598] node01 corosync notice [MAIN ] Completed service synchronization, ready to provide service.
Jul 28 11:15:05 [17598] node01 corosync notice [CFG ] Config reload requested by node 1
Jul 28 11:15:05 [17598] node01 corosync notice [TOTEM ] adding new UDPU member {10.216.x.x}
Jul 28 11:15:07 [17599] node01 corosync notice [VOTEQ ] Waiting for all cluster members. Current votes: 2 expected_votes: 3
Jul 28 11:15:07 [17599] node01 corosync notice [VOTEQ ] Waiting for all cluster members. Current votes: 2 expected_votes: 3
Jul 28 11:15:07 [17599] node01 corosync notice [TOTEM ] A new membership (10.216.x.x:42664) was formed. Members
Jul 28 11:15:07 [17599] node01 corosync notice [VOTEQ ] Waiting for all cluster members. Current votes: 2 expected_votes: 3
Jul 28 11:15:07 [17599] node01 corosync notice [VOTEQ ] Waiting for all cluster members. Current votes: 2 expected_votes: 3
Jul 28 11:15:07 [17599] node01 corosync notice [QUORUM] This node is within the non-primary component and will NOT provide any services.
Jul 28 11:15:07 [17599] node01 corosync notice [QUORUM] Members[2]: 1 2
Jul 28 11:15:07 [17599] node01 corosync notice [MAIN ] Completed service synchronization, ready to provide service.
Jul 28 11:15:07 [17599] node01 corosync notice [VOTEQ ] Waiting for all cluster members. Current votes: 2 expected_votes: 3
Jul 28 11:15:07 [17599] node01 corosync notice [VOTEQ ] Waiting for all cluster members. Current votes: 2 expected_votes: 3
Jul 28 11:15:11 [17599] node01 corosync notice [TOTEM ] A new membership (10.216.x.x:42668) was formed. Members

[root@node03 ~]# pcs cluster status
Cluster Status:
 Stack: corosync
 Current DC: node03 (version 2.0.2-744a30d655) - partition WITHOUT quorum
 Last updated: Wed Jul 28 15:04:31 2021
 Last change: Wed Jul 28 15:04:00 2021 by root via cibadmin on node03
 3 nodes configured
 0 resources configured

PCSD Status:
  node03: Online
  node01: Online
  node02: Online
[root@node03 ~]#

Thanks and Regards,
S Sathish S
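Following up on that corosync-level check, here is a minimal sketch of commands one could run on each node. It assumes a firewalld-based setup and corosync's default ports (UDP 5404/5405 for corosync, TCP 2224 for pcsd); adjust to the actual environment:

# Run on each node (node01, node02, node03) and compare the output:
corosync-cfgtool -s                  # local ring/link status
corosync-cmapctl | grep members      # members this node's corosync currently sees
pcs status corosync                  # membership as reported by pcs

# If firewalld is in use, confirm the high-availability service is allowed:
firewall-cmd --list-all
firewall-cmd --permanent --add-service=high-availability
firewall-cmd --reload

If the member list differs between nodes while pcsd still reports everyone Online, the problem is usually at the corosync/UDP layer rather than at the pcsd/TCP layer.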
Re: [ClusterLabs] Antw: Re: [EXT] Re: Two node cluster without fencing and no split brain?
So far I have never had a cluster with nodes directly connected to the same switches. Usually it's nodeA -> switchA -> switchB -> nodeB, and sometimes connectivity between the switches goes down (for example, due to a firewall rule).

Best Regards,
Strahil Nikolov

On Wednesday, 28 July 2021, 15:51:36 GMT+3, john tillman wrote:

> Technically you could give one vote to one node and zero to the other.
> If they lose contact only the server with one vote would make quorum.
> The downside is that if the server with 1 vote goes down the entire
> cluster comes to a halt.
>
> That said, if both nodes can reach the same switch that they are
> connected to each other through, why can't they reach each other?

"... why can't they reach each other?" My question as well.

It feels like a very low probability thing to me. Some blockage/filtering/delay of the cluster's "quorum packets" while ping packets were allowed through, perhaps caused by network congestion. But I'm not a network engineer. Any network engineers reading this care to comment?

Thanks for echoing my thoughts and that interesting quorum-weight idea.

> On 7/26/21 12:21 PM, john tillman wrote:
>> They would continue running their resources and we would have split
>> brain.
>>
>> So there is no safe way to support a two node cluster 100% of the time.
>> But when all you have are two nodes and a switch ... well, when life
>> gives you lemons ...
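To illustrate the "ping passes but cluster packets don't" case: a rough check is to watch for corosync traffic arriving on one node while pinging from the other. The interface name and the default corosync UDP port 5405 are assumptions here; adjust them to the actual configuration:

# On nodeB, watch for corosync packets arriving from nodeA:
tcpdump -ni eth0 udp port 5405 and host nodeA
# Meanwhile, from nodeA, confirm plain ICMP still gets through:
ping -c 3 nodeB
# If ping succeeds but no corosync packets show up, something on the
# switch/firewall path is filtering the cluster traffic specifically.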
Re: [ClusterLabs] Antw: Re: [EXT] Re: Two node cluster without fencing and no split brain?
On Wed, 2021-07-28 at 08:51 -0400, john tillman wrote:
> > Technically you could give one vote to one node and zero to the other.
> > If they lose contact only the server with one vote would make quorum.
> > The downside is that if the server with 1 vote goes down the entire
> > cluster comes to a halt.
> >
> > That said, if both nodes can reach the same switch that they are
> > connected to each other through, why can't they reach each other?
>
> "... why can't they reach each other?" My question as well.
>
> It feels like a very low probability thing to me. Some
> blockage/filtering/delay of the cluster's "quorum packets" while ping
> packets were allowed through, perhaps caused by network congestion. But
> I'm not a network engineer. Any network engineers reading this care to
> comment?

It's not necessarily that they can't reach each other, but that one is unresponsive. A kernel driver temporarily blocking activity, CPU or I/O overload, or losing an essential disk drive (which won't affect networking) can all cause a server to become unresponsive to cluster traffic while still potentially having the ability to cause trouble if resources are recovered elsewhere.

Having a separate network interface for fencing device access (ideally on a separate physical card) is a good idea, so the interface is not a single point of failure. Connecting that interface via a dedicated switch, on a different UPS than the main switch, improves it even more.

A hardware watchdog is a good way to do fencing without all that trouble.

> Thanks for echoing my thoughts and that interesting quorum-weight idea.
>
> > On 7/26/21 12:21 PM, john tillman wrote:
> > > They would continue running their resources and we would have split
> > > brain.
> > >
> > > So there is no safe way to support a two node cluster 100% of the time.
> > > But when all you have are two nodes and a switch ... well, when life
> > > gives you lemons ...

-- 
Ken Gaillot
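For reference, a rough sketch of the watchdog-based (diskless SBD) fencing mentioned above, using pcs. Option names differ slightly between pcs versions, so treat this as an outline rather than exact syntax:

ls -l /dev/watchdog*                           # confirm a watchdog device is present
pcs cluster stop --all                         # sbd is enabled/disabled with the cluster down
pcs stonith sbd enable                         # enable sbd (watchdog-only, no shared disk)
pcs cluster start --all
pcs property set stonith-watchdog-timeout=10s  # let pacemaker trust watchdog self-fencing

With diskless SBD, a node that loses quorum or stops responding resets itself via the hardware watchdog, which gives the surviving node a safe basis for recovering resources even in a two-node setup.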
[ClusterLabs] pcs add node command succeeds but node is not configured in existing cluster
Hi Team,

We are trying to add node03 to an existing cluster. After adding it, we see only 2 nodes configured, and the corosync log confirms this with "Waiting for all cluster members. Current votes: 2 expected_votes: 3". However, on node03 the pcs cluster status output shows 3 nodes configured and no resources listed, while on node02 we have 40 resources configured, which are not reflected on node03. This issue occurs only on a few problematic hardware units, not on all hardware; we don't know why it is not joining the cluster.

[root@node02 ~]# pcs cluster status
Cluster Status:
 Stack: corosync
 Current DC: node01 (version 2.0.2-744a30d655) - partition WITHOUT quorum
 Last updated: Wed Jul 28 14:58:13 2021
 Last change: Wed Jul 28 14:41:41 2021 by root via cibadmin on node01
 2 nodes configured
 40 resources configured

PCSD Status:
  node02: Online
  node01: Online
  node03: Online
[root@node02 ~]#

Corosync log during the node add:

Jul 28 11:15:05 [17598] node01 corosync notice [TOTEM ] A new membership (10.216.x.x:42660) was formed. Members
Jul 28 11:15:05 [17598] node01 corosync notice [QUORUM] Members[2]: 1 2
Jul 28 11:15:05 [17598] node01 corosync notice [MAIN ] Completed service synchronization, ready to provide service.
Jul 28 11:15:05 [17598] node01 corosync notice [CFG ] Config reload requested by node 1
Jul 28 11:15:05 [17598] node01 corosync notice [TOTEM ] adding new UDPU member {10.216.x.x}
Jul 28 11:15:07 [17599] node01 corosync notice [VOTEQ ] Waiting for all cluster members. Current votes: 2 expected_votes: 3
Jul 28 11:15:07 [17599] node01 corosync notice [VOTEQ ] Waiting for all cluster members. Current votes: 2 expected_votes: 3
Jul 28 11:15:07 [17599] node01 corosync notice [TOTEM ] A new membership (10.216.x.x:42664) was formed. Members
Jul 28 11:15:07 [17599] node01 corosync notice [VOTEQ ] Waiting for all cluster members. Current votes: 2 expected_votes: 3
Jul 28 11:15:07 [17599] node01 corosync notice [VOTEQ ] Waiting for all cluster members. Current votes: 2 expected_votes: 3
Jul 28 11:15:07 [17599] node01 corosync notice [QUORUM] This node is within the non-primary component and will NOT provide any services.
Jul 28 11:15:07 [17599] node01 corosync notice [QUORUM] Members[2]: 1 2
Jul 28 11:15:07 [17599] node01 corosync notice [MAIN ] Completed service synchronization, ready to provide service.
Jul 28 11:15:07 [17599] node01 corosync notice [VOTEQ ] Waiting for all cluster members. Current votes: 2 expected_votes: 3
Jul 28 11:15:07 [17599] node01 corosync notice [VOTEQ ] Waiting for all cluster members. Current votes: 2 expected_votes: 3
Jul 28 11:15:11 [17599] node01 corosync notice [TOTEM ] A new membership (10.216.x.x:42668) was formed. Members

[root@node03 ~]# pcs cluster status
Cluster Status:
 Stack: corosync
 Current DC: node03 (version 2.0.2-744a30d655) - partition WITHOUT quorum
 Last updated: Wed Jul 28 15:04:31 2021
 Last change: Wed Jul 28 15:04:00 2021 by root via cibadmin on node03
 3 nodes configured
 0 resources configured

PCSD Status:
  node03: Online
  node01: Online
  node02: Online
[root@node03 ~]#

Thanks and Regards,
S Sathish S
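Not part of the original report, but a sketch of checks that may help narrow down why node03 ends up in its own membership. These are standard corosync/pcs tools and paths from a default install; adapt as needed:

# 1) Confirm all three nodes run the same corosync configuration
#    (and the same authkey, if crypto/secauth is enabled):
md5sum /etc/corosync/corosync.conf   # run on every node and compare
pcs cluster sync                     # push the current corosync config to all nodes

# 2) Compare what each node's corosync actually sees:
corosync-cmapctl | grep -i member
corosync-quorumtool -s

# 3) On the problematic host, rule out filtering of corosync UDP traffic:
firewall-cmd --list-all

A mismatch in step 1 would explain node03 forming a separate single-node membership while pcsd on all hosts still shows everyone Online.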
Re: [ClusterLabs] Antw: Re: [EXT] Re: Two node cluster without fencing and no split brain?
> Technically you could give one vote to one node and zero to the other.
> If they lose contact only the server with one vote would make quorum.
> The downside is that if the server with 1 vote goes down the entire
> cluster comes to a halt.
>
> That said, if both nodes can reach the same switch that they are
> connected to each other through, why can't they reach each other?

"... why can't they reach each other?" My question as well.

It feels like a very low probability thing to me. Some blockage/filtering/delay of the cluster's "quorum packets" while ping packets were allowed through, perhaps caused by network congestion. But I'm not a network engineer. Any network engineers reading this care to comment?

Thanks for echoing my thoughts and that interesting quorum-weight idea.

> On 7/26/21 12:21 PM, john tillman wrote:
>> They would continue running their resources and we would have split
>> brain.
>>
>> So there is no safe way to support a two node cluster 100% of the time.
>> But when all you have are two nodes and a switch ... well, when life
>> gives you lemons ...
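For completeness, the one-vote/zero-vote idea discussed here maps onto the quorum_votes option in corosync's nodelist (set in /etc/corosync/corosync.conf on both nodes). A hedged sketch of how to verify what votequorum ends up computing, assuming that change has been made and corosync reloaded:

# After giving nodeA quorum_votes: 1 and nodeB quorum_votes: 0 in the
# nodelist section of corosync.conf on both nodes:
corosync-quorumtool -s             # total votes, expected votes, quorum state
corosync-cmapctl | grep -i quorum  # per-node quorum_votes and runtime quorum keys

As noted above, the trade-off is that the zero-vote node can never hold the cluster up on its own: if the one-vote node dies, everything stops until it returns or is fenced.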