Hey Andrei, Ulrich,
I am working with Janghyuk on his testing effort. Thank you for your responses; you have clarified some of the terminology we have been misusing.
As Janghyuk mentioned previously, we have two "full cluster" nodes using two-node quorum and multiple heartbeat rings, plus two more servers as pacemaker-remotes. The pacemaker-remote connection resources each prefer a specific full cluster node to run on; however, they are configured such that they can fail over to the other cluster node if needed. Here is the configuration again...
corosync.conf - nodelist & quorum
nodelist {
    node {
        ring0_addr: node-1-subnet-1
        ring1_addr: node-1-subnet-2
        name: jangcluster-srv-1
        nodeid: 1
    }
    node {
        ring0_addr: node-2-subnet-1
        ring1_addr: node-2-subnet-2
        name: jangcluster-srv-2
        nodeid: 2
    }
}
quorum {
    provider: corosync_votequorum
    two_node: 1
}
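In case it is useful, the two-node quorum settings can be sanity-checked on either cluster node with corosync-quorumtool (output omitted here):
corosync-quorumtool -s    # prints vote counts and quorum state; the two_node setting shows up in the Flags line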
crm config show
node 1: jangcluster-srv-1
node 2: jangcluster-srv-2
node jangcluster-srv-3:remote
node jangcluster-srv-4:remote
primitive GPFS-Fence stonith:fence_gpfs \
    params instance=regress1 shared_filesystem="<shared filesystem path>" pcmk_host_list=" jangcluster-srv-1 jangcluster-srv-2 jangcluster-srv-3 jangcluster-srv-4" secure=true \
    op monitor interval=30s timeout=500s \
    op off interval=0 \
    meta is-managed=true
primitive jangcluster-srv-3 ocf:pacemaker:remote \
    params server=jangcluster-srv-3 reconnect_interval=1m \
    op monitor interval=30s \
    op_params migration-threshold=1 \
    op stop interval=0 \
    meta is-managed=true
primitive jangcluster-srv-4 ocf:pacemaker:remote \
    params server=jangcluster-srv-4 reconnect_interval=1m \
    op monitor interval=30s \
    op_params migration-threshold=1 \
    meta is-managed=true
location prefer-CF-Hosts GPFS-Fence \
    rule 100: #uname eq jangcluster-srv-1 or #uname eq jangcluster-srv-2
location prefer-node-jangcluster-srv-3 jangcluster-srv-3 100: jangcluster-srv-1
location prefer-node-jangcluster-srv-3-2 jangcluster-srv-3 50: jangcluster-srv-2
location prefer-node-jangcluster-srv-4 jangcluster-srv-4 100: jangcluster-srv-2
location prefer-node-jangcluster-srv-4-2 jangcluster-srv-4 50: jangcluster-srv-1
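For reference, the placement of the remote nodes and their connection resources during these tests can be watched with a one-shot crm_mon:
crm_mon -1r    # one-shot cluster status; -r also lists resources that are currently stopped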
However, when we attempt to simulate a communication failure on srv-2's Ethernet adapter via iptables, we observe the srv-4 remote host getting fenced before its connection resource fails over to srv-1.
The concern here is that in the future we may have many remotes connecting to a single cluster host, and so far it seems like an Ethernet adapter issue on that cluster host could lead to many remote hosts getting unnecessarily fenced.
Here are the updated iptables commands that we run on srv-2 to simulate srv-2 losing the ability to communicate with srv-1, srv-3, and srv-4.
iptables -A INPUT -s [IP of srv-1] -j DROP ; iptables -A OUTPUT -d [IP of srv-1] -j DROP
iptables -A INPUT -s [IP of srv-3] -j DROP ; iptables -A OUTPUT -d [IP of srv-3] -j DROP
iptables -A INPUT -s [IP of srv-4] -j DROP ; iptables -A OUTPUT -d [IP of srv-4] -j DROP
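Between test runs the rules can be flushed again on srv-2 to restore connectivity (assuming no other iptables rules are in use on that host):
iptables -L -n                            # confirm the DROP rules are in place
iptables -F INPUT ; iptables -F OUTPUT    # remove them again after the test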
As Janghyuk has shown previously, it seems that the pacemaker-remote connection monitor times out and causes the remote host to get fenced. Here are the log messages that I think are most relevant.
Oct 22 12:21:09.389 jangcluster-srv-2 pacemaker-schedulerd[776553] (pe_get_failcount) info: jangcluster-srv-4 has failed 1 times on jangcluster-srv-2
Oct 22 12:21:09.389 jangcluster-srv-2 pacemaker-schedulerd[776553] (pe_get_failcount) info: jangcluster-srv-4 has failed 1 times on jangcluster-srv-2
Oct 22 12:21:09.389 jangcluster-srv-2 pacemaker-schedulerd[776553] (pe_get_failcount) info: jangcluster-srv-4 has failed 1 times on jangcluster-srv-2
Oct 22 12:21:09.389 jangcluster-srv-2 pacemaker-schedulerd[776553] (pe_get_failcount) info: jangcluster-srv-4 has failed 1 times on jangcluster-srv-2
Oct 22 12:21:09.389 jangcluster-srv-2 pacemaker-schedulerd[776553] (unpack_rsc_op_failure) warning: Unexpected result (error) was recorded for monitor of jangcluster-srv-4 on jangcluster-srv-2 at Oct 22 12:21:09 2021 | rc=1 id=jangcluster-srv-4_last_failure_0
Oct 22 12:21:09.389 jangcluster-srv-2 pacemaker-schedulerd[776553] (unpack_rsc_op_failure) notice: jangcluster-srv-4 will not be started under current conditions
Oct 22 12:21:09.389 jangcluster-srv-2 pacemaker-schedulerd[776553] (pe_fence_node) warning: Remote node jangcluster-srv-4 will be fenced: remote connection is unrecoverable
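In case it is useful to anyone reproducing this, the connection resource's recorded failcount can be checked and cleared between runs with the crm shell (matching the configuration above):
crm resource failcount jangcluster-srv-4 show jangcluster-srv-2      # query the failcount recorded on srv-2
crm resource failcount jangcluster-srv-4 delete jangcluster-srv-2    # clear it before the next test run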
What we also found interesting is that if the cluster is only using a single heartbeat ring, then srv-2 gets fenced instead, and the pacemaker-remote connection resources successfully fail over without any additional fencing of the remote nodes themselves. That seems a little backwards to us, since our reasoning for configuring multiple heartbeat rings was to increase the cluster's reliability/robustness, but it seems to do the opposite when using pacemaker-remote. :(
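For completeness, the status of the heartbeat rings on each node can be checked with:
corosync-cfgtool -s    # shows the status of each ring/link on the local node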
Any suggestions/comments on our configuration or test scenarios are appreciated!
Gerry Sommerville
E-mail: [email protected]
----- Original message -----
From: "Andrei Borzenkov" <[email protected]>
Sent by: "Users" <[email protected]>
To: "Cluster Labs - All topics related to open-source clustering welcomed" <[email protected]>
Cc:
Subject: [EXTERNAL] Re: [ClusterLabs] Antw: [EXT] Inquiry - remote node fencing issue
Date: Thu, Oct 28, 2021 3:58 AM
On Thu, Oct 28, 2021 at 10:30 AM Ulrich Windl
<[email protected]> wrote:
>
> Fencing _is_ a part of failover!
>
Like any blanket answer, this is mostly incorrect in this context.
There are two separate objects here - the remote host itself and the pacemaker
resource used to connect to and monitor the state of the remote host.
The remote host itself does not fail over. Resources on this host do, but
the OP is not asking about that.
The pacemaker resource used to monitor the remote host may fail over like any
other cluster resource. This failover does not require any fencing *of
the remote host itself*, and in this particular case the connection between
the two cluster nodes was present the whole time (at least, as far as we
can believe the logs), so there was no reason for fencing either. Whether
pacemaker should attempt to fail over this resource to another node if
the connection to the remote host fails is subject to discussion.
So fencing of the remote host itself is most certainly *not* part of
the failover of the resource that monitors this remote host.
> >>> "Janghyuk Boo" <[email protected]> wrote on 26.10.2021 at 22:09 in
> message
> <of6751af09.dd2c657c-on0025877a.006ea8cb-0025877a.006eb...@ibm.com>:
> Dear Community,
> Thank you Ken for your reply last time.
> I attached the log messages as requested from the last thread.
> I have a Pacemaker cluster with two cluster nodes with two network interfaces
> each, two remote nodes, and a prototype fencing agent (GPFS-Fence) to cut a
> host's access to the clustered filesystem.
> I noticed that a remote node gets fenced when the quorum node it's connected to
> gets fenced or experiences a network failure.
> For example, when I disconnected srv-2 from the rest of the cluster by using
> iptables on srv-2
> iptables -A INPUT -s [IP of srv-1] -j DROP ; iptables -A OUTPUT -s [IP of
> srv-1] -j DROP
> iptables -A INPUT -s [IP of srv-3] -j DROP ; iptables -A OUTPUT -s [IP of
> srv-3] -j DROP
> iptables -A INPUT -s [IP of srv-4] -j DROP ; iptables -A OUTPUT -s [IP of
> srv-4] -j DROP
> I expected that remote node jangcluster-srv-4 would fail over to srv-1 given my
> location constraints,
> but the remote node’s monitor ‘jangcluster-srv-4_monitor’ failed and srv-4 was
> getting fenced before attempting to fail over.
> What would be the proper way to simulate the network failover?
> How can I configure the cluster so that remote node srv-4 fails over instead
> of getting fenced?
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/
