[ClusterLabs] Pacemaker 2.1.7 final release now available
Hi all,

Source code for Pacemaker version 2.1.7 is available at:
https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.1.7

This is primarily a bug fix release. See the ChangeLog or the link above for details.

Many thanks to all contributors of source code to this release, including Chris Lumens, Gao,Yan, Grace Chin, Hideo Yamauchi, Jan Pokorný, Ken Gaillot, liupei, Oyvind Albrigtsen, Reid Wahl, xin liang, and xuezhixin.

-- 
Ken Gaillot

___
Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/
Re: [ClusterLabs] Build cluster one node at a time
Correct. You want to enable pcsd to start at boot. Also, after starting pcsd the first time on a node, authorize it from the first node with "pcs host auth -u hacluster".

On Tue, 2023-12-19 at 22:42 +0200, Tiaan Wessels wrote:
> So i run the pcs add command for every new node on the first original
> node, not on the node being added? Only corosync, pacemaker and pcsd
> needs to run on the node to be added and the commands being run on
> the original node will speak to these on the new node?
>
> On Tue, 19 Dec 2023, 21:39 Ken Gaillot, wrote:
> > On Tue, 2023-12-19 at 17:03 +0200, Tiaan Wessels wrote:
> > > Hi,
> > > Is it possible to build a corosync pacemaker cluster on redhat9 one
> > > node at a time? In other words, when I'm finished with the first node
> > > and reboot it, all services are started on it. Then i build a second
> > > node to integrate into the cluster and once done, pcs status shows
> > > two nodes on-line?
> > > Thanks
> >
> > Yes, you can use pcs cluster setup with the first node, then pcs
> > cluster node add for each additional node.

-- 
Ken Gaillot
Re: [ClusterLabs] Build cluster one node at a time
So I run the pcs add command for every new node on the first original node, not on the node being added? Only corosync, pacemaker and pcsd need to run on the node to be added, and the commands being run on the original node will speak to these on the new node?

On Tue, 19 Dec 2023, 21:39 Ken Gaillot, wrote:
> On Tue, 2023-12-19 at 17:03 +0200, Tiaan Wessels wrote:
> > Hi,
> > Is it possible to build a corosync pacemaker cluster on redhat9 one
> > node at a time? In other words, when I'm finished with the first node
> > and reboot it, all services are started on it. Then i build a second
> > node to integrate into the cluster and once done, pcs status shows
> > two nodes on-line?
> > Thanks
>
> Yes, you can use pcs cluster setup with the first node, then pcs
> cluster node add for each additional node.
> --
> Ken Gaillot
Re: [ClusterLabs] Build cluster one node at a time
On Tue, 2023-12-19 at 17:03 +0200, Tiaan Wessels wrote:
> Hi,
> Is it possible to build a corosync pacemaker cluster on redhat9 one
> node at a time? In other words, when I'm finished with the first node
> and reboot it, all services are started on it. Then i build a second
> node to integrate into the cluster and once done, pcs status shows
> two nodes on-line?
> Thanks

Yes, you can use pcs cluster setup with the first node, then pcs cluster node add for each additional node.

-- 
Ken Gaillot
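A minimal sketch of that procedure on RHEL 9 (hostnames "node1"/"node2" and cluster name "mycluster" are placeholders; the HA repository is assumed to be enabled):

```shell
# On node1: install the HA stack and start pcsd at boot
dnf install -y pcs pacemaker corosync fence-agents-all
systemctl enable --now pcsd
passwd hacluster                      # set the hacluster password

# Create and start a one-node cluster, enabled at boot
pcs host auth node1 -u hacluster
pcs cluster setup mycluster node1
pcs cluster start --all
pcs cluster enable --all

# Later, once pcsd is installed and running on node2, run on node1:
pcs host auth node2 -u hacluster
pcs cluster node add node2 --start --enable
```

After the add, pcs status on either node should show both nodes online.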
Re: [ClusterLabs] cluster doesn't do HA as expected, pingd doesn't help
On 19.12.2023 21:42, Artem wrote:
> Andrei and Klaus thanks for prompt reply and clarification! As I
> understand, design and behavior of Pacemaker is tightly coupled with
> the stonith concept. But isn't it too rigid?

If you insist on shooting yourself in the foot, pacemaker gives you the gun. It just does not load it by default and does not shoot itself. Seriously, this topic has been beaten to death. Just do some research.

You can avoid fencing and rely on quorum in the shared-nothing case. The prime example that I have seen is NetApp C-Mode ONTAP, where the set of management processes goes read-only, preventing any modification when node(s) go(es) out of quorum. But as soon as you have a shared resource, ignoring fencing will lead to data corruption sooner or later.

> Is there a way to leverage self-monitoring or pingd rules to trigger an
> isolated node to umount its FS? Like vSphere High Availability host
> isolation response. Can resource-stickiness=off (auto-failback)
> decrease the risk of corruption by an unresponsive node coming back
> online? Is there a quorum feature not for the cluster but for resource
> start/stop? Got lock - is welcome to mount; unable to refresh lease -
> force unmount. Can on-fail=ignore break manual failover logic (stopped
> will be considered as failed and thus ignored)?
>
> best regards, Artem
>
> On Tue, 19 Dec 2023 at 17:03, Klaus Wenninger wrote:
> > On Tue, Dec 19, 2023 at 10:00 AM Andrei Borzenkov wrote:
> > > On Tue, Dec 19, 2023 at 10:41 AM Artem wrote:
> > > ...
> > > > Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107]
> > > > (update_resource_action_runnable) warning: OST4_stop_0 on lustre4
> > > > is unrunnable (node is offline)
> > > > Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107]
> > > > (recurring_op_for_active) info: Start 20s-interval monitor for
> > > > OST4 on lustre3
> > > > Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107]
> > > > (log_list_item) notice: Actions: Stop OST4 ( lustre4 ) blocked
> > >
> > > This is the default for the failed stop operation. The only way
> > > pacemaker can resolve failure to stop a resource is to fence the
> > > node where this resource was active. If that is not possible (and
> > > IIRC you refuse to use stonith), pacemaker has no other choice than
> > > to block it. If you insist, you can of course set on-fail=ignore,
> > > but this means an unreachable node will continue to run resources.
> > > Whether it can lead to some corruption in your case I cannot guess.
> >
> > Don't know if I'm reading that correctly, but I understand from what
> > you had written above that you try to trigger the failover by
> > stopping the VM (lustre4) without an ordered shutdown. With fencing
> > disabled, what we are seeing is exactly what we would expect: the
> > state of the resource is unknown - pacemaker tries to stop it - that
> > doesn't work as the node is offline - no fencing is configured - so
> > everything it can do is wait till there is info whether the resource
> > is up or not. I guess the strange output below is because of fencing
> > disabled - quite an unusual - also not recommended - configuration,
> > and so this might not have shown up too often in that way.
> >
> > Klaus
> >
> > > > Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107]
> > > > (pcmk__create_graph) crit: Cannot fence lustre4 because of OST4:
> > > > blocked (OST4_stop_0)
> > >
> > > That is a rather strange phrase. The resource is blocked because
> > > pacemaker could not fence the node, not the other way round.
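For reference, the two knobs discussed above can be inspected and set roughly like this (resource name OST4 is taken from the thread; whether these exact pcs invocations match your pcs version should be checked against pcs help):

```shell
# Check whether fencing is enabled (it is by default):
pcs property config stonith-enabled

# Not recommended: ignore stop failures so a resource on an unreachable
# node is not blocked - at the risk of running it on two nodes at once
# and corrupting shared data.
pcs resource update OST4 op stop on-fail=ignore
```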
Re: [ClusterLabs] cluster doesn't do HA as expected, pingd doesn't help
What if a node (especially a VM) freezes for several minutes and then continues to write to a shared disk where other nodes have already put their data?

In my opinion fencing, preferably two-level, is mandatory for Lustre. Trust me, I developed the whole HA stack for both Exascaler and PangeaFS. We've seen so many points where data loss may occur...

On December 19, 2023 19:42:56 Artem wrote:
> Andrei and Klaus thanks for prompt reply and clarification! As I
> understand, design and behavior of Pacemaker is tightly coupled with
> the stonith concept. But isn't it too rigid? Is there a way to leverage
> self-monitoring or pingd rules to trigger an isolated node to umount
> its FS? Like vSphere High Availability host isolation response. Can
> resource-stickiness=off (auto-failback) decrease the risk of corruption
> by an unresponsive node coming back online? Is there a quorum feature
> not for the cluster but for resource start/stop? Got lock - is welcome
> to mount; unable to refresh lease - force unmount. Can on-fail=ignore
> break manual failover logic (stopped will be considered as failed and
> thus ignored)?
>
> best regards, Artem
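As a sketch, two-level fencing in pcs chains a primary and a fallback device per node. The device names below ("ipmi-node1", "pdu-node1") are placeholders for stonith resources previously created with pcs stonith create, using whatever agent matches the actual BMC/PDU hardware:

```shell
# Level 1: try the out-of-band (IPMI/BMC) fence device first
pcs stonith level add 1 node1 ipmi-node1

# Level 2: if level 1 fails, power-fence via the PDU
pcs stonith level add 2 node1 pdu-node1

# Verify the configured fencing topology
pcs stonith level config
```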
Re: [ClusterLabs] cluster doesn't do HA as expected, pingd doesn't help
Andrei and Klaus thanks for prompt reply and clarification! As I understand, design and behavior of Pacemaker is tightly coupled with the stonith concept. But isn't it too rigid?

Is there a way to leverage self-monitoring or pingd rules to trigger an isolated node to umount its FS? Like vSphere High Availability host isolation response. Can resource-stickiness=off (auto-failback) decrease the risk of corruption by an unresponsive node coming back online? Is there a quorum feature not for the cluster but for resource start/stop? Got lock - is welcome to mount; unable to refresh lease - force unmount. Can on-fail=ignore break manual failover logic (stopped will be considered as failed and thus ignored)?

best regards, Artem

On Tue, 19 Dec 2023 at 17:03, Klaus Wenninger wrote:
> On Tue, Dec 19, 2023 at 10:00 AM Andrei Borzenkov wrote:
> > On Tue, Dec 19, 2023 at 10:41 AM Artem wrote:
> > ...
> > > Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107]
> > > (update_resource_action_runnable) warning: OST4_stop_0 on lustre4 is
> > > unrunnable (node is offline)
> > > Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107]
> > > (recurring_op_for_active) info: Start 20s-interval monitor for OST4
> > > on lustre3
> > > Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107]
> > > (log_list_item) notice: Actions: Stop OST4 ( lustre4 ) blocked
> >
> > This is the default for the failed stop operation. The only way
> > pacemaker can resolve failure to stop a resource is to fence the node
> > where this resource was active. If that is not possible (and IIRC you
> > refuse to use stonith), pacemaker has no other choice than to block
> > it. If you insist, you can of course set on-fail=ignore, but this
> > means an unreachable node will continue to run resources. Whether it
> > can lead to some corruption in your case I cannot guess.
>
> Don't know if I'm reading that correctly, but I understand from what
> you had written above that you try to trigger the failover by stopping
> the VM (lustre4) without an ordered shutdown. With fencing disabled,
> what we are seeing is exactly what we would expect: the state of the
> resource is unknown - pacemaker tries to stop it - that doesn't work
> as the node is offline - no fencing is configured - so everything it
> can do is wait till there is info whether the resource is up or not.
> I guess the strange output below is because of fencing disabled -
> quite an unusual - also not recommended - configuration, and so this
> might not have shown up too often in that way.
>
> Klaus
>
> > > Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107]
> > > (pcmk__create_graph) crit: Cannot fence lustre4 because of OST4:
> > > blocked (OST4_stop_0)
> >
> > That is a rather strange phrase. The resource is blocked because
> > pacemaker could not fence the node, not the other way round.
[ClusterLabs] colocate Redis - weird
hi guys,

Is this below not the weirdest thing?

-> $ pcs constraint ref PGSQL-PAF-5435
Resource: PGSQL-PAF-5435
  colocation-HA-10-1-1-84-PGSQL-PAF-5435-clone-INFINITY
  colocation-REDIS-6385-clone-PGSQL-PAF-5435-clone-INFINITY
  order-PGSQL-PAF-5435-clone-HA-10-1-1-84-Mandatory
  order-PGSQL-PAF-5435-clone-HA-10-1-1-84-Mandatory-1
  colocation_set_PePePe

Here the Redis master should follow the pgSQL master. With such a constraint:

-> $ pcs resource status PGSQL-PAF-5435
 * Clone Set: PGSQL-PAF-5435-clone [PGSQL-PAF-5435] (promotable):
   * Promoted: [ ubusrv1 ]
   * Unpromoted: [ ubusrv2 ubusrv3 ]
-> $ pcs resource status REDIS-6385-clone
 * Clone Set: REDIS-6385-clone [REDIS-6385] (promotable):
   * Unpromoted: [ ubusrv1 ubusrv2 ubusrv3 ]

If I remove that constraint:

-> $ pcs constraint delete colocation-REDIS-6385-clone-PGSQL-PAF-5435-clone-INFINITY
-> $ pcs resource status REDIS-6385-clone
 * Clone Set: REDIS-6385-clone [REDIS-6385] (promotable):
   * Promoted: [ ubusrv1 ]
   * Unpromoted: [ ubusrv2 ubusrv3 ]

and! I can manually move the Redis master around; the master moves to each server just fine. I add that constraint again:

-> $ pcs constraint colocation add master REDIS-6385-clone with master PGSQL-PAF-5435-clone

and the same...
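One thing worth checking (a guess, not a diagnosis from the output above): with an INFINITY colocation of promoted roles, Redis may only be promoted where pgSQL is already promoted, so the Redis promotion score on that one node decides everything. A sketch with the names from this message, using the current pcs role syntax, plus a way to inspect promotion scores:

```shell
# Colocate the Redis promoted role with the pgSQL promoted role
pcs constraint colocation add promoted REDIS-6385-clone with promoted PGSQL-PAF-5435-clone INFINITY

# Optionally promote Redis only after pgSQL has been promoted
pcs constraint order promote PGSQL-PAF-5435-clone then promote REDIS-6385-clone

# Inspect scores to see why a node is not being promoted
crm_simulate -sL | grep -i promotion
```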
[ClusterLabs] Build cluster one node at a time
Hi,

Is it possible to build a corosync pacemaker cluster on redhat9 one node at a time? In other words, when I'm finished with the first node and reboot it, all services are started on it. Then I build a second node to integrate into the cluster and once done, pcs status shows two nodes on-line?

Thanks
Re: [ClusterLabs] cluster doesn't do HA as expected, pingd doesn't help
On Tue, Dec 19, 2023 at 10:00 AM Andrei Borzenkov wrote:
> On Tue, Dec 19, 2023 at 10:41 AM Artem wrote:
> ...
> > Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107]
> > (update_resource_action_runnable) warning: OST4_stop_0 on lustre4 is
> > unrunnable (node is offline)
> > Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107]
> > (recurring_op_for_active) info: Start 20s-interval monitor for OST4
> > on lustre3
> > Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107]
> > (log_list_item) notice: Actions: Stop OST4 ( lustre4 ) blocked
>
> This is the default for the failed stop operation. The only way
> pacemaker can resolve failure to stop a resource is to fence the node
> where this resource was active. If that is not possible (and IIRC you
> refuse to use stonith), pacemaker has no other choice than to block it.
> If you insist, you can of course set on-fail=ignore, but this means an
> unreachable node will continue to run resources. Whether it can lead to
> some corruption in your case I cannot guess.

Don't know if I'm reading that correctly, but I understand from what you had written above that you try to trigger the failover by stopping the VM (lustre4) without an ordered shutdown.

With fencing disabled, what we are seeing is exactly what we would expect: the state of the resource is unknown - pacemaker tries to stop it - that doesn't work as the node is offline - no fencing is configured - so everything it can do is wait till there is info whether the resource is up or not.

I guess the strange output below is because of fencing disabled - quite an unusual - also not recommended - configuration, and so this might not have shown up too often in that way.

Klaus

> > Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107]
> > (pcmk__create_graph) crit: Cannot fence lustre4 because of OST4:
> > blocked (OST4_stop_0)
>
> That is a rather strange phrase. The resource is blocked because
> pacemaker could not fence the node, not the other way round.
Re: [ClusterLabs] cluster doesn't do HA as expected, pingd doesn't help
On Tue, Dec 19, 2023 at 10:41 AM Artem wrote:
...
> Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107]
> (update_resource_action_runnable) warning: OST4_stop_0 on lustre4 is
> unrunnable (node is offline)
> Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107]
> (recurring_op_for_active) info: Start 20s-interval monitor for OST4 on
> lustre3
> Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107]
> (log_list_item) notice: Actions: Stop OST4 ( lustre4 ) blocked

This is the default for the failed stop operation. The only way pacemaker can resolve failure to stop a resource is to fence the node where this resource was active. If that is not possible (and IIRC you refuse to use stonith), pacemaker has no other choice than to block it. If you insist, you can of course set on-fail=ignore, but this means an unreachable node will continue to run resources. Whether it can lead to some corruption in your case I cannot guess.

> Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107]
> (pcmk__create_graph) crit: Cannot fence lustre4 because of OST4:
> blocked (OST4_stop_0)

That is a rather strange phrase. The resource is blocked because pacemaker could not fence the node, not the other way round.