[ClusterLabs] Beginner lost with promotable "group" design

2024-01-17 Thread Adam Cécile

Hello,


I'm trying to achieve the following setup with 3 hosts:

* One master gets a shared IP, then removes the default gateway, adds another
gateway, and starts a service

* The two slaves should have none of these, but add a different default gateway

I managed quite easily to get the master workflow running with ordering
constraints, but I don't understand how to move forward with the
slave configuration.


I think I must create a promotable resource first, then assign my other
resources a started/stopped setting depending on the promotion status
of the node. Is that correct? How do I create a promotable "placeholder"
to which I can later attach my existing resources?
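
Something along these lines is what I imagine (a rough, untested crmsh sketch;
ocf:pacemaker:Stateful is used purely as a dummy promotable placeholder,
"SharedIP" stands for my existing IP resource, and older Pacemaker releases
would use the Master/Slave role names instead of Promoted/Unpromoted):

# Dummy promotable resource that only carries the Promoted/Unpromoted role
crm configure primitive placeholder ocf:pacemaker:Stateful \
    op monitor interval=10s role=Promoted \
    op monitor interval=11s role=Unpromoted
crm configure clone placeholder-clone placeholder \
    meta promotable=true promoted-max=1 promoted-node-max=1

# Existing resources then follow the node holding the Promoted role
crm configure colocation ip-with-promoted inf: SharedIP placeholder-clone:Promoted
crm configure order promote-before-ip Mandatory: placeholder-clone:promote SharedIP:start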


Sorry for the stupid question, but I really don't understand what types of
elements I should create...



Thanks in advance,

Regards, Adam.


PS: Bonus question: should I use "pcs" or "crm"? Both commands seem to be
equivalent, and the documentation sometimes uses one and sometimes the other.
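
For instance, as far as I can tell the same resource can be created either way
and both tools just edit the same CIB underneath (hypothetical example; the
name and address are made up):

# crmsh
crm configure primitive MyIP ocf:heartbeat:IPaddr2 \
    params ip=192.0.2.10 cidr_netmask=24 \
    op monitor interval=30s

# pcs
pcs resource create MyIP ocf:heartbeat:IPaddr2 \
    ip=192.0.2.10 cidr_netmask=24 \
    op monitor interval=30s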




Re: [ClusterLabs] Mutually exclusive resources ?

2023-10-02 Thread Adam Cécile

On 9/27/23 16:58, Ken Gaillot wrote:

On Wed, 2023-09-27 at 16:24 +0200, Adam Cecile wrote:

On 9/27/23 16:02, Ken Gaillot wrote:

On Wed, 2023-09-27 at 15:42 +0300, Andrei Borzenkov wrote:

On Wed, Sep 27, 2023 at 3:21 PM Adam Cecile wrote:

Hello,


I'm struggling to understand whether it's possible to create some kind
of constraint that prevents two different resources from running on the
same host.

Basically, I'd like to have floating IP "1" and floating IP "2"
always assigned to DIFFERENT nodes.

Is that possible?

Sure, negative colocation constraint.


Can you give me a hint ?


Using crmsh:

colocation IP1-no-with-IP2 -inf: IP1 IP2


Thanks in advance, Adam.

To elaborate, use -INFINITY if you want the IPs to *never* run on the
same node, even if there are no other nodes available (meaning one of
them has to stop). If you *prefer* that they run on different nodes,
but want to allow them to run on the same node in a degraded cluster,
use a finite negative score.

That's exactly what I tried to do:

crm configure primitive Freeradius systemd:freeradius.service \
    op start interval=0 timeout=120 op stop interval=0 timeout=120 \
    op monitor interval=60 timeout=100
crm configure clone Clone-Freeradius Freeradius

crm configure primitive Shared-IPv4-Cisco-ISE-1 IPaddr2 \
    params ip=10.1.1.1 nic=eth0 cidr_netmask=24 \
    meta migration-threshold=2 resource-stickiness=50 \
    op monitor interval=60 timeout=30
crm configure primitive Shared-IPv4-Cisco-ISE-2 IPaddr2 \
    params ip=10.1.1.2 nic=eth0 cidr_netmask=24 \
    meta migration-threshold=2 resource-stickiness=50 \
    op monitor interval=60 timeout=30

crm configure location Shared-IPv4-Cisco-ISE-1-Prefer-BRT Shared-IPv4-Cisco-ISE-1 50: infra-brt
crm configure location Shared-IPv4-Cisco-ISE-2-Prefer-BTZ Shared-IPv4-Cisco-ISE-2 50: infra-btz
crm configure colocation Shared-IPv4-Cisco-ISE-Different-Nodes -100: Shared-IPv4-Cisco-ISE-1 Shared-IPv4-Cisco-ISE-2
My hope is that IP1 stays on infra-brt and IP2 goes to infra-btz. I
want to allow them to keep running on different hosts, so I also added
stickiness. However, I really do not want them both running on the same
node, so I added a colocation constraint with a larger negative score.
Does it look good to you?

Yep, that should work.

The way you have it, if there's some sort of problem and both IPs end
up on the same node, the IP that doesn't prefer that node will move
back to its preferred node once the problem is resolved. That sounds
like what you want, but if you'd rather it not move, you could raise
stickiness above 100.
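
For example, bumping the meta attribute on each IP resource would look
something like this (a sketch; 150 is just an illustrative value):

crm resource meta Shared-IPv4-Cisco-ISE-1 set resource-stickiness 150
crm resource meta Shared-IPv4-Cisco-ISE-2 set resource-stickiness 150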


Hello,

Yes, that's actually what I want. Clients are supposed to use both
addresses, so it really does not make any sense to have both IPs assigned
to the same host.


Thanks a lot for your help





Re: [ClusterLabs] [External] : Reload DNSMasq after IPAddr2 change ?

2023-02-10 Thread Adam Cécile

On 2/9/23 15:14, Robert Hayden wrote:

From: Users  On Behalf Of Adam Cecile
Sent: Thursday, February 9, 2023 3:47 AM
To: Cluster Labs - All topics related to open-source clustering welcomed 

Subject: [External] : [ClusterLabs] Reload DNSMasq after IPAddr2 change ?

Hello,

I might be stupid, but I'm completely stuck with this requirement. We just figured
out that the DNSMasq proxy is not working correctly after the shared IP address is
moved from one host to another, because it does not listen on the new address.
I need to issue a reload to DNSMasq to make it work again, but I failed to find
anyone describing how to implement this, so I guess I'm going about it completely
wrong.


Look into Alerts and Recipients. I had a project in Oracle's Cloud where I
needed to register the virtual IP address with the infrastructure to get
network traffic routed properly. The details are beyond the scope of your
issue, but I created the following pcs Alert and Recipient structures. The
oci_move_ip.sh script can contain your commands to DNSMasq.

pcs alert create id=move_ip description="Move VIP using oci-cli" path=/usr/local/cluster/oci_move_ip.sh
pcs alert recipient add move_ip id=logfile_move_ip value=/var/log/pacemaker_move_ip.log

Relevant contents of my oci_move_ip.sh are below. The Alert is triggered on any
resource action, so you have to use the IF clause to limit it to a successful
resource start.

#!/bin/bash
export LC_ALL=C.UTF-8
export LANG=C.UTF-8
if [ -z "$CRM_alert_version" ]; then
 echo "$0 must be run by Pacemaker version 1.1.15 or later"
 exit 0
fi
# Alert agents must always handle the case where no recipients are defined,
# even if it's a no-op (a recipient might be added to the configuration later).
if [ -z "${CRM_alert_recipient}" ]; then
 CRM_alert_recipient=/var/log/pacemaker_move_ip.log
fi
## https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/singlehtml/index.html#document-alerts
## CRM_alert_kind       The type of alert (node, fencing, resource, or attribute)
## CRM_alert_target_rc  The expected numerical return code of the operation (resource alerts only)
## CRM_alert_task       The requested fencing or resource operation (provided with fencing and resource alerts only)
## CRM_alert_rsc        The name of the affected resource (resource alerts only)
## CRM_alert_node       Name of affected node
## Determine if resource is associated to IPaddr2 type with action being a successful start
if [[ "${CRM_alert_kind}" = "resource" && "${CRM_alert_target_rc}" = "0" && "${CRM_alert_task}" = "start" \
   && $(pcs resource show "${CRM_alert_rsc}" 2>/dev/null | grep -c "class=ocf provider=heartbeat type=IPaddr2") -eq 1 ]]; then

  

fi
exit 0
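
For the DNSMasq case, the body of that if-block could presumably be as simple
as reloading or restarting the service so it rebinds to the freshly added
address (a sketch; whether a plain reload is enough depends on how dnsmasq
binds its interfaces, and the unit name dnsmasq.service is an assumption):

  # Make dnsmasq pick up the newly added VIP; falls back to a restart
  # if the unit has no reload action (unit name is an assumption)
  systemctl try-reload-or-restart dnsmasq.service >> "${CRM_alert_recipient}" 2>&1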


Thanks a lot.

It looks extremely hackish, but if I understood properly, I should
be able to work around my issue with that.




Re: [ClusterLabs] Automatic recover from split brain ?

2020-08-11 Thread Adam Cécile

On 8/11/20 8:48 AM, Andrei Borzenkov wrote:

On 08.08.2020 13:10, Adam Cécile wrote:

Hello,


I'm experiencing an issue with corosync/pacemaker running on Debian Buster.
The cluster has three nodes running in VMware virtual machines, and the
cluster fails when VEEAM backs up a virtual machine (I know it does
bad things, like completely freezing the VM for a few minutes to take a
disk snapshot).

My biggest issue is that once the backup has completed, the cluster
stays in a split-brain state, and I'd like it to heal itself. Here is the
current status:


One node is isolated:

Stack: corosync
Current DC: host2.domain.com (version 2.0.1-9e909a5bdd) - partition
WITHOUT quorum
Last updated: Sat Aug  8 11:59:46 2020
Last change: Fri Jul 24 07:18:12 2020 by root via cibadmin on
host1.domain.com

3 nodes configured
6 resources configured

Online: [ host2.domain.com ]
OFFLINE: [ host3.domain.com host1.domain.com ]


The two others see each other:

Stack: corosync
Current DC: host3.domain.com (version 2.0.1-9e909a5bdd) - partition with
quorum
Last updated: Sat Aug  8 12:07:56 2020
Last change: Fri Jul 24 07:18:12 2020 by root via cibadmin on
host1.domain.com

3 nodes configured
6 resources configured

Online: [ host3.domain.com host1.domain.com ]
OFFLINE: [ host2.domain.com ]


Show your full configuration including defined STONITH resources and
cluster options (most importantly, no-quorum-policy and stonith-enabled).


Hello,

Stonith is disabled and I tried various settings for no-quorum-policy.


The problem is that one of the resources is a floating IP address which
is currently assigned to two different hosts...


Of course - each partition assumes the other partition is dead and so it
is free to take over the remaining resources.
I understand that, but I still don't get why, once all nodes are back
online, the cluster does not recover from resources running on multiple hosts.



Can you help me configure the cluster correctly so this cannot occur?


Define "correctly".

The most straightforward textbook answer - you need STONITH
resources that will eliminate the "lost" node. But your lost node is in the
middle of performing a backup. Eliminating it may invalidate the backup being
created.
Yeah, but well, no. Killing the node is worse; the sensitive services are
already running clustered at the application level, so they do not
rely on corosync. Basically, corosync provides a floating IP for some
external non-critical access and starts systemd timers that there is no
point in running on multiple hosts. Nothing critical here.


So another answer would be - put the cluster in maintenance mode, perform
the backup, and resume normal operation. Usually backup software allows hooks
to be executed before and after the backup. It may work too.
This is indeed something I might look at, but again, for my trivial
needs it sounds a bit overkill to me.
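
If I ever go that route, I suppose the hooks themselves could stay tiny; a
sketch, assuming VEEAM can run pre-freeze/post-thaw scripts inside the guest
(script names are hypothetical):

# pre-freeze.sh - stop managing resources just before the VM snapshot
crm configure property maintenance-mode=true

# post-thaw.sh - resume normal operation once the backup is done
crm configure property maintenance-mode=false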

Or find a way to not freeze the VM during backup ... e.g. by using a different
backup method?


Or tweak some network settings so corosync does not consider the node
dead too soon? The backup won't last more than 2 minutes and the
freeze is usually well below that. I can definitely live with the cluster
state being unknown for a couple of minutes. Is that possible?
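
I suppose that would mean raising the totem token timeout in
/etc/corosync/corosync.conf on all nodes; a sketch, where the values are
purely illustrative and the cluster name is hypothetical:

totem {
    version: 2
    cluster_name: mycluster
    # Default token timeout is about 1 second; raising it means a frozen
    # node is only declared dead after ~2 minutes, at the cost of slower
    # reaction to real failures.
    token: 120000
}

After editing the file on every node, corosync 3.x should be able to pick it
up with "corosync-cfgtool -R"; otherwise restart corosync everywhere.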


Removing VEEAM is indeed my last option and the one I have used so far, but
this time I was hoping someone else would have run into the same issue
and could help me fix it in a clean way.



Thanks




[ClusterLabs] Automatic recover from split brain ?

2020-08-10 Thread Adam Cécile

Hello,


I'm experiencing an issue with corosync/pacemaker running on Debian Buster.
The cluster has three nodes running in VMware virtual machines, and the
cluster fails when VEEAM backs up a virtual machine (I know it does
bad things, like completely freezing the VM for a few minutes to take a
disk snapshot).


My biggest issue is that once the backup has completed, the cluster
stays in a split-brain state, and I'd like it to heal itself. Here is the
current status:



One node is isolated:

Stack: corosync
Current DC: host2.domain.com (version 2.0.1-9e909a5bdd) - partition 
WITHOUT quorum

Last updated: Sat Aug  8 11:59:46 2020
Last change: Fri Jul 24 07:18:12 2020 by root via cibadmin on 
host1.domain.com


3 nodes configured
6 resources configured

Online: [ host2.domain.com ]
OFFLINE: [ host3.domain.com host1.domain.com ]


The two others see each other:

Stack: corosync
Current DC: host3.domain.com (version 2.0.1-9e909a5bdd) - partition with 
quorum

Last updated: Sat Aug  8 12:07:56 2020
Last change: Fri Jul 24 07:18:12 2020 by root via cibadmin on 
host1.domain.com


3 nodes configured
6 resources configured

Online: [ host3.domain.com host1.domain.com ]
OFFLINE: [ host2.domain.com ]


The problem is that one of the resources is a floating IP address which 
is currently assigned to two different hosts...



Can you help me configure the cluster correctly so this cannot occur?


Thanks in advance,

Adam.


