Re: [ClusterLabs] Preferred node for a service (not constrained)
On 2020-11-30 23:21, Petr Bena wrote:
> Hello,
>
> Is there a way to set up a preferred node for a service? I know how to create a constraint that makes it possible to run a service ONLY on a certain node, or a constraint that makes it impossible to run 2 services on the same node, but I don't want any of that: in catastrophic scenarios where services would have to be located together on the same node, that would instead disable them. Essentially, I want the service to always start on the preferred node when possible, but if that isn't possible (e.g. the node is down) it should freely run on any other node with no restrictions, and when the node is back up it should migrate back. How can I do that?

I do precisely this for an active/passive NFS/ZFS storage appliance pair. One of the VSAs has more memory and is less used, so I have the group set to prefer that host:

https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_prefer_one_node_over_another.html

I believe I used the value INFINITY, so it will prefer the 2nd host over the 1st if at all possible. My 'pcs constraint':

[root@centos-vsa2 ~]# pcs constraint
Location Constraints:
  Resource: group-zfs
    Enabled on: centos-vsa2 (score:INFINITY)
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
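For reference, a "soft" preference can also use a finite score instead of INFINITY, so the resource prefers one node but may run anywhere and fails back when that node returns. A minimal sketch, with hypothetical resource and node names:

```shell
# Prefer node1 with a finite score; the resource can still run elsewhere
# if node1 is unavailable, and migrates back when node1 returns.
pcs constraint location my-service prefers node1=100

# A positive location score competes with resource-stickiness; keep
# stickiness below the location score if automatic fail-back is wanted.
pcs resource defaults resource-stickiness=50
```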
Re: [ClusterLabs] connection timed out fence_virsh monitor stonith
On 2020-02-24 12:17, Strahil Nikolov wrote:
> On February 24, 2020 4:56:07 PM GMT+02:00, Luke Camilleri wrote:
>> Hello users, I would like to ask for assistance on the below setup, mainly on the monitor fence timeout:
>
> I notice that the issue happens at 00:00 on both days. Have you checked for a backup or other cron job that is 'overloading' the virtualization host?

This is a very good point. I had a similar problem with a vSphere cluster: two hyper-converged storage appliances. I used the fence_vmware_rest (or SOAP) stonith agent to fence the storage apps. It worked just fine, until the vCenter server appliance got busy doing something or other. Next thing I knew, I was getting stonith agent timeouts. I ended up switching to fence_scsi.

I'm not sure there is a good answer. I saw a recommendation on a VMware forum to increase the stonith timeout, but the recommended timeout was close to a minute, which is enough to be a problem for the VMs in that cluster...
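For anyone who does want to try raising the timeouts rather than switching agents, the knobs are per-device stonith attributes. A sketch with a hypothetical device name and illustrative values (exact parameter names vary between fence agents and versions; check the agent's metadata with `pcs stonith describe`):

```shell
# Raise the timeouts on an existing fence device named "vmfence".
# 120s is purely illustrative; long fencing timeouts delay recovery
# of the whole cluster, as discussed above.
pcs stonith update vmfence pcmk_monitor_timeout=120s power_timeout=120 shell_timeout=120
```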
Re: [ClusterLabs] 2 node clusters - ipmi fencing
On 2020-02-21 08:51, Ricardo Esteves wrote:
> Hi,
>
> I'm trying to understand the objective of the constraints: having the fencing devices running on the opposite node, on their own node, or all on the same node. Can you explain the difference?

IPMI fencing involves the fence-agent instance interacting with the IPMI device and telling it to power-cycle/reset/whatever the target node. It waits for a confirmation, then tells the cluster software the fencing operation was successful. If the instance is running on the same node it is fencing, it will be blown away when the node is reset.
Re: [ClusterLabs] 2 node clusters - ipmi fencing
I believe you in fact want each fence agent to run on the other node, yes.

On February 20, 2020, at 6:23 PM, Ricardo Esteves wrote:
> Hi,
>
> I have a question regarding fencing. I have 2 physical servers, node01 and node02, each with an IPMI card, so I created 2 fence devices:
>
> fence_ipmi_node01 (with the IP of the IPMI card of server node01) - with a constraint to prefer to run on node01
> fence_ipmi_node02 (with the IP of the IPMI card of server node02) - with a constraint to prefer to run on node02, and a 20s delay configured on this one
>
> Is this the best practice? Like this, node01 can only fence itself, right? And node02 also can only fence itself, right? Shouldn't I configure the fence_ipmi_node01 location constraint to place it on node02, and the fence_ipmi_node02 location constraint to place it on node01, so that node01 can fence node02 and vice versa?
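A sketch of that cross-placement, with hypothetical IPs and credentials (older fence_ipmilan versions use ipaddr/login/passwd instead of ip/username/password; the delay on one device helps avoid a mutual-fencing race in a 2-node cluster):

```shell
# Keep each IPMI fence device off the node it fences, so the surviving
# node is the one that executes the fencing action.
pcs stonith create fence_ipmi_node01 fence_ipmilan ip=10.0.0.1 \
    username=admin password=secret pcmk_host_list=node01
pcs stonith create fence_ipmi_node02 fence_ipmilan ip=10.0.0.2 \
    username=admin password=secret pcmk_host_list=node02 delay=20
pcs constraint location fence_ipmi_node01 avoids node01
pcs constraint location fence_ipmi_node02 avoids node02
```

A non-INFINITY preference for the opposite node is also common instead of a hard ban, so a lone surviving node can still run its own fence device.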
Re: [ClusterLabs] How to unfence without reboot (fence_mpath)
Many people don't have Red Hat access, so linking those URLs is not useful.

On February 17, 2020, at 1:40 AM, Strahil Nikolov wrote:
> Hello Ondrej,
>
> Thanks for your reply, I really appreciate it. I picked fence_mpath as I'm preparing for my EX436 and I can't know which agent will be useful on the exam. Also, according to https://access.redhat.com/solutions/3201072 , there could be a race condition with fence_scsi.
>
> So, I've checked the cluster when fencing, and the node immediately goes offline. The last messages from Pacemaker are:
>
> Feb 17 08:21:57 node1.localdomain stonith-ng[23808]: notice: Client stonith_admin.controld.23888.b57ceee7 wants to fence (reboot) 'node1.localdomain' with device '(any)'
> Feb 17 08:21:57 node1.localdomain stonith-ng[23808]: notice: Requesting peer fencing (reboot) of node1.localdomain
> Feb 17 08:21:57 node1.localdomain stonith-ng[23808]: notice: FENCING can fence (reboot) node1.localdomain (aka. '1'): static-list
> Feb 17 08:21:58 node1.localdomain stonith-ng[23808]: notice: Operation reboot of node1.localdomain by node2.localdomain for stonith_admin.controld.23888@node1.localdomain.ede38ffb: OK
> Feb 17 08:21:58 node1.localdomain crmd[23812]: crit: We were allegedly just fenced by node2.localdomain for node1.localdomai
>
> Which for me means: node1 just got fenced again. Actually, fencing works, as I/O is immediately blocked and the reservation is removed. I've used https://access.redhat.com/solutions/2766611 to set up fence_mpath, but I could have messed something up.
> Cluster config is:
>
> [root@node3 ~]# pcs config show
> Cluster Name: HACLUSTER2
> Corosync Nodes:
>  node1.localdomain node2.localdomain node3.localdomain
> Pacemaker Nodes:
>  node1.localdomain node2.localdomain node3.localdomain
>
> Resources:
>  Clone: dlm-clone
>   Meta Attrs: interleave=true ordered=true
>   Resource: dlm (class=ocf provider=pacemaker type=controld)
>    Operations: monitor interval=30s on-fail=fence (dlm-monitor-interval-30s)
>                start interval=0s timeout=90 (dlm-start-interval-0s)
>                stop interval=0s timeout=100 (dlm-stop-interval-0s)
>  Clone: clvmd-clone
>   Meta Attrs: interleave=true ordered=true
>   Resource: clvmd (class=ocf provider=heartbeat type=clvm)
>    Operations: monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)
>                start interval=0s timeout=90s (clvmd-start-interval-0s)
>                stop interval=0s timeout=90s (clvmd-stop-interval-0s)
>  Clone: TESTGFS2-clone
>   Meta Attrs: interleave=true
>   Resource: TESTGFS2 (class=ocf provider=heartbeat type=Filesystem)
>    Attributes: device=/dev/TEST/gfs2 directory=/GFS2 fstype=gfs2 options=noatime run_fsck=no
>    Operations: monitor interval=15s on-fail=fence OCF_CHECK_LEVEL=20 (TESTGFS2-monitor-interval-15s)
>                notify interval=0s timeout=60s (TESTGFS2-notify-interval-0s)
>                start interval=0s timeout=60s (TESTGFS2-start-interval-0s)
>                stop interval=0s timeout=60s (TESTGFS2-stop-interval-0s)
>
> Stonith Devices:
>  Resource: FENCING (class=stonith type=fence_mpath)
>   Attributes: devices=/dev/mapper/36001405cb123d000 pcmk_host_argument=key pcmk_host_map=node1.localdomain:1;node2.localdomain:2;node3.localdomain:3 pcmk_monitor_action=metadata pcmk_reboot_action=off
>   Meta Attrs: provides=unfencing
>   Operations: monitor interval=60s (FENCING-monitor-interval-60s)
> Fencing Levels:
>
> Location Constraints:
> Ordering Constraints:
>   start dlm-clone then start clvmd-clone (kind:Mandatory) (id:order-dlm-clone-clvmd-clone-mandatory)
>   start clvmd-clone then start TESTGFS2-clone (kind:Mandatory) (id:order-clvmd-clone-TESTGFS2-clone-mandatory)
> Colocation Constraints:
>   clvmd-clone with dlm-clone (score:INFINITY) (id:colocation-clvmd-clone-dlm-clone-INFINITY)
>   TESTGFS2-clone with clvmd-clone (score:INFINITY) (id:colocation-TESTGFS2-clone-clvmd-clone-INFINITY)
> Ticket Constraints:
>
> Alerts:
>  No alerts defined
>
> Resources Defaults:
>  No defaults set
>
> [root@node3 ~]# crm_mon -r1
> Stack: corosync
> Current DC: node3.localdomain (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum
> Last updated: Mon Feb 17 08:39:30 2020
> Last change: Sun Feb 16 18:44:06 2020 by root via cibadmin on node1.localdomain
>
> 3 nodes configured
> 10 resources configured
>
> Online: [ node2.localdomain node3.localdomain ]
> OFFLINE: [ node1.localdomain ]
>
> Full list of resources:
>  FENCING (stonith:fence_mpath): Started node2.localdomain
>  Clone Set: dlm-clone [dlm]
>   Started: [ node2.localdomain node3.localdomain ]
>   Stopped: [ node1.localdomain ]
>  Clone Set: clvmd-clone [clvmd]
>   Started: [ node2.localdomain node3.localdomain ]
>   Stopped: [ node1.localdomain ]
>  Clone Set: TESTGFS2-clone [TESTGFS2]
>   Started: [ node2.localdomain node3.localdomain ]
>   Stopped: [ node1.localdomain ]
>
> In the logs, I've noticed that the node is first unfenced and later it is fenced again... For the unfencing, I believe "meta provides=unfencing" is 'guilty', yet I'm not sure about
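For comparison, the FENCING device shown in that config would have been created with something along these lines (a sketch reconstructed from the attributes above; `provides=unfencing` is what makes Pacemaker unfence, i.e. re-register, a node's key when it rejoins):

```shell
# fence_mpath device matching the config above; each node's reservation
# key is mapped via pcmk_host_map, and the key is passed as the "key"
# argument (pcmk_host_argument=key).
pcs stonith create FENCING fence_mpath \
    devices=/dev/mapper/36001405cb123d000 \
    pcmk_host_argument=key \
    pcmk_host_map="node1.localdomain:1;node2.localdomain:2;node3.localdomain:3" \
    pcmk_monitor_action=metadata pcmk_reboot_action=off \
    meta provides=unfencing \
    op monitor interval=60s
```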
Re: [ClusterLabs] Stonith configuration
On 2020-02-14 13:06, Strahil Nikolov wrote:
> On February 14, 2020 4:44:53 PM GMT+02:00, "BASDEN, ALASTAIR G." wrote:
>> Hi Strahil,
>
> Note2: Consider adding a third node (for example a VM) or a qdevice on a separate node (it is allowed to be on a separate network, so simple routing is the only requirement) and reconfigure the cluster so you have 'Expected votes: 3'. This will protect you from split brain and is highly recommended.

Highly recommend qdevice. I spun one up on a small (paperback-size) 'router' running CentOS7.
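Setting up a qdevice on such a third box is short. A sketch for CentOS 7, with a hypothetical hostname:

```shell
# On the qdevice host (e.g. the small router box):
yum install -y pcs corosync-qnetd
pcs qdevice setup model net --enable --start

# On one of the cluster nodes (ffsplit gives the vote to the partition
# containing the lowest node ID on a 50/50 split):
yum install -y corosync-qdevice
pcs quorum device add model net host=qhost.example.com algorithm=ffsplit
```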
Re: [ClusterLabs] fence-scsi question
On 2020-02-10 00:06, Strahil Nikolov wrote:
> On February 10, 2020 2:07:01 AM GMT+02:00, Dan Swartzendruber wrote:
>> I have a 2-node CentOS7 cluster running ZFS. The two nodes (vSphere appliances on different hosts) access 2 SAS SSDs in a Supermicro JBOD with 2 mini-SAS connectors. It all works fine - failover and all. My quandary was how to implement fencing.
>>
>> I was able to get both the VMware SOAP and REST fencing agents to work - they just aren't reliable enough. If the vCenter server appliance is busy, fencing requests time out. I know I can increase the timeouts, but in at least one test run even a minute wasn't enough, and my concern is that if switching over takes too long, VMware will put the datastore in APD, hosing guests.
>>
>> I confirmed that both SSDs work properly with the fence_scsi agent. Fencing the host that actively owns the ZFS pool works perfectly (ZFS flushes data to the datastore every 5 seconds or so, so withdrawing the SCSI-3 persistent reservations causes a fatal write error to the pool, and setting the pool to failmode=panic makes the fenced cluster node reboot automatically). The problem (maybe it isn't really one?) is that fencing the node that does *not* own the pool has no effect, since it holds no reservations on the devices in the pool. I'd love to be sure this isn't an issue at all.
>
> Hi Dan,
>
> You can configure multiple fencing mechanisms in your cluster. For example, you can set the first fencing mechanism to be via VMware, and if it fails (busy or currently unavailable), the SCSI fencing can kick in to ensure a failover can be done.
>
> What you observe is normal - no SCSI reservations -> no fencing. That's why major vendors require, when using fence_mpath/fence_scsi, that the shared storage be a dependency (a filesystem in use by the application) and not just an add-on.
>
> I personally don't like SCSI reservations, as there is no guarantee that other resources (services, IPs, etc.) are actually down, but the risk is low. In your case, fence_scsi stonith can be a second layer of protection.
>
> Best Regards,
> Strahil Nikolov

Okay, thanks. I'll look into multi-level then.
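The multi-level arrangement discussed above is configured as a fencing topology; a sketch with hypothetical node and device names (levels are tried in ascending order, per node, until one succeeds):

```shell
# Level 1: try the VMware agent first; level 2: fall back to fence_scsi.
pcs stonith level add 1 vsa1 fence-vmware
pcs stonith level add 2 vsa1 fence-scsi
pcs stonith level add 1 vsa2 fence-vmware
pcs stonith level add 2 vsa2 fence-scsi
```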
Re: [ClusterLabs] Non-cloned resource moves before cloned resource startup on unstandby
On 9/11/2018 9:20 AM, Dan Ragle wrote: On 9/11/2018 1:59 AM, Andrei Borzenkov wrote: 07.09.2018 23:07, Dan Ragle пишет: On an active-active two node cluster with DRBD, dlm, filesystem mounts, a Web Server, and some crons I can't figure out how to have the crons jump from node to node in the correct order. Specifically, I have two crontabs (managed via symlink creation/deletion) which normally will run one on node1 and the other on node2. When a node goes down, I want both to run on the remaining node until the original node comes back up, at which time they should split the nodes again. However, when returning to the original node the crontab that is being moved must wait until the underlying FS mount is done on the original node before jumping. DRBD, dlm, the filesystem mounts and the Web Server are all working as expected; when I mark the second node as standby Apache stops, the FS unmounts, dlm stops, and DRBD stops on the node; and when I mark that same node unstandby the reverse happens as expected. All three of those are cloned resources. The crontab resources are not cloned and create symlinks, one resource preferring the first node and the other preferring the second. Each is colocated and order dependent on the filesystem mounts (which in turn are colocated and dependent on dlm, which in turn is colocated and dependent on DRBD promotion). I thought this would be sufficient, but when the original node is marked unstandby the crontab that prefers to be on that node attempts to jump over immediately before the FS is mounted on that node. Of course the crontab link fails because the underlying filesystem hasn't been mounted yet. pcs version is 0.9.162. Here's the obfuscated detailed list of commands for the config. I'm still trying to set it up so it's not production-ready yet, but want to get this much sorted before I add too much more. 
# pcs config export pcs-commands #!/usr/bin/sh # sequence generated on 2018-09-07 15:21:15 with: clufter 0.77.0 # invoked as: ['/usr/sbin/pcs', 'config', 'export', 'pcs-commands'] # targeting system: ('linux', 'centos', '7.5.1804', 'Core') # using interpreter: CPython 2.7.5 pcs cluster auth node1.mydomain.com node2.mydomain.com <> /dev/tty pcs cluster setup --name MyCluster \ node1.mydomain.com node2.mydomain.com --transport udpu pcs cluster start --all --wait=60 pcs cluster cib tmp-cib.xml cp tmp-cib.xml tmp-cib.xml.deltasrc pcs -f tmp-cib.xml property set stonith-enabled=false pcs -f tmp-cib.xml property set no-quorum-policy=freeze pcs -f tmp-cib.xml resource defaults resource-stickiness=100 pcs -f tmp-cib.xml resource create DRBD ocf:linbit:drbd drbd_resource=r0 \ op demote interval=0s timeout=90 monitor interval=60s notify interval=0s \ timeout=90 promote interval=0s timeout=90 reload interval=0s timeout=30 \ start interval=0s timeout=240 stop interval=0s timeout=100 pcs -f tmp-cib.xml resource create dlm ocf:pacemaker:controld \ allow_stonith_disabled=1 \ op monitor interval=60s start interval=0s timeout=90 stop interval=0s \ timeout=100 pcs -f tmp-cib.xml resource create WWWMount ocf:heartbeat:Filesystem \ device=/dev/drbd1 directory=/var/www fstype=gfs2 \ options=_netdev,nodiratime,noatime \ op monitor interval=20 timeout=40 notify interval=0s timeout=60 start \ interval=0s timeout=120s stop interval=0s timeout=120s pcs -f tmp-cib.xml resource create WebServer ocf:heartbeat:apache \ configfile=/etc/httpd/conf/httpd.conf statusurl=http://localhost/server-status \ op monitor interval=1min start interval=0s timeout=40s stop interval=0s \ timeout=60s pcs -f tmp-cib.xml resource create SharedRootCrons ocf:heartbeat:symlink \ link=/etc/cron.d/root-shared target=/var/www/crons/root-shared \ op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \ interval=0s timeout=15 pcs -f tmp-cib.xml resource create SharedUserCrons ocf:heartbeat:symlink \ 
link=/etc/cron.d/User-shared target=/var/www/crons/User-shared \ op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \ interval=0s timeout=15 pcs -f tmp-cib.xml resource create PrimaryUserCrons ocf:heartbeat:symlink \ link=/etc/cron.d/User-server1 target=/var/www/crons/User-server1 \ op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \ interval=0s timeout=15 meta resource-stickiness=0 pcs -f tmp-cib.xml \ resource create SecondaryUserCrons ocf:heartbeat:symlink \ link=/etc/cron.d/User-server2 target=/var/www/crons/User-server2 \ op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \ interval=0s timeout=15 meta resource-stickiness=0 pcs -f tmp-cib.xml \ resource clone dlm clone-max=2 clone-node-max=1 interleave=true pcs -f tmp-cib.xml resource clone WWWMount interleave=true pcs -f tmp-cib.xml resource clone WebServer interleave=true pcs -f tmp-cib.xml resource clo
[ClusterLabs] Non-cloned resource moves before cloned resource startup on unstandby
On an active-active two-node cluster with DRBD, dlm, filesystem mounts, a web server, and some crons, I can't figure out how to have the crons jump from node to node in the correct order.

Specifically, I have two crontabs (managed via symlink creation/deletion) which normally run one on node1 and the other on node2. When a node goes down, I want both to run on the remaining node until the original node comes back up, at which time they should split between the nodes again. However, when returning to the original node, the crontab being moved must wait until the underlying FS mount is done on the original node before jumping.

DRBD, dlm, the filesystem mounts, and the web server are all working as expected: when I mark the second node as standby, Apache stops, the FS unmounts, dlm stops, and DRBD stops on that node; and when I mark the same node unstandby, the reverse happens as expected. All three of those are cloned resources.

The crontab resources are not cloned; they create symlinks, one resource preferring the first node and the other preferring the second. Each is colocated with and order-dependent on the filesystem mounts (which in turn are colocated with and dependent on dlm, which in turn is colocated with and dependent on DRBD promotion). I thought this would be sufficient, but when the original node is marked unstandby, the crontab that prefers that node attempts to jump over immediately, before the FS is mounted there. Of course the crontab link fails because the underlying filesystem hasn't been mounted yet.

pcs version is 0.9.162. Here's the obfuscated detailed list of commands for the config. I'm still trying to set it up, so it's not production-ready yet, but I want to get this much sorted before I add too much more.

# pcs config export pcs-commands
#!/usr/bin/sh
# sequence generated on 2018-09-07 15:21:15 with: clufter 0.77.0
# invoked as: ['/usr/sbin/pcs', 'config', 'export', 'pcs-commands']
# targeting system: ('linux', 'centos', '7.5.1804', 'Core')
# using interpreter: CPython 2.7.5
pcs cluster auth node1.mydomain.com node2.mydomain.com <> /dev/tty
pcs cluster setup --name MyCluster \
  node1.mydomain.com node2.mydomain.com --transport udpu
pcs cluster start --all --wait=60
pcs cluster cib tmp-cib.xml
cp tmp-cib.xml tmp-cib.xml.deltasrc
pcs -f tmp-cib.xml property set stonith-enabled=false
pcs -f tmp-cib.xml property set no-quorum-policy=freeze
pcs -f tmp-cib.xml resource defaults resource-stickiness=100
pcs -f tmp-cib.xml resource create DRBD ocf:linbit:drbd drbd_resource=r0 \
  op demote interval=0s timeout=90 monitor interval=60s notify interval=0s \
  timeout=90 promote interval=0s timeout=90 reload interval=0s timeout=30 \
  start interval=0s timeout=240 stop interval=0s timeout=100
pcs -f tmp-cib.xml resource create dlm ocf:pacemaker:controld \
  allow_stonith_disabled=1 \
  op monitor interval=60s start interval=0s timeout=90 stop interval=0s \
  timeout=100
pcs -f tmp-cib.xml resource create WWWMount ocf:heartbeat:Filesystem \
  device=/dev/drbd1 directory=/var/www fstype=gfs2 \
  options=_netdev,nodiratime,noatime \
  op monitor interval=20 timeout=40 notify interval=0s timeout=60 start \
  interval=0s timeout=120s stop interval=0s timeout=120s
pcs -f tmp-cib.xml resource create WebServer ocf:heartbeat:apache \
  configfile=/etc/httpd/conf/httpd.conf statusurl=http://localhost/server-status \
  op monitor interval=1min start interval=0s timeout=40s stop interval=0s \
  timeout=60s
pcs -f tmp-cib.xml resource create SharedRootCrons ocf:heartbeat:symlink \
  link=/etc/cron.d/root-shared target=/var/www/crons/root-shared \
  op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
  interval=0s timeout=15
pcs -f tmp-cib.xml resource create SharedUserCrons ocf:heartbeat:symlink \
  link=/etc/cron.d/User-shared target=/var/www/crons/User-shared \
  op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
  interval=0s timeout=15
pcs -f tmp-cib.xml resource create PrimaryUserCrons ocf:heartbeat:symlink \
  link=/etc/cron.d/User-server1 target=/var/www/crons/User-server1 \
  op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
  interval=0s timeout=15 meta resource-stickiness=0
pcs -f tmp-cib.xml resource create SecondaryUserCrons ocf:heartbeat:symlink \
  link=/etc/cron.d/User-server2 target=/var/www/crons/User-server2 \
  op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
  interval=0s timeout=15 meta resource-stickiness=0
pcs -f tmp-cib.xml resource clone dlm clone-max=2 clone-node-max=1 interleave=true
pcs -f tmp-cib.xml resource clone WWWMount interleave=true
pcs -f tmp-cib.xml resource clone WebServer interleave=true
pcs -f tmp-cib.xml resource clone SharedRootCrons interleave=true
pcs -f tmp-cib.xml resource clone SharedUserCrons interleave=true
pcs -f tmp-cib.xml \
  resource master DRBDClone DRBD master-node-max=1 clone-max=2 master-max=2 \
  interleave=true notify=true clone-node-max=1
pcs -f
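For the record, the usual shape of the fix for "non-clone starts before its clone dependency is up on the returning node" is to order and colocate the symlink resources against the filesystem *clone*, not the primitive; a sketch using the resource names above (whether this alone resolves the timing described depends on the clones' interleave settings):

```shell
# The crontab symlinks may not start until a WWWMount clone instance is
# running, and must run on a node where the mount is active.
pcs constraint order start WWWMount-clone then start PrimaryUserCrons kind=Mandatory
pcs constraint colocation add PrimaryUserCrons with WWWMount-clone INFINITY
pcs constraint order start WWWMount-clone then start SecondaryUserCrons kind=Mandatory
pcs constraint colocation add SecondaryUserCrons with WWWMount-clone INFINITY
```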
Re: [ClusterLabs] ClusterIP won't return to recovered node
On 6/19/2017 5:32 AM, Klaus Wenninger wrote, quoting the thread so far:

On 06/10/2017 10:53 AM, Dan Ragle wrote:
> So I guess my bottom line question is: how does one tell Pacemaker that the individual legs of globally unique clones should *always* be spread across the available nodes whenever possible, regardless of the number of processes on any one of the nodes? For kicks I did try:
>
>     pcs constraint location ClusterIP:0 prefers node1-pcs=INFINITY
>
> but it responded with an error about an invalid character (:).

On 06/12/2017 04:02 PM, Ken Gaillot wrote:
> There isn't a way currently. It will try to do that when initially placing them, but once they've moved together, there's no simple way to tell them to move. I suppose a workaround might be to create a dummy resource that you constrain to that node so it looks like the other node is less busy.

On 06/12/2017 09:23 AM, Klaus Wenninger wrote:
> Another ugly dummy-resource idea - maybe less fragile, and not tried out: one could have 2 dummy resources that would rather like to live on different nodes - no issue with primitives - and depend colocated on ClusterIP. Wouldn't that pull them apart once possible?

On 06/16/2017 01:18 PM, Dan Ragle wrote:
> Sounds like a good idea. Hmm... still no luck with this.
[Dan:] Based on your suggestion, I thought this would work (leaving out all the status displays this time):

# pcs resource create Test1 systemd:test1
# pcs resource create Test2 systemd:test2
# pcs constraint location Test1 prefers node1-pcs=INFINITY
# pcs constraint location Test2 prefers node1-pcs=INFINITY
# pcs resource create Test3 systemd:test3
# pcs resource create Test4 systemd:test4
# pcs constraint location Test3 prefers node1-pcs=INFINITY
# pcs constraint location Test4 prefers node2-pcs=INFINITY
# pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=162.220.75.138 nic=bond0 cidr_netmask=24
# pcs resource meta ClusterIP resource-stickiness=0
# pcs resource clone ClusterIP clone-max=2 clone-node-max=2 globally-unique=true
# pcs constraint colocation add ClusterIP-clone with Test3 INFINITY
# pcs constraint colocation add ClusterIP-clone with Test4 INFINITY

[Klaus:] What I had meant was the other way round: so that, in trying to have both Test3 and Test4 running, Pacemaker would have to have instances of ClusterIP running on both nodes, but they wouldn't depend on Test3 and Test4.

[Dan:] Klaus, so did you mean:

# pcs constraint colocation add Test3 with ClusterIP-clone INFINITY
# pcs constraint colocation add Test4 with ClusterIP-clone INFINITY

? I actually did try that (with the rest of the recipe the same) and ended up with the same problem I started with. Immediately after setup, both clone instances were on node2. After standby/unstandby of node2 they (the clones) did in fact split; but if I then followed that with a standby/unstandby of node1, they both remained on node2.

[Dan, earlier:] But that simply refuses to run ClusterIP at all ("Resource ClusterIP:0/1 cannot run anywhere"). And if I change the last two colocation constraints to a numeric score, then it runs, but with the same problem I had before (both ClusterIP instances on one node).
I also tried reversing the colocation definition (add Test3 with ClusterIP-clone) and differing combinations of scores between the location and colocation constraints, still with no luck.

Thanks,
Dan

[Ken:] Ah, of course, the colocation with both means they all have to run on the same node, which is impossible. FYI, you can create dummy resources with ocf:pacemaker:Dummy so you don't have to write your own agents.

OK, this is getting even hackier, but I'm thinking you can use utilization for this:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm139683960632560

* Create two dummy resources, each with a -INFINITY location preference for one of the nodes, so each is allowed to run on only one node.
* Set the priority meta-attribute to a positive number on all your real resources, and leave the dummies at 0 (so if the cluster can't run all of them, it will stop the dummies first).
* Set placement-strategy=utilization.
* Define a utilization attribute, with values for each node and resource like this:
  * Set a utilization of 1 on all resources except the dummies and the clone, so that their total utilization is N.
  * Set a utilization of 100 on the dummies and the clone.
  * Set a utilization capacity of 200 + N on each node. (I'm assuming you never expect to have more than 99 other resources. If that's not the case, just raise the 100 usage accordingly.)

With those values, if only one node is up, that node can host all the real resources (including both clone instances) with the dummies stopped. If both nodes are up, the only way the cluster can run all resources (including the clone instances and dummies) is to spread the clone instances out.
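The utilization scheme above could be sketched as follows; the dummy names and the `capacity` attribute name are made up for illustration, and N=4 real resources is assumed (so node capacity is 204):

```shell
pcs property set placement-strategy=utilization

# Two dummies, each pinned away from one node:
pcs resource create BalloonA ocf:pacemaker:Dummy
pcs resource create BalloonB ocf:pacemaker:Dummy
pcs constraint location BalloonA avoids node2-pcs=INFINITY
pcs constraint location BalloonB avoids node1-pcs=INFINITY

# Real resources get priority > 0 so the dummies are stopped first:
pcs resource meta Test1 priority=1

# Utilization: 1 per real resource, 100 for the dummies and the clone:
pcs resource utilization Test1 capacity=1
pcs resource utilization BalloonA capacity=100
pcs resource utilization BalloonB capacity=100
pcs resource utilization ClusterIP capacity=100
pcs node utilization node1-pcs capacity=204
pcs node utilization node2-pcs capacity=204
```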
Re: [ClusterLabs] ClusterIP won't return to recovered node
On 6/16/2017 3:08 PM, Ken Gaillot wrote: On 06/16/2017 01:18 PM, Dan Ragle wrote: On 6/12/2017 10:30 AM, Ken Gaillot wrote: On 06/12/2017 09:23 AM, Klaus Wenninger wrote: On 06/12/2017 04:02 PM, Ken Gaillot wrote: On 06/10/2017 10:53 AM, Dan Ragle wrote: So I guess my bottom line question is: How does one tell Pacemaker that the individual legs of globally unique clones should *always* be spread across the available nodes whenever possible, regardless of the number of processes on any one of the nodes? For kicks I did try: pcs constraint location ClusterIP:0 prefers node1-pcs=INFINITY but it responded with an error about an invalid character (:). There isn't a way currently. It will try to do that when initially placing them, but once they've moved together, there's no simple way to tell them to move. I suppose a workaround might be to create a dummy resource that you constrain to that node so it looks like the other node is less busy. Another ugly dummy resource idea - maybe less fragile - and not tried out: One could have 2 dummy resources that would rather like to live on different nodes - no issue with primitives - and do depend collocated on ClusterIP. Wouldn't that pull them apart once possible? Sounds like a good idea H... still no luck with this. 
Based on your suggestion, I thought this would work (leaving out all the status displays this time): # pcs resource create Test1 systemd:test1 # pcs resource create Test2 systemd:test2 # pcs constraint location Test1 prefers node1-pcs=INFINITY # pcs constraint location Test2 prefers node1-pcs=INFINITY # pcs resource create Test3 systemd:test3 # pcs resource create Test4 systemd:test4 # pcs constraint location Test3 prefers node1-pcs=INFINITY # pcs constraint location Test4 prefers node2-pcs=INFINITY # pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=162.220.75.138 nic=bond0 cidr_netmask=24 # pcs resource meta ClusterIP resource-stickiness=0 # pcs resource clone ClusterIP clone-max=2 clone-node-max=2 globally-unique=true # pcs constraint colocation add ClusterIP-clone with Test3 INFINITY # pcs constraint colocation add ClusterIP-clone with Test4 INFINITY But that simply refuses to run ClusterIP at all ("Resource ClusterIP:0/1 cannot run anywhere"). And if I change the last two colocation constraints to a numeric then it runs, but with the same problem I had before (both ClusterIP instances on one node). I also tried it reversing the colocation definition (add Test3 with ClusterIP-clone) and trying differing combinations of scores between the location and colocation constraints, still with no luck. Thanks, Dan

Ah of course, the colocation with both means they all have to run on the same node, which is impossible. FYI you can create dummy resources with ocf:pacemaker:Dummy so you don't have to write your own agents. Good to know, thanks.

OK, this is getting even hackier, but I'm thinking you can use utilization for this: http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm139683960632560

* Create two dummy resources, each with a -INFINITY location preference for one of the nodes, so each is allowed to run on only one node.
* Set the priority meta-attribute to a positive number on all your real resources, and leave the dummies at 0 (so if the cluster can't run all of them, it will stop the dummies first).
* Set placement-strategy=utilization.
* Define a utilization attribute, with values for each node and resource like this:
** Set a utilization of 1 on all resources except the dummies and the clone, so that their total utilization is N.
** Set a utilization of 100 on the dummies and the clone.
** Set a utilization capacity of 200 + N on each node.

(I'm assuming you never expect to have more than 99 other resources. If that's not the case, just raise the 100 usage accordingly.) With those values, if only one node is up, that node can host all the real resources (including both clone instances), with the dummies stopped. If both nodes are up, the only way the cluster can run all resources (including the clone instances and dummies) is to spread the clone instances out. Again, it's hacky, and I haven't tested it, but I think it would work.

Interesting. That does seem to work, at least in my reduction; I've not yet tried it in my actual real-world setup. A few notes, though:

1. I had to set placement-strategy=balanced. When set to utilization the IP clones still would not split following a standby/unstandby of one of the nodes.
2. I still had to remember to have resource-stickiness=0 on the ClusterIP primitives. Without it, after standby/unstandby the clones still both preferred to stay where they were, with one of the dummies running on the other node and the second dummy stopped.
3. Rather than set the priority on the "real" resources to 1, I set the priority on the
Re: [ClusterLabs] ClusterIP won't return to recovered node
On 6/12/2017 10:30 AM, Ken Gaillot wrote: On 06/12/2017 09:23 AM, Klaus Wenninger wrote: On 06/12/2017 04:02 PM, Ken Gaillot wrote: On 06/10/2017 10:53 AM, Dan Ragle wrote: So I guess my bottom line question is: How does one tell Pacemaker that the individual legs of globally unique clones should *always* be spread across the available nodes whenever possible, regardless of the number of processes on any one of the nodes? For kicks I did try: pcs constraint location ClusterIP:0 prefers node1-pcs=INFINITY but it responded with an error about an invalid character (:). There isn't a way currently. It will try to do that when initially placing them, but once they've moved together, there's no simple way to tell them to move. I suppose a workaround might be to create a dummy resource that you constrain to that node so it looks like the other node is less busy. Another ugly dummy resource idea - maybe less fragile - and not tried out: One could have 2 dummy resources that would rather like to live on different nodes - no issue with primitives - and do depend collocated on ClusterIP. Wouldn't that pull them apart once possible? Sounds like a good idea H... still no luck with this. 
Based on your suggestion, I thought this would work (leaving out all the status displays this time): # pcs resource create Test1 systemd:test1 # pcs resource create Test2 systemd:test2 # pcs constraint location Test1 prefers node1-pcs=INFINITY # pcs constraint location Test2 prefers node1-pcs=INFINITY # pcs resource create Test3 systemd:test3 # pcs resource create Test4 systemd:test4 # pcs constraint location Test3 prefers node1-pcs=INFINITY # pcs constraint location Test4 prefers node2-pcs=INFINITY # pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=162.220.75.138 nic=bond0 cidr_netmask=24 # pcs resource meta ClusterIP resource-stickiness=0 # pcs resource clone ClusterIP clone-max=2 clone-node-max=2 globally-unique=true # pcs constraint colocation add ClusterIP-clone with Test3 INFINITY # pcs constraint colocation add ClusterIP-clone with Test4 INFINITY But that simply refuses to run ClusterIP at all ("Resource ClusterIP:0/1 cannot run anywhere"). And if I change the last two colocation constraints to a numeric then it runs, but with the same problem I had before (both ClusterIP instances on one node). I also tried it reversing the colocation definition (add Test3 with ClusterIP-clone) and trying differing combinations of scores between the location and colocation constraints, still with no luck. Thanks, Dan ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] ClusterIP won't return to recovered node
On 6/12/2017 2:03 AM, Klaus Wenninger wrote: On 06/10/2017 05:53 PM, Dan Ragle wrote: On 5/25/2017 5:33 PM, Ken Gaillot wrote: On 05/24/2017 12:27 PM, Dan Ragle wrote: I suspect this has been asked before and apologize if so, a google search didn't seem to find anything that was helpful to me ... I'm setting up an active/active two-node cluster and am having an issue where one of my two defined clusterIPs will not return to the other node after it (the other node) has been recovered. I'm running on CentOS 7.3. My resource setups look like this: # cibadmin -Q|grep dc-version # pcs resource show PublicIP-clone Clone: PublicIP-clone Meta Attrs: clone-max=2 clone-node-max=2 globally-unique=true interleave=true Resource: PublicIP (class=ocf provider=heartbeat type=IPaddr2) Attributes: ip=75.144.71.38 cidr_netmask=24 nic=bond0 Meta Attrs: resource-stickiness=0 Operations: start interval=0s timeout=20s (PublicIP-start-interval-0s) stop interval=0s timeout=20s (PublicIP-stop-interval-0s) monitor interval=30s (PublicIP-monitor-interval-30s) # pcs resource show PrivateIP-clone Clone: PrivateIP-clone Meta Attrs: clone-max=2 clone-node-max=2 globally-unique=true interleave=true Resource: PrivateIP (class=ocf provider=heartbeat type=IPaddr2) Attributes: ip=192.168.1.3 nic=bond1 cidr_netmask=24 Meta Attrs: resource-stickiness=0 Operations: start interval=0s timeout=20s (PrivateIP-start-interval-0s) stop interval=0s timeout=20s (PrivateIP-stop-interval-0s) monitor interval=10s timeout=20s (PrivateIP-monitor-interval-10s) # pcs constraint --full | grep -i publicip start WEB-clone then start PublicIP-clone (kind:Mandatory) (id:order-WEB-clone-PublicIP-clone-mandatory) # pcs constraint --full | grep -i privateip start WEB-clone then start PrivateIP-clone (kind:Mandatory) (id:order-WEB-clone-PrivateIP-clone-mandatory) FYI These constraints cover ordering only. 
If you also want to be sure that the IPs only start on a node where the web service is functional, then you also need colocation constraints. When I first create the resources, they split across the two nodes as expected/desired: Clone Set: PublicIP-clone [PublicIP] (unique) PublicIP:0(ocf::heartbeat:IPaddr2): Started node1-pcs PublicIP:1(ocf::heartbeat:IPaddr2): Started node2-pcs Clone Set: PrivateIP-clone [PrivateIP] (unique) PrivateIP:0(ocf::heartbeat:IPaddr2): Started node1-pcs PrivateIP:1(ocf::heartbeat:IPaddr2): Started node2-pcs Clone Set: WEB-clone [WEB] Started: [ node1-pcs node2-pcs ] I then put the second node in standby: # pcs node standby node2-pcs And the IPs both jump to node1 as expected: Clone Set: PublicIP-clone [PublicIP] (unique) PublicIP:0(ocf::heartbeat:IPaddr2): Started node1-pcs PublicIP:1(ocf::heartbeat:IPaddr2): Started node1-pcs Clone Set: WEB-clone [WEB] Started: [ node1-pcs ] Stopped: [ node2-pcs ] Clone Set: PrivateIP-clone [PrivateIP] (unique) PrivateIP:0(ocf::heartbeat:IPaddr2): Started node1-pcs PrivateIP:1(ocf::heartbeat:IPaddr2): Started node1-pcs Then unstandby the second node: # pcs node unstandby node2-pcs The publicIP goes back, but the private does not: Clone Set: PublicIP-clone [PublicIP] (unique) PublicIP:0(ocf::heartbeat:IPaddr2): Started node1-pcs PublicIP:1(ocf::heartbeat:IPaddr2): Started node2-pcs Clone Set: WEB-clone [WEB] Started: [ node1-pcs node2-pcs ] Clone Set: PrivateIP-clone [PrivateIP] (unique) PrivateIP:0(ocf::heartbeat:IPaddr2): Started node1-pcs PrivateIP:1(ocf::heartbeat:IPaddr2): Started node1-pcs Anybody see what I'm doing wrong? I'm not seeing anything in the logs to indicate that it tries node2 and then fails; but I'm fairly new to the software so it's possible I'm not looking in the right place. The pcs status would show any failed actions, and anything important in the logs would start with "error:" or "warning:". 
At any given time, one of the nodes is the DC, meaning it schedules actions for the whole cluster. That node will have more "pengine:" messages in its logs at the time. You can check those logs to see what decisions were made, as well as a "saving inputs" message to get the cluster state that was used to make those decisions. There is a crm_simulate tool that you can run on that file to get more information. By default, pacemaker will try to balance the number of resources running on each node, so I'm not sure why in this case node1 has four resources and node2 has two. crm_simulate might help explain it. However, there's nothing here telling pacemaker that the instance
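As a concrete example of the crm_simulate suggestion above: the saved input can be replayed offline to see the allocation scores. The file name below is a placeholder; the real path appears in the DC's "saving inputs" log message.

```shell
# Show allocation scores and the placement decisions Pacemaker made
# for a saved transition input on the DC
crm_simulate --show-scores --xml-file /var/lib/pacemaker/pengine/pe-input-123.bz2
```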
Re: [ClusterLabs] ClusterIP won't return to recovered node
On 5/25/2017 5:33 PM, Ken Gaillot wrote: On 05/24/2017 12:27 PM, Dan Ragle wrote: I suspect this has been asked before and apologize if so, a google search didn't seem to find anything that was helpful to me ... I'm setting up an active/active two-node cluster and am having an issue where one of my two defined clusterIPs will not return to the other node after it (the other node) has been recovered. I'm running on CentOS 7.3. My resource setups look like this: # cibadmin -Q|grep dc-version # pcs resource show PublicIP-clone Clone: PublicIP-clone Meta Attrs: clone-max=2 clone-node-max=2 globally-unique=true interleave=true Resource: PublicIP (class=ocf provider=heartbeat type=IPaddr2) Attributes: ip=75.144.71.38 cidr_netmask=24 nic=bond0 Meta Attrs: resource-stickiness=0 Operations: start interval=0s timeout=20s (PublicIP-start-interval-0s) stop interval=0s timeout=20s (PublicIP-stop-interval-0s) monitor interval=30s (PublicIP-monitor-interval-30s) # pcs resource show PrivateIP-clone Clone: PrivateIP-clone Meta Attrs: clone-max=2 clone-node-max=2 globally-unique=true interleave=true Resource: PrivateIP (class=ocf provider=heartbeat type=IPaddr2) Attributes: ip=192.168.1.3 nic=bond1 cidr_netmask=24 Meta Attrs: resource-stickiness=0 Operations: start interval=0s timeout=20s (PrivateIP-start-interval-0s) stop interval=0s timeout=20s (PrivateIP-stop-interval-0s) monitor interval=10s timeout=20s (PrivateIP-monitor-interval-10s) # pcs constraint --full | grep -i publicip start WEB-clone then start PublicIP-clone (kind:Mandatory) (id:order-WEB-clone-PublicIP-clone-mandatory) # pcs constraint --full | grep -i privateip start WEB-clone then start PrivateIP-clone (kind:Mandatory) (id:order-WEB-clone-PrivateIP-clone-mandatory) FYI These constraints cover ordering only. If you also want to be sure that the IPs only start on a node where the web service is functional, then you also need colocation constraints. 
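For reference, the colocation constraints described in that tip might look like this with the resource names from this thread (a sketch only; the scores could be finite instead if the IPs should still be allowed to run without the web service):

```shell
# Only place the IP clones on nodes where WEB-clone is running
pcs constraint colocation add PublicIP-clone with WEB-clone INFINITY
pcs constraint colocation add PrivateIP-clone with WEB-clone INFINITY
```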
When I first create the resources, they split across the two nodes as expected/desired: Clone Set: PublicIP-clone [PublicIP] (unique) PublicIP:0(ocf::heartbeat:IPaddr2): Started node1-pcs PublicIP:1(ocf::heartbeat:IPaddr2): Started node2-pcs Clone Set: PrivateIP-clone [PrivateIP] (unique) PrivateIP:0(ocf::heartbeat:IPaddr2): Started node1-pcs PrivateIP:1(ocf::heartbeat:IPaddr2): Started node2-pcs Clone Set: WEB-clone [WEB] Started: [ node1-pcs node2-pcs ] I then put the second node in standby: # pcs node standby node2-pcs And the IPs both jump to node1 as expected: Clone Set: PublicIP-clone [PublicIP] (unique) PublicIP:0(ocf::heartbeat:IPaddr2): Started node1-pcs PublicIP:1(ocf::heartbeat:IPaddr2): Started node1-pcs Clone Set: WEB-clone [WEB] Started: [ node1-pcs ] Stopped: [ node2-pcs ] Clone Set: PrivateIP-clone [PrivateIP] (unique) PrivateIP:0(ocf::heartbeat:IPaddr2): Started node1-pcs PrivateIP:1(ocf::heartbeat:IPaddr2): Started node1-pcs Then unstandby the second node: # pcs node unstandby node2-pcs The publicIP goes back, but the private does not: Clone Set: PublicIP-clone [PublicIP] (unique) PublicIP:0(ocf::heartbeat:IPaddr2): Started node1-pcs PublicIP:1(ocf::heartbeat:IPaddr2): Started node2-pcs Clone Set: WEB-clone [WEB] Started: [ node1-pcs node2-pcs ] Clone Set: PrivateIP-clone [PrivateIP] (unique) PrivateIP:0(ocf::heartbeat:IPaddr2): Started node1-pcs PrivateIP:1(ocf::heartbeat:IPaddr2): Started node1-pcs Anybody see what I'm doing wrong? I'm not seeing anything in the logs to indicate that it tries node2 and then fails; but I'm fairly new to the software so it's possible I'm not looking in the right place. The pcs status would show any failed actions, and anything important in the logs would start with "error:" or "warning:". At any given time, one of the nodes is the DC, meaning it schedules actions for the whole cluster. That node will have more "pengine:" messages in its logs at the time. 
You can check those logs to see what decisions were made, as well as a "saving inputs" message to get the cluster state that was used to make those decisions. There is a crm_simulate tool that you can run on that file to get more information. By default, pacemaker will try to balance the number of resources running on each node, so I'm not sure why in this case node1 has four resources and node2 has two. crm_simulate might help explain it. However, there's nothing here telling pacemaker that the instances of PrivateIP should run on different nodes when possible. With your existing constraints, pace
[ClusterLabs] ClusterIP won't return to recovered node
I suspect this has been asked before and apologize if so, a google search didn't seem to find anything that was helpful to me ... I'm setting up an active/active two-node cluster and am having an issue where one of my two defined clusterIPs will not return to the other node after it (the other node) has been recovered. I'm running on CentOS 7.3. My resource setups look like this: # cibadmin -Q|grep dc-version value="1.1.15-11.el7_3.4-e174ec8"/> # pcs resource show PublicIP-clone Clone: PublicIP-clone Meta Attrs: clone-max=2 clone-node-max=2 globally-unique=true interleave=true Resource: PublicIP (class=ocf provider=heartbeat type=IPaddr2) Attributes: ip=75.144.71.38 cidr_netmask=24 nic=bond0 Meta Attrs: resource-stickiness=0 Operations: start interval=0s timeout=20s (PublicIP-start-interval-0s) stop interval=0s timeout=20s (PublicIP-stop-interval-0s) monitor interval=30s (PublicIP-monitor-interval-30s) # pcs resource show PrivateIP-clone Clone: PrivateIP-clone Meta Attrs: clone-max=2 clone-node-max=2 globally-unique=true interleave=true Resource: PrivateIP (class=ocf provider=heartbeat type=IPaddr2) Attributes: ip=192.168.1.3 nic=bond1 cidr_netmask=24 Meta Attrs: resource-stickiness=0 Operations: start interval=0s timeout=20s (PrivateIP-start-interval-0s) stop interval=0s timeout=20s (PrivateIP-stop-interval-0s) monitor interval=10s timeout=20s (PrivateIP-monitor-interval-10s) # pcs constraint --full | grep -i publicip start WEB-clone then start PublicIP-clone (kind:Mandatory) (id:order-WEB-clone-PublicIP-clone-mandatory) # pcs constraint --full | grep -i privateip start WEB-clone then start PrivateIP-clone (kind:Mandatory) (id:order-WEB-clone-PrivateIP-clone-mandatory) When I first create the resources, they split across the two nodes as expected/desired: Clone Set: PublicIP-clone [PublicIP] (unique) PublicIP:0(ocf::heartbeat:IPaddr2): Started node1-pcs PublicIP:1(ocf::heartbeat:IPaddr2): Started node2-pcs Clone Set: PrivateIP-clone [PrivateIP] (unique) 
PrivateIP:0(ocf::heartbeat:IPaddr2): Started node1-pcs PrivateIP:1(ocf::heartbeat:IPaddr2): Started node2-pcs Clone Set: WEB-clone [WEB] Started: [ node1-pcs node2-pcs ] I then put the second node in standby: # pcs node standby node2-pcs And the IPs both jump to node1 as expected: Clone Set: PublicIP-clone [PublicIP] (unique) PublicIP:0(ocf::heartbeat:IPaddr2): Started node1-pcs PublicIP:1(ocf::heartbeat:IPaddr2): Started node1-pcs Clone Set: WEB-clone [WEB] Started: [ node1-pcs ] Stopped: [ node2-pcs ] Clone Set: PrivateIP-clone [PrivateIP] (unique) PrivateIP:0(ocf::heartbeat:IPaddr2): Started node1-pcs PrivateIP:1(ocf::heartbeat:IPaddr2): Started node1-pcs Then unstandby the second node: # pcs node unstandby node2-pcs The publicIP goes back, but the private does not: Clone Set: PublicIP-clone [PublicIP] (unique) PublicIP:0(ocf::heartbeat:IPaddr2): Started node1-pcs PublicIP:1(ocf::heartbeat:IPaddr2): Started node2-pcs Clone Set: WEB-clone [WEB] Started: [ node1-pcs node2-pcs ] Clone Set: PrivateIP-clone [PrivateIP] (unique) PrivateIP:0(ocf::heartbeat:IPaddr2): Started node1-pcs PrivateIP:1(ocf::heartbeat:IPaddr2): Started node1-pcs Anybody see what I'm doing wrong? I'm not seeing anything in the logs to indicate that it tries node2 and then fails; but I'm fairly new to the software so it's possible I'm not looking in the right place. Also, I noticed when putting a node in standby the main NIC appears to be interrupted momentarily (long enough for my SSH session, which is connected via the permanent IP on the NIC and not the clusterIP, to be dropped). Is there any way to avoid this? I was thinking that the cluster operations would only affect the ClusterIP and not the other IPs being served on that NIC. Thanks! Dan ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
[ClusterLabs] where do I find the null fencing device?
I wanted to do some experiments, and the null fencing agent seemed to be just what I wanted. I don't find it anywhere, even after installing fence-agents-all and cluster-glue (this is on CentOS 7, btw...) Thanks... ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Location constraints for fencing resources?
On 2016-09-13 00:20, Klaus Wenninger wrote: Location-constraints for fencing-resources are definitely supported and don't just work by accident - if this was the question. On 09/13/2016 02:43 AM, Dan Swartzendruber wrote: On 2016-09-12 10:48, Dan Swartzendruber wrote: Posting this as a separate thread from my fence_apc one. As I said in that thread, I created two fence_apc agents, one to fence node A and one to fence node B. Each was configured using a static pcmk node mapping, and constrained to only run on the other node. In the process of testing this, I discovered a bug (feature?) in the LCMC GUI I am using to manage the cluster. To wit: when I click on a fence object, it never seems to fetch the resource constraints (e.g. they show in the GUI as "nothing selected"), so if I change something (say the power_wait timeout) and then click "Apply", the location constraints are deleted from that fencing resource. I also noticed that if I connect to port 2224 (pcsd), regular resources show me the location constraints, whereas fence resources don't, which is making me wonder if this is not supported? I'm thinking I can set up a pcmk_host_map to tell it which APC outlet manages node A and which manages node B, in which case I can just use one fence_apc resource with a dynamic pcmk host list? Thanks! Okay, this *seems* to work. e.g. pcmk_host_list has the two hosts. pcmk_host_map says nas1:8;nas2:2. The fencing agent was running on nas2. I logged in to nas2 and did 'systemctl stop network'. pacemaker moved the fencing resource to nas1, then power-cycled nas2. Looking good... Basically, yes. I was puzzled, since the web GUI pcsd serves up gives no apparent way to edit location constraints for stonith resources, and LCMC apparently doesn't read them from the config, so if you edit a stonith resource and change anything, then click on "Apply", it nukes any change you *had* made. For the latter, I will open a ticket with the lcmc developer(s). Thanks!
___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Location constraints for fencing resources?
On 2016-09-12 10:48, Dan Swartzendruber wrote: Posting this as a separate thread from my fence_apc one. As I said in that thread, I created two fence_apc agents, one to fence node A and one to fence node B. Each was configured using a static pcmk node mapping, and constrained to only run on the other node. In the process of testing this, I discovered a bug (feature?) in the LCMC GUI I am using to manage the cluster. To wit: when I click on a fence object, it never seems to fetch the resource constraints (e.g. they show in the GUI as "nothing selected"), so if I change something (say the power_wait timeout) and then click "Apply", the location constraints are deleted from that fencing resource. I also noticed that if I connect to port 2224 (pcsd), regular resources show me the location constraints, whereas fence resources don't, which is making me wonder if this is not supported? I'm thinking I can set up a pcmk_host_map to tell it which APC outlet manages node A and which manages node B, in which case I can just use one fence_apc resource with a dynamic pcmk host list? Thanks! Okay, this *seems* to work. e.g. pcmk_host_list has the two hosts. pcmk_host_map says nas1:8;nas2:2. The fencing agent was running on nas2. I logged in to nas2 and did 'systemctl stop network'. pacemaker moved the fencing resource to nas1, then power-cycled nas2. Looking good... ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
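For the archives, a sketch of the single-resource setup described above. The device address and credentials are placeholders (check your fence_apc version's parameter names); the outlet map is the one from the message:

```shell
# One fence_apc resource covering both nodes, mapping each node to its
# APC outlet (nas1 -> outlet 8, nas2 -> outlet 2)
pcs stonith create apc-fence fence_apc \
    ipaddr=apc-pdu.example.com login=apc passwd=secret \
    pcmk_host_list="nas1,nas2" \
    pcmk_host_map="nas1:8;nas2:2"
```

With no location constraint at all, the cluster is then free to run the fencing resource on whichever node survives.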
[ClusterLabs] Location constraints for fencing resources?
Posting this as a separate thread from my fence_apc one. As I said in that thread, I created two fence_apc agents, one to fence node A and one to fence node B. Each was configured using a static pcmk node mapping, and constrained to only run on the other node. In the process of testing this, I discovered a bug (feature?) in the LCMC GUI I am using to manage the cluster. To wit: when I click on a fence object, it never seems to fetch the resource constraints (e.g. they show in the GUI as "nothing selected"), so if I change something (say the power_wait timeout) and then click "Apply", the location constraints are deleted from that fencing resource. I also noticed that if I connect to port 2224 (pcsd), regular resources show me the location constraints, whereas fence resources don't, which is making me wonder if this is not supported? I'm thinking I can set up a pcmk_host_map to tell it which APC outlet manages node A and which manages node B, in which case I can just use one fence_apc resource with a dynamic pcmk host list? Thanks! ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] fence_apc delay?
On 2016-09-06 10:59, Ken Gaillot wrote: On 09/05/2016 09:38 AM, Marek Grac wrote: Hi, [snip] FYI, no special configuration is needed for this with recent pacemaker versions. If multiple devices are listed in a topology level, pacemaker will automatically convert reboot requests into all-off-then-all-on. Hmmm, thinking about this some more, this just puts me back in the current situation (e.g. having an 'extra' delay.) The issue for me would be having two fencing devices, each of which needs a brief delay to let its target's PS drain. If a single PDU fencing agent does this (with proposed change): power-off wait N seconds power-on that is cool. Unfortunately, with the all-off-then-all-on pacemaker would do, I would get this: power-off node A wait N seconds power-off node B wait N seconds power-on node A power-on node B or am I missing something? If not, seems like it would be nice to have some sort of delay at the pacemaker level. e.g. tell pacemaker to convert a reboot of node A into a 'turn off node A, wait N seconds, turn on node A'? ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
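For what it's worth, the topology behavior Ken describes is driven by stonith levels, registered per node, roughly like the sketch below (device and node names are placeholders). With two devices in one level, recent Pacemaker turns a reboot request into off on both, then on on both — which is exactly the interleaving Dan is worried about when each device also has its own delay:

```shell
# Both PDU circuits must fence node-a, so list them in one topology level;
# a reboot then becomes: off circuit1, off circuit2, on circuit1, on circuit2
pcs stonith level add 1 node-a apc-circuit1,apc-circuit2
```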
Re: [ClusterLabs] fence_apc delay?
On 2016-09-06 10:59, Ken Gaillot wrote: [snip] I thought power-wait was intended for this situation, where the node's power supply can survive a brief outage, so a delay is needed to ensure it drains. In any case, I know people are using it for that. Are there any drawbacks to using power-wait for this purpose, even if that wasn't its original intent? Is it just that the "on" will get the delay as well? I can't speak to the first part of your question, but for me the second part is a definite YES. The issue is that I want a long enough delay to be sure the host is D E A D and not writing to the pool anymore; but that delay is now multiplied by 2, and if it gets "too long", vsphere guests can start getting disk I/O errors... *) Configure fence device to not use reboot but OFF, ON Very same to the situation when there are multiple power circuits; you have to switch them all OFF and afterwards turn them ON. FYI, no special configuration is needed for this with recent pacemaker versions. If multiple devices are listed in a topology level, pacemaker will automatically convert reboot requests into all-off-then-all-on. My understanding was that applied to 1.1.14? My CentOS 7 host has pacemaker 1.1.13 :( [snip] ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Antw: Re: fence_apc delay?
On 2016-09-05 03:04, Ulrich Windl wrote: Marek Grac wrote on 03.09.2016 at 14:41: Hi, There are two problems mentioned in the email. 1) power-wait Power-wait is a quite advanced option and there are only few fence devices/agent where it makes sense. And only because the HW/firmware on the device is somewhat broken. Basically, when we execute power ON/OFF operation, we wait for power-wait seconds before we send next command. I don't remember any issue with APC and this kind of problems. 2) the only theory I could come up with was that maybe the fencing operation was considered complete too quickly? That is virtually not possible. Even when power ON/OFF is asynchronous, we test status of device and fence agent wait until status of the plug/VM/... matches what user wants. I can imagine that a powerful power supply can deliver up to one second of power even if the mains is disconnected. If the cluster is very quick after fencing, it might be a problem. I'd suggest a 5 to 10 second delay between fencing action and cluster reaction. Ulrich, please see the response I just posted to Marek. Thanks! ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] fence_apc delay?
On 2016-09-03 08:41, Marek Grac wrote: Hi, There are two problems mentioned in the email. 1) power-wait Power-wait is a quite advanced option and there are only few fence devices/agent where it makes sense. And only because the HW/firmware on the device is somewhat broken. Basically, when we execute power ON/OFF operation, we wait for power-wait seconds before we send next command. I don't remember any issue with APC and this kind of problems. 2) the only theory I could come up with was that maybe the fencing operation was considered complete too quickly? That is virtually not possible. Even when power ON/OFF is asynchronous, we test status of device and fence agent wait until status of the plug/VM/... matches what user wants. I think you misunderstood my point (possibly I wasn't clear.) Not saying anything is wrong with either the fencing agent or the PDU, rather, my theory is that if the agent flips the power off, then back on, if the interval it is off is 'too short', possibly a host like the R905 can continue to operate for a couple of seconds, continuing to write data to the disks past the point where the other node begins to do likewise. If power_wait is not the right way to wait, say, 10 seconds to make 100% sure node A is dead as a doornail, what *is* the right way? ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] fence_apc delay?
On 2016-09-02 10:09, Ken Gaillot wrote: On 09/02/2016 08:14 AM, Dan Swartzendruber wrote: So, I was testing my ZFS dual-head JBOD 2-node cluster. Manual failovers worked just fine. I then went to try an acid-test by logging in to node A and doing 'systemctl stop network'. Sure enough, pacemaker told the APC fencing agent to power-cycle node A. The ZFS pool moved to node B as expected. As soon as node A was back up, I migrated the pool/IP back to node A. I *thought* all was okay, until a bit later, I did 'zpool status', and saw checksum errors on both sides of several of the vdevs. After much digging and poking, the only theory I could come up with was that maybe the fencing operation was considered complete too quickly? I googled for examples using this, and the best tutorial I found showed using a power-wait=5, whereas the default seems to be power-wait=0? (this is CentOS 7, btw...) I changed it to use 5 instead of 0. That's a reasonable theory -- that's why power_wait is available. It would be nice if there were a page collecting users' experience with the ideal power_wait for various devices. Even better if fence-agents used those values as the defaults. Ken, thanks. FWIW, this is a Dell Poweredge R905. I have no idea how long the power supplies in that thing can keep things going when A/C goes away. Always wary of small sample sizes, but I got filesystem corruption after 1 fencing event with power_wait=0, and none after 3 fencing events with power_wait=5. ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] fence_apc delay?
It occurred to me folks reading this might not have any knowledge about ZFS. Think of my setup as an mdraid pool with a filesystem mounted on it, shared out via NFS. Same basic idea... ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
[ClusterLabs] fence_apc delay?
So, I was testing my ZFS dual-head JBOD 2-node cluster. Manual failovers worked just fine. I then went to try an acid-test by logging in to node A and doing 'systemctl stop network'. Sure enough, pacemaker told the APC fencing agent to power-cycle node A. The ZFS pool moved to node B as expected. As soon as node A was back up, I migrated the pool/IP back to node A. I *thought* all was okay, until a bit later, I did 'zpool status', and saw checksum errors on both sides of several of the vdevs. After much digging and poking, the only theory I could come up with was that maybe the fencing operation was considered complete too quickly? I googled for examples using this, and the best tutorial I found showed using a power-wait=5, whereas the default seems to be power-wait=0? (this is CentOS 7, btw...) I changed it to use 5 instead of 0, and did several fencing operations while a guest VM (vsphere via NFS) was writing to the pool. So far, no evidence of corruption. BTW, the way I was creating and managing the cluster was with the LCMC Java GUI. Possibly the power-wait default of 0 comes from there, I can't really tell. Any thoughts or ideas appreciated :) ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
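For readers wanting to try the same mitigation: power_wait can be set directly on the stonith resource. A hedged sketch only; the resource name `fence-apc`, the address, and the credentials below are placeholders, not values from this thread:

```shell
# Sketch only -- device name, address and credentials are invented.
# power_wait=5 makes the agent pause 5 seconds after each power ON/OFF
# command before issuing the next one, so the fenced host fully drains.
pcs stonith create fence-apc fence_apc \
    ipaddr=192.0.2.10 login=apc passwd=secret \
    pcmk_host_list="nodeA nodeB" \
    power_wait=5
```

On an existing device, `pcs stonith update <name> power_wait=5` should achieve the same thing.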
Re: [ClusterLabs] Solved: converting configuration
On 2016-08-25 10:24, Gabriele Bulfon wrote: YESSS!!! That was it! :))) Upgraded to 1.1.15, rebuilt and the rng files contain a lot more stuff. Packaged, published, installed on the test machine: got all my instructions as is!!! :))) ...now the last steps: making our custom agents/shells work on this new setup ;) For example, what does the yellow "stopped" state mean? Here are the last rows of crm output after my config instructions: Full list of resources: xstorage1-stonith (stonith:external/ssh-sonicle): Stopped xstorage2-stonith (stonith:external/ssh-sonicle): Stopped I believe you will see this if the cluster is in maintenance mode or stonith is disabled. Possibly other reasons, but these I have seen... ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] ClusterLabs] Failing over NFSv4/TCP exports
Thanks for the info. I only use esxi, which likely explains why I never had issues... Patrick Zwahlen wrote: >Hi, > >> -Original Message- >> From: Andreas Kurz [mailto:andreas.k...@gmail.com] >> Sent: Wednesday, 17 August 2016 23:16 >> To: Cluster Labs - All topics related to open-source clustering welcomed >> >> Subject: Re: [ClusterLabs] Failing over NFSv4/TCP exports >> >> This is a known problem ... have a look into the portblock RA - it has >> the feature to send out TCP tickle ACKs to reset such hanging sessions. >> So you can configure a portblock resource that blocks the tcp port >> before starting the VIP and another portblock resource that unblocks the >> port afterwards and sends out those tickle ACKs. > >Thanks Andreas for pointing me to the portblock RA. I wasn't aware of it and >will read/test. > >I also did some further testing using ESXi and I found out that the ESXi NFS >client behaves in a completely different way when compared to the Linux client >and at first sight it actually seems to work (where the Linux client fails). > >It's mainly due to 2 things: > >1) Their NFS client is much more aggressive in terms of monitoring the server >and restarting sessions. > >2) Every new TCP session comes from a different source port compared to the >Linux client which seems to stick to a single source port. This actually >solves the issue of failing back to a node with FIN_WAIT1 sessions. > >Regards, Patrick ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
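To make Andreas's block/unblock pattern concrete, here is an illustrative sketch only: the resource names, the VIP address, and the tickle directory are invented, while the parameter names follow the ocf:heartbeat:portblock agent:

```shell
# Block NFS traffic before the VIP moves; unblock and send tickle ACKs after.
pcs resource create nfs-block ocf:heartbeat:portblock \
    protocol=tcp portno=2049 action=block ip=192.0.2.100
pcs resource create nfs-unblock ocf:heartbeat:portblock \
    protocol=tcp portno=2049 action=unblock ip=192.0.2.100 \
    tickle_dir=/var/run/tickle
# Order: block -> VIP -> unblock (a VIP resource 'nfs-vip' is assumed)
pcs constraint order nfs-block then nfs-vip
pcs constraint order nfs-vip then nfs-unblock
```

The unblock resource's tickle ACKs are what reset clients stuck in FIN_WAIT1 on failback.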
Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster
On 2016-08-06 21:59, Digimer wrote: On 06/08/16 08:22 PM, Dan Swartzendruber wrote: On 2016-08-06 19:46, Digimer wrote: On 06/08/16 07:33 PM, Dan Swartzendruber wrote: (snip) What about using ipmitool directly? I can't imagine that such a long time is normal. Maybe there is a firmware update for the DRAC and/or BIOS? (I know with Fujitsu, they recommend updating the IPMI BMC and BIOS together). Over a minute to fence is, strictly speaking, OK. However, that's a significant delay in time to recover. Okay, I tested with 20 second timeout and 5 retries, using fence_drac5 at the command line. Ran 'date' on both sides to see how long it took. Just under a minute. It's too late now to mess around any more for tonight. I do need to verify that that works okay for vsphere. I will post back my results. Thanks! ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster
On 2016-08-06 21:59, Digimer wrote: On 06/08/16 08:22 PM, Dan Swartzendruber wrote: On 2016-08-06 19:46, Digimer wrote: On 06/08/16 07:33 PM, Dan Swartzendruber wrote: (snip) What about using ipmitool directly? I can't imagine that such a long time is normal. Maybe there is a firmware update for the DRAC and/or BIOS? (I know with Fujitsu, they recommend updating the IPMI BMC and BIOS together). Unfortunately, the R905 is EoL, so any updates are not likely. Over a minute to fence is, strictly speaking, OK. However, that's a significant delay in time to recover. The thing that concerns me, though, is the delay in I/O for vsphere clients. I know 2 or more retries of 60 seconds caused issues. I'm going to try again with 5 20-second retries, and see how that works. If this doesn't cooperate, I may need to look into a PDU or something... ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster
On 2016-08-06 19:46, Digimer wrote: On 06/08/16 07:33 PM, Dan Swartzendruber wrote: Okay, I almost have this all working. fence_ipmilan for the supermicro host. Had to specify lanplus for it to work. fence_drac5 for the R905. That was failing to complete due to timeout. Found a couple of helpful posts that recommended increasing the retry count to 3 and the timeout to 60. That worked also. The only problem now is that it takes well over a minute to complete the fencing operation. In that interim, the fenced host shows as UNCLEAN (offline), and because the fencing operation hasn't completed, the other node has to wait to import the pool and share out the filesystem. This causes the vsphere hosts to declare the NFS datastore down. I hadn't gotten exact timing, but I think the fencing operation took a little over a minute. I'm wondering if I could change the timeout to a smaller value, but increase the retries? Like back to the default 20 second timeout, but change retries from 1 to 5? Did you try the fence_ipmilan against the DRAC? It *should* work. Would be interesting to see if it had the same issue. Can you check the DRAC's host's power state using ipmitool directly without delay? Yes, I did try fence_ipmilan, but it got the timeout waiting for power off (or whatever). I have to admit, I switched to fence_drac and had the same issue, but after increasing the timeout and retries, got it to work, so it is possible that fence_ipmilan is okay. They both seemed to take more than 60 seconds to complete the operation. I have to say that when I do a power cycle through the drac web interface, it takes a while, so that might be normal. I think I will try again with 20 seconds and 5 retries and see how that goes... ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster
Okay, I almost have this all working. fence_ipmilan for the supermicro host. Had to specify lanplus for it to work. fence_drac5 for the R905. That was failing to complete due to timeout. Found a couple of helpful posts that recommended increasing the retry count to 3 and the timeout to 60. That worked also. The only problem now is that it takes well over a minute to complete the fencing operation. In that interim, the fenced host shows as UNCLEAN (offline), and because the fencing operation hasn't completed, the other node has to wait to import the pool and share out the filesystem. This causes the vsphere hosts to declare the NFS datastore down. I hadn't gotten exact timing, but I think the fencing operation took a little over a minute. I'm wondering if I could change the timeout to a smaller value, but increase the retries? Like back to the default 20 second timeout, but change retries from 1 to 5? ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
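One possible way to express the "shorter timeout, more retries" idea; this is a sketch with an invented resource name, and the parameter names (power_timeout, retry_on) are the generic fence-agents options, so check `fence_drac5 -h` on your system before relying on them:

```shell
# Hypothetical: 20s per power action, retried up to 5 times.
pcs stonith update fence-r905 power_timeout=20 retry_on=5
```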
Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster
A lot of good suggestions here. Unfortunately, my budget is tapped out for the near future at least (this is a home lab/soho setup). I'm inclined to go with Digimer's two-node approach, with IPMI fencing. I understand mobos can die and such. In such a long-shot, manual intervention is fine. So, when I get a chance, I need to remove the quorum node from the cluster and switch it to two_node mode. Thanks for the info! ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
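For reference, the two_node switch lives in the quorum section of corosync.conf (corosync 2.x votequorum); a minimal sketch:

```
quorum {
    provider: corosync_votequorum
    two_node: 1
    # two_node implies wait_for_all; shown here for clarity
    wait_for_all: 1
}
```

With two_node set, each node keeps quorum when its peer dies, which is why fencing (plus a delay on the preferred node) becomes the safety mechanism.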
Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster
On 2016-08-04 19:33, Digimer wrote: On 04/08/16 07:21 PM, Dan Swartzendruber wrote: On 2016-08-04 19:03, Digimer wrote: On 04/08/16 06:56 PM, Dan Swartzendruber wrote: I'm setting up an HA NFS server to serve up storage to a couple of vsphere hosts. I have a virtual IP, and it depends on a ZFS resource agent which imports or exports a pool. So far, with stonith disabled, it all works perfectly. I was dubious about a 2-node solution, so I created a 3rd node which runs as a virtual machine on one of the hosts. All it is for is quorum. So, looking at fencing next. The primary server is a poweredge R905, which has DRAC for fencing. The backup storage node is a Supermicro X9-SCL-F (with IPMI). So I would be using the DRAC agent for the former and the ipmilan for the latter? I was reading about location constraints, where you tell each instance of the fencing agent not to run on the node that would be getting fenced. So, my first thought was to configure the drac agent and tell it not to fence node 1, and configure the ipmilan agent and tell it not to fence node 2. The thing is, there is no agent available for the quorum node. Would it make more sense instead to tell the drac agent to only run on node 2, and the ipmilan agent to only run on node 1? Thanks! This is a common mistake. Fencing and quorum solve different problems and are not interchangeable. In short; Fencing is a tool when things go wrong. Quorum is a tool when things are working. The only impact that having quorum has with regard to fencing is that it avoids a scenario where both nodes try to fence each other and the faster one wins (which is itself OK). Even then, you can add 'delay=15' to the node you want to win and it will win in such a case. In the old days, it would also prevent a fence loop if you started the cluster on boot and comms were down. Now though, you set 'wait_for_all' and you won't get a fence loop, so that solves that.
Said another way; Quorum is optional, fencing is not (people often get that backwards). As for DRAC vs IPMI, no, they are not two things. In fact, I am pretty certain that fence_drac is a symlink to fence_ipmilan. All DRAC (same with iRMC, iLO, RSA, etc.) is "IPMI + features". Fundamentally, the fence action (rebooting the node) works via the basic IPMI standard using the DRAC's BMC. To do proper redundant fencing, which is a great idea, you want something like switched PDUs. This is how we do it (with two node clusters). IPMI first, and if that fails, a pair of PDUs (one for each PSU, each PDU going to independent UPSes) as backup. Thanks for the quick response. I didn't mean to give the impression that I didn't know the difference between quorum and fencing. The only reason I (currently) have the quorum node was to prevent a deathmatch (which I had read about elsewhere). If it is as simple as adding a delay as you describe, I'm inclined to go that route. At least on CentOS7, fence_ipmilan and fence_drac are not the same; i.e., they are both Python scripts that are totally different. The delay is perfectly fine. We've shipped dozens of two-node systems over the last five or so years and all were 2-node and none have had trouble. Where node failures have occurred, fencing operated properly and services were recovered. So in my opinion, in the interest of minimizing complexity, I recommend the two-node approach. As for the two agents not being symlinked, OK. It still doesn't change the core point, though, that both fence_ipmilan and fence_drac would be acting on the same target. Note; If you lose power to the mainboard (which we've seen, a failed mainboard voltage regulator did this once), you lose the IPMI (DRAC) BMC. This scenario will leave your cluster blocked without an external secondary fence method, like switched PDUs. cheers Thanks!
___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster
On 2016-08-04 19:03, Digimer wrote: On 04/08/16 06:56 PM, Dan Swartzendruber wrote: I'm setting up an HA NFS server to serve up storage to a couple of vsphere hosts. I have a virtual IP, and it depends on a ZFS resource agent which imports or exports a pool. So far, with stonith disabled, it all works perfectly. I was dubious about a 2-node solution, so I created a 3rd node which runs as a virtual machine on one of the hosts. All it is for is quorum. So, looking at fencing next. The primary server is a poweredge R905, which has DRAC for fencing. The backup storage node is a Supermicro X9-SCL-F (with IPMI). So I would be using the DRAC agent for the former and the ipmilan for the latter? I was reading about location constraints, where you tell each instance of the fencing agent not to run on the node that would be getting fenced. So, my first thought was to configure the drac agent and tell it not to fence node 1, and configure the ipmilan agent and tell it not to fence node 2. The thing is, there is no agent available for the quorum node. Would it make more sense instead to tell the drac agent to only run on node 2, and the ipmilan agent to only run on node 1? Thanks! This is a common mistake. Fencing and quorum solve different problems and are not interchangeable. In short; Fencing is a tool when things go wrong. Quorum is a tool when things are working. The only impact that having quorum has with regard to fencing is that it avoids a scenario where both nodes try to fence each other and the faster one wins (which is itself OK). Even then, you can add 'delay=15' to the node you want to win and it will win in such a case. In the old days, it would also prevent a fence loop if you started the cluster on boot and comms were down. Now though, you set 'wait_for_all' and you won't get a fence loop, so that solves that. Said another way; Quorum is optional, fencing is not (people often get that backwards). As for DRAC vs IPMI, no, they are not two things.
In fact, I am pretty certain that fence_drac is a symlink to fence_ipmilan. All DRAC (same with iRMC, iLO, RSA, etc.) is "IPMI + features". Fundamentally, the fence action (rebooting the node) works via the basic IPMI standard using the DRAC's BMC. To do proper redundant fencing, which is a great idea, you want something like switched PDUs. This is how we do it (with two node clusters). IPMI first, and if that fails, a pair of PDUs (one for each PSU, each PDU going to independent UPSes) as backup. Thanks for the quick response. I didn't mean to give the impression that I didn't know the difference between quorum and fencing. The only reason I (currently) have the quorum node was to prevent a deathmatch (which I had read about elsewhere). If it is as simple as adding a delay as you describe, I'm inclined to go that route. At least on CentOS7, fence_ipmilan and fence_drac are not the same; i.e., they are both Python scripts that are totally different. ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
[ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster
I'm setting up an HA NFS server to serve up storage to a couple of vsphere hosts. I have a virtual IP, and it depends on a ZFS resource agent which imports or exports a pool. So far, with stonith disabled, it all works perfectly. I was dubious about a 2-node solution, so I created a 3rd node which runs as a virtual machine on one of the hosts. All it is for is quorum. So, looking at fencing next. The primary server is a poweredge R905, which has DRAC for fencing. The backup storage node is a Supermicro X9-SCL-F (with IPMI). So I would be using the DRAC agent for the former and the ipmilan for the latter? I was reading about location constraints, where you tell each instance of the fencing agent not to run on the node that would be getting fenced. So, my first thought was to configure the drac agent and tell it not to fence node 1, and configure the ipmilan agent and tell it not to fence node 2. The thing is, there is no agent available for the quorum node. Would it make more sense instead to tell the drac agent to only run on node 2, and the ipmilan agent to only run on node 1? Thanks! ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
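The location-constraint idea from the question can be written either way with pcs; a hedged sketch in which the stonith resource names and node names are placeholders:

```shell
# Keep each fence device off the node it fences...
pcs constraint location fence-node1 avoids node1
pcs constraint location fence-node2 avoids node2
# ...or, equivalently in a two-node cluster, pin each agent to the peer:
# pcs constraint location fence-node1 prefers node2=INFINITY
```

Note that recent Pacemaker versions can execute a fence action from any node that can reach the fence device, so these constraints are about preference rather than correctness.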
[ClusterLabs] Cannot use upstart resources
Hi If I do pcs resource standards I get: ocf lsb service upstart stonith But if I do pcs resource create upstart:yyy I get: Error: Unable to create resource 'upstart:yyy', it is not installed on this system (use --force to override) What can be the problem? If I do strace on the above I see that it looks at: stat("/usr/lib/ocf/resource.d/heartbeat/upstart:yyy", 0x7ffc8694f6b0) = -1 ENOENT (No such file or directory) stat("/usr/lib/ocf/resource.d/linbit/upstart:yyy", 0x7ffc8694f6b0) = -1 ENOENT (No such file or directory) stat("/usr/lib/ocf/resource.d/pacemaker/upstart:yyy", 0x7ffc8694f6b0) = -1 ENOENT (No such file or directory) stat("/usr/lib/ocf/resource.d/redhat/upstart:yyy", 0x7ffc8694f6b0) = -1 ENOENT (No such file or directory) Is it my version of pcs that is doing something wrong? I am using pcs 0.9.143 Dan ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
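Two things worth checking here (both hedged guesses, not confirmed in the thread): pcs can list the upstart agents it actually sees, and `pcs resource create` normally takes a resource ID before the agent specifier:

```shell
# List the upstart jobs pcs can use as agents:
pcs resource agents upstart
# The create syntax expects a resource ID first, e.g. (names invented):
pcs resource create myservice upstart:yyy
```

The strace output above shows pcs falling back to an OCF agent lookup, consistent with the standard prefix not being parsed as expected.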
Re: [ClusterLabs] Antw: Need bash instead of /bin/sh
Wed 2015-09-23 at 16:24 +0300, Vladislav Bogdanov wrote: > 23.09.2015 15:42, dan wrote: > > Wed 2015-09-23 at 14:08 +0200, Ulrich Windl wrote: > >>>>> dan wrote on 23.09.2015 at 13:39 in > >>>>> message > >> <1443008370.2386.8.ca...@intraphone.com>: > >>> Hi > >>> > >>> As I had problems with corosync 2.3.3 and pacemaker 1.1.10 which was > >>> default in my version of ubuntu, I have now compiled and installed > >>> corosync 2.3.4 and pacemaker 1.1.12. > >>> > >>> And now it works. > >>> > >>> Though the file /usr/lib/ocf/resource.d/pacemaker/controld > >>> does not work as /bin/sh is linked to dash on ubuntu (and I think > >>> several other Linux variants). > >>> > >>> It is line 182: > >>> local addr_list=$(cat > >>> /sys/kernel/config/dlm/cluster/comms/*/addr_list 2>/dev/null) > >> > >> That looks like plain POSIX shell to me. What part is causing the problem? > > > > Did a small test: > > ---test.sh > > controld_start() { > > local addr_list=$(echo AF_INET 10.1.1.1 AF_INET 10.1.1.2) > yep, that is a bashism. > > POSIX shell denies assignment of local variables in the declaration. > > local addr_list; addr_list=$(echo AF_INET 10.1.1.1 AF_INET 10.1.1.2) > > should work I tested that too. And it also works.
> > > echo $addr_list > > } > > > > controld_start > > -- > > > > dash test.sh > > test.sh: 2: local: 10.1.1.1: bad variable name > > > > bash test.sh > > AF_INET 10.1.1.1 AF_INET 10.1.1.2 > > > > > > Dan > > > > > > ___ > > Users mailing list: Users@clusterlabs.org > > http://clusterlabs.org/mailman/listinfo/users > > > > Project Home: http://www.clusterlabs.org > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > > Bugs: http://bugs.clusterlabs.org > > > > > ___ > Users mailing list: Users@clusterlabs.org > http://clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Antw: Re: Antw: Need bash instead of /bin/sh
Wed 2015-09-23 at 15:20 +0200, Ulrich Windl wrote: > >>> dan wrote on 23.09.2015 at 14:42 in > >>> message > <1443012134.2386.11.ca...@intraphone.com>: > > Wed 2015-09-23 at 14:08 +0200, Ulrich Windl wrote: > >> >>> dan wrote on 23.09.2015 at 13:39 in > > message > >> <1443008370.2386.8.ca...@intraphone.com>: > >> > Hi > >> > > >> > As I had problems with corosync 2.3.3 and pacemaker 1.1.10 which was > >> > default in my version of ubuntu, I have now compiled and installed > >> > corosync 2.3.4 and pacemaker 1.1.12. > >> > > >> > And now it works. > >> > > >> > Though the file /usr/lib/ocf/resource.d/pacemaker/controld > >> > does not work as /bin/sh is linked to dash on ubuntu (and I think > >> > several other Linux variants). > >> > > >> > It is line 182: > >> > local addr_list=$(cat > >> > /sys/kernel/config/dlm/cluster/comms/*/addr_list 2>/dev/null) > >> > >> That looks like plain POSIX shell to me. What part is causing the problem? > > > > Did a small test: > > ---test.sh > > controld_start() { > > local addr_list=$(echo AF_INET 10.1.1.1 AF_INET 10.1.1.2) > > I see: Dash needs quoting around "$(...)" it seems. I tested that and now my test script works fine.
> > > echo $addr_list > > } > > > > controld_start > > -- > > > > dash test.sh > > test.sh: 2: local: 10.1.1.1: bad variable name > > > > bash test.sh > > AF_INET 10.1.1.1 AF_INET 10.1.1.2 > > > > > > Dan > > > > > > ___ > > Users mailing list: Users@clusterlabs.org > > http://clusterlabs.org/mailman/listinfo/users > > > > Project Home: http://www.clusterlabs.org > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > > Bugs: http://bugs.clusterlabs.org > > > > > > ___ > Users mailing list: Users@clusterlabs.org > http://clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Antw: Need bash instead of /bin/sh
Wed 2015-09-23 at 14:08 +0200, Ulrich Windl wrote: > >>> dan wrote on 23.09.2015 at 13:39 in > >>> message > <1443008370.2386.8.ca...@intraphone.com>: > > Hi > > > > As I had problems with corosync 2.3.3 and pacemaker 1.1.10 which was > > default in my version of ubuntu, I have now compiled and installed > > corosync 2.3.4 and pacemaker 1.1.12. > > > > And now it works. > > > > Though the file /usr/lib/ocf/resource.d/pacemaker/controld > > does not work as /bin/sh is linked to dash on ubuntu (and I think > > several other Linux variants). > > > > It is line 182: > > local addr_list=$(cat > > /sys/kernel/config/dlm/cluster/comms/*/addr_list 2>/dev/null) > > That looks like plain POSIX shell to me. What part is causing the problem? Did a small test: ---test.sh controld_start() { local addr_list=$(echo AF_INET 10.1.1.1 AF_INET 10.1.1.2) echo $addr_list } controld_start ------ dash test.sh test.sh: 2: local: 10.1.1.1: bad variable name bash test.sh AF_INET 10.1.1.1 AF_INET 10.1.1.2 Dan ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
[ClusterLabs] Need bash instead of /bin/sh
Hi As I had problems with corosync 2.3.3 and pacemaker 1.1.10 which was default in my version of ubuntu, I have now compiled and installed corosync 2.3.4 and pacemaker 1.1.12. And now it works. Though the file /usr/lib/ocf/resource.d/pacemaker/controld does not work as /bin/sh is linked to dash on ubuntu (and I think several other Linux variants). It is line 182: local addr_list=$(cat /sys/kernel/config/dlm/cluster/comms/*/addr_list 2>/dev/null) that does not work in ubuntu's version of dash. I fixed it by changing it so that /bin/bash is used instead of /bin/sh (on line 1 of the script). The current version in git looks like it has the same problem. Maybe you should switch to /bin/bash for scripts that need it, as not everybody has /bin/sh linked to /bin/bash. Dan ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Parser error with fence_ipmilan
Fri 2015-09-18 at 09:58 +0200, Marek marx Grác wrote: > Hi, > > > On 15 Sep 2015 at 02:41:35, Andrew Beekhof (and...@beekhof.net) wrote: > > > > >> But now when the cluster wants to stonith a node I get: > > >> > > >> fence_ipmilan: Parser error: option -n/--plug is not recognize > > >> fence_ipmilan: Please use '-h' for usage > > >> > > >> Is the problem in fence-agents or in pacemaker? > > > > > > Looking at the code producing this, I got it working by adding to > > my > > > cluster config for my stonith devices > > > port_as_ip=1 port=192.168.xx.xx > > > > > > before I had: > > > lanplus=1 ipaddr=192.168.xx.xx > > > which worked fine before the new version of pacemaker. > > > Now I have: > > > lanplus=1 ipaddr=192.168.xx.xx port_as_ip=1 port=192.168.xx.xx > > > which works. > > > > I'm glad it works, looks like a regression to me though. > > You shouldn't need to override the value pacemaker supplies for port > > if ipaddr is being set. > > > > Can you comment on this Marek? > This is surely a problem in fence agents. I believe that it was fixed > in August in > > https://github.com/ClusterLabs/fence-agents/commit/155a51f01e6a806e17d70519f2d1507b09d9d137 I am not that good at Python, but doesn't that fix say that if I use port_as_ip, ipaddr and port are not required? I tried adding that code into the installed code (/usr/share/fence/fencing.py) and changed my configuration back to the old one. It still fails in the same way. Is there a way I can get the system to output what arguments are used when calling fence_ipmilan?
Now my config that fails is: Resource: ipmi-fencing-host1 (class=stonith type=fence_ipmilan) Attributes: pcmk_host_list=host1 lanplus=1 ipaddr=192.168.1.10 login=x passwd= power_wait=4 Operations: monitor interval=60s (ipmi-fencing-host1-monitor-interval-60s) What works is: the same as above but with added: port_as_ip=1 port=192.168.1.10 Dan ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
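For anyone hitting the same parser error, the workaround from this thread expressed as a single pcs command (the resource name and IP are the poster's values from the config above):

```shell
# Adds the port_as_ip workaround so fence_ipmilan receives a valid
# -n/--plug argument (the IP doubles as the "port"):
pcs stonith update ipmi-fencing-host1 port_as_ip=1 port=192.168.1.10
```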
Re: [ClusterLabs] Parser error with fence_ipmilan
Mon 2015-09-14 at 10:02 +0200, dan wrote:
> Hi
>
> To see if my cluster problems go away with a newer version of pacemaker,
> I have now installed pacemaker 1.1.12+git+a9c8177-3ubuntu1, and I had to
> get 4.0.19-1 (ubuntu) of fence-agents to get a working fence_ipmilan.
>
> But now when the cluster wants to stonith a node I get:
>
>   fence_ipmilan: Parser error: option -n/--plug is not recognize
>   fence_ipmilan: Please use '-h' for usage
>
> Is the problem in fence-agents or in pacemaker?

Looking at the code producing this, I got it working by adding this to my cluster config for my stonith devices:

  port_as_ip=1 port=192.168.xx.xx

Before, I had:

  lanplus=1 ipaddr=192.168.xx.xx

which worked fine before the new version of pacemaker. Now I have:

  lanplus=1 ipaddr=192.168.xx.xx port_as_ip=1 port=192.168.xx.xx

which works.

Dan
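If the stonith devices were created with pcs, the same workaround can be applied in place with an update. This is a sketch only; the resource name and address are taken from the failing configuration posted earlier in this thread, so substitute your own:

```shell
# Add the port_as_ip workaround to an existing stonith resource
# (resource name and IP are from the thread, not a general default):
pcs stonith update ipmi-fencing-host1 port_as_ip=1 port=192.168.1.10

# Inspect the device to confirm the new attributes took effect:
pcs stonith show ipmi-fencing-host1
```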
[ClusterLabs] Parser error with fence_ipmilan
Hi

To see if my cluster problems go away with a newer version of pacemaker, I have now installed pacemaker 1.1.12+git+a9c8177-3ubuntu1, and I had to get 4.0.19-1 (ubuntu) of fence-agents to get a working fence_ipmilan.

But now when the cluster wants to stonith a node I get:

  fence_ipmilan: Parser error: option -n/--plug is not recognize
  fence_ipmilan: Please use '-h' for usage

Is the problem in fence-agents or in pacemaker?

Dan
[ClusterLabs] hanging after node shutdown
Hi

I have now for a few weeks been trying to get a cluster using pacemaker to work. We are using Ubuntu 14.04.2 LTS with corosync 2.3.3-1ubuntu1 and pacemaker 1.1.10+git2013. It is a 2-node cluster, and it includes a gfs2 file system on top of drbd.

After some initial problems with stonith not working due to dlm_stonith missing (which I fixed by compiling it myself), it looked good. I have set up the cluster to power off the other node through stonith instead of reboot (the default). I tested failures by doing init 0, halt -f, and pkill -9 corosync on one node, and it worked fine.

But then I noticed that after the cluster had been up (both nodes) for 2 days, doing init 0 on one node resulted in that node hanging during shutdown and the other node failing to stonith it. After forcing the hanging node to power off and then powering it on, pcs status on it reports not being able to talk to the other node, and all resources are stopped. On the other node (which has been running the whole time), pcs status hangs (crm status works and says that all is up) and the gfs2 file system is blocking. Doing init 0 on this node never shuts it down; a reboot -f does work, and after it is up again the entire cluster is OK.

So in short: everything works fine after a fresh boot of both nodes, but after 2 days a requested shutdown of one node (using init 0) hangs, and the other node stops working correctly.

Looking at the console on the node I did init 0 on, dlm_controld reports that the cluster is down, then drbd reports problems talking to the other node, and then gfs2 is blocked. So that is why that node never powers off: gfs2 and drbd were not shut down correctly by pacemaker before it stopped (or while it was trying to stop).
Looking through the logs (syslog and corosync.log; I have debug mode on in corosync), I can see that on node 1 (the one I left running the whole time) it does:

  stonith-ng: info: crm_update_peer_proc: pcmk_cpg_membership: Node node2[2] - corosync-cpg is now offline
  crmd: info: crm_update_peer_proc: pcmk_cpg_membership: Node node2[2] - corosync-cpg is now offline
  crmd: info: peer_update_callback: Client node2/peer now has status [offline] (DC=node2)
  crmd: notice: peer_update_callback: Our peer on the DC is dead
  stonith-ng: notice: handle_request: Client stonith-api.10797.41ef3128 wants to fence (off) '2' with device '(any)'
  stonith-ng: notice: initiate_remote_stonith_op: Initiating remote operation off for node2: 20f62cf6-90eb-4c53-8da1-30ab048de495 (0)
  stonith-ng: info: stonith_command: Processed st_fence from stonith-api.10797: Operation now in progress (-115)
  corosync debug [TOTEM ] Resetting old ring state
  corosync debug [TOTEM ] recovery to regular 1-0
  corosync debug [MAIN  ] Member left: r(0) ip(10.10.1.2) r(1) ip(192.168.12.142)
  corosync debug [TOTEM ] waiting_trans_ack changed to 1
  corosync debug [TOTEM ] entering OPERATIONAL state.
  corosync notice [TOTEM ] A new membership (10.10.1.1:588) was formed. Members left: 2
  corosync debug [SYNC  ] Committing synchronization for corosync configuration map access
  corosync debug [QB    ] Not first sync -> no action
  corosync debug [CPG   ] comparing: sender r(0) ip(10.10.1.1) r(1) ip(192.168.12.140) ; members(old:2 left:1)
  corosync debug [CPG   ] chosen downlist: sender r(0) ip(10.10.1.1) r(1) ip(192.168.12.140) ; members(old:2 left:1)
  corosync debug [CPG   ] got joinlist message from node 1
  corosync debug [SYNC  ] Committing synchronization for corosync cluster closed process group service v1.01

A little later, most log entries are:

  cib: info: crm_cs_flush: Sent 0 CPG messages (3 remaining, last=25): Try again (6)

The "Sent 0 CPG messages" entry is logged forever, until I force a reboot of this node.
On node 2 (the one I did init 0 on) I can find:

  stonith-ng[1415]: notice: log_operation: Operation 'monitor' [17088] for device 'ipmi-fencing-node1' returned: -201 (Generic Pacemaker error)

plus several lines from crmd, attrd, and pengine about ipmi-fencing. It is hard to know which log entries are important.

As a summary: after power-on, my 2-node cluster works fine; reboots and other node-failure tests all work. But after letting the cluster run for 2 days, when I do a node-failure test, parts of the cluster services fail to stop on the node where the failure is simulated, and both nodes stop working correctly (even though only one node was shut down).

The versions of corosync and pacemaker are somewhat old; they are the official versions available for our Ubuntu release. Is this a known problem? I have seen that there are newer versions available; pacemaker has had many changes, as I can see on GitHub. If this is a known problem, which versions of corosync and pacemaker should I change to? Or do you have some other idea what I can test or try to pin this down?

Dan
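The console messages above suggest the membership layer went down before pacemaker had stopped gfs2 and drbd. Not a fix for the root cause, but a hedged sketch of a shutdown sequence that drains the node before the stack goes down, rather than a bare init 0 (node name, pcs/crmsh commands, and sysvinit service names are assumptions; adjust for your setup):

```shell
# Put the node in standby so pacemaker stops gfs2/drbd while corosync
# membership is still up:
pcs cluster standby node2        # or: crm node standby node2
crm_mon -1                       # repeat until the node shows no running resources

# Then stop the cluster stack in order before powering off:
service pacemaker stop
service corosync stop
init 0
```

If a clean shutdown works with this sequence but not with init 0, the problem is likely the shutdown ordering of the init scripts rather than the cluster configuration itself.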