[ClusterLabs] Using fence_scsi agent and watchdog
Hello all,

I've set up a two-node PCS lab to test the fence_scsi agent and how it works. The lab comprises the following VMs, all CentOS 7.3 under VMware Workstation:

  pcs1  - 192.168.199.101
  pcs2  - 192.168.199.102
  iscsi - 192.168.199.200 (iSCSI server)

The iSCSI server provides three block volumes to both PCS nodes:

  /dev/sdb  200 MB  fence volume with working SCSI-3 persistent reservation
  /dev/sdc  1 GB    data volume (XFS)
  /dev/sdd  2 GB    data volume (XFS)

The fencing agent is configured like this:

  pcs stonith create FenceSCSI fence_scsi pcmk_host_list="pcs1 pcs2" devices=/dev/sdb meta provides=unfencing

Then I created two resource groups, each with an LVM volume mounted under /cluster/fs1 and /cluster/fs2. PCS is managing the resources as expected.

Coming to fence_scsi, it seems that the only way to be sure the fenced node gets rebooted is to install the watchdog rpm and link the /usr/share/cluster/fence_scsi_check script into the /etc/watchdog.d directory. But I've noticed a significant lag between the actual reboot of the node and the resource takeover on the surviving node, which could lead to a dangerous situation. For example:

1. stonith_admin -F pcs1
2. PCS stops on pcs1 and the resources are switched to node pcs2 within a few moments.
3. Some time later, watchdog triggers the reboot of the pcs1 node.

I have the following questions:

A. Is this the only possible configuration that ensures the fenced node is rebooted when using the fence_scsi agent? If yes, I think the documentation should be updated accordingly, because it is not very clear.

B. Is there a way to make the surviving node wait until the fenced node has actually rebooted before taking over its resources?

Thanks in advance for any answers.
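For reference, the setup described above can be sketched as the following commands (paths and package names as commonly shipped on CentOS 7 by the fence-agents-scsi and watchdog packages; verify them on your system):

```shell
# 1. Create the fence_scsi stonith resource. "meta provides=unfencing"
#    tells Pacemaker to re-register a node's SCSI reservation key
#    (unfence it) when the node rejoins the cluster.
pcs stonith create FenceSCSI fence_scsi \
    pcmk_host_list="pcs1 pcs2" \
    devices=/dev/sdb \
    meta provides=unfencing

# 2. Install the watchdog daemon and hook in the fence_scsi check
#    script, so a node whose reservation key has been revoked
#    reboots itself.
yum install -y watchdog
ln -s /usr/share/cluster/fence_scsi_check /etc/watchdog.d/fence_scsi_check
systemctl enable --now watchdog
```

Note that fence_scsi by itself only revokes the node's access to the shared device; the watchdog hook is what turns that I/O fencing into an actual reboot, which is why there is a delay between the two.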
Best regards,
Luca

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Antw: Retries before setting fail-count to INFINITY
On Mon, 2017-08-21 at 15:39 +0200, Ulrich Windl wrote:
> >>> Vaibhaw Pandey wrote on 21.08.2017 at 14:58 in message >
Re: [ClusterLabs] SLES11 SP4: Strange problem with "(crm configure) commit"
Ulrich Windl writes:
> Hi!
>
> I just had a strange problem: When trying to "clean up" the cib configuration
> (actually deleting unneeded "operations" lines), I failed to commit the change,
> even though it verified OK:
>
> crm(live)configure# commit
> Call cib_apply_diff failed (-206): Application of an update diff failed
> ERROR: could not patch cib (rc=206)
> INFO: offending xml diff:

It looks to me (from a cursory glance) like you may be hitting a bug with the patch generation in pacemaker, but there isn't enough detail to say for sure. Try running crmsh with the "-dR" command line options to get it to output the patch it tries to apply to the log.

Cheers,
Kristoffer

> In Syslog I see this:
> Aug 21 15:01:48 h02 cib[19397]: error: xml_apply_patchset_v2: Moved
> meta_attributes.14926208 to position 1 instead of 2 (0xe3f0f0)
> Aug 21 15:01:48 h02 cib[19397]: error: xml_apply_patchset_v2: Moved
> meta_attributes.9876096 to position 1 instead of 2 (0xe3c470)
> Aug 21 15:01:48 h02 cib[19397]: error: xml_apply_patchset_v2: Moved
> utilization.10594784 to position 1 instead of 2 (0x96a2b0)
> Aug 21 15:01:48 h02 cib[19397]: error: xml_apply_patchset_v2: Moved
> meta_attributes.11397008 to position 1 instead of 2 (0xacc5b0)
> Aug 21 15:01:48 h02 cib[19397]: warning: cib_server_process_diff: Something
> went wrong in compatibility mode, requesting full refresh
> Aug 21 15:01:48 h02 cib[19397]: warning: cib_process_request: Completed
> cib_apply_diff operation for section 'all': Application of an update diff
> failed (rc=-206, origin=local/cibadmin/2, version=1.65.23)
>
> What could be causing this? I think I did the same change about three years
> ago without problem (with different software, of course).
>
> # rpm -q pacemaker corosync crmsh
> pacemaker-1.1.12-18.1
> corosync-1.4.7-0.23.5
> crmsh-2.1.2+git132.gbc9fde0-18.2
> (latest)
>
> Regards,
> Ulrich

--
// Kristoffer Grönlund
// kgronl...@suse.com
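For reference, the debug session Kristoffer suggests might look roughly like this (a sketch only; the meaning of "-dR" is taken from the reply above, so check `crm --help` on your crmsh version to confirm the flags):

```shell
# Re-run the failing change with crmsh debug output enabled, so the
# patch it tries to apply to the CIB is written to the log.
crm -dR

# Then, inside the interactive shell, repeat the failing change:
#   crm(live)# configure
#   crm(live)configure# delete <the unneeded operations lines>
#   crm(live)configure# commit
#
# Afterwards, inspect the system log (e.g. /var/log/messages on
# SLES 11) for the generated XML diff that the commit attempted.
```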
[ClusterLabs] Antw: Retries before setting fail-count to INFINITY
>>> Vaibhaw Pandey wrote on 21.08.2017 at 14:58 in message
[ClusterLabs] Retries before setting fail-count to INFINITY
Version in use: pacemaker 1.1, along with corosync 1.4.

Hello,

I am new to pacemaker and was trying to set up a MySQL master/slave cluster, and I have a question about the resource failure response which I couldn't resolve from the documentation. The pacemaker doc (https://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_failure_response.html) says clearly that: "Normally, if a running resource fails, pacemaker will try to stop it and start it again."

I was wondering if there is a way to configure the number of times pacemaker will attempt this stop-and-start sequence: we want to try to restart the resource 2 or 3 times before it is stopped for good. Setting a migration-threshold obviously doesn't work in this case, because the moment the first attempt to restart the resource fails, the fail-count is set to INFINITY. Our failure-timeout is set to the default (0).

The reason we wish to do this is that at times the database is busy and the monitor action fails, but there is a good chance it would succeed on a second or third attempt. Is there a parameter in pacemaker we can use to get this behavior, or will it have to be coded in the resource agent?

Thanks,
Vaibhaw
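A sketch of the tunables involved in the question above (the resource name "mysql" is hypothetical; verify the property names against the Pacemaker 1.1 documentation for your distribution):

```shell
# Tolerate a few failures before moving the resource away from the
# node, and let the fail-count expire after 60s so occasional
# transient monitor failures are eventually forgotten.
pcs resource update mysql meta migration-threshold=3 failure-timeout=60s

# By default the cluster property start-failure-is-fatal is true,
# which is what makes a failed start push the fail-count straight to
# INFINITY. Setting it to false makes start failures count against
# migration-threshold like other failures.
pcs property set start-failure-is-fatal=false
```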