[ClusterLabs] Antw: Re: fence_sanlock and pacemaker
>>> "Laurent B." schrieb am 27.08.2015 um 08:06 in Nachricht <55dea8cc.3080...@qmail.re>: > Hello, > >> You’d have to build it yourself, but sbd could be an option >> > > do you have any clue on how to install it on redhat (6.5) ? I installed > the gluster glue package and the sbd package (provided by OpenSUSE) but > now I'm stuck. The stonith resource creation give me an error saying > that the sbd resource was not found. sbd has to be started before the cluster software. SUSE does something like: SBD_CONFIG=/etc/sysconfig/sbd SBD_BIN="/usr/sbin/sbd" if [ -f $SBD_CONFIG ]; then . $SBD_CONFIG fi [ -x "$exec" ] || exit 0 SBD_DEVS=${SBD_DEVICE%;} SBD_DEVICE=${SBD_DEVS//;/ -d } : ${SBD_DELAY_START:="no"} StartSBD() { test -x $SBD_BIN || return if [ -n "$SBD_DEVICE" ]; then if ! pidofproc $SBD_BIN >/dev/null 2>&1 ; then echo -n "Starting SBD - " if ! $SBD_BIN -d $SBD_DEVICE -D $SBD_OPTS watch ; then echo "SBD failed to start; aborting." exit 1 fi if env_is_true ${SBD_DELAY_START} ; then sleep $(sbd -d "$SBD_DEVICE" dump | grep -m 1 ms gwait | awk '{print $4}') 2>/dev/null fi fi fi } StopSBD() { test -x $SBD_BIN || return if [ -n "$SBD_DEVICE" ]; then echo -n "Stopping SBD - " if ! $SBD_BIN -d $SBD_DEVICE -D $SBD_OPTS message LOCAL exit ; t hen echo "SBD failed to stop; aborting." exit 1 fi fi while pidofproc $SBD_BIN >/dev/null 2>&1 ; do sleep 1 done echo -n "done " } > > Thank you, > > Laurent > > > ___ > Users mailing list: Users@clusterlabs.org > http://clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] fence_sanlock and pacemaker
> On 27 Aug 2015, at 4:06 pm, Laurent B. wrote:
>
> Hello,
>
>> You’d have to build it yourself, but sbd could be an option
>
> do you have any clue on how to install it on redhat (6.5) ? I installed
> the cluster glue package and the sbd package (provided by OpenSUSE) but

don’t do that. grab the el7 one, no glue needed

some details on setting it up in https://bugzilla.redhat.com/show_bug.cgi?id=1221680

/me makes a note to document it properly one of these days

> now I'm stuck. The stonith resource

none needed

> creation gives me an error saying
> that the sbd resource was not found.
>
> Thank you,
>
> Laurent
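For the archives: the device initialisation referenced in that bugzilla entry boils down to something like the following (a sketch; the device path is only an example, and the authoritative steps are in the bug itself):

# write the sbd metadata to the shared disk
sbd -d /dev/disk/by-id/scsi-EXAMPLE-part1 create

# verify the on-disk header and timeouts
sbd -d /dev/disk/by-id/scsi-EXAMPLE-part1 dump

After that, point SBD_DEVICE in /etc/sysconfig/sbd at the same disk and start sbd before the cluster stack, as described above.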
[ClusterLabs] Antw: Re: resource-stickiness
>>> Andrew Beekhof wrote on 27.08.2015 at 00:20 in message:
>> On 26 Aug 2015, at 10:09 pm, Rakovec Jost wrote:
>>
>> Sorry one typo: problem is the same
>>
>> location cli-prefer-aapche aapche role=Started 10: sles2
>
> Change the name of your constraint.
> The 'cli-prefer-’ prefix is reserved for “temporary” constraints created by
> the command line tools (which therefore feel entitled to delete them as
> necessary).

In which ways is "cli-prefer-" handled specially, if I may ask...

>>
>> to:
>>
>> location cli-prefer-aapche aapche role=Started inf: sles2
>>
>> It keeps changing to infinity.
>>
>> my configuration is:
>>
>> node sles1
>> node sles2
>> primitive filesystem Filesystem \
>>     params fstype=ext3 directory="/srv/www/vhosts" device="/dev/xvdd1" \
>>     op start interval=0 timeout=60 \
>>     op stop interval=0 timeout=60 \
>>     op monitor interval=20 timeout=40
>> primitive myip IPaddr2 \
>>     params ip=x.x.x.x \
>>     op start interval=0 timeout=20s \
>>     op stop interval=0 timeout=20s \
>>     op monitor interval=10s timeout=20s
>> primitive stonith_sbd stonith:external/sbd \
>>     params pcmk_delay_max=30
>> primitive web apache \
>>     params configfile="/etc/apache2/httpd.conf" \
>>     op start interval=0 timeout=40s \
>>     op stop interval=0 timeout=60s \
>>     op monitor interval=10 timeout=20s
>> group aapche filesystem myip web \
>>     meta target-role=Started is-managed=true resource-stickiness=1000
>> location cli-prefer-aapche aapche role=Started 10: sles2
>> property cib-bootstrap-options: \
>>     stonith-enabled=true \
>>     no-quorum-policy=ignore \
>>     placement-strategy=balanced \
>>     expected-quorum-votes=2 \
>>     dc-version=1.1.12-f47ea56 \
>>     cluster-infrastructure="classic openais (with plugin)" \
>>     last-lrm-refresh=1440502955 \
>>     stonith-timeout=40s
>> rsc_defaults rsc-options: \
>>     resource-stickiness=1000 \
>>     migration-threshold=3
>> op_defaults op-options: \
>>     timeout=600 \
>>     record-pending=true
>>
>> and after migration:
>>
>> node sles1
>> node sles2
>> primitive filesystem Filesystem \
>>     params fstype=ext3 directory="/srv/www/vhosts" device="/dev/xvdd1" \
>>     op start interval=0 timeout=60 \
>>     op stop interval=0 timeout=60 \
>>     op monitor interval=20 timeout=40
>> primitive myip IPaddr2 \
>>     params ip=10.9.131.86 \
>>     op start interval=0 timeout=20s \
>>     op stop interval=0 timeout=20s \
>>     op monitor interval=10s timeout=20s
>> primitive stonith_sbd stonith:external/sbd \
>>     params pcmk_delay_max=30
>> primitive web apache \
>>     params configfile="/etc/apache2/httpd.conf" \
>>     op start interval=0 timeout=40s \
>>     op stop interval=0 timeout=60s \
>>     op monitor interval=10 timeout=20s
>> group aapche filesystem myip web \
>>     meta target-role=Started is-managed=true resource-stickiness=1000
>> location cli-prefer-aapche aapche role=Started inf: sles2
>> property cib-bootstrap-options: \
>>     stonith-enabled=true \
>>     no-quorum-policy=ignore \
>>     placement-strategy=balanced \
>>     expected-quorum-votes=2 \
>>     dc-version=1.1.12-f47ea56 \
>>     cluster-infrastructure="classic openais (with plugin)" \
>>     last-lrm-refresh=1440502955 \
>>     stonith-timeout=40s
>> rsc_defaults rsc-options: \
>>     resource-stickiness=1000 \
>>     migration-threshold=3
>> op_defaults op-options: \
>>     timeout=600 \
>>     record-pending=true
>>
>> From: Rakovec Jost
>> Sent: Wednesday, August 26, 2015 1:33 PM
>> To: users@clusterlabs.org
>> Subject: resource-stickiness
>>
>> Hi list,
>>
>> I have configured a simple cluster on sles 11 sp4 and have a problem with
>> “auto_failover off". The problem is that whenever I migrate the resource group
>> via HAWK my configuration changes from:
>>
>> location cli-prefer-aapche aapche role=Started 10: sles2
>>
>> to:
>>
>> location cli-ban-aapche-on-sles1 aapche role=Started -inf: sles1
>>
>> It keeps changing to inf.
>>
>> and then after fencing a node, the resource moves back to the original node,
>> which I don't want. How can I avoid this situation?
>>
>> my configuration is:
>>
>> node sles1
>> node sles2
>> primitive filesystem Filesystem \
>>     params fstype=ext3 directory="/srv/www/vhosts" device="/dev/xvdd1" \
>>     op start interval=0 timeout=60 \
>>     op stop interval=0 timeout=60 \
>>     op monitor interval=20 timeout=40
>> primitive myip IPaddr2 \
>>     params ip=x.x.x.x \
>>     op start interval=0 timeout=20s \
>>     op stop interval=0 timeout=20s \
>>     op monitor interval=10s timeout=20s
>> primitive stonith_sbd stonith:external/sbd \
>>     params pcmk_dela
Re: [ClusterLabs] fence_sanlock and pacemaker
Hello,

> You’d have to build it yourself, but sbd could be an option

do you have any clue on how to install it on redhat (6.5) ? I installed
the cluster glue package and the sbd package (provided by OpenSUSE) but
now I'm stuck. The stonith resource creation gives me an error saying
that the sbd resource was not found.

Thank you,

Laurent
Re: [ClusterLabs] [corosync] Question: Duration of DC election
> On 25 Aug 2015, at 7:46 pm, Stefan Wenk wrote:
>
> Hi,
>
> I'm performing downtime measurement tests using corosync version 2.3.0 and
> pacemaker version 1.1.12 under RHEL 6.5 MRG and, although not recommended, I
> tuned the corosync configuration settings to the following insane values:
>
>    # Timeout for token
>    token: 60
>    token_retransmits_before_loss_const: 1
>
>    # How long to wait for join messages in the membership protocol (ms)
>    join: 35
>    consensus: 70
>
> My two node cluster consists of a kamailio clone resource, which replicates
> the so-called userlocation state using DMQ at the application level (see [1]).
> The switchover performs the migration of an ocf:heartbeat:IPaddr2 resource.
> With these settings, the service downtime is below 100ms in case of a
> controlled cluster switchover, when "/etc/init.d/pacemaker stop" and
> "/etc/init.d/corosync stop" get executed.
>
> The service downtime is about 400ms when power loss is simulated on the
> active node and that node does not execute the DC task. When I simulate power
> loss on the active node while it also executes the DC task, the service
> downtime increases to about 1500ms. As the timestamps in the logs have only
> second resolution, it is hard to provide more detailed numbers, but
> apparently the DC election procedure takes more than 1000ms.
>
> Are there any possibilities to tune the DC election process? Is there
> documentation available on what happens in this situation?
>
> Tests with more nodes in the cluster showed that the service downtime
> increases with the number of online cluster nodes, even if the DC is executed
> on one of the nodes which remain active.

When there are only 2 nodes, there is effectively no election happening and the
delay is made up of:

- corosync detection time
- time for the crmd to send a message to itself via corosync
- time for the policy engine to figure out where to put the service
- time for the start action of your service(s) to execute

There _should_ be an entry for “time to fence the peer”, but based on your
reported times I’m assuming you’ve turned that off.

As the node count goes up, elections need to start happening for real (so you
need to hear from everyone and have them all agree on a winner), but it should
still be pretty quick. The policy engine will take incrementally longer because
it has more nodes to loop through, but that should be negligible on the scale
that corosync can operate at.

I’d be interested to know what log messages you’re basing your timing numbers on.

> I'm using one ring only. It looks as if the usage of two rings does not
> change the test results a lot.
>
> Thank you,
>
> Stefan
>
> [1] http://kamailio.org/docs/modules/devel/modules/dmq.html
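For reference, the values Stefan quotes live in the totem section of corosync.conf; a minimal sketch of just the tuned settings (everything else left at its default):

totem {
    version: 2

    # Timeout for token (ms)
    token: 60
    token_retransmits_before_loss_const: 1

    # How long to wait for join messages in the membership protocol (ms)
    join: 35
    consensus: 70
}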
Re: [ClusterLabs] resource-stickiness
> On 26 Aug 2015, at 10:09 pm, Rakovec Jost wrote:
>
> Sorry one typo: problem is the same
>
> location cli-prefer-aapche aapche role=Started 10: sles2

Change the name of your constraint.
The 'cli-prefer-’ prefix is reserved for “temporary” constraints created by
the command line tools (which therefore feel entitled to delete them as
necessary).

> to:
>
> location cli-prefer-aapche aapche role=Started inf: sles2
>
> It keeps changing to infinity.
>
> my configuration is:
>
> node sles1
> node sles2
> primitive filesystem Filesystem \
>     params fstype=ext3 directory="/srv/www/vhosts" device="/dev/xvdd1" \
>     op start interval=0 timeout=60 \
>     op stop interval=0 timeout=60 \
>     op monitor interval=20 timeout=40
> primitive myip IPaddr2 \
>     params ip=x.x.x.x \
>     op start interval=0 timeout=20s \
>     op stop interval=0 timeout=20s \
>     op monitor interval=10s timeout=20s
> primitive stonith_sbd stonith:external/sbd \
>     params pcmk_delay_max=30
> primitive web apache \
>     params configfile="/etc/apache2/httpd.conf" \
>     op start interval=0 timeout=40s \
>     op stop interval=0 timeout=60s \
>     op monitor interval=10 timeout=20s
> group aapche filesystem myip web \
>     meta target-role=Started is-managed=true resource-stickiness=1000
> location cli-prefer-aapche aapche role=Started 10: sles2
> property cib-bootstrap-options: \
>     stonith-enabled=true \
>     no-quorum-policy=ignore \
>     placement-strategy=balanced \
>     expected-quorum-votes=2 \
>     dc-version=1.1.12-f47ea56 \
>     cluster-infrastructure="classic openais (with plugin)" \
>     last-lrm-refresh=1440502955 \
>     stonith-timeout=40s
> rsc_defaults rsc-options: \
>     resource-stickiness=1000 \
>     migration-threshold=3
> op_defaults op-options: \
>     timeout=600 \
>     record-pending=true
>
> and after migration:
>
> node sles1
> node sles2
> primitive filesystem Filesystem \
>     params fstype=ext3 directory="/srv/www/vhosts" device="/dev/xvdd1" \
>     op start interval=0 timeout=60 \
>     op stop interval=0 timeout=60 \
>     op monitor interval=20 timeout=40
> primitive myip IPaddr2 \
>     params ip=10.9.131.86 \
>     op start interval=0 timeout=20s \
>     op stop interval=0 timeout=20s \
>     op monitor interval=10s timeout=20s
> primitive stonith_sbd stonith:external/sbd \
>     params pcmk_delay_max=30
> primitive web apache \
>     params configfile="/etc/apache2/httpd.conf" \
>     op start interval=0 timeout=40s \
>     op stop interval=0 timeout=60s \
>     op monitor interval=10 timeout=20s
> group aapche filesystem myip web \
>     meta target-role=Started is-managed=true resource-stickiness=1000
> location cli-prefer-aapche aapche role=Started inf: sles2
> property cib-bootstrap-options: \
>     stonith-enabled=true \
>     no-quorum-policy=ignore \
>     placement-strategy=balanced \
>     expected-quorum-votes=2 \
>     dc-version=1.1.12-f47ea56 \
>     cluster-infrastructure="classic openais (with plugin)" \
>     last-lrm-refresh=1440502955 \
>     stonith-timeout=40s
> rsc_defaults rsc-options: \
>     resource-stickiness=1000 \
>     migration-threshold=3
> op_defaults op-options: \
>     timeout=600 \
>     record-pending=true
>
> From: Rakovec Jost
> Sent: Wednesday, August 26, 2015 1:33 PM
> To: users@clusterlabs.org
> Subject: resource-stickiness
>
> Hi list,
>
> I have configured a simple cluster on sles 11 sp4 and have a problem with
> “auto_failover off". The problem is that whenever I migrate the resource group
> via HAWK my configuration changes from:
>
> location cli-prefer-aapche aapche role=Started 10: sles2
>
> to:
>
> location cli-ban-aapche-on-sles1 aapche role=Started -inf: sles1
>
> It keeps changing to inf.
>
> and then after fencing a node, the resource moves back to the original node,
> which I don't want. How can I avoid this situation?
>
> my configuration is:
>
> node sles1
> node sles2
> primitive filesystem Filesystem \
>     params fstype=ext3 directory="/srv/www/vhosts" device="/dev/xvdd1" \
>     op start interval=0 timeout=60 \
>     op stop interval=0 timeout=60 \
>     op monitor interval=20 timeout=40
> primitive myip IPaddr2 \
>     params ip=x.x.x.x \
>     op start interval=0 timeout=20s \
>     op stop interval=0 timeout=20s \
>     op monitor interval=10s timeout=20s
> primitive stonith_sbd stonith:external/sbd \
>     params pcmk_delay_max=30
> primitive web apache \
>     params configfile="/etc/apache2/httpd.conf" \
>     op start interval=0 timeout=40s \
>     op stop interval=0 timeout=60s \
>     op monitor interval=10 timeout=20s
> group aapche filesystem myip web \
>     meta target-rol
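A constraint outside the reserved namespace would look like this, for example (crm shell syntax; the name is arbitrary as long as it avoids the reserved cli- prefixes):

location prefer-sles2 aapche role=Started 10: sles2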
Re: [ClusterLabs] fence_sanlock and pacemaker
> On 27 Aug 2015, at 4:11 am, Laurent B. wrote:
>
> Gents,
>
> I'm trying to configure a HA cluster with RHEL 6.5. Everything goes well
> except the fencing. The cluster's nodes are not connected to the
> management lan (where all the iLO/UPS/APC devices stand) and it's not
> planned to connect them to this lan.
>
> With these constraints, I figured out that a way to get fencing working
> is to use *fence_sanlock*. I followed this tutorial:
> https://alteeve.ca/w/Watchdog_Recovery and it worked (I got some
> problems with SELinux that I finally disabled, as specified in the
> following RHEL 6.5 release note:
> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html-single/6.5_Technical_Notes/ )
>
> So perfect. The problem is that fence_sanlock relies on cman and not
> pacemaker. So with stonith disabled, pacemaker restarts the resources
> without waiting for the victim to be fenced, and with stonith enabled,
> pacemaker complains about the lack of stonith resources and blocks the
> whole cluster.
> I tried to put fence_sanlock as a stonith resource at the pacemaker
> level, but as explained in
> http://oss.clusterlabs.org/pipermail/pacemaker/2013-May/017980.html it
> does not work, and as explained in
> https://bugzilla.redhat.com/show_bug.cgi?id=962088 it's not planned to
> make it work.
>
> My question: is there a workaround ?

You’d have to build it yourself, but sbd could be an option

> Thank you,
>
> Laurent
Re: [ClusterLabs] multiple drives looks like balancing but why and causing troubles
On 26/08/15 02:46 PM, Streeter, Michelle N wrote:
> I have a two node cluster. Both nodes are virtual and have five shared
> drives attached via sas controller. For some reason, the cluster shows
> both nodes have half the drives started on them. Not sure if this is
> called split brain or not. It definitely looks like load balancing. But I
> did not set up load balancing. On my client, I only see the data for
> the shares on the active cluster node. But they should all be on the
> active cluster node. Any suggestions as to why this is happening? Is
> there a setting so that everything works on only one node at a time?

Can you explain what you mean by "shared drives"? Are these iSCSI LUNs or
direct connections to either port on SAS drives?

A split-brain is when either node thinks the other is dead and is operating
without coordinating with the peer. It is a disastrous situation with shared
storage, and it is what fencing (stonith) prevents, which you don't have
configured.

If you are using KVM, use fence_virsh or fence_virt. If you're using vmware,
use fence_vmware. Please make this a priority before solving your storage
issue.

> pcs cluster status:
>
> Cluster name: CNAS
> Last updated: Wed Aug 26 13:35:47 2015
> Last change: Wed Aug 26 13:28:55 2015
> Stack: classic openais (with plugin)
> Current DC: nas02 - partition with quorum
> Version: 1.1.11-97629de
> 2 Nodes configured, 2 expected votes
> 11 Resources configured
>
> Online: [ nas01 nas02 ]
>
> Full list of resources:
>
> NAS (ocf::heartbeat:IPaddr2): Started nas01
> Resource Group: datag
>     datashare (ocf::heartbeat:Filesystem): Started nas02
>     dataserver (ocf::heartbeat:nfsserver): Started nas02
> Resource Group: oomtlg
>     oomtlshare (ocf::heartbeat:Filesystem): Started nas01
>     oomtlserver (ocf::heartbeat:nfsserver): Started nas01
> Resource Group: oomtrg
>     oomtrshare (ocf::heartbeat:Filesystem): Started nas02
>     oomtrserver (ocf::heartbeat:nfsserver): Started nas02
> Resource Group: oomblg
>     oomblshare (ocf::heartbeat:Filesystem): Started nas01
>     oomblserver (ocf::heartbeat:nfsserver): Started nas01
> Resource Group: oombrg
>     oombrshare (ocf::heartbeat:Filesystem): Started nas02
>     oombrserver (ocf::heartbeat:nfsserver): Started nas02
>
> pcs config show:
>
> Cluster Name: CNAS
> Corosync Nodes:
>     nas01 nas02
> Pacemaker Nodes:
>     nas01 nas02
>
> Resources:
> Resource: NAS (class=ocf provider=heartbeat type=IPaddr2)
>     Attributes: ip=192.168.56.110 cidr_netmask=24
>     Operations: start interval=0s timeout=20s (NAS-start-timeout-20s)
>                 stop interval=0s timeout=20s (NAS-stop-timeout-20s)
>                 monitor interval=10s timeout=20s (NAS-monitor-interval-10s)
> Group: datag
>     Resource: datashare (class=ocf provider=heartbeat type=Filesystem)
>         Attributes: device=/dev/sdb1 directory=/data fstype=ext4
>         Operations: start interval=0s timeout=60 (datashare-start-timeout-60)
>                     stop interval=0s timeout=60 (datashare-stop-timeout-60)
>                     monitor interval=20 timeout=40 (datashare-monitor-interval-20)
>     Resource: dataserver (class=ocf provider=heartbeat type=nfsserver)
>         Attributes: nfs_shared_infodir=/data/nfsinfo nfs_no_notify=true
>         Operations: start interval=0s timeout=40 (dataserver-start-timeout-40)
>                     stop interval=0s timeout=20s (dataserver-stop-timeout-20s)
>                     monitor interval=10 timeout=20s (dataserver-monitor-interval-10)
> Group: oomtlg
>     Resource: oomtlshare (class=ocf provider=heartbeat type=Filesystem)
>         Attributes: device=/dev/sdc1 directory=/oomtl fstype=ext4
>         Operations: start interval=0s timeout=60 (oomtlshare-start-timeout-60)
>                     stop interval=0s timeout=60 (oomtlshare-stop-timeout-60)
>                     monitor interval=20 timeout=40 (oomtlshare-monitor-interval-20)
>     Resource: oomtlserver (class=ocf provider=heartbeat type=nfsserver)
>         Attributes: nfs_shared_infodir=/oomtl/nfsinfo nfs_no_notify=true
>         Operations: start interval=0s timeout=40 (oomtlserver-start-timeout-40)
>                     stop interval=0s timeout=20s (oomtlserver-stop-timeout-20s)
>                     monitor interval=10 timeout=20s (oomtlserver-monitor-interval-10)
> Group: oomtrg
>     Resource: oomtrshare (class=ocf provider=heartbeat type=Filesystem)
>         Attributes: device=/dev/sdd1 directory=/oomtr fstype=ext4
>         Operations: start interval=0s timeout=60 (oomtrshare-start-timeout-60)
>                     stop interval=0s timeout=60 (oomtrshare-stop-timeout-60)
>                     monitor interval=20 timeout=40 (oomtrshare-monitor-interval-20)
>     Resource: oomtrserver (class=ocf provider=heartb
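For a KVM/virsh setup, a fencing resource along those lines might be declared roughly as follows (pcs syntax; the hypervisor address, credentials and domain name are placeholders, and you would want one such resource per node):

pcs stonith create fence_nas01 fence_virsh \
    ipaddr=192.168.56.1 login=root passwd=secret \
    port=nas01 pcmk_host_list=nas01 \
    op monitor interval=60s

pcs property set stonith-enabled=true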
[ClusterLabs] multiple drives looks like balancing but why and causing troubles
I have a two node cluster. Both nodes are virtual and have five shared drives
attached via a sas controller. For some reason, the cluster shows both nodes
have half the drives started on them. Not sure if this is called split brain
or not. It definitely looks like load balancing. But I did not set up load
balancing. On my client, I only see the data for the shares on the active
cluster node. But they should all be on the active cluster node. Any
suggestions as to why this is happening? Is there a setting so that everything
works on only one node at a time?

pcs cluster status:

Cluster name: CNAS
Last updated: Wed Aug 26 13:35:47 2015
Last change: Wed Aug 26 13:28:55 2015
Stack: classic openais (with plugin)
Current DC: nas02 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
11 Resources configured

Online: [ nas01 nas02 ]

Full list of resources:

NAS (ocf::heartbeat:IPaddr2): Started nas01
Resource Group: datag
    datashare (ocf::heartbeat:Filesystem): Started nas02
    dataserver (ocf::heartbeat:nfsserver): Started nas02
Resource Group: oomtlg
    oomtlshare (ocf::heartbeat:Filesystem): Started nas01
    oomtlserver (ocf::heartbeat:nfsserver): Started nas01
Resource Group: oomtrg
    oomtrshare (ocf::heartbeat:Filesystem): Started nas02
    oomtrserver (ocf::heartbeat:nfsserver): Started nas02
Resource Group: oomblg
    oomblshare (ocf::heartbeat:Filesystem): Started nas01
    oomblserver (ocf::heartbeat:nfsserver): Started nas01
Resource Group: oombrg
    oombrshare (ocf::heartbeat:Filesystem): Started nas02
    oombrserver (ocf::heartbeat:nfsserver): Started nas02

pcs config show:

Cluster Name: CNAS
Corosync Nodes:
    nas01 nas02
Pacemaker Nodes:
    nas01 nas02

Resources:
Resource: NAS (class=ocf provider=heartbeat type=IPaddr2)
    Attributes: ip=192.168.56.110 cidr_netmask=24
    Operations: start interval=0s timeout=20s (NAS-start-timeout-20s)
                stop interval=0s timeout=20s (NAS-stop-timeout-20s)
                monitor interval=10s timeout=20s (NAS-monitor-interval-10s)
Group: datag
    Resource: datashare (class=ocf provider=heartbeat type=Filesystem)
        Attributes: device=/dev/sdb1 directory=/data fstype=ext4
        Operations: start interval=0s timeout=60 (datashare-start-timeout-60)
                    stop interval=0s timeout=60 (datashare-stop-timeout-60)
                    monitor interval=20 timeout=40 (datashare-monitor-interval-20)
    Resource: dataserver (class=ocf provider=heartbeat type=nfsserver)
        Attributes: nfs_shared_infodir=/data/nfsinfo nfs_no_notify=true
        Operations: start interval=0s timeout=40 (dataserver-start-timeout-40)
                    stop interval=0s timeout=20s (dataserver-stop-timeout-20s)
                    monitor interval=10 timeout=20s (dataserver-monitor-interval-10)
Group: oomtlg
    Resource: oomtlshare (class=ocf provider=heartbeat type=Filesystem)
        Attributes: device=/dev/sdc1 directory=/oomtl fstype=ext4
        Operations: start interval=0s timeout=60 (oomtlshare-start-timeout-60)
                    stop interval=0s timeout=60 (oomtlshare-stop-timeout-60)
                    monitor interval=20 timeout=40 (oomtlshare-monitor-interval-20)
    Resource: oomtlserver (class=ocf provider=heartbeat type=nfsserver)
        Attributes: nfs_shared_infodir=/oomtl/nfsinfo nfs_no_notify=true
        Operations: start interval=0s timeout=40 (oomtlserver-start-timeout-40)
                    stop interval=0s timeout=20s (oomtlserver-stop-timeout-20s)
                    monitor interval=10 timeout=20s (oomtlserver-monitor-interval-10)
Group: oomtrg
    Resource: oomtrshare (class=ocf provider=heartbeat type=Filesystem)
        Attributes: device=/dev/sdd1 directory=/oomtr fstype=ext4
        Operations: start interval=0s timeout=60 (oomtrshare-start-timeout-60)
                    stop interval=0s timeout=60 (oomtrshare-stop-timeout-60)
                    monitor interval=20 timeout=40 (oomtrshare-monitor-interval-20)
    Resource: oomtrserver (class=ocf provider=heartbeat type=nfsserver)
        Attributes: nfs_shared_infodir=/oomtr/nfsinfo nfs_no_notify=true
        Operations: start interval=0s timeout=40 (oomtrserver-start-timeout-40)
                    stop interval=0s timeout=20s (oomtrserver-stop-timeout-20s)
                    monitor interval=10 timeout=20s (oomtrserver-monitor-interval-10)
Group: oomblg
    Resource: oomblshare (class=ocf provider=heartbeat type=Filesystem)
        Attributes: device=/dev/sde1 directory=/oombl fstype=ext4
        Operations: start interval=0s timeout=60 (oomblshare-start-timeout-60)
                    stop interval=0s timeout=60 (oomblshare-stop-timeout-60)
                    monitor interval=20 timeout=40 (oomblshare-monitor-interval-20)
    Resource: oomblserver (class=ocf provider=heartbeat type=nfsserver)
        Attributes: nfs_shared_infodir=/oombl/nfsinfo nfs_no_notify=true
        Operations: start interval=0s timeout=40 (oomblserver-start-timeout-40)
                    stop interval=0s timeout=20s (oomblserver-stop-
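If the goal really is to have every group follow the floating address onto a single node, one possible direction is colocation constraints tying each group to the NAS resource, for example (pcs syntax; a sketch based on the resource names above, to be repeated for each group):

pcs constraint colocation add datag with NAS INFINITY
pcs constraint colocation add oomtlg with NAS INFINITY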
[ClusterLabs] fence_sanlock and pacemaker
Gents,

I'm trying to configure a HA cluster with RHEL 6.5. Everything goes well
except the fencing. The cluster's nodes are not connected to the management
lan (where all the iLO/UPS/APC devices stand) and it's not planned to connect
them to this lan.

With these constraints, I figured out that a way to get fencing working is to
use *fence_sanlock*. I followed this tutorial:
https://alteeve.ca/w/Watchdog_Recovery and it worked (I got some problems with
SELinux that I finally disabled, as specified in the following RHEL 6.5
release note:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html-single/6.5_Technical_Notes/ )

So perfect. The problem is that fence_sanlock relies on cman and not
pacemaker. So with stonith disabled, pacemaker restarts the resources without
waiting for the victim to be fenced, and with stonith enabled, pacemaker
complains about the lack of stonith resources and blocks the whole cluster.
I tried to put fence_sanlock as a stonith resource at the pacemaker level,
but as explained in
http://oss.clusterlabs.org/pipermail/pacemaker/2013-May/017980.html it does
not work, and as explained in
https://bugzilla.redhat.com/show_bug.cgi?id=962088 it's not planned to make
it work.

My question: is there a workaround ?

Thank you,

Laurent
[ClusterLabs] Antw: NFS exports
>>> "Streeter, Michelle N" schrieb am >>> 26.08.2015 um 15:42 in Nachricht <9a18847a77a9a14da7e0fd240efcafc2504...@xch-phx-501.sw.nos.boeing.com>: > I have been using linux /etc/exports to put my exports for my cluster and it > works fine this way as long as every node has this done. > > I tried to add the exportfs resource but this keeps failing. Did you use fully qualified names? > > Is it preferred that we use /etc/exports or the exportfs for pacemaker? > > Michelle Streeter > ASC2 MCS - SDE/ACL/SDL/EDL OKC Software Engineer > The Boeing Company ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
[ClusterLabs] NFS exports
I have been using linux /etc/exports to put my exports for my cluster, and it
works fine this way as long as every node has this done.

I tried to add the exportfs resource but this keeps failing.

Is it preferred that we use /etc/exports or the exportfs resource for
pacemaker?

Michelle Streeter
ASC2 MCS - SDE/ACL/SDL/EDL OKC Software Engineer
The Boeing Company
Re: [ClusterLabs] resource-stickiness
Sorry one typo: problem is the same

location cli-prefer-aapche aapche role=Started 10: sles2

to:

location cli-prefer-aapche aapche role=Started inf: sles2

It keeps changing to infinity.

my configuration is:

node sles1
node sles2
primitive filesystem Filesystem \
    params fstype=ext3 directory="/srv/www/vhosts" device="/dev/xvdd1" \
    op start interval=0 timeout=60 \
    op stop interval=0 timeout=60 \
    op monitor interval=20 timeout=40
primitive myip IPaddr2 \
    params ip=x.x.x.x \
    op start interval=0 timeout=20s \
    op stop interval=0 timeout=20s \
    op monitor interval=10s timeout=20s
primitive stonith_sbd stonith:external/sbd \
    params pcmk_delay_max=30
primitive web apache \
    params configfile="/etc/apache2/httpd.conf" \
    op start interval=0 timeout=40s \
    op stop interval=0 timeout=60s \
    op monitor interval=10 timeout=20s
group aapche filesystem myip web \
    meta target-role=Started is-managed=true resource-stickiness=1000
location cli-prefer-aapche aapche role=Started 10: sles2
property cib-bootstrap-options: \
    stonith-enabled=true \
    no-quorum-policy=ignore \
    placement-strategy=balanced \
    expected-quorum-votes=2 \
    dc-version=1.1.12-f47ea56 \
    cluster-infrastructure="classic openais (with plugin)" \
    last-lrm-refresh=1440502955 \
    stonith-timeout=40s
rsc_defaults rsc-options: \
    resource-stickiness=1000 \
    migration-threshold=3
op_defaults op-options: \
    timeout=600 \
    record-pending=true

and after migration:

node sles1
node sles2
primitive filesystem Filesystem \
    params fstype=ext3 directory="/srv/www/vhosts" device="/dev/xvdd1" \
    op start interval=0 timeout=60 \
    op stop interval=0 timeout=60 \
    op monitor interval=20 timeout=40
primitive myip IPaddr2 \
    params ip=10.9.131.86 \
    op start interval=0 timeout=20s \
    op stop interval=0 timeout=20s \
    op monitor interval=10s timeout=20s
primitive stonith_sbd stonith:external/sbd \
    params pcmk_delay_max=30
primitive web apache \
    params configfile="/etc/apache2/httpd.conf" \
    op start interval=0 timeout=40s \
    op stop interval=0 timeout=60s \
    op monitor interval=10 timeout=20s
group aapche filesystem myip web \
    meta target-role=Started is-managed=true resource-stickiness=1000
location cli-prefer-aapche aapche role=Started inf: sles2
property cib-bootstrap-options: \
    stonith-enabled=true \
    no-quorum-policy=ignore \
    placement-strategy=balanced \
    expected-quorum-votes=2 \
    dc-version=1.1.12-f47ea56 \
    cluster-infrastructure="classic openais (with plugin)" \
    last-lrm-refresh=1440502955 \
    stonith-timeout=40s
rsc_defaults rsc-options: \
    resource-stickiness=1000 \
    migration-threshold=3
op_defaults op-options: \
    timeout=600 \
    record-pending=true

From: Rakovec Jost
Sent: Wednesday, August 26, 2015 1:33 PM
To: users@clusterlabs.org
Subject: resource-stickiness

Hi list,

I have configured a simple cluster on sles 11 sp4 and have a problem with
“auto_failover off". The problem is that whenever I migrate the resource group
via HAWK my configuration changes from:

location cli-prefer-aapche aapche role=Started 10: sles2

to:

location cli-ban-aapche-on-sles1 aapche role=Started -inf: sles1

It keeps changing to inf.

and then after fencing a node, the resource moves back to the original node,
which I don't want. How can I avoid this situation?

my configuration is:

node sles1
node sles2
primitive filesystem Filesystem \
    params fstype=ext3 directory="/srv/www/vhosts" device="/dev/xvdd1" \
    op start interval=0 timeout=60 \
    op stop interval=0 timeout=60 \
    op monitor interval=20 timeout=40
primitive myip IPaddr2 \
    params ip=x.x.x.x \
    op start interval=0 timeout=20s \
    op stop interval=0 timeout=20s \
    op monitor interval=10s timeout=20s
primitive stonith_sbd stonith:external/sbd \
    params pcmk_delay_max=30
primitive web apache \
    params configfile="/etc/apache2/httpd.conf" \
    op start interval=0 timeout=40s \
    op stop interval=0 timeout=60s \
    op monitor interval=10 timeout=20s
group aapche filesystem myip web \
    meta target-role=Started is-managed=true resource-stickiness=1000
location cli-prefer-aapche aapche role=Started 10: sles2
property cib-bootstrap-options: \
    stonith-enabled=true \
    no-quorum-policy=ignore \
    placement-strategy=balanced \
    expected-quorum-votes=2 \
    dc-version=1.1.12-f47ea56 \
    cluster-infrastructure="classic openais (with plugin)" \
    last-lrm-refresh=1440502955 \
    stonith-timeout=40s
rsc_defaults rsc-options: \
    resource-stickiness=1000 \
    migration-threshold=3
op_defaults op-options: \
    timeout=600 \
    record-pending=true
[ClusterLabs] resource-stickiness
Hi list,

I have configured a simple cluster on sles 11 sp4 and have a problem with
"auto_failover off". The problem is that whenever I migrate the resource group
via HAWK my configuration changes from:

location cli-prefer-aapche aapche role=Started 10: sles2

to:

location cli-ban-aapche-on-sles1 aapche role=Started -inf: sles1

It keeps changing to inf.

and then after fencing a node, the resource moves back to the original node,
which I don't want. How can I avoid this situation?

my configuration is:

node sles1
node sles2
primitive filesystem Filesystem \
    params fstype=ext3 directory="/srv/www/vhosts" device="/dev/xvdd1" \
    op start interval=0 timeout=60 \
    op stop interval=0 timeout=60 \
    op monitor interval=20 timeout=40
primitive myip IPaddr2 \
    params ip=x.x.x.x \
    op start interval=0 timeout=20s \
    op stop interval=0 timeout=20s \
    op monitor interval=10s timeout=20s
primitive stonith_sbd stonith:external/sbd \
    params pcmk_delay_max=30
primitive web apache \
    params configfile="/etc/apache2/httpd.conf" \
    op start interval=0 timeout=40s \
    op stop interval=0 timeout=60s \
    op monitor interval=10 timeout=20s
group aapche filesystem myip web \
    meta target-role=Started is-managed=true resource-stickiness=1000
location cli-prefer-aapche aapche role=Started 10: sles2
property cib-bootstrap-options: \
    stonith-enabled=true \
    no-quorum-policy=ignore \
    placement-strategy=balanced \
    expected-quorum-votes=2 \
    dc-version=1.1.12-f47ea56 \
    cluster-infrastructure="classic openais (with plugin)" \
    last-lrm-refresh=1440502955 \
    stonith-timeout=40s
rsc_defaults rsc-options: \
    resource-stickiness=1000 \
    migration-threshold=3
op_defaults op-options: \
    timeout=600 \
    record-pending=true

and after migration:

node sles1
node sles2
primitive filesystem Filesystem \
    params fstype=ext3 directory="/srv/www/vhosts" device="/dev/xvdd1" \
    op start interval=0 timeout=60 \
    op stop interval=0 timeout=60 \
    op monitor interval=20 timeout=40
primitive myip IPaddr2 \
    params ip=10.9.131.86 \
    op start interval=0 timeout=20s \
    op stop interval=0 timeout=20s \
    op monitor interval=10s timeout=20s
primitive stonith_sbd stonith:external/sbd \
    params pcmk_delay_max=30
primitive web apache \
    params configfile="/etc/apache2/httpd.conf" \
    op start interval=0 timeout=40s \
    op stop interval=0 timeout=60s \
    op monitor interval=10 timeout=20s
group aapche filesystem myip web \
    meta target-role=Started is-managed=true resource-stickiness=1000
location cli-ban-aapche-on-sles1 aapche role=Started -inf: sles1
location cli-prefer-aapche aapche role=Started 10: sles2
property cib-bootstrap-options: \
    stonith-enabled=true \
    no-quorum-policy=ignore \
    placement-strategy=balanced \
    expected-quorum-votes=2 \
    dc-version=1.1.12-f47ea56 \
    cluster-infrastructure="classic openais (with plugin)" \
    last-lrm-refresh=1440502955 \
    stonith-timeout=40s
rsc_defaults rsc-options: \
    resource-stickiness=1000 \
    migration-threshold=3
op_defaults op-options: \
    timeout=600 \
    record-pending=true

thanks

Best Regards

Jost
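For the archives: the usual way to get rid of such a leftover cli-* constraint after a manual move is crmsh's unmigrate command, e.g. (a sketch; aapche is the group name from the configuration above):

crm resource unmigrate aapche

which removes the cli-prefer-/cli-ban- constraint so that stickiness alone decides where the group stays.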
Re: [ClusterLabs] Corosync GitHub vs. dev list
Jan Friesse writes:

>> Since Corosync is hosted on GitHub, I wonder if it's enough to submit
>> pull requests/issues/patch comments there to get the developers
>
> Yes, gh is enough.

Thanks for the clarification and the quick action!
--
Regards,
Feri.
Re: [ClusterLabs] Corosync GitHub vs. dev list
Ferenc,

Hi,

> Since Corosync is hosted on GitHub, I wonder if it's enough to submit
> pull requests/issues/patch comments there to get the developers
> attention, or should I also post to develop...@clusterlabs.org?

Yes, gh is enough.

Regards,
  Honza