[ClusterLabs] Antw: Re: fence_sanlock and pacemaker

2015-08-26 Thread Ulrich Windl
>>> "Laurent B."  schrieb am 27.08.2015 um 08:06 in
Nachricht
<55dea8cc.3080...@qmail.re>:
> Hello,
> 
>> You’d have to build it yourself, but sbd could be an option
>>
> 
> do you have any clue on how to install it on Red Hat (6.5)? I installed
> the cluster glue package and the sbd package (provided by OpenSUSE), but
> now I'm stuck. The stonith resource creation gives me an error saying
> that the sbd resource was not found.

sbd has to be started before the cluster software. SUSE does something like:
SBD_CONFIG=/etc/sysconfig/sbd
SBD_BIN="/usr/sbin/sbd"
if [ -f $SBD_CONFIG ]; then
    . $SBD_CONFIG
fi

[ -x "$exec" ] || exit 0

SBD_DEVS=${SBD_DEVICE%;}
SBD_DEVICE=${SBD_DEVS//;/ -d }

: ${SBD_DELAY_START:="no"}

StartSBD() {
    test -x $SBD_BIN || return
    if [ -n "$SBD_DEVICE" ]; then
        if ! pidofproc $SBD_BIN >/dev/null 2>&1 ; then
            echo -n "Starting SBD - "
            if ! $SBD_BIN -d $SBD_DEVICE -D $SBD_OPTS watch ; then
                echo "SBD failed to start; aborting."
                exit 1
            fi
            if env_is_true ${SBD_DELAY_START} ; then
                sleep $(sbd -d "$SBD_DEVICE" dump | grep -m 1 msgwait | awk '{print $4}') 2>/dev/null
            fi
        fi
    fi
}

StopSBD() {
    test -x $SBD_BIN || return
    if [ -n "$SBD_DEVICE" ]; then
        echo -n "Stopping SBD - "
        if ! $SBD_BIN -d $SBD_DEVICE -D $SBD_OPTS message LOCAL exit ; then
            echo "SBD failed to stop; aborting."
            exit 1
        fi
    fi
    while pidofproc $SBD_BIN >/dev/null 2>&1 ; do
        sleep 1
    done
    echo -n "done "
}
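
For reference, once SBD_DEVICE points at an initialised device, its header and
message slots can be inspected with something like the following (a sketch; the
device path is only an example):

sbd -d /dev/disk/by-id/<shared-lun> dump   # shows the msgwait/watchdog timeouts the script above sleeps on
sbd -d /dev/disk/by-id/<shared-lun> list   # shows one slot per node plus any pending fence messages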

> 
> Thank you,
> 
> Laurent
> 
> 




___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] fence_sanlock and pacemaker

2015-08-26 Thread Andrew Beekhof

> On 27 Aug 2015, at 4:06 pm, Laurent B.  wrote:
> 
> Hello,
> 
>> You’d have to build it yourself, but sbd could be an option
>> 
> 
> do you have any clue on how to install it on Red Hat (6.5)? I installed
> the cluster glue package and the sbd package (provided by OpenSUSE) but

don’t do that.
grab the el7 one, no glue needed

Some details on setting it up are in
https://bugzilla.redhat.com/show_bug.cgi?id=1221680

/me makes a note to document it properly one of these days
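
For reference, a rough sketch of the kind of steps involved, assuming a small
shared LUN (paths are illustrative; the bugzilla above has the authoritative
details):

sbd -d /dev/disk/by-id/<shared-lun> create    # writes the sbd header (destroys data on the LUN)
echo 'SBD_DEVICE="/dev/disk/by-id/<shared-lun>"' >> /etc/sysconfig/sbd
# sbd must be watching the device before corosync/pacemaker start;
# per the reply above, no separate stonith resource is needed.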

> now I'm stuck. The stonith resource

none needed

> creation gives me an error saying
> that the sbd resource was not found.
> 
> Thank you,
> 
> Laurent
> 
> 


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Antw: Re: resource-stickiness

2015-08-26 Thread Ulrich Windl
>>> Andrew Beekhof  wrote on 27.08.2015 at 00:20 in message
:

>> On 26 Aug 2015, at 10:09 pm, Rakovec Jost  wrote:
>> 
>> Sorry, one typo; the problem is the same:
>> 
>> 
>> location cli-prefer-aapche aapche role=Started 10: sles2
> 
> Change the name of your constraint.
> The 'cli-prefer-' prefix is reserved for "temporary" constraints created by
> the command line tools (which therefore feel entitled to delete them as
> necessary).

In which ways is "cli-prefer-" handled specially, if I may ask...

> 
>> 
>> to:
>> 
>> location cli-prefer-aapche aapche role=Started inf: sles2 
>> 
>> 
>> It keeps changing to infinity. 
>> 
>> 
>> 
>> my configuration is:
>> 
>> node sles1 
>> node sles2 
>> primitive filesystem Filesystem \ 
>>params fstype=ext3 directory="/srv/www/vhosts" device="/dev/xvdd1" \
>>op start interval=0 timeout=60 \ 
>>op stop interval=0 timeout=60 \ 
>>op monitor interval=20 timeout=40 
>> primitive myip IPaddr2 \ 
>>params ip=x.x.x.x \ 
>>op start interval=0 timeout=20s \ 
>>op stop interval=0 timeout=20s \ 
>>op monitor interval=10s timeout=20s 
>> primitive stonith_sbd stonith:external/sbd \ 
>>params pcmk_delay_max=30 
>> primitive web apache \ 
>>params configfile="/etc/apache2/httpd.conf" \ 
>>op start interval=0 timeout=40s \ 
>>op stop interval=0 timeout=60s \ 
>>op monitor interval=10 timeout=20s 
>> group aapche filesystem myip web \ 
>>meta target-role=Started is-managed=true resource-stickiness=1000 
>> location cli-prefer-aapche aapche role=Started 10: sles2 
>> property cib-bootstrap-options: \ 
>>stonith-enabled=true \ 
>>no-quorum-policy=ignore \ 
>>placement-strategy=balanced \ 
>>expected-quorum-votes=2 \ 
>>dc-version=1.1.12-f47ea56 \ 
>>cluster-infrastructure="classic openais (with plugin)" \ 
>>last-lrm-refresh=1440502955 \ 
>>stonith-timeout=40s 
>> rsc_defaults rsc-options: \ 
>>resource-stickiness=1000 \ 
>>migration-threshold=3 
>> op_defaults op-options: \ 
>>timeout=600 \ 
>>record-pending=true 
>> 
>> 
>> 
>> and after migration:
>> 
>> 
>> node sles1 
>> node sles2 
>> primitive filesystem Filesystem \ 
>>params fstype=ext3 directory="/srv/www/vhosts" device="/dev/xvdd1" \
>>op start interval=0 timeout=60 \ 
>>op stop interval=0 timeout=60 \ 
>>op monitor interval=20 timeout=40 
>> primitive myip IPaddr2 \ 
>>params ip=10.9.131.86 \ 
>>op start interval=0 timeout=20s \ 
>>op stop interval=0 timeout=20s \ 
>>op monitor interval=10s timeout=20s 
>> primitive stonith_sbd stonith:external/sbd \ 
>>params pcmk_delay_max=30 
>> primitive web apache \ 
>>params configfile="/etc/apache2/httpd.conf" \ 
>>op start interval=0 timeout=40s \ 
>>op stop interval=0 timeout=60s \ 
>>op monitor interval=10 timeout=20s 
>> group aapche filesystem myip web \ 
>>meta target-role=Started is-managed=true resource-stickiness=1000 
>> location cli-prefer-aapche aapche role=Started inf: sles2 
>> property cib-bootstrap-options: \ 
>>stonith-enabled=true \ 
>>no-quorum-policy=ignore \ 
>>placement-strategy=balanced \ 
>>expected-quorum-votes=2 \ 
>>dc-version=1.1.12-f47ea56 \ 
>>cluster-infrastructure="classic openais (with plugin)" \ 
>>last-lrm-refresh=1440502955 \ 
>>stonith-timeout=40s 
>> rsc_defaults rsc-options: \ 
>>resource-stickiness=1000 \ 
>>migration-threshold=3 
>> op_defaults op-options: \ 
>>timeout=600 \ 
>>record-pending=true
>> 
>> 
>> From: Rakovec Jost
>> Sent: Wednesday, August 26, 2015 1:33 PM
>> To: users@clusterlabs.org 
>> Subject: resource-stickiness
>>  
>> Hi list,
>> 
>> 
>> I have configured a simple cluster on SLES 11 SP4 and have a problem with
>> "auto_failover off". The problem is that whenever I migrate the resource group
>> via HAWK, my configuration changes from:
>> 
>> location cli-prefer-aapche aapche role=Started 10: sles2
>> 
>> to:
>> 
>> location cli-ban-aapche-on-sles1 aapche role=Started -inf: sles1
>> 
>> 
>> It keeps changing to inf. 
>> 
>> 
>> and then after the node is fenced, the resource moves back to the original
>> node, which I don't want. How can I avoid this situation?
>> 
>> my configuration is:
>> 
>> node sles1 
>> node sles2 
>> primitive filesystem Filesystem \ 
>>params fstype=ext3 directory="/srv/www/vhosts" device="/dev/xvdd1" \
>>op start interval=0 timeout=60 \ 
>>op stop interval=0 timeout=60 \ 
>>op monitor interval=20 timeout=40 
>> primitive myip IPaddr2 \ 
>>params ip=x.x.x.x \ 
>>op start interval=0 timeout=20s \ 
>>op stop interval=0 timeout=20s \ 
>>op monitor interval=10s timeout=20s 
>> primitive stonith_sbd stonith:external/sbd \ 
>>params pcmk_dela

Re: [ClusterLabs] fence_sanlock and pacemaker

2015-08-26 Thread Laurent B.
Hello,

> You’d have to build it yourself, but sbd could be an option
>

do you have any clue on how to install it on Red Hat (6.5)? I installed
the cluster glue package and the sbd package (provided by OpenSUSE), but
now I'm stuck. The stonith resource creation gives me an error saying
that the sbd resource was not found.

Thank you,

Laurent


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] [corosync] Question: Duration of DC election

2015-08-26 Thread Andrew Beekhof

> On 25 Aug 2015, at 7:46 pm, Stefan Wenk  wrote:
> 
> Hi,
> 
> I'm performing downtime measurement tests using corosync version 2.3.0 and 
> pacemaker version 1.1.12 under RHEL 6.5 MRG, and although not recommended, I 
> tuned the corosync configuration settings to the following insane values:
> 
># Timeout for token
>token: 60
>token_retransmits_before_loss_const: 1
> 
># How long to wait for join messages in the membership protocol (ms)
>join: 35
>consensus: 70
> 
> My two-node cluster consists of a kamailio clone resource, which replicates 
> the so-called userlocation state using DMQ at the application level (see [1]). 
> The switchover performs the migration of an ocf:heartbeat:IPaddr2 resource. 
> With these settings, the service downtime is below 100 ms in the case of a 
> controlled cluster switchover, when "/etc/init.d/pacemaker stop" and 
> "/etc/init.d/corosync stop" get executed. 
> 
> The service downtime is about 400 ms when power loss is simulated on the 
> active node that does not hold the DC role. When I simulate power loss on the 
> active node that also holds the DC role, the service downtime increases to 
> about 1500 ms. As the timestamps in the logs have only second resolution, it 
> is hard to provide more detailed numbers, but apparently the DC election 
> procedure takes more than 1000 ms.
> 
> Are there any possibilities to tune the DC election process? Is there 
> documentation available on what happens in this situation?
> 
> Tests with more nodes in the cluster showed that the service downtime 
> increases with the number of online cluster nodes, even if the DC runs 
> on one of the nodes that remain active. 

When there are only 2 nodes, there is effectively no election happening, and 
the delay is made up of:
- corosync detection time
- time for the crmd to send a message to itself via corosync
- time for the policy engine to figure out where to put the service
- time for the start action of your service(s) to execute

There _should_ be an entry for “time to fence the peer”, but based on your 
reported times I’m assuming you’ve turned that off.

As the node count goes up, elections need to start happening for real (so you 
need to hear from everyone and have them all agree on a winner), but it should 
still be pretty quick.
The policy engine will take incrementally longer because it has more nodes to 
loop through, but that should be negligible on the scale that corosync can 
operate at.

I’d be interested to know what log messages you’re basing your timing numbers 
on.
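
(For reference, the quoted token/join/consensus directives live inside the
totem section of corosync.conf; roughly as below, keeping the original
deliberately extreme values:)

totem {
    version: 2
    token: 60
    token_retransmits_before_loss_const: 1
    join: 35
    consensus: 70
}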

> 
> I'm using one ring only. It looks as if using two rings does not change the 
> test results much.
> 
> Thank you,
> 
> Stefan
> 
> [1] http://kamailio.org/docs/modules/devel/modules/dmq.html
> 


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] resource-stickiness

2015-08-26 Thread Andrew Beekhof

> On 26 Aug 2015, at 10:09 pm, Rakovec Jost  wrote:
> 
> Sorry, one typo; the problem is the same:
> 
> 
> location cli-prefer-aapche aapche role=Started 10: sles2

Change the name of your constraint.
The 'cli-prefer-' prefix is reserved for "temporary" constraints created by the 
command line tools (which therefore feel entitled to delete them as necessary).
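
For example, something along these lines should work (a sketch in the crm shell
syntax used in this thread; exact commands may vary by crmsh version):

crm configure location prefer-aapche-on-sles2 aapche role=Started 10: sles2
crm configure delete cli-prefer-aapche    # drop the tool-generated constraint
# after a HAWK/crm "move", "crm resource unmigrate aapche" should also clear it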

> 
> to:
> 
> location cli-prefer-aapche aapche role=Started inf: sles2 
> 
> 
> It keeps changing to infinity. 
> 
> 
> 
> my configuration is:
> 
> node sles1 
> node sles2 
> primitive filesystem Filesystem \ 
>params fstype=ext3 directory="/srv/www/vhosts" device="/dev/xvdd1" \ 
>op start interval=0 timeout=60 \ 
>op stop interval=0 timeout=60 \ 
>op monitor interval=20 timeout=40 
> primitive myip IPaddr2 \ 
>params ip=x.x.x.x \ 
>op start interval=0 timeout=20s \ 
>op stop interval=0 timeout=20s \ 
>op monitor interval=10s timeout=20s 
> primitive stonith_sbd stonith:external/sbd \ 
>params pcmk_delay_max=30 
> primitive web apache \ 
>params configfile="/etc/apache2/httpd.conf" \ 
>op start interval=0 timeout=40s \ 
>op stop interval=0 timeout=60s \ 
>op monitor interval=10 timeout=20s 
> group aapche filesystem myip web \ 
>meta target-role=Started is-managed=true resource-stickiness=1000 
> location cli-prefer-aapche aapche role=Started 10: sles2 
> property cib-bootstrap-options: \ 
>stonith-enabled=true \ 
>no-quorum-policy=ignore \ 
>placement-strategy=balanced \ 
>expected-quorum-votes=2 \ 
>dc-version=1.1.12-f47ea56 \ 
>cluster-infrastructure="classic openais (with plugin)" \ 
>last-lrm-refresh=1440502955 \ 
>stonith-timeout=40s 
> rsc_defaults rsc-options: \ 
>resource-stickiness=1000 \ 
>migration-threshold=3 
> op_defaults op-options: \ 
>timeout=600 \ 
>record-pending=true 
> 
> 
> 
> and after migration:
> 
> 
> node sles1 
> node sles2 
> primitive filesystem Filesystem \ 
>params fstype=ext3 directory="/srv/www/vhosts" device="/dev/xvdd1" \ 
>op start interval=0 timeout=60 \ 
>op stop interval=0 timeout=60 \ 
>op monitor interval=20 timeout=40 
> primitive myip IPaddr2 \ 
>params ip=10.9.131.86 \ 
>op start interval=0 timeout=20s \ 
>op stop interval=0 timeout=20s \ 
>op monitor interval=10s timeout=20s 
> primitive stonith_sbd stonith:external/sbd \ 
>params pcmk_delay_max=30 
> primitive web apache \ 
>params configfile="/etc/apache2/httpd.conf" \ 
>op start interval=0 timeout=40s \ 
>op stop interval=0 timeout=60s \ 
>op monitor interval=10 timeout=20s 
> group aapche filesystem myip web \ 
>meta target-role=Started is-managed=true resource-stickiness=1000 
> location cli-prefer-aapche aapche role=Started inf: sles2 
> property cib-bootstrap-options: \ 
>stonith-enabled=true \ 
>no-quorum-policy=ignore \ 
>placement-strategy=balanced \ 
>expected-quorum-votes=2 \ 
>dc-version=1.1.12-f47ea56 \ 
>cluster-infrastructure="classic openais (with plugin)" \ 
>last-lrm-refresh=1440502955 \ 
>stonith-timeout=40s 
> rsc_defaults rsc-options: \ 
>resource-stickiness=1000 \ 
>migration-threshold=3 
> op_defaults op-options: \ 
>timeout=600 \ 
>record-pending=true
> 
> 
> From: Rakovec Jost
> Sent: Wednesday, August 26, 2015 1:33 PM
> To: users@clusterlabs.org
> Subject: resource-stickiness
>  
> Hi list,
> 
> 
> I have configured a simple cluster on SLES 11 SP4 and have a problem with 
> "auto_failover off". The problem is that whenever I migrate the resource group 
> via HAWK, my configuration changes from:
> 
> location cli-prefer-aapche aapche role=Started 10: sles2
> 
> to:
> 
> location cli-ban-aapche-on-sles1 aapche role=Started -inf: sles1
> 
> 
> It keeps changing to inf. 
> 
> 
> and then after the node is fenced, the resource moves back to the original 
> node, which I don't want. How can I avoid this situation?
> 
> my configuration is:
> 
> node sles1 
> node sles2 
> primitive filesystem Filesystem \ 
>params fstype=ext3 directory="/srv/www/vhosts" device="/dev/xvdd1" \ 
>op start interval=0 timeout=60 \ 
>op stop interval=0 timeout=60 \ 
>op monitor interval=20 timeout=40 
> primitive myip IPaddr2 \ 
>params ip=x.x.x.x \ 
>op start interval=0 timeout=20s \ 
>op stop interval=0 timeout=20s \ 
>op monitor interval=10s timeout=20s 
> primitive stonith_sbd stonith:external/sbd \ 
>params pcmk_delay_max=30 
> primitive web apache \ 
>params configfile="/etc/apache2/httpd.conf" \ 
>op start interval=0 timeout=40s \ 
>op stop interval=0 timeout=60s \ 
>op monitor interval=10 timeout=20s 
> group aapche filesystem myip web \ 
>meta target-rol

Re: [ClusterLabs] fence_sanlock and pacemaker

2015-08-26 Thread Andrew Beekhof

> On 27 Aug 2015, at 4:11 am, Laurent B.  wrote:
> 
> Gents,
> 
> I'm trying to configure an HA cluster with RHEL 6.5. Everything goes well
> except the fencing. The cluster's nodes are not connected to the
> management LAN (where all the iLO/UPS/APC devices sit), and it's not
> planned to connect them to this LAN.
> 
> With these constraints, I figured out that a way to get fencing working
> is to use *fence_sanlock*. I followed this tutorial:
> https://alteeve.ca/w/Watchdog_Recovery and it worked (I got some
> problems with SELinux, which I finally disabled as specified in the
> following RHEL 6.5 release notes:
> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html-single/6.5_Technical_Notes/
> )
> 
> So far so good. The problem is that fence_sanlock relies on cman and not
> pacemaker. So with stonith disabled, pacemaker restarts the resources
> without waiting for the victim to be fenced, and with stonith enabled,
> pacemaker complains about the lack of stonith resources and blocks the
> whole cluster.
> I tried to put fence_sanlock as a stonith resource at the pacemaker
> level, but as explained at
> http://oss.clusterlabs.org/pipermail/pacemaker/2013-May/017980.html it
> does not work, and as explained at
> https://bugzilla.redhat.com/show_bug.cgi?id=962088 it's not planned to
> make it work.
> 
> My question: is there a workaround?

You’d have to build it yourself, but sbd could be an option

> 
> Thank you,
> 
> Laurent
> 


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] multiple drives looks like balancing but why and causing troubles

2015-08-26 Thread Digimer
On 26/08/15 02:46 PM, Streeter, Michelle N wrote:
> I have a two-node cluster.  Both nodes are virtual and have five shared
> drives attached via a SAS controller.  For some reason, the cluster shows
> half the drives started on each node.  Not sure if this is called split
> brain or not.  It definitely looks like load balancing, but I did not set
> up load balancing.  On my client, I only see the data for the shares on
> the active cluster node, but they should all be on the active cluster
> node.  Any suggestions as to why this is happening?  Is there a setting
> so that everything runs on only one node at a time?

Can you explain what you mean by "shared drives"? Are these iSCSI LUNs
or direct connections to either port on SAS drives?

A split-brain is when either node thinks the other is dead and is
operating without coordinating with the peer. It is a disastrous
situation with shared storage, and it is exactly what fencing (stonith),
which you don't have configured, prevents.

If you are using KVM, use fence_virsh or fence_virt. If you're using
VMware, use fence_vmware. Please make this a priority before solving
your storage issue.
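
For example, on libvirt/KVM hosts something along these lines is a starting
point (a sketch only; host names, credentials and parameter names are
illustrative and vary by fence-agent version):

pcs stonith create fence_nas01 fence_virsh ipaddr=kvm-host.example.com login=root passwd=secret port=nas01 pcmk_host_list=nas01
pcs stonith create fence_nas02 fence_virsh ipaddr=kvm-host.example.com login=root passwd=secret port=nas02 pcmk_host_list=nas02
pcs property set stonith-enabled=true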

> pcs cluster status:
> 
> Cluster name: CNAS
> 
> Last updated: Wed Aug 26 13:35:47 2015
> 
> Last change: Wed Aug 26 13:28:55 2015
> 
> Stack: classic openais (with plugin)
> 
> Current DC: nas02 - partition with quorum
> 
> Version: 1.1.11-97629de
> 
> 2 Nodes configured, 2 expected votes
> 
> 11 Resources configured
> 
>  
> 
>  
> 
> Online: [ nas01 nas02 ]
> 
>  
> 
> Full list of resources:
> 
>  
> 
> NAS(ocf::heartbeat:IPaddr2):   Started nas01
> 
> Resource Group: datag
> 
>  datashare  (ocf::heartbeat:Filesystem):Started nas02
> 
>  dataserver (ocf::heartbeat:nfsserver): Started nas02
> 
> Resource Group: oomtlg
> 
>  oomtlshare (ocf::heartbeat:Filesystem):Started nas01
> 
>  oomtlserver(ocf::heartbeat:nfsserver): Started nas01
> 
> Resource Group: oomtrg
> 
>  oomtrshare (ocf::heartbeat:Filesystem):Started nas02
> 
>  oomtrserver(ocf::heartbeat:nfsserver): Started nas02
> 
> Resource Group: oomblg
> 
>  oomblshare (ocf::heartbeat:Filesystem):Started nas01
> 
>  oomblserver(ocf::heartbeat:nfsserver): Started nas01
> 
> Resource Group: oombrg
> 
>  oombrshare (ocf::heartbeat:Filesystem):Started nas02
> 
>  oombrserver(ocf::heartbeat:nfsserver): Started nas02
> 
>  
> 
> pcs config show:
> 
> Cluster Name: CNAS
> 
> Corosync Nodes:
> 
> nas01 nas02
> 
> Pacemaker Nodes:
> 
> nas01 nas02
> 
>  
> 
> Resources:
> 
> Resource: NAS (class=ocf provider=heartbeat type=IPaddr2)
> 
>   Attributes: ip=192.168.56.110 cidr_netmask=24
> 
>   Operations: start interval=0s timeout=20s (NAS-start-timeout-20s)
> 
>   stop interval=0s timeout=20s (NAS-stop-timeout-20s)
> 
>   monitor interval=10s timeout=20s (NAS-monitor-interval-10s)
> 
> Group: datag
> 
>   Resource: datashare (class=ocf provider=heartbeat type=Filesystem)
> 
>Attributes: device=/dev/sdb1 directory=/data fstype=ext4
> 
>Operations: start interval=0s timeout=60 (datashare-start-timeout-60)
> 
>stop interval=0s timeout=60 (datashare-stop-timeout-60)
> 
>monitor interval=20 timeout=40
> (datashare-monitor-interval-20)
> 
>   Resource: dataserver (class=ocf provider=heartbeat type=nfsserver)
> 
>Attributes: nfs_shared_infodir=/data/nfsinfo nfs_no_notify=true
> 
>Operations: start interval=0s timeout=40 (dataserver-start-timeout-40)
> 
>stop interval=0s timeout=20s (dataserver-stop-timeout-20s)
> 
>monitor interval=10 timeout=20s
> (dataserver-monitor-interval-10)
> 
> Group: oomtlg
> 
>   Resource: oomtlshare (class=ocf provider=heartbeat type=Filesystem)
> 
>Attributes: device=/dev/sdc1 directory=/oomtl fstype=ext4
> 
>Operations: start interval=0s timeout=60 (oomtlshare-start-timeout-60)
> 
>stop interval=0s timeout=60 (oomtlshare-stop-timeout-60)
> 
>monitor interval=20 timeout=40
> (oomtlshare-monitor-interval-20)
> 
>   Resource: oomtlserver (class=ocf provider=heartbeat type=nfsserver)
> 
>Attributes: nfs_shared_infodir=/oomtl/nfsinfo nfs_no_notify=true
> 
>Operations: start interval=0s timeout=40 (oomtlserver-start-timeout-40)
> 
>stop interval=0s timeout=20s (oomtlserver-stop-timeout-20s)
> 
>monitor interval=10 timeout=20s
> (oomtlserver-monitor-interval-10)
> 
> Group: oomtrg
> 
>   Resource: oomtrshare (class=ocf provider=heartbeat type=Filesystem)
> 
>Attributes: device=/dev/sdd1 directory=/oomtr fstype=ext4
> 
>Operations: start interval=0s timeout=60 (oomtrshare-start-timeout-60)
> 
>stop interval=0s timeout=60 (oomtrshare-stop-timeout-60)
> 
>monitor interval=20 timeout=40
> (oomtrshare-monitor-interval-20)
> 
>   Resource: oomtrserver (class=ocf provider=heartb

[ClusterLabs] multiple drives looks like balancing but why and causing troubles

2015-08-26 Thread Streeter, Michelle N
I have a two-node cluster.  Both nodes are virtual and have five shared drives 
attached via a SAS controller.  For some reason, the cluster shows half the 
drives started on each node.  Not sure if this is called split brain or not.  
It definitely looks like load balancing, but I did not set up load balancing.  
On my client, I only see the data for the shares on the active cluster node, 
but they should all be on the active cluster node.  Any suggestions as to why 
this is happening?  Is there a setting so that everything runs on only one 
node at a time?

pcs cluster status:
Cluster name: CNAS
Last updated: Wed Aug 26 13:35:47 2015
Last change: Wed Aug 26 13:28:55 2015
Stack: classic openais (with plugin)
Current DC: nas02 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
11 Resources configured


Online: [ nas01 nas02 ]

Full list of resources:

NAS(ocf::heartbeat:IPaddr2):   Started nas01
Resource Group: datag
 datashare  (ocf::heartbeat:Filesystem):Started nas02
 dataserver (ocf::heartbeat:nfsserver): Started nas02
Resource Group: oomtlg
 oomtlshare (ocf::heartbeat:Filesystem):Started nas01
 oomtlserver(ocf::heartbeat:nfsserver): Started nas01
Resource Group: oomtrg
 oomtrshare (ocf::heartbeat:Filesystem):Started nas02
 oomtrserver(ocf::heartbeat:nfsserver): Started nas02
Resource Group: oomblg
 oomblshare (ocf::heartbeat:Filesystem):Started nas01
 oomblserver(ocf::heartbeat:nfsserver): Started nas01
Resource Group: oombrg
 oombrshare (ocf::heartbeat:Filesystem):Started nas02
 oombrserver(ocf::heartbeat:nfsserver): Started nas02

pcs config show:
Cluster Name: CNAS
Corosync Nodes:
nas01 nas02
Pacemaker Nodes:
nas01 nas02

Resources:
Resource: NAS (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=192.168.56.110 cidr_netmask=24
  Operations: start interval=0s timeout=20s (NAS-start-timeout-20s)
  stop interval=0s timeout=20s (NAS-stop-timeout-20s)
  monitor interval=10s timeout=20s (NAS-monitor-interval-10s)
Group: datag
  Resource: datashare (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/sdb1 directory=/data fstype=ext4
   Operations: start interval=0s timeout=60 (datashare-start-timeout-60)
   stop interval=0s timeout=60 (datashare-stop-timeout-60)
   monitor interval=20 timeout=40 (datashare-monitor-interval-20)
  Resource: dataserver (class=ocf provider=heartbeat type=nfsserver)
   Attributes: nfs_shared_infodir=/data/nfsinfo nfs_no_notify=true
   Operations: start interval=0s timeout=40 (dataserver-start-timeout-40)
   stop interval=0s timeout=20s (dataserver-stop-timeout-20s)
   monitor interval=10 timeout=20s (dataserver-monitor-interval-10)
Group: oomtlg
  Resource: oomtlshare (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/sdc1 directory=/oomtl fstype=ext4
   Operations: start interval=0s timeout=60 (oomtlshare-start-timeout-60)
   stop interval=0s timeout=60 (oomtlshare-stop-timeout-60)
   monitor interval=20 timeout=40 (oomtlshare-monitor-interval-20)
  Resource: oomtlserver (class=ocf provider=heartbeat type=nfsserver)
   Attributes: nfs_shared_infodir=/oomtl/nfsinfo nfs_no_notify=true
   Operations: start interval=0s timeout=40 (oomtlserver-start-timeout-40)
   stop interval=0s timeout=20s (oomtlserver-stop-timeout-20s)
   monitor interval=10 timeout=20s (oomtlserver-monitor-interval-10)
Group: oomtrg
  Resource: oomtrshare (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/sdd1 directory=/oomtr fstype=ext4
   Operations: start interval=0s timeout=60 (oomtrshare-start-timeout-60)
   stop interval=0s timeout=60 (oomtrshare-stop-timeout-60)
   monitor interval=20 timeout=40 (oomtrshare-monitor-interval-20)
  Resource: oomtrserver (class=ocf provider=heartbeat type=nfsserver)
   Attributes: nfs_shared_infodir=/oomtr/nfsinfo nfs_no_notify=true
   Operations: start interval=0s timeout=40 (oomtrserver-start-timeout-40)
   stop interval=0s timeout=20s (oomtrserver-stop-timeout-20s)
   monitor interval=10 timeout=20s (oomtrserver-monitor-interval-10)
Group: oomblg
  Resource: oomblshare (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/sde1 directory=/oombl fstype=ext4
   Operations: start interval=0s timeout=60 (oomblshare-start-timeout-60)
   stop interval=0s timeout=60 (oomblshare-stop-timeout-60)
   monitor interval=20 timeout=40 (oomblshare-monitor-interval-20)
  Resource: oomblserver (class=ocf provider=heartbeat type=nfsserver)
   Attributes: nfs_shared_infodir=/oombl/nfsinfo nfs_no_notify=true
   Operations: start interval=0s timeout=40 (oomblserver-start-timeout-40)
   stop interval=0s timeout=20s (oomblserver-stop-

[ClusterLabs] fence_sanlock and pacemaker

2015-08-26 Thread Laurent B.
Gents,

I'm trying to configure an HA cluster with RHEL 6.5. Everything goes well
except the fencing. The cluster's nodes are not connected to the
management LAN (where all the iLO/UPS/APC devices sit), and it's not
planned to connect them to this LAN.

With these constraints, I figured out that a way to get fencing working
is to use *fence_sanlock*. I followed this tutorial:
https://alteeve.ca/w/Watchdog_Recovery and it worked (I got some
problems with SELinux, which I finally disabled as specified in the
following RHEL 6.5 release notes:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html-single/6.5_Technical_Notes/
)

So far so good. The problem is that fence_sanlock relies on cman and not
pacemaker. So with stonith disabled, pacemaker restarts the resources
without waiting for the victim to be fenced, and with stonith enabled,
pacemaker complains about the lack of stonith resources and blocks the
whole cluster.
I tried to put fence_sanlock as a stonith resource at the pacemaker
level, but as explained at
http://oss.clusterlabs.org/pipermail/pacemaker/2013-May/017980.html it
does not work, and as explained at
https://bugzilla.redhat.com/show_bug.cgi?id=962088 it's not planned to
make it work.

My question: is there a workaround?

Thank you,

Laurent

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Antw: NFS exports

2015-08-26 Thread Ulrich Windl
>>> "Streeter, Michelle N"  schrieb am 
>>> 26.08.2015
um 15:42 in Nachricht
<9a18847a77a9a14da7e0fd240efcafc2504...@xch-phx-501.sw.nos.boeing.com>:
> I have been using the Linux /etc/exports file for my cluster's exports, and it 
> works fine this way as long as every node has it configured.
> 
> I tried to add the exportfs resource but this keeps failing.

Did you use fully qualified names?
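
For comparison, a minimal exportfs primitive usually looks something like the
following (a sketch with assumed values; clientspec, fsid and the group name
must match your setup, and the Filesystem resource providing the directory has
to start first):

pcs resource create nfs-export ocf:heartbeat:exportfs clientspec=192.168.56.0/24 options=rw,sync,no_root_squash directory=/data fsid=1 --group datag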

> 
> Is it preferred that we use /etc/exports or the exportfs resource with pacemaker?
> 
> Michelle Streeter
> ASC2 MCS - SDE/ACL/SDL/EDL OKC Software Engineer
> The Boeing Company





___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] NFS exports

2015-08-26 Thread Streeter, Michelle N
I have been using the Linux /etc/exports file for my cluster's exports, and it 
works fine this way as long as every node has it configured.

I tried to add the exportfs resource but this keeps failing.

Is it preferred that we use /etc/exports or the exportfs resource with pacemaker?

Michelle Streeter
ASC2 MCS - SDE/ACL/SDL/EDL OKC Software Engineer
The Boeing Company

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] resource-stickiness

2015-08-26 Thread Rakovec Jost
Sorry, one typo; the problem is the same:



location cli-prefer-aapche aapche role=Started 10: sles2

to:

location cli-prefer-aapche aapche role=Started inf: sles2



It keeps changing to infinity.




my configuration is:

node sles1
node sles2
primitive filesystem Filesystem \
   params fstype=ext3 directory="/srv/www/vhosts" device="/dev/xvdd1" \
   op start interval=0 timeout=60 \
   op stop interval=0 timeout=60 \
   op monitor interval=20 timeout=40
primitive myip IPaddr2 \
   params ip=x.x.x.x \
   op start interval=0 timeout=20s \
   op stop interval=0 timeout=20s \
   op monitor interval=10s timeout=20s
primitive stonith_sbd stonith:external/sbd \
   params pcmk_delay_max=30
primitive web apache \
   params configfile="/etc/apache2/httpd.conf" \
   op start interval=0 timeout=40s \
   op stop interval=0 timeout=60s \
   op monitor interval=10 timeout=20s
group aapche filesystem myip web \
   meta target-role=Started is-managed=true resource-stickiness=1000
location cli-prefer-aapche aapche role=Started 10: sles2
property cib-bootstrap-options: \
   stonith-enabled=true \
   no-quorum-policy=ignore \
   placement-strategy=balanced \
   expected-quorum-votes=2 \
   dc-version=1.1.12-f47ea56 \
   cluster-infrastructure="classic openais (with plugin)" \
   last-lrm-refresh=1440502955 \
   stonith-timeout=40s
rsc_defaults rsc-options: \
   resource-stickiness=1000 \
   migration-threshold=3
op_defaults op-options: \
   timeout=600 \
   record-pending=true



and after migration:


node sles1
node sles2
primitive filesystem Filesystem \
   params fstype=ext3 directory="/srv/www/vhosts" device="/dev/xvdd1" \
   op start interval=0 timeout=60 \
   op stop interval=0 timeout=60 \
   op monitor interval=20 timeout=40
primitive myip IPaddr2 \
   params ip=10.9.131.86 \
   op start interval=0 timeout=20s \
   op stop interval=0 timeout=20s \
   op monitor interval=10s timeout=20s
primitive stonith_sbd stonith:external/sbd \
   params pcmk_delay_max=30
primitive web apache \
   params configfile="/etc/apache2/httpd.conf" \
   op start interval=0 timeout=40s \
   op stop interval=0 timeout=60s \
   op monitor interval=10 timeout=20s
group aapche filesystem myip web \
   meta target-role=Started is-managed=true resource-stickiness=1000
location cli-prefer-aapche aapche role=Started inf: sles2
property cib-bootstrap-options: \
   stonith-enabled=true \
   no-quorum-policy=ignore \
   placement-strategy=balanced \
   expected-quorum-votes=2 \
   dc-version=1.1.12-f47ea56 \
   cluster-infrastructure="classic openais (with plugin)" \
   last-lrm-refresh=1440502955 \
   stonith-timeout=40s
rsc_defaults rsc-options: \
   resource-stickiness=1000 \
   migration-threshold=3
op_defaults op-options: \
   timeout=600 \
   record-pending=true




From: Rakovec Jost
Sent: Wednesday, August 26, 2015 1:33 PM
To: users@clusterlabs.org
Subject: resource-stickiness


Hi list,



I have configured a simple cluster on SLES 11 SP4 and have a problem with 
"auto_failover off". The problem is that whenever I migrate the resource group via 
HAWK, my configuration changes from:


location cli-prefer-aapche aapche role=Started 10: sles2

to:

location cli-ban-aapche-on-sles1 aapche role=Started -inf: sles1


It keeps changing to inf.


and then after the node is fenced, the resource moves back to the original node, 
which I don't want. How can I avoid this situation?

my configuration is:

node sles1
node sles2
primitive filesystem Filesystem \
   params fstype=ext3 directory="/srv/www/vhosts" device="/dev/xvdd1" \
   op start interval=0 timeout=60 \
   op stop interval=0 timeout=60 \
   op monitor interval=20 timeout=40
primitive myip IPaddr2 \
   params ip=x.x.x.x \
   op start interval=0 timeout=20s \
   op stop interval=0 timeout=20s \
   op monitor interval=10s timeout=20s
primitive stonith_sbd stonith:external/sbd \
   params pcmk_delay_max=30
primitive web apache \
   params configfile="/etc/apache2/httpd.conf" \
   op start interval=0 timeout=40s \
   op stop interval=0 timeout=60s \
   op monitor interval=10 timeout=20s
group aapche filesystem myip web \
   meta target-role=Started is-managed=true resource-stickiness=1000
location cli-prefer-aapche aapche role=Started 10: sles2
property cib-bootstrap-options: \
   stonith-enabled=true \
   no-quorum-policy=ignore \
   placement-strategy=balanced \
   expected-quorum-votes=2 \
   dc-version=1.1.12-f47ea56 \
   cluster-infrastructure="classic openais (with plugin)" \
   last-lrm-refresh=1440502955 \
   stonith-timeout=40s
rsc_defaults rsc-options: \
   resource-stickiness=1000 \
   migration-threshold=3
op_defaults op-options: \
   timeout=600 \
   record-pending=true

[ClusterLabs] resource-stickiness

2015-08-26 Thread Rakovec Jost
Hi list,



I have configured a simple cluster on SLES 11 SP4 and have a problem with 
"auto_failover off". The problem is that whenever I migrate the resource group via 
HAWK, my configuration changes from:


location cli-prefer-aapche aapche role=Started 10: sles2

to:

location cli-ban-aapche-on-sles1 aapche role=Started -inf: sles1


It keeps changing to inf.


and then after the node is fenced, the resource moves back to the original node, 
which I don't want. How can I avoid this situation?

my configuration is:

node sles1
node sles2
primitive filesystem Filesystem \
   params fstype=ext3 directory="/srv/www/vhosts" device="/dev/xvdd1" \
   op start interval=0 timeout=60 \
   op stop interval=0 timeout=60 \
   op monitor interval=20 timeout=40
primitive myip IPaddr2 \
   params ip=x.x.x.x \
   op start interval=0 timeout=20s \
   op stop interval=0 timeout=20s \
   op monitor interval=10s timeout=20s
primitive stonith_sbd stonith:external/sbd \
   params pcmk_delay_max=30
primitive web apache \
   params configfile="/etc/apache2/httpd.conf" \
   op start interval=0 timeout=40s \
   op stop interval=0 timeout=60s \
   op monitor interval=10 timeout=20s
group aapche filesystem myip web \
   meta target-role=Started is-managed=true resource-stickiness=1000
location cli-prefer-aapche aapche role=Started 10: sles2
property cib-bootstrap-options: \
   stonith-enabled=true \
   no-quorum-policy=ignore \
   placement-strategy=balanced \
   expected-quorum-votes=2 \
   dc-version=1.1.12-f47ea56 \
   cluster-infrastructure="classic openais (with plugin)" \
   last-lrm-refresh=1440502955 \
   stonith-timeout=40s
rsc_defaults rsc-options: \
   resource-stickiness=1000 \
   migration-threshold=3
op_defaults op-options: \
   timeout=600 \
   record-pending=true



and after migration:

node sles1
node sles2
primitive filesystem Filesystem \
   params fstype=ext3 directory="/srv/www/vhosts" device="/dev/xvdd1" \
   op start interval=0 timeout=60 \
   op stop interval=0 timeout=60 \
   op monitor interval=20 timeout=40
primitive myip IPaddr2 \
   params ip=10.9.131.86 \
   op start interval=0 timeout=20s \
   op stop interval=0 timeout=20s \
   op monitor interval=10s timeout=20s
primitive stonith_sbd stonith:external/sbd \
   params pcmk_delay_max=30
primitive web apache \
   params configfile="/etc/apache2/httpd.conf" \
   op start interval=0 timeout=40s \
   op stop interval=0 timeout=60s \
   op monitor interval=10 timeout=20s
group aapche filesystem myip web \
   meta target-role=Started is-managed=true resource-stickiness=1000
location cli-ban-aapche-on-sles1 aapche role=Started -inf: sles1
location cli-prefer-aapche aapche role=Started 10: sles2
property cib-bootstrap-options: \
   stonith-enabled=true \
   no-quorum-policy=ignore \
   placement-strategy=balanced \
   expected-quorum-votes=2 \
   dc-version=1.1.12-f47ea56 \
   cluster-infrastructure="classic openais (with plugin)" \
   last-lrm-refresh=1440502955 \
   stonith-timeout=40s
rsc_defaults rsc-options: \
   resource-stickiness=1000 \
   migration-threshold=3
op_defaults op-options: \
   timeout=600 \
   record-pending=true




thanks

Best Regards

Jost







___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Corosync GitHub vs. dev list

2015-08-26 Thread Ferenc Wagner
Jan Friesse  writes:

>> Since Corosync is hosted on GitHub, I wonder if it's enough to submit
>> pull requests/issues/patch comments there to get the developers
>
> Yes, gh is enough.

Thanks for the clarification and the quick action!
-- 
Regards,
Feri.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Corosync GitHub vs. dev list

2015-08-26 Thread Jan Friesse

Ferenc,



> Hi,
>
> Since Corosync is hosted on GitHub, I wonder if it's enough to submit
> pull requests/issues/patch comments there to get the developers


Yes, gh is enough.

Regards,
  Honza



> attention, or should I also post to develop...@clusterlabs.org?




___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org