Re: [ClusterLabs] 答复: the PAF switchover does not happen if the VIP resource is stopped

2018-04-26 Thread emmanuel segura
But I think using ifdown isn't the correct way to test the cluster; this
topic has been discussed many times.
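
A less disruptive way to simulate the failure, as a sketch (assuming enp0s3
is the VIP interface), is to drop traffic with iptables instead of downing
the interface:

  iptables -A INPUT -i enp0s3 -j DROP       # simulate the outage
  iptables -A OUTPUT -o enp0s3 -j DROP
  iptables -D INPUT -i enp0s3 -j DROP       # remove the rules afterwards
  iptables -D OUTPUT -o enp0s3 -j DROP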

2018-04-26 9:53 GMT+02:00 范国腾 :

> 1. There is no failure in initial status. sds1 is master
>
>
>
> 2. ifdown the sds1 VIP network card.
>
> 3. ifup the sds1 VIP network card and then ifdown sds2 VIP network card
>
>
>
>
>
> -----Original Message-----
> From: Jehan-Guillaume de Rorthais [mailto:j...@dalibo.com]
> Sent: April 26, 2018 15:07
> To: 范国腾 
> Cc: Cluster Labs - All topics related to open-source clustering welcomed <
> users@clusterlabs.org>; 李梦怡 
> Subject: Re: [ClusterLabs] the PAF switchover does not happen if the VIP
> resource is stopped
>
>
>
> On Thu, 26 Apr 2018 02:53:33 +
>
> 范国腾  wrote:
>
>
>
> > Hi Rorthais,
>
> >
>
> > Thank you for your help.
>
> >
>
> > The replication works at that time.
>
> >
>
> > I try again today.
>
> > (1) If I run "ifup enp0s3" in node2, then run "ifdown enp0s3" in
>
> > node1, the switchover issue could be reproduced. (2) But if I run
>
> > "ifup enp0s3" in node2, run "pcs resource cleanup mastergroup" to
>
> > clean the VIP resource, and there is no Failed Actions in "pcs
>
> > status", then run "ifdown enp0s3" in node1, it works. The switchover
> could happen again.
>
> >
>
> >
>
> > Is there any parameter to control this behavior so that I don't need
>
> > to execute the "pcs cleanup" command every time?
>
>
>
> Check the failcounts for each resource on each nodes (pcs resource
> failcount [...]).
>
> Check the scores as well (crm_simulate -sL).
>
>
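
As a concrete sketch, using the resource name from this thread: the failcount
and score checks look like the commands below, and a failure-timeout meta
attribute lets old failures expire on their own, so a manual "pcs resource
cleanup" is not needed every time (the 60s value is only an example):

  pcs resource failcount show master-vip     # failcount per node
  crm_simulate -sL                           # current placement scores
  pcs resource cleanup master-vip            # clear the failures by hand
  pcs resource meta master-vip failure-timeout=60s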
>
> >
>
> > -----Original Message-----
> 
> > From: Jehan-Guillaume de Rorthais [mailto:j...@dalibo.com]
> 
> > Sent: April 25, 2018 18:39
> 
> > To: 范国腾 
> 
> > Cc: Cluster Labs - All topics related to open-source clustering
> 
> > welcomed ; 李梦怡  Subject: Re:
> 
> > [ClusterLabs] the PAF switchover does not happen if the VIP resource
> 
> > is stopped
>
> >
>
> >
>
> > On Wed, 25 Apr 2018 08:58:34 +
>
> > 范国腾  wrote:
>
> >
>
> > >
>
> > > Our lab has two resource: (1) PAF (master/slave)(2) VIP (bind to
> the
>
> > > master PAF node). The configuration is in the attachment.
>
> > >
>
> > > Each node has two network card: One(enp0s8) is for the pacemaker
>
> > > heartbeat in internal network, the other(enp0s3) is for the master
>
> > > VIP in the external network.
>
> > >
>
> > >
>
> > >
>
> > > We are testing the following case: if the master VIP network card is
>
> > > down, the master postgres and VIP could switch to another node.
>
> > >
>
> > >
>
> > >
>
> > > 1. At first, node2 is master, I run "ifdown enp0s3" in node2, then
>
> > > node1 become the master, that is ok.
>
> > >
>
> > > 2. Then I run "ifup enp0s3" in node2, wait for 60 seconds,
>
> >
>
> > Did you check PostgreSQL instances were replicating again?
>
> >
>
> > > then run "ifdown enp0s3" in node1, but node1 is still the master.
>
> > > Why doesn't the switchover happen? How can we recover to make the
> > > system work?
>


Re: [ClusterLabs] the PAF switchover does not happen if the VIP resource is stopped

2018-04-25 Thread emmanuel segura
https://oss.clusterlabs.org/pipermail/pacemaker/2013-July/019224.html

2018-04-25 10:58 GMT+02:00 范国腾 :

> Hi,
>
>
>
> Our lab has two resource: (1) PAF (master/slave)(2) VIP (bind to the
> master PAF node). The configuration is in the attachment.
>
> Each node has two network card: One(enp0s8) is for the pacemaker heartbeat
> in internal network, the other(enp0s3) is for the master VIP in the
> external network.
>
>
>
> We are testing the following case: if the master VIP network card is down,
> the master postgres and VIP could switch to another node.
>
>
>
> 1. At first, node2 is master, I run "ifdown enp0s3" in node2, then node1
> become the master, that is ok.
>
>
>
>
>
>
>
> 2. Then I run "ifup enp0s3" in node2, wait for 60 seconds, then run
> "ifdown enp0s3" in node1, but the node1 still be master. Why does
> switchover doesn't happened? How to recover to make system work?
>
>
>
> The log is in the attachment. Node1 reports the following waring:
>
>
>
> Apr 25 04:49:27 node1 crmd[24678]:  notice: State transition S_IDLE ->
> S_POLICY_ENGINE
>
> Apr 25 04:49:27 node1 pengine[24677]: warning: Processing failed op start
> for master-vip on sds2: unknown error (1)
>
> Apr 25 04:49:27 node1 pengine[24677]: warning: Processing failed op start
> for master-vip on sds1: unknown error (1)
>
> Apr 25 04:49:27 node1 pengine[24677]: warning: Forcing master-vip away
> from sds1 after 100 failures (max=100)
>
> Apr 25 04:49:27 node1 pengine[24677]: warning: Forcing master-vip away
> from sds2 after 100 failures (max=100)
>
> Apr 25 04:49:27 node1 pengine[24677]:  notice: Calculated transition 14,
> saving inputs in /var/lib/pacemaker/pengine/pe-input-59.bz2
>
> Apr 25 04:49:27 node1 crmd[24678]:  notice: Transition 14 (Complete=0,
> Pending=0, Fired=0, Skipped=0, Incomplete=0, 
> Source=/var/lib/pacemaker/pengine/pe-input-59.bz2):
> Complete
>
>
>
>
>


Re: [ClusterLabs] HALVM monitor action fail on slave node. Possible bug?

2018-04-13 Thread emmanuel segura
The first thing you need to configure is stonith, because you have this
constraint: "constraint order promote DrbdResClone then start HALVM".

To recover and promote DRBD to master when you crash a node, configure the
DRBD fencing handler.

Pacemaker executes the monitor action on both nodes, so that part is normal.
To test why the monitor fails, use ocf-tester.
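
As a rough sketch (resource and volume group names are taken from this
thread; paths may differ by distribution), the DRBD fencing handler and an
ocf-tester run would look like this:

  # in the DRBD resource (or common) configuration:
  disk {
      fencing resource-and-stonith;
  }
  handlers {
      fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
      after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }

  # test the LVM agent outside the cluster:
  ocf-tester -n HALVM -o volgrpname=havolumegroup /usr/lib/ocf/resource.d/heartbeat/LVM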

2018-04-13 15:29 GMT+02:00 Marco Marino :

> Hello, I'm trying to configure a simple 2 node cluster with drbd and HALVM
> (ocf:heartbeat:LVM) but I have a problem that I'm not able to solve, so I
> decided to write this long post. I need to really understand what I'm doing
> and where I'm going wrong.
> More precisely, I'm configuring a pacemaker cluster with 2 nodes and only
> one drbd resource. Here all operations:
>
> - System configuration
> hostnamectl set-hostname pcmk[12]
> yum update -y
> yum install vim wget git -y
> vim /etc/sysconfig/selinux  -> permissive mode
> systemctl disable firewalld
> reboot
>
> - Network configuration
> [pcmk1]
> nmcli connection modify corosync ipv4.method manual ipv4.addresses
> 192.168.198.201/24 ipv6.method ignore connection.autoconnect yes
> nmcli connection modify replication ipv4.method manual ipv4.addresses
> 192.168.199.201/24 ipv6.method ignore connection.autoconnect yes
> [pcmk2]
> nmcli connection modify corosync ipv4.method manual ipv4.addresses
> 192.168.198.202/24 ipv6.method ignore connection.autoconnect yes
> nmcli connection modify replication ipv4.method manual ipv4.addresses
> 192.168.199.202/24 ipv6.method ignore connection.autoconnect yes
>
> ssh-keygen -t rsa
> ssh-copy-id root@pcmk[12]
> scp /etc/hosts root@pcmk2:/etc/hosts
>
> - Drbd Repo configuration and drbd installation
> rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
> rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.
> noarch.rpm
> yum update -y
> yum install drbd84-utils kmod-drbd84 -y
>
> - Drbd Configuration:
> Creating a new partition on top of /dev/vdb -> /dev/vdb1 of type
> "Linux" (83)
> [/etc/drbd.d/global_common.conf]
> usage-count no;
> [/etc/drbd.d/myres.res]
> resource myres {
> on pcmk1 {
> device /dev/drbd0;
> disk /dev/vdb1;
> address 192.168.199.201:7789;
> meta-disk internal;
> }
> on pcmk2 {
> device /dev/drbd0;
> disk /dev/vdb1;
> address 192.168.199.202:7789;
> meta-disk internal;
> }
> }
>
> scp /etc/drbd.d/myres.res root@pcmk2:/etc/drbd.d/myres.res
> systemctl start drbd <-- only for test. The service is disabled at
> boot!
> drbdadm create-md myres
> drbdadm up myres
> drbdadm primary --force myres
>
> - LVM Configuration
> [root@pcmk1 ~]# lsblk
> NAMEMAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
> sr0  11:01 1024M  0 rom
> vda 252:00   20G  0 disk
> ├─vda1  252:101G  0 part /boot
> └─vda2  252:20   19G  0 part
>   ├─cl-root 253:00   17G  0 lvm  /
>   └─cl-swap 253:102G  0 lvm  [SWAP]
> vdb 252:16   08G  0 disk
> └─vdb1  252:17   08G  0 part  <--- /dev/vdb1 is the partition
> I'd like to use as backing device for drbd
>   └─drbd0   147:008G  0 disk
>
> [/etc/lvm/lvm.conf]
> write_cache_state = 0
> use_lvmetad = 0
> filter = [ "a|drbd.*|", "a|vda.*|", "r|.*|" ]
>
> Disabling lvmetad service
> systemctl disable lvm2-lvmetad.service
> systemctl disable lvm2-lvmetad.socket
> reboot
>
> - Creating volume group and logical volume
> systemctl start drbd (both nodes)
> drbdadm primary myres
> pvcreate /dev/drbd0
> vgcreate havolumegroup /dev/drbd0
> lvcreate -n c-vol1 -L1G havolumegroup
> [root@pcmk1 ~]# lvs
> LV VGAttr   LSize   Pool Origin Data%  Meta%
> Move Log Cpy%Sync Convert
> root   cl-wi-ao <17.00g
>
> swap   cl-wi-ao   2.00g
>
> c-vol1 havolumegroup -wi-a-   1.00g
>
>
> - Cluster Configuration
> yum install pcs fence-agents-all -y
> systemctl enable pcsd
> systemctl start pcsd
> echo redhat | passwd --stdin hacluster
> pcs cluster auth pcmk1 pcmk2
> pcs cluster setup --name ha_cluster pcmk1 pcmk2
> pcs cluster start --all
> pcs cluster enable --all
> pcs property set stonith-enabled=false<--- Just for test!!!
> pcs property set no-quorum-policy=ignore
>
> - Drbd resource configuration
> pcs cluster cib drbd_cfg
> pcs -f drbd_cfg resource create DrbdRes ocf:linbit:drbd
> drbd_resource=myres op monitor interval=60s
> pcs -f drbd_cfg resource master DrbdResClone DrbdRes master-max=1
> master-node-max=1 clone-max=2 clone-node-max=1 notify=true
> [root@pcmk1 ~]# pcs -f drbd_cfg 

Re: [ClusterLabs] custom resource agent FAILED (blocked)

2018-04-12 Thread emmanuel segura
The start function needs to start the resource when monitor does not return
success.
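
A minimal sketch of what that could look like, keeping the function names
from the post below (the hdfs-ha.sh script is the poster's own, and
OCF_NOT_RUNNING is the return code for a cleanly stopped resource):

  HDFSHA_monitor() {
      active_nn=$(hdfs haadmin -getAllServiceState | grep active | cut -d":" -f1)
      if [[ ${active_nn} == $(uname -n) ]]; then
          return $OCF_SUCCESS
      fi
      return $OCF_NOT_RUNNING    # cleanly stopped on this node
  }

  HDFSHA_start() {
      HDFSHA_monitor && return $OCF_SUCCESS    # already running, nothing to do
      /opt/hadoop/sbin/hdfs-ha.sh start        # start only when NOT running
      HDFSHA_monitor                           # report whether it actually came up
  }

The 'insufficient privileges' (rc=4) in the failed stop actions also suggests
checking which user the hdfs commands need to run as when Pacemaker (root)
calls the agent.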

2018-04-12 23:38 GMT+02:00 Bishoy Mikhael :

> Hi All,
>
> I'm trying to create a resource agent to promote a standby HDFS namenode
> to active when the virtual IP failover to another node.
>
> I've taken the skeleton from the Dummy OCF agent.
>
> The modifications I've done to the Dummy agent are as follows:
>
> HDFSHA_start() {
> HDFSHA_monitor
> if [ $? =  $OCF_SUCCESS ]; then
> /opt/hadoop/sbin/hdfs-ha.sh start
> return $OCF_SUCCESS
> fi
> }
>
> HDFSHA_stop() {
> HDFSHA_monitor
> if [ $? =  $OCF_SUCCESS ]; then
> /opt/hadoop/sbin/hdfs-ha.sh stop
> fi
> return $OCF_SUCCESS
> }
>
> HDFSHA_monitor() {
> # Monitor _MUST!_ differentiate correctly between running
> # (SUCCESS), failed (ERROR) or _cleanly_ stopped (NOT RUNNING).
> # That is THREE states, not just yes/no.
> active_nn=$(hdfs haadmin -getAllServiceState | grep active | cut -d":" -f
> 1)
> current_node=$(uname -n)
> if [[ ${active_nn} == ${current_node} ]]; then
>return $OCF_SUCCESS
> fi
> }
>
> HDFSHA_validate() {
>
> return $OCF_SUCCESS
> }
>
>
> I've created the resource as follows:
>
> # pcs resource create hdfs-ha ocf:heartbeat:HDFSHA op monitor interval=30s
>
>
> The resource fails right away as follows:
>
>
> # pcs status
>
> Cluster name: hdfs_cluster
>
> Stack: corosync
>
> Current DC: taulog (version 1.1.16-12.el7_4.8-94ff4df) - partition with
> quorum
>
> Last updated: Thu Apr 12 03:30:57 2018
>
> Last change: Thu Apr 12 03:30:54 2018 by root via cibadmin on lingcod
>
>
> 3 nodes configured
>
> 2 resources configured
>
>
> Online: [ dentex lingcod taulog ]
>
>
> Full list of resources:
>
>
>  VirtualIP (ocf::heartbeat:IPaddr2): Started taulog
>
>  hdfs-ha (ocf::heartbeat:HDFSHA): FAILED (blocked)[ taulog dentex ]
>
>
> Failed Actions:
>
> * hdfs-ha_stop_0 on taulog 'insufficient privileges' (4): call=12,
> status=complete, exitreason='none',
>
> last-rc-change='Thu Apr 12 03:17:37 2018', queued=0ms, exec=1ms
>
> * hdfs-ha_stop_0 on dentex 'insufficient privileges' (4): call=10,
> status=complete, exitreason='none',
>
> last-rc-change='Thu Apr 12 03:17:43 2018', queued=0ms, exec=1ms
>
>
>
> Daemon Status:
>
>   corosync: active/enabled
>
>   pacemaker: active/enabled
>
>   pcsd: active/enabled
>
> I debug the resource as follows, and it returns 0
>
> # pcs resource debug-monitor hdfs-ha
>
> Operation monitor for hdfs-ha (ocf:heartbeat:HDFSHA) returned 0
>
>  >  stderr: DEBUG: hdfs-ha monitor : 0
>
>
> # pcs resource debug-stop hdfs-ha
>
> Operation stop for hdfs-ha (ocf:heartbeat:HDFSHA) returned 0
>
>  >  stderr: DEBUG: hdfs-ha stop : 0
>
>
> # pcs resource debug-start hdfs-ha
>
> Operation start for hdfs-ha (ocf:heartbeat:HDFSHA) returned 0
>
>  >  stderr: DEBUG: hdfs-ha start : 0
>
>
>
> I don't understand what am I doing wrong!
>
>
> Regards,
>
> Bishoy Mikhael
>


Re: [ClusterLabs] pacemaker pingd with ms drbd = double masters short time when disconnected networks.

2017-12-19 Thread emmanuel segura
You need to configure stonith and the DRBD fencing handler.
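
A sketch in crm syntax (the stonith plugin and its parameters are
placeholders; pick an agent that matches your hardware, and combine it with
"fencing resource-and-stonith;" plus the crm-fence-peer.sh /
crm-unfence-peer.sh handlers on the DRBD side):

  primitive fence_pp1 stonith:external/ipmi \
      params hostname=pp-pacemaker1.heliosoft.ru ipaddr=<ipmi-ip> userid=<user> passwd=<secret>
  primitive fence_pp2 stonith:external/ipmi \
      params hostname=pp-pacemaker2.heliosoft.ru ipaddr=<ipmi-ip> userid=<user> passwd=<secret>
  location l_fence_pp1 fence_pp1 -inf: pp-pacemaker1.heliosoft.ru
  location l_fence_pp2 fence_pp2 -inf: pp-pacemaker2.heliosoft.ru
  property stonith-enabled=true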

2017-12-19 8:19 GMT+01:00 Прокопов Павел :

> Hello!
>
> pacemaker pingd with ms drbd = double masters short time when disconnected
> networks.
>
> My crm config:
>
> node 168885811: pp-pacemaker1.heliosoft.ru
> node 168885812: pp-pacemaker2.heliosoft.ru
> primitive drbd1 ocf:linbit:drbd \
> params drbd_resource=drbd1 \
> op monitor interval=60s \
> op start interval=15 timeout=240s \
> op stop interval=15 timeout=240s \
> op monitor role=Master interval=30s \
> op monitor role=Slave interval=60s
> primitive fs_drbd1 Filesystem \
> params device="/dev/drbd1" directory="/mnt/drbd1" fstype=ext4
> options=noatime
> primitive pinger ocf:pacemaker:ping \
> params host_list=10.16.4.1 multiplier=100 \
> op monitor interval=15s \
> op start interval=0 timeout=5s \
> op stop interval=0
> primitive vip IPaddr2 \
> params ip=10.16.5.227 nic=eth0 \
> op monitor interval=10s
> primitive vip2 IPaddr2 \
> params ip=10.16.254.50 nic=eth1 \
> op monitor interval=10s
> group group_master fs_drbd1 vip vip2
> ms ms_drbd1 drbd1 \
> meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
> notify=true
> clone pingerclone pinger \
> meta globally-unique=false
> colocation colocation_master inf: ms_drbd1:Master group_master
> location location_master_ms_drbd1 ms_drbd1 \
> rule $role=Master -inf: not_defined pingd or pingd lte 0
> order main_order Mandatory: pingerclone:start ms_drbd1:promote
> group_master:start
> property cib-bootstrap-options: \
> stonith-enabled=false \
> no-quorum-policy=ignore \
> default-resource-stickiness=500 \
> cluster-name=pp1
>
> root@pp-pacemaker2:~# crm_mon -1
> Stack: corosync
> Current DC: pp-pacemaker2.heliosoft.ru (version 1.1.16-94ff4df) -
> partition with quorum
> Last updated: Fri Dec 15 13:48:10 2017
> Last change: Fri Dec 15 13:46:38 2017 by root via cibadmin on
> pp-pacemaker1.heliosoft.ru
>
> 2 nodes configured
> 7 resources configured
>
> Online: [ pp-pacemaker1.heliosoft.ru pp-pacemaker2.heliosoft.ru ]
>
> Active resources:
>
>  Resource Group: group_master
>  fs_drbd1(ocf::heartbeat:Filesystem):Started
> pp-pacemaker1.heliosoft.ru
>  vip(ocf::heartbeat:IPaddr2):Started
> pp-pacemaker1.heliosoft.ru
>  vip2(ocf::heartbeat:IPaddr2):Started
> pp-pacemaker1.heliosoft.ru
>  Master/Slave Set: ms_drbd1 [drbd1]
>  Masters: [ pp-pacemaker1.heliosoft.ru ]
>  Slaves: [ pp-pacemaker2.heliosoft.ru ]
>  Clone Set: pingerclone [pinger]
>  Started: [ pp-pacemaker1.heliosoft.ru pp-pacemaker2.heliosoft.ru ]
> #end crm_mon
>
> When I disconnect pp-pacemaker2 from all networks, I have:
> root@pp-pacemaker2:~# crm_mon -1
> Stack: corosync
> Current DC: pp-pacemaker2.heliosoft.ru (version 1.1.16-94ff4df) -
> partition with quorum
> Last updated: Fri Dec 15 13:53:15 2017
> Last change: Fri Dec 15 13:53:00 2017 by root via cibadmin on
> pp-pacemaker2.heliosoft.ru
>
> 2 nodes configured
> 7 resources configured
>
> Online: [ pp-pacemaker2.heliosoft.ru]
> OFFLINE: [pp-pacemaker1.heliosoft.ru ]
>
> Active resources:
>
>  Resource Group: group_master
>  fs_drbd1(ocf::heartbeat:Filesystem):Started
> pp-pacemaker2.heliosoft.ru
>  vip(ocf::heartbeat:IPaddr2):Started
> pp-pacemaker2.heliosoft.ru
>  vip2(ocf::heartbeat:IPaddr2):Started
> pp-pacemaker2.heliosoft.ru
>  Master/Slave Set: ms_drbd1 [drbd1]
>  Masters: [ pp-pacemaker2.heliosoft.ru ]
>  Clone Set: pingerclone [pinger]
>  Started: [ pp-pacemaker2.heliosoft.ru ]
> #end crm_mon
>
> Wait 5 seconds.
>
> root@pp-pacemaker2:~# crm_mon -1
> Stack: corosync
> Current DC: pp-pacemaker2.heliosoft.ru (version 1.1.16-94ff4df) -
> partition with quorum
> Last updated: Fri Dec 15 13:48:10 2017
> Last change: Fri Dec 15 13:46:38 2017 by root via cibadmin on
> pp-pacemaker1.heliosoft.ru
>
> 2 nodes configured
> 7 resources configured
>
> Online: [ pp-pacemaker2.heliosoft.ru
> OFFLINE: [pp-pacemaker1.heliosoft.ru ]
>
> Active resources:
>
>  Master/Slave Set: ms_drbd1 [drbd1]
>  Slaves: [ pp-pacemaker2.heliosoft.ru ]
>  Clone Set: pingerclone [pinger]
>  Started: [ pp-pacemaker2.heliosoft.ru ]
> #end crm_mon
>
> Why does pp-pacemaker2 become master first? It breaks drbd.
>
>
>
>


Re: [ClusterLabs] set node in maintenance - stop corosync - node is fenced - is that correct ?

2017-10-16 Thread emmanuel segura
"I put a node in maintenance mode"?

Do you mean you put the cluster in maintenance mode?
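
For reference, a sketch in crm shell syntax (as on SLES; <node> is a
placeholder):

  crm configure property maintenance-mode=true    # whole cluster: nothing is managed
  crm configure property maintenance-mode=false

  crm node standby <node>                         # one node: resources are moved away
  crm node online <node>

  crm node maintenance <node>                     # one node: resources left running, unmanaged
  crm node ready <node>                           # (needs a reasonably recent crmsh)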

2017-10-16 19:24 GMT+02:00 Lentes, Bernd :

> Hi,
>
> i have the following behavior: I put a node in maintenance mode,
> afterwards stop corosync on that node with /etc/init.d/openais stop.
> This node is immediately fenced. Is that expected behavior ? I thought
> putting a node into maintenance does mean the cluster does not care anymore
> about that node.
>
> OS on my nodes is SLES 11 SP4.
>
> Thanks.
>
>
> Bernd
>
> --
> Bernd Lentes
>
> Systemadministration
> institute of developmental genetics
> Gebäude 35.34 - Raum 208
> HelmholtzZentrum München
> bernd.len...@helmholtz-muenchen.de
> phone: +49 (0)89 3187 1241
> fax: +49 (0)89 3187 2294
>
> no backup - no mercy
>
>
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
> Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons
> Enhsen
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671
>
>


Re: [ClusterLabs] IP clone issue

2017-09-05 Thread emmanuel segura
If you have two copies of the clone on the same node, it cannot work, because
that is like having a duplicate IP on the same node; this happens because you
are using clone-node-max="2".

2017-09-05 16:15 GMT+02:00 Octavian Ciobanu <coctavian1...@gmail.com>:

> Based on ocf:heartbeat:IPaddr2 man page it can be used without an static
> IP address if the kernel has net.ipv4.conf.all.promote_secondaries=1.
>
> "There must be at least one static IP address, which is not managed by the
> cluster, assigned to the network interface. If you can not assign any
> static IP address on the interface, modify this kernel parameter: sysctl -w
> net.ipv4.conf.all.promote_secondaries=1 (or per device)"
>
> This kernel parameter is set by default in CentOS 7.3.
>
> With clone-node-max="1" it works as it should be but with
> clone-node-max="2" both instances of VIP are started on the same node even
> if the other node is online.
>
> Pacemaker 1.1 Cluster from Scratch say that
> "clone-node-max=2 says that one node can run up to 2 instances of the
> clone. This should also equal the number of nodes that can host the IP, so
> that if any node goes down, another node can take over the failed node’s
> "request bucket". Otherwise, requests intended for the failed node would be
> discarded."
>
> To have this functionality do I must have a static IP set on the
> interfaces ?
>
>
>
> On Tue, Sep 5, 2017 at 4:54 PM, emmanuel segura <emi2f...@gmail.com>
> wrote:
>
>> I never tried to set an virtual ip in one interfaces without ip, because
>> the vip is a secondary ip that switch between nodes, not primary ip
>>
>> 2017-09-05 15:41 GMT+02:00 Octavian Ciobanu <coctavian1...@gmail.com>:
>>
>>> Hello all,
>>>
>>> I've encountered an issue with IP cloning.
>>>
>>> Based the "Pacemaker 1.1 Clusters from Scratch" I've configured a test
>>> configuration with 2 nodes based on CentOS 7.3. The nodes have 2 Ethernet
>>> cards one for cluster communication with private IP network and second for
>>> public access to services. The public Ethernet has no IP assigned at boot.
>>>
>>> I've created an IP resource with clone using the following command
>>>
>>> pcs resource create ClusterIP ocf:heartbeat:IPaddr2 params nic="ens192"
>>> ip="xxx.yyy.zzz.www" cidr_netmask="24" clusterip_hash="sourceip" op start
>>> interval="0" timeout="20" op stop interval="0" timeout="20" op monitor
>>> interval="10" timeout="20" meta resource-stickiness=0 clone meta
>>> clone-max="2" clone-node-max="2" interleave="true" globally-unique="true"
>>>
>>> The xxx.yyy.zzz.www is public IP not a private one.
>>>
>>> With the above command the IP clone is created but it is started only on
>>> one node. This is the output of pcs status command
>>>
>>> Clone Set: ClusterIP-clone [ClusterIP] (unique)
>>>  ClusterIP:0(ocf::heartbeat:IPaddr2):Started node02
>>>  ClusterIP:1(ocf::heartbeat:IPaddr2):Started node02
>>>
>>> If I modify the clone-node-max to 1 then the resource is started on both
>>> nodes as seen in this pcs status output:
>>>
>>> Clone Set: ClusterIP-clone [ClusterIP] (unique)
>>>  ClusterIP:0(ocf::heartbeat:IPaddr2):Started node02
>>>  ClusterIP:1(ocf::heartbeat:IPaddr2):Started node01
>>>
>>> But if one node fails the IP resource is not migrated to active node as
>>> is said in documentation.
>>>
>>> Clone Set: ClusterIP-clone [ClusterIP] (unique)
>>>  ClusterIP:0(ocf::heartbeat:IPaddr2):Started node02
>>>  ClusterIP:1(ocf::heartbeat:IPaddr2):Stopped
>>>
>>> When the IP is active on both nodes the services are accessible so there
>>> is not an issue with the fact that the interface dose not have an IP
>>> allocated at boot. The gateway is set with another pcs command and it is
>>> working.
>>>
>>> Thank in advance for any info.
>>>
>>> Best regards
>>> Octavian Ciobanu
>>>

Re: [ClusterLabs] IP clone issue

2017-09-05 Thread emmanuel segura
I have never tried to set a virtual IP on an interface without an IP, because
the VIP is a secondary IP that switches between nodes, not a primary IP.

2017-09-05 15:41 GMT+02:00 Octavian Ciobanu :

> Hello all,
>
> I've encountered an issue with IP cloning.
>
> Based the "Pacemaker 1.1 Clusters from Scratch" I've configured a test
> configuration with 2 nodes based on CentOS 7.3. The nodes have 2 Ethernet
> cards one for cluster communication with private IP network and second for
> public access to services. The public Ethernet has no IP assigned at boot.
>
> I've created an IP resource with clone using the following command
>
> pcs resource create ClusterIP ocf:heartbeat:IPaddr2 params nic="ens192"
> ip="xxx.yyy.zzz.www" cidr_netmask="24" clusterip_hash="sourceip" op start
> interval="0" timeout="20" op stop interval="0" timeout="20" op monitor
> interval="10" timeout="20" meta resource-stickiness=0 clone meta
> clone-max="2" clone-node-max="2" interleave="true" globally-unique="true"
>
> The xxx.yyy.zzz.www is public IP not a private one.
>
> With the above command the IP clone is created but it is started only on
> one node. This is the output of pcs status command
>
> Clone Set: ClusterIP-clone [ClusterIP] (unique)
>  ClusterIP:0(ocf::heartbeat:IPaddr2):Started node02
>  ClusterIP:1(ocf::heartbeat:IPaddr2):Started node02
>
> If I modify the clone-node-max to 1 then the resource is started on both
> nodes as seen in this pcs status output:
>
> Clone Set: ClusterIP-clone [ClusterIP] (unique)
>  ClusterIP:0(ocf::heartbeat:IPaddr2):Started node02
>  ClusterIP:1(ocf::heartbeat:IPaddr2):Started node01
>
> But if one node fails the IP resource is not migrated to active node as is
> said in documentation.
>
> Clone Set: ClusterIP-clone [ClusterIP] (unique)
>  ClusterIP:0(ocf::heartbeat:IPaddr2):Started node02
>  ClusterIP:1(ocf::heartbeat:IPaddr2):Stopped
>
> When the IP is active on both nodes the services are accessible so there
> is not an issue with the fact that the interface dose not have an IP
> allocated at boot. The gateway is set with another pcs command and it is
> working.
>
> Thank in advance for any info.
>
> Best regards
> Octavian Ciobanu
>


Re: [ClusterLabs] DRBD or SAN ?

2017-07-18 Thread emmanuel segura
Yes. If you are using DRBD in master/slave mode, first promote the resource to
master and then start the VM on that node; if you use DRBD in multi-master
mode, only start the VM when DRBD is started.

Use the SAN, with multipath.
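
A sketch of the constraints this implies, in crm syntax (ms_drbd_vm and
vm_res are placeholder resource names); note that KVM live migration on top
of DRBD generally needs allow-two-primaries at least for the duration of the
migration:

  colocation col_vm_on_drbd inf: vm_res ms_drbd_vm:Master
  order ord_drbd_before_vm Mandatory: ms_drbd_vm:promote vm_res:start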

2017-07-18 16:34 GMT+02:00 Lentes, Bernd :

>
>
> - On Jul 17, 2017, at 11:51 AM, Bernd Lentes bernd.lentes@helmholtz-
> muenchen.de wrote:
>
> > Hi,
> >
> > i established a two node cluster with two HP servers and SLES 11 SP4.
> I'd like
> > to start now with a test period. Resources are virtual machines. The vm's
> > reside on a FC SAN. The SAN has two power supplies, two storage
> controller, two
> > network interfaces for configuration. Each storage controller has two FC
> > connectors. On each server i have one FC controller with two connectors
> in a
> > multipath configuration. Each connector from the SAN controller inside
> the
> > server is connected to a different storage controller from the SAN. But
> isn't a
> > SAN, despite all that redundancy, a SPOF ?
> > I'm asking myself if a DRBD configuration wouldn't be more redundant and
> high
> > available. There i have two completely independent instances of the vm.
> > We have one web application with a databse which is really crucial for
> us.
> > Downtime should be maximum one or two hours, if longer we run in trouble.
> > Is DRBD in conjuction with a database (MySQL or Postgres) possible ?
> >
> >
> > Bernd
> >
>
> Is with DRBD and Virtual Machines live migration possible ?
>
>
> Bernd
>
>
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
> Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons
> Enhsen
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671
>
>


Re: [ClusterLabs] DRBD split brain after Cluster node recovery

2017-07-12 Thread emmanuel segura
You need to configure cluster fencing and the DRBD fencing handler; this way,
the cluster can recover without manual intervention.
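
When a split brain has already happened, the usual manual recovery is roughly
the following (a DRBD 8.4-style sketch; "storage" is the resource name from
this thread, and you must choose the victim whose changes get discarded):

  # on the node whose data will be thrown away:
  drbdadm disconnect storage
  drbdadm secondary storage
  drbdadm connect --discard-my-data storage

  # on the surviving node, if it is StandAlone:
  drbdadm connect storage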

2017-07-12 11:33 GMT+02:00 ArekW :

> Hi,
> Can in be fixed that the drbd is entering split brain after cluster
> node recovery? After few tests I saw drbd recovered but in most
> situations (9/10) it didn't sync.
>
> 1. When a node is put to standby and than unstandby everything is
> working OK. The drbd is syncing and go to primary mode.
>
> 2. When a node is (hard)poweroff, the stonith brings it up and
> eventually the node becomes online but the drdb is in StandAlone state
> on the recovered node. I can sync it only manually but that require to
> stop the cluster.
>
> Logs:
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: Handshake to
> peer 1 successful: Agreed network protocol version 112
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: Feature flags
> enabled on protocol level: 0x7 TRIM THIN_RESYNC WRITE_SAME.
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: Starting
> ack_recv thread (from drbd_r_storage [28960])
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage: Preparing cluster-wide
> state change 2237079084 (0->1 499/145)
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage: State change
> 2237079084: primary_nodes=1, weak_nodes=FFFC
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage: Committing cluster-wide
> state change 2237079084 (1ms)
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: conn(
> Connecting -> Connected ) peer( Unknown -> Secondary )
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1: current_size:
> 14679544
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2:
> c_size: 14679544 u_size: 0 d_size: 14679544 max_size: 14679544
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2:
> la_size: 14679544 my_usize: 0 my_max_size: 14679544
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1: my node_id: 0
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2:
> node_id: 1 idx: 0 bm-uuid: 0x441536064ceddc92 flags: 0x10 max_size:
> 14679544 (DUnknown)
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2:
> calling drbd_determine_dev_size()
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1: my node_id: 0
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2:
> node_id: 1 idx: 0 bm-uuid: 0x441536064ceddc92 flags: 0x10 max_size:
> 14679544 (DUnknown)
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2:
> drbd_sync_handshake:
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2: self
> 342BE98297943C35:441536064CEDDC92:69D98E1FCC2BB44C:E04101C6FF76D1CC
> bits:15450 flags:120
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2: peer
> A8908796A7CCFF6E:CE6B672F4EDA6E78:69D98E1FCC2BB44C:E04101C6FF76D1CC
> bits:32768 flags:2
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2:
> uuid_compare()=-100 by rule 100
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2: helper
> command: /sbin/drbdadm initial-split-brain
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2: helper
> command: /sbin/drbdadm initial-split-brain exit code 0 (0x0)
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1: Split-Brain
> detected but unresolved, dropping connection!
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2: helper
> command: /sbin/drbdadm split-brain
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage/0 drbd1 nfsnode2: helper
> command: /sbin/drbdadm split-brain exit code 0 (0x0)
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: conn(
> Connected -> Disconnecting ) peer( Secondary -> Unknown )
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: error
> receiving P_STATE, e: -5 l: 0!
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: ack_receiver
> terminated
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: Terminating
> ack_recv thread
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: Connection closed
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: conn(
> Disconnecting -> StandAlone )
> Jul 12 10:26:35 nfsnode1 kernel: drbd storage nfsnode2: Terminating
> receiver thread
>
>
> Config:
> resource storage {
>   protocol C;
>   meta-disk internal;
>   device /dev/drbd1;
>   syncer {
> verify-alg sha1;
>   }
>   net {
> allow-two-primaries;
>   }
>   on nfsnode1 {
> disk   /dev/storage/drbd;
> address  10.0.2.15:7789;
>   }
>   on nfsnode2 {
> disk   /dev/storage/drbd;
> address  10.0.2.4:7789;
>   }
> }
>
> pcs resource show StorageFS-clone
>  Clone: StorageFS-clone
>   Resource: StorageFS (class=ocf provider=heartbeat type=Filesystem)
>Attributes: device=/dev/drbd1 directory=/mnt/drbd fstype=gfs2
>Operations: start interval=0s timeout=60 (StorageFS-start-interval-0s)
>stop interval=0s timeout=60 (StorageFS-stop-interval-0s)
>monitor interval=20 

Re: [ClusterLabs] Oracle 12c with Pacemaker and GFS2

2017-07-07 Thread emmanuel segura
I think it is a good idea if you first show your config and cluster logs,
because I have never seen any limitation on running active/active in Pacemaker.

2017-07-06 21:52 GMT+02:00 Jesse P. Johnson :

> ALL,
>
>
> I have setup an active/passive cluster using Pacemaker, CLVM, and GFS2 for
> Oracle12c. I can fail over and the system handles as expected. When trying
> to run a second instance in Active / Active it never comes up and just
> falls over.
>
>
> This outdated documentation states that Oracle Clusterware is required
> with crm. https://access.redhat.com/documentation/en-US/Red_
> Hat_Enterprise_Linux/5/html/Configuration_Example_-_
> Oracle_HA_on_Cluster_Suite/cluster_configuration.html
>
>
> Is this true for pacemaker as well? I know symantec and other clusterware
> alternatives don't have this limitation.
>
>
> Am I making a mistake or is this a limitation of these systems?
>
>
> Thanks,
>
>
> - Jesse P. Johnson CISSP RHC{A,DS,E,SA}
>
> Sr. DevOps Engineer
>
> PlanetRisk, Inc.
>


Re: [ClusterLabs] PCSD Certificate

2017-07-06 Thread emmanuel segura
I don't know what happens if the SSL certificate expires, but looking in
/usr/lib/pcsd/ssl.rb I found the function that generates it.

def generate_cert_key_pair(server_name)
  name = "/C=US/ST=MN/L=Minneapolis/O=pcsd/OU=pcsd/CN=#{server_name}"
  ca   = OpenSSL::X509::Name.parse(name)
  key = OpenSSL::PKey::RSA.new(2048)
  crt = OpenSSL::X509::Certificate.new
  crt.version = 2
  crt.serial  = ((Time.now).to_f * 1000).to_i
  crt.subject = ca
  crt.issuer = ca
  crt.public_key = key.public_key
  crt.not_before = Time.now
  crt.not_after  = Time.now + 10 * 365 * 24 * 60 * 60 # 10 year
  crt.sign(key, OpenSSL::Digest::SHA256.new)
  return crt, key
end
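
So pcsd generates a self-signed pair valid for ten years. To check when the
current certificate expires, something like this should work (the path is the
usual pcsd location on RHEL/CentOS 7 and may differ elsewhere):

  openssl x509 -in /var/lib/pcsd/pcsd.crt -noout -enddate

If the pcsd.crt/pcsd.key pair is removed and pcsd restarted, this function
should generate a fresh one.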


2017-07-06 7:41 GMT+02:00 BUVI :

> Hi,
>
> I would like to know, why certiticate is created in pacemaker and what
> will happen if it expires ?
>
>
> Thanks and Regards,
>
>
> *Bhuvanesh Kumar .G*
> Linux and Email Administrator
>
>
>


Re: [ClusterLabs] Installing on SLES 12 -- Where's the Repos?

2017-06-16 Thread emmanuel segura
You can go ahead without updates. Anyway, if you don't want to pay for
support, use CentOS or another distro.

2017-06-16 10:14 GMT+02:00 Eric Robinson :

>
>
> Ø  You could test it for free, you just need to register
>
> Ø  to https://scc.suse.com/login
>
> Ø  After that, you have an access for 60 days to SLES Repo.
>
>
>
>
>
> What happens at the end of the trial? Software stops working?
>
>
>
> I can understand how SUSE can charge for support, but not for the software
> itself. Corosync, Pacemaker, and DRBD are all open source.
>
>
>
> --
>
> Eric Robinson
>


Re: [ClusterLabs] Need to replace EMC shared disk with an EMC disk from a different EMC Storage

2017-06-15 Thread emmanuel segura
Please give more information, and if you are using LVM, share your LVM
cluster information and the cluster config too.

2017-06-15 9:22 GMT+02:00 :

> Hi. We need to clear an old EMC storage and the only thing that's left
> there is the shared disk of our Pacemaker cluster.
>
>
>
> Version:
>
> RHEL 7.2
> Pacemaker 1.1.13
> Corosync 2.3.4
>
>
>
> Can someone tell me how can I replace the shared disk? I believe mirror
> the logical volume and take out the old EMC disk will be the best option.
>
>
>
> Found this Article
>
>
>
> https://access.redhat.com/documentation/en-US/Red_Hat_
> Enterprise_Linux/7/html/Logical_Volume_Manager_
> Administration/mirvol_create_ex.html
>
>
>
> The problem (or maybe it's not a problem at all) is that the shared disk
> wasn't created as mirror from the beginning, and now I'm not sure how can I
> add/replace the current shared disk that is already configured and has data
> on it.
>
>
>
> Help will be very appreciate.
>
>
>
> Thanks,
>
>
>
> David.
>
>
>
>
>
>
>
>
>
>


Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: Q: cluster-dlm[4494]: setup_cpg_daemon: daemon cpg_join error retrying

2017-03-06 Thread emmanuel segura
It means you tell the cluster not to perform any action, because you
are doing an intervention.

2017-03-06 9:14 GMT+01:00 Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de>:
>>>> emmanuel segura <emi2f...@gmail.com> schrieb am 03.03.2017 um 17:43 in
> Nachricht
> <cae7pj3aj8_xydkypcmyqommkm2ku3s04vsskciyj6+tvnii...@mail.gmail.com>:
>> use something like standby?
>
> Hi!
>
> What is the benefit of using "standby" compared to stopping the whole cluster 
> stack on the node when you intend to update the cluster software, the kernel, 
> and perform a reboot? The only difference I see that after reboot the node 
> wouldn't start to run services, so I'll ahve to "online" the node again.
>
> Regards,
> Ulrich
>
>>
>> 2017-03-03 16:02 GMT+01:00 Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de>:
>>>>>> emmanuel segura <emi2f...@gmail.com> schrieb am 03.03.2017 um 15:35 in
>>> Nachricht
>>> 

Re: [ClusterLabs] Antw: Re: Antw: Re: Q: cluster-dlm[4494]: setup_cpg_daemon: daemon cpg_join error retrying

2017-03-03 Thread emmanuel segura
use something like standby?

2017-03-03 16:02 GMT+01:00 Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de>:
>>>> emmanuel segura <emi2f...@gmail.com> schrieb am 03.03.2017 um 15:35 in
> Nachricht
> 

Re: [ClusterLabs] Antw: Re: Q: cluster-dlm[4494]: setup_cpg_daemon: daemon cpg_join error retrying

2017-03-03 Thread emmanuel segura
I think it is a good idea to put your cluster in maintenance mode when
you do an update.

2017-03-03 15:11 GMT+01:00 Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de>:
>>>> emmanuel segura <emi2f...@gmail.com> schrieb am 03.03.2017 um 14:22 in
> Nachricht
> 

Re: [ClusterLabs] Q: cluster-dlm[4494]: setup_cpg_daemon: daemon cpg_join error retrying

2017-03-03 Thread emmanuel segura
Was your cluster in maintenance state?

2017-03-03 13:59 GMT+01:00 Ulrich Windl :
> Hello!
>
> After Update and reboot of 2nd of three nodes (SLES11 SP4) I see a 
> "cluster-dlm[4494]: setup_cpg_daemon: daemon cpg_join error retrying" message 
> when I expected the node to joint the cluster. What can be the reasons for 
> this?
> In fact this seems to have killed cluster communication, because I saw that 
> "DLM start" timed out. The other nodes were unable to use DLM during that 
> time (while the node could not join).
>
> I saw that corosync starts before the firewall in SLES11 SP4; maybe that's a 
> problem.
>
> I tried an "rcopenais stop" of the problem node, which in tun caused a node 
> fence (DLM stop timed out, too), and then the other nodes were able to 
> communicate again. During boot the problem node was able to join the cluster 
> as before. In the meantime I had also updated the third node without a 
> problem, so it looks like a rare race condition to me.
> ANy insights?
>
> Could the problem be related to one of these messages?
> crmd[3656]:   notice: get_node_name: Could not obtain a node name for classic 
> openais (with plugin) nodeid 739512321
> corosync[3646]:  [pcmk  ] info: update_member: 0x64bc90 Node 739512325 
> ((null)) born on: 3352
> stonith-ng[3652]:   notice: get_node_name: Could not obtain a node name for 
> classic openais (with plugin) nodeid 739512321
> crmd[3656]:   notice: get_node_name: Could not obtain a node name for classic 
> openais (with plugin) nodeid 739512330
> cib[3651]:   notice: get_node_name: Could not obtain a node name for classic 
> openais (with plugin) nodeid 739512321
> cib[3651]:   notice: crm_update_peer_state: plugin_handle_membership: Node 
> (null)[739512321] - state is now member (was (null))
>
> crmd: info: crm_get_peer: Created entry 
> 8a7d6859-5ab1-404b-95a0-ba28064763fb/0x7a81f0 for node (null)/739512321 (2 
> total)
> crmd: info: crm_get_peer: Cannot obtain a UUID for node 
> 739512321/(null)
> crmd: info: crm_update_peer:  plugin_handle_membership: Node (null): 
> id=739512321 state=member addr=r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)  (new) 
> votes=0 born=0 seen=3352 proc=
>
>
> Regards,
> Ulrich
>
>
>
>
>


Re: [ClusterLabs] Oralsnr/Oracle resources agents

2017-02-23 Thread emmanuel segura
I think not. In /usr/lib/ocf/resource.d/heartbeat/oralsnr:

start function, oralsnr_start: output=`echo lsnrctl start $listener | runasdba`
stop function, oralsnr_stop:   output=`echo lsnrctl stop $listener | runasdba`

where the listener variable is the resource agent parameter given by
Pacemaker: #   OCF_RESKEY_listener (optional; defaults to LISTENER)

Why not use one listener per instance?
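
A sketch of one listener per instance (pcs syntax; the SID, Oracle home and
listener name are assumptions for illustration):

  pcs resource create lsnr_DB1 ocf:heartbeat:oralsnr \
      sid=DB1 home=/u01/app/oracle/product/12.1.0/dbhome_1 listener=LISTENER_DB1 \
      op monitor interval=30s

The corresponding LISTENER_DB1 entry would still have to exist in listener.ora.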

2017-02-23 16:37 GMT+01:00 Jihed M'selmi :
> I was reading the oralsnr script, I found that to stop a listener the agent
> uses the lsnrctl to stop the instances.
>
> My questions,  how to configure this agent for an oracle listener attached
> the multiple instance ?
>
> My 2nd quest, is it possible to enhance the ora-common.sh and
> resource.d/oracle to take in account the flag y/n in the oratab in order to
> start the database or no ?
>
> Cheers,
>
> --
>
>
> Jihed MSELMI
> RHCE, RHCSA, VCP4
> 10 Villa Stendhal, 75020 Paris France
> Mobile: +33 (0) 753768653
>


Re: [ClusterLabs] Antw: Oracle Stopping

2017-02-22 Thread emmanuel segura
The first place you need to look is the Oracle log.

2017-02-22 8:43 GMT+01:00 Ulrich Windl :
 Chad Cravens  schrieb am 22.02.2017 um 02:44 in
> Nachricht
> 

Re: [ClusterLabs] SBD with shared block storage (and watchdog?)

2017-02-13 Thread emmanuel segura
I missed that: using the same device for the filesystem partition and for SBD
is a really bad idea :(
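
A sketch of the more usual layout (a small dedicated partition for SBD, a
separate one for the gfs2 filesystem; softdog only if no real watchdog device
is available):

  # e.g. /dev/sdb1 of ~10MB for SBD, /dev/sdb2 for gfs2
  modprobe softdog                  # provides /dev/watchdog inside the VM
  sbd -d /dev/sdb1 create
  sbd -d /dev/sdb1 list

  # /etc/sysconfig/sbd
  SBD_DEVICE="/dev/sdb1"
  SBD_WATCHDOG_DEV=/dev/watchdog
  SBD_PACEMAKER=yes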

2017-02-13 19:04 GMT+01:00 Klaus Wenninger :
> On 02/13/2017 06:34 PM, dur...@mgtsciences.com wrote:
>> I am working to get an active/active cluster running.
>> I have Windows 10 running 2 Fedora 25 Virtualbox VMs.
>> VMs named node1, and node2.
>>
>> I created a vdi disk and set it to shared.
>> I formatted it to gfs2 with this command.
>>
>> mkfs.gfs2 -t msicluster:msigfs2 -j 2 /dev/sdb1
>>
>> After installing 'dlm' and insuring guest additions were
>> installed, I was able to mount the gfs2 parition.
>>
>> I then followed.
>>
>> https://github.com/l-mb/sbd/blob/master/man/sbd.8.pod
>>
>> I used this command.
>>
>> sbd -d /dev/sdb1 create
>
> To be honest I have no experience with using a partition for
> a filesystem and sbd in parallel.
> I would guess that you have to tell the filesystem at least to
> reserve some space for sbd.
> For the first experience I would go for a separate
> partition for sbd.
>
>>
>>
>> Using sbd to 'list' returns nothing, but 'dump' shows this.
>
> Did you point list to /dev/sdb1 as well?
> (sbd list -d /dev/sdb1)
> Still it might return nothing as you haven't used
> any of the slot you created by now.
> You can try to add one manually though.
> (sbd allocate test -d /dev/sdb1)
>
>>
>>
>> fc25> sbd -d /dev/sdb1 dump
>> ==Dumping header on disk /dev/sdb1
>> Header version : 2.1
>> UUID   : 6094f0f4-2a07-47db-b4f7-6d478464d56a
>> Number of slots: 255
>> Sector size: 512
>> Timeout (watchdog) : 5
>> Timeout (allocate) : 2
>> Timeout (loop) : 1
>> Timeout (msgwait)  : 10
>> ==Header on disk /dev/sdb1 is dumped
>>
>> I then tried the 'watch' command and journalctl shows error listed.
>>
>> sbd -d /dev/sdb1 -W -P watch
>>
>> Feb 13 09:54:09 node1 sbd[6908]:error: watchdog_init: Cannot open
>> watchdog device '/dev/watchdog': No such file or directory (2)
>> Feb 13 09:54:09 node1 sbd[6908]:  warning: cleanup_servant_by_pid:
>> Servant for pcmk (pid: 6910) has terminated
>> Feb 13 09:54:09 node1 sbd[6908]:  warning: cleanup_servant_by_pid:
>> Servant for /dev/sdb1 (pid: 6909) has terminated
>>
>
> well, to be expected if the kernel doesn't see a watchdog device ...
>
>>
>> From
>>
>> http://blog.clusterlabs.org/blog/2015/sbd-fun-and-profit
>>
>> I installed watchdog.
>>
>> my /etc/sysconfig/sbd is.
>>
>> SBD_DELAY_START=no
>> SBD_OPTS=
>> SBD_PACEMAKER=yes
>> SBD_STARTMODE=clean
>> SBD_WATCHDOG_DEV=/dev/watchdog
>> SBD_WATCHDOG_TIMEOUT=5
>>
>> the sbd-fun-and-profit says to use this command.
>>
>> virsh edit vmnode
>
> You would do that on the host if you were running linux as host-os
> and you were using libvirt to control virtualization.
> Haven't played with VirtualBox and watchdog-devices. But probably
> it is possible to have one or it is there by default already.
> You best go to the graphical guest definition and search amongst
> the to be added devices.
> Don't know if it is a virtual version of something that exists in physical
> world so that the linux kernel is expected to have a driver for it or
> if the driver comes with the guest additions.
> Otherwise you can go for softdog of course - but be aware that this
> won't take you out if the kernel is hanging.
>
> Regards,
> Klaus
>
>>
>>
>> But there is no vmnode and no instructions on how to create it.
>>
>> Is anyone able to piece together the missing steps?
>>
>>
>> Thank you.
>>
>> Durwin F. De La Rue
>> Management Sciences, Inc.
>> 6022 Constitution Ave. NE
>> Albuquerque, NM  87110
>> Phone (505) 255-8611
>>
>>
>> This email message and any attachments are for the sole use of the
>> intended recipient(s) and may contain proprietary and/or confidential
>> information which may be privileged or otherwise protected from
>> disclosure. Any unauthorized review, use, disclosure or distribution
>> is prohibited. If you are not the intended recipient(s), please
>> contact the sender by reply email and destroy the original message and
>> any copies of the message as well as any attachments to the original
>> message.
>>
>>

Re: [ClusterLabs] Can't create a resource for ocf:heartbeat:oracle and oraclsnr.

2017-01-30 Thread emmanuel segura
You can start by reading the meta-data section of the resource agent:
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/oracle
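
For example (a sketch; the SID, paths and user are assumptions), the agent's
parameters can be listed and a resource created like this:

  pcs resource describe ocf:heartbeat:oracle
  pcs resource create ora_DB1 ocf:heartbeat:oracle \
      sid=DB1 home=/u01/app/oracle/product/12.1.0/dbhome_1 user=oracle \
      op monitor interval=30s timeout=60s

The agent calls sqlplus from that Oracle home, so the binary has to be
reachable there.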

2017-01-31 0:21 GMT+01:00 Jihed M'selmi <jihed.mse...@gmail.com>:
> I wish I could :-/
>
> All I am asking about the requirement to use the Resource Agent
> OCF:heartbeat:oracle and ocf:heartbeat:orclsnr.
>
> Thanks
>
> Jihed M’SELMI
> Mobile: +21658433664
> http://about.me/jihed.mselmi
>
> On Tue, Jan 31, 2017 at 12:16 AM, emmanuel segura <emi2f...@gmail.com>
> wrote:
>>
>> please, if you need help, the first thing is show, your cluster
>> configuration.
>>
>> 2017-01-30 23:15 GMT+01:00 Jihed M'selmi <jihed.mse...@gmail.com>:
>> > I tried to install two resources: a resource for oracle database and
>> > oracle
>> > listener: but the pcmk can't install the resource (red hat 7.3) usint
>> > hte
>> > ocf:heartbeat:oracle and oraclsnr
>> >
>> > On the log,ti shows that the sqlplus was not installed.
>> >
>> > I installed it, but, I keep getting the same message and the resources
>> > were
>> > not installed.
>> >
>> > Is there any requirement to use OCF:HEARTBEAT:ORACLE and ORACLSNR ?
>> >
>> > In my resource group,  I have One rsc for IP, Three rsc for filesystems
>> > where I have the oracle binary, db and backup and I should have Two more
>> > rsc
>> > for database and listener.
>> >
>> > Could anyone share how to configure a peacemaker and corosync to host an
>> > Oracle database on two nodes ? (or more).
>> >
>> > Thanks in advance.
>> > Cheers,
>> > JM
>> >


Re: [ClusterLabs] Can't create a resource for ocf:heartbeat:oracle and oraclsnr.

2017-01-30 Thread emmanuel segura
Please, if you need help, the first thing to do is show your cluster configuration.
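
For example, the configuration and recent logs can be collected with (a
sketch; adjust the time window):

  pcs config
  pcs status --full
  crm_report -f "2017-01-30 20:00" /tmp/cluster-report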

2017-01-30 23:15 GMT+01:00 Jihed M'selmi :
> I tried to install two resources: a resource for oracle database and oracle
> listener: but the pcmk can't install the resource (red hat 7.3) usint hte
> ocf:heartbeat:oracle and oraclsnr
>
> On the log,ti shows that the sqlplus was not installed.
>
> I installed it, but, I keep getting the same message and the resources were
> not installed.
>
> Is there any requirement to use OCF:HEARTBEAT:ORACLE and ORACLSNR ?
>
> In my resource group,  I have One rsc for IP, Three rsc for filesystems
> where I have the oracle binary, db and backup and I should have Two more rsc
> for database and listener.
>
> Could anyone share how to configure Pacemaker and Corosync to host an
> Oracle database on two (or more) nodes?
>
> Thanks in advance.
> Cheers,
> JM
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] VirtualDomain started in two hosts

2017-01-17 Thread emmanuel segura
show your cluster configuration.
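
For example, you can dump it with one of (depending on which shell you use):

  # pcs config
  # crm configure show

and paste the output, including the colocation/order constraints, in your reply.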

2017-01-17 10:15 GMT+01:00 Oscar Segarra <oscar.sega...@gmail.com>:
> Hi,
>
> Yes, I will try to explain myself better.
>
> Initially
> On node1 (vdicnode01-priv)
>>virsh list
> ==
> vdicdb01 started
>
> On node2 (vdicnode02-priv)
>>virsh list
> ==
> vdicdb02 started
>
> --> Now, I execute the migrate command (outside the cluster <-- not using
> pcs resource move)
> virsh migrate --live vdicdb01 qemu:/// qemu+ssh://vdicnode02-priv
> tcp://vdicnode02-priv
>
> Finally
> On node1 (vdicnode01-priv)
>>virsh list
> ==
> vdicdb01 started
>
> On node2 (vdicnode02-priv)
>>virsh list
> ==
> vdicdb02 started
> vdicdb01 started
>
> If I query cluster pcs status, cluster thinks resource vm-vdicdb01 is only
> started on node vdicnode01-priv.
>
> Thanks a lot.
>
>
>
> 2017-01-17 10:03 GMT+01:00 emmanuel segura <emi2f...@gmail.com>:
>>
>> Sorry,
>>
>> but what do you mean when you say you migrated the VM outside of the
>> cluster? To a server outside of your cluster?
>>
>> 2017-01-17 9:27 GMT+01:00 Oscar Segarra <oscar.sega...@gmail.com>:
>> > Hi,
>> >
>> > I have configured a two node cluster where we run 4 KVM guests.
>> >
>> > The hosts are:
>> > vdicnode01
>> > vdicnode02
>> >
>> > And I have created a dedicated network card for cluster management. I
>> > have
>> > created required entries in /etc/hosts:
>> > vdicnode01-priv
>> > vdicnode02-priv
>> >
>> > The four guests have collocation rules in order to make them distribute
>> > proportionally between my two nodes.
>> >
>> > The problem I have is that if I migrate a guest outside the cluster, I mean
>> > using virsh migrate --live..., the cluster, instead of moving the guest back
>> > to its original node (following the colocation sets), starts the guest again
>> > and suddenly I have the same guest running on both nodes, causing XFS
>> > corruption in the guest.
>> >
>> > Is there any configuration applicable to avoid this unwanted behavior?
>> >
>> > Thanks a lot
>> >
>> > ___
>> > Users mailing list: Users@clusterlabs.org
>> > http://lists.clusterlabs.org/mailman/listinfo/users
>> >
>> > Project Home: http://www.clusterlabs.org
>> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > Bugs: http://bugs.clusterlabs.org
>> >
>>
>>
>>
>> --
>>   .~.
>>   /V\
>>  //  \\
>> /(   )\
>> ^`~'^
>>
>> ___
>> Users mailing list: Users@clusterlabs.org
>> http://lists.clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>
>
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] VirtualDomain started in two hosts

2017-01-17 Thread emmanuel segura
Sorry,

but what do you mean when you say you migrated the VM outside of the
cluster? To a server outside of your cluster?

2017-01-17 9:27 GMT+01:00 Oscar Segarra :
> Hi,
>
> I have configured a two node cluster where we run 4 KVM guests.
>
> The hosts are:
> vdicnode01
> vdicnode02
>
> And I have created a dedicated network card for cluster management. I have
> created required entries in /etc/hosts:
> vdicnode01-priv
> vdicnode02-priv
>
> The four guests have collocation rules in order to make them distribute
> proportionally between my two nodes.
>
> The problem I have is that if I migrate a guest outside the cluster, I mean
> using virsh migrate --live..., the cluster, instead of moving the guest back
> to its original node (following the colocation sets), starts the guest again
> and suddenly I have the same guest running on both nodes, causing XFS
> corruption in the guest.
>
> Is there any configuration applicable to avoid this unwanted behavior?
>
> Thanks a lot
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] setting up SBD_WATCHDOG_TIMEOUT, stonith-timeout and stonith-watchdog-timeout

2016-12-08 Thread emmanuel segura
the only thing that I can say is: sbd is a realtime process
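
As a rough illustration only (the numbers and the watchdog device here are made
up, adjust them to your hardware), the layout I usually see keeps Pacemaker's
stonith-watchdog-timeout at roughly twice SBD's own timeout:

  # /etc/sysconfig/sbd
  SBD_WATCHDOG_DEV=/dev/watchdog
  SBD_WATCHDOG_TIMEOUT=5

  # pacemaker properties
  # crm configure property stonith-enabled=true
  # crm configure property stonith-watchdog-timeout=10s

I can't answer the ordering questions below from that alone, but it may help as
a starting point.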

2016-12-08 11:47 GMT+01:00 Jehan-Guillaume de Rorthais :
> Hello,
>
> While setting these various parameters, I couldn't find documentation or
> details about them. Below are some questions.
>
> Considering the watchdog module used on a server is set up with a 30s timer
> (lets call it the wdt, the "watchdog timer"), how should
> "SBD_WATCHDOG_TIMEOUT", "stonith-timeout" and "stonith-watchdog-timeout" be 
> set?
>
> Here is my thinking so far:
>
> "SBD_WATCHDOG_TIMEOUT < wdt". The sbd daemon should reset the timer before the
> wdt expire so the server stay alive. Online resources and default values are
> usually "SBD_WATCHDOG_TIMEOUT=5s" and "wdt=30s". But what if sbd fails to 
> reset
> the timer multiple times (eg. because of excessive load, swap storm etc)? The
> server will not reset before random*SBD_WATCHDOG_TIMEOUT or wdt, right?
>
> "stonith-watchdog-timeout > SBD_WATCHDOG_TIMEOUT". I'm not quite sure what is
> stonith-watchdog-timeout. Is it the maximum time to wait from stonithd after 
> it
> asked for a node fencing before it considers the watchdog was actually
> triggered and the node reseted, even with no confirmation? I suppose
> "stonith-watchdog-timeout" is mostly useful to stonithd, right?
>
> "stonith-watchdog-timeout < stonith-timeout". I understand the stonith action
> timeout should be at least greater than the wdt so stonithd will not raise a
> timeout before the wdt had a chance to expire and reset the node. Is it 
> right?
>
> Any other comments?
>
> Regards,
> --
> Jehan-Guillaume de Rorthais
> Dalibo
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] OS Patching Process

2016-11-22 Thread emmanuel segura
I have been using this layout: iscsi disks -> lvm volume ->
drbd on top of lvm -> filesystem

To resize: first add one new iscsi device to every cluster node, then
add that device to the volume group on every cluster node, then extend
the logical volume on every cluster node. At that point every node has
the same logical volume size, and you can resize drbd and the
filesystem on the active node.
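
Spelled out as commands (only a sketch — the device, VG/LV and drbd resource
names are placeholders, and an ext4 filesystem is assumed):

  # on every cluster node, once the new iSCSI disk (here /dev/sdX) is visible
  pvcreate /dev/sdX
  vgextend vg_data /dev/sdX
  lvextend -L +50G /dev/vg_data/lv_drbd

  # on the active (Primary) node only
  drbdadm resize r0
  resize2fs /dev/drbd0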

2016-11-22 17:35 GMT+01:00 Jason A Ramsey :
> Can anyone recommend a bulletproof process for OS patching a pacemaker
> cluster that manages a drbd mirror (with LVM on top of the drbd and luns
> defined for an iscsi target cluster if that matters)? Any time I’ve tried to
> mess with the cluster, it seems like I manage to corrupt my drbd filesystem,
> and now that I have actual data on the thing, that’s kind of a scary
> proposition. Thanks in advance!
>
>
>
> --
>
>
>
> [ jR ]
>
>
>
>   there is no path to greatness; greatness is the path
>
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: OCFS2 on cLVM with node waiting for fencing timeout

2016-10-13 Thread emmanuel segura
If you want to reduce the multipath switching time when one
controller goes down, see:
https://www.redhat.com/archives/dm-devel/2009-April/msg00266.html
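
For illustration, the knobs usually involved live in /etc/multipath.conf (the
values below are only examples, not recommendations for your particular array):

  defaults {
      polling_interval 5
      failback         immediate
      no_path_retry    5
  }

A low no_path_retry (or "fail") makes multipath give up on a dead path sooner
instead of queueing I/O while the controller is gone.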

2016-10-13 10:27 GMT+02:00 Ulrich Windl :
Eric Ren wrote on 13.10.2016 at 09:31 in message:
>> Hi,
>>
>> On 10/10/2016 10:46 PM, Ulrich Windl wrote:
>>> Hi!
>>>
>>> I observed an interesting thing: In a three node cluster (SLES11 SP4) with
>> cLVM and OCFS2 on top, one node was fenced as the OCFS2 filesystem was
>> somehow busy on unmount. We have (for paranoid reasons mainly) an excessive
>> long fencing timout for SBD: 180 seconds
>>>
>>> While one node was actually reset immediately (the cluster was still waiting
>> for the fencing to "complete" through timeout), the other nodes seemed to
>> freeze the filesystem. Thus I observed a read delay > 140 seconds on one 
>> node,
>> the other was also close to 140 seconds.
>> ocfs2 and cLVM are both depending on DLM. DLM deamon will notify them to
>> stop service (which
>> means any cluster locking
>> request would be blocked) during the fencing process.
>>
>> So I'm wondering why it takes so long to finish the fencing process?
>
> As I wrote: Using SBD this is paranoia (as fencing doesn't report back a 
> status like "completed" or "failed". Actually the fencing only needs a few 
> seconds, but the timeout is 3 minutes. Only then the cluster believes that 
> the node is down now (our servers boot so slowly that they are not up within 
> three minutes, also). Why three minutes? Writing to a SCSI disk may be 
> retried up to one minute, and reading may also be retried for a minute. So 
> for a bad SBD disk (or some strange transport problem) it could take two 
> minutes until the receiving SBD gets the fencing command. If the timeout is 
> too low, resources could be restarted before the node was actually fenced, 
> causing data corruption.
>
> Ulrich
> P.S: One common case where our SAN disks seem slow is "Online" firmware 
> update where a controller may be down 20 to 30 seconds. Multipathing is 
> expected to switch to another controller within a few seconds. However the 
> commands to test the disk in multipath are also SCSI commands that may hang 
> for a while...
>
>>
>> Eric
>>>
>>> This was not expected for a cluster filesystem (by me).
>>>
>>> I wonder: Is that expected bahavior?
>>>
>>> Regards,
>>> Ulrich
>>>
>>>
>>>
>>> ___
>>> Users mailing list: Users@clusterlabs.org
>>> http://clusterlabs.org/mailman/listinfo/users
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>>
>>
>>
>> ___
>> Users mailing list: Users@clusterlabs.org
>> http://clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>
>
>
>
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker and Oracle ASM

2016-10-10 Thread emmanuel segura
Why don't you use Oracle RAC with ASM?

2016-10-07 18:46 GMT+02:00 Chad Cravens :

> Hello:
>
> I'm working on a project where the client is using Oracle ASM (volume
> manager) for database storage. I have implemented a cluster before using
> LVM with ext4 and understand there are resource agents (RA) already
> existing within the ocf:heartbeat group that can manage which nodes connect
> and disconnect to the filesystem and prevents data corruption. For example:
>
> pcs resource create my_lvm LVM volgrpname=my_vg \
> exclusive=true --group apachegroup
>
> pcs resource create my_fs Filesystem \
> device="/dev/my_vg/my_lv" directory="/var/www" fstype="ext4" --group \
> apachegroup
>
> I'm curious if anyone has had a situation where Oracle ASM is used instead
> of LVM? ASM seems pretty standard for Oracle databases, but not sure what
> resource agent I can use to manage the ASM manager?
>
> Thanks!
>
> --
> Kindest Regards,
> Chad Cravens
> (843) 291-8340
>
> Chad Cravens
> (843) 291-8340
> chad.crav...@ossys.com
> http://www.ossys.com
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
>


-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] ocf:heartbeat:LVM or /etc/lvm/lvm.conf settings question

2016-08-10 Thread emmanuel segura
you described your problem, but didn't show any logs or cluster config.

2016-08-10 23:47 GMT+02:00 Darren Kinley <dkin...@mdacorporation.com>:
> The default lvm.conf's filter includes /dev/drbdX
>
> # By default we accept every block device except udev names, floppy and 
> cdrom drives:
> filter = [ "r|/dev/.*/by-path/.*|", "r|/dev/.*/by-id/.*|", 
> "r|/dev/fd.*|", "r|/dev/cdrom|", "a/.*/" ]
>
> -Original Message-
> From: emmanuel segura [mailto:emi2f...@gmail.com]
> Sent: Wednesday, August 10, 2016 2:33 PM
> To: Cluster Labs - All topics related to open-source clustering welcomed 
> <users@clusterlabs.org>
> Subject: Re: [ClusterLabs] ocf:heartbeat:LVM or /etc/lvm/lvm.conf settings 
> question
>
> Does your lvm filter include the drbd devices /dev/drbdX?
>
> 2016-08-10 21:38 GMT+02:00 Darren Kinley <dkin...@mdacorporation.com>:
>> Hi,
>>
>> I have an LVM logical volume and used DRBD to replicate it to another
>> server.
>> The /dev/drbd0 has PV/VG/LVs which are mostly working.
>> I have colocation and order constraints that bring up a VIP, promote
>> DRBD and start LVM plus file systems.
>>
>> The problem arises when I take the active node offline.
>> At that point the VIP and DRBD master move but the PV/VG are not
>> scanned/activated, the file systems are not mounted and “crm status”
>> reports an error for the ocf:heartbeat:LVM resource
>>
>> “Volume group [replicated] does not exist or contains an error!
>> Using volume group(s) on command line.”
>>
>> At this point the /dev/drbd0 physical volume is not known to the
>> server and the fix requires
>>
>> root# pvscan –cache /dev/drbd0
>> root# crm resource cleanup grp-ars-lvm-fs
>>
>> Is there an ocf:heartbeat:LVM setting or /etc/lvm/lvm.conf settings to
>> force the PV/VGs to come online?
>> It is not clear whether the RA script “exclusive” or “tag” settings
>> are needed or there is a corresponding lvm.conf setting.
>>
>> Is the lvm.conf setting "write_cache_state = 0", recommended by the DRBD User Guide,
>> correct?
>>
>> Thanks,
>> Darren
>>
>>
>>
>> ___
>> Users mailing list: Users@clusterlabs.org
>> http://clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org Getting started:
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
>
>
>
> --
>   .~.
>   /V\
>  //  \\
> /(   )\
> ^`~'^
>
> ___
> Users mailing list: Users@clusterlabs.org 
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org Getting started: 
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] ocf:heartbeat:LVM or /etc/lvm/lvm.conf settings question

2016-08-10 Thread emmanuel segura
Does your lvm filter include the drbd devices /dev/drbdX?
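
What I mean is an explicit accept entry for /dev/drbdX in the devices section
of /etc/lvm/lvm.conf. Only a sketch — keep whatever accept/reject entries your
backing devices need:

  devices {
      filter = [ "a|/dev/drbd.*|", "a|.*|" ]
      write_cache_state = 0
  }

After changing write_cache_state, also wipe /etc/lvm/cache/.cache (if present)
so stale device entries are not reused.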

2016-08-10 21:38 GMT+02:00 Darren Kinley :
> Hi,
>
> I have an LVM logical volume and used DRBD to replicate it to another
> server.
> The /dev/drbd0 has PV/VG/LVs which are mostly working.
> I have colocation and order constraints that bring up a VIP, promote DRBD
> and start LVM plus file systems.
>
> The problem arises when I take the active node offline.
> At that point the VIP and DRBD master move but the PV/VG are not
> scanned/activated, the file systems are not mounted
> and “crm status” reports an error for the ocf:heartbeat:LVM resource
>
> “Volume group [replicated] does not exist or contains an error!
> Using volume group(s) on command line.”
>
> At this point the /dev/drbd0 physical volume is not known to the server and
> the fix requires
>
> root# pvscan –cache /dev/drbd0
> root# crm resource cleanup grp-ars-lvm-fs
>
> Is there an ocf:heartbeat:LVM setting or /etc/lvm/lvm.conf settings to force
> the PV/VGs to come online?
> It is not clear whether the RA script “exclusive” or “tag” settings are
> needed or there is a corresponding lvm.conf setting.
>
> Is the lvm.conf setting "write_cache_state = 0", recommended by the DRBD User Guide,
> correct?
>
> Thanks,
> Darren
>
>
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Bloody Newbie needs help for OCFS2 on pacemaker+corosync+pcs

2016-08-02 Thread emmanuel segura
This link can help you
http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-supported.html

2016-08-02 15:37 GMT+02:00  <t...@it-hluchnik.de>:
> What do you mean? What is the "resource agent for using o2cb"? Please explain 
> this a bit closer, I am just becoming familiar with all that stuff.
>
> Thanks for any help,
>
> Thomas Hluchnik
>
>
> On Tuesday 02 August 2016 15:28:17, emmanuel segura wrote:
>> Why don't you use the resource agent for o2cb? That script is meant to
>> be used with the ocfs legacy mode.
>>
>> 2016-08-02 12:39 GMT+02:00 Kyle O'Donnell <ky...@0b10.mx>:
>> > er forgot
>> >
>> > primitive p_o2cb lsb:o2cb \
>> > op monitor interval="10" timeout="30" \
>> > op start interval="0" timeout="120" \
>> > op stop interval="0" timeout="120"
>> >
>> > - Original Message -
>> > From: "Kyle O'Donnell" <ky...@0b10.mx>
>> > To: "users" <users@clusterlabs.org>
>> > Sent: Tuesday, August 2, 2016 6:38:11 AM
>> > Subject: Re: [ClusterLabs] Bloody Newbie needs help for OCFS2 on
>> > pacemaker+corosync+pcs
>> >
>> > primitive mysan ocf:heartbeat:Filesystem \
>> > params device="/dev/myocsdevice" directory="/mymount" 
>> > fstype="ocfs2" options="rw,noatime" \
>> > op monitor timeout="40" interval="20" depth="0"
>> > clone cl_ocfs2mgmt p_o2cb \
>> > meta interleave="true"
>> > clone cl_mysan mysan \
>> > meta target-role="Started"
>> > order o_myresource_fs inf: cl_mysan myresource
>> >
>> >
>> > - Original Message -
>> > From: t...@it-hluchnik.de
>> > To: "users" <users@clusterlabs.org>
>> > Sent: Tuesday, August 2, 2016 6:31:44 AM
>> > Subject: [ClusterLabs] Bloody Newbie needs help for OCFS2 on
>> > pacemaker+corosync+pcs
>> >
>> > Hello everybody,
>> > I am new to pacemaker (and to this list), trying to understand pacemaker. 
>> > For this I created three virtual hosts in my VirtualBox plus four shared 
>> > disks, attached with each of the three nodes.
>> >
>> > I installed Oracle Enterprise Linux 7.1, did a "yum update" and got OEL7.2.
>> > Then I created four OCFS2 devices, working fine on all of my three nodes. 
>> > They are started by systemd, using o2cb.service and ocfs2.service and 
>> > running fine.
>> >
>> > Now I have started with learning pacemaker by "Clusters from Scratch" and 
>> > meanwhile I have a virtual IP and a Webserver, this works fine so far.
>> >
>> > Next I want to control my OCFS2 devices by pacemaker, not by systemd. I 
>> > searched the net and found some howtos, but they rely on crmsh instead of 
>> > pcs. Most headaches come from DRBD which I don't understand at all. Why 
>> > the hell does it seem that I need DRBD for running OCFS2?
>> >
>> > Is there anybody who can explain me how to get that running (after 
>> > disabling o2cb.service & ocfs2.service):
>> >
>> > - create a resource which manages and controls o2cb stack
>> > - create another resource which manages OCFS2 mountpoints
>> > - create constraints for the Web Server (all Apache config / content shall 
>> > be copied to one of the OCFS2 filesystems)
>> >
>> > The Web Server shall be dependent from availability of a mounted OCFS2 
>> > device. If it stops working, the Web Server must switch to a node where 
>> > that mount point is OK.
>> >
>> > Thanks in advance for any help
>> >
>> > Thomas Hluchnik
>> >
>> > ___
>> > Users mailing list: Users@clusterlabs.org
>> > http://clusterlabs.org/mailman/listinfo/users
>> >
>> > Project Home: http://www.clusterlabs.org
>> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > Bugs: http://bugs.clusterlabs.org
>> >
>> > ___
>> > Users mailing list: Users@clusterlabs.org
>> > http://clusterlabs.org/mailman/listinfo/users
>> >
>> > Project Home: http://www.clusterlabs.org
>> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > Bugs: http://bugs.clusterlabs.org
>>
>>
>>
>
>
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Bloody Newbie needs help for OCFS2 on pacemaker+corosync+pcs

2016-08-02 Thread emmanuel segura
Why don't you use the resource agent for o2cb? That script is meant to
be used with the ocfs legacy mode.
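
What I mean is the OCF agent instead of the lsb init script — roughly this
pattern (crmsh syntax, the names are just examples; the same layout shows up in
other OCFS2 threads on this list):

  primitive p_dlm ocf:pacemaker:controld op monitor interval=120s
  primitive p_o2cb ocf:ocfs2:o2cb op monitor interval=120s
  clone cl_dlm p_dlm meta interleave=true
  clone cl_o2cb p_o2cb meta interleave=true
  colocation c_o2cb_dlm inf: cl_o2cb cl_dlm
  order o_dlm_o2cb inf: cl_dlm cl_o2cb

with the Filesystem resources for the OCFS2 mountpoints ordered after cl_o2cb.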

2016-08-02 12:39 GMT+02:00 Kyle O'Donnell :
> er forgot
>
> primitive p_o2cb lsb:o2cb \
> op monitor interval="10" timeout="30" \
> op start interval="0" timeout="120" \
> op stop interval="0" timeout="120"
>
> - Original Message -
> From: "Kyle O'Donnell" 
> To: "users" 
> Sent: Tuesday, August 2, 2016 6:38:11 AM
> Subject: Re: [ClusterLabs] Bloody Newbie needs help for OCFS2 on
> pacemaker+corosync+pcs
>
> primitive mysan ocf:heartbeat:Filesystem \
> params device="/dev/myocsdevice" directory="/mymount" fstype="ocfs2" 
> options="rw,noatime" \
> op monitor timeout="40" interval="20" depth="0"
> clone cl_ocfs2mgmt p_o2cb \
> meta interleave="true"
> clone cl_mysan mysan \
> meta target-role="Started"
> order o_myresource_fs inf: cl_mysan myresource
>
>
> - Original Message -
> From: t...@it-hluchnik.de
> To: "users" 
> Sent: Tuesday, August 2, 2016 6:31:44 AM
> Subject: [ClusterLabs] Bloody Newbie needs help for OCFS2 on
> pacemaker+corosync+pcs
>
> Hello everybody,
> I am new to pacemaker (and to this list), trying to understand pacemaker. For 
> this I created three virtual hosts in my VirtualBox plus four shared disks, 
> attached with each of the three nodes.
>
> I installed Oracle Enterprise Linux 7.1, did a "yum update" and got OEL7.2.
> Then I created four OCFS2 devices, working fine on all of my three nodes. 
> They are started by systemd, using o2cb.service and ocfs2.service and running 
> fine.
>
> Now I have started with learning pacemaker by "Clusters from Scratch" and 
> meanwhile I have a virtual IP and a Webserver, this works fine so far.
>
> Next I want to control my OCFS2 devices by pacemaker, not by systemd. I 
> searched the net and found some howtos, but they rely on crmsh instead of 
> pcs. Most headaches come from DRBD which I don't understand at all. Why the 
> hell does it seem that I need DRBD for running OCFS2?
>
> Is there anybody who can explain me how to get that running (after disabling 
> o2cb.service & ocfs2.service):
>
> - create a resource which manages and controls o2cb stack
> - create another resource which manages OCFS2 mountpoints
> - create constraints for the Web Server (all Apache config / content shall be 
> copied to one of the OCFS2 filesystems)
>
> The Web Server shall be dependent from availability of a mounted OCFS2 
> device. If it stops working, the Web Server must switch to a node where that 
> mount point is OK.
>
> Thanks in advance for any help
>
> Thomas Hluchnik
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Two node Active/Active Asterisk+GFS2+DLM+fence_xvm Cluster

2016-07-15 Thread emmanuel segura
maybe you need interleave=true in your clones
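
For example, with the clone names from the status output below (pcs syntax):

  # pcs resource meta dlm-clone interleave=true
  # pcs resource meta asteriskfs-clone interleave=true
  # pcs resource meta asterisk-clone interleave=true

With interleave=true, an ordered dependency between two clones only waits for
the copies on the same node, so a restart on one node does not ripple across
the whole cluster.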

2016-07-15 8:32 GMT+02:00 Ulrich Windl :
TEG AMJG wrote on 14.07.2016 at 23:47 in message:
>> Dear list
>>
>> I am quite new to PaceMaker and i am configuring a two node active/active
>> cluster which consist basically on something like this:
>>
>> I am using pcsd Pacemaker/Corosync:
>>
>>  Clone Set: dlm-clone [dlm]
>>  Started: [ pbx1vs3 pbx2vs3 ]
>>  Clone Set: asteriskfs-clone [asteriskfs]
>>  Started: [ pbx1vs3 pbx2vs3 ]
>>  Clone Set: asterisk-clone [asterisk]
>>  Started: [ pbx1vs3 pbx2vs3 ]
>>  fence_pbx2_xvm(stonith:fence_xvm):Started pbx1vs3
>>  fence_pbx1_xvm(stonith:fence_xvm):Started pbx2vs3
>>  Clone Set: clvmd-clone [clvmd]
>>  Started: [ pbx1vs3 pbx2vs3]
>>
>> Now my problem is that, for example, when i fence one of the nodes, the
>> other one restarts every clone resource and start them back again, same
>> thing happens when i stop pacemaker and corosync in one node only (pcs
>> cluster stop). That would mean that if i have a problem in one of my
>> Asterisk (for example in DLM resource or CLVMD) that would require fencing
>> right away, for example node pbx2vs3, the other node (pbx1vs3) will restart
>> every service which will drop all my calls in a well functioning node.
>>
>> All this leads to a basic question, is this a strict way for clone
>> resources to behave?, is it possible to configure them so they would
>
> Usually you'll find in syslog or cluster's log files more details why a 
> resource is restarted. Without knowing these details anybody can just guess. 
> It could be that you made a configuration error...
>
>> behave, dare i say, in a more unique way (i know about the option
>> globally-unique but as far as i understand that doesnt do the work). I have
>> been reading about clone resources for a while but there are no many
>> examples about what it cant do.
>>
>> Thanks in advance
>>
>> Alejandro
>
>
>
>
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Clusvcadm -Z substitute in Pacemaker

2016-07-13 Thread emmanuel segura
Using pcs resource unmanage leaves the resource's recurring monitor active, so I
usually also set the monitor interval=0 :)
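
The pcs side of that is:

  # pcs resource unmanage ctm_service
  ... do the maintenance ...
  # pcs resource manage ctm_service

and in addition I edit the resource's recurring monitor operation (interval=0,
or enabled=false) while it is unmanaged, otherwise monitor failures are still
recorded and shown in crm_mon. How the operation is edited depends on your pcs
version, so check the pcs(8) man page for the op syntax.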

2016-07-11 10:43 GMT+02:00 Tomas Jelinek :
> Dne 9.7.2016 v 06:39 jaspal singla napsal(a):
>>
>> Hello Everyone,
>>
>> I need little help, if anyone can give some pointers, it would help me a
>> lot.
>>
>> In RHEL-7.x:
>>
>> There is concept of pacemaker and when I use the below command to freeze
>> my resource group operation, it actually stops all of the resources
>> associated under the resource group.
>>
>> # pcs cluster standby 
>>
>> # pcs cluster unstandby 
>>
>> Result: This actually stops all of the resource groups on that node
>> (ctm_service is one of the resource groups, which gets stopped, including the
>> database as well; it goes to MOUNT mode)
>
>
> Hello Jaspal,
>
> that's what it's supposed to do. Putting a node into standby means the node
> cannot host any resources.
>
>>
>> However; through clusvcadm command on RHEL-6.x, it doesn't stop the
>> ctm_service there and my database is in RW mode.
>>
>> # clusvcadm -Z ctm_service
>>
>> # clusvcadm -U ctm_service
>>
>> So my concern here is: freezing/unfreezing should not affect the status
>> of the group. Is there any way in RHEL-7.x to achieve the same behavior
>> that was obtained with clusvcadm on RHEL 6?
>
>
> Maybe you are looking for
> # pcs resource unmanage 
> and
> # pcs resource manage 
>
> Regards,
> Tomas
>
>>
>> Thanks
>>
>> Jaspal
>>
>>
>>
>>
>> ___
>> Users mailing list: Users@clusterlabs.org
>> http://clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] RES: Pacemaker and OCFS2 on stand alone mode

2016-07-07 Thread emmanuel segura
dlm_tool dump ?

2016-07-07 18:57 GMT+02:00 Carlos Xavier :
> Thank you for the fast reply
>
>>
>> have you configured the stonith and drbd stonith handler?
>>
>
> Yes. they were configured.
> The cluster was running fine for more than 4 years, until we lost one host 
> to a power supply failure.
> Now I need to access the files on the host that is working.
>
>> 2016-07-07 16:43 GMT+02:00 Carlos Xavier :
>> > Hi.
>> > We had a Pacemaker cluster running OCFS2 filesystem over a DRBD device and 
>> > we completely lost one of
>> the hosts.
>> > Now I need some help to recover the data on the remaining machine.
>> > I was able to load the DRBD module by hand and bring up the devices using the 
>> > drbdadm command line:
>> > apolo:~ # modprobe drbd
>> > apolo:~ # cat /proc/drbd
>> > version: 8.3.9 (api:88/proto:86-95)
>> > srcversion: A67EB2D25C5AFBFF3D8B788
>> >
>> > apolo:~ # drbd-overview
>> >   0:backup
>> >   1:export
>> > apolo:~ # drbdadm attach backup
>> > apolo:~ # drbdadm attach export
>> > apolo:~ # drbd-overview
>> >   0:backup  StandAlone Secondary/Unknown UpToDate/DUnknown r-
>> >   1:export  StandAlone Secondary/Unknown UpToDate/DUnknown r-
>> > apolo:~ # drbdadm primary backup
>> > apolo:~ # drbdadm primary export
>> > apolo:~ # drbd-overview
>> >   0:backup  StandAlone Primary/Unknown   UpToDate/DUnknown r-
>> >   1:export  StandAlone Primary/Unknown UpToDate/DUnknown r-
>> >
>> > We have these resources and constraints configured:
>> > primitive resDLM ocf:pacemaker:controld \
>> > op monitor interval="120s"
>> > primitive resDRBD_0 ocf:linbit:drbd \
>> > params drbd_resource="backup" \
>> > operations $id="resDRBD_0-operations" \
>> > op start interval="0" timeout="240" \
>> > op stop interval="0" timeout="100" \
>> > op monitor interval="20" role="Master" timeout="20" \
>> > op monitor interval="30" role="Slave" timeout="20"
>> > primitive resDRBD_1 ocf:linbit:drbd \
>> > params drbd_resource="export" \
>> > operations $id="resDRBD_1-operations" \
>> > op start interval="0" timeout="240" \
>> > op stop interval="0" timeout="100" \
>> > op monitor interval="20" role="Master" timeout="20" \
>> > op monitor interval="30" role="Slave" timeout="20"
>> > primitive resFS_BACKUP ocf:heartbeat:Filesystem \
>> > params device="/dev/drbd/by-res/backup" directory="/backup"
>> > fstype="ocfs2" options="rw,noatime" \
>> > op monitor interval="120s"
>> > primitive resFS_EXPORT ocf:heartbeat:Filesystem \
>> > params device="/dev/drbd/by-res/export" directory="/export"
>> > fstype="ocfs2" options="rw,noatime" \
>> > op monitor interval="120s"
>> > primitive resO2CB ocf:ocfs2:o2cb \
>> > op monitor interval="120s"
>> > group DRBD_01 resDRBD_0 resDRBD_1
>> > ms msDRBD_01 DRBD_01 \
>> > meta resource-stickines="100" notify="true" master-max="2"
>> > interleave="true" target-role="Started"
>> > clone cloneDLM resDLM \
>> > meta globally-unique="false" interleave="true"
>> > target-role="Started"
>> > clone cloneFS_BACKUP resFS_BACKUP \
>> > meta interleave="true" ordered="true" target-role="Started"
>> > clone cloneFS_EXPORT resFS_EXPORT \
>> > meta interleave="true" ordered="true" target-role="Started"
>> > clone cloneO2CB resO2CB \
>> > meta globally-unique="false" interleave="true"
>> > target-role="Started"
>> > colocation colDLMDRBD inf: cloneDLM msDRBD_01:Master
>> > colocation colFS_BACKUP-O2CB inf: cloneFS_BACKUP cloneO2CB
>> > colocation colFS_EXPORT-O2CB inf: cloneFS_EXPORT cloneO2CB
>> > colocation colO2CBDLM inf: cloneO2CB cloneDLM
>> > order ordDLMO2CB 0: cloneDLM cloneO2CB
>> > order ordDRBDDLM 0: msDRBD_01:promote cloneDLM:start
>> > order ordO2CB-FS_BACKUP 0: cloneO2CB cloneFS_BACKUP
>> > order ordO2CB-FS_EXPORT 0: cloneO2CB cloneFS_EXPORT
>> >
>> > As the DRBD devices were brought up by hand, Pacemaker doesn't
>> > recognize they are up and so it doesn't start the DLM resource and all 
>> > resources that depends on it
>> stay stopped.
>> > Is there any way I can circumvent this issue?
>> > Is it possible to bring the OCFS2 resources working on standalone mode?
>> > Please, any help will be very welcome.
>> >
>> > Best regards,
>> > Carlos.
>> >
>> >
>
>
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

Re: [ClusterLabs] Pacemaker and OCFS2 on stand alone mode

2016-07-07 Thread emmanuel segura
have you configured the stonith and drbd stonith handler?

2016-07-07 16:43 GMT+02:00 Carlos Xavier :
> Hi.
> We had a Pacemaker cluster running OCFS2 filesystem over a DRBD device and we 
> completely lost one of the hosts.
> Now I need some help to recover the data on the remaining machine.
> I was able to load the DRBD module by hand and bring up the devices using the 
> drbdadm command line:
> apolo:~ # modprobe drbd
> apolo:~ # cat /proc/drbd
> version: 8.3.9 (api:88/proto:86-95)
> srcversion: A67EB2D25C5AFBFF3D8B788
>
> apolo:~ # drbd-overview
>   0:backup
>   1:export
> apolo:~ # drbdadm attach backup
> apolo:~ # drbdadm attach export
> apolo:~ # drbd-overview
>   0:backup  StandAlone Secondary/Unknown UpToDate/DUnknown r-
>   1:export  StandAlone Secondary/Unknown UpToDate/DUnknown r-
> apolo:~ # drbdadm primary backup
> apolo:~ # drbdadm primary export
> apolo:~ # drbd-overview
>   0:backup  StandAlone Primary/Unknown   UpToDate/DUnknown r-
>   1:export  StandAlone Primary/Unknown UpToDate/DUnknown r-
>
> We have these resources and constraints configured:
> primitive resDLM ocf:pacemaker:controld \
> op monitor interval="120s"
> primitive resDRBD_0 ocf:linbit:drbd \
> params drbd_resource="backup" \
> operations $id="resDRBD_0-operations" \
> op start interval="0" timeout="240" \
> op stop interval="0" timeout="100" \
> op monitor interval="20" role="Master" timeout="20" \
> op monitor interval="30" role="Slave" timeout="20"
> primitive resDRBD_1 ocf:linbit:drbd \
> params drbd_resource="export" \
> operations $id="resDRBD_1-operations" \
> op start interval="0" timeout="240" \
> op stop interval="0" timeout="100" \
> op monitor interval="20" role="Master" timeout="20" \
> op monitor interval="30" role="Slave" timeout="20"
> primitive resFS_BACKUP ocf:heartbeat:Filesystem \
> params device="/dev/drbd/by-res/backup" directory="/backup"
> fstype="ocfs2" options="rw,noatime" \
> op monitor interval="120s"
> primitive resFS_EXPORT ocf:heartbeat:Filesystem \
> params device="/dev/drbd/by-res/export" directory="/export"
> fstype="ocfs2" options="rw,noatime" \
> op monitor interval="120s"
> primitive resO2CB ocf:ocfs2:o2cb \
> op monitor interval="120s"
> group DRBD_01 resDRBD_0 resDRBD_1
> ms msDRBD_01 DRBD_01 \
> meta resource-stickines="100" notify="true" master-max="2"
> interleave="true" target-role="Started"
> clone cloneDLM resDLM \
> meta globally-unique="false" interleave="true"
> target-role="Started"
> clone cloneFS_BACKUP resFS_BACKUP \
> meta interleave="true" ordered="true" target-role="Started"
> clone cloneFS_EXPORT resFS_EXPORT \
> meta interleave="true" ordered="true" target-role="Started"
> clone cloneO2CB resO2CB \
> meta globally-unique="false" interleave="true"
> target-role="Started"
> colocation colDLMDRBD inf: cloneDLM msDRBD_01:Master
> colocation colFS_BACKUP-O2CB inf: cloneFS_BACKUP cloneO2CB
> colocation colFS_EXPORT-O2CB inf: cloneFS_EXPORT cloneO2CB
> colocation colO2CBDLM inf: cloneO2CB cloneDLM
> order ordDLMO2CB 0: cloneDLM cloneO2CB
> order ordDRBDDLM 0: msDRBD_01:promote cloneDLM:start
> order ordO2CB-FS_BACKUP 0: cloneO2CB cloneFS_BACKUP
> order ordO2CB-FS_EXPORT 0: cloneO2CB cloneFS_EXPORT
>
> As the DRBD devices were brought up by hand, Pacemaker doesn't recognize they 
> are up and so it doesn't start the DLM resource and
> all resources that depends on it stay stopped.
> Is there any way I can circumvent this issue?
> Is it possible to bring the OCFS2 resources working on standalone mode?
> Please, any help will be very welcome.
>
> Best regards,
> Carlos.
>
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: RES: Performance of a mirrored LV (cLVM) with OCFS: Attempt to monitor it

2016-05-27 Thread emmanuel segura
Hi,

But doesn't the latest LVM version take care of the alignment by itself?
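
Just to make the arithmetic concrete (applying the ext3/ext4 formula quoted
further down, with an assumed 64 KiB stripe and 4 data disks; the device name
is made up):

  stride       = 64 KiB / 4 KiB block size = 16
  stripe-width = 16 * 4 data disks         = 64
  # mkfs.ext4 -E stride=16,stripe-width=64 /dev/vg_test/lv_test

The point is only that stride/stripe-width follow from the stripe size,
whatever tool creates the LV.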

2016-05-27 18:37 GMT+02:00 Ken Gaillot :
> On 05/27/2016 12:58 AM, Ulrich Windl wrote:
>> Hi!
>>
>> Thanks for this info. We actually run the "noop" scheduler for  the SAN
>> storage (as per menufacturer's recommendation), because on "disk" is actually
>> spread over up to 40 disks.
>> Other settings we changes was:
>> queue/rotational:0
>> queue/add_random:0
>> queue/max_sectors_kb:128 (manufacturer's recommendation, before up to 1MB
>> transfers were seen)
>> queue/read_ahead_kb:0
>>
>> And we apply those setting (where available) the the whole stack (disk
>> devices, multipath device, LV).
>>
>> Regards,
>> Ulrich
>
> I don't have anything to add about clvm specifically, but some general
> RAID tips that are often overlooked:
>
> If you're using striped RAID (i.e. >1), it's important to choose a
> stripe size wisely and make sure everything is aligned with it. Somewhat
> counterintuitively, smaller stripe sizes are better for large reads and
> writes, while larger stripe sizes are better for small reads and writes.
> There's a big performance penalty by setting a stripe size too small,
> but not much penalty from setting it too large.
>
> Things that should be aligned:
>
> * Partition sizes. A disk's first usable partition will generally start
> at (your stripe size in kilobytes * 2) sectors.
>
> * LVM physical volume metadata (via the --metadatasize option to
> pvcreate). It will set the metadata size to the next 64K boundary above
> the value, so set it to be just under the size you want, ex.
> --metadatasize 1.99M will get a metadata size of 2MB.
>
> * The filesystem creation options (varies by fs type). For example, with
> ext3/ext4, where N1 is stripe size in kilobytes / 4, and N2 is $N1 times
> the number of nonparity disks in the array, use -E
> stride=$N1,stripe-width=$N2. For xfs, where STRIPE is the stripe size in
> kilobytes and NONPARITY is the number of nonparity disks in the array,
> use -d su=${STRIPE}k,sw=${NONPARITY} -l su=${STRIPE}k.
>
> If your RAID controller has power backup (BBU or supercapacitor), mount
> filesystems with the nobarrier option.
>
> "Carlos Xavier"  schrieb am 25.05.2016 um 22:25
>> in
>> Nachricht <01da01d1b6c3$8f5c3dc0$ae14b940$@com.br>:
>>> Hi.
>>>
>>> I have been running OCFS2 on clusters for quite long time.
>>> We started running it over DRBD and now we have it running on a Dell
>>> storage.
>>> Over DRBD it showed a very poor performance, most because the way DRBD
>>> works.
>>> To improve the performance we had to change the I/O Scheduler of the disk to
>>
>>> "Deadline"
>>>
>>> When we migrate the system to the storage, the issue show up again.
>>> Sometimes the system was hanging due to disk access, to solve the issue I
>>> changed the I/O Schedule To Deadline and the trouble vanished.
>>>
>>> Regards,
>>> Carlos.
>>>
>>>
 -Mensagem original-
 De: Kristoffer Grönlund [mailto:kgronl...@suse.com]
 Enviada em: quarta-feira, 25 de maio de 2016 06:55
 Para: Ulrich Windl; users@clusterlabs.org
 Assunto: Re: [ClusterLabs] Performance of a mirrored LV (cLVM) with OCFS:
>>> Attempt to monitor it

 Ulrich Windl  writes:

> cLVM has never made a good impression regarding performance, so I wonder
>> if
>>> there's anything we
 could do to improve the4 performance. I suspect that one VM paging heavily
>>
>>> on OCFS2 kills the
 performance of the whole cluster (that hosts Xen PV guests only). Anyone
>>> with deeper insights?

 My understanding is that this is a problem inherent in the design of CLVM
>>> and there is work ongoing to
 mitigate this by handling clustering in md instead. See this LWN article
>> for
>>> more details:

 http://lwn.net/Articles/674085/

 Cheers,
 Kristoffer

 --
 // Kristoffer Grönlund
 // kgronl...@suse.com
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: ringid interface FAULTY no resource move

2016-05-04 Thread emmanuel segura
Use fencing and the DRBD fencing handler.
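
Roughly, that means something like this in the DRBD resource (8.x syntax; the
script paths may differ on your distribution) plus a working stonith device in
Pacemaker:

  disk {
      fencing resource-only;    # or resource-and-stonith
  }
  handlers {
      fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
      after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }

With that in place, DRBD adds/removes a location constraint when replication
breaks, instead of both sides silently carrying on.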

2016-05-04 14:46 GMT+02:00 Rafał Sanocki :
> Resources should move to the second node when any interface is down.
>
>
>
>
> On 2016-05-04 at 14:41, Ulrich Windl wrote:
>
> Rafal Sanocki  schrieb am 04.05.2016 um 14:14
> in
>>
>> Nachricht <78d882b1-a407-31e0-2b9e-b5f8406d4...@gmail.com>:
>>>
>>> Hello,
>>> I can't find what I did wrong. I have a 2 node cluster: Corosync, Pacemaker,
>>> DRBD. When I pull the cable out, nothing happens.
>>
>> "nothing"? The wrong cable?
>>
>> [...]
>>
>> Regards,
>> Ulrich
>>
>>
>>
>> ___
>> Users mailing list: Users@clusterlabs.org
>> http://clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>
>
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Set "start-failure-is-fatal=false" on only one resource?

2016-03-25 Thread emmanuel segura
Try using on-fail on the individual resource's operations.
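
For example (crmsh-style, operation values are placeholders), on the DRBD
primitive's start operation:

  op start interval=0 timeout=240 on-fail=restart

Whether that fully overrides the cluster-wide start-failure-is-fatal depends on
your Pacemaker version, so test it; the per-resource migration-threshold trick
that Adam describes below is the other way to keep the change local to one
resource.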

2016-03-25 0:22 GMT+01:00 Adam Spiers :
> Sam Gardner  wrote:
>> I'm having some trouble on a few of my clusters in which the DRBD Slave 
>> resource does not want to come up after a reboot until I manually run 
>> resource cleanup.
>>
>> Setting 'start-failure-is-fatal=false' as a global cluster property and a 
>> failure-timeout works to resolve the issue, but I don't really want the 
>> start failure set everywhere.
>>
>> While I work on figuring out why the slave resource isn't coming up, is it 
>> possible to set 'start-failure-is-fatal=false'  only on the DRBDSlave 
>> resource, or does this need a patch?
>
> No, start-failure-is-fatal is a cluster-wide setting.  But IIUC you
> could also set migration-threshold=1 cluster-wide (i.e. in
> rsc_defaults), and then override it to either 0 or something higher
> just for this resource.  You may find this interesting reading:
>
> https://github.com/crowbar/crowbar-ha/pull/102/commits/de94e1e42ba52c2cdb496becbd73f07bc2501871
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Issue with Stonith Resource parameters

2016-03-08 Thread emmanuel segura
I think you should pass the parameters to the stonith agent; anyway,
show your config.
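
To see which parameters external/ipmi actually accepts (and therefore whether
the ipmitool option you need can be passed at all), query the agent's
meta-data, e.g.:

  # stonith -t external/ipmi -n
  # crm ra info stonith:external/ipmi

If the option is not listed there, it cannot be set from the cluster
configuration without changing the plugin itself.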

2016-03-09 5:29 GMT+01:00 vija ar :
> I have configured an SLEHA cluster on Cisco UCS boxes with IPMI configured. I
> have tested IPMI using ipmitool; however, for ipmitool to function neatly I have
> to pass the parameter -y i.e.  along with username and password.
>
> However, to configure stonith there is no parameter in pacemaker to pass
> ?, and because of that stonith is failing.
>
> Can you please let me know if there is any way to add it, or is this a bug?
>
> ***
>
>
>
> Mar  9 00:26:28 server02 stonith: external_status: 'ipmi status' failed with
> rc 1
> Mar  9 00:26:28 server02 stonith: external/ipmi device not accessible.
> Mar  9 00:26:28 server02 stonith-ng[99114]:   notice: log_operation:
> Operation 'monitor' [99200] for device 'STONITH-server02' returned: -201
> (Generic Pacemaker error)
> Mar  9 00:26:28 server02 stonith-ng[99114]:  warning: log_operation:
> STONITH-server02:99200 [ Performing: stonith -t external/ipmi -S ]
> Mar  9 00:26:28 server02 stonith-ng[99114]:  warning: log_operation:
> STONITH-server02:99200 [ logd is not runningfailed:  1 ]
> Mar  9 00:26:28 server02 crmd[99118]:error: process_lrm_event: LRM
> operation STONITH-server02_start_0 (call=13, status=4, cib-update=13,
> confirmed=true) Error
> Mar  9 00:26:28 server02 attrd[99116]:   notice: attrd_cs_dispatch: Update
> relayed from server01
> Mar  9 00:26:28 server02 attrd[99116]:   notice: attrd_trigger_update:
> Sending flush op to all hosts for: fail-count-STONITH-server02 (INFINITY)
> Mar  9 00:26:28 server02 attrd[99116]:   notice: attrd_perform_update: Sent
> update 35: fail-count-STONITH-server02=INFINITY
> Mar  9 00:26:28 server02 attrd[99116]:   notice: attrd_cs_dispatch: Update
> relayed from server01
> Mar  9 00:26:28 server02 attrd[99116]:   notice: attrd_trigger_update:
> Sending flush op to all hosts for: last-failure-STONITH-server02
> (1457463388)
> Mar  9 00:26:28 server02 attrd[99116]:   notice: attrd_perform_update: Sent
> update 37: last-failure-STONITH-server02=1457463388
> Mar  9 00:26:28 server02 attrd[99116]:   notice: attrd_cs_dispatch: Update
> relayed from server01
> Mar  9 00:26:28 server02 attrd[99116]:   notice: attrd_trigger_update:
> Sending flush op to all hosts for: fail-count-STONITH-server02 (INFINITY)
> Mar  9 00:26:28 server02 attrd[99116]:   notice: attrd_perform_update: Sent
> update 39: fail-count-STONITH-server02=INFINITY
> Mar  9 00:26:28 server02 attrd[99116]:   notice: attrd_cs_dispatch: Update
> relayed from server01
> Mar  9 00:26:28 server02 attrd[99116]:   notice: attrd_trigger_update:
> Sending flush op to all hosts for: last-failure-STONITH-server02
> (1457463388)
> Mar  9 00:26:28 server02 attrd[99116]:   notice: attrd_perform_update: Sent
> update 41: last-failure-STONITH-server02=1457463388
> Mar  9 00:26:28 server02 attrd[99116]:   notice: attrd_cs_dispatch: Update
> relayed from server01
> Mar  9 00:26:28 server02 attrd[99116]:   notice: attrd_trigger_update:
> Sending flush op to all hosts for: fail-count-STONITH-server02 (INFINITY)
> Mar  9 00:26:28 server02 attrd[99116]:   notice: attrd_perform_update: Sent
> update 43: fail-count-STONITH-server02=INFINITY
> Mar  9 00:26:28 server02 attrd[99116]:   notice: attrd_cs_dispatch: Update
> relayed from server01
> Mar  9 00:26:28 server02 attrd[99116]:   notice: attrd_trigger_update:
> Sending flush op to all hosts for: last-failure-STONITH-server02
> (1457463388)
> Mar  9 00:26:28 server02 attrd[99116]:   notice: attrd_perform_update: Sent
> update 45: last-failure-STONITH-server02=1457463388
> Mar  9 00:26:28 server02 attrd[99116]:   notice: attrd_cs_dispatch: Update
> relayed from server01
> Mar  9 00:26:28 server02 attrd[99116]:   notice: attrd_trigger_update:
> Sending flush op to all hosts for: fail-count-STONITH-server02 (INFINITY)
> Mar  9 00:26:28 server02 attrd[99116]:   notice: attrd_perform_update: Sent
> update 47: fail-count-STONITH-server02=INFINITY
> Mar  9 00:26:28 server02 attrd[99116]:   notice: attrd_cs_dispatch: Update
> relayed from server01
> Mar  9 00:26:28 server02 attrd[99116]:   notice: attrd_trigger_update:
> Sending flush op to all hosts for: last-failure-STONITH-server02
> (1457463388)
> Mar  9 00:26:28 server02 attrd[99116]:   notice: attrd_perform_update: Sent
> update 49: last-failure-STONITH-server02=1457463388
> Mar  9 00:26:28 server02 crmd[99118]:   notice: process_lrm_event: LRM
> operation STONITH-server02_stop_0 (call=14, rc=0, cib-update=14,
> confirmed=true) ok
> Mar  9 00:26:28 server01 crmd[16809]:  warning: status_from_rc: Action 9
> (STONITH-server02_start_0) on server02 failed (target: 0 vs. rc: 1): Error
> Mar  9 00:26:28 server01 crmd[16809]:  warning: update_failcount: Updating
> failcount for STONITH-server02 on server02 after failed start: rc=1
> (update=INFINITY, time=1457463388)
> Mar  9 00:26:28 server01 crmd[16809]:  warning: update_failcount: Updating
> 

Re: [ClusterLabs] Pacemaker issue lsb service

2016-03-05 Thread emmanuel segura
If you need help, the first thing that you need to do is show your cluster logs.
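
Beyond the logs, it is usually worth checking that the init script is actually
LSB-compliant, because for lsb resources pacemaker only calls start/stop/status
and trusts the exit codes. A quick manual test on one node (openvpnas is the
script name from your message):

  /etc/init.d/openvpnas start  ; echo "start:  $?"   # expect 0
  /etc/init.d/openvpnas status ; echo "status: $?"   # expect 0 while running
  /etc/init.d/openvpnas stop   ; echo "stop:   $?"   # expect 0
  /etc/init.d/openvpnas status ; echo "status: $?"   # expect 3 when stopped

If the status exit codes are wrong, pacemaker will keep reporting the resource
as failed no matter how the primitive is configured.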

2016-03-05 15:17 GMT+01:00 Thorsten Stremetzne :
> Hello all,
>
> I have built an HA setup for an OpenVPN server.
> In my setup there are two hosts running Ubuntu Linux, pacemaker &
> corosync. Also, both hosts have a virtual IP which migrates to the host that
> is active, when the other fails. This works well, but I also configured a
> primitive for the openvpn-server init script, via
>
> crm configure primitive failover-openvpnas lsb::openvpnas op monitor
> interval=15s
>
> The service will be added, but it always fails; according to syslog, the
> init script is being called in the wrong way.
> I'm having trouble debugging how pacemaker tries to start/stop the service
> on the hosts.
>
> Can someone please assist me with some ideas and suggestions?
>
> Thanks very much
>
> Thorsten
>
>
> This e-mail may contain confidential and/or legally protected information. If
> you are not the intended recipient or have received this e-mail in error,
> please inform the sender immediately by telephone or e-mail and delete this
> e-mail from your system. Unauthorized copying and unauthorized forwarding of
> this e-mail are not permitted.
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker issue when ethernet interface is pulled down

2016-02-14 Thread emmanuel segura
Use fencing, and after you have configured fencing, use iptables to test
your cluster; with iptables you can block the corosync ports 5404 and 5405.
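
For example, a minimal sketch of such a test, assuming corosync runs over UDP
on eth0 with the default ports (adjust the interface and ports to your totem
configuration):

  # drop corosync traffic in both directions to simulate a network failure
  iptables -A INPUT  -i eth0 -p udp -m multiport --dports 5404,5405 -j DROP
  iptables -A OUTPUT -o eth0 -p udp -m multiport --dports 5404,5405 -j DROP

  # when the test is finished, remove the rules again
  iptables -D INPUT  -i eth0 -p udp -m multiport --dports 5404,5405 -j DROP
  iptables -D OUTPUT -o eth0 -p udp -m multiport --dports 5404,5405 -j DROP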

2016-02-14 14:09 GMT+01:00 Debabrata Pani :
> Hi,
> We ran into some problems when we pull down the ethernet interface using
> “ifconfig eth0 down”
>
> Our cluster has the following configurations and resources
>
> Two  network interfaces : eth0 and lo(cal)
> 3 nodes with one node put in maintenance mode
> No-quorum-policy=stop
> Stonith-enabled=false
> Postgresql Master/Slave
> vip master and vip replication IPs
> VIPs will run on the node where Postgresql Master is running
>
>
> Two test cases that we executed are as follows
>
> Introduce delay in the ethernet interface of the postgresql PRIMARY node
> (Command  : tc qdisc add dev eth0 root netem delay 8000ms)
> `ifconfig eth0 down` on the postgresql PRIMARY node
> We expected that both these test cases test for network problems in the
> cluster
>
>
> In the first case (ethernet interface delay)
>
> Cluster is divided into “partition WITH quorum” and “partition WITHOUT
> quorum”
> Partition WITHOUT quorum shuts down all the services
> Partition WITH quorum takes over as Postgresql PRIMARY and VIPs
> Everything as expected. Wow !
>
>
> In the second case (ethernet interface down)
>
> We see lots of errors like the following on the node:
>
> Feb 12 14:09:48 corosync [MAIN  ] Totem is unable to form a cluster because
> of an operating system or network fault. The most common cause of this
> message is that the local firewall is configured improperly.
> Feb 12 14:09:49 corosync [MAIN  ] Totem is unable to form a cluster because
> of an operating system or network fault. The most common cause of this
> message is that the local firewall is configured improperly.
> Feb 12 14:09:51 corosync [MAIN  ] Totem is unable to form a cluster because
> of an operating system or network fault. The most common cause of this
> message is that the local firewall is configured improperly.
>
> But the `crm_mon -Afr` (from the node whose eth0 is down) always shows the
> cluster to be fully formed.
>
> It shows all the nodes as UP
> It shows itself as the one running the postgresql PRIMARY  (as was the case
> before the ethernet interface was put down)
>
> `crm_mon -Afr` on the OTHER nodes shows a different story
>
> They show the other node as down
> One of the other two nodes takes over the postgresql PRIMARY
>
> This leads to a split brain situation which was gracefully avoided in the
> test case where only “delay is introduced into the interface”
>
>
> Questions :
>
>  Is it a known issue with pacemaker when the ethernet interface is pulled
> down?
> Is it an incorrect way of testing the cluster ? There is some information
> regarding the same in this thread
> http://www.gossamer-threads.com/lists/linuxha/pacemaker/59738
>
>
> Regards,
> Deba
>
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] HA configuration

2016-02-04 Thread emmanuel segura
You need to be sure that your redis resource agent has master/slave support,
and I think this colocation needs to be inverted:

colocation resource_location1 inf: redis_clone:Master kamailio

to

colocation resource_location1 inf: kamailio redis_clone:Master

You also need an order constraint:

order resource_order1 inf: redis_clone:promote kamailio:start

Anyway, if you want to simplify your config, make a group:

group mygroup myresource myvip

colocation resource_location1 inf: mygroup redis_clone:Master
order resource_order1 inf: redis_clone:promote mygroup:start
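
Translated to the resource names in the configuration quoted below, a rough
sketch (the group and constraint names here are made up, and the existing
resource_location/resource_location1/resource_starting_order constraints would
have to be removed or adjusted first):

  crm configure group ha_group VIP kamailio
  crm configure colocation col_group_with_master inf: ha_group redis_clone:Master
  crm configure order ord_promote_then_group inf: redis_clone:promote ha_group:start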

2016-02-04 11:14 GMT+01:00 Rishin Gangadharan :
> Hi All,
>
>  Could you please help me with the corosync/pacemaker configuration using
> crmsh.
>
>
>
> My requirements
>
>   I have three resources
>
> 1.   VIP
>
> 2.   Kamailio
>
> 3.   Redis DB
>
> I want to configure HA for kamailio with a VIP and Redis in Master/Slave mode. I
> have configured the VIP and kamailio and it's working fine, i.e. when the kamailio
> process fails the VIP switches to the other machine and kamailio is started there.
>
> When kamailio fails I want to move first the VIP and then Redis; Redis must
> be promoted to Master on the new node, and the previously active node should become slave.
>
>
>
> I.e.  Node 1 : Active  (running resources VIP, Redis:Master, Kamailio)
>
>  Node 2 : Passive ( Redis as slave)
>
>
>
> My aim is that when Kamailio or any resource on Node 1 fails, it should end up like
> this:
>
>
>
> Node 2 : Active  (running resources VIP, Redis:Master, Kamailio)
>
>  Node 1 : Passive ( Redis as slave)
>
> Crm configure edit
>
>
>
> node PCSCF
>
> node PCSCF18
>
> primitive VIP IPaddr2 \
>
> params ip=10.193.30.28 nic=eth0 \
>
> op monitor interval=2s \
>
> meta is-managed=true target-role=Started
>
> primitive kamailio ocf:kamailio:kamailio_ra \
>
> op start interval=5s \
>
> op monitor interval=2s \
>
> meta migration-threshold=1 failure-timeout=5s
>
> primitive redis ocf:kamailio:redis \
>
> meta target-role=Master is-managed=true \
>
> op monitor interval=1s role=Master timeout=5s on-fail=restart \
>
> op monitor interval=1s role=Slave timeout=5s on-fail=restart
>
> ms redis_clone redis \
>
> meta notify=true is-managed=true ordered=false interleave=false
> globally-unique=false target-role=Stopped migration-threshold=1
>
> colocation resource_location inf: kamailio VIP
>
> colocation resource_location1 inf: redis_clone:Master kamailio
>
> order resource_starting_order inf: VIP kamailio
>
> property cib-bootstrap-options: \
>
> dc-version=1.1.11-97629de \
>
> cluster-infrastructure="classic openais (with plugin)" \
>
> expected-quorum-votes=3 \
>
> stonith-enabled=false \
>
> no-quorum-policy=ignore \
>
> last-lrm-refresh=1454577107
>
> property redis_replication: \
>
> redis_REPL_INFO=PCSCF
>
>
>
>
>
> 
> 
> Disclaimer: This message and the information contained herein is proprietary
> and confidential and subject to the Tech Mahindra policy statement, you may
> review the policy at http://www.techmahindra.com/Disclaimer.html externally
> http://tim.techmahindra.com/tim/disclaimer.html internally within
> TechMahindra.
> 
>
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] DLM not working on my GFS2/pacemaker cluster

2016-01-19 Thread emmanuel segura
Please share your cluster config and say whether your fencing is working.

2016-01-19 3:47 GMT+01:00  :
> One of my clusters is having a problem. It's no longer able to set up its
> GFS2 mounts. I've narrowed the problem down a bit. Here's the output when I
> try to start the DLM daemon (Normally this is something corosync/pacemaker
> starts up for me, but here it is on the command line for the debug output):
>   # dlm_controld -D -q 0
>   4561 dlm_controld 4.0.1 started
>   4561 our_nodeid 168528918
>   4561 found /dev/misc/dlm-control minor 56
>   4561 found /dev/misc/dlm-monitor minor 55
>   4561 found /dev/misc/dlm_plock minor 54
>   4561 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
>   4561 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
>   4561 cmap totem.rrp_mode = 'none'
>   4561 set protocol 0
>   4561 set recover_callbacks 1
>   4561 cmap totem.cluster_name = 'cwwba'
>   4561 set cluster_name cwwba
>   4561 /dev/misc/dlm-monitor fd 11
>   4561 cluster quorum 1 seq 672 nodes 2
>   4561 cluster node 168528918 added seq 672
>   4561 set_configfs_node 168528918 10.11.140.22 local 1
>   4561 /sys/kernel/config/dlm/cluster/comms/168528918/addr: open failed: 1
>   4561 cluster node 168528919 added seq 672
>   4561 set_configfs_node 168528919 10.11.140.23 local 0
>   4561 /sys/kernel/config/dlm/cluster/comms/168528919/addr: open failed: 1
>   4561 cpg_join dlm:controld ...
>   4561 setup_cpg_daemon 13
>   4561 dlm:controld conf 1 1 0 memb 168528918 join 168528918 left
>   4561 daemon joined 168528918
>   4561 fence work wait for cluster ringid
>   4561 dlm:controld ring 168528918:672 2 memb 168528918 168528919
>   4561 fence_in_progress_unknown 0 startup
>   4561 receive_protocol 168528918 max 3.1.1.0 run 0.0.0.0
>   4561 daemon node 168528918 prot max 0.0.0.0 run 0.0.0.0
>   4561 daemon node 168528918 save max 3.1.1.0 run 0.0.0.0
>   4561 set_protocol member_count 1 propose daemon 3.1.1 kernel 1.1.1
>   4561 receive_protocol 168528918 max 3.1.1.0 run 3.1.1.0
>   4561 daemon node 168528918 prot max 3.1.1.0 run 0.0.0.0
>   4561 daemon node 168528918 save max 3.1.1.0 run 3.1.1.0
>   4561 run protocol from nodeid 168528918
>   4561 daemon run 3.1.1 max 3.1.1 kernel run 1.1.1 max 1.1.1
>   4561 plocks 14
>   4561 receive_protocol 168528918 max 3.1.1.0 run 3.1.1.0
>
> As you can see, it's trying to configure the node addresses, but it's unable
> to write to the 'addr' file under the /sys/kernel/config configfs tree (See
> the 'open failed' lines above). I have no idea why. dmesg isn't saying
> anything. Nothing is telling me why it doesn't want me writing there. And I
> can confirm this behavior on the prompt as well.
>
> Trying to start CLVM results in complaints about the node not having an
> address set, which makes sense given the
>
> Here's the exact same command run twice. First, on a very similarly
> configured cluster (which is currently running):
>   # cat /sys/kernel/config/dlm/cluster/comms/169446438/addr
>   cat: /sys/kernel/config/dlm/cluster/comms/169446438/addr: Permission
> denied
> (That's what I expect to see. It's a write-only file.)
>
> And now on this messed up cluster:
>   # cat /sys/kernel/config/dlm/cluster/comms/168528918/addr
>   cat: /sys/kernel/config/dlm/cluster/comms/168528918/addr: Operation not
> permitted
>
> Why 'operation not permitted'? dmesg isn't telling me anything at all, and I
> don't see any way to get the kernel to spit out some kind of explanation for
> why it's blocking me. Can anyone help? At least point me in a direction
> where I can get the system to give me some indication why it's behaving this
> way?
>
> I'm running Ubuntu 14.04, and I've posted this on the Ubuntu forums as well:
> http://ubuntuforums.org/showthread.php?t=2310383
>
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] DLM not working on my GFS2/pacemaker cluster

2016-01-19 Thread emmanuel segura
dlm_tool dump ?

2016-01-19 15:25 GMT+01:00  <dan...@benoy.name>:
> Yes, fencing is working, and SELinux is disabled.
>
> What configuration details do you require?
>
> Here's my corosync.conf: http://pastebin.com/SD1Gbdj0
> Here's my output from 'crm configure show': http://pastebin.com/eAiq2yJ9
>
> Another cluster is running fine with an identical configuration.
>
> On 2016-01-19 03:49, emmanuel segura wrote:
>>
>> please share your cluster config and say if your fencing is working.
>>
>> 2016-01-19 3:47 GMT+01:00  <dan...@benoy.name>:
>>>
>>> One of my clusters is having a problem. It's no longer able to set up its
>>> GFS2 mounts. I've narrowed the problem down a bit. Here's the output when
>>> I
>>> try to start the DLM daemon (Normally this is something
>>> corosync/pacemaker
>>> starts up for me, but here it is on the command line for the debug
>>> output):
>>>   # dlm_controld -D -q 0
>>>   4561 dlm_controld 4.0.1 started
>>>   4561 our_nodeid 168528918
>>>   4561 found /dev/misc/dlm-control minor 56
>>>   4561 found /dev/misc/dlm-monitor minor 55
>>>   4561 found /dev/misc/dlm_plock minor 54
>>>   4561 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
>>>   4561 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
>>>   4561 cmap totem.rrp_mode = 'none'
>>>   4561 set protocol 0
>>>   4561 set recover_callbacks 1
>>>   4561 cmap totem.cluster_name = 'cwwba'
>>>   4561 set cluster_name cwwba
>>>   4561 /dev/misc/dlm-monitor fd 11
>>>   4561 cluster quorum 1 seq 672 nodes 2
>>>   4561 cluster node 168528918 added seq 672
>>>   4561 set_configfs_node 168528918 10.11.140.22 local 1
>>>   4561 /sys/kernel/config/dlm/cluster/comms/168528918/addr: open failed:
>>> 1
>>>   4561 cluster node 168528919 added seq 672
>>>   4561 set_configfs_node 168528919 10.11.140.23 local 0
>>>   4561 /sys/kernel/config/dlm/cluster/comms/168528919/addr: open failed:
>>> 1
>>>   4561 cpg_join dlm:controld ...
>>>   4561 setup_cpg_daemon 13
>>>   4561 dlm:controld conf 1 1 0 memb 168528918 join 168528918 left
>>>   4561 daemon joined 168528918
>>>   4561 fence work wait for cluster ringid
>>>   4561 dlm:controld ring 168528918:672 2 memb 168528918 168528919
>>>   4561 fence_in_progress_unknown 0 startup
>>>   4561 receive_protocol 168528918 max 3.1.1.0 run 0.0.0.0
>>>   4561 daemon node 168528918 prot max 0.0.0.0 run 0.0.0.0
>>>   4561 daemon node 168528918 save max 3.1.1.0 run 0.0.0.0
>>>   4561 set_protocol member_count 1 propose daemon 3.1.1 kernel 1.1.1
>>>   4561 receive_protocol 168528918 max 3.1.1.0 run 3.1.1.0
>>>   4561 daemon node 168528918 prot max 3.1.1.0 run 0.0.0.0
>>>   4561 daemon node 168528918 save max 3.1.1.0 run 3.1.1.0
>>>   4561 run protocol from nodeid 168528918
>>>   4561 daemon run 3.1.1 max 3.1.1 kernel run 1.1.1 max 1.1.1
>>>   4561 plocks 14
>>>   4561 receive_protocol 168528918 max 3.1.1.0 run 3.1.1.0
>>>
>>> As you can see, it's trying to configure the node addresses, but it's
>>> unable
>>> to write to the 'addr' file under the /sys/kernel/config configfs tree
>>> (See
>>> the 'open failed' lines above). I have no idea why. dmesg isn't saying
>>> anything. Nothing is telling me why it doesn't want me writing there. And
>>> I
>>> can confirm this behavior on the prompt as well.
>>>
>>> Trying to start CLVM results in complaints about the node not having an
>>> address set, which makes sense given the
>>>
>>> Here's the exact same command run twice. First, on a very similarly
>>> configured cluster (which is currently running):
>>>   # cat /sys/kernel/config/dlm/cluster/comms/169446438/addr
>>>   cat: /sys/kernel/config/dlm/cluster/comms/169446438/addr: Permission
>>> denied
>>> (That's what I expect to see. It's a write-only file.)
>>>
>>> And now on this messed up cluster:
>>>   # cat /sys/kernel/config/dlm/cluster/comms/168528918/addr
>>>   cat: /sys/kernel/config/dlm/cluster/comms/168528918/addr: Operation not
>>> permitted
>>>
>>> Why 'operation not permitted'? dmesg isn't telling me anything at all,
>>> and I
>>> don't see any way to get the kernel to spit out some kind of explanation
>>> for
>>> why it's blocking me. Can anyone help? At least point me in a direction
>>> where I can get the system to give me some ind

Re: [ClusterLabs] Cluster resources -restart automatically

2016-01-11 Thread emmanuel segura
You can use on-fail in the stop operation. For your other question you can
use colocation + order, or better, use a group, for
example: group mygroup resource1 resource2

When the monitor for resource1 fails, resource2 is restarted as well.
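
A rough sketch in pcs syntax, assuming the resources really are named Resource1
and Resource2, and that ignoring a failed stop is acceptable in your environment
(it leaves the resource possibly still running, so use it with care):

  # put both resources in one group; Resource2 then follows (and restarts with) Resource1
  pcs resource group add mygroup Resource1 Resource2

  # ignore a failed stop instead of fencing the node (timeout value is illustrative)
  pcs resource update Resource2 op stop on-fail=ignore timeout=60s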

2016-01-11 17:09 GMT+01:00 John Gogu :
> Dear all,
>  I have the following situation and I need some advice from you:
>
> 2 resources which run on 2 cluster nodes (Centos6.7 pacemaker + pcs)
> Node_A       Node_B
> Resource1    Resource2
>
> 1. Is it possible to configure a restart of Resource2 when Resource1 fails, or
> is moved to Node_B due to a failure of Node_A?
> 2. When pacemaker cannot stop a resource, the default action on a failed stop is
> to fence; can I configure it to ignore instead?
>
>
> Thank you,
> John Gogu
>
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] SBD Latency Warnings

2015-12-30 Thread emmanuel segura
I'm not an sbd expert, but I'll try to describe one of these warnings.

sbd: WARN: Pacemaker state outdated (age: 4)

In the sbd source code ("./src/sbd-md.c"):



good_servants = 0;
for (s = servants_leader; s; s = s->next) {
  int age = t_now.tv_sec - s->t_last.tv_sec;

  if (!s->t_last.tv_sec)
continue;

  if (age < (int)(timeout_io+timeout_loop)) {  # if the sbd
process was scheduled within a time slice < 4 seconds, nothing is printed
if (strcmp(s->devname, "pcmk") != 0) {
  good_servants++;
}
s->outdated = 0;
  } else if (!s->outdated) {
if (strcmp(s->devname, "pcmk") == 0) {
  /* If the state is outdated, we
   * override the last reported
   * state */
  pcmk_healthy = 0;
  cl_log(LOG_WARNING, "Pacemaker state outdated (age: %d)",
# but here sbd was scheduled in a time slice > 4 seconds
age);
} else if (!s->restart_blocked) {
  cl_log(LOG_WARNING, "Servant for %s outdated (age: %d)",
s->devname, age);
}
s->outdated = 1;
  }
}



sbd --help

-5 Warn if loop latency exceeds threshold (optional, watch only)
(default is 3, set to 0 to disable)

sbd -d /dev/mydevicepath dump | grep -i loop
Timeout (loop) : 1

This is only a warning and you can ignore it if you want, but looking at the
source code, it means that the sbd process wasn't scheduled for more
than 4 seconds.

From what I know, the sbd processes run with the realtime scheduling class:

 ps -eo pid,class,comm | grep sbd
 6639 RR  sbd
 6640 RR  sbd
 6641 RR  sbd

So the problem is quite clear: your host isn't giving CPU time to your guest.
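
If you would rather raise (or silence) the warning than ignore it in the logs,
the -5 option shown in the help text above can be passed to the daemon. A sketch
assuming the SLES-style /etc/sysconfig/sbd (the path and the other options are
assumptions, merge with whatever you already have there):

  # /etc/sysconfig/sbd  (assumed location)
  SBD_DEVICE="/dev/mapper/clustersbd"
  SBD_OPTS="-W -P -5 10"    # warn only when loop latency exceeds 10 seconds; 0 disables the warning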

2015-12-30 17:53 GMT+01:00 Jorge Fábregas :
> Hi,
>
> We're having some issues with a particular oversubscribed hypervisor
> (cpu-wise) where we run SLES 11 SP4 guests.  I had to increase many
> timeouts on the cluster to cope with this:
>
> - Corosync's token timeout (from the default of 5 secs to 30 seconds)
> - SBD's watchdog & msgwait (from 15/30 to 30/60 respectively)
> - Pacemaker's resource-monitoring timeouts
>
> I know the consequence of doing all this will be *slow reaction times*
>  but it's all I can do in the meantime.
>
> However, when the hypervisor is at 100% full CPU utilization I still get
> these messages:
>
> sbd: :WARN: Latency: 4 exceeded threshold 3 on disk /dev/mapper/clustersbd
> logd: WARN: G_CH_prepare_int: working on IPC channel took 220 ms (> 100 ms)
> sbd: WARN: Pacemaker state outdated (age: 4)
> sbd: info: Pacemaker health check: OK
> sbd: WARN: Latency: 4 exceeded threshold 3 on disk /dev/mapper/clustersbd
> logd: WARN: G_CH_check_int: working on IPC channel took 150 ms (> 100 ms)
> sbd: WARN: Latency: 4 exceeded threshold 3 on disk /dev/mapper/clustersbd
> sbd: WARN: Servant for /dev/mapper/clustersbd outdated (age: 5)
> sbd: WARN: Majority of devices lost - surviving on pacemaker
>
> Is this latency configurable? It keeps mentioning "threshold 3". Is that
> 3 seconds? How does it relate to the following parameters?
>
> ==Dumping header on disk /dev/mapper/clustersbd
> Header version : 2.1
> UUID   : 54597871-2392-475f-ba2d-71bdf92c36b5
> Number of slots: 255
> Sector size: 512
> Timeout (watchdog) : 30
> Timeout (allocate) : 2
> Timeout (loop) : 1
> Timeout (msgwait)  : 60
> ==Header on disk /dev/mapper/clustersbd is dumped
>
> I'm using the -P option with sbd so I know it will not fence the system
> as long as the node's health is ok (as reported by Pacemaker).  I'd
> still like to find out if the latency mentioned is configurable or is it
> safe to ignore.
>
> Thanks!
>
> Regards,
> Jorge
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] start service after filesystemressource

2015-11-20 Thread emmanuel segura
Using a group is simpler.

example:

group mygroup resource1 resource2 resource3
order o_drbd_before_services inf: ms_drbd_export:promote mygroup:start
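
For the configuration quoted below, a minimal sketch (the constraint name
o_fs_before_services is just an example) that starts the existing group only
after /mnt is mounted would be:

  crm configure order o_fs_before_services inf: res_fs:start mygroup:start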

2015-11-20 15:45 GMT+01:00 Andrei Borzenkov :
> 20.11.2015 16:38, haseni...@gmx.de пишет:
>
>> Hi,
>> I want to start several services after the drbd ressource an the
>> filessystem is
>> avaiable. This is my current configuration:
>> node $id="184548773" host-1 \
>>   attributes standby="on"
>> node $id="184548774" host-2 \
>>   attributes standby="on"
>> primitive collectd lsb:collectd \
>>   op monitor interval="10" timeout="30" \
>>   op start interval="0" timeout="120" \
>>   op stop interval="0" timeout="120"
>> primitive failover-ip1 ocf:heartbeat:IPaddr \
>>   params ip="192.168.6.6" nic="eth0:0" cidr_netmask="32" \
>>   op monitor interval="10s"
>> primitive failover-ip2 ocf:heartbeat:IPaddr \
>>   params ip="192.168.6.7" nic="eth0:1" cidr_netmask="32" \
>>   op monitor interval="10s"
>> primitive failover-ip3 ocf:heartbeat:IPaddr \
>>   params ip="192.168.6.8" nic="eth0:2" cidr_netmask="32" \
>>   op monitor interval="10s"
>> primitive res_drbd_export ocf:linbit:drbd \
>>   params drbd_resource="hermes"
>> primitive res_fs ocf:heartbeat:Filesystem \
>>   params device="/dev/drbd0" directory="/mnt" fstype="ext4"
>> group mygroup failover-ip1 failover-ip2 failover-ip3 collectd
>> ms ms_drbd_export res_drbd_export \
>>   meta notify="true" master-max="1" master-node-max="1"
>> clone-max="2"
>> clone-node-max="1"
>> location cli-prefer-collectd collectd inf: host-1
>> location cli-prefer-failover-ip1 failover-ip1 inf: host-1
>> location cli-prefer-failover-ip2 failover-ip2 inf: host-1
>> location cli-prefer-failover-ip3 failover-ip3 inf: host-1
>> location cli-prefer-res_drbd_export res_drbd_export inf: hermes-1
>> location cli-prefer-res_fs res_fs inf: host-1
>> colocation c_export_on_drbd inf: mygroup res_fs ms_drbd_export:Master
>> order o_drbd_before_services inf: ms_drbd_export:promote res_fs:start
>> property $id="cib-bootstrap-options" \
>>   dc-version="1.1.10-42f2063" \
>>   cluster-infrastructure="corosync" \
>>   stonith-enabled="false" \
>>   no-quorum-policy="ignore" \
>>   last-lrm-refresh="1447686090"
>> #vim:set syntax=pcmk
>> I don't found the right way, to order the startup of new services (example
>> collectd), after the /mnt is mounted.
>
>
> Just order them after res_fs, same as you order res_fs after ms_drbd_export.
> Or may be I misunderstand your question?
>
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] split brain cluster

2015-11-16 Thread emmanuel segura
http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ch08.html
and 
https://github.com/ClusterLabs/pacemaker/blob/master/doc/pcs-crmsh-quick-ref.md
Anyway, you can run pcs config if you are using a Red Hat-based distribution.
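
For a quick check of whether any fencing is configured at all, something like
this should work (the exact output format varies between pcs versions):

  pcs stonith show
  pcs property list --all | grep stonith-enabled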

2015-11-16 15:01 GMT+01:00 Richard Korsten <rich...@rkorsten.nl>:
> Hi Emmanuel,
>
> I'm not sure, how can I check it?
>
> Greetings Richard
>
> Op ma 16 nov. 2015 om 14:58 schreef emmanuel segura <emi2f...@gmail.com>:
>>
>> you configured the stonith?
>>
>> 2015-11-16 14:43 GMT+01:00 Richard Korsten <rich...@rkorsten.nl>:
>> > Hello Cluster guru's.
>> >
>> > I'm having a bit of trouble with a cluster of ours. After an outage of 1
>> > node it went into a split brain situation where both nodes aren't
>> > talking to
>> > each other. Both say the other node is offline. I've tried to get them
>> > both
>> > up and running again by stopping and starting the cluster services on
>> > both
> nodes, one at a time, without luck.
>> >
>> > I've been trying to reproduce the problem with a set of test servers but
> I
>> > can't seem to get it in the same state.
>> >
> Because of this I'm looking for some help, because I'm not that familiar
>> > with
>> > pacemaker/corosync.
>> >
>> > this is the output of the command pcs status:
>> > Cluster name: MXloadbalancer
>> > Last updated: Mon Nov 16 10:18:44 2015
>> > Last change: Fri Nov 6 15:35:22 2015
>> > Stack: corosync
>> > Current DC: bckilb01 (1) - partition WITHOUT quorum
>> > Version: 1.1.12-a14efad
>> > 2 Nodes configured
>> > 3 Resources configured
>> >
>> > Online: [ bckilb01 ]
>> > OFFLINE: [ bckilb02 ]
>> >
>> > Full list of resources:
>> >  haproxy (systemd:haproxy): Stopped
>> >
>> > Resource Group: MXVIP
>> > ip-192.168.250.200 (ocf::heartbeat:IPaddr2): Stopped
>> > ip-192.168.250.201 (ocf::heartbeat:IPaddr2): Stopped
>> >
>> > PCSD Status:
>> > bckilb01: Online
>> > bckilb02: Online
>> >
>> > Daemon Status:
>> > corosync: active/enabled
>> > pacemaker: active/enabled
>> > pcsd: active/enabled
>> >
>> >
>> > And the config:
>> > totem {
>> > version: 2
>> > secauth: off
>> > cluster_name: MXloadbalancer
>> > transport: udpu }
>> >
>> > nodelist {
>> > node { ring0_addr: bckilb01 nodeid: 1 }
>> > node { ring0_addr: bckilb02 nodeid: 2 } }
>> > quorum { provider: corosync_votequorum two_node: 1 }
>> > logging { to_syslog: yes }
>> >
>> > If any has an idea about how to get them working together again please
>> > let
>> > me know.
>> >
>> > Greetings Richard
>> >
>> > ___
>> > Users mailing list: Users@clusterlabs.org
>> > http://clusterlabs.org/mailman/listinfo/users
>> >
>> > Project Home: http://www.clusterlabs.org
>> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > Bugs: http://bugs.clusterlabs.org
>> >
>>
>>
>>
>> --
>>   .~.
>>   /V\
>>  //  \\
>> /(   )\
>> ^`~'^
>>
>> ___
>> Users mailing list: Users@clusterlabs.org
>> http://clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] split brain cluster

2015-11-16 Thread emmanuel segura
Hi,

When you configure a cluster, the first thing you need to think about
is stonith aka fencing.
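
Purely as an illustration (the agent, address and credentials below are
placeholders, not taken from this thread), a minimal IPMI-based stonith device
configured with pcs might look like:

  pcs stonith create fence_bckilb01 fence_ipmilan \
      pcmk_host_list="bckilb01" ipaddr="192.0.2.10" \
      login="admin" passwd="secret" lanplus="1" \
      op monitor interval=60s
  pcs property set stonith-enabled=true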

Thanks
Emmanuel

2015-11-16 15:18 GMT+01:00 Richard Korsten <rich...@rkorsten.nl>:
> Hi Emmanuel,
>
> No stonith is not enabled. And yes i'm on a redhat based system.
>
> Greetings.
>
> Op ma 16 nov. 2015 om 15:09 schreef emmanuel segura <emi2f...@gmail.com>:
>>
>>
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ch08.html
>> and
>> https://github.com/ClusterLabs/pacemaker/blob/master/doc/pcs-crmsh-quick-ref.md
>> anyway you can use pcs config if you are using redhat
>>
>> 2015-11-16 15:01 GMT+01:00 Richard Korsten <rich...@rkorsten.nl>:
>> > Hi Emmanuel,
>> >
>> > I'm not sure, how can I check it?
>> >
>> > Greetings Richard
>> >
>> > Op ma 16 nov. 2015 om 14:58 schreef emmanuel segura
>> > <emi2f...@gmail.com>:
>> >>
>> >> you configured the stonith?
>> >>
>> >> 2015-11-16 14:43 GMT+01:00 Richard Korsten <rich...@rkorsten.nl>:
>> >> > Hello Cluster guru's.
>> >> >
>> >> > I'm having a bit of trouble with a cluster of ours. After an outage
>> >> > of 1
>> >> > node it went into a split brain situation where both nodes aren't
>> >> > talking to
>> >> > each other. Both say the other node is offline. I've tried to get
>> >> > them
>> >> > both
>> >> > up and running again by stopping and starting the cluster services on
>> >> > both
>> >> > nodes, one at a time, without luck.
>> >> >
>> >> > I've been trying to reproduce the problem with a set of test servers
>> >> > but
>> >> > i
>> >> > can't seem to get it in the same state.
>> >> >
>> >> > Because of this I'm looking for some help, because I'm not that familiar
>> >> > with
>> >> > pacemaker/corosync.
>> >> >
>> >> > this is the output of the command pcs status:
>> >> > Cluster name: MXloadbalancer
>> >> > Last updated: Mon Nov 16 10:18:44 2015
>> >> > Last change: Fri Nov 6 15:35:22 2015
>> >> > Stack: corosync
>> >> > Current DC: bckilb01 (1) - partition WITHOUT quorum
>> >> > Version: 1.1.12-a14efad
>> >> > 2 Nodes configured
>> >> > 3 Resources configured
>> >> >
>> >> > Online: [ bckilb01 ]
>> >> > OFFLINE: [ bckilb02 ]
>> >> >
>> >> > Full list of resources:
>> >> >  haproxy (systemd:haproxy): Stopped
>> >> >
>> >> > Resource Group: MXVIP
>> >> > ip-192.168.250.200 (ocf::heartbeat:IPaddr2): Stopped
>> >> > ip-192.168.250.201 (ocf::heartbeat:IPaddr2): Stopped
>> >> >
>> >> > PCSD Status:
>> >> > bckilb01: Online
>> >> > bckilb02: Online
>> >> >
>> >> > Daemon Status:
>> >> > corosync: active/enabled
>> >> > pacemaker: active/enabled
>> >> > pcsd: active/enabled
>> >> >
>> >> >
>> >> > And the config:
>> >> > totem {
>> >> > version: 2
>> >> > secauth: off
>> >> > cluster_name: MXloadbalancer
>> >> > transport: udpu }
>> >> >
>> >> > nodelist {
>> >> > node { ring0_addr: bckilb01 nodeid: 1 }
>> >> > node { ring0_addr: bckilb02 nodeid: 2 } }
>> >> > quorum { provider: corosync_votequorum two_node: 1 }
>> >> > logging { to_syslog: yes }
>> >> >
>> >> > If any has an idea about how to get them working together again
>> >> > please
>> >> > let
>> >> > me know.
>> >> >
>> >> > Greetings Richard
>> >> >
>> >> > ___
>> >> > Users mailing list: Users@clusterlabs.org
>> >> > http://clusterlabs.org/mailman/listinfo/users
>> >> >
>> >> > Project Home: http://www.clusterlabs.org
>> >> > Getting started:
>> >> > http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> >> > Bugs: http://bugs.clusterlabs.org
>> >> >
>> >>
>