Re: [ClusterLabs] Antw: Delayed first monitoring

2015-08-13 Thread Andrei Borzenkov
On Thu, Aug 13, 2015 at 10:01 AM, Miloš Kozák milos.ko...@lejmr.com wrote: However, this does not make sense at all. Presumably, the pacemaker should get along with lsb scripts which comes from system repository, right? Let's forget about pacemaker for a moment. You have system startup where

Re: [ClusterLabs] Antw: Ordering constraint restart second resource group

2015-08-13 Thread Andrei Borzenkov
On Thu, Aug 13, 2015 at 11:25 AM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: And what exactly is your problem? Real life example. Database resource depends on storage resource(s). There are multiple filesystems/volumes with database files. Database admin needs to increase available

Re: [ClusterLabs] circumstances under which resources become unmanaged

2015-08-12 Thread Andrei Borzenkov
On 12.08.2015 20:46, N, Ravikiran wrote: Hi All, I have a resource added to pacemaker called 'cmsd' whose state is getting to 'unmanaged FAILED' state. Apart from manually changing the resource to unmanaged using pcs resource unmanage cmsd , I'm trying to understand under what all

Re: [ClusterLabs] Ordering constraint restart second resource group

2015-08-12 Thread Andrei Borzenkov
On 12.08.2015 19:35, John Gogu wrote: Hello, in my cluster configuration I have the following situation: resource_group_A ip1 ip2 resource_group_B apache1 ordering constraint resource_group_A then resource_group_B symmetrical=true When I add a new resource from group_A, resources

Re: [ClusterLabs] MySQL resource causes error 0_monitor_20000.

2015-08-18 Thread Andrei Borzenkov
On Tue, Aug 18, 2015 at 9:15 AM, Kiwamu Okabe kiw...@debian.or.jp wrote: Hi Andrei, On Tue, Aug 18, 2015 at 2:24 PM, Andrei Borzenkov arvidj...@gmail.com wrote: I made master-master replication on Pacemaker. But it causes error 0_monitor_2. It's not an error, it is just operation name

Re: [ClusterLabs] MySQL resource causes error 0_monitor_20000.

2015-08-17 Thread Andrei Borzenkov
Sent from iPhone. On 18 Aug 2015, at 7:19, Kiwamu Okabe kiw...@gmail.com wrote: Hi all, I made master-master replication on Pacemaker. But it causes error 0_monitor_2. It's not an error, it is just operation name. If one of them boots Heartbeat and another doesn't, the

Re: [ClusterLabs] Antw: Re: MySQL resource causes error 0_monitor_20000.

2015-08-18 Thread Andrei Borzenkov
On Tue, Aug 18, 2015 at 3:34 PM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Kiwamu Okabe kiw...@debian.or.jp wrote on 18.08.2015 at 11:48 in message CAEvX6dky8=_w6l2nhndfbowux+ol7ktaa44salru7a9-xed...@mail.gmail.com: Hi Andrei, On Tue, Aug 18, 2015 at 6:28 PM, Andrei

Re: [ClusterLabs] Mysql M/S, binlogs - how to delete them safely without failing over first?

2015-08-18 Thread Andrei Borzenkov
19.08.2015 00:19, Attila Megyeri wrote: Hi List, We are using M/S replication in a couple of clusters, and there is an issue that has been causing headaches for me for quite some time. My problem comes from the fact that binlog files grow very quickly on both the Master and Slave nodes.

Re: [ClusterLabs] MySQL resource causes error 0_monitor_20000.

2015-08-18 Thread Andrei Borzenkov
On Tue, Aug 18, 2015 at 11:57 AM, Kiwamu Okabe kiw...@debian.or.jp wrote: Hi, On Tue, Aug 18, 2015 at 5:07 PM, Kiwamu Okabe kiw...@debian.or.jp wrote: ``` 2015-08-18 16:50:38 7081 [ERROR] Slave I/O: Fatal error: The slave I/O thread stops because master and slave have equal MySQL server ids;

Re: [ClusterLabs] SLES 11 SP4 csync2

2015-08-21 Thread Andrei Borzenkov
22.08.2015 08:12, Jorge Fábregas wrote: Hi everyone, I'm trying out SLES 11 SP4 with the High-Availability Extension on two virtual machines. I want to keep things simple I have a question regarding the csync2 tool from SUSE. Considering that: - I'll have just two nodes - I'll be using

Re: [ClusterLabs] Pacemaker tries to demote resource that isn't running and returns OCF_FAILED_MASTER

2015-08-20 Thread Andrei Borzenkov
21.08.2015 00:35, Brian Campbell wrote: I have a master/slave resource (with a custom resource agent) which, if it uncleanly shut down, will return OCF_FAILED_MASTER on the next monitor operation. This seems to be what

Re: [ClusterLabs] Antw: Ordering constraint restart second resource group

2015-08-16 Thread Andrei Borzenkov
17.08.2015 02:26, Andrew Beekhof wrote: On 13 Aug 2015, at 7:33 pm, Andrei Borzenkov arvidj...@gmail.com wrote: On Thu, Aug 13, 2015 at 11:25 AM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: And what exactly is your problem? Real life example. Database resource depends on storage

Re: [ClusterLabs] CRM managing ADSL connection; failure not handled

2015-08-24 Thread Andrei Borzenkov
24.08.2015 12:35, Tom Yates wrote: I've got a failover firewall pair where the external interface is ADSL; that is, PPPoE. i've defined the service thus: primitive ExternalIP lsb:hb-adsl-helper \ op monitor interval=60s and in addition written a noddy script

Re: [ClusterLabs] CRM managing ADSL connection; failure not handled

2015-08-24 Thread Andrei Borzenkov
24.08.2015 13:32, Tom Yates wrote: On Mon, 24 Aug 2015, Andrei Borzenkov wrote: 24.08.2015 12:35, Tom Yates wrote: I've got a failover firewall pair where the external interface is ADSL; that is, PPPoE. i've defined the service thus: If stop operation failed resource state is undefined

Re: [ClusterLabs] systemd: xxxx.service start request repeated too quickly

2015-08-04 Thread Andrei Borzenkov
On Tue, Aug 4, 2015 at 4:57 PM, Juha Heinanen j...@tutpro.com wrote: Andrei Borzenkov writes: Not sure I really understand the question. If service cannot run anyway, you can simply remove it from configuration. You can set target state to stopped. You can unmanage it. It all depends on what

Re: [ClusterLabs] starting of resources

2015-08-11 Thread Andrei Borzenkov
On Tue, Aug 11, 2015 at 9:44 AM, Vijay Partha vijaysarath...@gmail.com wrote: Hi, Can we statically add resources to the nodes. I mean before the pacemaker is started can we add resources to the nodes like you dont require to make use of pcs resource create. Is this possible? You better

Re: [ClusterLabs] How to cluster a service with multiple possibilities

2015-07-25 Thread Andrei Borzenkov
On Fri, 24 Jul 2015 14:43:37 + David Gersic dger...@niu.edu wrote: I have a process (OpenSLP slpd) that I'd like to cluster. Unfortunately, this process provides multiple services, depending what it finds in its configuration file on startup. I need to have the process running on all of

Re: [ClusterLabs] Resource cannot run anywhere

2015-07-21 Thread Andrei Borzenkov
On Mon, Jul 20, 2015 at 4:40 PM, Leonhardt,Christian christian.leonha...@dako.de wrote: Hello everyone, I already posted this issue at the Debian HA maintainers list (http://lists.alioth.debian.org/pipermail/debian-ha-maintainers/2015-July/004325.html). Unfortunately the problem still exist

Re: [ClusterLabs] Resource location node preference

2015-10-22 Thread Andrei Borzenkov
22.10.2015 18:25, Vallevand, Mark K wrote: Suppose I have a resource defined with a preference of node1 over node2. The resource is running on node1. Node1 goes away. Now the resource is running on node2. Node1 comes back and joins the cluster. Will the resource relocate to Node1?

[ClusterLabs] Failover to spare node

2015-10-22 Thread Andrei Borzenkov
Let's say I have a pool of nodes and multiple services, somehow distributed across them. I would like to keep one node as "spare", without services by default, and if any of "worker" nodes fail, services that were running there should be relocated to spare together. This ensures each service

Re: [ClusterLabs] ORACLE 12 and SLES HAE (Sles 11sp3)

2015-10-28 Thread Andrei Borzenkov
On Wed, Oct 28, 2015 at 11:45 AM, Cristiano Coltro wrote: > Hi, > most of the SLES 11 sp3 with HAE are migrating Oracle Db. > The migration will be from Oracle 11 to Oracle 12 > > They have verified that the Oracles cluster resources actually supports > - Oracle 10.2 and 11.2

Re: [ClusterLabs] required nodes for quorum policy

2015-11-09 Thread Andrei Borzenkov
On Tue, Nov 10, 2015 at 1:20 AM, Radoslaw Garbacz wrote: > Hi, > > I have a question regarding the policy to check for cluster quorum for > corosync+pacemaker. > > As far as I know at present it is always (expected_votes)/2 + 1. Seems like > "qdiskd" has an
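As an illustration of the majority rule quoted in the snippet above, here is a minimal shell sketch of floor(expected_votes / 2) + 1. The helper name quorum_for is hypothetical and is not a corosync or pacemaker command; it only shows the integer arithmetic under the assumption that every node carries one vote.

```shell
#!/bin/sh
# Majority quorum as stated in the question: floor(expected_votes / 2) + 1.
# quorum_for is a hypothetical helper name used only for this illustration.
quorum_for() {
    expected_votes=$1
    # Shell arithmetic uses integer division, which gives the floor here.
    echo $(( expected_votes / 2 + 1 ))
}

# Print the required number of votes for a few cluster sizes.
for n in 2 3 4 5; do
    echo "expected_votes=$n quorum=$(quorum_for "$n")"
done
```

Because the division is integer division, 2 and 3 expected votes both require 2 votes, and 4 and 5 both require 3.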

Re: [ClusterLabs] 2-node cluster Postgresql problem with lock file

2015-11-13 Thread Andrei Borzenkov
On Thu, Nov 12, 2015 at 11:54 AM, Damien Bras wrote: > Hi, > > > > I have a 2-node cluster using PostgreSQL synchronous streaming replication. > I don’t have preference of the location of the master role. > > I followed this documentation : >

Re: [ClusterLabs] restarting resources

2015-11-03 Thread Andrei Borzenkov
On Mon, Nov 2, 2015 at 7:59 PM, - - wrote: > Hi, >I need to be able to restart a resource (e.g apache) whenever a > configuration > file is updated. I have been using the 'crm resource restart ' command to do > it, > which does restart the resource BUT also restarts my

Re: [ClusterLabs] Antw: Monitoring Op for LVM - Excessive Logging

2015-10-09 Thread Andrei Borzenkov
09.10.2015 19:40, Jorge Fábregas wrote: On 10/09/2015 09:06 AM, Ulrich Windl wrote: Did you try daemon_options="-d0"? (in clvmd resource) I've just found this: http://pacemaker.oss.clusterlabs.narkive.com/C5BaFych/ocf-lvm2-clvmd-resource-agent ...so apparently SUSE changed the resource

Re: [ClusterLabs] 3 nodes cluster on Centos 7

2015-07-07 Thread Andrei Borzenkov
On Tue, Jul 7, 2015 at 12:34 PM, Michael Schwartzkopff m...@sys4.de wrote: The cluster has 3 nodes : - 1 virtual machine (machine1). This machine is supposed to be high-available - 2 physical machines identical (machine2 and 3) It's not going to work. If host where this VM is running

Re: [ClusterLabs] VG activation on Active/Passive

2015-08-31 Thread Andrei Borzenkov
Sent from iPhone > On 29 Aug 2015, at 21:51, Jorge Fábregas > wrote: > >> On 08/29/2015 02:37 PM, Digimer wrote: >> No need for clustered LVM, only the active node should see the PV. When >> the passive takes over, after connecting to the PV, it should do a

Re: [ClusterLabs] a newbie --question

2015-09-15 Thread Andrei Borzenkov
On Tue, Sep 15, 2015 at 4:38 PM, wrote: > Hi, > > Thanks for reply. > The problem is Compute resource, the appY and appZ can't run on same Server. > > It is possible ? > Yes; set location constraint that appY cannot run on the same node as appZ (and vice versa).

Re: [ClusterLabs] Major problem with iSCSITarget resource on top of DRBD M/S resource.

2015-09-27 Thread Andrei Borzenkov
27.09.2015 17:40, Alex Crow wrote: Hi List, I'm trying to set up a failover iSCSI storage system for oVirt using a self-hosted engine. I've set up DRBD in Master-Slave for two iSCSI targets, one for the self-hosted engine and one for the VMs. I had this all working perfectly, then after trying

Re: [ClusterLabs] disable failover

2015-10-01 Thread Andrei Borzenkov
On Thu, Oct 1, 2015 at 5:30 PM, Vijay Partha wrote: > Hi, > > I want to know how to disable failover. If a node undergoes a failover the > resources running on the node should not be started on the other node in the > cluster. How can this be achieved. > What exactly

Re: [ClusterLabs] disable failover

2015-10-01 Thread Andrei Borzenkov
01.10.2015 19:09, Vijay Partha wrote: i want pacemaker to monitor the resources running on each node and at the same time restart it. It should run on the same node. Then create single node cluster. Why do you add second node if you do not want to use it? On Thu, Oct 1, 2015 at 9:17 PM,

Re: [ClusterLabs] Antw: Re: design of a two-node cluster

2015-12-08 Thread Andrei Borzenkov
On Tue, Dec 8, 2015 at 10:44 AM, Ulrich Windl wrote: Digimer wrote on 07.12.2015 at 22:40 in message > <5665fcdc.1030...@alteeve.ca>: > [...] >> Node 1 looks up how to fence node 2, sees no delay and fences >> immediately. Node 2

Re: [ClusterLabs] Antw: Re: Antw: Re: design of a two-node cluster

2015-12-08 Thread Andrei Borzenkov
On Tue, Dec 8, 2015 at 12:01 PM, Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de> wrote: >>>> Andrei Borzenkov <arvidj...@gmail.com> wrote on 08.12.2015 at 09:01 in > message > <CAA91j0Un+1EN6xRLM=dm6ck+usdzmpnyyjtha9d+btrzfcg...@mail.gmail.com>: >>

[ClusterLabs] OCF_RESKEY_CRM_meta_notify_active_resource on master/slave?

2015-12-08 Thread Andrei Borzenkov
Documentation (pacemaker explained) says for master/slave OCF_RESKEY_CRM_meta_notify_active_resource and OCF_RESKEY_CRM_meta_notify_inactive_resource should be filled in. It is not what I see here - both are empty, instead I see OCF_RESKEY_CRM_meta_notify_master_resource and

Re: [ClusterLabs] Notice: SLES11SP4 broke exportfs!

2015-12-11 Thread Andrei Borzenkov
11.12.2015 21:27, Ulrich Windl wrote: > Hi! > > After updating from SLES11SP3 (june version) to SLES11SP4 (todays version) > exportfs fails to get the export status. I have message like this in syslog: > > Dec 11 19:22:09 h04 crmd[11128]: notice: process_lrm_event: >

Re: [ClusterLabs] Anyone successfully install Pacemaker/Corosync on Freebsd?

2015-12-19 Thread Andrei Borzenkov
20.12.2015 01:56, mike wrote: > Hi All, > > just curious if anyone has had any luck at one point installing > Pacemaker and Corosync on FreeBSD. I have to install from source of > course and I've run into an issue when running ./configure while trying > to install Corosync. The process craps out

Re: [ClusterLabs] master/slave resource agent without demote

2015-11-24 Thread Andrei Borzenkov
On Tue, Nov 24, 2015 at 5:19 PM, Waldemar Brodkorb wrote: > Hi, > > we are using a derivate of the Tomcat OCF script. > Our web application needs to be promoted (via a wget call). > But our application is not able to demote in a clean way, so > we need to stop and then

Re: [ClusterLabs] start service after filesystemressource

2015-11-20 Thread Andrei Borzenkov
20.11.2015 16:38, haseni...@gmx.de wrote: Hi, I want to start several services after the drbd resource and the filesystem is available. This is my current configuration: node $id="184548773" host-1 \ attributes standby="on" node $id="184548774" host-2 \ attributes

Re: [ClusterLabs] start service after filesystemressource

2015-11-20 Thread Andrei Borzenkov
o_drbd_before_services inf: ms_drbd_export:promote mygroup:start 2015-11-20 15:45 GMT+01:00 Andrei Borzenkov <arvidj...@gmail.com>: 20.11.2015 16:38, haseni...@gmx.de wrote: Hi, I want to start several services after the drbd resource and the filesystem is available. This is my c

Re: [ClusterLabs] Recovering after split-brain

2016-06-21 Thread Andrei Borzenkov
21.06.2016 20:05, Dimitri Maziuk wrote: > On 06/21/2016 11:47 AM, Digimer wrote: > >> If you don't need to coordinate services/access, you don't need HA. >> >> If you do need to coordinate services/access, you need fencing. > > So what you're saying is we *cannot* run a pacemaker cluster without

Re: [ClusterLabs] Recovering after split-brain

2016-06-21 Thread Andrei Borzenkov
21.06.2016 20:27, Dimitri Maziuk wrote: > On 06/21/2016 12:13 PM, Andrei Borzenkov wrote: > >> You should not run pacemaker without some sort of fencing. This need not >> be network-controlled power socket (and tiebreaker is not directly >> related to fencing). >

Re: [ClusterLabs] restarting pacemakerd

2016-06-19 Thread Andrei Borzenkov
18.06.2016 22:04, Dmitri Maziuk wrote: > On 2016-06-18 05:15, Ferenc Wágner wrote: > ... >> On the other hand, one could argue that restarting failed services >> should be the default behavior of systemd (or any init system). Still, >> it is not. > > As an off-topic snide comment, I never

Re: [ClusterLabs] problems with a CentOS7 SBD cluster

2016-06-25 Thread Andrei Borzenkov
25.06.2016 23:05, Marcin Dulak wrote: > Hi, > > I'm trying to get familiar with STONITH Block Devices (SBD) on a 3-node > CentOS7 built in VirtualBox. > The complete setup is available at > https://github.com/marcindulak/vagrant-sbd-tutorial-centos7.git > so hopefully with some help I'll be able

Re: [ClusterLabs] getting "Totem is unable to form a cluster" error

2016-04-08 Thread Andrei Borzenkov
08.04.2016 17:51, Jan Friesse wrote: >> On 04/08/16 13:01, Jan Friesse wrote: >> >> pacemaker 1.1.12-11.12 >> >> openais 1.1.4-5.24.5 >> >> corosync 1.4.7-0.23.5 >> >> >> >> Its a two node active/passive cluster and we just upgraded the >> SLES 11 >> >> SP 3 to SLES 11 SP 4(nothing else)

Re: [ClusterLabs] PCS, Corosync, Pacemaker, and Bind (Ken Gaillot)

2016-03-19 Thread Andrei Borzenkov
On Wed, Mar 16, 2016 at 9:35 PM, Mike Bernhardt wrote: > I guess I have to say "never mind!" I don't know what the problem was > yesterday, but it loads just fine today, even when the named config and the > virtual ip don't match! But for your edamacation, ifconfig does NOT

Re: [ClusterLabs] Pacemaker startup-fencing

2016-03-19 Thread Andrei Borzenkov
On Wed, Mar 16, 2016 at 4:18 PM, Lars Ellenberg wrote: > On Wed, Mar 16, 2016 at 01:47:52PM +0100, Ferenc Wágner wrote: >> >> And some more about fencing: >> >> >> >> 3. What's the difference in cluster behavior between >> >>- stonith-enabled=FALSE (9.3.2: how often

Re: [ClusterLabs] Antw: Re: Antw: Re: Pacemaker not always selecting the right stonith device

2016-07-22 Thread Andrei Borzenkov
22.07.2016 09:52, Ulrich Windl wrote: > That could be. Should there be a node list to configure, or can't the agent > find out itself (for SBD)? > It apparently does it already gethosts) echo `sbd -d $sbd_device list | cut -f2 | sort | uniq` exit 0
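To make the quoted gethosts pipeline easier to follow, here is a minimal shell sketch that runs the same cut, sort and uniq steps on canned input instead of a real SBD device. The function sample_sbd_list and its tab-separated slot/node/message layout are assumptions made for illustration; they stand in for `sbd -d $sbd_device list` and are not taken from the thread.

```shell
#!/bin/sh
# Hypothetical stand-in for `sbd -d $sbd_device list`. The tab-separated
# slot / node name / message layout is an assumption for illustration only.
sample_sbd_list() {
    printf '0\tnode2\tclear\n1\tnode1\tclear\n2\tnode1\tclear\n'
}

# Same pipeline as the quoted gethosts branch: keep field 2 (cut splits on
# tabs by default), sort the names, and drop duplicates. The unquoted command
# substitution lets echo join the names onto a single space-separated line.
gethosts() {
    echo $(sample_sbd_list | cut -f2 | sort | uniq)
}

gethosts
```

The result is a single line of unique node names, which is the shape of host list the quoted agent code hands back for its gethosts action.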

Re: [ClusterLabs] Resource Agent ocf:heartbeat:iSCSILogicalUnit

2016-07-22 Thread Andrei Borzenkov
22.07.2016 17:43, Jason A Ramsey wrote: > Additionally (and this is just a failing on my part), I’m > unclear as to where the resource agent is fed the value for > “${OCF_RESOURCE_INSTANCE}” given the limited number of parameters one > is permitted to supply with “pcs resource create…” > It is

Re: [ClusterLabs] Antw: Re: Antw: RES: Pacemaker and OCFS2 on stand alone mode

2016-07-12 Thread Andrei Borzenkov
11.07.2016 09:33, Ulrich Windl wrote: >>>> Andrei Borzenkov <arvidj...@gmail.com> wrote on 09.07.2016 at 10:17 in > message <5780b30a.3000...@gmail.com>: >> 08.07.2016 09:11, Ulrich Windl wrote: >>>>>> "Carlos Xavier"

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: Pacemaker not always selecting the right stonith device

2016-07-25 Thread Andrei Borzenkov
On Mon, Jul 25, 2016 at 9:07 AM, Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de> wrote: >>>> Andrei Borzenkov <arvidj...@gmail.com> wrote on 22.07.2016 at 17:14 in > message <4f17c57b-7458-2ec8-cd74-3daaf9c89...@gmail.com>: >> 22.07.2016 09:52, Ulrich

Re: [ClusterLabs] Active/Passive Cluster restarting resources on healthy node and DRBD issues

2016-07-22 Thread Andrei Borzenkov
23.07.2016 00:07, TEG AMJG wrote: ... > Master: kamailioetcclone > Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 > notify=true > Resource: kamailioetc (class=ocf provider=linbit type=drbd) >Attributes: drbd_resource=kamailioetc >Operations: start interval=0s

Re: [ClusterLabs] Previous DC fenced prior to integration

2016-07-22 Thread Andrei Borzenkov
23.07.2016 01:37, Nate Clark wrote: > Hello, > > I am running pacemaker 1.1.13 with corosync and think I may have > encountered a start up timing issue on a two node cluster. I didn't > notice anything in the changelog for 14 or 15 that looked similar to > this or open bugs. > > The rough out

Re: [ClusterLabs] Pacemaker not always selecting the right stonith device

2016-07-21 Thread Andrei Borzenkov
22.07.2016 00:38, Klaus Wenninger wrote: > On 07/21/2016 06:40 PM, Andrei Borzenkov wrote: >> 19.07.2016 18:24, Klaus Wenninger wrote: >>> On 07/19/2016 04:17 PM, Ken Gaillot wrote: >>>> On 07/19/2016 09:00 AM, Andrei Borzenkov wrote: >>>>> On Tue

Re: [ClusterLabs] external/libvirt source code

2016-08-02 Thread Andrei Borzenkov
On Tue, Aug 2, 2016 at 4:58 PM, Maciej Kopczyński wrote: > Hello, > > Sorry if it is a trivial question, but I am facing a wall here. I am trying > to configure fencing on cluster running Hyper-V. I need to modify source > code for external/libvirt plugin, but I have no idea

Re: [ClusterLabs] Can Pacemaker monitor geographical separated servers

2016-08-10 Thread Andrei Borzenkov
On Tue, Aug 9, 2016 at 9:40 PM, bhargav M.P wrote: > Hi All, > I have deployment where we have two Linux servers that are geographically > separated and they are across different subnets . I want the server to work > in Active/Standby mode . I would like to use

Re: [ClusterLabs] Antw: Re: Pacemaker not always selecting the right stonith device

2016-07-21 Thread Andrei Borzenkov
21.07.2016 09:49, Ulrich Windl wrote: Ken Gaillot wrote on 19.07.2016 at 16:17 in message > : > > [...] >> You're right -- if not told otherwise, Pacemaker will query the device >> for the target list. In this

Re: [ClusterLabs] Pacemaker not always selecting the right stonith device

2016-07-19 Thread Andrei Borzenkov
On Tue, Jul 19, 2016 at 4:52 PM, Ken Gaillot wrote: ... >> >> primitive p_ston_pg1 stonith:external/ipmi \ >> params hostname=pg1 ipaddr=10.148.128.35 userid=root >> passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG1-ipmipass" >> passwd_method=file interface=lan

Re: [ClusterLabs] Pacemaker not always selecting the right stonith device

2016-07-19 Thread Andrei Borzenkov
19.07.2016 18:24, Klaus Wenninger wrote: > On 07/19/2016 04:17 PM, Ken Gaillot wrote: >> On 07/19/2016 09:00 AM, Andrei Borzenkov wrote: >>> On Tue, Jul 19, 2016 at 4:52 PM, Ken Gaillot <kgail...@redhat.com> wrote: >>> ... >>>>> primitive p_ston_pg

Re: [ClusterLabs] Pacemaker not always selecting the right stonith device

2016-07-20 Thread Andrei Borzenkov
On Tue, Jul 19, 2016 at 6:33 PM, Martin Schlegel wrote: >> > [...] >> > >> > primitive p_ston_pg1 stonith:external/ipmi \ >> > params hostname=pg1 ipaddr=10.148.128.35 userid=root >> > passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG1-ipmipass" >> >

Re: [ClusterLabs] Setup problem: couldn't find command: tcm_node

2016-07-20 Thread Andrei Borzenkov
20.07.2016 18:08, Jason A Ramsey wrote: > I have been struggling getting a HA iSCSI Target cluster in place for > literally weeks. I cannot, for whatever reason, get pacemaker to create an > iSCSILogicalUnit resource. The error message that I’m seeing leads me to > believe that I’m missing

Re: [ClusterLabs] Antw: RES: Pacemaker and OCFS2 on stand alone mode

2016-07-09 Thread Andrei Borzenkov
08.07.2016 09:11, Ulrich Windl wrote: "Carlos Xavier" wrote on 07.07.2016 at 18:57 in > message <00e901d1d870$ae418000$0ac48000$@com.br>: >> Thank you for the fast reply >> >>> >>> have you configured the stonith and drbd stonith handler? >>> >> >> Yes.

Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster

2016-08-04 Thread Andrei Borzenkov
05.08.2016 02:33, Digimer wrote: > On 04/08/16 07:21 PM, Dan Swartzendruber wrote: >> On 2016-08-04 19:03, Digimer wrote: >>> On 04/08/16 06:56 PM, Dan Swartzendruber wrote: I'm setting up an HA NFS server to serve up storage to a couple of vsphere hosts. I have a virtual IP, and it

Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster

2016-08-05 Thread Andrei Borzenkov
On Fri, Aug 5, 2016 at 7:08 AM, Digimer <li...@alteeve.ca> wrote: > On 04/08/16 11:44 PM, Andrei Borzenkov wrote: >> 05.08.2016 02:33, Digimer wrote: >>> On 04/08/16 07:21 PM, Dan Swartzendruber wrote: >>>> On 2016-08-04 19:03, Digimer wrote: >>>>

Re: [ClusterLabs] restart of one instance of a clone resource causes restart of dependent resources

2017-02-20 Thread Andrei Borzenkov
06.02.2017 13:02, Daniel wrote: > Hi All, > > I'm having issues with an ordering constraint with a clone resource in > pacemaker v1.1.14. > - I have a resourceA-clone (running on 2 nodes: node1 and node2). > - then I have 2 other resources: resourceB1 (allowed to run on node1 only) > and

Re: [ClusterLabs] Mysql slave did not start replication after failure, and read-only IP also remained active on the much outdated slave

2016-08-22 Thread Andrei Borzenkov
On Mon, Aug 22, 2016 at 12:18 PM, Attila Megyeri wrote: > Dear community, > > > > A few days ago we had an issue in our Mysql M/S replication cluster. > > We have a one R/W Master, and a one RO Slave setup. RO VIP is supposed to be > running on the slave if it is not

Re: [ClusterLabs] Oralsnr/Oracle resources agents

2017-02-25 Thread Andrei Borzenkov
25.02.2017 23:18, Jihed M'selmi wrote: > [DM] I thought that oracle listener is not consuming that many resources. > At any rate, ocf:heartbeat:oralsnr doesn't support single listener for > multiple instances. Do you have an idea how to do that? How to deal with > the tnsping then? Maybe you're

Re: [ClusterLabs] using IPMI for fencing - configuring IPMI with ipmitool - HELP

2017-02-28 Thread Andrei Borzenkov
28.02.2017 20:39, Lentes, Bernd wrote: > Hi, > > i have a HP server ML 350 G9 with an ILO4 card. The riloe stonith agent does > not work, i read in a book the recommendation to use the ipmi resource agent > instead. > I'm trying to configure the respective ILO adapter with ipmitool. Why do

Re: [ClusterLabs] I've been working on a split-brain prevention strategy for 2-node clusters.

2016-10-09 Thread Andrei Borzenkov
10.10.2016 00:42, Eric Robinson wrote: > Digimer, thanks for your thoughts. Booth is one of the solutions I > looked at, but I don't like it because it is complex and difficult to > implement HA is complex. There is no way around it. > (and perhaps costly in terms of AWS services or something >

Re: [ClusterLabs] Antw: Re: Antw: Unexpected Resource movement after failover

2016-10-21 Thread Andrei Borzenkov
14.10.2016 10:39, Vladislav Bogdanov wrote: > > use of utilization (balanced strategy) has one caveat: resources are > not moved just because of utilization of one node is less, when nodes > have the same allocation score for the resource. So, after the > simultaneous outage of two nodes in a

Re: [ClusterLabs] Help crm_master with score 0

2016-10-21 Thread Andrei Borzenkov
20.10.2016 09:20, K Aravind wrote: > Hi all > Small doubt > Let's say there are two node cluster without fencing say node 1 , node 2 > Where node 1 = active > Node 2 = passive > Now if node 1 is down > So node 2 promote is called ..however if the score given is 0 via > crm_master -l

Re: [ClusterLabs] Can Bonding Cause a Broadcast Storm?

2016-11-15 Thread Andrei Borzenkov
16.11.2016 02:48, Eric Robinson wrote: > mode 1. No special switch configuration. spanning tree not enabled. I > have 100+ Linux servers, all of which use bonding. The network has > been stable for 10 years. No changes recently. However, this is the > second time that we have seen high latency and

Re: [ClusterLabs] Unexpected Resource movement after failover

2016-10-13 Thread Andrei Borzenkov
On Thu, Oct 13, 2016 at 4:59 PM, Nikhil Utane wrote: > Hi, > > I have 5 nodes and 4 resources configured. > I have configured constraint such that no two resources can be co-located. > I brought down a node (which happened to be DC). I was expecting the > resource on

Re: [ClusterLabs] Antw: Re: OCFS2 on cLVM with node waiting for fencing timeout

2016-10-11 Thread Andrei Borzenkov
On Tue, Oct 11, 2016 at 9:18 AM, Ulrich Windl wrote: > > My point is this: For a resource that can only exclusively run on one node, > it's important that the other node is down before taking action. But for cLVM > and OCFS2 the resources can run concurrently

Re: [ClusterLabs] Error performing operation: Argument list too long

2016-12-06 Thread Andrei Borzenkov
06.12.2016 20:41, Jan Pokorný wrote: > On 06/12/16 09:44 -0600, Ken Gaillot wrote: >> On 12/05/2016 02:29 PM, Shane Lawrence wrote: >>> I'm experiencing a strange issue with pacemaker. It is unable to check >>> the status of a systemd resource. >>> >>> systemctl shows that the service crashed:

Re: [ClusterLabs] Antw: Re: 2-Node Cluster Pointless?

2017-04-22 Thread Andrei Borzenkov
22.04.2017 11:31, Klaus Wenninger wrote: >>> I wonder how SBD fits into this discussion. It is marketed as stonith >>> agent, but it is based on committing suicide so relies on well-behaving >>> nodes. Which we by definition cannot trust to behave well, otherwise >>> we'd not need stonith in

Re: [ClusterLabs] Antw: Re: 2-Node Cluster Pointless?

2017-04-22 Thread Andrei Borzenkov
18.04.2017 10:47, Ulrich Windl wrote: ... >> >> Now let me come back to quorum vs. stonith; >> >> Said simply; Quorum is a tool for when everything is working. Fencing is >> a tool for when things go wrong. > > I'd say: Quorum is the tool to decide who'll be alive and who's going to die, > and

Re: [ClusterLabs] Antw: Re: 2-Node Cluster Pointless?

2017-04-22 Thread Andrei Borzenkov
22.04.2017 23:33, Dmitri Maziuk wrote: > On 4/22/2017 12:02 PM, Digimer wrote: > >> Having SBD properly configured is *massively* safer than no fencing at >> all. So for people where other fence methods are not available for >> whatever reason, SBD is the way to go. > > Now you're talking. IMO

Re: [ClusterLabs] Antw: Re: Antw: Re: 2-Node Cluster Pointless?

2017-04-24 Thread Andrei Borzenkov
24.04.2017 09:15, Ulrich Windl wrote: >>>> Andrei Borzenkov <arvidj...@gmail.com> wrote on 22.04.2017 at 09:05 in > message <ede2cdd3-7020-9f59-90ad-c3b4a0c9e...@gmail.com>: >> 18.04.2017 10:47, Ulrich Windl wrote: >> ... >>>> >>>

Re: [ClusterLabs] SAP HANA resource start problem

2017-05-14 Thread Andrei Borzenkov
12.05.2017 13:30, Muhammad Sharfuddin wrote: > is there a bug in SAP HANA resource ? crm_mon shows that cluster started > the resource and keep the HANA resource in slave state, while in actual > cluster doesn't start the resources, we found following events in the logs: > SAP HANA agent

Re: [ClusterLabs] Notifications on changes in clustered LVM

2017-06-19 Thread Andrei Borzenkov
20.06.2017 02:15, Digimer wrote: > On 19/06/17 06:59 PM, Ferenc Wágner wrote: >> Digimer writes: >> >>> So we have a tool that watches for changes to clvmd by running >>> pvscan/vgscan/lvscan, but this seems to be expensive and occasionally >>> cause trouble. >> >> What kind of

Re: [ClusterLabs] Installing on SLES 12 -- Where's the Repos?

2017-06-16 Thread Andrei Borzenkov
16.06.2017 11:14, Eric Robinson wrote: > > I can understand how SUSE can charge for support, but not for the software > itself. Corosync, Pacemaker, and DRBD are all open source. So why not download the open source code and compile it yourself?

Re: [ClusterLabs] Is corosync supposed to be restarted if it fies?

2017-11-27 Thread Andrei Borzenkov
Sent from iPhone > On 27 Nov 2017, at 14:36, Ferenc Wágner <wf...@niif.hu> wrote: > > Andrei Borzenkov <arvidj...@gmail.com> writes: > >> 25.11.2017 10:05, Andrei Borzenkov wrote: >> >>> In one of guides suggested procedure to simulate

Re: [ClusterLabs] Is corosync supposed to be restarted if it fies?

2017-11-28 Thread Andrei Borzenkov
28.11.2017 13:01, Jan Pokorný wrote: > On 27/11/17 17:43 +0300, Andrei Borzenkov wrote: >> Sent from iPhone >> >>> On 27 Nov 2017, at 14:36, Ferenc Wágner <wf...@niif.hu> wrote: >>> >>> Andrei Borzenkov <arvidj...@gmail.com> wri

Re: [ClusterLabs] cluster with two ESX server

2017-11-28 Thread Andrei Borzenkov
28.11.2017 10:45, Ramann, Björn wrote: > hi@all, > > in my configuration, the 1st Node run on ESX1, the second run on ESX2. Now > I'm looking for a way to configure the cluster fence/stonith with two ESX > server - is this possible? If you have shared storage, SBD may be an option. > > I try

Re: [ClusterLabs] Is corosync supposed to be restarted if it fies?

2017-11-26 Thread Andrei Borzenkov
25.11.2017 10:05, Andrei Borzenkov wrote: > In one of guides suggested procedure to simulate split brain was to kill > corosync process. It actually worked on one cluster, but on another > corosync process was restarted after being killed without cluster > noticing anything. Except a

Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-11-26 Thread Andrei Borzenkov
22.11.2017 22:45, Klaus Wenninger wrote: >> >> Nov 22 16:04:56 sapprod01s crmd[3151]: crit: We were allegedly >> just fenced by sapprod01p for sapprod01p >> Nov 22 16:04:56 sapprod01s pacemakerd[3137]: warning: The crmd >> process (3151) can no longer be respawned, >> Nov 22 16:04:56

Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-11-22 Thread Andrei Borzenkov
22.11.2017 22:45, Klaus Wenninger wrote: > On 11/22/2017 08:01 PM, Andrei Borzenkov wrote: >> SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with >> VM on VSphere using shared VMDK as SBD. During basic tests by killing >> corosync and forcing STONITH pace

Re: [ClusterLabs] cluster with two ESX server

2017-11-29 Thread Andrei Borzenkov
29.11.2017 20:14, Klaus Wenninger wrote: > On 11/28/2017 07:41 PM, Andrei Borzenkov wrote: >> 28.11.2017 10:45, Ramann, Björn wrote: >>> hi@all, >>> >>> in my configuration, the 1st Node run on ESX1, the second run on ESX2. Now >>> I'm looking for

Re: [ClusterLabs] Antw: Re: pacemaker with sbd fails to start if node reboots too fast.

2017-11-30 Thread Andrei Borzenkov
30.11.2017 16:11, Klaus Wenninger wrote: > On 11/30/2017 01:41 PM, Ulrich Windl wrote: >> >>>>> "Gao,Yan" <y...@suse.com> wrote on 30.11.2017 at 11:48 in message >> <e71afccc-06e3-97dd-c66a-1b4bac550...@suse.com>: >>> On

Re: [ClusterLabs] Pacemaker resource is not tried to be recovered after failure on slave node even when failcount is less than migration-threshold

2017-11-27 Thread Andrei Borzenkov
Sent from iPhone > On 27 Nov 2017, at 14:50, Pankaj wrote: > > Hi, > > Could you please help me with below query. > > I have a stateful resource, stateful_ms, defined as below. The > migration-threshold is defined as 4 and resource-stickiness as 100. > I have

[ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-11-22 Thread Andrei Borzenkov
SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two-node cluster with VM on VSphere using shared VMDK as SBD. During basic tests (killing corosync and forcing STONITH), pacemaker was not started after reboot. In the logs I see during boot: Nov 22 16:04:56 sapprod01s crmd[3151]: crit: We were

[ClusterLabs] SBD stonith in 2 node cluster - how to make it prefer one side of cluster?

2017-11-24 Thread Andrei Borzenkov
Wrapping my head around how pcmk_delay_max works, my understanding is: - on startup pacemaker always starts one instance of stonith/sbd; it probably randomly selects a node for it. I suppose this initial start is delayed by a random amount within pcmk_delay_max. - when the cluster is partitioned,

[ClusterLabs] Is corosync supposed to be restarted if it fies?

2017-11-24 Thread Andrei Borzenkov
In one of the guides, the suggested procedure to simulate split brain was to kill the corosync process. It actually worked on one cluster, but on another the corosync process was restarted after being killed without the cluster noticing anything. Except after several attempts pacemaker died with stopping resources ...

Re: [ClusterLabs] Is corosync supposed to be restarted if it fies?

2017-11-30 Thread Andrei Borzenkov
On Thu, Nov 30, 2017 at 12:42 AM, Jan Pokorný <jpoko...@redhat.com> wrote: > On 29/11/17 22:00 +0100, Jan Pokorný wrote: >> On 28/11/17 22:35 +0300, Andrei Borzenkov wrote: >>> 28.11.2017 13:01, Jan Pokorný пишет: >>>> On 27/11/17 17:43 +0300, Andrei Borzen

Re: [ClusterLabs] questions about startup fencing

2017-11-30 Thread Andrei Borzenkov
On Wed, Nov 29, 2017 at 6:54 PM, Ken Gaillot wrote: > > The same scenario is why a single node can't have quorum at start-up in > a cluster with "two_node" set. Both nodes have to see each other at > least once before they can assume it's safe to do anything. > Unless we set

Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-11-30 Thread Andrei Borzenkov
On Thu, Nov 30, 2017 at 1:48 PM, Gao,Yan <y...@suse.com> wrote: > On 11/22/2017 08:01 PM, Andrei Borzenkov wrote: >> >> SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with >> VM on VSphere using shared VMDK as SBD. During basic tests by killing >

Re: [ClusterLabs] questions about startup fencing

2017-11-30 Thread Andrei Borzenkov
On Thu, Nov 30, 2017 at 1:39 PM, Gao,Yan <y...@suse.com> wrote: > On 11/30/2017 09:14 AM, Andrei Borzenkov wrote: >> >> On Wed, Nov 29, 2017 at 6:54 PM, Ken Gaillot <kgail...@redhat.com> wrote: >>> >>> >>> The same scenario is why a sing

Re: [ClusterLabs] Antw: Re: pacemaker with sbd fails to start if node reboots too fast.

2017-12-01 Thread Andrei Borzenkov
01.12.2017 22:36, Gao,Yan wrote: > On 11/30/2017 06:48 PM, Andrei Borzenkov wrote: >> 30.11.2017 16:11, Klaus Wenninger wrote: >>> On 11/30/2017 01:41 PM, Ulrich Windl wrote: >>>> >>>>>>> "Gao,Yan" <y...@suse.com> wrote on 30.

Re: [ClusterLabs] Antw: Re: questions about startup fencing

2017-12-04 Thread Andrei Borzenkov
04.12.2017 18:47, Tomas Jelinek wrote: > On 4.12.2017 at 16:02 Kristoffer Grönlund wrote: >> Tomas Jelinek writes: >> * how is it shutting down the cluster when issuing "pcs cluster stop --all"? >>> >>> First, it sends a request to each node to stop

Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-12-04 Thread Andrei Borzenkov
04.12.2017 14:48, Gao,Yan wrote: > On 12/02/2017 07:19 PM, Andrei Borzenkov wrote: >> 30.11.2017 13:48, Gao,Yan wrote: >>> On 11/22/2017 08:01 PM, Andrei Borzenkov wrote: >>>> SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with >>>> VM o
