Re: [ClusterLabs] 3 node cluster to 2 with quorum device

2019-01-05 Thread Andrei Borzenkov
06.01.2019 8:16, Jason Pfingstmann wrote: > I am new to corosync and pacemaker, having only used heartbeat in the > past (which is barely even comparable, now that I’m in the middle of > this). I’m working on a system for RDQM (IBM’s MQ software, > clustering solution) and it uses corosync with pa

Re: [ClusterLabs] Trying to Understanding crm-fence-peer.sh

2019-01-16 Thread Andrei Borzenkov
16.01.2019 19:49, Bryan K. Walton wrote: > On Wed, Jan 16, 2019 at 04:53:32PM +0100, Lars Ellenberg wrote: >> >> To clarify: crm-fence-peer.sh is an *example implementation* >> (even though an elaborate one) of a DRBD fencing policy handler, >> which uses pacemaker location constraints on the Maste

Re: [ClusterLabs] shutdown and restart of complete cluster due to power outage with UPS

2019-01-22 Thread Andrei Borzenkov
22.01.2019 20:00, Ken Gaillot wrote: > On Tue, 2019-01-22 at 16:52 +0100, Lentes, Bernd wrote: >> Hi, >> >> we have a new UPS which has enough charge to provide our 2-node >> cluster with the periphery (SAN, switches ...) for a reasonable time. >> I'm currently thinking of the shutdown- and restart-

Re: [ClusterLabs] shutdown and restart of complete cluster due to power outage with UPS

2019-01-24 Thread Andrei Borzenkov
23.01.2019 17:20, Klaus Wenninger wrote: > > And yes dynamic-configuration of two_node should be possible - > remember that I had to implement that communication with > corosync into sbd for clusters that are expanded node-by-node > using pcs. > 'corosync-cfgtool -R' to reload the config. > Using

Re: [ClusterLabs] shutdown and restart of complete cluster due to power outage with UPS

2019-01-24 Thread Andrei Borzenkov
24.01.2019 18:01, Lentes, Bernd wrote: > - On Jan 23, 2019, at 3:20 PM, Klaus Wenninger kwenn...@redhat.com wrote: >>> I have corosync-2.3.6-9.13.1.x86_64. >>> Where can i configure this value ? >> >> speaking of two_node & wait_for_all? >> That is configured in the quorum-section of corosync.c
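The setting discussed in these two threads lives in the quorum section of corosync.conf. A minimal sketch for a two-node cluster follows; the values are illustrative, not taken from the thread:

```
# /etc/corosync/corosync.conf (fragment, illustrative)
quorum {
    provider: corosync_votequorum
    two_node: 1        # each node keeps quorum when its peer is down
    wait_for_all: 1    # but only after both nodes have been seen once
}
```

After editing the file on all nodes, `corosync-cfgtool -R` (mentioned above by Klaus Wenninger) asks corosync to reload its configuration without a restart.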

Re: [ClusterLabs] Is fencing really a must for Postgres failover?

2019-02-13 Thread Andrei Borzenkov
13.02.2019 15:50, Maciej S wrote: > Can you describe at least one situation when it could happen? > I see situations where data on two masters can diverge but I can't find the > one where data gets corrupted. If diverged data in two databases that are supposed to be exact copies of each other is no

Re: [ClusterLabs] Pacemaker and Stonith : passive node won't bring up resources

2015-06-26 Thread Andrei Borzenkov
On Wed, 24 Jun 2015 15:42:43 -0400 Digimer wrote: > On 24/06/15 01:00 PM, Mathieu Valois wrote: > > > > On 24/06/2015 18:29, Ken Gaillot wrote: > >> On 06/24/2015 10:58 AM, Mathieu Valois wrote: > >>> Hi everybody, > >>> I'm working with Pacemaker and Stonith for High-Availability with > >>> 2

Re: [ClusterLabs] Antw: CRM location specification and errors

2015-06-26 Thread Andrei Borzenkov
On Fri, 26 Jun 2015 08:28:14 +0200 "Ulrich Windl" wrote: > Hi! > > I guess the cluster is running monitor operations on the forbidden node to > make sure no resources run there: Meaning: If you had started those resources > on the forbidden node, the cluster would stop them. To find out it runs

Re: [ClusterLabs] Antw: CRM location specification and errors

2015-06-26 Thread Andrei Borzenkov
On Fri, 26 Jun 2015 10:12:00 +0200 Kristoffer Grönlund wrote: > Andrei Borzenkov writes: > > > On Fri, 26 Jun 2015 08:28:14 +0200 > > "Ulrich Windl" wrote: > > > >> Hi! > >> > >> I guess the cluster is running monitor operations on

Re: [ClusterLabs] Resource stop when another resource run on that node

2015-06-30 Thread Andrei Borzenkov
On Tue, Jun 30, 2015 at 2:50 PM, John Gogu wrote: > Hello, > i would like to ask you if you have any idea about how to accomplish > following scenario: > > 2 cluster nodes (node01 node02) > 2 different resources (ex. IP1 run on node01 and IP2 run on node02) > > I would like to setup a constraint (
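One way to express the constraint asked about here is an anti-colocation with score -INFINITY, so that the two IPs can never run on the same node. A crm shell sketch, assuming the resources are named IP1 and IP2 as in the message (constraint IDs are invented for illustration):

```
# each IP prefers its home node...
location ip1-on-node01 IP1 100: node01
location ip2-on-node02 IP2 100: node02
# ...and the two may never run on the same node
colocation ip1-apart-from-ip2 -inf: IP1 IP2
```

With finite positive location scores and an infinite negative colocation score, either IP can still fail over to the other node alone, but they are never placed together.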

Re: [ClusterLabs] [Linux-HA] file system resource becomes inaccesible when any of the node goes down

2015-07-06 Thread Andrei Borzenkov
On Tue, 07 Jul 2015 01:02:49 +0500 Muhammad Sharfuddin wrote: > On 07/06/2015 07:04 PM, Dejan Muhamedagic wrote: > > On Mon, Jul 06, 2015 at 03:14:34PM +0500, Muhammad Sharfuddin wrote: > >> On 07/06/2015 02:50 PM, Dejan Muhamedagic wrote: > >>> Hi, > >>> > >>> On Sun, Jul 05, 2015 at 09:13:56PM +

Re: [ClusterLabs] clear pending fence operation

2015-07-07 Thread Andrei Borzenkov
On Tue, Jul 7, 2015 at 10:41 AM, wrote: > hi, > > is there any way to clear/remove pending stonith operation on cluster node? > > after some internal testing i got following status: > > Jul 4 12:18:02 XXX crmd[1673]: notice: te_fence_node: Executing reboot > fencing operation (179) on XXX (tim

Re: [ClusterLabs] 3 nodes cluster on Centos 7

2015-07-07 Thread Andrei Borzenkov
On Tue, Jul 7, 2015 at 12:19 PM, Nicolas S. wrote: > Hello everybody, > > I'm posting for the first time on this mailing list for advice. > > I am actually trying to build a cluster on Centos 7. > > The cluster has 3 nodes : > > - 1 virtual machine (machine1). This machine is supposed to be > high-ava

Re: [ClusterLabs] 3 nodes cluster on Centos 7

2015-07-07 Thread Andrei Borzenkov
On Tue, Jul 7, 2015 at 12:34 PM, Michael Schwartzkopff wrote: >> > The cluster has 3 nodes : >> > >> > - 1 virtual machine (machine1). This machine is supposed to be >> > high-available >> > - 2 physical machines identical (machine2 and 3) >> > >> It's not going to work. If host where this VM is r

Re: [ClusterLabs] Pacemaker failover failure

2015-07-14 Thread Andrei Borzenkov
On Wed, 15 Jul 2015 00:31:45 +0100 alex austin wrote: > Unfortunately I have nothing yet ... > > There's something I don't quite understand though. What's the role of > stonith if the other machine crashes unexpectedly and totally unclean? Is > it to reboot the machine and recreate the cluster, t

Re: [ClusterLabs] Resource cannot run anywhere

2015-07-21 Thread Andrei Borzenkov
On Mon, Jul 20, 2015 at 4:40 PM, Leonhardt,Christian wrote: > Hello everyone, > > I already posted this issue at the Debian HA maintainers list (http://l > ists.alioth.debian.org/pipermail/debian-ha-maintainers/2015 > -July/004325.html). Unfortunately the problem still exist and the > Debian maint

Re: [ClusterLabs] How to cluster a service with multiple possibilities

2015-07-25 Thread Andrei Borzenkov
On Fri, 24 Jul 2015 14:43:37 + David Gersic wrote: > I have a process (OpenSLP slpd) that I'd like to cluster. Unfortunately, this > process provides multiple services, depending what it finds in its > configuration file on startup. I need to have the process running on all of > the cluster

Re: [ClusterLabs] Resources FAIL on every host after host reboot

2015-07-30 Thread Andrei Borzenkov
On Thu, Jul 30, 2015 at 1:52 PM, Grigori Frolov wrote: > Hi everyone! > I use Pacemaker on seven nodes to manage resources. I faced up a problem: > whenever I reboot a node, every resource on every node becomes FAILED and > unmanaged. After that I need to cleanup. > I use pacemaker 1.1.12-4.el6 an

Re: [ClusterLabs] systemd: xxxx.service start request repeated too quickly

2015-08-04 Thread Andrei Borzenkov
On Tue, Aug 4, 2015 at 2:27 PM, Juha Heinanen wrote: > I have a resource group that consists of file system, virtual ip, mysql > server, and service . I removed a database from mysql server that > is required for service to start. After that I started to get huge > number of messages to

Re: [ClusterLabs] systemd: xxxx.service start request repeated too quickly

2015-08-04 Thread Andrei Borzenkov
On Tue, Aug 4, 2015 at 4:57 PM, Juha Heinanen wrote: > Andrei Borzenkov writes: > >> Not sure I really understand the question. If service cannot run >> anyway, you can simply remove it from configuration. You can set >> target state to stopped. You can unmanage it. It al

Re: [ClusterLabs] NFS - Non Mirroring

2015-08-04 Thread Andrei Borzenkov
On Tue, 4 Aug 2015 15:54:53 + "Streeter, Michelle N" wrote: > Oops and Clarification: I was wrong about the iscsi. It's Serial over SCSI > or SAS. > > From: Streeter, Michelle N > Sent: Tuesday, August 04, 2015 10:39 AM > To: 'users@clusterlabs.org' > Subject: NFS - Non Mirroring > > Our c

Re: [ClusterLabs] Vagrantfile for Clusters_from_Scratch 1.1-pcs tutorial

2015-08-04 Thread Andrei Borzenkov
On Tue, 4 Aug 2015 14:26:32 +1000 Andrew Beekhof wrote: > > Having said that, you don’t need the shared disk side of things to get some > benefit from sbd (yes, i know how strange that sounds). > On Fedora/RHEL/CentOS (speaking of which, CentOS 7 would be a much better > target than Fedora) you

Re: [ClusterLabs] Disabling resources and adding apache instances

2015-08-05 Thread Andrei Borzenkov
On Wed, Aug 5, 2015 at 9:23 AM, Vijay Partha wrote: > Hi, > > I have 2 doubts. > > 1.) If i disable a resource and reboot the node, will the pacemaker restart > the service? What exactly does "disable" mean? There is no such operation in pacemaker. > Or how can i stop the service and after rebootin

Re: [ClusterLabs] 2 Nodes Pacemaker for Nginx can only failover 1 time

2015-08-08 Thread Andrei Borzenkov
On Sun, 09 Aug 2015 03:02:15 + jun huang wrote: > Hello Everyone, > > I setup a cluster with two nodes with pacemaker 1.1.10 on CentOS 7. Then I > downloaded a resource agent for nginx from github > > > I tested my s

Re: [ClusterLabs] starting of resources

2015-08-11 Thread Andrei Borzenkov
On Tue, Aug 11, 2015 at 9:44 AM, Vijay Partha wrote: > Hi, > > Can we statically add resources to the nodes. I mean before the pacemaker is > started can we add resources to the nodes like you dont require to make use > of pcs resource create. Is this possible? > You better explain what you are t

Re: [ClusterLabs] circumstances under which resources become unmanaged

2015-08-12 Thread Andrei Borzenkov
On 12.08.2015 20:46, N, Ravikiran wrote: Hi All, I have a resource added to pacemaker called 'cmsd' whose state is getting to 'unmanaged FAILED' state. Apart from manually changing the resource to unmanaged using "pcs resource unmanage cmsd" , I'm trying to understand under what all circums

Re: [ClusterLabs] Ordering constraint restart second resource group

2015-08-12 Thread Andrei Borzenkov
On 12.08.2015 19:35, John Gogu wrote: ​Hello, in my cluster configuration I have following situation: resource_group_A ip1 ip2 resource_group_B apache1 ordering constraint resource_group_A then resource_group_B symetrical=true When I add a new resource from group_A, resources fro

Re: [ClusterLabs] Antw: Delayed first monitoring

2015-08-13 Thread Andrei Borzenkov
On Thu, Aug 13, 2015 at 10:01 AM, Miloš Kozák wrote: > However, > this does not make sense at all. Presumably, the pacemaker should get along > with lsb scripts which comes from system repository, right? > Let's forget about pacemaker for a moment. You have system startup where service B needs s

Re: [ClusterLabs] Antw: Ordering constraint restart second resource group

2015-08-13 Thread Andrei Borzenkov
On Thu, Aug 13, 2015 at 11:25 AM, Ulrich Windl wrote: > And what exactly is your problem? Real life example. Database resource depends on storage resource(s). There are multiple filesystems/volumes with database files. Database admin needs to increase available space. You add new storage, configu
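The scenario sketched above (growing database storage without restarting the database) is the classic argument for an unordered resource set instead of a group. A hedged pcs sketch with hypothetical resource names:

```
# start all filesystems (in any order among themselves) before the database;
# a new filesystem added to the first set later does not force a database restart
pcs constraint order set fs-data fs-logs sequential=false set database
```

Because the filesystems are in a sequential=false set, adding another member only changes that set's membership, not the ordering of the already-running database.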

Re: [ClusterLabs] stonithd: stonith_choose_peer: Couldn't find anyone to fence with

2015-08-13 Thread Andrei Borzenkov
On Thu, Aug 13, 2015 at 2:39 PM, Kostiantyn Ponomarenko wrote: > Hi, > > Brief description of the STONITH problem: > > I see two different behaviors with two different STONITH configurations. If > Pacemaker cannot find a device that can STONITH a problematic node, the node > remains up and running

Re: [ClusterLabs] Antw: Ordering constraint restart second resource group

2015-08-16 Thread Andrei Borzenkov
17.08.2015 02:26, Andrew Beekhof wrote: On 13 Aug 2015, at 7:33 pm, Andrei Borzenkov wrote: On Thu, Aug 13, 2015 at 11:25 AM, Ulrich Windl wrote: And what exactly is your problem? Real life example. Database resource depends on storage resource(s). There are multiple filesystems/volumes

Re: [ClusterLabs] MySQL resource causes error "0_monitor_20000".

2015-08-17 Thread Andrei Borzenkov
Sent from my iPhone > On Aug 18, 2015, at 7:19, Kiwamu Okabe wrote: > > Hi all, > > I made master-master replication on Pacemaker. > But it causes error "0_monitor_2". It's not an error, it is just the operation name. > If one of them boots Heartbeat and another doesn't, the error does

Re: [ClusterLabs] MySQL resource causes error "0_monitor_20000".

2015-08-18 Thread Andrei Borzenkov
On Tue, Aug 18, 2015 at 9:15 AM, Kiwamu Okabe wrote: > Hi Andrei, > > On Tue, Aug 18, 2015 at 2:24 PM, Andrei Borzenkov wrote: >>> I made master-master replication on Pacemaker. >>> But it causes error "0_monitor_2". >> >> It's not an err

Re: [ClusterLabs] MySQL resource causes error "0_monitor_20000".

2015-08-18 Thread Andrei Borzenkov
On Tue, Aug 18, 2015 at 11:57 AM, Kiwamu Okabe wrote: > Hi, > > On Tue, Aug 18, 2015 at 5:07 PM, Kiwamu Okabe wrote: >> ``` >> 2015-08-18 16:50:38 7081 [ERROR] Slave I/O: Fatal error: The slave I/O >> thread stops because master and slave have equal MySQL server ids; >> these ids must be differen

Re: [ClusterLabs] Antw: Re: MySQL resource causes error "0_monitor_20000".

2015-08-18 Thread Andrei Borzenkov
On Tue, Aug 18, 2015 at 3:34 PM, Ulrich Windl wrote: >>>> Kiwamu Okabe wrote on 18.08.2015 at 11:48 in >>>> message > : >> Hi Andrei, >> >> On Tue, Aug 18, 2015 at 6:28 PM, Andrei Borzenkov >> wrote: >>> You should attach full l

Re: [ClusterLabs] Mysql M/S, binlogs - how to delete them safely without failing over first?

2015-08-18 Thread Andrei Borzenkov
19.08.2015 00:19, Attila Megyeri wrote: Hi List, We are using M/S replication in a couple of clusters, and there is an issue that has been causing headaches for me for quite some time. My problem comes from the fact that binlog files grow very quickly on both the Master and Slave nodes. Let'

Re: [ClusterLabs] STONITH when both IB interfaces are down, and how to trigger Filesystem mount/umount failure to test STONITH?

2015-08-20 Thread Andrei Borzenkov
19.08.2015 13:31, Marcin Dulak wrote: > However if instead both IPoIB interfaces go down on server-02, > the mdt is moved to server-01, but no STONITH is performed on server-02. > This is expected, because there is nothing in the configuration that > triggers > STONITH in case of IB connection loss

Re: [ClusterLabs] Pacemaker tries to demote resource that isn't running and returns OCF_FAILED_MASTER

2015-08-20 Thread Andrei Borzenkov
21.08.2015 00:35, Brian Campbell wrote: I have a master/slave resource (with a custom resource agent) which, if it is uncleanly shut down, will return OCF_FAILED_MASTER on the next "monitor" operation. This seems to be what http://www.linux-ha.org/doc/dev-guides/_literal_ocf_failed_master_literal_9.

Re: [ClusterLabs] SLES 11 SP4 & csync2

2015-08-21 Thread Andrei Borzenkov
22.08.2015 08:12, Jorge Fábregas wrote: Hi everyone, I'm trying out SLES 11 SP4 with the "High-Availability Extension" on two virtual machines. I want to keep things simple & I have a question regarding the csync2 tool from SUSE. Considering that: - I'll have just two nodes - I'll be using co

Re: [ClusterLabs] CRM managing ADSL connection; failure not handled

2015-08-24 Thread Andrei Borzenkov
24.08.2015 12:35, Tom Yates wrote: I've got a failover firewall pair where the external interface is ADSL; that is, PPPoE. I've defined the service thus: primitive ExternalIP lsb:hb-adsl-helper \ op monitor interval="60s" and in addition written a noddy script /etc/init.d/hb-adsl-help

Re: [ClusterLabs] CRM managing ADSL connection; failure not handled

2015-08-24 Thread Andrei Borzenkov
24.08.2015 13:32, Tom Yates wrote: On Mon, 24 Aug 2015, Andrei Borzenkov wrote: 24.08.2015 12:35, Tom Yates wrote: I've got a failover firewall pair where the external interface is ADSL; that is, PPPoE. I've defined the service thus: If stop operation failed resource state is

Re: [ClusterLabs] VG activation on Active/Passive

2015-08-31 Thread Andrei Borzenkov
Sent from my iPhone > On Aug 29, 2015, at 21:51, Jorge Fábregas > wrote: > >> On 08/29/2015 02:37 PM, Digimer wrote: >> No need for clustered LVM, only the active node should see the PV. When >> the passive takes over, after connecting to the PV, it should do a >> pvscan -> vgscan -> lvsc

Re: [ClusterLabs] a newbie --question

2015-09-15 Thread Andrei Borzenkov
On Tue, Sep 15, 2015 at 4:38 PM, wrote: > Hi, > > Thanks for reply. > The problem is Compute resource, the appY and appZ can't run on same Server. > > It is possible ? > Yes; set location constraint that appY cannot run on the same node as appZ (and vice versa). Alternatively you can set locatio

Re: [ClusterLabs] Timeout - SBD's vs Watchdog's

2015-09-15 Thread Andrei Borzenkov
15.09.2015 23:32, Jorge Fábregas wrote: The problem is that I don't want to set SBD's watchdog timeout to 1 minute (so that it matches the "hardware" watchdog) because I'll have to change msgwait to 2 minutes at least (it's too much time) so I plan to leave the defaults (5 & 10 seconds). My qu

Re: [ClusterLabs] SAPDatabase is not starting Sybase database

2015-09-16 Thread Andrei Borzenkov
On Wed, Sep 16, 2015 at 2:13 PM, Muhammad Sharfuddin wrote: > On 09/16/15 16:07, Kai Dupke wrote: > > On 09/16/2015 12:52 PM, Muhammad Sharfuddin wrote: > > Error: Command execution failed. : Database user authentication failed: > No access to SecureStore or DB_CONNECT/SYB/SADB_USER or > DB_CONNEC

Re: [ClusterLabs] SAPDatabase is not starting Sybase database

2015-09-16 Thread Andrei Borzenkov
16.09.2015 16:26, Muhammad Sharfuddin wrote: On 09/16/15 17:03, Andrei Borzenkov wrote: On Wed, Sep 16, 2015 at 2:13 PM, Muhammad Sharfuddin wrote: On 09/16/15 16:07, Kai Dupke wrote: On 09/16/2015 12:52 PM, Muhammad Sharfuddin wrote: Error: Command execution failed. : Database user

Re: [ClusterLabs] Major problem with iSCSITarget resource on top of DRBD M/S resource.

2015-09-27 Thread Andrei Borzenkov
27.09.2015 17:40, Alex Crow wrote: Hi List, I'm trying to set up a failover iSCSI storage system for oVirt using a self-hosted engine. I've set up DRBD in Master-Slave for two iSCSI targets, one for the self-hosted engine and one for the VMs. I had this all working perfectly, then after trying t

Re: [ClusterLabs] disable failover

2015-10-01 Thread Andrei Borzenkov
On Thu, Oct 1, 2015 at 5:30 PM, Vijay Partha wrote: > Hi, > > I want to know how to disable failover. If a node undergoes a failover the > resources running on the node should not be started on the other node in the > cluster. How can this be achieved. > What exactly "node undergoes failover" mea

Re: [ClusterLabs] disable failover

2015-10-01 Thread Andrei Borzenkov
permanently? Permanently you can define constraints. Temporarily you can set is-managed to false for resources on this node (do not forget to undo it later). Or set global maintenance mode (but this affects all resources on all nodes). > On Thu, Oct 1, 2015 at 8:18 PM, Andrei Borzenkov > w
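The options listed above map to commands like these (pcs syntax; the resource name is hypothetical):

```
# per-resource: stop managing it (monitor results are still reported)
pcs resource unmanage my_resource
pcs resource manage my_resource          # undo later

# cluster-wide: freeze all resources on all nodes
pcs property set maintenance-mode=true
pcs property set maintenance-mode=false  # undo later
```

Unmanaging affects one resource; maintenance-mode is the blunt instrument that stops the cluster from starting, stopping, or recovering anything anywhere.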

Re: [ClusterLabs] disable failover

2015-10-01 Thread Andrei Borzenkov
01.10.2015 19:09, Vijay Partha wrote: i want pacemaker to monitor the resources running on each node and at the same time restart it. It should run on the same node. Then create a single-node cluster. Why do you add a second node if you do not want to use it? On Thu, Oct 1, 2015 at 9:17 PM, Ka

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Andrei Borzenkov
17.02.2019 0:03, Eric Robinson wrote: > Here are the relevant corosync logs. > > It appears that the stop action for resource p_mysql_002 failed, and that > caused a cascading series of service changes. However, I don't understand > why, since no other resources are dependent on p_mysql_002. >

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Andrei Borzenkov
ent away from its current node. In this particular case it may be argued that pacemaker reaction is unjustified. Administrator explicitly set target state to "stop" (otherwise pacemaker would not attempt to stop it) so it is unclear why it tries to restart it on other node. >> -

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-17 Thread Andrei Borzenkov
17.02.2019 0:33, Andrei Borzenkov wrote: > 17.02.2019 0:03, Eric Robinson wrote: >> Here are the relevant corosync logs. >> >> It appears that the stop action for resource p_mysql_002 failed, and that >> caused a cascading series of service changes. However, I don

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-17 Thread Andrei Borzenkov
17.02.2019 0:44, Eric Robinson wrote: > Thanks for the feedback, Andrei. > > I only want cluster failover to occur if the filesystem or drbd resources > fail, or if the cluster messaging layer detects a complete node failure. Is > there a way to tell PaceMaker not to trigger a cluster failover i

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-19 Thread Andrei Borzenkov
19.02.2019 23:06, Eric Robinson wrote: ... > Bottom line is, how do we configure the cluster in such a way that > there are no cascading circumstances when a MySQL resource fails? > Basically, if a MySQL resource fails, it fails. We'll deal with that > on an ad-hoc basis. I don't want the whole clu

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-20 Thread Andrei Borzenkov
18.02.2019 18:53, Ken Gaillot wrote: > On Sun, 2019-02-17 at 20:33 +0300, Andrei Borzenkov wrote: >> 17.02.2019 0:33, Andrei Borzenkov wrote: >>> 17.02.2019 0:03, Eric Robinson wrote: >>>> Here are the relevant corosync logs. >>>> >>>> It

Re: [ClusterLabs] Antw: Re: Why Do All The Services Go Down When Just One Fails?

2019-02-20 Thread Andrei Borzenkov
20.02.2019 21:51, Eric Robinson wrote: > > The following should show OK in a fixed font like Consolas, but the following > setup is supposed to be possible, and is even referenced in the ClusterLabs > documentation. > > > > > > +--+ > > | mysql001 +--+ > > +--+

Re: [ClusterLabs] NFS4 share not working

2019-02-22 Thread Andrei Borzenkov
23.02.2019 2:57, solarflow99 wrote: > I'm trying to have my NFS share exported via pacemaker and now it doesn't > seem to be working, it also kills off nfs-mountd. It looks like the rbd > device could have something to do with it, the nfsroot doesn't get > exported, but there's no indication why:

Re: [ClusterLabs] Continuous master monitor failure of a resource in case some other resource is being promoted

2019-02-25 Thread Andrei Borzenkov
25.02.2019 11:50, Samarth Jain wrote: > Hi, > > > We have a bunch of resources running in master slave configuration with one > master and one slave instance running at any given time. > > What we observe is, that for any two given resources at a time, if say > resource Stateful_Test_1 is in mid

Re: [ClusterLabs] Continuous master monitor failure of a resource in case some other resource is being promoted

2019-02-25 Thread Andrei Borzenkov
25.02.2019 22:36, Andrei Borzenkov wrote: > >> Could you please help me understand: >> 1. Why doesn't pacemaker process the failure of Stateful_Test_2 resource >> immediately after first failure? > I'm still not sure why. > I vaguely remember something

Re: [ClusterLabs] Continuous master monitor failure of a resource in case some other resource is being promoted

2019-02-25 Thread Andrei Borzenkov
26.02.2019 1:08, Ken Gaillot wrote: > On Mon, 2019-02-25 at 23:00 +0300, Andrei Borzenkov wrote: >> 25.02.2019 22:36, Andrei Borzenkov wrote: >>> >>>> Could you please help me understand: >>>> 1. Why doesn't pacemaker process the failure of State

Re: [ClusterLabs] Continuous master monitor failure of a resource in case some other resource is being promoted

2019-02-25 Thread Andrei Borzenkov
25.02.2019 23:13, Ken Gaillot wrote: > On Mon, 2019-02-25 at 14:20 +0530, Samarth Jain wrote: >> Hi, >> >> >> We have a bunch of resources running in master slave configuration >> with one master and one slave instance running at any given time. >> >> What we observe is, that for any two given reso

Re: [ClusterLabs] Continuous master monitor failure of a resource in case some other resource is being promoted

2019-02-26 Thread Andrei Borzenkov
26.02.2019 18:05, Ken Gaillot wrote: > On Tue, 2019-02-26 at 06:55 +0300, Andrei Borzenkov wrote: >> 26.02.2019 1:08, Ken Gaillot wrote: >>> On Mon, 2019-02-25 at 23:00 +0300, Andrei Borzenkov wrote: >>>> 25.02.2019 22:36, Andrei Borzenkov wrote:

Re: [ClusterLabs] Two mode cluster VMware drbd

2019-03-12 Thread Andrei Borzenkov
12.03.2019 18:10, Adam Budziński wrote: > Hello, > > > > I’m planning to setup a two node (active-passive) HA cluster consisting of > pacemaker, corosync and DRBD. The two nodes will run on VMware VM’s and > connect to a single DB server (unfortunately for various reasons not > included in the c

Re: [ClusterLabs] Interface confusion

2019-03-15 Thread Andrei Borzenkov
16.03.2019 1:16, Adam Budziński wrote: > Hi Tomas, > > Ok but how then does pacemaker or the fence agent know which route to take to > reach the vCenter? They do not know or care at all. It is up to your underlying operating system and its routing tables. > Btw. Do I have to add the stonith resource

Re: [ClusterLabs] Interface confusion

2019-03-15 Thread Andrei Borzenkov
node where the stonith agent is not prohibited to run by (co-)location rules. My understanding is that this node is selected by the DC in the partition. > Thank you! > > Sat, 16.03.2019, 05:37, user Andrei Borzenkov > wrote: > >> 16.03.2019 1:16, Adam Budziński wrote: >>> Hi

Re: [ClusterLabs] recommendations for corosync totem timeout for CentOS 7 + VMware?

2019-03-22 Thread Andrei Borzenkov
On Fri, Mar 22, 2019 at 1:08 PM Jan Pokorný wrote: > > Also a Friday's idea: > Perhaps we should crank up "how to ask" manual for this list Yet another one? http://www.catb.org/~esr/faqs/smart-questions.html

Re: [ClusterLabs] Apache graceful restart not supported by heartbeat apache control script

2019-03-25 Thread Andrei Borzenkov
25.03.2019 20:42, Cole Miller wrote: > Hi users@clusterlabs.org, > > My current project at work is a two node cluster running apache and > virtual IPs on CentOS 7. I found in my testing that apache when run > by corosync does not have a reload or graceful restart. Before the > cluster, when apache

Re: [ClusterLabs] Unable to restart resources

2019-03-26 Thread Andrei Borzenkov
26.03.2019 18:33, JCA wrote: > Making some progress with Pacemaker/DRBD, but still trying to grasp some of > the basics of this framework. Here is my current situation: > > I have a two-node cluster, pmk1 and pmk2, with resources ClusterIP and > DrbdFS. In what follows, commands preceded by '[pmk1

Re: [ClusterLabs] Colocation constraint moving resource

2019-03-26 Thread Andrei Borzenkov
26.03.2019 17:14, Ken Gaillot wrote: > On Tue, 2019-03-26 at 14:11 +0100, Thomas Singleton wrote: >> Dear all >> >> I am encountering an issue with colocation constraints. >> >> I have created a 4 nodes cluster (3 "main" and 1 "spare") with 3 >> resources and I wish to have each resource run only

[ClusterLabs] crmsh init node status in new (empty) CIB

2019-03-30 Thread Andrei Borzenkov
I tried to use crmsh on a new empty CIB to play some simulations. It was almost there, except there is apparently no way to initialize node state, and without existing node state I cannot change or set node state (and of course cannot simulate any useful cluster transition): node_node = get_ta

Re: [ClusterLabs] Antw: Re: Colocation constraint moving resource

2019-03-30 Thread Andrei Borzenkov
wrote on 26.03.2019 at >>>>>> 20:28 in >>> >>> message >>> <1d8d000ab946586783fc9adec3063a1748a5b06f.ca...@redhat.com>: >>>> On Tue, 2019-03-26 at 22:12 +0300, Andrei Borzenkov wrote: >>>>> 26.03.2019 17:14, Ken Gaillot wrote: >>>>>> On

Re: [ClusterLabs] Configuring pacemaker to migrate a group of co-located resources if any of them fail

2019-04-01 Thread Andrei Borzenkov
02.04.2019 3:24, Chris Dewbery wrote: > Hi, > > I have a two node cluster running pacemaker 2.0, which I would like to run in > an > active/standby model, that is, where all of the resources run on the active > node > and will be migrated to the standby node in the case where any resource on >

Re: [ClusterLabs] Issue with DB2 HADR cluster

2019-04-02 Thread Andrei Borzenkov
02.04.2019 19:32, Dileep V Nair wrote: > > > Hi, > > I have a two node DB2 Cluster with pacemaker and HADR. When I issue a > reboot -f on the node where Primary Database is running, I expect the > Standby database to be promoted as Primary. But what is happening is > pacemaker waits for 18

Re: [ClusterLabs] Antw: Re: Issue with DB2 HADR cluster

2019-04-03 Thread Andrei Borzenkov
On Wed, Apr 3, 2019 at 10:14 AM Ulrich Windl wrote: > > >>> Digimer wrote on 02.04.2019 at 19:49 in message > <6c6302f4-844b-240d-8d0e-727dddf36...@alteeve.ca>: > > [...] > > It's worth noting that SBD fencing is "better than nothing", but slow. > > IPMI and/or PDU fencing completes a lot fas

Re: [ClusterLabs] Antw: Re: Issue with DB2 HADR cluster

2019-04-03 Thread Andrei Borzenkov
On Wed, Apr 3, 2019 at 10:26 AM Valentin Vidic wrote: > > On Wed, Apr 03, 2019 at 09:13:58AM +0200, Ulrich Windl wrote: > > I'm surprised: Once sbd writes the fence command, it usually takes > > less than 3 seconds until the victim is dead. If you power off a > > server, the PDU still may have one

[ClusterLabs] How to reduce SBD watchdog timeout?

2019-04-03 Thread Andrei Borzenkov
On Tue, Apr 2, 2019 at 8:49 PM Digimer wrote: > > It's worth noting that SBD fencing is "better than nothing", but slow. > IPMI and/or PDU fencing completes a lot faster. > Current GIT documentation says: Set watchdog timeout to N seconds. This depends mostly on your storage latency; the majority
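For disk-based SBD the watchdog timeout is stamped into the device metadata, and msgwait is conventionally about twice the watchdog timeout (compare the 5/10-second defaults and the 1-minute/2-minute pairing cited elsewhere in this archive). A sketch with illustrative values and a placeholder device path:

```
# write new timeouts into the SBD device header
# (-1 = watchdog timeout, -4 = msgwait; device path is a placeholder)
sbd -d /dev/disk/by-id/my-sbd-disk -1 5 -4 10 create

# verify what is currently stored on the device
sbd -d /dev/disk/by-id/my-sbd-disk dump
```

Note that `create` reinitializes the device metadata, so this is done while the cluster is down, and all nodes sharing the device pick up the same timeouts.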

Re: [ClusterLabs] How to reduce SBD watchdog timeout?

2019-04-07 Thread Andrei Borzenkov
03.04.2019 13:04, Klaus Wenninger wrote: > On 4/3/19 9:47 AM, Andrei Borzenkov wrote: >> On Tue, Apr 2, 2019 at 8:49 PM Digimer wrote: >>> It's worth noting that SBD fencing is "better than nothing", but slow. >>> IPMI and/or PDU fencing completes a lo

Re: [ClusterLabs] SBD as watchdog daemon

2019-04-14 Thread Andrei Borzenkov
12.04.2019 15:30, Олег Самойлов wrote: > >> On Apr 11, 2019, at 20:00, Klaus Wenninger >> wrote: >> >> On 4/11/19 5:27 PM, Олег Самойлов wrote: >>> Hi all. I am developing HA PostgreSQL cluster for 2 or 3 >>> datacenters. In case of DataCenter failure (blackout) the fencing >>> will not work

Re: [ClusterLabs] shutdown of 2-Node cluster when power outage

2019-04-20 Thread Andrei Borzenkov
20.04.2019 22:29, Lentes, Bernd wrote: > > > - On Apr 18, 2019 at 16:21 kgaillot kgail...@redhat.com wrote: > >> >> Simply stopping pacemaker and corosync by whatever mechanism your >> distribution uses (e.g. systemctl) should be sufficient. > > That works. But strangely is that after a r

Re: [ClusterLabs] shutdown of 2-Node cluster when power outage

2019-04-21 Thread Andrei Borzenkov
21.04.2019 16:32, Lentes, Bernd wrote: > - On Apr 21, 2019 at 6:51 Andrei Borzenkov arvidj...@gmail.com wrote: > >> 20.04.2019 22:29, Lentes, Bernd wrote: >>> >>> >>> - On Apr 18, 2019 at 16:21 kgaillot kgail...@redhat.com wrote: >>> >

Re: [ClusterLabs] Fwd: Postgres pacemaker cluster failure

2019-04-26 Thread Andrei Borzenkov
27.04.2019 1:04, Danka Ivanović wrote: > Hi, here is the complete cluster configuration: > > node 1: master > node 2: secondary > primitive AWSVIP awsvip \ > params secondary_private_ip=10.x.x.x api_delay=5 > primitive PGSQL pgsqlms \ > params pgdata="/var/lib/postgresql/9.5/main" >

[ClusterLabs] Inconsistent clone $OCF_RESOURCE_INSTANCE value depending on symmetric-cluster property.

2019-04-27 Thread Andrei Borzenkov
The documentation says that for clone resources OCF_RESOURCE_INSTANCE contains the primitive name qualified by the instance number, like primitive:1. I was rather surprised that pacemaker may actually omit the qualification, at least in the following case: 1. *start* pacemaker with symmetric-cluster=false 2. do not add con
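
Given the observation above, a resource agent that parses OCF_RESOURCE_INSTANCE should not assume the ":N" qualifier is always present. A minimal defensive sketch (the value is simulated here; pacemaker normally exports it):

```shell
# Defensive parsing of OCF_RESOURCE_INSTANCE in a resource agent.
# Simulated value; per the report above the ":N" suffix may be absent.
OCF_RESOURCE_INSTANCE="dummy:1"
primitive=${OCF_RESOURCE_INSTANCE%%:*}      # strip the instance qualifier
case $OCF_RESOURCE_INSTANCE in
    *:*) instance=${OCF_RESOURCE_INSTANCE##*:} ;;
    *)   instance=0 ;;                      # fall back when unqualified
esac
echo "$primitive $instance"
```

The same pattern works whether or not the cluster qualified the name, which sidesteps the symmetric-cluster=false corner case.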

[ClusterLabs] Timeout stopping corosync-qdevice service

2019-04-27 Thread Andrei Borzenkov
I set up qdevice in openSUSE Tumbleweed and while it works as expected I cannot stop it - it always results in a timeout and the service finally gets killed by systemd. Is this a known issue? TW has quite up-to-date versions; it usually follows upstream GIT pretty closely. ___

Re: [ClusterLabs] Fwd: Postgres pacemaker cluster failure

2019-04-29 Thread Andrei Borzenkov
29.04.2019 18:05, Ken Gaillot writes: >> >>> Why does not it check OCF_RESKEY_CRM_meta_notify? >> >> I was just not aware of this env variable. Sadly, it is not >> documented >> anywhere :( > > It's not a Pacemaker-created value like the other notify variables -- > all user-specified meta-attribute
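
Per the reply above, user-specified meta-attributes reach the agent's environment as OCF_RESKEY_CRM_meta_<name>. A hedged sketch of an agent checking the notify setting (the value is simulated here; in a real run pacemaker sets it from the clone's meta-attribute):

```shell
# Meta-attributes set on the resource (e.g. notify=true on a clone) are
# exported to the agent as OCF_RESKEY_CRM_meta_<name>. Value simulated here.
OCF_RESKEY_CRM_meta_notify="true"
if [ "${OCF_RESKEY_CRM_meta_notify:-false}" = "true" ]; then
    notify_enabled=yes
else
    notify_enabled=no
fi
echo "notify enabled: $notify_enabled"
```

The `:-false` default keeps the agent safe on pacemaker versions or configurations where the variable is not exported at all.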

Re: [ClusterLabs] Timeout stopping corosync-qdevice service

2019-04-29 Thread Andrei Borzenkov
29.04.2019 14:32, Jan Friesse пишет: > Andrei, > >> I setup qdevice in openSUSE Tumbleweed and while it works as expected I > > Is it corosync-qdevice or corosync-qnetd daemon? > corosync-qdevice >> cannot stop it - it always results in timeout and service finally gets >> killed by systemd. >>

[ClusterLabs] How to correctly stop cluster with active stonith watchdog?

2019-04-29 Thread Andrei Borzenkov
As soon as a majority of nodes is stopped, the remaining nodes are out of quorum and the watchdog reboot kicks in. What is the correct procedure to ensure nodes are stopped in a clean way? Short of disabling stonith-watchdog-timeout before stopping the cluster ...
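
The workaround hinted at in the question can be sketched as follows; this is a hedged sketch, not a verified procedure, and the restored value is a placeholder:

```shell
# Neutralize the watchdog reaction before taking nodes down for maintenance,
# then restore it once the cluster is back up.
crm_attribute --type crm_config --name stonith-watchdog-timeout --update 0
# ... stop pacemaker/corosync on all nodes, do the maintenance ...
crm_attribute --type crm_config --name stonith-watchdog-timeout --update 10s
```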

Re: [ClusterLabs] How to correctly stop cluster with active stonith watchdog?

2019-04-30 Thread Andrei Borzenkov
30.04.2019 9:53, Digimer writes: > On 2019-04-30 12:07 a.m., Andrei Borzenkov wrote: >> As soon as majority of nodes are stopped, the remaining nodes are out of >> quorum and watchdog reboot kicks in. >> >> What is the correct procedure to ensure nodes are stoppe

Re: [ClusterLabs] How to correctly stop cluster with active stonith watchdog?

2019-04-30 Thread Andrei Borzenkov
about dynamic cluster expansion; the question is about normal static cluster with fixed number of nodes that needs to be shut down. >> On Apr 30, 2019, at 7:07, Andrei Borzenkov wrote: >> >> As soon as majority of nodes are stopped, the remaining nodes are out of >> quor

Re: [ClusterLabs] How to correctly stop cluster with active stonith watchdog?

2019-04-30 Thread Andrei Borzenkov
30.04.2019 19:34, Олег Самойлов writes: > >> No. I simply want reliable way to shutdown the whole cluster (for >> maintenance). > > Official way is `pcs cluster stop --all`. pcs is just one of multiple high level tools. I am interested in plumbing, not porcelain. > But it's not always worked as
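
For the "plumbing" route mentioned above, the low-level equivalent of `pcs cluster stop --all` is stopping the services on every node, pacemaker before corosync. A hedged sketch (run on each node, e.g. via ssh from one of them):

```shell
# Ordering matters: stop the cluster manager first, then membership.
systemctl stop pacemaker
systemctl stop corosync-qdevice   # only if a quorum device is in use
systemctl stop corosync
```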

Re: [ClusterLabs] Timeout stopping corosync-qdevice service

2019-04-30 Thread Andrei Borzenkov
30.04.2019 9:51, Jan Friesse writes: > >> Now, corosync-qdevice gets SIGTERM as "signal to terminate", but it >> installs SIGTERM handler that does not exit and only closes some socket. >> May be this should trigger termination of main loop, but somehow it does >> not. > > Yep, this is exactly how

Re: [ClusterLabs] crm_mon output to html-file - is there a way to manipulate the html-file ?

2019-05-03 Thread Andrei Borzenkov
03.05.2019 20:18, Lentes, Bernd writes: > Hi, > > on my cluster nodes i established a systemd service which starts crm_mon > which writes cluster information into a html-file so i can see the state > of my cluster in a webbrowser. > crm_mon is started that way: > /usr/sbin/crm_mon -d -i 10 -h /
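
crm_mon itself writes a fixed HTML page, so one way to "manipulate" it is a wrapper that post-processes the generated file. A hypothetical sketch (the file is simulated here so the commands are self-contained; in practice `crm_mon -d -i 10 -h /path/file.html` produces it and a timer would rewrite it):

```shell
# Hypothetical post-processing of crm_mon's HTML output, simulated with a
# static file. A cron job or systemd timer could run this against the real
# file that crm_mon keeps refreshing.
f=$(mktemp)
printf '<html><head></head><body>cluster ok</body></html>\n' > "$f"
# e.g. inject an auto-refresh tag so the browser reloads every 10 seconds
sed -i 's|<head>|<head><meta http-equiv="refresh" content="10">|' "$f"
n=$(grep -c 'http-equiv' "$f")
echo "injected tags: $n"
rm -f "$f"
```

Note that crm_mon rewrites the file on every interval, so any edit has to be reapplied after each refresh; that is why a wrapper or timer, rather than a one-off edit, is needed.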

[ClusterLabs] corosync-qdevice[3772]: Heuristics worker waitpid failed (10): No child processes

2019-05-04 Thread Andrei Borzenkov
While testing corosync-qdevice I repeatedly got the above message. The reason seems to be startup sequence in corosync-qdevice. Consider: ● corosync-qdevice.service - Corosync Qdevice daemon Loaded: loaded (/etc/systemd/system/corosync-qdevice.service; disabled; vendor preset: disabled) Act

Re: [ClusterLabs] How to correctly stop cluster with active stonith watchdog?

2019-05-04 Thread Andrei Borzenkov
30.04.2019 19:47, Олег Самойлов writes: > > >> On Apr 30, 2019, at 19:38, Andrei Borzenkov >> wrote: >> >> 30.04.2019 19:34, Олег Самойлов writes: >>> >>>> No. I simply want reliable way to shutdown the whole cluster >>>> (for

Re: [ClusterLabs] monitor timed out with unknown error

2019-05-05 Thread Andrei Borzenkov
05.05.2019 16:14, Arkadiy Kulev writes: > Hello! > > I run pacemaker on 2 active/active hosts which balance the load of 2 public > IP addresses. > A few days ago we ran a very CPU/network intensive process on one of the 2 > hosts and Pacemaker failed. > > I've attached a screenshot of the terminal
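
When a host is CPU- or network-saturated, monitor operations can exceed their timeout and be reported as failures even though the resource is healthy. A common mitigation is to give the monitor operation more headroom; a hedged sketch (the resource name "publicIP" is a placeholder, not from the thread):

```shell
# Raise the monitor timeout so transient load spikes are not counted
# as resource failures.
pcs resource update publicIP op monitor interval=10s timeout=60s
```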

Re: [ClusterLabs] monitor timed out with unknown error

2019-05-05 Thread Andrei Borzenkov
The failed prerequisite was successful stop of resource. > Sincerely, > Ark. > > e...@ethaniel.com > > > On Sun, May 5, 2019 at 9:46 PM Andrei Borzenkov wrote: > >> 05.05.2019 16:14, Arkadiy Kulev writes: >>> Hello! >>> >>> I run pacemaker on 2 ac

Re: [ClusterLabs] monitor timed out with unknown error

2019-05-05 Thread Andrei Borzenkov
> On Sun, May 5, 2019 at 11:05 PM Andrei Borzenkov > wrote: > >> 05.05.2019 18:43, Arkadiy Kulev writes: >>> Dear Andrei, >>> >>> I'm sorry for the screenshot, this is the only thing that I have left >> after >>> the crash. >>>

Re: [ClusterLabs] monitor timed out with unknown error

2019-05-05 Thread Andrei Borzenkov
On Mon, May 6, 2019 at 8:30 AM Arkadiy Kulev wrote: > > Andrei, > > I just went through the docs > (https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-failure-migration.html) > and it says that the option "failure-timeout" is responsible for retrying a > failed
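
Per the Pacemaker Explained page linked in the quote, failure-timeout is a per-resource meta-attribute that expires recorded failures so the resource may be retried on the node where it failed. A hedged sketch (resource name is a placeholder):

```shell
# Expire failures after 10 minutes so the node becomes eligible again.
pcs resource update publicIP meta failure-timeout=600s
# or clear the accumulated failures manually, right away:
crm_resource --cleanup --resource publicIP
```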

Re: [ClusterLabs] How to correctly stop cluster with active stonith watchdog?

2019-05-12 Thread Andrei Borzenkov
30.04.2019 9:53, Digimer writes: > On 2019-04-30 12:07 a.m., Andrei Borzenkov wrote: >> As soon as majority of nodes are stopped, the remaining nodes are out of >> quorum and watchdog reboot kicks in. >> >> What is the correct procedure to ensure nodes are stoppe

Re: [ClusterLabs] Constant stop/start of resource in spite of interval=0

2019-05-18 Thread Andrei Borzenkov
18.05.2019 18:34, Kadlecsik József writes: > Hello, > > We have a resource agent which creates IP tunnels. In spite of the > configuration setting > > primitive tunnel-eduroam ocf:local:tunnel \ > params > op start timeout=120s interval=0 \ > op stop timeout=300s inte
