[ClusterLabs] Booth fail-over conditions

2018-04-13 Thread Zach Anderson
Hey all, I'm a new user of pacemaker/booth and I'm fumbling my way through my first proof of concept. I have a two-site configuration set up with local pacemaker clusters at each site (running rabbitmq) and a booth arbitrator. I've successfully validated the base failover when the "granted" site has

Re: [ClusterLabs] General Capabilities Question

2018-04-13 Thread Ken Gaillot
On Thu, 2018-04-12 at 21:09 -0700, Cliff Burdick wrote: > Hi, I had a general question about Pacemaker to see if it would work > for a somewhat unique situation. I have a cluster of 10 active > machines + 2 standby that each have 3 interfaces (2 control, 1 > management). I want each of the control

Re: [ClusterLabs] Corosync 2.4.4 is available at corosync.org!

2018-04-13 Thread Jan Friesse
Ferenc Wágner wrote: Jan Pokorný writes: On 12/04/18 14:33 +0200, Jan Friesse wrote: This release contains a lot of fixes, including a fix for CVE-2018-1084. Security related updates would preferably provide more context Absolutely, thanks for providing that!

Re: [ClusterLabs] Corosync 2.4.4 is available at corosync.org!

2018-04-13 Thread Ferenc Wágner
Jan Friesse writes: > Ferenc Wágner wrote: > >> I wonder if c139255 (totemsrp: Implement sanity checks of received >> msgs) has direct security relevance as well. > > Not entirely direct, but quite similar. > >> Should I include that too in the Debian security update?

Re: [ClusterLabs] No slave is promoted to be master

2018-04-13 Thread Jehan-Guillaume de Rorthais
OK, I know what happened. It seems your standbys were not replicating when the master "crashed"; you can find tons of messages like this in the log files: WARNING: No secondary connected to the master WARNING: "db2" is not connected to the primary WARNING: "db3" is not connected to the

Re: [ClusterLabs] heartbeat/anything Resource Agent : "wait for proper service before ending the start operation"

2018-04-13 Thread Oyvind Albrigtsen
On 13/04/18 11:53 +0200, Nicolas Huillard wrote: On Friday 13 April 2018 at 11:15 +0200, Oyvind Albrigtsen wrote: On 13/04/18 11:07 +0200, Nicolas Huillard wrote: > One of my resources is a pppd process, which is started with the > heartbeat/anything RA. That RA just spawns the pppd process

Re: [ClusterLabs] heartbeat/anything Resource Agent : "wait for proper service before ending the start operation"

2018-04-13 Thread Nicolas Huillard
On Friday 13 April 2018 at 11:15 +0200, Oyvind Albrigtsen wrote: > On 13/04/18 11:07 +0200, Nicolas Huillard wrote: > > One of my resources is a pppd process, which is started with the > > heartbeat/anything RA. That RA just spawns the pppd process with the > > correct parameters and returns

[ClusterLabs] heartbeat/anything Resource Agent : "wait for proper service before ending the start operation"

2018-04-13 Thread Nicolas Huillard
Hello all, One of my resources is a pppd process, which is started with the heartbeat/anything RA. That RA just spawns the pppd process with the correct parameters and returns OCF_SUCCESS if the process started. The problem is that the service provided by pppd is only available after some time (a

Re: [ClusterLabs] heartbeat/anything Resource Agent : "wait for proper service before ending the start operation"

2018-04-13 Thread Oyvind Albrigtsen
On 13/04/18 11:07 +0200, Nicolas Huillard wrote: Hello all, One of my resources is a pppd process, which is started with the heartbeat/anything RA. That RA just spawns the pppd process with the correct parameters and returns OCF_SUCCESS if the process started. The problem is that the service

Re: [ClusterLabs] heartbeat/anything Resource Agent : "wait for proper service before ending the start operation"

2018-04-13 Thread Nicolas Huillard
On Friday 13 April 2018 at 11:59 +0200, Oyvind Albrigtsen wrote: > On 13/04/18 11:53 +0200, Nicolas Huillard wrote: > > On Friday 13 April 2018 at 11:15 +0200, Oyvind Albrigtsen > > wrote: > > The issue here is that the monitor will at first return a "fail", which > > is considered fatal by

Re: [ClusterLabs] Corosync 2.4.4 is available at corosync.org!

2018-04-13 Thread Jan Friesse
Ferenc Wágner wrote: Jan Friesse writes: Ferenc Wágner wrote: I wonder if c139255 (totemsrp: Implement sanity checks of received msgs) has direct security relevance as well. Not entirely direct, but quite similar. Should I include that too in the Debian

Re: [ClusterLabs] [solved] heartbeat/anything Resource Agent : "wait for proper service before ending the start operation"

2018-04-13 Thread Nicolas Huillard
On Friday 13 April 2018 at 11:15 +0200, Oyvind Albrigtsen wrote: > On 13/04/18 11:07 +0200, Nicolas Huillard wrote: > > I figured that fixing this would require adding a monitor call > > inside the start operation, and waiting for a successful monitor > > before returning OCF_SUCCESS, within the

Re: [ClusterLabs] [solved] heartbeat/anything Resource Agent : "wait for proper service before ending the start operation"

2018-04-13 Thread Oyvind Albrigtsen
On 13/04/18 14:11 +0200, Nicolas Huillard wrote: On Friday 13 April 2018 at 11:15 +0200, Oyvind Albrigtsen wrote: On 13/04/18 11:07 +0200, Nicolas Huillard wrote: > I figured that fixing this would require adding a monitor call > inside the start operation, and waiting for a successful monitor

Re: [ClusterLabs] Failing operations immediately when node is known to be down

2018-04-13 Thread Ken Gaillot
On Tue, 2018-04-10 at 12:56 -0500, Ryan Thomas wrote: > I’m trying to implement a HA solution which recovers very quickly > when a node fails. In my configuration, when I reboot a node, I see > in the logs that pacemaker realizes the node is down, and decides to > move all resources to the

[ClusterLabs] General Capabilities Question

2018-04-13 Thread Cliff Burdick
Hi, I had a general question about Pacemaker to see if it would work for a somewhat unique situation. I have a cluster of 10 active machines + 2 standby that each have 3 interfaces (2 control, 1 management). I want each of the control interfaces to use virtual IPs, such that if any of those 10

Re: [ClusterLabs] HALVM monitor action fail on slave node. Possible bug?

2018-04-13 Thread emmanuel segura
The first thing you need to configure is stonith, because you have this constraint: "constraint order promote DrbdResClone then start HALVM". To recover and promote drbd to master when a node crashes, configure the drbd fencing handler. Pacemaker executes monitor on both nodes, so this

[ClusterLabs] HALVM monitor action fail on slave node. Possible bug?

2018-04-13 Thread Marco Marino
Hello, I'm trying to configure a simple 2-node cluster with drbd and HALVM (ocf:heartbeat:LVM), but I have a problem that I'm not able to solve, so I decided to write this long post. I really need to understand what I'm doing and where I'm going wrong. More precisely, I'm configuring a pacemaker

[ClusterLabs] HAProxy resource agent

2018-04-13 Thread Tomer Azran
Hello, I'm planning to install an active/active HAProxy cluster on CentOS 7. I didn't find any RA for HAProxy. I found some on the net but I'm not sure if I need it. For example: https://raw.githubusercontent.com/thisismitch/cluster-agents/master/haproxy I can always use the

Re: [ClusterLabs] HAProxy resource agent

2018-04-13 Thread RaSca
On 13/04/2018 17:56, Tomer Azran wrote: > Hello, > I'm planning to install an active/active HAProxy cluster on CentOS 7. > I didn't find any RA for HAProxy. > I found some on the net but I'm not sure if I need it. For example: >

[ClusterLabs] Pacemaker resources are not scheduled

2018-04-13 Thread lkxjtu
My cluster version: Corosync 2.4.0 Pacemaker 1.1.16 There are many resource anomalies. Some resources are only monitored and not recovered. Some resources are not monitored or recovered. Only one resource of vnm is scheduled normally, but this resource cannot be started because other resources