Re: [ClusterLabs] cross DC cluster using public ip?

2016-10-13 Thread Jan Friesse
neeraj ch wrote: Hello, We are testing out corosync and pacemaker for DB high availability on the cloud. I was able to set up a cluster within a DC using corosync 1.4 and pacemaker 1.12. It works great and I wanted to try a cross DC cluster. I was using unicast as multicast was disabled by

Re: [ClusterLabs] OCFS2 on cLVM with node waiting for fencing timeout

2016-10-13 Thread Eric Ren
Hi, On 10/10/2016 10:46 PM, Ulrich Windl wrote: Hi! I observed an interesting thing: In a three node cluster (SLES11 SP4) with cLVM and OCFS2 on top, one node was fenced as the OCFS2 filesystem was somehow busy on unmount. We have (for paranoid reasons mainly) an excessively long fencing timou

Re: [ClusterLabs] cross DC cluster using public ip?

2016-10-13 Thread Klaus Wenninger
On 10/13/2016 09:30 AM, Jan Friesse wrote: > neeraj ch wrote: >> Hello, >> >> We are testing out corosync and pacemaker for DB high availability on >> the >> cloud. I was able to set up a cluster within a DC using corosync 1.4 >> and >> pacemaker 1.12. It works great and I wanted to try a cro

Re: [ClusterLabs] Antw: Re: OCFS2 on cLVM with node waiting for fencing timeout

2016-10-13 Thread Eric Ren
Hi, On 10/11/2016 02:18 PM, Ulrich Windl wrote: emmanuel segura wrote on 10.10.2016 at 16:49 in message: Node h01 (old DC) was fenced at Oct 10 10:06:33 Node h01 went down around Oct 10 10:06:37. DLM noticed that on node h05: Oct 10 10:06:44 h05 cluster-dlm[12063]: dlm_process_no

[ClusterLabs] Antw: Re: OCFS2 on cLVM with node waiting for fencing timeout

2016-10-13 Thread Ulrich Windl
>>> Eric Ren wrote on 13.10.2016 at 09:31 in message: > Hi, > > On 10/10/2016 10:46 PM, Ulrich Windl wrote: >> Hi! >> >> I observed an interesting thing: In a three node cluster (SLES11 SP4) with > cLVM and OCFS2 on top, one node was fenced as the OCFS2 filesystem was > somehow busy on unm

Re: [ClusterLabs] Antw: Re: OCFS2 on cLVM with node waiting for fencing timeout

2016-10-13 Thread emmanuel segura
If you want to reduce the multipath switching time when one controller goes down: https://www.redhat.com/archives/dm-devel/2009-April/msg00266.html 2016-10-13 10:27 GMT+02:00 Ulrich Windl : Eric Ren wrote on 13.10.2016 at 09:31 in message: >> Hi, >> >> On 10/10/2016 10:46 PM, Ulrich W

[ClusterLabs] Antw: Re: Antw: Re: OCFS2 on cLVM with node waiting for fencing timeout

2016-10-13 Thread Ulrich Windl
>>> Eric Ren wrote on 13.10.2016 at 09:48 in message <73f764d0-75e7-122f-ff4e-d0b27dbdd...@suse.com>: [...] >> When assuming node h01 still lived when communication failed, wouldn't > quorum prevent h01 from doing anything with DLM and OCFS2 anyway? > Not sure I understand you correctly. By d

Re: [ClusterLabs] Antw: Re: Antw: Re: OCFS2 on cLVM with node waiting for fencing timeout

2016-10-13 Thread Eric Ren
Hi, On 10/13/2016 04:36 PM, Ulrich Windl wrote: Eric Ren wrote on 13.10.2016 at 09:48 in message <73f764d0-75e7-122f-ff4e-d0b27dbdd...@suse.com>: [...] When assuming node h01 still lived when communication failed, wouldn't quorum prevent h01 from doing anything with DLM and OCFS2 anyway?

Re: [ClusterLabs] Antw: Re: OCFS2 on cLVM with node waiting for fencing timeout

2016-10-13 Thread Eric Ren
Hi, On 10/13/2016 04:27 PM, Ulrich Windl wrote: So I'm wondering why it takes so long to finish the fencing process? As I wrote: Using SBD this is paranoia (as fencing doesn't report back a status like "completed" or "failed"). Actually the fencing only needs a few seconds, but the timeout is

[ClusterLabs] Unexpected Resource movement after failover

2016-10-13 Thread Nikhil Utane
Hi, I have 5 nodes and 4 resources configured. I have configured constraints such that no two resources can be co-located. I brought down a node (which happened to be DC). I was expecting the resource on the failed node would be migrated to the 5th waiting node (that is not running any resource). H

Re: [ClusterLabs] Unexpected Resource movement after failover

2016-10-13 Thread Nikhil Utane
Additional info, -Nikhil On Thu, Oct 13, 2016 at 7:29 PM, Nikhil Utane wrote: > Hi, > > I have 5 nodes and 4 resources configured. > I have configured constraint such that no two resources can be co-located. > I brought down a node (which happened to be DC). I was expecting the > resource on

[ClusterLabs] Antw: Unexpected Resource movement after failover

2016-10-13 Thread Ulrich Windl
Hi! Don't you need 10 constraints, excluding every possible pair of your 5 resources (named A-E here), like in this table (produced with R): [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [1,] "A" "A" "A" "A" "B" "B" "B" "C" "C" "D" [2,] "B" "C" "D" "E" "C" "D" "E" "D"

Re: [ClusterLabs] Unexpected Resource movement after failover

2016-10-13 Thread Andrei Borzenkov
On Thu, Oct 13, 2016 at 4:59 PM, Nikhil Utane wrote: > Hi, > > I have 5 nodes and 4 resources configured. > I have configured constraint such that no two resources can be co-located. > I brought down a node (which happened to be DC). I was expecting the > resource on the failed node would be migra

Re: [ClusterLabs] Antw: Unexpected Resource movement after failover

2016-10-13 Thread Nikhil Utane
Ulrich, I have 4 resources only (not 5, nodes are 5). So then I only need 6 constraints, right?
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,] "A"  "A"  "A"  "B"  "B"  "C"
[2,] "B"  "C"  "D"  "C"  "D"  "D"
I understand that if I configure constraint of R1 with R2 score as -infinity, th
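The constraint counts discussed in this thread (10 for five resources, 6 for four) are just the number of unordered pairs. A minimal sketch that lists them, assuming one anti-colocation constraint per pair; the helper name `anti_colocation_pairs` and the resource names A-E are illustrative only and do not come from any Pacemaker tool:

```python
from itertools import combinations


def anti_colocation_pairs(resources):
    """Return every unordered pair of resources (hypothetical helper).

    Each pair stands for one "X not with Y" constraint; this only
    enumerates the pairs, it does not talk to a cluster.
    """
    return list(combinations(resources, 2))


# Four resources, as in Nikhil's setup.
pairs_4 = anti_colocation_pairs(["A", "B", "C", "D"])
# Five resources, as in Ulrich's table.
pairs_5 = anti_colocation_pairs(["A", "B", "C", "D", "E"])

print(len(pairs_4), pairs_4)
print(len(pairs_5))
```

The pairs come out in the same column order as the R tables quoted above (A-B, A-C, A-D, B-C, ...).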

Re: [ClusterLabs] Antw: Re: Antw: Re: OCFS2 on cLVM with node waiting for fencing timeout

2016-10-13 Thread Ken Gaillot
On 10/13/2016 03:36 AM, Ulrich Windl wrote: > That's what I'm talking about: If 1 of 3 nodes is rebooting (or the cluster > is split-brain 1:2), the single node CANNOT continue due to lack of quorum, > while the remaining two nodes can. Is it still necessary to wait for > completion of stonith?
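The 1:2 split described above can be checked with a simple strict-majority rule. This is only a sketch, assuming one vote per node and no special quorum options; `has_quorum` is a hypothetical helper, not a corosync or Pacemaker API, and it says nothing about whether fencing still has to complete:

```python
def has_quorum(partition_votes, total_votes):
    """Strict majority: a partition needs more than half of all votes.

    Simplifying assumption: one vote per node, no tie-breakers.
    """
    return partition_votes > total_votes // 2


# Three-node cluster split 1:2.
single_node = has_quorum(1, 3)
two_nodes = has_quorum(2, 3)
print(single_node, two_nodes)
```

Under this rule the isolated node loses quorum while the two-node side keeps it, which is the situation the question is about.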

Re: [ClusterLabs] Unexpected Resource movement after failover

2016-10-13 Thread Nikhil Utane
Andrei, *"It would help if you told which node and which resources, so your configuration could be interpreted in context. "* Any resource can run on any node as long as it is not running any other resource. *"so "a not with b" does not imply "b not with a". So first pacemaker decided where to p

[ClusterLabs] Replicated PGSQL woes

2016-10-13 Thread Israel Brewster
Summary: Two-node cluster setup with latest pgsql resource agent. Postgresql starts initially, but failover never happens. Details: I'm trying to get a cluster set up with Postgresql 9.6 in a streaming replication using named slots scenario. I'm using the latest pgsql Resource Agent, which does appea

Re: [ClusterLabs] Replicated PGSQL woes

2016-10-13 Thread Ken Gaillot
On 10/13/2016 12:04 PM, Israel Brewster wrote: > Summary: Two-node cluster setup with latest pgsql resource agent. > Postgresql starts initially, but failover never happens. > > Details: > > I'm trying to get a cluster set up with Postgresql 9.6 in a streaming > replication using named slots scen

Re: [ClusterLabs] Replicated PGSQL woes

2016-10-13 Thread Israel Brewster
On Oct 13, 2016, at 9:41 AM, Ken Gaillot wrote: > > On 10/13/2016 12:04 PM, Israel Brewster wrote: >> Summary: Two-node cluster setup with latest pgsql resource agent. >> Postgresql starts initially, but failover never happens. >> >> Details: >> >> I'm trying to get a cluster set up with Postgr

Re: [ClusterLabs] cross DC cluster using public ip?

2016-10-13 Thread neeraj ch
Hello, Thank you for taking the time to respond. In my setup the public IP is not on the box; the box is attached to a private network, and packets to the public IP are, I think, just forwarded to the private IP. When I tried using the local private address as the bind address, public address as

Re: [ClusterLabs] cross DC cluster using public ip?

2016-10-13 Thread Les Green
Corosync does not work with NAT. At least I tried for AGES and could not get it to. Easiest is to set up a VPN between the sites or servers for just the corosync traffic. On 13.10.2016 22:14, neeraj ch wrote: > Hello > > Thank you for taking the time to respond. > > In my setup the public IP

Re: [ClusterLabs] Replicated PGSQL woes

2016-10-13 Thread Jehan-Guillaume de Rorthais
On Thu, 13 Oct 2016 10:05:33 -0800 Israel Brewster wrote: > On Oct 13, 2016, at 9:41 AM, Ken Gaillot wrote: > > > > On 10/13/2016 12:04 PM, Israel Brewster wrote: [...] > >> But whatever- this is a cluster, it doesn't really matter which node > >> things are running on, as long as they are

Re: [ClusterLabs] Replicated PGSQL woes

2016-10-13 Thread Israel Brewster
On Oct 13, 2016, at 1:56 PM, Jehan-Guillaume de Rorthais wrote: > > On Thu, 13 Oct 2016 10:05:33 -0800 > Israel Brewster wrote: > >> On Oct 13, 2016, at 9:41 AM, Ken Gaillot wrote: >>> >>> On 10/13/2016 12:04 PM, Israel Brewster wrote: > [...] > But whatever- this is a cluster, it do