Re: [ClusterLabs] Pacemaker PostgreSQL cluster

2018-05-29 Thread Ken Gaillot
On Tue, 2018-05-29 at 22:25 +0200, Salvatore D'angelo wrote:
> Hi,
> 
> Regarding last question about pacemaker dependencies for Ubuntu I
> found this for 1.1.18:
> https://launchpad.net/ubuntu/+source/pacemaker/1.1.18-0ubuntu2/+build/14818856
> 
> It’s not clear to me why pacemaker 1.1.18 is available on
> launchpad.net and not on the official Ubuntu Search Packages website.
> However, can I assume 1.1.19 and 2.0.0 have the same dependencies
> list (considering they have only removed deprecated functions and
> applied some bug fixes)?

Yes, the dependencies should be the same (when corosync 2 is used).
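
For reference, a quick way to double-check on a given Ubuntu release is the standard apt tooling (a sketch; the build-dependency query assumes deb-src entries are configured):

# runtime dependencies of the available binary packages
apt-cache depends pacemaker
apt-cache depends corosync

# build dependencies recorded in the source package
apt-cache showsrc pacemaker | grep -i '^Build-Depends'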

> Thanks again for answers
> 
> 
> > On 29 May 2018, at 17:41, Jehan-Guillaume de Rorthais wrote:
> > 
> > On Tue, 29 May 2018 14:23:31 +0200
> > Salvatore D'angelo  wrote:
> > ...
> > > 2. I read some documentation about upgrade and since we want 0 ms
> > > downtime I
> > > think the Rolling Upgrade (node by node) is the better approach.
> > 
> > The 0ms upgrade is almost impossible. At some point, you will have
> > to move the
> > master somewhere else.
> > 
> > Unless you have some session management that is able to wait for the
> > current sessions to finish, then hold the incoming sessions while you
> > are moving the master, you will have downtime and/or xact rollback.
> > 
> > Good luck anyway :)
> > 
> > -- 
> > Jehan-Guillaume de Rorthais
> > Dalibo
> 
-- 
Ken Gaillot 


Re: [ClusterLabs] PAF not starting resource successfully after node reboot (was: How to set up fencing/stonith)

2018-05-29 Thread Casey & Gina
> On May 27, 2018, at 2:28 PM, Ken Gaillot  wrote:
> 
>> May 22 23:57:24 [2196] d-gp2-dbpg0-2 pengine: info:
>> determine_op_status: Operation monitor found resource postgresql-10-
>> main:2 active on d-gp2-dbpg0-2
> 
>> May 22 23:57:24 [2196] d-gp2-dbpg0-2 pengine:   notice:
>> LogActions:  Demote  postgresql-10-main:1(Master -> Slave d-gp2-
>> dbpg0-1)
>> May 22 23:57:24 [2196] d-gp2-dbpg0-2 pengine:   notice:
>> LogActions:  Recover postgresql-10-main:1(Master d-gp2-dbpg0-1)
> 
> From the above, we can see that the initial probe after the node
> rejoined found that the resource was already running in master mode
> there (at least, that's what the agent thinks). So, the cluster wants
> to demote it, stop it, and start it again as a slave.

Are you sure you're reading the above correctly?  The first line you quoted 
says the resource is already active on node 2, which is not the node that was 
restarted, and is the node that took over as master after I powered node 1 off.

Anyway, I enabled debug logging in corosync.conf, and I now see the following
information:

May 29 20:59:28 [10583] d-gp2-dbpg0-2 crm_resource:    debug: determine_op_status:  postgresql-10-main_monitor_0 on d-gp2-dbpg0-1 returned 'master (failed)' (9) instead of the expected value: 'not running' (7)
May 29 20:59:28 [10583] d-gp2-dbpg0-2 crm_resource:  warning: unpack_rsc_op_failure:  Processing failed op monitor for postgresql-10-main:1 on d-gp2-dbpg0-1: master (failed) (9)
May 29 20:59:28 [10583] d-gp2-dbpg0-2 crm_resource:    debug: determine_op_status:  postgresql-10-main_monitor_0 on d-gp2-dbpg0-1 returned 'master (failed)' (9) instead of the expected value: 'not running' (7)
May 29 20:59:28 [10583] d-gp2-dbpg0-2 crm_resource:  warning: unpack_rsc_op_failure:  Processing failed op monitor for postgresql-10-main:1 on d-gp2-dbpg0-1: master (failed) (9)

I'm not sure why these lines appear twice (same question I've had in the past 
about some log messages), but it seems that whatever it's doing to check the 
status of the resource, it is correctly determining that PostgreSQL failed 
while in master state, rather than being shut down cleanly.  Why this results 
in the node being fenced is beyond me.
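
If it helps, one guess at what the agent is looking at is PostgreSQL's own control data, which survives a hard reboot. A quick check by hand (paths from my configuration below; whether PAF actually maps this to 'master (failed)' is a guess on my part):

sudo -u postgres /usr/lib/postgresql/10/bin/pg_controldata /var/lib/postgresql/10/main \
  | grep 'Database cluster state'
# "in production" with no postgres process running would be the crashed-primary case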

I don't feel that I'm trying to do anything complex - just have a simple 
cluster that handles PostgreSQL failover.  I'm not trying to do anything fancy 
and am pretty much following the PAF docs, plus the addition of the fencing 
resource (which it says it requires to work properly - if this is "properly" I 
don't understand what goal it is trying to achieve...).  I'm getting really 
frustrated with pacemaker as I've been fighting hard to try to get it working 
for two months now and still feel in the dark about why it's behaving the way 
it is.  I'm sorry if I seem like an idiot...this definitely makes me feel like 
one...


Here is my configuration again, in case it helps:

Cluster Name: d-gp2-dbpg0
Corosync Nodes:
 d-gp2-dbpg0-1 d-gp2-dbpg0-2 d-gp2-dbpg0-3
Pacemaker Nodes:
 d-gp2-dbpg0-1 d-gp2-dbpg0-2 d-gp2-dbpg0-3

Resources:
 Resource: postgresql-master-vip (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=10.124.164.250 cidr_netmask=22
  Operations: start interval=0s timeout=20s (postgresql-master-vip-start-interval-0s)
              stop interval=0s timeout=20s (postgresql-master-vip-stop-interval-0s)
              monitor interval=10s (postgresql-master-vip-monitor-interval-10s)
 Master: postgresql-ha
  Meta Attrs: notify=true
  Resource: postgresql-10-main (class=ocf provider=heartbeat type=pgsqlms)
   Attributes: bindir=/usr/lib/postgresql/10/bin pgdata=/var/lib/postgresql/10/main pghost=/var/run/postgresql pgport=5432 recovery_template=/etc/postgresql/10/main/recovery.conf start_opts="-c config_file=/etc/postgresql/10/main/postgresql.conf"
   Operations: start interval=0s timeout=60s (postgresql-10-main-start-interval-0s)
               stop interval=0s timeout=60s (postgresql-10-main-stop-interval-0s)
               promote interval=0s timeout=30s (postgresql-10-main-promote-interval-0s)
               demote interval=0s timeout=120s (postgresql-10-main-demote-interval-0s)
               monitor interval=15s role=Master timeout=10s (postgresql-10-main-monitor-interval-15s)
               monitor interval=16s role=Slave timeout=10s (postgresql-10-main-monitor-interval-16s)
               notify interval=0s timeout=60s (postgresql-10-main-notify-interval-0s)

Stonith Devices:
 Resource: vfencing (class=stonith type=external/vcenter)
  Attributes: VI_SERVER=10.124.137.100 VI_CREDSTORE=/etc/pacemaker/vicredentials.xml HOSTLIST=d-gp2-dbpg0-1;d-gp2-dbpg0-2;d-gp2-dbpg0-3 RESETPOWERON=1
  Operations: monitor interval=60s (vfencing-monitor-60s)
Fencing Levels:

Location Constraints:
Ordering Constraints:
  promote postgresql-ha then start postgresql-master-vip (kind:Mandatory) (non-symmetrical) (id:order-postgresql-ha-postgresql-master-vip-Mandatory)
  demote 

Re: [ClusterLabs] Live migrate a VM in a cluster group

2018-05-29 Thread Jason Gauthier
On Tue, May 29, 2018 at 11:46 AM, Ken Gaillot  wrote:
> On Tue, 2018-05-29 at 10:14 -0500, Ken Gaillot wrote:
>> On Sun, 2018-05-27 at 22:50 -0400, Jason Gauthier wrote:
>> > Greetings,
>> >
>> >  I've set up a cluster intended for VMs.  I created a VM, and have
>> > been pretty pleased with migrating it back and forth between the
>> > two
>> > nodes.  Now, I am also using the VNC console, which requires
>> > listening
>> > on port 59xx.  Of course, once the machine is migrated the IP to
>> > access the VNC console is different.
>> > So, I thought I would be clever and create a cluster IP address.  I
>> > did, and as a stand alone resource it migrates between nodes
>> > perfectly.  I then put these two primitives into a group.
>> >
>> > When I try to migrate the group nothing happens.
>> > There aren't any cluster errors, and the logs do not give me any
>> > kind
>> > of indication of error either.
>> >
>> > I'm wondering if this is even something I should expect to work. I
>> > would certainly like it to.
>> > Here's the relevant config:
>> >
>> > primitive p_Calibre VirtualDomain \
>> > params config="/vms/Calibre/Calibre.xml"
>> > hypervisor="qemu:///system" migration_transport=ssh \
>> > meta allow-migrate=true \
>> > op start timeout=120s interval=0 \
>> > op stop timeout=120s interval=0 \
>> > op monitor timeout=30 interval=10 depth=0 \
>> > utilization cpu=2 hv_memory=8192
>> > primitive p_CalibreVNC IPaddr2 \
>> > params ip=192.168.10.10 cidr_netmask=24 nic=br10 \
>> > op monitor interval=10s
>> >
>> > group g_Calibre p_CalibreVNC p_Calibre \
>> > meta target-role=Started
>> >
>> > location cli-prefer-g_Calibre g_Calibre role=Started inf: alpha
>> > location cli-prefer-p_Calibre p_Calibre role=Started inf: beta
>>
>> The group prefers alpha, but its p_Calibre member prefers beta.
>> You're
>> confusing the cluster :)
>>
>> Any constraints starting with "cli-" were added by command-line tools
>> doing move/ban/etc. They stay in effect until they are manually
>> removed
>> (the same tool will generally have a "clear" option).
>
> Oh, and IPaddr2 can't live-migrate, so putting it in a group (or
> colocation+ordering constraint) with p_Calibre will make the entire
> group unable to live-migrate.
>
> What exact relationships do you really need?
>
> Does the IP need to be up before the VM starts in order for VNC to
> work? If so, can it even work with a live migration (the IP can't be up
> on both nodes at once)?
>
> If the IP doesn't need to be up first, you could simply colocate the IP
> with the VM, without any ordering constraints. Whenever the VM moves,
> the IP will follow; the VM would live-migrate, and the IP would
> stop+start.
>
> If the IP does need to be up first, I'm thinking you could create a
> group with p_CalibreVNC and an ocf:pacemaker:attribute resource (which
> sets a node attribute when it is running), and then use a rule-based
> constraint to colocate+order p_Calibre relative to the node attribute.
> I'd make the colocation non-infinite and the order optional, so the VM
> can stay up even if the IP fails or begins to move.
>
> You would trigger a migration by moving the group. It would proceed
> like this:
> - The IP, attribute resource, node attribute, and VM start out on node
> 1
> - The attribute resource stops on node 1, removing the node attribute
> from node 1 (this is where making the constraints optional comes in
> handy)
> - The IP resource stops on node 1
> - The IP resource starts on node 2
> - The attribute resource starts on node 2, adding the node attribute to
> node 2
> - The VM live-migrates to node 2
>
> That does leave a window where the IP and VM are on different nodes,
> but that sounds acceptable in your setup in order to preserve live
> migration.
>

You and Andrei were correct.  The constraints were causing an issue.  I
am able to move them as a unit, but as you said, the results are
unpleasant.  To achieve what I really want, I will need to execute the
plan you mentioned above.

I am not sure how to do that, but your steps are perfect.  I do not
care about the IP going up and down since it's just console
management.
But I do want the machine to stay up and be live migrated. So, I will
look into the process you mentioned.
It does seem that the IP needs to be up first, or the listener can't
bind, and qemu fails.

Thanks for the advice. I have some more learning to do.



>> > location cli-prefer-p_CalibreVNC p_CalibreVNC role=Started inf:
>> > beta
>> >
>> > Any help is appreciated.
> --
> Ken Gaillot 

Re: [ClusterLabs] Pacemaker PostgreSQL cluster

2018-05-29 Thread Salvatore D'angelo
Hi,

Regarding the last question about pacemaker dependencies for Ubuntu, I found
this for 1.1.18:
https://launchpad.net/ubuntu/+source/pacemaker/1.1.18-0ubuntu2/+build/14818856 


It’s not clear to me why pacemaker 1.1.18 is available on launchpad.net and not
on the official Ubuntu Search Packages website.
However, can I assume 1.1.19 and 2.0.0 have the same dependencies list
(considering they have only removed deprecated functions and applied some bug
fixes)?
Thanks again for answers


> On 29 May 2018, at 17:41, Jehan-Guillaume de Rorthais  wrote:
> 
> On Tue, 29 May 2018 14:23:31 +0200
> Salvatore D'angelo  wrote:
> ...
>> 2. I read some documentation about upgrade and since we want 0 ms downtime I
>> think the Rolling Upgrade (node by node) is the better approach.
> 
> The 0ms upgrade is almost impossible. At some point, you will have to move the
> master somewhere else.
> 
> Unless you have some session management that is able to wait for the
> current sessions to finish, then hold the incoming sessions while you are
> moving the master, you will have downtime and/or xact rollback.
> 
> Good luck anyway :)
> 
> -- 
> Jehan-Guillaume de Rorthais
> Dalibo



Re: [ClusterLabs] PAF not starting resource successfully after node reboot (was: How to set up fencing/stonith)

2018-05-29 Thread Casey & Gina
> On May 27, 2018, at 2:28 PM, Ken Gaillot  wrote:
> 
> Pacemaker isn't fencing because the start failed, at least not
> directly:
> 
>> May 22 23:57:24 [2196] d-gp2-dbpg0-2 pengine: info:
>> determine_op_status: Operation monitor found resource postgresql-10-
>> main:2 active on d-gp2-dbpg0-2
> 
>> May 22 23:57:24 [2196] d-gp2-dbpg0-2 pengine:   notice:
>> LogActions:  Demote  postgresql-10-main:1(Master -> Slave d-gp2-
>> dbpg0-1)
>> May 22 23:57:24 [2196] d-gp2-dbpg0-2 pengine:   notice:
>> LogActions:  Recover postgresql-10-main:1(Master d-gp2-dbpg0-1)
> 
> From the above, we can see that the initial probe after the node
> rejoined found that the resource was already running in master mode
> there (at least, that's what the agent thinks). So, the cluster wants
> to demote it, stop it, and start it again as a slave.

Well, it was running in master mode prior to being power-cycled.  However my
understanding was that PAF always tries to initially start PostgreSQL in 
standby mode.  There would be no reason for it to promote node 1 to master 
since node 2 has already taken over the master role, and there is no location 
constraint set that would cause it to try to move this role back to node 1 
after it rejoins the cluster.

Jehan-Guillaume wrote:  "on resource start, PAF will create the 
"PGDATA/recovery.conf" file based on your template anyway. No need to create it
yourself.".  The recovery.conf file being present upon PostgreSQL startup is 
what makes it start in standby mode.
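
For reference, a typical PAF-style recovery.conf template looks roughly like this (illustrative values only, using the master VIP and a node name from my configuration, not a verbatim copy of my file):

standby_mode = on
primary_conninfo = 'host=10.124.164.250 application_name=d-gp2-dbpg0-1'
recovery_target_timeline = 'latest'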

Since no new log output is ever written to the PostgreSQL log file, it does not 
seem that it's ever actually doing anything to try to start the resource.  The 
recovery.conf doesn't get copied in, and no new data appears in the PostgreSQL 
log.  As far as I can tell, nothing ever happens on the rejoined node at all, 
before it gets fenced.

How can I tell what the resource agent is trying to do behind the scenes?  Is 
there a way that I can see what command(s) it is trying to run, so that I may 
try them manually?
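
One thing I may try (a sketch, not something taken from the PAF docs): ask Pacemaker to run the agent's monitor directly, or export the resource parameters from the configuration posted earlier and call the agent by hand. The exact environment the agent expects may differ, so treat this as a starting point:

# one-off monitor of the resource on this node, with verbose output
crm_resource --resource postgresql-10-main --force-check -V

# or invoke the OCF agent directly with the parameters defined in the CIB
export OCF_ROOT=/usr/lib/ocf
export OCF_RESOURCE_INSTANCE=postgresql-10-main
export OCF_RESKEY_bindir=/usr/lib/postgresql/10/bin
export OCF_RESKEY_pgdata=/var/lib/postgresql/10/main
export OCF_RESKEY_pghost=/var/run/postgresql
export OCF_RESKEY_pgport=5432
export OCF_RESKEY_recovery_template=/etc/postgresql/10/main/recovery.conf
/usr/lib/ocf/resource.d/heartbeat/pgsqlms monitor; echo "rc=$?"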

> But the demote failed

I reckon that it probably couldn't demote what was never started.

> But the stop fails too

I guess that it can't stop what is already stopped?  Although, I'm surprised 
that it would error in this case, instead of just realizing that it was already 
stopped...

> 
>> May 22 23:57:24 [2196] d-gp2-dbpg0-2 pengine:  warning:
>> pe_fence_node:   Node d-gp2-dbpg0-1 will be fenced because of
>> resource failure(s)
> 
> which is why the cluster then wants to fence the node. (If a resource
> won't stop, the only way to recover it is to kill the entire node.)

But the resource is *never started*!?  There is never any postgres process 
running, and nothing appears in the PostgreSQL log file.  I'm really confused 
as to why pacemaker thinks it needs to fence something that's never running at 
all...  I guess what I need is to somehow figure out what the resource agent is 
doing that makes it think the resource is already active; is there a way to do 
this?

It would be really helpful if, somewhere within this verbose logging, there
were an indication of what commands were actually being run to monitor, start,
stop, etc., as it seems like a black box.

I'm wondering if some stale PID file is getting left around after the hard 
reboot, and that is what the resource agent is checking instead of the actual 
running status, but I would hope that the resource agent would be smarter than 
that.

Thanks,
-- 
Casey


Re: [ClusterLabs] Live migrate a VM in a cluster group

2018-05-29 Thread Andrei Borzenkov
On 28.05.2018 14:44, Jason Gauthier wrote:
> On Mon, May 28, 2018 at 12:03 AM, Andrei Borzenkov wrote:
>> On 28.05.2018 05:50, Jason Gauthier wrote:
>>> Greetings,
>>>
>>>  I've set up a cluster intended for VMs.  I created a VM, and have
>>> been pretty pleased with migrating it back and forth between the two
>>> nodes.  Now, I am also using the VNC console, which requires listening
>>> on port 59xx.  Of course, once the machine is migrated the IP to
>>> access the VNC console is different.
>>> So, I thought I would be clever and create a cluster IP address.  I
>>> did, and as a stand alone resource it migrates between nodes
>>> perfectly.  I then put these two primitives into a group.
>>>
>>> When I try to migrate the group nothing happens.
>>> There aren't any cluster errors, and the logs do not give me any kind
>>> of indication of error either.
>>>
>>
>> "migrate" is ambiguous here - it may mean moving resource or
>> live-migrating VM. Show exact command(s) you use and logs related to
>> theses commands.
>>
> Sure, okay.  The command I am using is this:
> # crm resource migrate g_Calibre alpha
> 

OK, that is "resource move" which simply creates constraint and relies
on policy engine to compute new resource placement. It does *not* remove
any constraint created previously by "crm resource move" with another
node as argument.
...
> May 28 07:42:32 [5086] alpha pengine: info:
> determine_op_status: Operation monitor found resource p_CalibreVNC
> active on beta
> May 28 07:42:32 [5086] alpha pengine: info:
> determine_op_status: Operation monitor found resource p_Calibre active
> on beta

Currently both resources are active on beta.

...
> May 28 07:42:32 [5086] alpha pengine: info: LogActions:  Leave
>  p_CalibreVNC(Started beta)
> May 28 07:42:32 [5086] alpha pengine: info: LogActions:  Leave
>  p_Calibre   (Started beta)

And the policy engine decides to leave them there.
...

>>>
>>> group g_Calibre p_CalibreVNC p_Calibre \
>>> meta target-role=Started
>>>
>>> location cli-prefer-g_Calibre g_Calibre role=Started inf: alpha
>>> location cli-prefer-p_Calibre p_Calibre role=Started inf: beta
>>> location cli-prefer-p_CalibreVNC p_CalibreVNC role=Started inf: beta
>>>

You have mutually contradictory constraints. Apparently the one for
"beta" is the result of a previous "crm resource move p_Calibre(VNC) beta"
invocation. At this point pacemaker is actually free to select any node.
I suppose there is some implementation-defined behavior, but
fundamentally it is garbage in, garbage out. It decided to leave the
resources where they are.

After any "crm resource move|migrate|ban" you *MUST* remove constraints
using e.g. "crm resource clear". You can set constraint lifetime so it
is done automatically.
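
For example, with the resources from this thread (a sketch; resource names taken from the earlier posts, lifetime value arbitrary):

# move the group and let the implicit location constraint expire after one hour
crm resource migrate g_Calibre alpha PT1H

# or clean up the leftover cli-prefer-* constraints from earlier moves
crm resource unmigrate g_Calibre
crm resource unmigrate p_Calibre
crm resource unmigrate p_CalibreVNC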


Re: [ClusterLabs] Live migrate a VM in a cluster group

2018-05-29 Thread Ken Gaillot
On Tue, 2018-05-29 at 10:14 -0500, Ken Gaillot wrote:
> On Sun, 2018-05-27 at 22:50 -0400, Jason Gauthier wrote:
> > Greetings,
> > 
> >  I've set up a cluster intended for VMs.  I created a VM, and have
> > been pretty pleased with migrating it back and forth between the
> > two
> > nodes.  Now, I am also using the VNC console, which requires
> > listening
> > on port 59xx.  Of course, once the machine is migrated the IP to
> > access the VNC console is different.
> > So, I thought I would be clever and create a cluster IP address.  I
> > did, and as a stand alone resource it migrates between nodes
> > perfectly.  I then put these two primitives into a group.
> > 
> > When I try to migrate the group nothing happens.
> > There aren't any cluster errors, and the logs do not give me any
> > kind
> > of indication of error either.
> > 
> > I'm wondering if this is even something I should expect to work. I
> > would certainly like it to.
> > Here's the relevant config:
> > 
> > primitive p_Calibre VirtualDomain \
> > params config="/vms/Calibre/Calibre.xml"
> > hypervisor="qemu:///system" migration_transport=ssh \
> > meta allow-migrate=true \
> > op start timeout=120s interval=0 \
> > op stop timeout=120s interval=0 \
> > op monitor timeout=30 interval=10 depth=0 \
> > utilization cpu=2 hv_memory=8192
> > primitive p_CalibreVNC IPaddr2 \
> > params ip=192.168.10.10 cidr_netmask=24 nic=br10 \
> > op monitor interval=10s
> > 
> > group g_Calibre p_CalibreVNC p_Calibre \
> > meta target-role=Started
> > 
> > location cli-prefer-g_Calibre g_Calibre role=Started inf: alpha
> > location cli-prefer-p_Calibre p_Calibre role=Started inf: beta
> 
> The group prefers alpha, but its p_Calibre member prefers beta.
> You're
> confusing the cluster :)
> 
> Any constraints starting with "cli-" were added by command-line tools
> doing move/ban/etc. They stay in effect until they are manually
> removed
> (the same tool will generally have a "clear" option).

Oh, and IPaddr2 can't live-migrate, so putting it in a group (or
colocation+ordering constraint) with p_Calibre will make the entire
group unable to live-migrate.

What exact relationships do you really need?

Does the IP need to be up before the VM starts in order for VNC to
work? If so, can it even work with a live migration (the IP can't be up
on both nodes at once)?

If the IP doesn't need to be up first, you could simply colocate the IP
with the VM, without any ordering constraints. Whenever the VM moves,
the IP will follow; the VM would live-migrate, and the IP would
stop+start.
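
In crm shell that simpler variant is just a colocation (a sketch using the names from your config; it assumes the g_Calibre group is dissolved first so the VM stands alone):

colocation col_CalibreVNC_with_VM inf: p_CalibreVNC p_Calibre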

If the IP does need to be up first, I'm thinking you could create a
group with p_CalibreVNC and an ocf:pacemaker:attribute resource (which
sets a node attribute when it is running), and then use a rule-based
constraint to colocate+order p_Calibre relative to the node attribute.
I'd make the colocation non-infinite and the order optional, so the VM
can stay up even if the IP fails or begins to move.

You would trigger a migration by moving the group. It would proceed
like this:
- The IP, attribute resource, node attribute, and VM start out on node
1
- The attribute resource stops on node 1, removing the node attribute
from node 1 (this is where making the constraints optional comes in
handy)
- The IP resource stops on node 1
- The IP resource starts on node 2
- The attribute resource starts on node 2, adding the node attribute to
node 2
- The VM live-migrates to node 2

That does leave a window where the IP and VM are on different nodes,
but that sounds acceptable in your setup in order to preserve live
migration.
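
A rough crm-shell sketch of that arrangement (the resource id, attribute name, and scores are assumptions on my part, not tested configuration; it also assumes the existing g_Calibre group is removed so p_Calibre stands alone):

primitive p_CalibreVNCUp ocf:pacemaker:attribute \
        params name=calibre-vnc-up active_value=1 inactive_value=0 \
        op monitor interval=10s
group g_CalibreVNC p_CalibreVNC p_CalibreVNCUp
# prefer (but don't force) running the VM where the attribute is set
location l_Calibre_with_vnc p_Calibre \
        rule 100: calibre-vnc-up eq 1
# optional ordering, so the VM isn't restarted just because the IP moves
order o_vnc_before_Calibre Optional: g_CalibreVNC p_Calibre

The finite location score and Optional ordering are what keep the VM free to live-migrate even if the IP fails or is being moved.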

> > location cli-prefer-p_CalibreVNC p_CalibreVNC role=Started inf:
> > beta
> > 
> > Any help is appreciated.
-- 
Ken Gaillot 


Re: [ClusterLabs] Pacemaker PostgreSQL cluster

2018-05-29 Thread Jehan-Guillaume de Rorthais
On Tue, 29 May 2018 14:23:31 +0200
Salvatore D'angelo  wrote:
...
> 2. I read some documentation about upgrade and since we want 0 ms downtime I
> think the Rolling Upgrade (node by node) is the better approach.

The 0ms upgrade is almost impossible. At some point, you will have to move the
master somewhere else.

Unless you have some session management that is able to wait for the
current sessions to finish, then hold the incoming sessions while you are
moving the master, you will have downtime and/or xact rollback.

Good luck anyway :)

-- 
Jehan-Guillaume de Rorthais
Dalibo


Re: [ClusterLabs] Pacemaker PostgreSQL cluster

2018-05-29 Thread Ken Gaillot
On Tue, 2018-05-29 at 14:23 +0200, Salvatore D'angelo wrote:
> Hi All,
> 
> I am new to this list. I am working on a project that uses a cluster
> composed of 3 nodes (with Ubuntu 14.04 trusty) on which we run
> PostgreSQL managed as Master/slaves.
> We use Pacemaker/Corosync to manage this cluster. In addition, we
> have a two-node GlusterFS where we store backups and WAL files.
> Currently the versions of our components are quite old, we have:
> Pacemaker 1.1.14
> Corosync 2.3.5
> 
> and we want to move to a new version of Pacemaker but I have some
> doubts.
> 
> 1. I noticed there is a 2.0.0 candidate release, so it could be
> convenient for us to move to this release. When will the final
> release be published? Is it more convenient to move to 2.0.0 or 1.1.18?

2.0.0 will hopefully be out in the next couple of weeks.

1.1.19 will be released shortly after that, containing bug fixes from
2.0.0 backported to the 1.1 line. Since there were some regressions in
1.1.18, I'd use 1.1.17 or wait for 1.1.19, if staying with the 1.1
line.

The main goal of 2.0 is to remove deprecated functionality, so it
should not make a big difference in your case which one you choose.

> 2. I read some documentation about upgrade and since we want 0 ms
> downtime I think the Rolling Upgrade (node by node) is the better
> approach. We migrate a node and in the meantime the other two nodes
> are still active. The problem is that I do not know if I can have a
> mix of 1.1.14 and 1.1.18 (or 2.0.0) nodes. The documentation does not
> clarify it or at least it was not clear to me. Is this possible?
> http://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ap-upgrade.html
> https://wiki.clusterlabs.org/wiki/Upgrade

Rolling upgrades are always supported within the same major number line
(i.e. 1.anything to 1.anything). With the major number change, rolling
upgrades will not always be supported. In the case of 2.0.0, we are
supporting rolling upgrades from 1.1.11 or later on top of corosync 2
or later. You should be fine whichever you choose.

You do want to keep a rolling upgrade window as short as practical.
Besides avoiding potential bugs in an inherently difficult to test
setup (i.e. we can't test all possible combinations of rolling
upgrades), once an older node in a mixed-version cluster is stopped, it
cannot rejoin the cluster until it is upgraded.
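
As a very rough per-node sketch on Ubuntu 14.04 (crmsh and sysvinit-style service commands assumed; where the upgraded packages come from is up to you):

crm node standby          # move resources off the local node
service pacemaker stop
service corosync stop
# install the upgraded pacemaker/corosync packages here
service corosync start
service pacemaker start
crm node online           # let the node rejoin and host resources again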

> 3. I need to upgrade pacemaker/corosync on Ubuntu 14.04. I noticed
> for 1.1.18 there are Ubuntu packages available. What about 2.0.0? Is
> it possible to create Ubuntu packages in some way?

Debian will eventually pick up 2.0.0 in one of its releases, and then
Ubuntu will take it from there.

It's not too hard to build it from source yourself, but follow a
Debian-specific guide because there are differences from the vanilla
upstream release.
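
A hedged sketch of the Debian-style rebuild workflow, assuming a source package for the version you want is reachable (e.g. via a PPA and a deb-src line):

apt-get source pacemaker        # fetch the packaging plus upstream source
apt-get build-dep pacemaker     # install its build dependencies
cd pacemaker-*/
dpkg-buildpackage -us -uc       # produces installable .deb files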

Using Ubuntu's 1.1.18 is probably the easiest and best way for you to
go -- I'm guessing the regression fixes were already backported into
those packages.

> 4. Where can I find the list of (Ubuntu) dependencies required by
> pacemaker/corosync for 1.1.18 and 2.0.0?
> 
> Thanks in advance for your help.
> 
-- 
Ken Gaillot 


Re: [ClusterLabs] Live migrate a VM in a cluster group

2018-05-29 Thread Ken Gaillot
On Sun, 2018-05-27 at 22:50 -0400, Jason Gauthier wrote:
> Greetings,
> 
>  I've set up a cluster intended for VMs.  I created a VM, and have
> been pretty pleased with migrating it back and forth between the two
> nodes.  Now, I am also using the VNC console, which requires
> listening
> on port 59xx.  Of course, once the machine is migrated the IP to
> access the VNC console is different.
> So, I thought I would be clever and create a cluster IP address.  I
> did, and as a stand alone resource it migrates between nodes
> perfectly.  I then put these two primitives into a group.
> 
> When I try to migrate the group nothing happens.
> There aren't any cluster errors, and the logs do not give me any kind
> of indication of error either.
> 
> I'm wondering if this is even something I should expect to work. I
> would certainly like it to.
> Here's the relevant config:
> 
> primitive p_Calibre VirtualDomain \
> params config="/vms/Calibre/Calibre.xml"
> hypervisor="qemu:///system" migration_transport=ssh \
> meta allow-migrate=true \
> op start timeout=120s interval=0 \
> op stop timeout=120s interval=0 \
> op monitor timeout=30 interval=10 depth=0 \
> utilization cpu=2 hv_memory=8192
> primitive p_CalibreVNC IPaddr2 \
> params ip=192.168.10.10 cidr_netmask=24 nic=br10 \
> op monitor interval=10s
> 
> group g_Calibre p_CalibreVNC p_Calibre \
> meta target-role=Started
> 
> location cli-prefer-g_Calibre g_Calibre role=Started inf: alpha
> location cli-prefer-p_Calibre p_Calibre role=Started inf: beta

The group prefers alpha, but its p_Calibre member prefers beta. You're
confusing the cluster :)

Any constraints starting with "cli-" were added by command-line tools
doing move/ban/etc. They stay in effect until they are manually removed
(the same tool will generally have a "clear" option).

> location cli-prefer-p_CalibreVNC p_CalibreVNC role=Started inf: beta
> 
> Any help is appreciated.
-- 
Ken Gaillot 


[ClusterLabs] Pacemaker PostgreSQL cluster

2018-05-29 Thread Salvatore D'angelo
Hi All,

I am new to this list. I am working on a project that uses a cluster composed
of 3 nodes (with Ubuntu 14.04 trusty) on which we run PostgreSQL managed as
Master/slaves.
We use Pacemaker/Corosync to manage this cluster. In addition, we have a
two-node GlusterFS where we store backups and WAL files.
Currently the versions of our components are quite old; we have:
Pacemaker 1.1.14
Corosync 2.3.5

and we want to move to a new version of Pacemaker but I have some doubts.

1. I noticed there is a 2.0.0 candidate release, so it could be convenient for
us to move to this release. When will the final release be published? Is it
more convenient to move to 2.0.0 or 1.1.18?
2. I read some documentation about upgrades, and since we want 0 ms downtime I
think the Rolling Upgrade (node by node) is the better approach. We migrate one
node while the other two nodes are still active. The problem is that I do not
know whether I can have a mix of 1.1.14 and 1.1.18 (or 2.0.0) nodes. The
documentation does not clarify it, or at least it was not clear to me. Is this
possible?
http://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ap-upgrade.html
https://wiki.clusterlabs.org/wiki/Upgrade

3. I need to upgrade pacemaker/corosync on Ubuntu 14.04. I noticed for 1.1.18
there are Ubuntu packages available. What about 2.0.0? Is it possible to create
Ubuntu packages in some way?
4. Where can I find the list of (Ubuntu) dependencies required by
pacemaker/corosync for 1.1.18 and 2.0.0?

Thanks in advance for your help.