Re: [ClusterLabs] Migrating off CentOS

2024-01-15 Thread Ken Gaillot
On Sat, 2024-01-13 at 09:07 -0600, Billy Croan wrote:
> I'm planning to migrate a two-node cluster off CentOS 7 this year.  I
> think I'm taking it to Debian Stable, but open for suggestions if any
> distribution is better supported by pacemaker.


Debian, RHEL, SUSE, Ubuntu, and compatible distros should all have good
support.

Fedora and FreeBSD get regular builds and basic testing but have fewer
users exercising them in production.

FYI, if you want to keep the interfaces you're familiar with, the free
RHEL developer license now allows most personal and small-business
production use: https://access.redhat.com/discussions/5719451

> 
> Have any of you had success doing major upgrades (bullseye to
> bookworm on Debian) of your physical nodes one at a time while each
> node is in standby+maintenance, and rolling the vm from one to the
> other so it doesn't reboot while the hosts are upgraded?  That has
> worked well for me for minor OS updates, but I'm curious about the
> majors.  
> 
> My project this year is even more major, not just upgrading the OS
> but changing distributions.
> 
> I think I have three possible ways I can try this:
> 1) wipe all server disks and start fresh.

A variation, if you can get new hosts, is to set up a test cluster on
new hosts, and once you're comfortable that it will work, stop the old
cluster and turn the new one into production.
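
If you go that route, a quick sanity check on the new cluster before
cutover might look like this (just a sketch; assumes the standard
Pacemaker tools are installed on the new hosts):

    # Validate the live cluster configuration for errors
    crm_verify --live-check
    # One-shot status snapshot; confirm all resources are started
    crm_mon -1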

> 
> 2) standby and maintenance one node, then reinstall it with a new OS
> and make a new cluster.  Shut down the VM and copy it, offline, to
> the new one-node cluster, and start it up there. Then once that's
> working, wipe and reinstall the other node, and add it to the new
> cluster.

This should be fine.
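
A rough sketch of that flow, assuming libvirt-managed VMs and the pcs
front end; the node name, VM name, and paths below are placeholders:

    # Old cluster: evacuate the node being reinstalled (pcs 0.9
    # syntax as shipped on CentOS 7)
    pcs cluster standby node1
    # ... reinstall node1 and build the new one-node cluster on it ...
    # Old node: shut the VM down cleanly and copy its disk offline
    virsh shutdown myvm
    scp /var/lib/libvirt/images/myvm.qcow2 node1:/var/lib/libvirt/images/
    # New cluster: manage the VM with the VirtualDomain resource agent
    pcs resource create myvm ocf:heartbeat:VirtualDomain \
        config=/etc/libvirt/qemu/myvm.xml hypervisor=qemu:///system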

> 
> 3) standby and maintenance one node, then remove it from the
> cluster.  Then reinstall it with the new distribution's OS.  Then re-
> add it to the existing cluster.  Move the VM resource to it and
> verify it's working, then do the same with the other physical node,
> and take it out of standby to finish.
> 

This would be fine as long as the Corosync and Pacemaker versions are
compatible. However, as Michele mentioned, RHEL 7 (and thus CentOS 7)
uses Corosync 2, while the latest release of any distro will use
Corosync 3, so that rules out this option.
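
If the versions did line up, the node swap itself would be just a few
commands (pcs 0.10+ syntax shown; the node name is a placeholder, and
the reinstalled node must be authenticated with pcs host auth first):

    # Remove the node from the old cluster before wiping it
    pcs cluster node remove node1
    # ... reinstall node1 with the new distribution ...
    # Re-add it from a surviving cluster member
    pcs cluster node add node1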

> (Obviously any of those methods begin with a full backup to offsite
> and local media. and end with a verification against that backup.)
> 
> #1 would be the longest outage but the "cleanest result".
> #3 would be possibly no outage, but I think the least likely to
> work.  I understand EL uses pcs and Debian uses crm, for example...

Debian offers both, IIRC. But that won't affect the upgrade; both
front ends use the Pacemaker command-line tools under the hood. The
only difference is which commands you run to get the same effect.
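
For example, putting a node into standby is the same Pacemaker
operation under either front end (the node name is a placeholder):

    # pcs (RHEL-style; pcs 0.9 used "pcs cluster standby" instead)
    pcs node standby node1
    # crmsh (Debian/SUSE-style)
    crm node standby node1
    # Either way it just sets the node's standby attribute, roughly:
    crm_attribute --node node1 --name standby --update on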

> #2 is a compromise that should(tm) have only a few seconds of
> outage.  But it could blow up, I suppose.  They all could blow up
> though, so I'm not sure that should play a factor in the decision.
> 
> I can't be the first person to go down this path.  So what do you all
> think?  how have you done it in the past?

-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Migrating off CentOS

2024-01-14 Thread Michele Baldessari
On Sat, Jan 13, 2024 at 09:07:52AM -0600, Billy Croan wrote:
> I'm planning to migrate a two-node cluster off CentOS 7 this year.  I think
> I'm taking it to Debian Stable, but open for suggestions if any
> distribution is better supported by pacemaker.
> 
> Have any of you had success doing major upgrades (bullseye to bookworm on
> Debian) of your physical nodes one at a time while each node is in
> standby+maintenance, and rolling the vm from one to the other so it doesn't
> reboot while the hosts are upgraded?  That has worked well for me for minor
> OS updates, but I'm curious about the majors.

We did it with OpenStack, but since our upgrade also implied a major
Corosync upgrade, we *had* to bring down the whole control plane. I am
not sure whether this has changed recently, but a few years ago at
least you could not mix and match nodes running Corosync 2.x and 3.x.
So make sure you take that aspect into consideration.
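
A quick pre-flight check along these lines, run on every node, will
tell you what you would be mixing:

    # Print the corosync version on this node
    corosync -v
    # While you're at it, confirm ring/membership health
    corosync-cfgtool -s
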
> 
> My project this year is even more major, not just upgrading the OS but
> changing distributions.
> 
> I think I have three possible ways I can try this:
> 1) wipe all server disks and start fresh.
> 
> 2) standby and maintenance one node, then reinstall it with a new OS and
> make a new cluster.  Shut down the VM and copy it, offline, to the new
> one-node cluster, and start it up there. Then once that's working, wipe and
> reinstall the other node, and add it to the new cluster.
> 
> 3) standby and maintenance one node, then remove it from the cluster.  Then
> reinstall it with the new distribution's OS.  Then re-add it to the
> existing cluster.  Move the VM resource to it and verify it's working, then
> do the same with the other physical node, and take it out of standby
> to finish.
> 
> (Obviously any of those methods begin with a full backup to offsite and
> local media. and end with a verification against that backup.)
> 
> #1 would be the longest outage but the "cleanest result".
> #3 would be possibly no outage, but I think the least likely to work.  I
> understand EL uses pcs and Debian uses crm, for example...
> #2 is a compromise that should(tm) have only a few seconds of outage.  But
> it could blow up, I suppose.  They all could blow up though, so I'm not
> sure that should play a factor in the decision.
> 
> I can't be the first person to go down this path.  So what do you all
> think?  how have you done it in the past?



-- 
Michele Baldessari
C2A5 9DA3 9961 4FFB E01B  D0BC DDD4 DCCB 7515 5C6D
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/