I did some tests to see how the cluster behaves when changing the node name
and/or id. It all boils down to this: whatever is being changed, one has to
be aware that the change is being made to a live, real cluster, even if it's
a simple one-node cluster. That's what you get right after the package is
installed: a single-node cluster:
- node name is either "node1" or `uname -n`, depending on the Ubuntu release.
In the case of focal, the topic of this bug, it's currently "node1"
- node id is 1
- ring0_addr is localhost
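For reference, the relevant part of that stock config would look roughly like
this (paraphrased from the facts above, not copied verbatim from the package):

```
nodelist {
    node {
        name: node1
        nodeid: 1
        ring0_addr: 127.0.0.1
    }
}
```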
The charm makes 3 changes compared to the default config file:
- node name is set back to localhost
- node id is 1000 + the application unit id (a juju thing: for example,
postgresql/0 is unit 0 of the postgresql application)
- ring0_addr gets the real IP of the application unit, instead of 127.0.0.1
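So for, say, postgresql/0 on a unit whose address is 10.0.0.10 (the address
is a made-up example), the charm would end up with something like:

```
nodelist {
    node {
        name: localhost
        nodeid: 1000          # 1000 + unit number 0
        ring0_addr: 10.0.0.10 # the unit's real IP
    }
}
```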
Changing the node id *AND* the node name essentially creates a new node in
the cluster. The old "node1" name and id will remain around, but offline,
because no live host answers to them anymore.
If you change just one of the two (node name or id), then the cluster seems
able to coalesce them together again, and you get a plain rename. I haven't
tested this exhaustively, but it seems to be the case: inspecting the current
cib.raw xml file on each node and diffing it against a previous one shows
the rename.
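As an illustration of what that diff looks like, here is a toy example with
two hand-written <node> entries (the file contents are made up, not taken from
a real cib): a plain rename shows up as a changed uname on the same node id.

```shell
# Hypothetical before/after <node> entries as they might appear in the cib;
# the id stays the same, only uname changes, i.e. a rename, not a new node.
cat > /tmp/cib-before.xml <<'EOF'
<node id="1" uname="node1"/>
EOF
cat > /tmp/cib-after.xml <<'EOF'
<node id="1" uname="f1"/>
EOF
# diff exits 1 when the files differ, mask that for the example
diff /tmp/cib-before.xml /tmp/cib-after.xml || true
```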
Let's test a user story, showing how one could deploy 3 nodes manually
from these focal packages.
After installing pacemaker and corosync on all 3 nodes (let's call them f1,
f2 and f3), we get:
- f1: node id = 1, node name = node1, cluster name = debian
- f2: node id = 1, node name = node1, cluster name = debian
- f3: node id = 1, node name = node1, cluster name = debian
All with identical config. These are essentially 3 isolated clusters called
debian, each with one node called node1.
The following set of changes will work and not show a phantom "node1" node at
the end:
- on f1, adjust corosync.conf with this node list:
nodelist {
    node {
        # name: node1
        nodeid: 1
        ring0_addr: f1  # (or f1's ip)
    }
    node {
        nodeid: 2
        ring0_addr: f2  # (or f2's ip)
    }
    node {
        nodeid: 3
        ring0_addr: f3  # (or f3's ip)
    }
}
Then scp this file to the other nodes, and restart corosync and
pacemaker there.
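Concretely, something along these lines (assuming root ssh access from f1 to
the other nodes, which is an assumption of this sketch):

```
# run from f1
scp /etc/corosync/corosync.conf f2:/etc/corosync/corosync.conf
scp /etc/corosync/corosync.conf f3:/etc/corosync/corosync.conf
ssh f2 'systemctl restart corosync pacemaker'
ssh f3 'systemctl restart corosync pacemaker'
systemctl restart corosync pacemaker   # and on f1 itself
```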
We kept the nodeid on f1 as 1, just got rid of its name. That renames that node
to `uname -n`, because the id was kept at 1.
The other nodes also got new names, but their ids changed as well. Crucially,
node id 1 still exists in the cluster (it's f1), so it all works out.
If you were to also change the node id range together with the name, like the
charm does, then it's an entirely new node, and you will have to get rid of
node1 with a crm or pcs command, or just "crm_node --remove node1".
All in all, it's best to either start with the correct configuration (which
the charm does nowadays) or clear everything beforehand (with "pcs cluster
destroy", perhaps). "pcs cluster destroy" is quite comprehensive; it does:
- rm -f /etc/corosync/{corosync.conf,authkey} /etc/pacemaker/authkey
/var/lib/pcsd/disaster-recovery
- removes many files from /var/lib/pacemaker (cib, pengine/pe*bz2, hostcache,
cts, others)
- stops the services
One has to be very careful if changing node names and node ids in a live
cluster, and a live cluster is what you get right after installing the
packages.
I still haven't made up my mind about this focal SRU. I definitely
prefer to have the node name default to the hostname (uname -n), but
making such a change via an SRU is debatable. We might have to "bite the
bullet" and live with this different behavior in focal only :/
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1874719
Title:
[SRU] Use the hostname as the node name instead of hardcoded 'node1'