I did some tests to see how the cluster behaves when changing the node name
and/or id. It all boils down to this: whatever is being changed, one has to
be aware that the change is being made to a live, real cluster, even if it's
a simple one-node cluster. That's what you get right after the package is
installed: a single-node cluster:
- node name is either "node1" or `uname -n`, depending on the Ubuntu release.
In the case of focal, the topic of this bug, it's currently "node1"
- node id is 1
- ring0_addr is localhost
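For reference, the relevant part of that stock config would look roughly like
this (paraphrased from the facts above, not copied verbatim from the package):

```
nodelist {
    node {
        name: node1
        nodeid: 1
        ring0_addr: 127.0.0.1
    }
}
```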
The charm makes 3 changes compared to the default config file:
- node name is set back to localhost
- node id is 1000 + the application unit id (a juju thing: for example,
postgresql/0 is unit 0 of the postgresql application)
- ring0_addr gets the real IP of the application unit, instead of 127.0.0.1
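So for, say, postgresql/0 on a unit whose address is 10.0.0.10 (the address
is a made-up example), the charm would end up with something like:

```
nodelist {
    node {
        name: localhost
        nodeid: 1000          # 1000 + unit number 0
        ring0_addr: 10.0.0.10 # the unit's real IP
    }
}
```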
Changing the node id *AND* the node name essentially creates a new node in
the cluster. The old "node1" name and id will remain around, but offline,
because no live host answers to them anymore.
If you change just one of the two (node name or id), then the cluster seems
able to coalesce them together again, and you get a plain rename. I haven't
tested this exhaustively, but it seems to be the case: inspecting the current
cib.raw xml file on each node and diffing it against a previous one shows
the rename.
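As an illustration of what that diff looks like, here is a toy example with
two hand-written <node> entries (the file contents are made up, not taken from
a real cib): a plain rename shows up as a changed uname on the same node id.

```shell
# Hypothetical before/after <node> entries as they might appear in the cib;
# the id stays the same, only uname changes, i.e. a rename, not a new node.
cat > /tmp/cib-before.xml <<'EOF'
<node id="1" uname="node1"/>
EOF
cat > /tmp/cib-after.xml <<'EOF'
<node id="1" uname="f1"/>
EOF
# diff exits 1 when the files differ, mask that for the example
diff /tmp/cib-before.xml /tmp/cib-after.xml || true
```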
Let's test a user story, showing how one could deploy 3 nodes manually
from these focal packages.
After installing pacemaker and corosync on all 3 nodes (let's call them f1,
f2 and f3), we get:
- f1: node id = 1, node name = node1, cluster name = debian
- f2: node id = 1, node name = node1, cluster name = debian
- f3: node id = 1, node name = node1, cluster name = debian
All with identical config. These are essentially 3 isolated clusters called
debian, each with one node called node1.
The following set of changes will work and not show a phantom "node1" node at
the end:
- on f1, adjust corosync.conf with this node list:
nodelist {
    node {
        # name: node1
        nodeid: 1
        ring0_addr: f1  # (or f1's ip)
    }
    node {
        nodeid: 2
        ring0_addr: f2  # (or f2's ip)
    }
    node {
        nodeid: 3
        ring0_addr: f3  # (or f3's ip)
    }
}
Then scp this file to the other nodes, and restart corosync and
pacemaker there.
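Concretely, something along these lines (assuming root ssh access from f1 to
the other nodes, which is an assumption of this sketch):

```
# run from f1
scp /etc/corosync/corosync.conf f2:/etc/corosync/corosync.conf
scp /etc/corosync/corosync.conf f3:/etc/corosync/corosync.conf
ssh f2 'systemctl restart corosync pacemaker'
ssh f3 'systemctl restart corosync pacemaker'
systemctl restart corosync pacemaker   # and on f1 itself
```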
We kept the nodeid on f1 as 1, just got rid of its name. That renames that node
to `uname -n`, because the id was kept at 1.
The other nodes also got new names, but their ids changed as well. Crucially,
node id 1 still exists in the cluster (it's f1), so it all works out.
If you were to also change the node id range together with the name, like the
charm does, then it's an entirely new node, and you will have to get rid of
node1 with a crm or pcs command, or just "crm_node --remove node1".
All in all, it's best to either start with the correct configuration (which
the charm does nowadays) or clear everything beforehand (with "pcs cluster
destroy", perhaps). "pcs cluster destroy" is quite comprehensive; it does:
- rm -f /etc/corosync/{corosync.conf,authkey} /etc/pacemaker/authkey
/var/lib/pcsd/disaster-recovery
- removes many files from /var/lib/pacemaker (cib, pengine/pe*bz2, hostcache,
cts, others)
- stops the services
One has to be very careful if changing node names and node ids in a live
cluster, and a live cluster is what you get right after installing the
packages.
I still haven't made up my mind about this focal SRU. I definitely
prefer to have the node name default to the hostname (uname -n), but
making such a change via an SRU is debatable. We might have to "bite the
bullet" and live with this different behavior in focal only :/
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1874719
Title:
[SRU] Use the hostname as the node name instead of hardcoded 'node1'