James said:

> Setting the pacemaker distro task back to new - it seems very odd that
> a system designed to manage a cluster of servers would install on every
> node with a non-unique node id, which is a change in behaviour from
> older versions of the same software.

In Jammy, I think this is still the case? corosync.conf still ships with
a default nodeid of 1. It's just the name that's no longer supplied.
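
One way to check what a node actually has configured is to pull the
nodeid values out of corosync.conf. A minimal Python sketch, assuming
the usual "nodeid: <number>" line syntax; the helper name and the sample
fragment are illustrative only, not the file shipped in any release:

```python
import re


def find_nodeids(conf_text):
    """Return every nodeid value found in corosync.conf-style text."""
    ids = []
    for line in conf_text.splitlines():
        # Skip comment lines, which start with '#'.
        if line.strip().startswith("#"):
            continue
        match = re.match(r"\s*nodeid\s*:\s*(\d+)\s*$", line)
        if match:
            ids.append(int(match.group(1)))
    return ids


# Illustrative fragment only (hypothetical content, not a shipped default).
SAMPLE = """
nodelist {
    node {
        ring0_addr: 127.0.0.1
        nodeid: 1
    }
}
"""
```

Running find_nodeids() over /etc/corosync/corosync.conf on each node and
comparing the results would show whether two nodes share an id.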

My understanding of the normal use of corosync in Ubuntu is that the
entire file is nearly always replaced after the package is installed.
I believe the hacluster charm does this too.

So am I right that the issue is that corosync started briefly before
being configured by the charm, and left state behind? In that case, I
think the charm was possibly buggy in two ways:

1) It should use policy-rc.d to avoid corosync daemon startup before
corosync.conf is written out, or maybe write it out in advance. Looks
like Billy's commit fixed this in the charm already. FWIW, I find it
surprising that charms don't routinely override with policy-rc.d and
start services manually.

2) After rewriting corosync configuration, it should clear out corosync
state files entirely before restarting the daemon. This is no longer
necessary due to the other fix.
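
To illustrate point 1: invoke-rc.d consults /usr/sbin/policy-rc.d, and an
exit status of 101 from that script means the requested action is
forbidden by policy. A minimal Python sketch of wrapping a package
install this way; the helper name is hypothetical, and this is not a
description of how the charm's fix actually works:

```python
from contextlib import contextmanager
from pathlib import Path

# Exit status 101 tells invoke-rc.d that the action is forbidden by policy.
POLICY_SCRIPT = "#!/bin/sh\nexit 101\n"


@contextmanager
def deny_service_starts(path="/usr/sbin/policy-rc.d"):
    """Temporarily install a policy-rc.d that denies all service actions."""
    target = Path(path)
    if target.exists():
        # Refuse to clobber a policy that somebody else put in place.
        raise FileExistsError(f"{target} already exists")
    target.write_text(POLICY_SCRIPT)
    target.chmod(0o755)
    try:
        yield target
    finally:
        # Remove the policy again so services can be started normally.
        target.unlink()


# Hypothetical usage (needs root):
#
#   import subprocess
#   with deny_service_starts():
#       subprocess.run(["apt-get", "install", "-y", "corosync"], check=True)
#   # ...write the real corosync.conf, then start the service by hand.
```

The point of the context manager is that the policy is removed even if
the install fails, so the machine is not left unable to start services.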

Both of these apply to anything configuring corosync on Ubuntu, not just
the charm. So it's not clear to me that there's a bug in the corosync
packaging in Ubuntu in Focal at all. We merely ship a default cluster of
size 1 that isn't very useful and needs to be replaced correctly in
order to be useful.

From an SRU perspective, I have further concerns for existing users.

1) It's a conffile change. Since corosync.conf is almost always modified
by users, they are going to be prompted on upgrade if interactive.
This is a little alarming and not useful. Is there any actual case where
existing users would realistically be using the default configuration
file? Note also that since the issue is with state, changing the
configuration file for existing users wouldn't avoid the issue for them
anyway.

2) Changing the node name on an existing cluster seems dangerous to me.
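
On point 1, whether a given machine has a locally modified conffile can
be estimated by comparing the file against the checksum dpkg recorded
for it. A rough Python sketch, assuming the output format of
dpkg-query's ${Conffiles} field (one "path md5sum" pair per line,
optionally followed by a flag such as "obsolete"); the function names
are made up for illustration:

```python
import hashlib
from pathlib import Path


def parse_conffiles(dpkg_output):
    """Map conffile path -> recorded md5 from ${Conffiles} output."""
    recorded = {}
    for line in dpkg_output.splitlines():
        parts = line.split()
        # Drop trailing flags that may follow the checksum.
        while parts and parts[-1] in ("obsolete", "remove-on-upgrade"):
            parts.pop()
        if len(parts) >= 2:
            recorded[" ".join(parts[:-1])] = parts[-1]
    return recorded


def is_locally_modified(path, recorded_md5):
    """True if the file on disk no longer matches the recorded checksum."""
    actual = hashlib.md5(Path(path).read_bytes()).hexdigest()
    return actual != recorded_md5


# Hypothetical usage on an installed system:
#
#   import subprocess
#   out = subprocess.run(
#       ["dpkg-query", "-W", "--showformat=${Conffiles}\n", "corosync"],
#       capture_output=True, text=True, check=True,
#   ).stdout
#   conf = "/etc/corosync/corosync.conf"
#   print(is_locally_modified(conf, parse_conffiles(out)[conf]))
```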

For the SRU, what problem are we actually solving here then? The charm
is fixed and no longer impacted. Are we trying to avoid having dirty
state when users follow the broken installation flow of starting
corosync with the default configuration and then changing it? In that
case, it seems to me that the proposed fix only happens to work by
chance in this case. The real fix is to make sure that the state is
properly cleaned. I'm not sure how to do that in packaging except to try
to guide the user into somehow not following the broken installation
flow.
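
For whatever tool rewrites the configuration (as opposed to the
packaging), cleaning the state could look roughly like the Python sketch
below. The directory /var/lib/corosync is an assumption on my part, as
is the idea that emptying it is sufficient; which state files matter is
not established in this bug, and the function name is hypothetical:

```python
import shutil
from pathlib import Path


def clear_state_dir(state_dir):
    """Delete everything inside state_dir, keeping the directory itself."""
    removed = []
    for entry in sorted(Path(state_dir).iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed.append(entry.name)
    return removed


# Hypothetical usage (needs root; the path is an assumption):
#
#   import subprocess
#   subprocess.run(["systemctl", "stop", "corosync"], check=True)
#   clear_state_dir("/var/lib/corosync")
#   subprocess.run(["systemctl", "start", "corosync"], check=True)
```

Stopping the daemon first matters, since a running daemon could
otherwise write its state back out after the files are removed.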

Therefore I'm soft-declining this SRU for now, but further discussion
is welcome: if you disagree, say so and I'll look again.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1874719

Title:
  [SRU] Use the hostname as the node name instead of  hardcoded 'node1'

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-hacluster/+bug/1874719/+subscriptions

