I analysed the logs for an occurrence of this. The problem appears to be
that pacemaker doesn't stop within 1 minute, so systemd gives up and
starts a new instance anyway, leaving all of the existing processes
behind.

I am awaiting the extra rotated logs to confirm, but from what I can see
the new pacemaker fails to start because the old one is still running.
The old one then eventually exits, leaving you with no instance of
pacemaker at all (which is the state we found it in: pacemaker was
stopped).

06:13:44 systemd[1]: pacemaker.service: State 'stop-sigterm' timed out. Skipping SIGKILL.
06:13:44 pacemakerd[427]:   notice: Caught 'Terminated' signal
06:14:44 systemd[1]: pacemaker.service: State 'stop-final-sigterm' timed out. Skipping SIGKILL. Entering failed mode.
06:14:44 systemd[1]: pacemaker.service: Failed with result 'timeout'.
06:14:44 systemd[1]: Stopped Pacemaker High Availability Cluster Manager.
06:14:45 systemd[1]: pacemaker.service: Found left-over process 445 (cib) in control group while starting unit. Ignoring.
06:14:45 systemd[1]: pacemaker.service: Found left-over process 449 (attrd) in control group while starting unit. Ignoring.
06:14:45 systemd[1]: pacemaker.service: Found left-over process 450 (pengine) in control group while starting unit. Ignoring.
06:14:45 systemd[1]: pacemaker.service: Found left-over process 451 (crmd) in control group while starting unit. Ignoring.
06:14:45 systemd[1]: pacemaker.service: Found left-over process 427 (pacemakerd) in control group while starting unit. Ignoring.
06:14:45 systemd[1]: pacemaker.service: Found left-over process 447 (stonithd) in control group while starting unit. Ignoring.
06:14:45 systemd[1]: pacemaker.service: Found left-over process 448 (lrmd) in control group while starting unit. Ignoring.
06:14:45 systemd[1]: pacemaker.service: Failed to reset devices.list: Operation not permitted
06:14:45 systemd[1]: Started Pacemaker High Availability Cluster Manager.

The solution is likely some combination of tweaking the systemd config
to wait longer, force-killing if necessary, and possibly reaping all
processes if it does force a restart. It's not a native systemd unit,
though some of this stuff can be tweaked via comments. I'll look a
little further at that.
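A rough sketch of the kind of override described above, assuming the generated pacemaker.service honours a standard drop-in directory. The file path and the 10-minute timeout are illustrative guesses, not values tested against this bug. TimeoutStopSec=, SendSIGKILL= and KillMode= are standard systemd directives. The log's "Skipping SIGKILL" lines suggest SIGKILL is currently disabled for this unit.

```ini
# Hypothetical drop-in: /etc/systemd/system/pacemaker.service.d/override.conf
# Values are illustrative only.
[Service]
# Wait longer than the current 1 minute before the stop is considered timed out
TimeoutStopSec=10min
# If pacemaker still hasn't exited, send SIGKILL instead of "Skipping SIGKILL"
SendSIGKILL=yes
# Signal every process in the unit's control group (cib, attrd, crmd, ...),
# not just the main process, so nothing is left over for the next start
KillMode=control-group
```

This would need a `systemctl daemon-reload` to take effect. Whether force-killing pacemaker is acceptable on a live cluster node (its peers may fence it) is a separate question.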

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1903745

Title:
  upgrade from 1.1.14-2ubuntu1.8 to 1.1.14-2ubuntu1.9 breaks clusters

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs