To clarify, you are saying that this issue occurs any time you run
`systemctl daemon-reexec`, right? It is not isolated to that command
being run during a package upgrade?

I am not familiar with OpenVZ, but I am confused by the kernel
situation. Are you saying that the containers which are running Ubuntu
(and where you are seeing this problem) are running different kernels
than the host?

Like I said, I am not familiar with OpenVZ, but from my understanding of
other container environments like LXD, I would expect the containers to
share the host's kerenl (and therefore report the same uname -a output).

** Also affects: systemd (Ubuntu Jammy)
   Importance: Undecided
       Status: New

** Changed in: systemd (Ubuntu Jammy)
       Status: New => Incomplete

** Changed in: systemd (Ubuntu)
       Status: New => Incomplete

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/2013543

Title:
  systemctl daemon-reexec forgets running services and starts everything
  new

Status in systemd package in Ubuntu:
  Incomplete
Status in systemd source package in Jammy:
  Incomplete

Bug description:
  # Our problem #

  During a regular update of our container environment, `systemd` (and
  the related packages libpam-systemd, libsystemd0, libudev1, systemd-
  sysv and udev) were updated from `249.11-0ubuntu3.6` to
  `249.11-0ubuntu3.7`. We're talking only about Ubuntu 22.04. Our Ubuntu
  20.04 is working fine with `systemctl daemon-reexec`.

  In my opinion, the update was not the problem because we've tried
  downgrading and tried these versions: (current) `249.11-0ubuntu3.7`,
  `249.11-0ubuntu3.6`, `249.11-0ubuntu3.4` and `249.11-0ubuntu3.3`. The
  symptoms were the same.

  # Symptoms #

  The `/var/lib/dpkg/info/systemd.postinst` executes a `systemctl
  daemon-reexec` and that ended in a disaster. It seems that `systemd`
  is forgetting all it started children and tries to start nearly every
  configured service again. Naturally, the old services are still
  running, and the ports can't be opened twice and `systemd` won't give
  up. Here are some(!) of the logfiles:

  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Starting Create Volatile Files and 
Directories...
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: systemd-udevd.service: Found 
left-over process 130 (systemd-udevd) in control group while starting unit. 
Ignoring.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean 
termination of a previous run, or service implementation deficiencies.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: systemd-udevd.service: Found 
left-over process 31475 (systemd-udevd) in control group while starting unit. 
Ignoring.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean 
termination of a previous run, or service implementation deficiencies.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: systemd-udevd.service: Found 
left-over process 31476 (systemd-udevd) in control group while starting unit. 
Ignoring.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean 
termination of a previous run, or service implementation deficiencies.

  And...

  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Reached target System 
Initialization.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Daily apt download 
activities.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Daily apt upgrade and clean 
activities.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Daily dpkg database backup 
timer.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Periodic ext4 Online 
Metadata Check for All Filesystems.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Condition check resulted in Discard 
unused blocks once a week being skipped.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Daily rotation of log files.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Daily man-db regeneration.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Message of the Day.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Clean PHP session files 
every 30 mins.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Update the plocate database 
daily.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Daily Cleanup of Temporary 
Directories.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Reached target Basic System.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: System is tainted: cgroupsv1
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Reached target Timer Units.

  And...

  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: atd.service: Found left-over 
process 206 (atd) in control group while starting unit. Ignoring.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean 
termination of a previous run, or service implementation deficiencies.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Starting Deferred execution 
scheduler...
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: cron.service: Found left-over 
process 164 (cron) in control group while starting unit. Ignoring.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean 
termination of a previous run, or service implementation deficiencies.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Regular background program 
processing daemon.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: dbus.service: Found left-over 
process 177 (dbus-daemon) in control group while starting unit. Ignoring.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean 
termination of a previous run, or service implementation deficiencies.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started D-Bus System Message Bus.

  And...

  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: rsyslog.service: Found left-over 
process 204 (rsyslogd) in control group while starting unit. Ignoring.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: atd.service: Failed with result 
'exit-code'.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: atd.service: Unit process 206 (atd) 
remains running after unit stopped.

  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: apache2.service: Found left-over 
process 382 (apache2) in control group while starting unit. Ignoring.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean 
termination of a previous run, or service implementation deficiencies.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: apache2.service: Found left-over 
process 392 (apache2) in control group while starting unit. Ignoring.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean 
termination of a previous run, or service implementation deficiencies.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: apache2.service: Found left-over 
process 397 (apache2) in control group while starting unit. Ignoring.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean 
termination of a previous run, or service implementation deficiencies.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: apache2.service: Found left-over 
process 3052 (apache2) in control group while starting unit. Ignoring.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean 
termination of a previous run, or service implementation deficiencies.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Starting The Apache HTTP Server...
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Stopped Deferred execution 
scheduler.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: atd.service: Found left-over 
process 206 (atd) in control group while starting unit. Ignoring.

  And...

  Mar 31 12:51:40 FQDN_REDACTED sshd[31772]: error: Bind to port 22 on
  0.0.0.0 failed: Address already in use.

  And...

  Mar 31 12:52:06 FQDN_REDACTED systemd[1]: Started The Salt Minion.
  Mar 31 12:52:06 FQDN_REDACTED salt-minion[32339]: The Salt Minion is shutdown.
  Mar 31 12:52:11 FQDN_REDACTED systemd[1]: salt-minion.service: Main process 
exited, code=exited, status=1/FAILURE
  Mar 31 12:52:11 FQDN_REDACTED systemd[1]: salt-minion.service: Failed with 
result 'exit-code'.
  Mar 31 12:52:11 FQDN_REDACTED systemd[1]: salt-minion.service: Unit process 
2808 (/opt/saltstack/) remains running after unit stopped.
  Mar 31 12:52:11 FQDN_REDACTED systemd[1]: salt-minion.service: Unit process 
2848 (/opt/saltstack/) remains running after unit stopped.

  Other internal `systemd` process were started again:

  root           1  0.0  0.1 101204 12444 ?        Ss   10:19   0:03 
/lib/systemd/systemd -z        --system --deserialize 16
  root          75  0.0  0.1  31440 13484 ?        Ss   10:19   0:00 
/lib/systemd/systemd-journald
  systemd+     159  0.0  0.0  16124  8004 ?        Ss   10:19   0:00 
/lib/systemd/systemd-networkd
  message+     177  0.0  0.0   8252  4440 ?        Ss   10:19   0:00 
@dbus-daemon --system --address=systemd: --nofork --nopidfile 
--systemd-activation --syslog-only
  root         205  0.0  0.0  14908  6464 ?        Ss   10:19   0:00 
/lib/systemd/systemd-logind
  systemd+     223  0.0  0.1  25268 12592 ?        Ss   10:19   0:00 
/lib/systemd/systemd-resolved
  root       31424  0.0  0.1  31424 13636 ?        Ss   12:51   0:00 
/lib/systemd/systemd-journald
  systemd+   31636  0.0  0.0  16124  6588 ?        Ss   12:51   0:00 
/lib/systemd/systemd-networkd
  message+   31639  0.0  0.0   8124  3804 ?        Ss   12:51   0:00 
@dbus-daemon --system --address=systemd: --nofork --nopidfile 
--systemd-activation --syslog-only
  root       31682  0.0  0.0  14908  6480 ?        Ss   12:51   0:00 
/lib/systemd/systemd-logind
  systemd+   31686  0.0  0.1  25268 12580 ?        Ss   12:51   0:00 
/lib/systemd/systemd-resolved
  root       32087  0.0  0.0  21436  5252 ?        Ss   12:51   0:00 
/lib/systemd/systemd-udevd

  You can either kill all the old processes and restart them, and then
  everything is fine. Or you can reboot the container. Besides that
  `systemctl daemon-reexec` the `systemd` version is running fine.
  `systemctl daemon-reload` is working like a charme.

  # Normal case #

  In the normal case a `systemctl daemon-reexec` just prints only a few
  lines:

  Mar 31 14:21:58 FQDN_REDACTED systemd[1]: Reexecuting.
  Mar 31 14:21:58 FQDN_REDACTED systemd[1]: systemd 249.11-0ubuntu3.7 running 
in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT 
+GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD 
+LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY -P11KIT -QRENCODE +BZIP2 +LZ4 +XZ 
+ZLIB +ZSTD -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
  Mar 31 14:21:58 FQDN_REDACTED systemd[1]: Detected architecture x86-64.

  # Testcase #

  Doing a `systemctl daemon-restart` and `ssh localhost` shows the
  problem. `systemctl` removes the directory `/run/sshd` during the
  reexec and `ssh` will refuse further connects because the directory is
  missing.

  $ systemctl daemon-restart
  $ ssh root@localhost
  kex_exchange_identification: read: Connection reset by peer
  Connection reset by 127.0.0.1 port 22
  $

  Killing the old instance of SSH and restarting it will work.

  # Some details to the hardware #

  Our metal runs OpenVZ/Virtuozzo with this kernel (without any
  problems):

  > Linux FQDN_REDACTED 3.10.0-1127.18.2.vz7.163.46 #1 SMP Fri Nov 20
  21:47:55 MSK 2020 x86_64 x86_64 x86_64 GNU/Linux

  The container with the `systemctl daemon-reexec` problem reports the
  following kernel:

  Linux FQDN_REDACTED 5.4.0 #1 SMP Thu Apr 22 16:18:59 MSK 2021 x86_64
  x86_64 x86_64 GNU/Linux

  # Upshot #

  * Can somebody help me with this issue?
  * Why is `systemctl` losing its internal state about the running 
processes/services?
  * Why is `systemctl` restarting everything?

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/2013543/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to