Clarification: Yes, we've isolated the behavior down to the call of
`system daemon-reexec` from the `/var/lib/dpkg/info/systemd.postinst`
file.

But every call of `system daemon-reexec` will trigger this "bug", if you
have installed the packages in version `249.11-0ubuntu3.4`.

The host machine and the container are running the same kernel. But you
can fake the kernel version in the container.

So, the host machine reports the real version of the kernel:

> Linux FQDN_REDACTED 3.10.0-1127.18.2.vz7.163.46 #1 SMP Fri Nov 20
21:47:55 MSK 2020 x86_64 x86_64 x86_64 GNU/Linux

And the container reports this:

> Linux FQDN_REDACTED 5.4.0 #1 SMP Thu Apr 22 16:18:59 MSK 2021 x86_64
x86_64 x86_64 GNU/Linux

This was implemented because some applications in the container are
checking for a newer kernel version, so you have to fake it.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/2013543

Title:
  systemctl daemon-reexec forgets running services and starts everything
  new

Status in systemd package in Ubuntu:
  Incomplete
Status in systemd source package in Jammy:
  Incomplete

Bug description:
  # Our problem #

  During a regular update of our container environment, `systemd` (and
  the related packages libpam-systemd, libsystemd0, libudev1, systemd-
  sysv and udev) were updated from `249.11-0ubuntu3.6` to
  `249.11-0ubuntu3.7`. We're talking only about Ubuntu 22.04. Our Ubuntu
  20.04 is working fine with `systemctl daemon-reexec`.

  In my opinion, the update was not the problem because we've tried
  downgrading and tried these versions: (current) `249.11-0ubuntu3.7`,
  `249.11-0ubuntu3.6`, `249.11-0ubuntu3.4` and `249.11-0ubuntu3.3`. The
  symptoms were the same.

  # Symptoms #

  The `/var/lib/dpkg/info/systemd.postinst` executes a `systemctl
  daemon-reexec` and that ended in a disaster. It seems that `systemd`
  is forgetting all it started children and tries to start nearly every
  configured service again. Naturally, the old services are still
  running, and the ports can't be opened twice and `systemd` won't give
  up. Here are some(!) of the logfiles:

  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Starting Create Volatile Files and 
Directories...
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: systemd-udevd.service: Found 
left-over process 130 (systemd-udevd) in control group while starting unit. 
Ignoring.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean 
termination of a previous run, or service implementation deficiencies.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: systemd-udevd.service: Found 
left-over process 31475 (systemd-udevd) in control group while starting unit. 
Ignoring.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean 
termination of a previous run, or service implementation deficiencies.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: systemd-udevd.service: Found 
left-over process 31476 (systemd-udevd) in control group while starting unit. 
Ignoring.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean 
termination of a previous run, or service implementation deficiencies.

  And...

  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Reached target System 
Initialization.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Daily apt download 
activities.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Daily apt upgrade and clean 
activities.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Daily dpkg database backup 
timer.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Periodic ext4 Online 
Metadata Check for All Filesystems.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Condition check resulted in Discard 
unused blocks once a week being skipped.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Daily rotation of log files.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Daily man-db regeneration.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Message of the Day.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Clean PHP session files 
every 30 mins.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Update the plocate database 
daily.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Daily Cleanup of Temporary 
Directories.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Reached target Basic System.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: System is tainted: cgroupsv1
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Reached target Timer Units.

  And...

  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: atd.service: Found left-over 
process 206 (atd) in control group while starting unit. Ignoring.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean 
termination of a previous run, or service implementation deficiencies.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Starting Deferred execution 
scheduler...
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: cron.service: Found left-over 
process 164 (cron) in control group while starting unit. Ignoring.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean 
termination of a previous run, or service implementation deficiencies.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started Regular background program 
processing daemon.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: dbus.service: Found left-over 
process 177 (dbus-daemon) in control group while starting unit. Ignoring.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean 
termination of a previous run, or service implementation deficiencies.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Started D-Bus System Message Bus.

  And...

  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: rsyslog.service: Found left-over 
process 204 (rsyslogd) in control group while starting unit. Ignoring.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: atd.service: Failed with result 
'exit-code'.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: atd.service: Unit process 206 (atd) 
remains running after unit stopped.

  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: apache2.service: Found left-over 
process 382 (apache2) in control group while starting unit. Ignoring.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean 
termination of a previous run, or service implementation deficiencies.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: apache2.service: Found left-over 
process 392 (apache2) in control group while starting unit. Ignoring.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean 
termination of a previous run, or service implementation deficiencies.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: apache2.service: Found left-over 
process 397 (apache2) in control group while starting unit. Ignoring.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean 
termination of a previous run, or service implementation deficiencies.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: apache2.service: Found left-over 
process 3052 (apache2) in control group while starting unit. Ignoring.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: This usually indicates unclean 
termination of a previous run, or service implementation deficiencies.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Starting The Apache HTTP Server...
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: Stopped Deferred execution 
scheduler.
  Mar 31 12:51:39 FQDN_REDACTED systemd[1]: atd.service: Found left-over 
process 206 (atd) in control group while starting unit. Ignoring.

  And...

  Mar 31 12:51:40 FQDN_REDACTED sshd[31772]: error: Bind to port 22 on
  0.0.0.0 failed: Address already in use.

  And...

  Mar 31 12:52:06 FQDN_REDACTED systemd[1]: Started The Salt Minion.
  Mar 31 12:52:06 FQDN_REDACTED salt-minion[32339]: The Salt Minion is shutdown.
  Mar 31 12:52:11 FQDN_REDACTED systemd[1]: salt-minion.service: Main process 
exited, code=exited, status=1/FAILURE
  Mar 31 12:52:11 FQDN_REDACTED systemd[1]: salt-minion.service: Failed with 
result 'exit-code'.
  Mar 31 12:52:11 FQDN_REDACTED systemd[1]: salt-minion.service: Unit process 
2808 (/opt/saltstack/) remains running after unit stopped.
  Mar 31 12:52:11 FQDN_REDACTED systemd[1]: salt-minion.service: Unit process 
2848 (/opt/saltstack/) remains running after unit stopped.

  Other internal `systemd` process were started again:

  root           1  0.0  0.1 101204 12444 ?        Ss   10:19   0:03 
/lib/systemd/systemd -z        --system --deserialize 16
  root          75  0.0  0.1  31440 13484 ?        Ss   10:19   0:00 
/lib/systemd/systemd-journald
  systemd+     159  0.0  0.0  16124  8004 ?        Ss   10:19   0:00 
/lib/systemd/systemd-networkd
  message+     177  0.0  0.0   8252  4440 ?        Ss   10:19   0:00 
@dbus-daemon --system --address=systemd: --nofork --nopidfile 
--systemd-activation --syslog-only
  root         205  0.0  0.0  14908  6464 ?        Ss   10:19   0:00 
/lib/systemd/systemd-logind
  systemd+     223  0.0  0.1  25268 12592 ?        Ss   10:19   0:00 
/lib/systemd/systemd-resolved
  root       31424  0.0  0.1  31424 13636 ?        Ss   12:51   0:00 
/lib/systemd/systemd-journald
  systemd+   31636  0.0  0.0  16124  6588 ?        Ss   12:51   0:00 
/lib/systemd/systemd-networkd
  message+   31639  0.0  0.0   8124  3804 ?        Ss   12:51   0:00 
@dbus-daemon --system --address=systemd: --nofork --nopidfile 
--systemd-activation --syslog-only
  root       31682  0.0  0.0  14908  6480 ?        Ss   12:51   0:00 
/lib/systemd/systemd-logind
  systemd+   31686  0.0  0.1  25268 12580 ?        Ss   12:51   0:00 
/lib/systemd/systemd-resolved
  root       32087  0.0  0.0  21436  5252 ?        Ss   12:51   0:00 
/lib/systemd/systemd-udevd

  You can either kill all the old processes and restart them, and then
  everything is fine. Or you can reboot the container. Besides that
  `systemctl daemon-reexec` the `systemd` version is running fine.
  `systemctl daemon-reload` is working like a charme.

  # Normal case #

  In the normal case a `systemctl daemon-reexec` just prints only a few
  lines:

  Mar 31 14:21:58 FQDN_REDACTED systemd[1]: Reexecuting.
  Mar 31 14:21:58 FQDN_REDACTED systemd[1]: systemd 249.11-0ubuntu3.7 running 
in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT 
+GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD 
+LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY -P11KIT -QRENCODE +BZIP2 +LZ4 +XZ 
+ZLIB +ZSTD -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
  Mar 31 14:21:58 FQDN_REDACTED systemd[1]: Detected architecture x86-64.

  # Testcase #

  Doing a `systemctl daemon-restart` and `ssh localhost` shows the
  problem. `systemctl` removes the directory `/run/sshd` during the
  reexec and `ssh` will refuse further connects because the directory is
  missing.

  $ systemctl daemon-restart
  $ ssh root@localhost
  kex_exchange_identification: read: Connection reset by peer
  Connection reset by 127.0.0.1 port 22
  $

  Killing the old instance of SSH and restarting it will work.

  # Some details to the hardware #

  Our metal runs OpenVZ/Virtuozzo with this kernel (without any
  problems):

  > Linux FQDN_REDACTED 3.10.0-1127.18.2.vz7.163.46 #1 SMP Fri Nov 20
  21:47:55 MSK 2020 x86_64 x86_64 x86_64 GNU/Linux

  The container with the `systemctl daemon-reexec` problem reports the
  following kernel:

  Linux FQDN_REDACTED 5.4.0 #1 SMP Thu Apr 22 16:18:59 MSK 2021 x86_64
  x86_64 x86_64 GNU/Linux

  # Upshot #

  * Can somebody help me with this issue?
  * Why is `systemctl` losing its internal state about the running 
processes/services?
  * Why is `systemctl` restarting everything?

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/2013543/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to