Hi,

This mailing list is the only place where I expect to get helpful feedback, but feel free to suggest other places. I'd like to investigate the situation I have now, find out what went wrong, and, if possible, prevent it from happening again. Your help is appreciated.
Like I said, when I ssh to the server as root, it reports that it's going down. As a non-root user, I get the same message and then the connection is closed. The journal is full of this:

  Nov 28 16:22:01 st2 systemd-journal[353]: Journal stopped
  Nov 28 16:22:01 st2 systemd-journal[494]: Runtime journal is using 624.0M (max allowed 642.1M, trying to leave 963.1M free of 5.6G available → current limit 642.1M).
  Nov 28 16:22:01 st2 systemd-journal[494]: Runtime journal is using 624.0M (max allowed 642.1M, trying to leave 963.1M free of 5.6G available → current limit 642.1M).
  Nov 28 16:22:01 st2 systemd-journal[494]: Journal started
  Nov 28 16:22:01 st2 systemd[1]: systemd-journald.service watchdog timeout (limit 1min)!
  Nov 28 16:22:01 st2 systemd-journald[353]: Received SIGTERM from PID 1 (systemd).
  Nov 28 16:22:01 st2 systemd[1]: Unit systemd-journald.service entered failed state.
  Nov 28 16:22:01 st2 systemd[1]: systemd-journald.service has no holdoff time, scheduling restart.
  Nov 28 16:22:01 st2 systemd[1]: Stopping Journal Service...
  Nov 28 16:22:01 st2 systemd[1]: Starting Journal Service...
  Nov 28 16:22:01 st2 systemd[1]: Started Journal Service.
  Nov 28 16:22:01 st2 systemd[1]: Starting Trigger Flushing of Journal to Persistent Storage...
  Nov 28 16:22:01 st2 systemd[1]: systemd-journal-flush.service: main process exited, code=exited, status=1/FAILURE
  Nov 28 16:22:01 st2 systemd[1]: Failed to start Trigger Flushing of Journal to Persistent Storage.
  Nov 28 16:22:01 st2 systemd[1]: Unit systemd-journal-flush.service entered failed state.
  Nov 28 16:22:52 st2 systemd[1]: systemd-timesyncd.service start operation timed out. Terminating.
  Nov 28 16:22:52 st2 systemd[1]: Failed to start Network Time Synchronization.
  Nov 28 16:22:52 st2 systemd[1]: Unit systemd-timesyncd.service entered failed state.
  Nov 28 16:22:53 st2 systemd[1]: systemd-timesyncd.service has no holdoff time, scheduling restart.
  Nov 28 16:22:53 st2 systemd[1]: Stopping Network Time Synchronization...
  Nov 28 16:22:53 st2 systemd[1]: Starting Network Time Synchronization...
  Nov 28 16:23:02 st2 systemd-journal[494]: Journal stopped
  Nov 28 16:23:02 st2 systemd-journal[632]: Runtime journal is using 624.0M (max allowed 642.1M, trying to leave 963.1M free of 5.6G available → current limit 642.1M).
  Nov 28 16:23:02 st2 systemd-journal[632]: Runtime journal is using 624.0M (max allowed 642.1M, trying to leave 963.1M free of 5.6G available → current limit 642.1M).
  Nov 28 16:23:02 st2 systemd-journal[632]: Journal started
  Nov 28 16:23:02 st2 systemd[1]: systemd-journald.service watchdog timeout (limit 1min)!
  Nov 28 16:23:02 st2 systemd-journald[494]: Received SIGTERM from PID 1 (systemd).
  Nov 28 16:23:02 st2 systemd[1]: Unit systemd-journald.service entered failed state.
  Nov 28 16:23:02 st2 systemd[1]: systemd-journald.service has no holdoff time, scheduling restart.
  Nov 28 16:23:02 st2 systemd[1]: Stopping Journal Service...
  Nov 28 16:23:02 st2 systemd[1]: Starting Journal Service...
  Nov 28 16:23:02 st2 systemd[1]: Started Journal Service.
  Nov 28 16:23:02 st2 systemd[1]: Starting Trigger Flushing of Journal to Persistent Storage...
  Nov 28 16:23:02 st2 systemd[1]: systemd-journal-flush.service: main process exited, code=exited, status=1/FAILURE
  Nov 28 16:23:02 st2 systemd[1]: Failed to start Trigger Flushing of Journal to Persistent Storage.
  Nov 28 16:23:02 st2 systemd[1]: Unit systemd-journal-flush.service entered failed state.

This repeats every minute. systemctl doesn't work:

  # systemctl
  Failed to get D-Bus connection: Connection refused

I have 16 LXC containers running on the server:

  # lxc-ls -f | grep RUNNING | wc -l
  16

and these dbus-daemon processes:

  # ps -ef | grep dbus
  message+   845     1  0 Feb15 ?        00:09:56 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
  systemd+  1615   579  0 Jun13 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
  root      1673 28602  0 16:26 pts/31   00:00:00 grep dbus
  systemd+  3761  3461  0 Feb15 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
  systemd+  4635  3436  0 Feb15 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
  systemd+  4767  3527  0 Feb15 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
  systemd+  5344  3597  0 Feb15 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
  systemd+  5714  3664  0 Feb15 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
  systemd+  5793  3750  0 Feb15 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
  systemd+  7856  7198  0 Oct18 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
  systemd+  9477  8848  0 Oct18 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
  systemd+ 10930 10322  0 Oct18 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
  systemd+ 13130 10717  0 Jun27 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
  systemd+ 13300 11339  0 Apr03 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
  systemd+ 19689 19360  0 Jul28 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
  systemd+ 21045 20562  0 Oct19 ?        00:00:01 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
  # ps -ef | grep dbus | wc -l
  16

The 16 counted lines include the grep process itself (PID 1673), so there are 15 dbus-daemon processes. My conjecture is that the first one (PID 845) belongs to the physical host, since its PPID is 1 and it runs as user messagebus (shown truncated as "message+"). That would leave 14 dbus-daemons for 16 running containers, so supposedly two of them are missing.
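One way to find out which containers lack a dbus-daemon is to compare each running container's init PID with the PPIDs of the dbus-daemon processes. This is a minimal sketch, not tested on this host, assuming that a container's dbus-daemon is a direct child of the container's init (as the PPIDs above suggest) and that "lxc-info -n NAME -p" prints a single line like "PID: 1234" (the LXC 1.0 style on Debian 8). The function names containers_without_dbus and check_host are hypothetical.

```shell
#!/bin/sh
# containers_without_dbus (hypothetical helper): print the name of every
# container whose init PID is not the parent of any dbus-daemon.
#   $1     whitespace-separated list of dbus-daemon PPIDs
#   stdin  one "<container-name> <init-pid>" pair per line
# Assumption: a container's dbus-daemon is a direct child of its init.
containers_without_dbus() {
    dbus_ppids=$1
    while read -r name pid; do
        found=no
        for ppid in $dbus_ppids; do
            if [ "$ppid" = "$pid" ]; then
                found=yes
                break
            fi
        done
        if [ "$found" = no ]; then
            echo "$name"
        fi
    done
}

# check_host (hypothetical helper): run as root on the LXC host.
# "ps -C dbus-daemon -o ppid=" prints the parent PID of every process
# named dbus-daemon, with no header and without matching ps/grep itself.
# Assumption: "lxc-info -n NAME -p" prints "PID:   <pid>" (LXC 1.0 style),
# so the PID is the second whitespace-separated field.
check_host() {
    ppids=$(ps -C dbus-daemon -o ppid=)
    for name in $(lxc-ls --running); do
        pid=$(lxc-info -n "$name" -p | awk '{print $2}')
        echo "$name $pid"
    done | containers_without_dbus "$ppids"
}
```

Sourcing the file and running check_host should print the names of the containers whose init has no dbus-daemon child. The host's own dbus-daemon (PPID 1) never matches a container init PID, so it doesn't affect the result. Taking the PPID list as an argument keeps the comparison itself independent of ps and lxc-info output formats.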
In the log for Nov 21 I can see what looks like a restart, starting with:

  Nov 21 19:55:27 st2 systemd[320]: systemd 215 running in system mode. (+PAM +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ -SECCOMP -APPARMOR)

Full log: https://gist.github.com/x-yuri/8dfe9e561327ad445b1713749cd83252

But I don't understand what triggered it. Different tools report different times for the last boot: last and who -b say Nov 21, journalctl --list-boots shows a single boot beginning Oct 13, and uptime counts 286 days, which puts the boot on Feb 15 (the start date of the host's dbus-daemon above):

  # last reboot
  reboot   system boot  3.16.0-4-amd64   Tue Nov 21 19:55 - 19:56  (00:01)

  wtmp begins Thu Nov 2 17:22:02 2017
  # who -b
           system boot  2017-11-21 19:55
  # journalctl --list-boots
   0 606cc0c448794f2a8573fcdc2ba8d163 Fri 2017-10-13 05:09:18 EEST—Tue 2017-11-28 16:56:18 EET
  # uptime
   16:57:21 up 286 days, 13:22,  1 user,  load average: 3.19, 3.32, 3.33

Is there anything I can check? Any suggestions are welcome.

P.S.

  # cat /etc/issue
  Debian GNU/Linux 8 \n \l

Regards,
Yuri

_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel