[systemd-devel] Boot does not complete: Failed to get D-Bus connection: Failed to authenticate in time.
Dear Folks, I am running Fedora 15 on a 4-core x86_64 machine which received a power surge recently, and spontaneously rebooted. Strange behaviour followed, so I touched /forcefsck and /.autorelabel on separate occasions, rebooting each time. I observe the following behaviour regardless of whether I run setenforce 0 or setenforce 1. 1. Graphical boot does not complete; the Nagios server starts, but systemd announced that cups had failed, and gdm did not start up. 2. I cannot log in on the text consoles; I can type in the username, but no prompt appears for the password. 3. I can log in via ssh as either myself or as root. 4. Attempts to use su fail, but can be interrupted. 5. Attempts to use sudo hang, and require that I kill the process 6. Many attempts to use systemctl (for example, by itself to list the state of all the state of all services) result in a message after a time out, complaining about dbus timing out. 7. A cupsd and a dbus process are both present, but apparently not responding. # systemctl kill cups.service Failed to get D-Bus connection: Failed to authenticate in time. # systemctl Failed to get D-Bus connection: Failed to authenticate in time. I have experience with debugging the old /etc/init.d/script approach, but I am not sure how to proceed with this problem: I am also hampered by insufficient knowledge of dbus. This is beyond my previous experience, and would really appreciate some pointers on how to proceed. I am reading Lennart Poettering's systemd for Administrators now to try to understand how this all works. I have posted to us...@lists.fedoraproject.org, but this list seems to be more appropriate. -- Nick Urbanik http://nicku.org ni...@nicku.org GPG: 7FFA CDC7 5A77 0558 DC7A 790A 16DF EC5B BB9D 2C24 ID: BB9D2C24 ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] systemd hangs on shutdown
On Wed, 28.09.11 11:09, Daniel Drake (d...@laptop.org) wrote: On Tue, Sep 27, 2011 at 9:50 PM, Lennart Poettering lenn...@poettering.net wrote: Boot with systemd.log_level=debug and systemd.log_target=kmsg on the kernel cmdline. Then shutdown and look at the output that is generated then. If you find nothing useful in it, please paste it somewhere so that I can have a look. Nothing jumps out at me so I'd like to take you up on that offer :) http://dev.laptop.org/~dsd/20110928/systemd-debug-shutdown-hang.txt Hmm, so I don't see any obvious either. It might make sense to figure out which of the services hang on shutdown. Starting at 60.141079 you'll find the jobs that systemd determines it needs to execute for shut down. Each time one of these jobs is finished you see this logged, for example in 87.920164. It would make sense to see which jobs from the initial transaction never are finished, i.e. never get the matching Job ... finished message. What's going on with those serial8250: too much work for irq4 message? Note that by default systemd will log short pretty status messages to /dev/console. If that points to your serial driver and that driver is hosed then maybe that causes systemd to freeze? Lennart -- Lennart Poettering - Red Hat, Inc. ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Boot does not complete: Failed to get D-Bus connection: Failed to authenticate in time.
On Thu, 29.09.11 00:16, Nick Urbanik (ni...@nicku.org) wrote: Dear Folks, I am running Fedora 15 on a 4-core x86_64 machine which received a power surge recently, and spontaneously rebooted. Strange behaviour followed, so I touched /forcefsck and /.autorelabel on separate occasions, rebooting each time. I observe the following behaviour regardless of whether I run setenforce 0 or setenforce 1. 1. Graphical boot does not complete; the Nagios server starts, but systemd announced that cups had failed, and gdm did not start up. 2. I cannot log in on the text consoles; I can type in the username, but no prompt appears for the password. 3. I can log in via ssh as either myself or as root. 4. Attempts to use su fail, but can be interrupted. 5. Attempts to use sudo hang, and require that I kill the process 6. Many attempts to use systemctl (for example, by itself to list the state of all the state of all services) result in a message after a time out, complaining about dbus timing out. 7. A cupsd and a dbus process are both present, but apparently not responding. # systemctl kill cups.service Failed to get D-Bus connection: Failed to authenticate in time. # systemctl Failed to get D-Bus connection: Failed to authenticate in time. This looks as if D-Bus is borked for some reason. Is there anything interesting in dmesg or syslog? Do you get a shell if you specify 1 and emergency on the kernel cmdline? Lennart -- Lennart Poettering - Red Hat, Inc. ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] [PATCH 1/4] WIP: service: add watchdog timestamp
--- src/dbus-service.c |4 src/service.c | 20 src/service.h |2 ++ 3 files changed, 26 insertions(+), 0 deletions(-) diff --git a/src/dbus-service.c b/src/dbus-service.c index 3486623..5a740de 100644 --- a/src/dbus-service.c +++ b/src/dbus-service.c @@ -43,6 +43,8 @@ property name=\NotifyAccess\ type=\s\ access=\read\/\n \ property name=\RestartUSec\ type=\t\ access=\read\/\n \ property name=\TimeoutUSec\ type=\t\ access=\read\/\n \ + property name=\WatchdogTimestamp\ type=\t\ access=\read\/\n \ + property name=\WatchdogTimestampMonotonic\ type=\t\ access=\read\/\n \ BUS_EXEC_COMMAND_INTERFACE(ExecStartPre) \ BUS_EXEC_COMMAND_INTERFACE(ExecStart) \ BUS_EXEC_COMMAND_INTERFACE(ExecStartPost) \ @@ -104,6 +106,8 @@ DBusHandlerResult bus_service_message_handler(Unit *u, DBusConnection *connectio { org.freedesktop.systemd1.Service, NotifyAccess, bus_service_append_notify_access, s, u-service.notify_access }, { org.freedesktop.systemd1.Service, RestartUSec, bus_property_append_usec, t, u-service.restart_usec }, { org.freedesktop.systemd1.Service, TimeoutUSec, bus_property_append_usec, t, u-service.timeout_usec }, +{ org.freedesktop.systemd1.Service, WatchdogTimestamp, bus_property_append_usec, t, u-service.watchdog_timestamp.realtime }, +{ org.freedesktop.systemd1.Service, WatchdogTimestampMonotonic,bus_property_append_usec,t, u-service.watchdog_timestamp.monotonic }, BUS_EXEC_COMMAND_PROPERTY(org.freedesktop.systemd1.Service, u-service.exec_command[SERVICE_EXEC_START_PRE], ExecStartPre), BUS_EXEC_COMMAND_PROPERTY(org.freedesktop.systemd1.Service, u-service.exec_command[SERVICE_EXEC_START], ExecStart), BUS_EXEC_COMMAND_PROPERTY(org.freedesktop.systemd1.Service, u-service.exec_command[SERVICE_EXEC_START_POST], ExecStartPost), diff --git a/src/service.c b/src/service.c index c2053ce..6fbc023 100644 --- a/src/service.c +++ b/src/service.c @@ -194,6 +194,19 @@ static void service_connection_unref(Service *s) { s-accept_socket = NULL; } +static void service_stop_watchdog(Service *s) { +assert(s); + +s-watchdog_timestamp.realtime = 0; +s-watchdog_timestamp.monotonic = 0; +} + +static void service_reset_watchdog(Service *s) { +assert(s); + +dual_timestamp_get(s-watchdog_timestamp); +} + static void service_done(Unit *u) { Service *s = SERVICE(u); @@ -1514,6 +1527,9 @@ static void service_set_state(Service *s, ServiceState state) { service_connection_unref(s); } +if (state == SERVICE_STOP) +service_stop_watchdog(s); + /* For the inactive states unit_notify() will trim the cgroup, * but for exit we have to do that ourselves... */ if (state == SERVICE_EXITED s-meta.manager-n_reloading = 0) @@ -2985,6 +3001,10 @@ static void service_notify_message(Unit *u, pid_t pid, char **tags) { } } +if (strv_find(tags, WATCHDOG=1)) { +log_debug(%s: got WATCHDOG=1, u-meta.id); +service_reset_watchdog(s); +} /* Notify clients about changed status or main pid */ unit_add_to_dbus_queue(u); diff --git a/src/service.h b/src/service.h index e28f74b..3801e84 100644 --- a/src/service.h +++ b/src/service.h @@ -98,6 +98,8 @@ struct Service { usec_t restart_usec; usec_t timeout_usec; +dual_timestamp watchdog_timestamp; + ExecCommand* exec_command[_SERVICE_EXEC_COMMAND_MAX]; ExecContext exec_context; -- 1.7.5.4 ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Restart=always and ExecStartPre
On Wed, 28 Sep 2011 17:49:54 +0200 Reindl Harald wrote: why does systemd not restart a killed service if the ExecStartPre-process is still running, see below - at my opinion after killall afpd the service should be restarted and in a perfect case even if ExecStartPre-process dies systemd-26-10.fc15.x86_64 I see at least three issues here. See below. [root@testserver:~]$ cat /lib/systemd/system/netatalk.service [Unit] Description=Apple-File-Server After=syslog.target network.target avahi-daemon.service [Service] Type=forking PIDFile=/var/run/netatalk.pid ExecStartPre=/usr/sbin/cnid_metad -l log_note issue #1: ExecStartPre is not supposed to fork off daemons. A future version of systemd might even enforce this rule. The service should be split into two. ... [root@testserver:~]$ systemctl status netatalk.service netatalk.service - Apple-File-Server Loaded: loaded (/lib/systemd/system/netatalk.service) Active: active (running) since Wed, 28 Sep 2011 17:45:55 ... Main PID: 1812 (code=exited, status=0/SUCCESS) issue #2: This Main PID looks like stale information from a previous run of the service. This is a minor bug in systemd that it does not reset it. There are two reasons why systemd failed to detect the new main PID: - issue #3: afpd has a racy daemonization sequence. It writes its PID file too late. My recently proposed patch service: delayed main PID guessing should be able to workaround it. - Given that systemd could not read the information from the PID file, it tried to guess the main PID, but it also failed, because there are two top-level processes in the cgroup: CGroup: name=systemd:/system/netatalk.service ├ 1999 /usr/sbin/cnid_metad -l log_note └ 2002 /usr/sbin/afpd -P /var/run/netatalk.pid -F /etc/netatalk/afpd.conf systemd could tell which one of them is the main PID. When the main PID is not known, the only way to detect the death of the service is to watch for the cgroup getting empty. Michal ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel