[systemd-devel] Boot does not complete: Failed to get D-Bus connection: Failed to authenticate in time.

2011-09-28 Thread Nick Urbanik

Dear Folks,

I am running Fedora 15 on a 4-core x86_64 machine which received a
power surge recently, and spontaneously rebooted.  Strange behaviour
followed, so I touched /forcefsck and /.autorelabel on separate
occasions, rebooting each time.  I observe the following behaviour
regardless of whether I run setenforce 0 or setenforce 1.

1. Graphical boot does not complete; the Nagios server starts, but
   systemd announced that cups had failed, and gdm did not start up.
2. I cannot log in on the text consoles; I can type in the username,
   but no prompt appears for the password.
3. I can log in via ssh as either myself or as root.
4. Attempts to use su fail, but can be interrupted.
5. Attempts to use sudo hang, and require that I kill the process
6. Many attempts to use systemctl (for example, by itself to list the
   state of all the state of all services) result in a message after a
   time out, complaining about dbus timing out.
7. A cupsd and a dbus process are both present, but apparently not
   responding.

# systemctl kill cups.service
Failed to get D-Bus connection: Failed to authenticate in time.
# systemctl
Failed to get D-Bus connection: Failed to authenticate in time.

I have experience with debugging the old /etc/init.d/script approach,
but I am not sure how to proceed with this problem: I am also hampered
by insufficient knowledge of dbus.  This is beyond my previous
experience, and would really appreciate some pointers on how to
proceed.

I am reading Lennart Poettering's systemd for Administrators now to
try to understand how this all works.  I have posted to
us...@lists.fedoraproject.org, but this list seems to be more
appropriate.
--
Nick Urbanik http://nicku.org   ni...@nicku.org
GPG: 7FFA CDC7 5A77 0558 DC7A 790A 16DF EC5B BB9D 2C24 ID: BB9D2C24
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] systemd hangs on shutdown

2011-09-28 Thread Lennart Poettering
On Wed, 28.09.11 11:09, Daniel Drake (d...@laptop.org) wrote:

 
 On Tue, Sep 27, 2011 at 9:50 PM, Lennart Poettering
 lenn...@poettering.net wrote:
  Boot with systemd.log_level=debug and systemd.log_target=kmsg on the
  kernel cmdline. Then shutdown and look at the output that is generated
  then. If you find nothing useful in it, please paste it somewhere so
  that I can have a look.
 
 Nothing jumps out at me so I'd like to take you up on that offer :)
 http://dev.laptop.org/~dsd/20110928/systemd-debug-shutdown-hang.txt

Hmm, so I don't see any obvious either.

It might make sense to figure out which of the services hang on
shutdown. Starting at 60.141079 you'll find the jobs that systemd
determines it needs to execute for shut down. Each time one of these
jobs is finished you see this logged, for example in 87.920164. It would
make sense to see which jobs from the initial transaction never are
finished, i.e. never get the matching Job ... finished message.

What's going on with those serial8250: too much work for irq4 message?
Note  that by default systemd will log short pretty status messages to
/dev/console. If that points to your serial driver and that driver is
hosed then maybe that causes systemd to freeze?

Lennart

-- 
Lennart Poettering - Red Hat, Inc.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Boot does not complete: Failed to get D-Bus connection: Failed to authenticate in time.

2011-09-28 Thread Lennart Poettering
On Thu, 29.09.11 00:16, Nick Urbanik (ni...@nicku.org) wrote:

 Dear Folks,
 
 I am running Fedora 15 on a 4-core x86_64 machine which received a
 power surge recently, and spontaneously rebooted.  Strange behaviour
 followed, so I touched /forcefsck and /.autorelabel on separate
 occasions, rebooting each time.  I observe the following behaviour
 regardless of whether I run setenforce 0 or setenforce 1.
 
 1. Graphical boot does not complete; the Nagios server starts, but
systemd announced that cups had failed, and gdm did not start up.
 2. I cannot log in on the text consoles; I can type in the username,
but no prompt appears for the password.
 3. I can log in via ssh as either myself or as root.
 4. Attempts to use su fail, but can be interrupted.
 5. Attempts to use sudo hang, and require that I kill the process
 6. Many attempts to use systemctl (for example, by itself to list the
state of all the state of all services) result in a message after a
time out, complaining about dbus timing out.
 7. A cupsd and a dbus process are both present, but apparently not
responding.
 
 # systemctl kill cups.service
 Failed to get D-Bus connection: Failed to authenticate in time.
 # systemctl
 Failed to get D-Bus connection: Failed to authenticate in time.

This looks as if D-Bus is borked for some reason.

Is there anything interesting in dmesg or syslog?

Do you get a shell if you specify 1 and emergency on the kernel
cmdline?

Lennart

-- 
Lennart Poettering - Red Hat, Inc.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 1/4] WIP: service: add watchdog timestamp

2011-09-28 Thread Michael Olbrich
---
 src/dbus-service.c |4 
 src/service.c  |   20 
 src/service.h  |2 ++
 3 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/src/dbus-service.c b/src/dbus-service.c
index 3486623..5a740de 100644
--- a/src/dbus-service.c
+++ b/src/dbus-service.c
@@ -43,6 +43,8 @@
   property name=\NotifyAccess\ type=\s\ access=\read\/\n \
   property name=\RestartUSec\ type=\t\ access=\read\/\n \
   property name=\TimeoutUSec\ type=\t\ access=\read\/\n \
+  property name=\WatchdogTimestamp\ type=\t\ 
access=\read\/\n \
+  property name=\WatchdogTimestampMonotonic\ type=\t\ 
access=\read\/\n \
 BUS_EXEC_COMMAND_INTERFACE(ExecStartPre)  \
 BUS_EXEC_COMMAND_INTERFACE(ExecStart) \
 BUS_EXEC_COMMAND_INTERFACE(ExecStartPost) \
@@ -104,6 +106,8 @@ DBusHandlerResult bus_service_message_handler(Unit *u, 
DBusConnection *connectio
 { org.freedesktop.systemd1.Service, NotifyAccess,  
 bus_service_append_notify_access, s, u-service.notify_access   },
 { org.freedesktop.systemd1.Service, RestartUSec,   
 bus_property_append_usec,   t, u-service.restart_usec  },
 { org.freedesktop.systemd1.Service, TimeoutUSec,   
 bus_property_append_usec,   t, u-service.timeout_usec  },
+{ org.freedesktop.systemd1.Service, WatchdogTimestamp, 
 bus_property_append_usec,   t, u-service.watchdog_timestamp.realtime },
+{ org.freedesktop.systemd1.Service, 
WatchdogTimestampMonotonic,bus_property_append_usec,t, 
u-service.watchdog_timestamp.monotonic },
 BUS_EXEC_COMMAND_PROPERTY(org.freedesktop.systemd1.Service, 
u-service.exec_command[SERVICE_EXEC_START_PRE],  ExecStartPre),
 BUS_EXEC_COMMAND_PROPERTY(org.freedesktop.systemd1.Service, 
u-service.exec_command[SERVICE_EXEC_START],  ExecStart),
 BUS_EXEC_COMMAND_PROPERTY(org.freedesktop.systemd1.Service, 
u-service.exec_command[SERVICE_EXEC_START_POST], ExecStartPost),
diff --git a/src/service.c b/src/service.c
index c2053ce..6fbc023 100644
--- a/src/service.c
+++ b/src/service.c
@@ -194,6 +194,19 @@ static void service_connection_unref(Service *s) {
 s-accept_socket = NULL;
 }
 
+static void service_stop_watchdog(Service *s) {
+assert(s);
+
+s-watchdog_timestamp.realtime = 0;
+s-watchdog_timestamp.monotonic = 0;
+}
+
+static void service_reset_watchdog(Service *s) {
+assert(s);
+
+dual_timestamp_get(s-watchdog_timestamp);
+}
+
 static void service_done(Unit *u) {
 Service *s = SERVICE(u);
 
@@ -1514,6 +1527,9 @@ static void service_set_state(Service *s, ServiceState 
state) {
 service_connection_unref(s);
 }
 
+if (state == SERVICE_STOP)
+service_stop_watchdog(s);
+
 /* For the inactive states unit_notify() will trim the cgroup,
  * but for exit we have to do that ourselves... */
 if (state == SERVICE_EXITED  s-meta.manager-n_reloading = 0)
@@ -2985,6 +3001,10 @@ static void service_notify_message(Unit *u, pid_t pid, 
char **tags) {
 }
 
 }
+if (strv_find(tags, WATCHDOG=1)) {
+log_debug(%s: got WATCHDOG=1, u-meta.id);
+service_reset_watchdog(s);
+}
 
 /* Notify clients about changed status or main pid */
 unit_add_to_dbus_queue(u);
diff --git a/src/service.h b/src/service.h
index e28f74b..3801e84 100644
--- a/src/service.h
+++ b/src/service.h
@@ -98,6 +98,8 @@ struct Service {
 usec_t restart_usec;
 usec_t timeout_usec;
 
+dual_timestamp watchdog_timestamp;
+
 ExecCommand* exec_command[_SERVICE_EXEC_COMMAND_MAX];
 ExecContext exec_context;
 
-- 
1.7.5.4

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Restart=always and ExecStartPre

2011-09-28 Thread Michal Schmidt
On Wed, 28 Sep 2011 17:49:54 +0200 Reindl Harald wrote:
 why does systemd not restart a killed service if the
 ExecStartPre-process is still running, see below - at my opinion
 after killall afpd the service should be restarted and in a perfect
 case even if ExecStartPre-process dies
 
 systemd-26-10.fc15.x86_64

I see at least three issues here. See below.

 [root@testserver:~]$ cat /lib/systemd/system/netatalk.service
 [Unit]
 Description=Apple-File-Server
 After=syslog.target network.target avahi-daemon.service
 [Service]
 Type=forking
 PIDFile=/var/run/netatalk.pid
 ExecStartPre=/usr/sbin/cnid_metad -l log_note

issue #1:
ExecStartPre is not supposed to fork off daemons. A future version of
systemd might even enforce this rule. The service should be split into
two.

 ...
 [root@testserver:~]$ systemctl status netatalk.service
 netatalk.service - Apple-File-Server
   Loaded: loaded (/lib/systemd/system/netatalk.service)
   Active: active (running) since Wed, 28 Sep 2011 17:45:55
...
 Main PID: 1812 (code=exited, status=0/SUCCESS)

issue #2:
This Main PID looks like stale information from a previous run of the
service. This is a minor bug in systemd that it does not reset it.

There are two reasons why systemd failed to detect the new main PID:
 - issue #3: afpd has a racy daemonization sequence. It writes its PID
   file too late. My recently proposed patch service: delayed main PID
   guessing should be able to workaround it.
 - Given that systemd could not read the information from the PID file,
   it tried to guess the main PID, but it also failed, because there
   are two top-level processes in the cgroup:

 CGroup: name=systemd:/system/netatalk.service
 ├ 1999 /usr/sbin/cnid_metad -l log_note
 └ 2002 /usr/sbin/afpd -P /var/run/netatalk.pid -F /etc/netatalk/afpd.conf

systemd could tell which one of them is the main PID.

When the main PID is not known, the only way to detect the death of the
service is to watch for the cgroup getting empty.

Michal
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel