On Wed, Sep 28, 2011 at 4:03 PM, Lennart Poettering <lenn...@poettering.net> wrote: > Hmm, so I don't see any obvious either.
I dug a little further. I have enabled lockdep to see if it can find any deadlocks, and nothing comes up. I also turned to sysrq-D (show locks held) - no locks are held at the point of hang. sysrq-T (show task state) shows the following userspace processes running: systemd, udevd (several threads/instances), systemd-stdout-, systemd-kmsg-sy, plymouthd. (they are all epolling stuff) Can you confirm that at this point, those tasks should still be running (and should not have been killed by the mass kill of processes that just happened)? Just wondering if systemd is waiting for them to die, or is waiting on some other condition. Another observation: after waiting several minutes, systemd-stdout-syslog-bridge.service generates a notification message and seems to nudge systemd a little closer to completing the shutdown routine (please see 387.600571 in the full logs below). Full log of startup, shutdown, and sysrq dumps at point of first hang (before systemd-stdout-syslog-bridge.service wakeup), and second hang, then sysrq dumps again: http://dev.laptop.org/~dsd/20110929/systemd-shutdown-hang-debug.txt (Just FYI: A couple of things have changed since the last mails: I updated to systemd-36 and I updated the kernel to linus master, and patched in https://lkml.org/lkml/2011/8/9/318 which fixed warnings caused when I enabled lockdep) > It might make sense to figure out which of the services hang on > shutdown. Starting at 60.141079 you'll find the jobs that systemd > determines it needs to execute for shut down. Each time one of these > jobs is finished you see this logged, for example in 87.920164. It would > make sense to see which jobs from the initial transaction never are > finished, i.e. never get the matching "Job ... finished" message. OK, done this for the new log. here are my observations (sorry if any of them are stupid!): NetworkManager job is marked done at 60.850964 but logs messages afterwards? Weird ordering of messages here: [ 61.831071] systemd[1]: Child 1464 belongs to var-lib-random\x2dseed.mount [ 61.837978] systemd[1]: var-lib-random\x2dseed.mount mount process exited, code=exited status=0 [ 61.847778] systemd[1]: var-lib-random\x2dseed.mount changed unmounting -> dead [ 61.890550] systemd[1]: Job var-lib-random\x2dseed.mount/stop finished, result=done snip [ 62.298212] systemd[1]: Got SIGCHLD for process 1477 (umount) [ 62.305426] systemd[1]: Child 1477 died (code=exited, status=0/SUCCESS) [ 62.312138] systemd[1]: Child 1477 belongs to var-lib-dbus.mount [ 62.318182] systemd[1]: var-lib-dbus.mount mount process exited, code=exited status=0 [ 62.327267] systemd[1]: var-lib-dbus.mount changed unmounting -> dead [ 62.370552] systemd[1]: Job var-lib-dbus.mount/stop finished, result=done snip [ 62.955961] systemd[1]: var-lib-random\x2dseed.mount changed dead -> mounted [ 62.967152] systemd[1]: var-lib-dbus.mount changed dead -> mounted Why do they get marked as mounted after being unmounted? The other similar mounts all got 'collected', these 2 did not. Is it normal for the systemd-kmsg-syslogd service to be *started* quite late in the shutdown process? [ 63.062468] systemd[1]: Incoming traffic on syslog.socket [ 63.067936] systemd[1]: Trying to enqueue job systemd-kmsg-syslogd.service/start/replace [ 63.076120] systemd[1]: Installed new job systemd-kmsg-syslogd.service/start as 343 [ 63.083842] systemd[1]: Enqueued job systemd-kmsg-syslogd.service/start as 343 [ 63.091135] systemd[1]: syslog.socket changed listening -> running [ 63.097566] systemd[1]: About to execute: /lib/systemd/systemd-kmsg-syslogd [ 63.140299] systemd[1]: Forked /lib/systemd/systemd-kmsg-syslogd as 1510 [ 63.148024] systemd[1]: systemd-kmsg-syslogd.service changed dead -> running Every single "Installed new job" line has a corresponding "Job foo finished, result=done" line (I checked carefully). However, there are some other inconsistencies (not sure how relevant): Some services have a "cgroup is empty" message logged before or after they are marked as done (such as prefdm.service, avahi-daemon.service_ bu some services have no such line logged (e.g. sshd-keygen.service, console-kit-daemon.service) Some mounts/services are "collected" (e.g. console-kit-daemon.service, var-spool.mount) and others are not (e.g. crond.service, var-log.mount, var-lib-dbus.mount) I can't see anything that tries to shut down systemd-stdout-syslog-bridge.service. Is that normal? Might systemd be waiting for it? Thats all I see. > What's going on with those "serial8250: too much work for irq4" message? > Note that by default systemd will log short pretty status messages to > /dev/console. If that points to your serial driver and that driver is > hosed then maybe that causes systemd to freeze? Not sure. Its the standard x86 legacy serial driver - nothing special. The default console is indeed a serial console (thats how I'm logging). Those messages go away if I use tty0 as the default console (but the hang doesn't). And in fact they don't always appear (e.g. no occurrences in the log referenced in this mail). Any new ideas? Thanks a lot for your help on this. Daniel _______________________________________________ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel