On Wed, Oct 22, 2014 at 4:59 PM, Lennart Poettering <lenn...@poettering.net> wrote: > On Wed, 22.10.14 13:10, Damien Robert (damien.olivier.robert+gm...@gmail.com) > wrote: > >> On one boot I got a watchdog timeout on systemd-journald: >> >> Oct 21 20:08:21 feanor systemd-journal[213]: Permanent journal is using >> 68.7M (m >> Oct 21 20:08:21 feanor systemd-journal[213]: Time spent on flushing to /var >> is 2 >> Oct 21 20:08:25 feanor kernel: IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not >> rea >> Oct 21 20:08:25 feanor kernel: wlan0: authenticate with f4:ca:e5:cc:c7:40 >> Oct 21 20:08:25 feanor kernel: wlan0: send auth to f4:ca:e5:cc:c7:40 (try >> 1/3) >> Oct 21 20:08:25 feanor kernel: wlan0: authenticated >> Oct 21 20:08:25 feanor kernel: wlan0: associate with f4:ca:e5:cc:c7:40 (try >> 1/3) >> Oct 21 20:08:25 feanor kernel: wlan0: RX AssocResp from f4:ca:e5:cc:c7:40 >> (capab >> Oct 21 20:08:25 feanor kernel: wlan0: associated >> Oct 21 20:08:25 feanor kernel: IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link >> become >> Oct 21 20:08:20 feanor systemd[1]: Started Trigger Flushing of Journal to >> Persis >> Oct 21 20:08:20 feanor systemd[1]: Started Create Volatile Files and >> Directories >> Oct 21 20:08:20 feanor systemd[1]: Starting Network Time Synchronization... >> Oct 21 20:08:20 feanor systemd[1]: Starting Update UTMP about System >> Boot/Shutdo >> Oct 21 20:08:20 feanor systemd[1]: Started Update UTMP about System >> Boot/Shutdow >> Oct 21 20:08:20 feanor systemd[1]: Started Network Time Synchronization. >> Oct 21 20:08:20 feanor systemd[1]: Starting System Initialization. >> Oct 21 20:08:21 feanor systemd[1]: Reached target System Initialization. >> Oct 21 20:08:21 feanor systemd[1]: Starting sshd.socket. >> Oct 21 20:08:21 feanor systemd[1]: Listening on sshd.socket. >> Oct 21 20:08:21 feanor systemd[1]: Starting D-Bus System Message Bus Socket. >> Oct 21 20:08:25 feanor systemd-logind[516]: New seat seat0. >> Oct 21 20:08:25 feanor systemd-networkd[518]: lo : gained >> carrier >> Oct 21 20:08:25 feanor systemd-logind[516]: Watching system buttons on >> /dev/inpu >> Oct 21 20:08:25 feanor systemd-logind[516]: Watching system buttons on >> /dev/inpu >> Oct 21 20:08:25 feanor systemd-logind[516]: Watching system buttons on >> /dev/inpu >> Oct 21 20:08:25 feanor systemd[1]: Started Network Service. >> Oct 21 20:08:25 feanor systemd[1]: Starting Network. >> Oct 21 20:08:25 feanor systemd[1]: Reached target Network. >> Oct 21 20:08:25 feanor systemd[1]: Starting Network Name Resolution... >> Oct 21 20:11:20 feanor systemd-journal[574]: Permanent journal is using >> 75.1M (m >> Oct 21 20:11:20 feanor kernel: e1000e 0000:00:19.0: irq 55 for MSI/MSI-X >> Oct 21 20:11:20 feanor kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not >> read >> Oct 21 20:11:20 feanor systemd[1]: systemd-journald.service watchdog timeout >> (li >> Oct 21 20:11:20 feanor systemd[1]: Starting Journal Service... >> Oct 21 20:11:20 feanor systemd[1]: Started Getty on tty2. >> Oct 21 20:11:20 feanor systemd[1]: lightdm.service start operation timed >> out. Te >> Oct 21 20:11:20 feanor systemd[1]: systemd-timesyncd.service watchdog >> timeout (l >> Oct 21 20:11:20 feanor systemd[1]: systemd-journald.service stop-sigterm >> timed o >> Oct 21 20:11:20 feanor systemd[1]: Starting Journal Service... >> Oct 21 20:11:20 feanor systemd[1]: systemd-journald.service: main process >> exited >> Oct 21 20:11:20 feanor systemd[1]: Unit systemd-journald.service entered >> failed >> Oct 21 20:11:20 feanor systemd[1]: systemd-journald.service has no holdoff >> time, >> Oct 21 20:11:20 feanor systemd[1]: Stopping Journal Service... >> Oct 21 20:11:20 feanor systemd[1]: Starting Journal Service... >> Oct 21 20:11:20 feanor systemd[1]: Started Journal Service. >> >> As you can see at 20:08:20 things appears to look fine, but everything >> get stuck during 3 minutes until there is a watchdog timeout for the >> journal and it gets restarted. Then the boot finished without any further >> problem. The next time everything worked fine, so I am wondering what could >> have happened? > > That's difficult to say just from these logs. Can you reliably > reproduce this? If so, can you attach strace to journald before this > happens and see what it is doing?
I am wondering if we should send another signal that generates core when the watchdog is not fed. It would be nice to get the coredump of a frozen process. Umut > > What I find interesting is that after the 3min pause suddenly the > network driver allocates irq 55. This kinda makes me wonder if this > might be a kernel/driver problem of some kind, where for some reason > userspace (and hence journald) don't get scheduled? > > What distro/version/arch is this? > > Lennart > > -- > Lennart Poettering, Red Hat > _______________________________________________ > systemd-devel mailing list > systemd-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/systemd-devel _______________________________________________ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel