Re: [systemd-devel] heads-up: chasing journal(?) related regression in 219 causing boot hang/fail
On Mon, 27.04.15 17:49, Christian Hesse (l...@eworm.de) wrote:
>> As for the actual lockup, I'm afraid I don't understand at all what is happening (I'm not familiar at all with how journald interacts with other services and D-Bus/logind). So from my POV my best recommendation would be to revert commit 13790add4 upstream for now until this gets understood and fixed properly, especially if/when version 220 should be released. Breaking booting is much worse than not being able to restart journald.
>
> Any news about this one? Looks like everybody is waiting for a fix and nobody is working on it...

See: http://lists.freedesktop.org/archives/systemd-devel/2015-April/031327.html

Lennart

--
Lennart Poettering, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] heads-up: chasing journal(?) related regression in 219 causing boot hang/fail
Martin Pitt martin.p...@ubuntu.com on Sat, 2015/04/11 10:38:
> Hello Tobias,
>
> [...]
>
> As for the actual lockup, I'm afraid I don't understand at all what is happening (I'm not familiar at all with how journald interacts with other services and D-Bus/logind). So from my POV my best recommendation would be to revert commit 13790add4 upstream for now until this gets understood and fixed properly, especially if/when version 220 should be released. Breaking booting is much worse than not being able to restart journald.

Any news about this one? Looks like everybody is waiting for a fix and nobody is working on it...

I do not know how to debug this. If I can help, let me know.
Re: [systemd-devel] heads-up: chasing journal(?) related regression in 219 causing boot hang/fail
Hello Tobias,

Tobias Hunger [2015-04-11 2:17 +0200]:
> did you make any progress with this bug? Apparently the same issue is blocking systemd-219 from getting into Arch Linux (https://bugs.archlinux.org/task/44016), so this seems to be a widespread issue. Is anyone taking a serious look into this issue?

Sorry, no, I was pretty busy with making systemd work well enough for the impending Debian and Ubuntu releases. A few weeks ago I mostly wanted to see whether this was specific to Debian/Ubuntu somehow, and I couldn't reproduce it in a VM with Fedora 21 plus dbus and systemd from rawhide. But in the meantime we got plenty of confirmations that it affects Fedora and now Arch, so I don't believe this is actually related to D-Bus or something like that.

As for the actual lockup, I'm afraid I don't understand at all what is happening (I'm not familiar at all with how journald interacts with other services and D-Bus/logind). So from my POV my best recommendation would be to revert commit 13790add4 upstream for now until this gets understood and fixed properly, especially if/when version 220 should be released. Breaking booting is much worse than not being able to restart journald.

Martin

--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)
Re: [systemd-devel] heads-up: chasing journal(?) related regression in 219 causing boot hang/fail
Hi Martin,

did you make any progress with this bug? Apparently the same issue is blocking systemd-219 from getting into Arch Linux (https://bugs.archlinux.org/task/44016), so this seems to be a widespread issue. Is anyone taking a serious look into this issue?

Best Regards,
Tobias

On Mon, Mar 2, 2015 at 11:29 AM, Martin Pitt martin.p...@ubuntu.com wrote:
> Hey Lennart,
>
> Lennart Poettering [2015-02-28 13:05 +0100]:
>> Any idea about the details of this?
>
> For the record, I'm still working on this on-and-off (I got some other urgent things to work on, though). [...]
>
> Of course I'll follow up here once I find out more.
Re: [systemd-devel] heads-up: chasing journal(?) related regression in 219 causing boot hang/fail
Hey Lennart,

Lennart Poettering [2015-02-28 13:05 +0100]:
> Any idea about the details of this?

For the record, I'm still working on this on-and-off (I got some other urgent things to work on, though). It took me a while to install Fedora, as the rawhide images and upgrade are both broken ATM, but I now have F21 with rawhide's dbus and systemd in a VM. That's fairly close to Debian sid plus systemd from experimental. We did get reports about that hang under Debian, so it's at least likely that Fedora is affected too. But so far my simple reproducer doesn't trigger it under either, so I need to keep digging and experimenting, and probably go with the full reboot test iterations.

Of course I'll follow up here once I find out more.

Martin

--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)
Re: [systemd-devel] heads-up: chasing journal(?) related regression in 219 causing boot hang/fail
On Sun, 22.02.15 17:55, Martin Pitt (martin.p...@ubuntu.com) wrote:
> Lennart Poettering [2015-02-20 17:02 +0100]:
>> To me this appears as if dbus is hanging for some reason. Have you checked what dbus is doing?
>
> D-Bus itself seems to be fine. There are services on it, busctl works, etc.
>
> Anyway, I now have a capable enough arsenal to reproduce that hang fully automatically and a git bisect run script, which after a few hours of grinding spat out that this is the culprit:
>
> http://cgit.freedesktop.org/systemd/systemd/commit/?id=13790add4bf64
> (journald: allow restarting journald without losing stream connections)
>
> and indeed reverting that on top of current git master (reverts cleanly) makes things work perfectly again. I haven't drilled down into the patch itself yet, that's not something I want to start doing on a Sunday :-)

Any idea about the details of this?

Is this reproducible with unpatched 219 (in particular without the fsckd patches applied)?

I have never seen this issue. Can you tell me how to reproduce this on my machine? Does your boot process restart journald or so?

Lennart

--
Lennart Poettering, Red Hat
Re: [systemd-devel] heads-up: chasing journal(?) related regression in 219 causing boot hang/fail
Lennart Poettering [2015-02-20 17:02 +0100]:
> To me this appears as if dbus is hanging for some reason. Have you checked what dbus is doing?

D-Bus itself seems to be fine. There are services on it, busctl works, etc.

Anyway, I now have a capable enough arsenal to reproduce that hang fully automatically and a git bisect run script, which after a few hours of grinding spat out that this is the culprit:

http://cgit.freedesktop.org/systemd/systemd/commit/?id=13790add4bf64
(journald: allow restarting journald without losing stream connections)

and indeed reverting that on top of current git master (reverts cleanly) makes things work perfectly again. I haven't drilled down into the patch itself yet, that's not something I want to start doing on a Sunday :-)

Martin

--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)
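The bisect Martin mentions relies on `git bisect run`, which interprets the exit status of a test script: 0 means good, 125 means skip this commit, and any other value from 1 to 127 means bad. His actual script is not in the thread, so the following is only a hypothetical sketch: the file name `bisect_hang.py`, the helper `classify`, and the wrapped test command `./boot-test.sh` are placeholders. Because the hang is intermittent (Didier needed 15 boots to hit it once), the sketch repeats the test several times and treats any timeout as the hang.

```python
#!/usr/bin/env python3
"""Hypothetical `git bisect run` predicate for an intermittent boot hang.

Usage (file and test-command names are placeholders):
    git bisect run ./bisect_hang.py ./boot-test.sh
"""
import subprocess
import sys

# Exit codes understood by `git bisect run`:
# 0 = good, 125 = skip this commit, 1..127 except 125 = bad.
GOOD, BAD, SKIP = 0, 1, 125


def classify(cmd, runs=20, timeout=300.0):
    """Run `cmd` up to `runs` times; a single timeout marks the commit bad."""
    for _ in range(runs):
        try:
            result = subprocess.run(cmd, timeout=timeout)
        except subprocess.TimeoutExpired:
            # subprocess.run() kills the child before re-raising.
            return BAD
        except OSError:
            # The test command could not be started at all.
            return SKIP
        if result.returncode != 0:
            # Assumed to mean a build/setup failure, not the hang itself.
            return SKIP
    # Survived every iteration: only probabilistic evidence of "good".
    return GOOD


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(classify(sys.argv[1:]))
```

The design choice here is that only a hang counts as "bad", while other failures are skipped so that unrelated build breakage does not mislead the bisection. With a heisenbug, a "good" verdict is never conclusive; raising `runs` trades bisection time for confidence.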
[systemd-devel] heads-up: chasing journal(?) related regression in 219 causing boot hang/fail
Hello all,

Since we updated to 219 in Ubuntu, several people reported boot failures. Booting hangs for a long time after starting D-Bus, and in the journal you get a lot of error messages like

  systemd[1]: Failed to register match for Disconnected message: Connection timed out
  systemd-logind[749]: Failed to fully start up daemon: Connection timed out
  dbus[800]: [system] Failed to activate service 'org.freedesktop.PolicyKit1': timed out

polkitd isn't running. This causes lots of jobs (logind, NetworkManager, avahi, etc.) to get stuck in an eternal retry loop.

Unfortunately reproducing this is a real nuisance, a classic heisenbug. I'm now able to trigger it (sometimes) in a VM, but I still haven't found a reliable recipe for reproducing it, so bisecting just takes ages. I'm keeping a debug log, notes, and progress in https://launchpad.net/bugs/1423811 FTR.

This is mostly a heads-up for other distros in case they also get reports like this, to shortcut the debugging exercise (I already wasted 7 hours on this, and I'm not even close to the solution). Quite surprisingly, it's somewhere in journald: running 218 with journald from 219 causes the hang, while 219 with journald from 218 is fine.

Martin

--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)
Re: [systemd-devel] heads-up: chasing journal(?) related regression in 219 causing boot hang/fail
2015-02-20 15:36 GMT+01:00 Martin Pitt martin.p...@ubuntu.com:
> Hello all,
>
> Since we updated to 219 in Ubuntu, several people reported boot failures. [...]
>
> Quite surprisingly it's somewhere in journald. Running 218 with journald from 219 causes the hang, 219 with journald from 218 is fine.

I noticed this as well. Interestingly, it only ever happened after applying the fsckd patches and running with plymouth enabled.

--
Why is it that all of the instruments seeking intelligent life in the universe are pointed away from Earth?
Re: [systemd-devel] heads-up: chasing journal(?) related regression in 219 causing boot hang/fail
On 20/02/2015 15:41, Michael Biebl wrote:
> 2015-02-20 15:36 GMT+01:00 Martin Pitt martin.p...@ubuntu.com:
>> [...]
>> Quite surprisingly it's somewhere in journald. Running 218 with journald from 219 causes the hang, 219 with journald from 218 is fine.
>
> I noticed this as well. Interestingly, it only ever happened after applying the fsckd patches and running with plymouth enabled.

We get it with "quiet splash" removed as well. It's just that it's random, depending on the machine load.

We already ruled out the fsckd patch yesterday, but after this comment I retried today on my own VMs with the systemd/udev/… Ubuntu package 218-8ubuntu2, which doesn't contain the fsckd patch (and has the 218 and 219 systemd-fsck writing to /dev/console), plus the systemd-journal binary copied from 219-1ubuntu1. I was able to reproduce the hang after 15 boots.

Cheers,
Didier
Re: [systemd-devel] heads-up: chasing journal(?) related regression in 219 causing boot hang/fail
On Fri, Feb 20, 2015 at 04:01:34PM +0100, Didier Roche wrote:
> [...] I was able to reproduce the hang after 15 boots.

Anything interesting if you attach gdb to systemd-journald? Can you paste bt and the Server variable (IIRC, it's *s in main)? How many open fds does sd-journald have?

Zbyszek
Re: [systemd-devel] heads-up: chasing journal(?) related regression in 219 causing boot hang/fail
On Fri, 20.02.15 15:36, Martin Pitt (martin.p...@ubuntu.com) wrote:
> Hello all,
>
> Since we updated to 219 in Ubuntu, several people reported boot failures. Booting hangs a long time after starting D-Bus, in the journal you get a lot of error messages like [...]
>
> Quite surprisingly it's somewhere in journald. Running 218 with journald from 219 causes the hang, 219 with journald from 218 is fine.

Umm, please don't mix things like this...

To me this appears as if dbus is hanging for some reason. Have you checked what dbus is doing?

Lennart

--
Lennart Poettering, Red Hat
Re: [systemd-devel] heads-up: chasing journal(?) related regression in 219 causing boot hang/fail
Hey Zbigniew,

Zbigniew Jędrzejewski-Szmek [2015-02-20 16:08 +0100]:
> Anything interesting if you attach gdb to systemd-journald? Can you paste bt

It's in __epoll_wait_nocancel (glibc) → sd_event_wait() → sd_event_run() → main(), nothing surprising I'd say.

> and the Server variable (IIRC, it's *s in main)?

http://imagebin.org/33

gdb and other commands are a bit awkward to use in the debug shell, as systemd keeps painting its "(1 of 5) a startup job is running..." messages over the screen. (Is there a trick to stop that? I don't want to boot with "quiet", as that makes it much harder to see if/when it's failing.)

> How many open fds does sd-journald have?

Highest fd is 25, so I'd say trivial.

Thanks,

Martin

--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)
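Zbyszek's question about open fds can also be answered without gdb, which helps given how awkward the debug shell is during the hang. A minimal sketch, assuming a Linux /proc filesystem: each entry in /proc/<pid>/fd is named after one open descriptor, so listing that directory gives both the count and the highest fd number. The helper `fd_summary` is hypothetical, not part of systemd.

```python
import os


def fd_summary(pid):
    """Return (open fd count, highest fd number) for `pid` using Linux /proc.

    Each entry in /proc/<pid>/fd is a symlink named after one open descriptor.
    Inspecting another user's process normally requires root.
    """
    fds = [int(name) for name in os.listdir(f"/proc/{pid}/fd")]
    return len(fds), max(fds, default=-1)
```

Run as root with the PID of systemd-journald (for example as reported by `pidof systemd-journald`). The count and the highest fd can differ when descriptors have been closed and left gaps, so "highest fd is 25" is an upper bound on the number of open descriptors rather than an exact count.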