Bug#781151: boot often stalls with two "A start job is running" messages: binfmt and schroot sessions
Dear Michael, The machine on which I was seeing this bug recently died. In addition, the situation had improved and I can't remember whether I had seen the bug recently before the machine started to decay about six months ago. If I remember correctly, what did help quite a lot was to empty /tmp from a recovery shell when the machine was failing to boot. I don't know whether systemd could do it automatically early in the boot process. Anyway, I cannot test anymore. Feel free to close this bug. Kind regards, Thibaut.
Bug#781151: boot often stalls with two "A start job is running" messages: binfmt and schroot sessions
Hi Thibaut,

sorry for not responding earlier.

On Tue, 31 Mar 2015 09:45:43 +0200 Thibaut Paumard wrote:
> Hi Michael,
>
> On 30/03/2015 09:55, Thibaut Paumard wrote:
> > In the meanwhile I think I have found the culprit (but I cannot be sure
> > for a bug that is not systematic): I had installed and removed, but not
> > purged, munge. After purging munge, the system rebooted fine two times,
> > with some time working in between.
>
> Turns out munge was not the culprit either. I have re-enabled
> binfmt-support and schroot and boot failed.
>
> Attached is the output of the four commands. A few services were still
> struggling to start at that point, but logind had failed and it was
> clear boot would not succeed.

Do you still run into this issue today on an up-to-date sid/stretch system? If so, could you resend the information?

Regards,
Michael
-- 
Why is it that all of the instruments seeking intelligent life in the universe are pointed away from Earth?
Bug#781151: boot often stalls with two A start job is running messages: binfmt and schroot sessions
On 27/03/2015 17:44, Michael Biebl wrote:
> On 27.03.2015 at 17:40, Thibaut Paumard wrote:
> > On 27/03/2015 10:33, Thibaut Paumard wrote:
> > > I'm going to disable binfmt-support, just for checking, and report when boot stalls again.
> >
> > I confirm that even with binfmt-support disabled, boot stalls. Actually the system is booted, but unusable because core services failed to start (including logind). It is then impossible to start those services from the debug shell, and impossible to halt the machine from the debug shell (halt, reboot don't return and don't halt the system).
>
> Can you boot with systemd.log_level=debug on the kernel command line and attach the output of journalctl -alb to the bug report.

I guess you want that one day when booting fails?

In the meanwhile I think I have found the culprit (but I cannot be sure for a bug that is not systematic): I had installed and removed, but not purged, munge. After purging munge, the system rebooted fine two times, with some time working in between.

I'll add the log_level stuff to my command line and report if boot fails again.

Kind regards, Thibaut.

> Thanks,
> Michael
Bug#781151: boot often stalls with two A start job is running messages: binfmt and schroot sessions
On 25/03/2015 18:25, Thibaut Paumard wrote:
> Dear Michael,
>
> Thanks, indeed it does look like a race condition between this script and systemd support for binfmt. I guess the schroot service should somehow depend on binfmt support to have terminated.
>
> It is not so easy for me to check though, because the boot process tends to run smoothly when I reboot several times in a row. Any solution will take several days at least to be confirmed. I rebooted several times with no problem after disabling schroot, but then again with no problem after re-enabling it.

Dear Michael,

schroot is not the culprit. Boot stalled again today with the schroot init script disabled.

Actually, I was wrongly focusing on the last failing events. binfmt-support is the last bit of failure remaining because it happens to have no timeout.

Let's review the symptoms:
- boot often stalls when I reboot after working for some time, but not when rebooting several times in a row;
- systemd issues messages about several services taking an unusually long time to complete;
- boot overall feels slower than usual;
- at least on certain occasions (perhaps always), two important services fail to start:
  * systemd-logind
  * network-manager.

Logind fails very early; that's usually the first message after the few kernel messages. Now I think the impression that boot stalls is due to logind failing to start. This is (obviously) the reason why I never see a login prompt.

Today, using the debug shell, I manually stopped binfmt support that was failing to start, and tried starting logind manually, which did not work. The following command never returned; I killed it with ^C after approx. 30-60s:

systemctl start systemd-logind.service

I'm going to disable binfmt-support, just for checking, and report when boot stalls again.

Kind regards, Thibaut.
Bug#781151: boot often stalls with two A start job is running messages: binfmt and schroot sessions
On 27.03.2015 at 17:40, Thibaut Paumard wrote:
> On 27/03/2015 10:33, Thibaut Paumard wrote:
> > I'm going to disable binfmt-support, just for checking, and report when boot stalls again.
>
> I confirm that even with binfmt-support disabled, boot stalls. Actually the system is booted, but unusable because core services failed to start (including logind). It is then impossible to start those services from the debug shell, and impossible to halt the machine from the debug shell (halt, reboot don't return and don't halt the system).

Can you boot with systemd.log_level=debug on the kernel command line and attach the output of journalctl -alb to the bug report.

Thanks,
Michael
-- 
Why is it that all of the instruments seeking intelligent life in the universe are pointed away from Earth?
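Editorial note: the request above has two parts, booting with systemd.log_level=debug and saving the journal of that boot. The sketch below only covers checking that the parameter actually reached the kernel command line; the function name has_debug_log_level is hypothetical, and reading /proc/cmdline by default is an assumption about where the check is run.

```shell
#!/bin/sh
# Illustrative helper (has_debug_log_level is a made-up name, not a systemd tool).
# Returns 0 if the given kernel command line file contains the exact
# parameter systemd.log_level=debug, and non-zero otherwise.
has_debug_log_level() {
    cmdline_file="${1:-/proc/cmdline}"
    # Put each space-separated parameter on its own line, then look for
    # a line that matches the parameter exactly.
    tr ' ' '\n' < "$cmdline_file" | grep -qx 'systemd.log_level=debug'
}
```

If the check succeeds on a boot that failed, the journal of that boot could then be saved for the report with, for example, journalctl -alb > journal.txt (the output file name is arbitrary).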
Bug#781151: boot often stalls with two A start job is running messages: binfmt and schroot sessions
On 27/03/2015 10:33, Thibaut Paumard wrote:
> I'm going to disable binfmt-support, just for checking, and report when boot stalls again.

I confirm that even with binfmt-support disabled, boot stalls. Actually the system is booted, but unusable because core services failed to start (including logind). It is then impossible to start those services from the debug shell, and impossible to halt the machine from the debug shell (halt, reboot don't return and don't halt the system).

Regards, Thibaut.
Bug#781151: boot often stalls with two A start job is running messages: binfmt and schroot sessions
Control: tags -1 moreinfo

Dear Thibaut,

On 25.03.2015 at 10:39, Thibaut Paumard wrote:
> Nearly every time I reboot my computer, boot stalls with two start jobs unable to complete:
>
> A start job is running for Enable support for additional binary formats ([...] / no limit)
> A start job is running for LSB: Recover schroot sessions ([...] / no limit)
>
> When that happens, I have to forcibly halt the computer (an Apple MacBook Pro) by holding the power button. Next boot usually goes fine.
>
> This has been happening since at least end of December 2014. It looks random, with a fairly high probability (~50%).

Can you boot with the following added to your kernel command line (man kernel-command-line):

systemd.debug-shell

This will start a debug shell on tty9. If your system hangs during boot, please switch to tty9, then save the output of

ps aux
systemctl list-jobs
systemd-cgls

Thanks,
Michael
-- 
Why is it that all of the instruments seeking intelligent life in the universe are pointed away from Earth?
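Editorial note: a minimal sketch of how the three outputs requested above could be saved from the debug shell on tty9. The function name collect_boot_diagnostics, the default directory /root/boot-debug and the file names are assumptions made for illustration; only the three commands themselves come from the request.

```shell
#!/bin/sh
# Illustrative helper; the function name, default directory and file names
# are assumptions, not part of systemd or of the original request.
collect_boot_diagnostics() {
    outdir="${1:-/root/boot-debug}"
    mkdir -p "$outdir" || return 1

    # Each command writes to its own file. stderr is captured too, and a
    # failing command does not prevent the remaining ones from running.
    ps aux > "$outdir/ps-aux.txt" 2>&1 || true
    systemctl list-jobs > "$outdir/systemctl-list-jobs.txt" 2>&1 || true
    systemd-cgls > "$outdir/systemd-cgls.txt" 2>&1 || true

    echo "Saved diagnostics in $outdir"
}
```

From the debug shell this would be called as collect_boot_diagnostics, or with an explicit directory as its first argument; the resulting files can be attached to the bug report after the next successful boot.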
Bug#781151: boot often stalls with two A start job is running messages: binfmt and schroot sessions
On 25.03.2015 at 16:55, Thibaut Paumard wrote:
> On 25/03/2015 14:52, Michael Biebl wrote:
> > If your system hangs during boot, please switch to tty9, then save the output of
> >
> > ps aux
> > systemctl list-jobs
> > systemd-cgls
>
> Thanks Michael, the output of each command is attached in the corresponding file.

Looks like some kind of bug in schroot to me, which causes a deadlock. I assume, if you disable the schroot.service (update-rc.d schroot disable), the problem is gone?

-- 
Why is it that all of the instruments seeking intelligent life in the universe are pointed away from Earth?
Bug#781151: boot often stalls with two A start job is running messages: binfmt and schroot sessions
On 25/03/2015 14:52, Michael Biebl wrote:
> If your system hangs during boot, please switch to tty9, then save the output of
>
> ps aux
> systemctl list-jobs
> systemd-cgls

Thanks Michael, the output of each command is attached in the corresponding file.

Kind regards, Thibaut.

USER  PID %CPU %MEM   VSZ  RSS TTY STAT START TIME COMMAND
root    1  5.8  0.0 30840 6588 ?   Ss   16:01 0:11 /sbin/init
root    2  0.0  0.0     0    0 ?   S    16:01 0:00 [kthreadd]
root    3  1.2  0.0     0    0 ?   S    16:01 0:02 [ksoftirqd/0]
root    4  0.0  0.0     0    0 ?   S    16:01 0:00 [kworker/0:0]
root    5  0.0  0.0     0    0 ?   S    16:01 0:00 [kworker/0:0H]
root    6  0.1  0.0     0    0 ?   S    16:01 0:00 [kworker/u16:0]
root    7  0.2  0.0     0    0 ?   S    16:01 0:00 [rcu_sched]
root    8  0.0  0.0     0    0 ?   S    16:01 0:00 [rcu_bh]
root    9  0.0  0.0     0    0 ?   S    16:01 0:00 [migration/0]
root   10  0.0  0.0     0    0 ?   S    16:01 0:00 [watchdog/0]
root   11  0.2  0.0     0    0 ?   S    16:01 0:00 [watchdog/1]
root   12  0.0  0.0     0    0 ?   S    16:01 0:00 [migration/1]
root   13  0.0  0.0     0    0 ?   S    16:01 0:00 [ksoftirqd/1]
root   14  0.0  0.0     0    0 ?   S    16:01 0:00 [kworker/1:0]
root   15  0.0  0.0     0    0 ?   S    16:01 0:00 [kworker/1:0H]
root   16  0.0  0.0     0    0 ?   S    16:01 0:00 [watchdog/2]
root   17  0.0  0.0     0    0 ?   S    16:01 0:00 [migration/2]
root   18  0.0  0.0     0    0 ?   S    16:01 0:00 [ksoftirqd/2]
root   19  0.0  0.0     0    0 ?   S    16:01 0:00 [kworker/2:0]
root   20  0.0  0.0     0    0 ?   S    16:01 0:00 [kworker/2:0H]
root   21  0.0  0.0     0    0 ?   S    16:01 0:00 [watchdog/3]
root   22  0.0  0.0     0    0 ?   S    16:01 0:00 [migration/3]
root   23  0.2  0.0     0    0 ?   S    16:01 0:00 [ksoftirqd/3]
root   24  0.0  0.0     0    0 ?   S    16:01 0:00 [kworker/3:0]
root   25  0.0  0.0     0    0 ?   S    16:01 0:00 [kworker/3:0H]
root   26  0.0  0.0     0    0 ?   S    16:01 0:00 [watchdog/4]
root   27  0.0  0.0     0    0 ?   S    16:01 0:00 [migration/4]
root   28  0.1  0.0     0    0 ?   S    16:01 0:00 [ksoftirqd/4]
root   29  0.0  0.0     0    0 ?   S    16:01 0:00 [kworker/4:0]
root   30  0.0  0.0     0    0 ?   S    16:01 0:00 [kworker/4:0H]
root   31  0.0  0.0     0    0 ?   S    16:01 0:00 [watchdog/5]
root   32  0.0  0.0     0    0 ?   S    16:01 0:00 [migration/5]
root   33  0.0  0.0     0    0 ?   S    16:01 0:00 [ksoftirqd/5]
root   34  0.0  0.0     0    0 ?   S    16:01 0:00 [kworker/5:0]
root   35  0.0  0.0     0    0 ?   S    16:01 0:00 [kworker/5:0H]
root   36  0.0  0.0     0    0 ?   S    16:01 0:00 [watchdog/6]
root   37  0.0  0.0     0    0 ?   S    16:01 0:00 [migration/6]
root   38  0.0  0.0     0    0 ?   S    16:01 0:00 [ksoftirqd/6]
root   39  0.0  0.0     0    0 ?   S    16:01 0:00 [kworker/6:0]
root   40  0.0  0.0     0    0 ?   S    16:01 0:00 [kworker/6:0H]
root   41  0.0  0.0     0    0 ?   S    16:01 0:00 [watchdog/7]
root   42  0.0  0.0     0    0 ?   S    16:01 0:00 [migration/7]
root   43  0.2  0.0     0    0 ?   S    16:01 0:00 [ksoftirqd/7]
root   44  0.0  0.0     0    0 ?   S    16:01 0:00 [kworker/7:0]
root   45  0.0  0.0     0    0 ?   S    16:01 0:00 [kworker/7:0H]
root   46  0.0  0.0     0    0 ?   S    16:01 0:00 [khelper]
root   47  0.0  0.0     0    0 ?   S    16:01 0:00 [kdevtmpfs]
root   48  0.0  0.0     0    0 ?   S    16:01 0:00 [netns]
root   49  0.0  0.0     0    0 ?   S    16:01 0:00 [khungtaskd]
root   50  0.0  0.0     0    0 ?   S    16:01 0:00 [writeback]
root   51  0.0  0.0     0    0 ?   SN   16:01 0:00 [ksmd]
root   52  0.0  0.0     0    0 ?   SN   16:01 0:00 [khugepaged]
root   53  0.0  0.0     0    0 ?   S    16:01 0:00 [crypto]
root   54  0.0  0.0     0    0 ?   S    16:01 0:00 [kintegrityd]
root   55  0.0  0.0     0    0 ?   S    16:01 0:00 [bioset]
root   56  0.0  0.0     0    0 ?   S    16:01 0:00 [kblockd]
root   57  0.0  0.0     0    0 ?   S    16:01 0:00 [kworker/7:1]
root   58  0.0  0.0     0    0 ?   S    16:01 0:00 [kworker/6:1]
root   59  0.0  0.0     0    0 ?   S    16:01 0:00 [kworker/5:1]
root   60  0.0  0.0     0    0 ?
Bug#781151: boot often stalls with two A start job is running messages: binfmt and schroot sessions
On 25/03/2015 17:22, Michael Biebl wrote:
> On 25.03.2015 at 17:10, Michael Biebl wrote:
> > On 25.03.2015 at 16:55, Thibaut Paumard wrote:
> > > On 25/03/2015 14:52, Michael Biebl wrote:
> > > > If your system hangs during boot, please switch to tty9, then save the output of
> > > >
> > > > ps aux
> > > > systemctl list-jobs
> > > > systemd-cgls
> > >
> > > Thanks Michael, the output of each command is attached in the corresponding file.
> >
> > Looks like some kind of bug in schroot to me, which causes a deadlock. I assume, if you disable the schroot.service (update-rc.d schroot disable), the problem is gone?
>
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=677811 looks related, although this bug is marked as fixed. You could try disabling the /etc/schroot/setup.d/15binfmt hook script, to narrow down the problem.

Dear Michael,

Thanks, indeed it does look like a race condition between this script and systemd support for binfmt. I guess the schroot service should somehow depend on binfmt support to have terminated.

It is not so easy for me to check though, because the boot process tends to run smoothly when I reboot several times in a row. Any solution will take several days at least to be confirmed. I rebooted several times with no problem after disabling schroot, but then again with no problem after re-enabling it.

Kind regards, Thibaut.
Bug#781151: boot often stalls with two A start job is running messages: binfmt and schroot sessions
On 25.03.2015 at 17:10, Michael Biebl wrote:
> On 25.03.2015 at 16:55, Thibaut Paumard wrote:
> > On 25/03/2015 14:52, Michael Biebl wrote:
> > > If your system hangs during boot, please switch to tty9, then save the output of
> > >
> > > ps aux
> > > systemctl list-jobs
> > > systemd-cgls
> >
> > Thanks Michael, the output of each command is attached in the corresponding file.
>
> Looks like some kind of bug in schroot to me, which causes a deadlock. I assume, if you disable the schroot.service (update-rc.d schroot disable), the problem is gone?

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=677811 looks related, although this bug is marked as fixed. You could try disabling the /etc/schroot/setup.d/15binfmt hook script, to narrow down the problem.

-- 
Why is it that all of the instruments seeking intelligent life in the universe are pointed away from Earth?
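Editorial note: the message above does not say how the hook script should be disabled. One possible way, sketched here under the assumption that moving the file out of setup.d is acceptable on the affected system, is to park it in a backup directory so that it can be restored afterwards. The function name disable_hook and the backup location are hypothetical.

```shell
#!/bin/sh
# Illustrative helper; disable_hook and the backup directory are made-up
# names. The hook is moved out of its directory, so it can no longer be
# run from there, and is kept under a ".disabled" name for later restore.
disable_hook() {
    hook="$1"
    backup_dir="$2"
    [ -f "$hook" ] || { echo "no such hook: $hook" >&2; return 1; }
    mkdir -p "$backup_dir" || return 1
    mv "$hook" "$backup_dir/$(basename "$hook").disabled"
}
```

For the hook named in this thread, the call would look like disable_hook /etc/schroot/setup.d/15binfmt /root/schroot-hooks-disabled, run as root; moving the file back under its original name undoes the change.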