Bug#781151: boot often stalls with two "A start job is running" messages: binfmt and schroot sessions

2017-11-22 Thread Thibaut Paumard

Dear Michael,

The machine on which I was seeing this bug recently died. In addition, 
the situation had improved and I can't remember whether I had seen the 
bug recently before the machine started to decay about six months ago.


If I remember correctly, what did help quite a lot was to empty /tmp 
from a recovery shell when the machine was failing to boot. I don't know 
whether systemd could do it automatically early in the boot process.
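
For reference, emptying a directory from a recovery shell could be sketched as below. This is only an illustration under stated assumptions: the helper name empty_dir is hypothetical (it is not part of systemd or Debian), and it assumes a find that supports -mindepth and -delete, as GNU findutils does.

```shell
# Hypothetical helper: delete everything inside a directory, keep the directory itself.
empty_dir() {
    dir=$1
    # Refuse an empty argument or the root directory, to avoid wiping the wrong tree.
    case "$dir" in
        ""|/) echo "empty_dir: refusing to empty '$dir'" >&2; return 1 ;;
    esac
    [ -d "$dir" ] || { echo "empty_dir: $dir is not a directory" >&2; return 1; }
    # -mindepth 1 skips the directory itself, -xdev stays on one filesystem,
    # -delete removes files first and then the emptied subdirectories.
    find "$dir" -xdev -mindepth 1 -delete
}

# Usage from a recovery shell (as root):
#   empty_dir /tmp
```

The directory itself is kept, so its ownership and permissions are left untouched.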


Anyway, I cannot test anymore. Feel free to close this bug.

Kind regards, Thibaut.



Bug#781151: boot often stalls with two "A start job is running" messages: binfmt and schroot sessions

2016-12-19 Thread Michael Biebl
Hi Thibaut,

sorry for not responding earlier.

On Tue, 31 Mar 2015 09:45:43 +0200 Thibaut Paumard 
wrote:
> Hi Michael,
> 
> On 30/03/2015 09:55, Thibaut Paumard wrote:
> > In the meantime I think I have found the culprit (but I cannot be sure
> > for a bug that is not systematic): I had installed and removed, but not
> > purged, munge. After purging munge, the system rebooted fine two times,
> > with some time working in between.
> 
> Turns out munge was not the culprit either. I have re-enabled
> binfmt-support and schroot and boot failed.
> 
> Attached is the output of the four commands. A few services were still
> struggling to start at that point, but logind had failed and it was
> clear boot would not succeed.

Do you still run into this issue today on an up-to-date sid/stretch
system? If so, could you resend the information?

Regards,
Michael
-- 
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?





Bug#781151: boot often stalls with two "A start job is running" messages: binfmt and schroot sessions

2015-03-30 Thread Thibaut Paumard
On 27/03/2015 17:44, Michael Biebl wrote:
> On 27.03.2015 at 17:40, Thibaut Paumard wrote:
> > On 27/03/2015 10:33, Thibaut Paumard wrote:
> > >
> > > I'm going to disable binfmt-support, just for checking, and report when
> > > boot stalls again.
> >
> > I confirm that even with binfmt-support disabled, boot stalls.
> > Actually the system is booted, but unusable because core services failed
> > to start (including logind).
> >
> > It is then impossible to start those services from the debug shell, and
> > impossible to halt the machine from the debug shell (halt, reboot
> > don't return and don't halt the system).
>
> Can you boot with systemd.log_level=debug on the kernel command line
> and attach the output of journalctl -alb to the bug report?

I guess you want that one day when booting fails?

In the meantime I think I have found the culprit (but I cannot be sure
for a bug that is not systematic): I had installed and removed, but not
purged, munge. After purging munge, the system rebooted fine two times,
with some time working in between.

I'll add the log_level stuff to my command line and report if boot fails
again.

Kind regards, Thibaut.


 
> Thanks,
> Michael
 




Bug#781151: boot often stalls with two "A start job is running" messages: binfmt and schroot sessions

2015-03-27 Thread Thibaut Paumard
On 25/03/2015 18:25, Thibaut Paumard wrote:
>
> Dear Michael,
>
> Thanks, indeed it does look like a race condition between this script
> and systemd support for binfmt. I guess the schroot service should
> somehow depend on binfmt support to have terminated.
>
> It is not so easy for me to check though, because the boot process tends
> to run smoothly when I reboot several times in a row. Any solution will
> take several days at least to be confirmed. I rebooted several times
> with no problem after disabling schroot, but then again with no problem
> after re-enabling it.


Dear Michael,

schroot is not the culprit. Boot stalled again today with schroot init
script disabled.

Actually, I was wrongly focusing on the last failing events.
binfmt-support is the last bit of failure remaining because it happens
to have no timeout.

Let's review the symptoms:

  - boot often stalls when I reboot after working for some time, but not
    when rebooting several times in a row;

  - systemd reports several services taking an unusually long time to
    complete;

  - boot overall feels slower than usual;

  - at least on certain occasions (perhaps always), two important
    services fail to start:
      * systemd-logind
      * network-manager.

Logind fails very early; its failure is usually the first message after
the few kernel messages.

Now I think the impression that boot stalls is due to logind failing to
start. This is (obviously) the reason why I never see a login prompt.

Today, using the debug shell, I manually stopped binfmt support, which was
failing to start, and tried starting logind manually, which did not work.
The following command never returned; I killed it with ^C after approx.
30-60s:
 systemctl start systemd-logind.service

I'm going to disable binfmt-support, just for checking, and report when
boot stalls again.

Kind regards, Thibaut.





Bug#781151: boot often stalls with two "A start job is running" messages: binfmt and schroot sessions

2015-03-27 Thread Michael Biebl
On 27.03.2015 at 17:40, Thibaut Paumard wrote:
> On 27/03/2015 10:33, Thibaut Paumard wrote:
> >
> > I'm going to disable binfmt-support, just for checking, and report when
> > boot stalls again.
>
> I confirm that even with binfmt-support disabled, boot stalls.
> Actually the system is booted, but unusable because core services failed
> to start (including logind).
>
> It is then impossible to start those services from the debug shell, and
> impossible to halt the machine from the debug shell (halt, reboot
> don't return and don't halt the system).
 

Can you boot with systemd.log_level=debug on the kernel command line
and attach the output of journalctl -alb to the bug report?
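
Before collecting the journal, it can help to confirm that the debug option actually reached the kernel command line. A minimal sketch, assuming a POSIX shell; the helper name has_kernel_param and the output file name journal.txt are hypothetical and only used for illustration:

```shell
# Hypothetical helper: succeed if a parameter is present on the kernel command line.
# $1 = parameter to look for
# $2 = command line to inspect (optional; defaults to the contents of /proc/cmdline)
has_kernel_param() {
    param=$1
    if [ "$#" -ge 2 ]; then
        cmdline=$2
    else
        cmdline=$(cat /proc/cmdline)
    fi
    # Pad with spaces so that only whole, space-separated words match.
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage after a boot that stalled (as root):
#   has_kernel_param systemd.log_level=debug && journalctl -alb > journal.txt
```

The optional second argument only exists so that the check can be tried against a sample command line without rebooting.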

Thanks,
Michael



Bug#781151: boot often stalls with two "A start job is running" messages: binfmt and schroot sessions

2015-03-27 Thread Thibaut Paumard
On 27/03/2015 10:33, Thibaut Paumard wrote:
>
> I'm going to disable binfmt-support, just for checking, and report when
> boot stalls again.

I confirm that even with binfmt-support disabled, boot stalls.
Actually the system is booted, but unusable because core services failed
to start (including logind).

It is then impossible to start those services from the debug shell, and
impossible to halt the machine from the debug shell (halt, reboot
don't return and don't halt the system).

Regards, Thibaut.






Bug#781151: boot often stalls with two "A start job is running" messages: binfmt and schroot sessions

2015-03-25 Thread Michael Biebl
Control: tags -1 moreinfo

Dear Thibaut,

On 25.03.2015 at 10:39, Thibaut Paumard wrote:
> Almost every time I reboot my computer, boot stalls with two start jobs
> unable to complete:
>
> A start job is running for Enable support for additional binary formats ([...] / no limit)
>
> A start job is running for LSB: Recover schroot sessions ([...] / no limit)
>
> When that happens, I have to forcibly halt the computer (an Apple MacBook Pro)
> by holding the power button. Next boot usually goes fine.
>
> This has been happening since at least end of December 2014. It looks random,
> with a fairly high probability (~50%).

Can you boot with the following added to your kernel command line (see
man kernel-command-line)?
  systemd.debug-shell

This will start a debug shell on tty9.

If your system hangs during boot, please switch to tty9, then save the
output of
ps aux
systemctl list-jobs
systemd-cgls
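
The three commands above can be captured in one go from the debug shell. The following is a sketch rather than an official procedure: the function name capture_boot_state, the output directory and the file names are all hypothetical.

```shell
# Hypothetical helper: save the output of each diagnostic command to its own file.
# $1 = directory to write the files into (created if missing)
capture_boot_state() {
    outdir=$1
    mkdir -p "$outdir" || return 1
    # Each capture is best-effort: if a command fails or is missing, its
    # error text ends up in the file and the remaining commands still run.
    ps aux > "$outdir/ps-aux.txt" 2>&1 || true
    systemctl list-jobs > "$outdir/systemctl-list-jobs.txt" 2>&1 || true
    systemd-cgls > "$outdir/systemd-cgls.txt" 2>&1 || true
    return 0
}

# Usage from the debug shell on tty9:
#   capture_boot_state /root/boot-stall
```

Writing to files also avoids any pager, which matters on a console where the rest of the system is not responding.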

Thanks,
Michael





Bug#781151: boot often stalls with two "A start job is running" messages: binfmt and schroot sessions

2015-03-25 Thread Michael Biebl
On 25.03.2015 at 16:55, Thibaut Paumard wrote:
> On 25/03/2015 14:52, Michael Biebl wrote:
> >
> > If your system hangs during boot, please switch to tty9, then save the
> > output of
> > ps aux
> > systemctl list-jobs
> > systemd-cgls
>
> Thanks Michael,
>
> the output of each command is attached in the corresponding file.

Looks like some kind of bug in schroot to me, which causes a deadlock.
I assume that if you disable schroot.service (update-rc.d schroot
disable), the problem is gone?




Bug#781151: boot often stalls with two "A start job is running" messages: binfmt and schroot sessions

2015-03-25 Thread Thibaut Paumard
On 25/03/2015 14:52, Michael Biebl wrote:
>
> If your system hangs during boot, please switch to tty9, then save the
> output of
> ps aux
> systemctl list-jobs
> systemd-cgls

Thanks Michael,

the output of each command is attached in the corresponding file.

Kind regards, Thibaut.



USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  5.8  0.0  30840  6588 ?        Ss   16:01   0:11 /sbin/init
root         2  0.0  0.0      0     0 ?        S    16:01   0:00 [kthreadd]
root         3  1.2  0.0      0     0 ?        S    16:01   0:02 [ksoftirqd/0]
root         4  0.0  0.0      0     0 ?        S    16:01   0:00 [kworker/0:0]
root         5  0.0  0.0      0     0 ?        S    16:01   0:00 [kworker/0:0H]
root         6  0.1  0.0      0     0 ?        S    16:01   0:00 [kworker/u16:0]
root         7  0.2  0.0      0     0 ?        S    16:01   0:00 [rcu_sched]
root         8  0.0  0.0      0     0 ?        S    16:01   0:00 [rcu_bh]
root         9  0.0  0.0      0     0 ?        S    16:01   0:00 [migration/0]
root        10  0.0  0.0      0     0 ?        S    16:01   0:00 [watchdog/0]
root        11  0.2  0.0      0     0 ?        S    16:01   0:00 [watchdog/1]
root        12  0.0  0.0      0     0 ?        S    16:01   0:00 [migration/1]
root        13  0.0  0.0      0     0 ?        S    16:01   0:00 [ksoftirqd/1]
root        14  0.0  0.0      0     0 ?        S    16:01   0:00 [kworker/1:0]
root        15  0.0  0.0      0     0 ?        S    16:01   0:00 [kworker/1:0H]
root        16  0.0  0.0      0     0 ?        S    16:01   0:00 [watchdog/2]
root        17  0.0  0.0      0     0 ?        S    16:01   0:00 [migration/2]
root        18  0.0  0.0      0     0 ?        S    16:01   0:00 [ksoftirqd/2]
root        19  0.0  0.0      0     0 ?        S    16:01   0:00 [kworker/2:0]
root        20  0.0  0.0      0     0 ?        S    16:01   0:00 [kworker/2:0H]
root        21  0.0  0.0      0     0 ?        S    16:01   0:00 [watchdog/3]
root        22  0.0  0.0      0     0 ?        S    16:01   0:00 [migration/3]
root        23  0.2  0.0      0     0 ?        S    16:01   0:00 [ksoftirqd/3]
root        24  0.0  0.0      0     0 ?        S    16:01   0:00 [kworker/3:0]
root        25  0.0  0.0      0     0 ?        S    16:01   0:00 [kworker/3:0H]
root        26  0.0  0.0      0     0 ?        S    16:01   0:00 [watchdog/4]
root        27  0.0  0.0      0     0 ?        S    16:01   0:00 [migration/4]
root        28  0.1  0.0      0     0 ?        S    16:01   0:00 [ksoftirqd/4]
root        29  0.0  0.0      0     0 ?        S    16:01   0:00 [kworker/4:0]
root        30  0.0  0.0      0     0 ?        S    16:01   0:00 [kworker/4:0H]
root        31  0.0  0.0      0     0 ?        S    16:01   0:00 [watchdog/5]
root        32  0.0  0.0      0     0 ?        S    16:01   0:00 [migration/5]
root        33  0.0  0.0      0     0 ?        S    16:01   0:00 [ksoftirqd/5]
root        34  0.0  0.0      0     0 ?        S    16:01   0:00 [kworker/5:0]
root        35  0.0  0.0      0     0 ?        S    16:01   0:00 [kworker/5:0H]
root        36  0.0  0.0      0     0 ?        S    16:01   0:00 [watchdog/6]
root        37  0.0  0.0      0     0 ?        S    16:01   0:00 [migration/6]
root        38  0.0  0.0      0     0 ?        S    16:01   0:00 [ksoftirqd/6]
root        39  0.0  0.0      0     0 ?        S    16:01   0:00 [kworker/6:0]
root        40  0.0  0.0      0     0 ?        S    16:01   0:00 [kworker/6:0H]
root        41  0.0  0.0      0     0 ?        S    16:01   0:00 [watchdog/7]
root        42  0.0  0.0      0     0 ?        S    16:01   0:00 [migration/7]
root        43  0.2  0.0      0     0 ?        S    16:01   0:00 [ksoftirqd/7]
root        44  0.0  0.0      0     0 ?        S    16:01   0:00 [kworker/7:0]
root        45  0.0  0.0      0     0 ?        S    16:01   0:00 [kworker/7:0H]
root        46  0.0  0.0      0     0 ?        S    16:01   0:00 [khelper]
root        47  0.0  0.0      0     0 ?        S    16:01   0:00 [kdevtmpfs]
root        48  0.0  0.0      0     0 ?        S    16:01   0:00 [netns]
root        49  0.0  0.0      0     0 ?        S    16:01   0:00 [khungtaskd]
root        50  0.0  0.0      0     0 ?        S    16:01   0:00 [writeback]
root        51  0.0  0.0      0     0 ?        SN   16:01   0:00 [ksmd]
root        52  0.0  0.0      0     0 ?        SN   16:01   0:00 [khugepaged]
root        53  0.0  0.0      0     0 ?        S    16:01   0:00 [crypto]
root        54  0.0  0.0      0     0 ?        S    16:01   0:00 [kintegrityd]
root        55  0.0  0.0      0     0 ?        S    16:01   0:00 [bioset]
root        56  0.0  0.0      0     0 ?        S    16:01   0:00 [kblockd]
root        57  0.0  0.0      0     0 ?        S    16:01   0:00 [kworker/7:1]
root        58  0.0  0.0      0     0 ?        S    16:01   0:00 [kworker/6:1]
root        59  0.0  0.0      0     0 ?        S    16:01   0:00 [kworker/5:1]
root        60  0.0  0.0      0     0 ?

Bug#781151: boot often stalls with two "A start job is running" messages: binfmt and schroot sessions

2015-03-25 Thread Thibaut Paumard
On 25/03/2015 17:22, Michael Biebl wrote:
> On 25.03.2015 at 17:10, Michael Biebl wrote:
> > On 25.03.2015 at 16:55, Thibaut Paumard wrote:
> > > On 25/03/2015 14:52, Michael Biebl wrote:
> > > >
> > > > If your system hangs during boot, please switch to tty9, then save the
> > > > output of
> > > > ps aux
> > > > systemctl list-jobs
> > > > systemd-cgls
> > >
> > > Thanks Michael,
> > >
> > > the output of each command is attached in the corresponding file.
> >
> > Looks like some kind of bug in schroot to me, which causes a deadlock.
> > I assume that if you disable schroot.service (update-rc.d schroot
> > disable), the problem is gone?
>
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=677811
> looks related, although this bug is marked as fixed.
>
> You could try disabling the /etc/schroot/setup.d/15binfmt hook script,
> to narrow down the problem.

Dear Michael,

Thanks, indeed it does look like a race condition between this script
and systemd support for binfmt. I guess the schroot service should
somehow depend on binfmt support to have terminated.

It is not so easy for me to check though, because the boot process tends
to run smoothly when I reboot several times in a row. Any solution will
take several days at least to be confirmed. I rebooted several times
with no problem after disabling schroot, but then again with no problem
after re-enabling it.

Kind regards, Thibaut.




Bug#781151: boot often stalls with two "A start job is running" messages: binfmt and schroot sessions

2015-03-25 Thread Michael Biebl
On 25.03.2015 at 17:10, Michael Biebl wrote:
> On 25.03.2015 at 16:55, Thibaut Paumard wrote:
> > On 25/03/2015 14:52, Michael Biebl wrote:
> > >
> > > If your system hangs during boot, please switch to tty9, then save the
> > > output of
> > > ps aux
> > > systemctl list-jobs
> > > systemd-cgls
> >
> > Thanks Michael,
> >
> > the output of each command is attached in the corresponding file.
>
> Looks like some kind of bug in schroot to me, which causes a deadlock.
> I assume that if you disable schroot.service (update-rc.d schroot
> disable), the problem is gone?

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=677811
looks related, although this bug is marked as fixed.

You could try disabling the /etc/schroot/setup.d/15binfmt hook script,
to narrow down the problem.
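
One reversible way to disable such a hook is to clear its execute bits. This is a sketch under a loud assumption: that schroot, like run-parts, skips setup scripts that are not executable, which the thread does not confirm. The helper name disable_hook is hypothetical.

```shell
# Hypothetical helper: disable a hook script by clearing its execute bits.
# Assumption (not confirmed in this thread): the hook runner skips
# non-executable files, as run-parts does.
disable_hook() {
    script=$1
    [ -f "$script" ] || { echo "disable_hook: $script not found" >&2; return 1; }
    chmod a-x "$script"
}

# Usage (as root):
#   disable_hook /etc/schroot/setup.d/15binfmt
# Re-enable later with:
#   chmod a+x /etc/schroot/setup.d/15binfmt
```

Unlike deleting or moving the file, this keeps the script in place, so the change is easy to undo once the test is over.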

