Re: [systemd-devel] heads-up: chasing journal(?) related regression in 219 causing boot hang/fail

2015-04-28 Thread Lennart Poettering
On Mon, 27.04.15 17:49, Christian Hesse (l...@eworm.de) wrote:

  As for the actual lockup, I'm afraid I don't understand at all
  what is happening (I'm not familiar at all with how journald
  interacts with other services and D-Bus/logind).
  
  So from my POV my best recommendation would be to revert commit
  13790add4 upstream for now until this gets understood and fixed
  properly, especially if/when version 220 should be released. Breaking
  booting is much worse than not being able to restart journald.
 
 Any news about this one?
 Looks like everybody is waiting for a fix and nobody is working on it...

See:

http://lists.freedesktop.org/archives/systemd-devel/2015-April/031327.html

Lennart

-- 
Lennart Poettering, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] heads-up: chasing journal(?) related regression in 219 causing boot hang/fail

2015-04-27 Thread Christian Hesse
Martin Pitt martin.p...@ubuntu.com on Sat, 2015/04/11 10:38:
 Hello Tobias,
 
 Tobias Hunger [2015-04-11  2:17 +0200]:
  did you make any progress with this bug? Apparently the same issue is
  blocking systemd-219 from getting into arch linux (
  https://bugs.archlinux.org/task/44016 ), so this seems to be a
  wide-spread issue. Is anyone taking a serious look into this issue?
 
 Sorry, no, I was pretty busy with making systemd work well enough
 for the impending Debian and Ubuntu releases. A few weeks ago I mostly
 wanted to see whether this was specific to Debian/Ubuntu somehow, and
 I couldn't reproduce it in a VM with Fedora 21 plus dbus and systemd
 from rawhide. But in the meantime we got plenty of confirmations that
 it affects Fedora and now Arch, so I don't believe this is actually
 related to d-bus or something such.
 
 As for the actual lockup, I'm afraid I don't understand at all
 what is happening (I'm not familiar at all with how journald
 interacts with other services and D-Bus/logind).
 
 So from my POV my best recommendation would be to revert commit
 13790add4 upstream for now until this gets understood and fixed
 properly, especially if/when version 220 should be released. Breaking
 booting is much worse than not being able to restart journald.

Any news about this one?
Looks like everybody is waiting for a fix and nobody is working on it...

I do not know how to debug this. If I can help let me know.
-- 
main(a){char*c=/*Schoene Gruesse */B?IJj;MEH
CX:;,b;for(a/*Chris   get my mail address:*/=0;b=c[a++];)
putchar(b-1/(/*   gcc -o sig sig.c  ./sig*/b/42*2-3)*42);}




Re: [systemd-devel] heads-up: chasing journal(?) related regression in 219 causing boot hang/fail

2015-04-11 Thread Martin Pitt
Hello Tobias,

Tobias Hunger [2015-04-11  2:17 +0200]:
 did you make any progress with this bug? Apparently the same issue is
 blocking systemd-219 from getting into arch linux (
 https://bugs.archlinux.org/task/44016 ), so this seems to be a
 wide-spread issue. Is anyone taking a serious look into this issue?

Sorry, no, I was pretty busy with making systemd work well enough
for the impending Debian and Ubuntu releases. A few weeks ago I mostly
wanted to see whether this was specific to Debian/Ubuntu somehow, and
I couldn't reproduce it in a VM with Fedora 21 plus dbus and systemd
from rawhide. But in the meantime we got plenty of confirmations that
it affects Fedora and now Arch, so I don't believe this is actually
related to d-bus or something such.

As for the actual lockup, I'm afraid I don't understand at all
what is happening (I'm not familiar at all with how journald
interacts with other services and D-Bus/logind).

So from my POV my best recommendation would be to revert commit
13790add4 upstream for now until this gets understood and fixed
properly, especially if/when version 220 should be released. Breaking
booting is much worse than not being able to restart journald.

Martin
-- 
Martin Pitt| http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)


Re: [systemd-devel] heads-up: chasing journal(?) related regression in 219 causing boot hang/fail

2015-04-10 Thread Tobias Hunger
Hi Martin,

did you make any progress with this bug? Apparently the same issue is
blocking systemd-219 from getting into arch linux (
https://bugs.archlinux.org/task/44016 ), so this seems to be a
wide-spread issue. Is anyone taking a serious look into this issue?

Best Regards,
Tobias


On Mon, Mar 2, 2015 at 11:29 AM, Martin Pitt martin.p...@ubuntu.com wrote:
 Hey Lennart,

 Lennart Poettering [2015-02-28 13:05 +0100]:
 Any idea about the details of this?

 For the record, I'm still working on this on-and-off (I got some other
 urgent things to work on, though). It took me a while to install
 Fedora, as the rawhide images and upgrade are both broken ATM, but I
 now have F21 with rawhide's dbus and systemd in a VM. That's fairly
 close to Debian sid plus systemd from experimental. We did get reports
 about that hang under Debian, so it's at least likely that Fedora is
 affected too. But so far my simple reproducer doesn't trigger it under
 either, so I need to keep digging and experimenting, and probably go
 with the full reboot test iterations.  Of course I'll follow up here
 once I find out more.

 Martin
 --
 Martin Pitt| http://www.piware.de
 Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)


Re: [systemd-devel] heads-up: chasing journal(?) related regression in 219 causing boot hang/fail

2015-03-02 Thread Martin Pitt
Hey Lennart,

Lennart Poettering [2015-02-28 13:05 +0100]:
 Any idea about the details of this?

For the record, I'm still working on this on-and-off (I got some other
urgent things to work on, though). It took me a while to install
Fedora, as the rawhide images and upgrade are both broken ATM, but I
now have F21 with rawhide's dbus and systemd in a VM. That's fairly
close to Debian sid plus systemd from experimental. We did get reports
about that hang under Debian, so it's at least likely that Fedora is
affected too. But so far my simple reproducer doesn't trigger it under
either, so I need to keep digging and experimenting, and probably go
with the full reboot test iterations.  Of course I'll follow up here
once I find out more.

Martin
-- 
Martin Pitt| http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)


Re: [systemd-devel] heads-up: chasing journal(?) related regression in 219 causing boot hang/fail

2015-02-28 Thread Lennart Poettering
On Sun, 22.02.15 17:55, Martin Pitt (martin.p...@ubuntu.com) wrote:

 Lennart Poettering [2015-02-20 17:02 +0100]:
  To me this appears as if dbus is hanging for some reason. Have you
  checked what dbus is doing?
 
 D-Bus itself seems to be fine. There are services on it, busctl works,
 etc.
 
 Anyway, I now have a capable enough arsenal to reproduce that hang
 fully automatically and a git bisect run script, which after a few
 hours of grinding spat out that this is the culprit:
 
   http://cgit.freedesktop.org/systemd/systemd/commit/?id=13790add4bf64
   (journald: allow restarting journald without losing stream connections)
 
 and indeed reverting that on top of current git master (reverts
 cleanly) makes things work perfectly again.
 
 I haven't drilled down into the patch itself yet, that's not something
 I want to start doing on a Sunday :-)

Any idea about the details of this? Is this reproducible with
unpatched 219 (in particular without the fsckd patches applied)?

I have never seen this issue, can you tell me how to reproduce this on
my machine?

Does your boot process restart journald or so?

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] heads-up: chasing journal(?) related regression in 219 causing boot hang/fail

2015-02-22 Thread Martin Pitt
Lennart Poettering [2015-02-20 17:02 +0100]:
 To me this appears as if dbus is hanging for some reason. Have you
 checked what dbus is doing?

D-Bus itself seems to be fine. There are services on it, busctl works,
etc.

Anyway, I now have a capable enough arsenal to reproduce that hang
fully automatically and a git bisect run script, which after a few
hours of grinding spat out that this is the culprit:

  http://cgit.freedesktop.org/systemd/systemd/commit/?id=13790add4bf64
  (journald: allow restarting journald without losing stream connections)

and indeed reverting that on top of current git master (reverts
cleanly) makes things work perfectly again.

I haven't drilled down into the patch itself yet, that's not something
I want to start doing on a Sunday :-)
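
The bisect script itself is not shown in this thread. As a rough sketch
only, a predicate for "git bisect run" could look like the following.
The exit codes (0 = good, 125 = skip, other values from 1 to 127 = bad)
are the ones git documents; boot_once is a hypothetical callable that
boots the test VM once and returns True if that boot hung.

```python
GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by "git bisect run"


def bisect_verdict(build_ok, boot_results):
    """Map one revision's test outcome to a git-bisect exit code.

    boot_results is an iterable of booleans, True meaning "this boot hung".
    """
    if not build_ok:
        return SKIP  # revision cannot be tested at all
    if any(boot_results):
        return BAD   # a single hang is enough to mark the revision bad
    return GOOD      # every attempted boot came up cleanly


def run_boots(boot_once, attempts):
    """Call the hypothetical boot_once() up to `attempts` times.

    Yields the result of each boot and stops after the first hang,
    so a bad revision is not booted more often than necessary.
    """
    for _ in range(attempts):
        hung = boot_once()
        yield hung
        if hung:
            return
```

A wrapper script would end with something like
sys.exit(bisect_verdict(True, run_boots(boot_once, 30))), where the
number of attempts is an assumption to be tuned to how rarely the hang
shows up.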

Martin
-- 
Martin Pitt| http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)




[systemd-devel] heads-up: chasing journal(?) related regression in 219 causing boot hang/fail

2015-02-20 Thread Martin Pitt
Hello all,

Since we updated to 219 in Ubuntu, several people reported boot
failures. Booting hangs for a long time after starting D-Bus; in the
journal you get a lot of error messages like

   systemd[1]: Failed to register match for Disconnected message: Connection timed out
   systemd-logind[749]: Failed to fully start up daemon: Connection timed out
   dbus[800]: [system] Failed to activate service 'org.freedesktop.PolicyKit1': timed out

polkitd isn't running. This causes lots of jobs (logind, NetworkManager, avahi,
etc.) to get stuck in an eternal retry loop.

Unfortunately reproducing this is a real nuisance, classic heisenbug.
I'm now able to trigger it (sometimes) in a VM, but I still haven't
found a reliable recipe for reproducing it, so that bisecting just
takes ages.

I'm keeping debug log, notes, and progress in
https://launchpad.net/bugs/1423811 FTR. This is mostly a heads-up for
other distros in case they also get reports like this, to shortcut the
debugging exercise (I already wasted 7 hours on this, and I'm not even
close to the solution). Quite surprisingly it's somewhere in journald.
Running 218 with journald from 219 causes the hang, 219 with journald
from 218 is fine.
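
For other distros triaging similar reports, a minimal sketch of how one
might spot this symptom in saved journal output. It assumes plain
"ident[pid]: message" lines like the three quoted above; the function
name and the regular expression are illustrative, not part of any
systemd tool.

```python
import re
from collections import Counter

# Matches lines such as "systemd-logind[749]: Failed to ...: Connection timed out".
LINE_RE = re.compile(r"^\s*(?P<ident>[\w.-]+)\[(?P<pid>\d+)\]: (?P<msg>.*)$")


def count_timeouts(lines):
    """Count messages containing 'timed out', grouped by syslog identifier."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and "timed out" in match.group("msg"):
            counts[match.group("ident")] += 1
    return counts


# The three messages reported in this mail, used as sample input.
SAMPLE = [
    "systemd[1]: Failed to register match for Disconnected message: Connection timed out",
    "systemd-logind[749]: Failed to fully start up daemon: Connection timed out",
    "dbus[800]: [system] Failed to activate service 'org.freedesktop.PolicyKit1': timed out",
]

if __name__ == "__main__":
    for ident, n in sorted(count_timeouts(SAMPLE).items()):
        print(f"{ident}: {n}")
```

Timeouts from systemd, systemd-logind and dbus all at once during early
boot would match the pattern described here; a single timeout from one
service is more likely an unrelated problem.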

Martin

-- 
Martin Pitt| http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)


Re: [systemd-devel] heads-up: chasing journal(?) related regression in 219 causing boot hang/fail

2015-02-20 Thread Michael Biebl
2015-02-20 15:36 GMT+01:00 Martin Pitt martin.p...@ubuntu.com:
 Hello all,

 Since we updated to 219 in Ubuntu, several people reported boot
 failures. Booting hangs a long time after starting D-Bus, in the
 journal you get a lot of error messages like

    systemd[1]: Failed to register match for Disconnected message: Connection timed out
    systemd-logind[749]: Failed to fully start up daemon: Connection timed out
    dbus[800]: [system] Failed to activate service 'org.freedesktop.PolicyKit1': timed out

 polkitd isn't running. This causes lots of jobs (logind, NetworkManager, 
 avahi,
 etc.) to get stuck in an eternal retry loop.

 Unfortunately reproducing this is a real nuisance, classic heisenbug.
 I'm now able to trigger it (sometimes) in a VM, but I still haven't
 found a reliable recipe for reproducing it, so that bisecting just
 takes ages.

 I'm keeping debug log, notes, and progress in
 https://launchpad.net/bugs/1423811 FTR. This is mostly a heads-up for
 other distros in case they also get reports like this, to shortcut the
 debugging exercise (I already wasted 7 hours on this, and I'm not even
 close to the solution). Quite surprisingly it's somewhere in journald.
 Running 218 with journald from 219 causes the hang, 219 with journald
 from 218 is fine.


I noticed this as well. Interestingly, it only ever happened after
applying the fsckd patches  and running with plymouth enabled.

-- 
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?


Re: [systemd-devel] heads-up: chasing journal(?) related regression in 219 causing boot hang/fail

2015-02-20 Thread Didier Roche

Le 20/02/2015 15:41, Michael Biebl a écrit :

2015-02-20 15:36 GMT+01:00 Martin Pitt martin.p...@ubuntu.com:

Hello all,

Since we updated to 219 in Ubuntu, several people reported boot
failures. Booting hangs a long time after starting D-Bus, in the
journal you get a lot of error messages like

   systemd[1]: Failed to register match for Disconnected message: Connection timed out
   systemd-logind[749]: Failed to fully start up daemon: Connection timed out
   dbus[800]: [system] Failed to activate service 'org.freedesktop.PolicyKit1': timed out

polkitd isn't running. This causes lots of jobs (logind, NetworkManager, avahi,
etc.) to get stuck in an eternal retry loop.

Unfortunately reproducing this is a real nuisance, classic heisenbug.
I'm now able to trigger it (sometimes) in a VM, but I still haven't
found a reliable recipe for reproducing it, so that bisecting just
takes ages.

I'm keeping debug log, notes, and progress in
https://launchpad.net/bugs/1423811 FTR. This is mostly a heads-up for
other distros in case they also get reports like this, to shortcut the
debugging exercise (I already wasted 7 hours on this, and I'm not even
close to the solution). Quite surprisingly it's somewhere in journald.
Running 218 with journald from 219 causes the hang, 219 with journald
from 218 is fine.


I noticed this as well. Interestingly, it only ever happened after
applying the fsckd patches  and running with plymouth enabled.



We get it with "quiet splash" removed as well. It's just that it's
random, depending on the machine load. We already ruled out the fsckd
patch yesterday, but I retried today after this comment on my own VMs:
with the systemd/udev/… Ubuntu package 218-8ubuntu2, which doesn't
contain the fsckd patch (and have the 218 and 219 systemd-fsck writing
to /dev/console), plus the systemd-journal binary copied from
219-1ubuntu1, I was able to reproduce the hang after 15 boots.
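
Because the hang is random, a build can only be called good after a
number of clean boots. As a back-of-the-envelope sketch, assuming each
boot of an affected build hangs independently with probability p (an
assumption; no measured rate is given in this thread), the chance that
n boots in a row all come up cleanly is (1 - p)^n. The helper below,
whose name is made up for illustration, solves that for n:

```python
import math


def boots_needed(hang_probability, max_miss_chance):
    """Smallest n such that (1 - hang_probability) ** n <= max_miss_chance.

    hang_probability: assumed chance that a single boot of a bad build hangs.
    max_miss_chance:  accepted chance of wrongly calling a bad build good.
    """
    if not 0.0 < hang_probability < 1.0:
        raise ValueError("hang_probability must be strictly between 0 and 1")
    if not 0.0 < max_miss_chance < 1.0:
        raise ValueError("max_miss_chance must be strictly between 0 and 1")
    return math.ceil(math.log(max_miss_chance) / math.log(1.0 - hang_probability))


if __name__ == "__main__":
    # Purely illustrative numbers: one hang per 15 boots, 5% accepted risk.
    print(boots_needed(1 / 15, 0.05))
```

With a hang rate in the region of one in fifteen boots, dozens of clean
boots are needed per build, which is why testing each revision takes so
long.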


Cheers,
Didier


Re: [systemd-devel] heads-up: chasing journal(?) related regression in 219 causing boot hang/fail

2015-02-20 Thread Zbigniew Jędrzejewski-Szmek
On Fri, Feb 20, 2015 at 04:01:34PM +0100, Didier Roche wrote:
 Le 20/02/2015 15:41, Michael Biebl a écrit :
 2015-02-20 15:36 GMT+01:00 Martin Pitt martin.p...@ubuntu.com:
 Hello all,
 
 Since we updated to 219 in Ubuntu, several people reported boot
 failures. Booting hangs a long time after starting D-Bus, in the
 journal you get a lot of error messages like
 
    systemd[1]: Failed to register match for Disconnected message: Connection timed out
    systemd-logind[749]: Failed to fully start up daemon: Connection timed out
    dbus[800]: [system] Failed to activate service 'org.freedesktop.PolicyKit1': timed out
 
 polkitd isn't running. This causes lots of jobs (logind, NetworkManager, 
 avahi,
 etc.) to get stuck in an eternal retry loop.
 
 Unfortunately reproducing this is a real nuisance, classic heisenbug.
 I'm now able to trigger it (sometimes) in a VM, but I still haven't
 found a reliable recipe for reproducing it, so that bisecting just
 takes ages.
 
 I'm keeping debug log, notes, and progress in
 https://launchpad.net/bugs/1423811 FTR. This is mostly a heads-up for
 other distros in case they also get reports like this, to shortcut the
 debugging exercise (I already wasted 7 hours on this, and I'm not even
 close to the solution). Quite surprisingly it's somewhere in journald.
 Running 218 with journald from 219 causes the hang, 219 with journald
 from 218 is fine.
 
 I noticed this as well. Interestingly, it only ever happened after
 applying the fsckd patches  and running with plymouth enabled.
 
 
 We get it with "quiet splash" removed as well. It's just that it's
 random, depending on the machine load. We already ruled out the fsckd patch
 yesterday, but I did retry today again after this comment on my own
 vms and with systemd/udev/… ubuntu package 218-8ubuntu2 which
 doesn't contain the fsckd patch (and have the 218 and 219
 systemd-fsck writing to /dev/console) + systemd-journal binary
 copied from 219-1ubuntu1, I was able to reproduce the hang after 15
 boots.

Anything interesting if you attach gdb to systemd-journald? Can you
paste bt and the Server variable (IIRC, it's *s in main)? How many
open fds does sd-journald have?

Zbyszek


Re: [systemd-devel] heads-up: chasing journal(?) related regression in 219 causing boot hang/fail

2015-02-20 Thread Lennart Poettering
On Fri, 20.02.15 15:36, Martin Pitt (martin.p...@ubuntu.com) wrote:

 Hello all,
 
 Since we updated to 219 in Ubuntu, several people reported boot
 failures. Booting hangs a long time after starting D-Bus, in the
 journal you get a lot of error messages like
 
    systemd[1]: Failed to register match for Disconnected message: Connection timed out
    systemd-logind[749]: Failed to fully start up daemon: Connection timed out
    dbus[800]: [system] Failed to activate service 'org.freedesktop.PolicyKit1': timed out
 
 polkitd isn't running. This causes lots of jobs (logind, NetworkManager, 
 avahi,
 etc.) to get stuck in an eternal retry loop.
 
 Unfortunately reproducing this is a real nuisance, classic heisenbug.
 I'm now able to trigger it (sometimes) in a VM, but I still haven't
 found a reliable recipe for reproducing it, so that bisecting just
 takes ages.
 
 I'm keeping debug log, notes, and progress in
 https://launchpad.net/bugs/1423811 FTR. This is mostly a heads-up for
 other distros in case they also get reports like this, to shortcut the
 debugging exercise (I already wasted 7 hours on this, and I'm not even
 close to the solution). Quite surprisingly it's somewhere in journald.
 Running 218 with journald from 219 causes the hang, 219 with journald
 from 218 is fine.

Umm, please don't mix things like this... 

To me this appears as if dbus is hanging for some reason. Have you
checked what dbus is doing?

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] heads-up: chasing journal(?) related regression in 219 causing boot hang/fail

2015-02-20 Thread Martin Pitt
Hey Zbigniew,

Zbigniew Jędrzejewski-Szmek [2015-02-20 16:08 +0100]:
 Anything interesting if you attach gdb to systemd-journald? Can you
 paste bt

It's in __epoll_wait_nocancel (glibc) ← sd_event_wait() ←
sd_event_run() ← main(), nothing surprising I'd say.

 and the Server variable (IIRC, it's *s in main)?

http://imagebin.org/33

gdb and other commands are a bit awkward to use in the debug shell, as
systemd keeps painting its "(1 of 5) a startup job is running..."
messages over the screen. (Is there a trick to stop that? I don't want
to boot with quiet, as that makes it much harder to see if/when it's
failing.)

 How many open fds does sd-journald have?

Highest fd is 25, so I'd say trivial.
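
For reference, one way to get that number without gdb is to list the
process's fd directory in procfs. A minimal sketch, assuming Linux with
/proc mounted; the function name and the proc_root parameter are
illustrative (the latter only exists to make the helper easy to test):

```python
import os


def open_fds(pid, proc_root="/proc"):
    """Return the sorted fd numbers currently open in process `pid`.

    Reads <proc_root>/<pid>/fd, where the kernel exposes one entry per
    open file descriptor, named after the descriptor number.
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    return sorted(int(name) for name in os.listdir(fd_dir) if name.isdigit())


if __name__ == "__main__":
    # Inspect this Python process itself; pass journald's PID instead
    # (as root) to look at systemd-journald.
    fds = open_fds(os.getpid())
    print(f"{len(fds)} open fds, highest is {fds[-1]}")
```

Listing another user's process this way needs sufficient privileges,
which the debug shell has.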

Thanks,

Martin

-- 
Martin Pitt| http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)