Re: [systemd-devel] Too little information is shown when system enters emergency mode

2012-10-27 Thread Zbigniew Jędrzejewski-Szmek
On Tue, Oct 23, 2012 at 04:58:46PM +0200, Lennart Poettering wrote:
 On Sun, 21.10.12 15:59, Andrey Borzenkov (arvidj...@gmail.com) wrote:
 
  Welcome to emergency mode. Use systemctl default or ^D to enter default mode.
  Give root password for login:
 
 systemd 195 will now also mention journalctl -b in this
 message. Originally this was only in the rescue mode, because of the
 assumption that if you boot directly to emergency mode then no logs
 would be in the journal, and hence there would be no point in recommending
 this command. However, most of the time people end up in emergency mode
 because file systems are not showing up, and in that case journald *is*
 actually running and includes the desired, useful information.

Thanks! I tweaked the message now a bit more to take into account
what sulogin says by itself.
 
  Started /boot/efi                                              [  OK  ]
  Dependency failed. Aborted start of /mnt                       [ ABORT]
  Dependency failed. Aborted start of Login Service              [ ABORT]
  Dependency failed. Aborted start of D-Bus System Message Bus   [ ABORT]
  Welcome to emergency mode. Use systemctl default or ^D to enter default mode.
 
 Hmm, we definitely should show the initial unit that failed in this
 output. 
 
 Can you retest with 195 please? If you find that there's information
 missing in journalctl -b or in the status output, then please file a
 bug; we really should place useful information in both.
It _is_ already better, the output is more complete and includes the
failing device, so that part is fine.

But there's one fundamental problem: the message suggests 'systemctl
default', but 'systemctl default' will fail again, unless the error
went away by itself, which is not going to happen in case of a missing
device. But it's a tough nut to crack.

Zbyszek
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Too little information is shown when system enters emergency mode

2012-10-23 Thread Lennart Poettering
On Sun, 21.10.12 15:59, Andrey Borzenkov (arvidj...@gmail.com) wrote:

 Welcome to emergency mode. Use systemctl default or ^D to enter default mode.
 Give root password for login:

systemd 195 will now also mention journalctl -b in this
message. Originally this was only in the rescue mode, because of the
assumption that if you boot directly to emergency mode then no logs
would be in the journal, and hence there would be no point in recommending
this command. However, most of the time people end up in emergency mode
because file systems are not showing up, and in that case journald *is*
actually running and includes the desired, useful information.

 Started /boot/efi                                              [  OK  ]
 Dependency failed. Aborted start of /mnt                       [ ABORT]
 Dependency failed. Aborted start of Login Service              [ ABORT]
 Dependency failed. Aborted start of D-Bus System Message Bus   [ ABORT]
 Welcome to emergency mode. Use systemctl default or ^D to enter default mode.

Hmm, we definitely should show the initial unit that failed in this
output. 

Can you retest with 195 please? If you find that there's information
missing in journalctl -b or in the status output, then please file a
bug; we really should place useful information in both.

Thanks,

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] Too little information is shown when system enters emergency mode

2012-10-23 Thread Lennart Poettering
On Mon, 22.10.12 11:41, Zbigniew Jędrzejewski-Szmek (zbys...@in.waw.pl) wrote:

  Please note the version of systemd (v44) in openSUSE doesn't have all
  the needed bits to always display on the screen why dependency failed
  (and you end up in emergency mode). This is fixed with systemd 195 which
  should land in Factory pretty soon.
 As an experiment, I tried the same (add '/dev/sda9 /mnt' to /etc/fstab)
 under v194-138-g20f59e4, i.e. very recent. After rebooting all I see is
 the emergency mode prompt.
 
 Now the problem is that 'dev-sda9.device' is loaded  inactive(dead).
 This means that it doesn't show up in --failed. So 'systemctl' with
 various options doesn't show what failed in an easy to recognize way.
 
 OTOH 'journalctl -b' is immensely useful:
 red
 Timed out waiting for device dev-disk-by\x2duuid\x5cx2fdev\x5cx2fsda9.device.
 Dependency failed for /mnt.
 Dependency failed for Local File Systems.
 ...
 /red
 
 This is great, and it would be really nice to expose it more. I guess that
 the first change would be to advertise 'journalctl -b' in the emergency
 mode intro.

I added this yesterday, it is included in 195.

 Would be nice to also unescape the device name: Timed out waiting
 for device /dev/sda9 should be much more understandable for the
 non-systemd-knowledgeable person than Timed out waiting for device
 dev-disk-by\x2duuid\x5cx2fdev\x5cx2fsda9.device.

This is a cool idea. Adding the inverse of unit_name_mangle(), and
showing that if we have a failure on a unit with no sane description
string, sounds like an awesome idea (though we probably should show
the mangled name as well; dunno, it might be useful for proficient folks).

The fix might actually be simple, we could transparently do this in
unit_description() on access.

Added this to the TODO list.
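
As an illustration of the unescaping discussed above, here is a minimal
Python sketch. It is not systemd's implementation; the function name
unit_name_to_path is made up for this example, and it assumes the usual
path escaping rules (a dash stands for "/", other special bytes are
written as \xXX, and the root directory is a single dash).

```python
import re

# Matches one escaped byte of the form \xXX (two hex digits).
_HEX_ESCAPE = re.compile(rb"\\x([0-9a-fA-F]{2})")


def unit_name_to_path(unit_name: str) -> str:
    """Turn an escaped unit name such as 'dev-sda9.device' back into a path."""
    # Drop the unit type suffix (".device", ".mount", ...).
    stem = unit_name.rsplit(".", 1)[0]
    # The root directory is special-cased as a single dash.
    if stem == "-":
        return "/"
    # Dashes stand for path separators. Literal dashes were escaped as \x2d,
    # so this replacement has to happen before the \xXX sequences are decoded.
    raw = stem.replace("-", "/").encode("utf-8")
    # Decode each \xXX sequence to the byte it represents, in a single pass.
    decoded = _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return "/" + decoded.decode("utf-8", errors="replace")


print(unit_name_to_path("dev-sda9.device"))
```

With these rules, dev-sda9.device maps back to /dev/sda9. The longer name
quoted in this thread decodes in one pass to
/dev/disk/by-uuid\x2fdev\x2fsda9, which still carries a second layer of
escaping, so a friendlier display would probably need more than a plain
inverse of the name mangling.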

 But it would be best to provide a short status like:
 
 systemd was trying to reach target 'default.target'
 (which points to 'Multi-User', multi-user.target), but failed,
 because device /dev/sda9 is missing 
 (dev-disk-by\x2duuid\x5cx2fdev\x5cx2fsda9.device).
 in turn this caused '/mnt' mount to fail (mnt.mount),
 in turn this caused 'Local File Systems' target to fail (local-fs.target),
 ...
 in turn this caused 'Multi User' target to fail (multi-user.target).
 

Uh, this is quite hard to do, since we don't track the reasons that
closely. I wonder if a simple list of failures and failures-due-to-dep
wouldn't be sufficient, rather than prose here...
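
To make the "simple list" idea concrete, here is a rough Python sketch
that sorts log messages into failures and failures-due-to-dependency. It
only assumes the two message formats quoted earlier in this thread
("Timed out waiting for device ..." and "Dependency failed for ..."); the
function names are invented for the example and nothing here reflects how
systemd itself tracks jobs.

```python
TIMEOUT_PREFIX = "Timed out waiting for device "
DEPENDENCY_PREFIX = "Dependency failed for "


def summarize_failures(log_lines):
    """Split log messages into root failures and dependency failures."""
    failed, failed_due_to_dep = [], []
    for line in log_lines:
        message = line.strip()
        # Drop the sentence-ending period, but keep dots inside unit names.
        if message.endswith("."):
            message = message[:-1]
        if message.startswith(TIMEOUT_PREFIX):
            failed.append(message[len(TIMEOUT_PREFIX):])
        elif message.startswith(DEPENDENCY_PREFIX):
            failed_due_to_dep.append(message[len(DEPENDENCY_PREFIX):])
    return failed, failed_due_to_dep


def format_summary(failed, failed_due_to_dep):
    """Render the two lists as a short plain-text report."""
    lines = ["Failed:"]
    lines += ["  " + name for name in failed]
    lines.append("Failed due to dependency:")
    lines += ["  " + name for name in failed_due_to_dep]
    return "\n".join(lines)


log = [
    "Timed out waiting for device dev-sda9.device.",
    "Dependency failed for /mnt.",
    "Dependency failed for Local File Systems.",
]
print(format_summary(*summarize_failures(log)))
```

This keeps the output to two flat lists and does not attempt to
reconstruct the full "in turn this caused" chain, which would need the
dependency reasons that are not tracked.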

 And a hint how to e.g. temporarily disable the failing mount point. I
 admit that I'm not sure what is the proper way, short of editing
 /etc/fstab and rebooting.

I wonder if this is something to handle with the explanation database
(aka message catalogue) I want to add to the journal. This would
optionally augment log entries with static info from the vendor about
the issue, with longer help, links and support contact. All this would
be keyed off the message ID of a message, and be translated to the local
language of the user. My idea is to expose this with journalctl -e or
so, where every log line gets this data attached to it, in a block below
each line, where it is available.

Using the explanation database is a great way to handle this and more
errors and get translation and links for free. 
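
A toy Python model of such a lookup, purely as a sketch: explanations are
keyed by message ID and locale, with a fallback to English. The message
ID, the texts and all names below are hypothetical; the real catalogue
format had not been designed at the time of this thread.

```python
from typing import Optional

# Hypothetical catalogue: message ID -> locale -> explanation text.
CATALOG = {
    "example-device-timeout": {
        "en": "A device listed in /etc/fstab did not appear in time.",
        "de": "Ein in /etc/fstab eingetragenes Gerät ist nicht rechtzeitig erschienen.",
    },
}


def explain(message_id: str, locale: str = "en") -> Optional[str]:
    """Return the explanation for a message ID, falling back to English."""
    entry = CATALOG.get(message_id)
    if entry is None:
        return None
    return entry.get(locale, entry.get("en"))


def render(log_line: str, message_id: Optional[str], locale: str = "en") -> str:
    """Show a log line, with its explanation in an indented block below it."""
    explanation = explain(message_id, locale) if message_id else None
    if explanation is None:
        return log_line
    return log_line + "\n    " + explanation


print(render("Timed out waiting for device dev-sda9.device.",
             "example-device-timeout"))
```

Log lines without a message ID, or with an ID that has no catalogue
entry, are printed unchanged, which matches the "where it is available"
part of the idea.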

 Would be nice if this output could be easily retrieved again. If the
 user starts looking at the system, and then forgets what exactly
 failed, he or she should be able to repeat this short diagnosis.

Sounds like a job for journalctl -ebp err or so, if we have the
explanation database?

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] Too little information is shown when system enters emergency mode

2012-10-22 Thread Zbigniew Jędrzejewski-Szmek
On Sun, Oct 21, 2012 at 03:13:22PM +0200, Frederic Crozat wrote:
 Le dimanche 21 octobre 2012 à 15:59 +0400, Andrey Borzenkov a écrit :
  This issue comes up relatively often on openSUSE forums. Users
  complain that when the system drops into emergency mode, there is
  nothing that would explain to the user why it happened or what to do.
  A typical situation is
  https://bugzilla.novell.com/show_bug.cgi?id=782904.
  
  openSUSE by default is using the 'splash quiet' kernel parameters. So
  the first issue is that the interpretation of quiet changed in systemd.
  Now it means suppress all output of systemd services. As a result we
  have the following (even without boot splash involved) when some
  device in fstab is missing:
  
  doing fast boot
  Creating device nodes with udev
  Waiting for device /dev/root to appear:  ok
  fsck from util-linux 2.21.2
  [/sbin/fsck.ext4 (1) -- /] fsck.ext4 -a /dev/sda6
  /dev/sda6: clean, 31805/705744 files, 344231/2819584 blocks
  fsck succeeded. Mounting root device read-write.
  Mounting root /dev/root
  mount -o rw,acl,user_xattr -t ext4 /dev/root /root
  [   10.706463] piix4_smbus :00:07.3: SMBus base address
  uninitialized - upgrade BIOS or use force_addr=0xaddr
  Welcome to emergency mode. Use systemctl default or ^D to enter default mode.
  Give root password for login:
  
  This is literally everything that user sees on console. My first
  reaction was to add systemctl --failed as pre-exec to emergency.
  Unfortunately:
  
  linux-q652:~ # systemctl --no-pager --failed
  UNIT LOAD   ACTIVE SUB JOB DESCRIPTION
  
  LOAD   = Reflects whether the unit definition was properly loaded.
  ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
  SUB    = The low-level unit activation state, values depend on unit type.
  JOB    = Pending job for the unit.
  
  0 units listed. Pass --all to see inactive units, too.
  
  Everything is fine. This is understandable - we are now in different
  transaction and as far as I understand, systemctl --failed shows only
  results of currently active transaction (am I right?).
  
  Only when quiet is turned off, do I really see something (again -
  assuming we do not have bootsplash ...)
  
  Started /boot/efi                                              [  OK  ]
  Dependency failed. Aborted start of /mnt                       [ ABORT]
  Dependency failed. Aborted start of Login Service              [ ABORT]
  Dependency failed. Aborted start of D-Bus System Message Bus   [ ABORT]
  Welcome to emergency mode. Use systemctl default or ^D to enter default mode.
  
  So right now if anything goes extremely wrong we have baffled user
  sitting before emergency mode prompt and not knowing what to do
  next. Is it considered a problem by someone else? Would it be feasible
  to turn off quiet and bootsplash immediately after any unit failed
  during system boot?
 
 Please note the version of systemd (v44) in openSUSE doesn't have all
 the needed bits to always display on the screen why dependency failed
 (and you end up in emergency mode). This is fixed with systemd 195 which
 should land in Factory pretty soon.
As an experiment, I tried the same (add '/dev/sda9 /mnt' to /etc/fstab)
under v194-138-g20f59e4, i.e. very recent. After rebooting all I see is
the emergency mode prompt.

Now the problem is that 'dev-sda9.device' is loaded  inactive(dead).
This means that it doesn't show up in --failed. So 'systemctl' with
various options doesn't show what failed in an easy to recognize way.

OTOH 'journalctl -b' is immensely useful:
red
Timed out waiting for device dev-disk-by\x2duuid\x5cx2fdev\x5cx2fsda9.device.
Dependency failed for /mnt.
Dependency failed for Local File Systems.
...
/red

This is great, and it would be really nice to expose it more. I guess that
the first change would be to advertise 'journalctl -b' in the emergency
mode intro.

Would be nice to also unescape the device name: Timed out waiting
for device /dev/sda9 should be much more understandable for the
non-systemd-knowledgeable person than Timed out waiting for device
dev-disk-by\x2duuid\x5cx2fdev\x5cx2fsda9.device.

But it would be best to provide a short status like:

systemd was trying to reach target 'default.target'
(which points to 'Multi-User', multi-user.target), but failed,
because device /dev/sda9 is missing 
(dev-disk-by\x2duuid\x5cx2fdev\x5cx2fsda9.device).
in turn this caused '/mnt' mount to fail (mnt.mount),
in turn this caused 'Local File Systems' target to fail (local-fs.target),
...
in turn this caused 'Multi User' target to fail (multi-user.target).


And a hint how to e.g. temporarily disable the failing mount point. I
admit that I'm not sure what is the proper way, short of editing
/etc/fstab and rebooting.

Would be nice if this output could be easily retrieved again. If the
user starts looking at the system, and then forgets what exactly
failed, he or she should be able to repeat this short diagnosis.


[systemd-devel] Too little information is shown when system enters emergency mode

2012-10-21 Thread Andrey Borzenkov
This issue comes up relatively often on openSUSE forums. Users
complain that when the system drops into emergency mode, there is
nothing that would explain to the user why it happened or what to do.
A typical situation is
https://bugzilla.novell.com/show_bug.cgi?id=782904.

openSUSE by default is using the 'splash quiet' kernel parameters. So
the first issue is that the interpretation of quiet changed in systemd.
Now it means suppress all output of systemd services. As a result we
have the following (even without boot splash involved) when some
device in fstab is missing:

doing fast boot
Creating device nodes with udev
Waiting for device /dev/root to appear:  ok
fsck from util-linux 2.21.2
[/sbin/fsck.ext4 (1) -- /] fsck.ext4 -a /dev/sda6
/dev/sda6: clean, 31805/705744 files, 344231/2819584 blocks
fsck succeeded. Mounting root device read-write.
Mounting root /dev/root
mount -o rw,acl,user_xattr -t ext4 /dev/root /root
[   10.706463] piix4_smbus :00:07.3: SMBus base address
uninitialized - upgrade BIOS or use force_addr=0xaddr
Welcome to emergency mode. Use systemctl default or ^D to enter default mode.
Give root password for login:

This is literally everything that the user sees on the console. My first
reaction was to add systemctl --failed as a pre-exec to emergency.
Unfortunately:

linux-q652:~ # systemctl --no-pager --failed
UNIT LOAD   ACTIVE SUB JOB DESCRIPTION

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
JOB    = Pending job for the unit.

0 units listed. Pass --all to see inactive units, too.

Everything is fine. This is understandable - we are now in a different
transaction and, as far as I understand, systemctl --failed shows only
the results of the currently active transaction (am I right?).

Only when quiet is turned off, do I really see something (again -
assuming we do not have bootsplash ...)

Started /boot/efi  [  OK  ]
Dependency failed. Aborted start of /mnt   [ ABORT]
Dependency failed. Aborted start of Login Service  [ ABORT]
Dependency failed. Aborted start of D-Bus System Message Bus   [ ABORT]
Welcome to emergency mode. Use systemctl default or ^D to enter default mode.

So right now, if anything goes seriously wrong, we have a baffled user
sitting in front of the emergency mode prompt, not knowing what to do
next. Is this considered a problem by anyone else? Would it be feasible
to turn off quiet and the bootsplash immediately after any unit fails
during system boot?

Thank you

-andrey


Re: [systemd-devel] Too little information is shown when system enters emergency mode

2012-10-21 Thread Frederic Crozat
Le dimanche 21 octobre 2012 à 15:59 +0400, Andrey Borzenkov a écrit :
 This issue comes up relatively often on openSUSE forums. Users
 complain that when the system drops into emergency mode, there is
 nothing that would explain to the user why it happened or what to do.
 A typical situation is
 https://bugzilla.novell.com/show_bug.cgi?id=782904.
 
 openSUSE by default is using the 'splash quiet' kernel parameters. So
 the first issue is that the interpretation of quiet changed in systemd.
 Now it means suppress all output of systemd services. As a result we
 have the following (even without boot splash involved) when some
 device in fstab is missing:
 
 doing fast boot
 Creating device nodes with udev
 Waiting for device /dev/root to appear:  ok
 fsck from util-linux 2.21.2
 [/sbin/fsck.ext4 (1) -- /] fsck.ext4 -a /dev/sda6
 /dev/sda6: clean, 31805/705744 files, 344231/2819584 blocks
 fsck succeeded. Mounting root device read-write.
 Mounting root /dev/root
 mount -o rw,acl,user_xattr -t ext4 /dev/root /root
 [   10.706463] piix4_smbus :00:07.3: SMBus base address
 uninitialized - upgrade BIOS or use force_addr=0xaddr
 Welcome to emergency mode. Use systemctl default or ^D to enter default mode.
 Give root password for login:
 
 This is literally everything that user sees on console. My first
 reaction was to add systemctl --failed as pre-exec to emergency.
 Unfortunately:
 
 linux-q652:~ # systemctl --no-pager --failed
 UNIT LOAD   ACTIVE SUB JOB DESCRIPTION
 
 LOAD   = Reflects whether the unit definition was properly loaded.
 ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
 SUB    = The low-level unit activation state, values depend on unit type.
 JOB    = Pending job for the unit.
 
 0 units listed. Pass --all to see inactive units, too.
 
 Everything is fine. This is understandable - we are now in different
 transaction and as far as I understand, systemctl --failed shows only
 results of currently active transaction (am I right?).
 
 Only when quiet is turned off, do I really see something (again -
 assuming we do not have bootsplash ...)
 
 Started /boot/efi                                              [  OK  ]
 Dependency failed. Aborted start of /mnt                       [ ABORT]
 Dependency failed. Aborted start of Login Service              [ ABORT]
 Dependency failed. Aborted start of D-Bus System Message Bus   [ ABORT]
 Welcome to emergency mode. Use systemctl default or ^D to enter default mode.
 
 So right now if anything goes extremely wrong we have baffled user
 sitting before emergency mode prompt and not knowing what to do
 next. Is it considered a problem by someone else? Would it be feasible
 to turn off quiet and bootsplash immediately after any unit failed
 during system boot?

Please note the version of systemd (v44) in openSUSE doesn't have all
the needed bits to always display on the screen why a dependency failed
(and you end up in emergency mode). This is fixed with systemd 195 which
should land in Factory pretty soon.

However, on a more general basis (not openSUSE specific), I think we
should add some special handling in systemd for a kernel command line
option (for instance debug or debug=1), which would expand into
systemd.log_level=debug systemd.log_target=kmsg. This would make it much
easier to tell users what to do when debugging is needed, and we could
also add an additional menu entry in the bootloader (under the advanced
settings) so this setting would always be available, if needed.
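
The proposed expansion could look roughly like the following Python
sketch. It is only an illustration of the idea, not existing systemd
behaviour: the function name expand_debug is invented, and splitting the
command line on whitespace is a simplification that ignores quoted
values.

```python
# Options that a bare "debug" (or "debug=1") would stand for in this proposal.
DEBUG_EXPANSION = ["systemd.log_level=debug", "systemd.log_target=kmsg"]


def expand_debug(cmdline: str) -> list:
    """Expand a 'debug' kernel command line word into the systemd options."""
    expanded = []
    for word in cmdline.split():
        # Keep every original word, including "debug" itself.
        expanded.append(word)
        if word in ("debug", "debug=1"):
            expanded.extend(DEBUG_EXPANSION)
    return expanded


print(" ".join(expand_debug("root=/dev/sda6 quiet debug")))
```

A command line without debug is passed through unchanged, so the extra
bootloader entry would only need to append that single word.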

-- 
Frederic Crozat fcro...@suse.com
SUSE
