Re: [systemd-devel] Too little information is shown when system enters emergency mode
On Tue, Oct 23, 2012 at 04:58:46PM +0200, Lennart Poettering wrote:
> On Sun, 21.10.12 15:59, Andrey Borzenkov (arvidj...@gmail.com) wrote:
>
>> Welcome to emergency mode. Use systemctl default or ^D to enter default mode.
>> Give root password for login:
>
> systemd 195 will now also mention journalctl -b in this message. Originally this was only in the rescue mode, because of the assumption that if you boot directly to emergency mode then no logs would be in the journal, and hence there would be no point in recommending this command. However, most of the time people end up in emergency mode because file systems do not show up, in which case journald *is* actually running and includes the desired, useful information.

Thanks! I tweaked the message now a bit more to take into account what sulogin says by itself.

>>   Started /boot/efi [ OK ]
>>   Dependency failed. Aborted start of /mnt [ ABORT]
>>   Dependency failed. Aborted start of Login Service [ ABORT]
>>   Dependency failed. Aborted start of D-Bus System Message Bus [ ABORT]
>>   Welcome to emergency mode. Use systemctl default or ^D to enter default mode.
>
> Hmm, we definitely should show the initial unit that failed in this output. Can you retest with 195 please? If you find that there's information missing in journalctl -b or in the status output, then please file a bug, we really should place useful information at both.

It _is_ already better: the output is more complete and includes the failing device, so that part is fine. But there's one fundamental problem: the message suggests 'systemctl default', but 'systemctl default' will fail again unless the error went away by itself, which is not going to happen in the case of a missing device. But it's a tough nut to crack.

Zbyszek
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Too little information is shown when system enters emergency mode
On Sun, 21.10.12 15:59, Andrey Borzenkov (arvidj...@gmail.com) wrote:

> Welcome to emergency mode. Use systemctl default or ^D to enter default mode.
> Give root password for login:

systemd 195 will now also mention journalctl -b in this message. Originally this was only in the rescue mode, because of the assumption that if you boot directly to emergency mode then no logs would be in the journal, and hence there would be no point in recommending this command. However, most of the time people end up in emergency mode because file systems do not show up, in which case journald *is* actually running and includes the desired, useful information.

>   Started /boot/efi [ OK ]
>   Dependency failed. Aborted start of /mnt [ ABORT]
>   Dependency failed. Aborted start of Login Service [ ABORT]
>   Dependency failed. Aborted start of D-Bus System Message Bus [ ABORT]
>   Welcome to emergency mode. Use systemctl default or ^D to enter default mode.

Hmm, we definitely should show the initial unit that failed in this output. Can you retest with 195 please? If you find that there's information missing in journalctl -b or in the status output, then please file a bug, we really should place useful information at both.

Thanks,

Lennart

--
Lennart Poettering - Red Hat, Inc.
Re: [systemd-devel] Too little information is shown when system enters emergency mode
On Mon, 22.10.12 11:41, Zbigniew Jędrzejewski-Szmek (zbys...@in.waw.pl) wrote:

>> Please note the version of systemd (v44) in openSUSE doesn't have all the needed bits to always display on the screen why a dependency failed (and you end up in emergency mode). This is fixed with systemd 195, which should land in Factory pretty soon.
>
> As an experiment, I tried the same (add '/dev/sda9 /mnt' to /etc/fstab) under v194-138-g20f59e4, i.e. very recent. After rebooting all I see is the emergency mode prompt.
>
> Now the problem is that 'dev-sda9.device' is loaded inactive(dead). This means that it doesn't show up in --failed. So 'systemctl' with various options doesn't show what failed in an easy to recognize way.
>
> OTOH 'journalctl -b' is immensely useful:
>
>   <red>
>   Timed out waiting for device dev-disk-by\x2duuid\x5cx2fdev\x5cx2fsda9.device.
>   Dependency failed for /mnt.
>   Dependency failed for Local File Systems.
>   ...
>   </red>
>
> This is great, and it would be really nice to expose it more. I guess that the first change would be to advertise 'journalctl -b' in the emergency mode intro.

I added this yesterday, it is included in 195.

> Would be nice to also un-escape the device name:
>   Timed out waiting for device /dev/sda9
> should be much more understandable for the non-systemd-knowledgeable person than
>   Timed out waiting for device dev-disk-by\x2duuid\x5cx2fdev\x5cx2fsda9.device.

This is a cool idea. Adding the inverse of unit_name_mangle(), and showing its result if we have a failure on a unit with no sane description string, sounds like an awesome idea (though we probably should show the mangled name as well; dunno, it might be useful for proficient folks). The fix might actually be simple: we could transparently do this in unit_description() on access.

Added this to the TODO list.
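To illustrate the un-escaping idea discussed above, here is a minimal Python sketch, not systemd's actual implementation. It assumes the usual unit-name escaping rules ("/" is stored as "-", other special bytes as "\xHH") and that the name describes a path; the function name unescape_unit_path is hypothetical.

```python
import re

# Matches either a "\xHH" escape or a plain "-" (which stands for "/").
_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})|-")


def _replace(match):
    hex_digits = match.group(1)
    if hex_digits is None:
        # A bare dash separates path components.
        return "/"
    # Assumption: escaped bytes are ASCII, so one byte maps to one character.
    return chr(int(hex_digits, 16))


def unescape_unit_path(unit_name):
    """Turn an escaped unit name such as 'dev-sda9.device' back into a path."""
    # Drop the unit type suffix (".device", ".mount", ...) if there is one.
    stem, dot, _suffix = unit_name.rpartition(".")
    if not dot:
        stem = unit_name
    # The root directory is special-cased as a single dash.
    if stem == "-":
        return "/"
    # A single pass, so text produced by an escape is never rescanned.
    return "/" + _ESCAPE.sub(_replace, stem)


print(unescape_unit_path("dev-sda9.device"))
print(unescape_unit_path(r"dev-disk-by\x2duuid-1234.device"))
```

The single regular-expression pass matters: replacing dashes first and hex escapes afterwards in two separate steps would still work here, but doing the escapes first would wrongly turn an escaped "\x2d" into a path separator.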
> But it would be best to provide a short status like:
>
>   systemd was trying to reach target 'default.target' (which points to 'Multi-User', multi-user.target), but failed, because device /dev/sda9 is missing (dev-disk-by\x2duuid\x5cx2fdev\x5cx2fsda9.device).
>   in turn this caused '/mnt' mount to fail (mnt.mount),
>   in turn this caused 'Local File Systems' target to fail (local-fs.target),
>   ...
>   in turn this caused 'Multi User' target to fail (multi-user.target).

Uh, this is quite hard to do, since we don't track the reasons so much. I wonder if a simple list of failure and failure-due-to-dep wouldn't be sufficient, rather than prose here...

> And a hint how to e.g. temporarily disable the failing mount point. I admit that I'm not sure what is the proper way, short of editing /etc/fstab and rebooting.

I wonder if this is something to handle with the explanation database (aka message catalogue) I want to add to the journal. This would optionally augment log entries with static info from the vendor about the issue, with longer help, links and support contact. All this would be keyed off the message ID of a message, and be translated to the local language of the user. My idea is to expose this with journalctl -e or so, where every log line gets this data attached to it, in a block below each line, where it is available.

Using the explanation database is a great way to handle this and more errors, and to get translation and links for free.

> Would be nice if this output could be easily retrieved again. If the user starts looking at the system, and then forgets what exactly failed, he or she should be able to repeat this short diagnosis.

Sounds like a job for journalctl -ebp err or so, if we have the explanation database?

Lennart

--
Lennart Poettering - Red Hat, Inc.
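The "simple list of failure and failure-due-to-dep" suggested in this message could look roughly like the following Python sketch. It is only an illustration: the REQUIRED_BY map is hypothetical sample data loosely based on the units named in the thread, and systemd does not expose its dependency graph in this form.

```python
from collections import deque


def failure_report(root_cause, required_by):
    """List the unit that failed first, then every unit that failed because of it."""
    lines = [f"failed: {root_cause}"]
    seen = {root_cause}
    queue = deque([root_cause])
    # Breadth-first walk so that direct dependents are listed before indirect ones.
    while queue:
        unit = queue.popleft()
        for dependent in required_by.get(unit, ()):
            if dependent in seen:
                continue
            seen.add(dependent)
            lines.append(f"failed due to dependency: {dependent} (needs {unit})")
            queue.append(dependent)
    return lines


# Hypothetical sample data: which units require which other unit.
REQUIRED_BY = {
    "dev-sda9.device": ["mnt.mount"],
    "mnt.mount": ["local-fs.target"],
}

for line in failure_report("dev-sda9.device", REQUIRED_BY):
    print(line)
```

A flat list like this avoids having to track and phrase the reason for each failure; it only needs to know which unit failed on its own and which units were pulled down with it.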
Re: [systemd-devel] Too little information is shown when system enters emergency mode
On Sun, Oct 21, 2012 at 03:13:22PM +0200, Frederic Crozat wrote:
> On Sunday, 21 October 2012 at 15:59 +0400, Andrey Borzenkov wrote:
>> This issue comes up relatively often on openSUSE forums. Users complain that when the system drops into emergency mode, there is nothing that would explain to the user why it happened or what to do. A typical situation is https://bugzilla.novell.com/show_bug.cgi?id=782904.
>>
>> openSUSE by default is using the splash quiet kernel parameters. So the first issue is that the interpretation of quiet changed in systemd. Now it means "suppress all output of systemd services". As a result we have the following (even without boot splash involved) when some device in fstab is missing:
>>
>>   doing fast boot
>>   Creating device nodes with udev
>>   Waiting for device /dev/root to appear: ok
>>   fsck from util-linux 2.21.2
>>   [/sbin/fsck.ext4 (1) -- /] fsck.ext4 -a /dev/sda6
>>   /dev/sda6: clean, 31805/705744 files, 344231/2819584 blocks
>>   fsck succeeded. Mounting root device read-write.
>>   Mounting root /dev/root
>>   mount -o rw,acl,user_xattr -t ext4 /dev/root /root
>>   [ 10.706463] piix4_smbus :00:07.3: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr
>>   Welcome to emergency mode. Use systemctl default or ^D to enter default mode.
>>   Give root password for login:
>>
>> This is literally everything that the user sees on the console. My first reaction was to add systemctl --failed as pre-exec to emergency. Unfortunately:
>>
>>   linux-q652:~ # systemctl --no-pager --failed
>>   UNIT LOAD ACTIVE SUB JOB DESCRIPTION
>>
>>   LOAD   = Reflects whether the unit definition was properly loaded.
>>   ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
>>   SUB    = The low-level unit activation state, values depend on unit type.
>>   JOB    = Pending job for the unit.
>>
>>   0 units listed. Pass --all to see inactive units, too.
>>
>> Everything is fine. This is understandable - we are now in a different transaction and, as far as I understand, systemctl --failed shows only the results of the currently active transaction (am I right?).
>> Only when quiet is turned off do I really see something (again - assuming we do not have bootsplash ...):
>>
>>   Started /boot/efi [ OK ]
>>   Dependency failed. Aborted start of /mnt [ ABORT]
>>   Dependency failed. Aborted start of Login Service [ ABORT]
>>   Dependency failed. Aborted start of D-Bus System Message Bus [ ABORT]
>>   Welcome to emergency mode. Use systemctl default or ^D to enter default mode.
>>
>> So right now, if anything goes extremely wrong, we have a baffled user sitting before the emergency mode prompt and not knowing what to do next. Is this considered a problem by anyone else? Would it be feasible to turn off quiet and bootsplash immediately after any unit failed during system boot?
>
> Please note the version of systemd (v44) in openSUSE doesn't have all the needed bits to always display on the screen why a dependency failed (and you end up in emergency mode). This is fixed with systemd 195, which should land in Factory pretty soon.

As an experiment, I tried the same (add '/dev/sda9 /mnt' to /etc/fstab) under v194-138-g20f59e4, i.e. very recent. After rebooting all I see is the emergency mode prompt.

Now the problem is that 'dev-sda9.device' is loaded inactive(dead). This means that it doesn't show up in --failed. So 'systemctl' with various options doesn't show what failed in an easy to recognize way.

OTOH 'journalctl -b' is immensely useful:

  <red>
  Timed out waiting for device dev-disk-by\x2duuid\x5cx2fdev\x5cx2fsda9.device.
  Dependency failed for /mnt.
  Dependency failed for Local File Systems.
  ...
  </red>

This is great, and it would be really nice to expose it more. I guess that the first change would be to advertise 'journalctl -b' in the emergency mode intro.

Would be nice to also un-escape the device name:
  Timed out waiting for device /dev/sda9
should be much more understandable for the non-systemd-knowledgeable person than
  Timed out waiting for device dev-disk-by\x2duuid\x5cx2fdev\x5cx2fsda9.device.
But it would be best to provide a short status like:

  systemd was trying to reach target 'default.target' (which points to 'Multi-User', multi-user.target), but failed, because device /dev/sda9 is missing (dev-disk-by\x2duuid\x5cx2fdev\x5cx2fsda9.device).
  in turn this caused '/mnt' mount to fail (mnt.mount),
  in turn this caused 'Local File Systems' target to fail (local-fs.target),
  ...
  in turn this caused 'Multi User' target to fail (multi-user.target).

And a hint how to e.g. temporarily disable the failing mount point. I admit that I'm not sure what is the proper way, short of editing /etc/fstab and rebooting.

Would be nice if this output could be easily retrieved again. If the user starts looking at the system, and then forgets what exactly failed, he or she should be able to repeat this short diagnosis.

However,
[systemd-devel] Too little information is shown when system enters emergency mode
This issue comes up relatively often on openSUSE forums. Users complain that when the system drops into emergency mode, there is nothing that would explain to the user why it happened or what to do. A typical situation is https://bugzilla.novell.com/show_bug.cgi?id=782904.

openSUSE by default is using the splash quiet kernel parameters. So the first issue is that the interpretation of quiet changed in systemd. Now it means "suppress all output of systemd services". As a result we have the following (even without boot splash involved) when some device in fstab is missing:

  doing fast boot
  Creating device nodes with udev
  Waiting for device /dev/root to appear: ok
  fsck from util-linux 2.21.2
  [/sbin/fsck.ext4 (1) -- /] fsck.ext4 -a /dev/sda6
  /dev/sda6: clean, 31805/705744 files, 344231/2819584 blocks
  fsck succeeded. Mounting root device read-write.
  Mounting root /dev/root
  mount -o rw,acl,user_xattr -t ext4 /dev/root /root
  [ 10.706463] piix4_smbus :00:07.3: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr
  Welcome to emergency mode. Use systemctl default or ^D to enter default mode.
  Give root password for login:

This is literally everything that the user sees on the console. My first reaction was to add systemctl --failed as pre-exec to emergency. Unfortunately:

  linux-q652:~ # systemctl --no-pager --failed
  UNIT LOAD ACTIVE SUB JOB DESCRIPTION

  LOAD   = Reflects whether the unit definition was properly loaded.
  ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
  SUB    = The low-level unit activation state, values depend on unit type.
  JOB    = Pending job for the unit.

  0 units listed. Pass --all to see inactive units, too.

Everything is fine. This is understandable - we are now in a different transaction and, as far as I understand, systemctl --failed shows only the results of the currently active transaction (am I right?).

Only when quiet is turned off do I really see something (again - assuming we do not have bootsplash ...):

  Started /boot/efi [ OK ]
  Dependency failed. Aborted start of /mnt [ ABORT]
  Dependency failed. Aborted start of Login Service [ ABORT]
  Dependency failed. Aborted start of D-Bus System Message Bus [ ABORT]
  Welcome to emergency mode. Use systemctl default or ^D to enter default mode.

So right now, if anything goes extremely wrong, we have a baffled user sitting before the emergency mode prompt and not knowing what to do next. Is this considered a problem by anyone else? Would it be feasible to turn off quiet and bootsplash immediately after any unit failed during system boot?

Thank you

-andrey
Re: [systemd-devel] Too little information is shown when system enters emergency mode
On Sunday, 21 October 2012 at 15:59 +0400, Andrey Borzenkov wrote:
> This issue comes up relatively often on openSUSE forums. Users complain that when the system drops into emergency mode, there is nothing that would explain to the user why it happened or what to do. A typical situation is https://bugzilla.novell.com/show_bug.cgi?id=782904.
>
> openSUSE by default is using the splash quiet kernel parameters. So the first issue is that the interpretation of quiet changed in systemd. Now it means "suppress all output of systemd services". As a result we have the following (even without boot splash involved) when some device in fstab is missing:
>
>   doing fast boot
>   Creating device nodes with udev
>   Waiting for device /dev/root to appear: ok
>   fsck from util-linux 2.21.2
>   [/sbin/fsck.ext4 (1) -- /] fsck.ext4 -a /dev/sda6
>   /dev/sda6: clean, 31805/705744 files, 344231/2819584 blocks
>   fsck succeeded. Mounting root device read-write.
>   Mounting root /dev/root
>   mount -o rw,acl,user_xattr -t ext4 /dev/root /root
>   [ 10.706463] piix4_smbus :00:07.3: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr
>   Welcome to emergency mode. Use systemctl default or ^D to enter default mode.
>   Give root password for login:
>
> This is literally everything that the user sees on the console. My first reaction was to add systemctl --failed as pre-exec to emergency. Unfortunately:
>
>   linux-q652:~ # systemctl --no-pager --failed
>   UNIT LOAD ACTIVE SUB JOB DESCRIPTION
>
>   LOAD   = Reflects whether the unit definition was properly loaded.
>   ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
>   SUB    = The low-level unit activation state, values depend on unit type.
>   JOB    = Pending job for the unit.
>
>   0 units listed. Pass --all to see inactive units, too.
>
> Everything is fine. This is understandable - we are now in a different transaction and, as far as I understand, systemctl --failed shows only the results of the currently active transaction (am I right?).
> Only when quiet is turned off do I really see something (again - assuming we do not have bootsplash ...):
>
>   Started /boot/efi [ OK ]
>   Dependency failed. Aborted start of /mnt [ ABORT]
>   Dependency failed. Aborted start of Login Service [ ABORT]
>   Dependency failed. Aborted start of D-Bus System Message Bus [ ABORT]
>   Welcome to emergency mode. Use systemctl default or ^D to enter default mode.
>
> So right now, if anything goes extremely wrong, we have a baffled user sitting before the emergency mode prompt and not knowing what to do next. Is this considered a problem by anyone else? Would it be feasible to turn off quiet and bootsplash immediately after any unit failed during system boot?

Please note the version of systemd (v44) in openSUSE doesn't have all the needed bits to always display on the screen why a dependency failed (and you end up in emergency mode). This is fixed with systemd 195, which should land in Factory pretty soon.

However, on a more general basis (not openSUSE specific), I think we should add some special handling in systemd for a kernel command line option (for instance debug or debug=1), which would expand into systemd.log_level=debug systemd.log_target=kmsg. This would make it much easier to tell users what to do when debugging is needed, and we could also add an additional menu entry in the bootloader (under the advanced settings) so this setting would always be available if needed.

--
Frederic Crozat fcro...@suse.com
SUSE
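The proposed expansion of a short debug option can be sketched in Python as follows. This only illustrates the proposal and is not how systemd parses the kernel command line; the function name expand_debug_option is hypothetical, and keeping the original token in place (so the kernel still sees it) is an assumption.

```python
# The two options named in the proposal above.
DEBUG_EXPANSION = ("systemd.log_level=debug", "systemd.log_target=kmsg")


def expand_debug_option(cmdline):
    """Append the systemd debug options after a bare 'debug' or 'debug=1' token."""
    expanded = []
    for token in cmdline.split():
        # Assumption: the original token is kept, since the kernel reads it too.
        expanded.append(token)
        if token in ("debug", "debug=1"):
            expanded.extend(DEBUG_EXPANSION)
    return " ".join(expanded)


print(expand_debug_option("root=/dev/sda6 splash quiet debug"))
```

With such a mapping, support instructions could shrink to "add debug to the kernel command line", and a bootloader menu entry would only need to append that one word.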