Hi,

On 12-06-18 10:24, Lennart Poettering wrote:
On Mo, 11.06.18 17:40, Hans de Goede (hdego...@redhat.com) wrote:

I am very sure it's not worth trying to maintain a shutdown_success
variable that is determined that early. That's a pointless exercise,
you won't catch 99% of relevant issues that way...

Ok, I had a quick chat with the rest of the laptop team about this
and will just drop the shutdown_success flag.

Excellent! Thanks for reconsidering.

I mean, if there was a nice place we could store shutdown state info
at a very late point of shutdown we'd totally do that, but nothing
good has appeared so far. There are EFI variables and pstore, but
given the low quality of the memory of those things it's probably not
a good idea to write to them on every shutdown.

Yes, I have considered using an EFI variable too, but I too am
afraid this will damage the crappy backing store for the EFI
variables.

But we also (Fedora 30 timeframe) want to support fastboot, where
we don't check for a keypress at all. The problem is that scanning
the USB bus can take quite long and some firmware skips this if
their "fastboot" option is enabled (typically the default nowadays),
but if we then ask for keypress / state info in grub most firmwares (*)
will do the USB scan at that point, causing easily 2-3 seconds extra
boot time.

The other option of course is to emphasize the "reboot into firmware"
feature of EFI more.

Yes we need "reboot into firmware" support for machines which
have fastboot enabled in the firmware, because otherwise there is no
way to get into the firmware. We actually already need this today.

But this does not really help with getting the grub menu when it
is necessary to rescue the system:

1) AFAICT this will not help with getting into grub when grub's fastboot
support is enabled and it won't even check for a key.

2) The system may be broken in such a way that the user is unable to
run the command / click the menu item for this.

In systemd there's "systemctl reboot --firmware-setup"
to get into the firmware setup that way.

Ah I was working on a minimal hack to do this inside grub, but I will
drop that then.

In sd-boot we also implicitly
add a menu item if the functionality is available.

And I've cherry-picked a patch from Ubuntu to do the same in grub
(if the menu is shown) which is something which we should have done
a long time ago.

I figure gdm could
try to expose that feature somewhere, maybe in the top-right menu or
so?

Yes I need to talk to the GNOME designers about adding some advanced
reboot options somewhere:

1) Reboot into firmware setup
2) Show boot menu next boot

Any others?

I'm thinking myself to do something like what Windows does (assuming
that will help with discoverability) where shift + click on reboot shows
this menu.

I figure if there's need for it we could even have some mini daemon
whose only job is to provide a reboot-into-firmware hotkey during
early boot time. i.e. something that just listens to some otherwise
silent keycombo (maybe shift+alt+ctrl), and when it's pressed within
the first minute of bootup we'll instantly reboot into firmware or
so... In theory that could even be systemd-logind (which already
watches input devices for SW_DOCK and SW_LID events), but logind is
started quite late, hence maybe a separate mini daemon might be
wise...

Hmm, how soon during boot is the ctrl+alt+up target available ?
We could add a .service file there which forces showing the grub
menu next time (and the grub menu will also allow entering firmware).

I already have a menu_show_once grubenv variable which gets checked
in grub.cfg-s generated (*) by the new grub2-mkconfig code I'm working
on for the auto-hide stuff, so the service file would just need
to call grub2-editenv to set that.

*) yes grub is ugly


Are you sure that powering up a system and powering it down
right-after should trigger the boot menu?

I know that that is not ideal, but who would do that anyways? This
should happen very rarely and the side-effect is completely
harmless.

I have the suspicion that this can happen pretty regularly. Think:
university computer pools, internet cafes and suchlike, which boot up
in the morning, and shutdown in the night, and might not see anyone
actually log in. (That said, not sure if computer pools and internet
cafes even still exist; maybe in some less connected country, dunno)

Maybe an approach like this could work: define two image states:
"known-good" and "dont-know". A newly installed image comes up as
"dont-know", and as soon as the system level stuff is happy is marked
as "known-good". This part is obvious I guess. But now we'd allow the
system to be moved back to "dont-know", and iterate through this
again. The first login on the system would use this, and set the state
back to "dont-know" and then "known-good" when the login worked
fine. And you could actually use systemd to manage both: the former
would be implemented by a target unit for the PID 1 service manager,
and the latter by a similar named target unit for the per-user service
manager.

With that approach you'd have a universal system again: server and
desktop systems would have the same behaviour and the same mechanisms
and you can easily convert one to the other and back.

BTW, to fill in a bit of background, which might be interesting in
this context, but is also a bit orthogonal: it was our intention to
add boot-counting and revert-to-last-working-image support to
sd-boot. The scheme is supposed to be very simple: whenever a new
kernel/initrd image is dropped into the ESP its filename would be
suffixed with a boot counter ".5". Whenever such an image is started
by sd-boot its name is first changed, decrementing the counter by
one. i.e. on the first boot of the image the counter would become
".4", and so on. Images with a counter of ".0" would not be booted
automatically anymore. And when the system managed to boot up cleanly,
the suffix would be dropped from the filename. In this scheme an image
"foo", could hence appear under the following names during its
lifetime:

- foo.5 (freshly installed, 5 tries to go)
- foo.4 (one failed boot, 4 tries to go)
- foo.3 (two failed boots, 3 tries to go)
- foo.2 (three failed boots, 2 tries to go)
- foo.1 (four failed boots, 1 try to go)
- foo.0 (five failed boots, never try this one again)
- foo (known good, pick this one again)

Decrementing the counter would be done from boot loader
context. Dropping the counter would be done from the OS, after
boot-up.

This scheme is really simple as the counter is stored in very
discoverable ways in the ESP, and modifiable with simple shell tools
(both Linux and EFI shells that is). It is also implementable with one
simple operation that has a great chance of working correctly in EFI's
crappy file system implementations: file rename. Moreover it's
relatively lean on metadata: simply by picking the initial name the
installer can say how many tries shall be tried before giving up.
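
To make the renaming scheme above concrete, here is a minimal Python sketch
of the two rename operations. The function names are hypothetical, and the
sketch assumes the image's base name does not itself end in ".<digits>"
(a real kernel file name often would, so a real implementation needs a less
ambiguous suffix format). It only computes names; it does not touch the ESP.

```python
import re

# Assumed format: "<base>.<tries-left>", e.g. "foo.5". A name without the
# numeric suffix is treated as known good.
_COUNTER = re.compile(r"^(?P<base>.+)\.(?P<tries>\d+)$")


def on_boot_attempt(name):
    """Boot loader side: return (new_name, bootable) for one boot attempt."""
    m = _COUNTER.match(name)
    if m is None:
        return name, True          # no counter: known good, boot as-is
    tries = int(m.group("tries"))
    if tries == 0:
        return name, False         # counter exhausted: do not auto-boot
    return "{}.{}".format(m.group("base"), tries - 1), True


def mark_good(name):
    """OS side, after a clean boot: drop the counter suffix."""
    m = _COUNTER.match(name)
    return m.group("base") if m else name
```

For example, on_boot_attempt("foo.5") yields ("foo.4", True), and
mark_good("foo.4") yields "foo".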

I wonder if it would be worth agreeing on common semantics
here. i.e. extend the Bootloader spec, to document these suffixes, so
that Grub could honour them too. And then extend them slightly to
cover your case too. For example, we could also say that whenever the
boot loader decrements the counter it also increments another
one. example: foo.5 → foo.4.1 → foo.3.2 → foo.2.3 → foo.1.4 →
foo.0.5. And to implement your usecase you'd then show the boot menu
automatically whenever the name has the second counter set.
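
A small Python sketch of this two-counter variant, purely as an
illustration: the function names are hypothetical, the suffix format
"<base>.<tries-left>[.<tries-done>]" is read off the example chain above,
and the base name is assumed not to end in ".<digits>" itself.

```python
import re

# Assumed format: "<base>.<tries-left>" or "<base>.<tries-left>.<tries-done>".
_TWO = re.compile(r"^(?P<base>.+?)\.(?P<left>\d+)(?:\.(?P<done>\d+))?$")


def next_name(name):
    """Boot loader side: decrement tries-left, increment tries-done."""
    m = _TWO.match(name)
    if m is None:
        return name                      # no counters: nothing to do
    left = int(m.group("left"))
    done = int(m.group("done") or 0)
    if left == 0:
        return name                      # exhausted: leave untouched
    return "{}.{}.{}".format(m.group("base"), left - 1, done + 1)


def should_show_menu(name):
    """Show the menu whenever the second counter is set."""
    m = _TWO.match(name)
    return bool(m and m.group("done") is not None and int(m.group("done")) > 0)
```

Applying next_name() five times to "foo.5" walks through foo.4.1, foo.3.2,
foo.2.3, foo.1.4 and foo.0.5, and should_show_menu() is true for every name
in that chain except the initial "foo.5".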

Interesting. I've added Javier Martinez Canillas who is working
on implementing BLS for Fedora 29 to the Cc.

We did consider some scheme where we would automatically
fall back to an older kernel (setting a boot_once variable for
that kernel rather than booting it 5 times) and then, if that boot
was not marked as successful, switch back to the older kernel.

The problem is that classic Fedora Workstation is a grab bag
of bits and pieces and this scheme only takes the kernel into
account.

Whereas an update to GNOME or mesa could just as well render
the system unusable, and then we still want the user to get to
the grub menu so he can enter single-user mode and do a
downgrade there. I know that for a lot of users if the system
is broken it is broken and a reinstall is the only answer, but
there is a group of users who will appreciate being able to
rescue their systems at this point.

The same problem applies to your "known good" suggestion
from above, we would need to clear "known good" as soon as
a single package changes, which makes it lose almost all its
value.

Another issue is the side-effects of failing to mark boot_success
when we should have: showing the menu is an undesirable, but
otherwise we-can-live-with-it side-effect.

Automatically falling back to an older kernel is a worse
side-effect. I know you were not suggesting that, with the second
counter proposal. But this is something which we've considered
and rejected for classic Fedora Workstation.

Now OTOH for Atomic where the entire kernel + Base-OS is a single
fixed entity, auto-fallback is definitely something we want.

So while I have your attention, for this whole auto-hide the menu /
determine previous boot was successful we also want to sometimes
increment an integer grub environment variable called
boot_indeterminate. Basically call:

grub2-editenv - incr boot_indeterminate

This is intended for reboots caused by selinux-relabels and
offline updates.

The idea is that boot_indeterminate==1 also counts as a boot
success (after which grub itself will increment it to 2). We don't
want the offline-updates to set boot_success=1 as we want to detect
an offline-updates reboot loop (as unlikely as that may be).
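
As a rough illustration of how this is meant to catch a reboot loop, here
is a Python sketch that models the grubenv as a plain dict. The function
names are hypothetical, and the model covers only the two rules stated
above (1 counts as success, grub then bumps it to 2); anything else grub or
the OS does with these variables is not modelled.

```python
def mark_indeterminate(env):
    """What "grub2-editenv - incr boot_indeterminate" does, on a dict copy."""
    new = dict(env)
    new["boot_indeterminate"] = int(new.get("boot_indeterminate", 0)) + 1
    return new


def grub_evaluate(env):
    """Return (previous_boot_ok, new_env) as grub would see it at boot."""
    new = dict(env)
    indeterminate = int(new.get("boot_indeterminate", 0))
    ok = int(new.get("boot_success", 0)) == 1 or indeterminate == 1
    if indeterminate == 1:
        new["boot_indeterminate"] = 2    # grub increments it to 2
    return ok, new
```

With boot_success never set, a first offline-update reboot gives
boot_indeterminate=1 and counts as success; a second one in a row gives 3,
which no longer counts, so the loop becomes visible.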

TL;DR: we want to call "grub2-editenv - incr boot_indeterminate"
when doing offline updates. I could just add a service for this
to: /lib/systemd/system/system-update.target.wants.

In the scheme I suggested above such an operation would simply be
"increase the counter again by one".

Ack.

If I understand correctly systemd will start all services under:
/lib/systemd/system/system-update.target.wants one by one and
the service to mark the boot indeterminate should exit with an
error since it is not the one handling the updates (that is fine),
but if the actual update service runs before us then we won't
run.

I could modify all the services under 
/lib/systemd/system/system-update.target.wants
with an ExecStartPre to call grub2-editenv but I would prefer
a generic solution, any suggestions here ?

Not sure I follow. I mean, setting the state to "indeterminate" should
happen whenever the offline update operation succeeded, no? If the
offline update operation fails then this should be counted as a bad
boot, no? As such your little plugin should run after
system-update.target, exactly like in the default.target regular boot
case?

AFAIK the service actually doing the updates is supposed to call
systemctl reboot --force when it is done, so any targets after
system-update.target won't get started ?

Regards,

Hans

