On Mon, 10 Apr 2017, Lennart Poettering wrote:
On Mon, 10.04.17 18:45, Michael Chapman (m...@very.puzzling.org) wrote:
On Mon, 10 Apr 2017, Lennart Poettering wrote:
On Sun, 09.04.17 10:11, Michael Chapman (m...@very.puzzling.org) wrote:
Don't forget, they've provided an interface for software to use if it needs
more than the guarantees provided by sync. Informally speaking, the FIFREEZE
ioctl is intended to place a filesystem into a "fully consistent" state, not
just a "fully recoverable" state. (Formally it's all a bit hazy: POSIX
really doesn't guarantee anything with sync.)
FIFREEZE does considerably more than what you suggest: it also pauses
all further changes until FITHAW is called. And those are semantics we
really cannot have.
If systemd is just about to call reboot(2), why does it matter?
Well, in the general case we don't actually call reboot(), because we
instead transition back into the initrd, which then eventually calls
that. At least that's what happens on the major general purpose
distros that have an initrd that does that (for example: Fedora/RHEL
with Dracut).
If it's not systemd _inside_ the initrd calling reboot(2), then there's
nothing systemd can do about it.
Moreover, on the kernel side, various bits and pieces hook into the
reboot() syscall too and do last-minute stuff before going down. Are
you sure that if you have a complex storage setup (let's say DM on top
of loop on top of XFS on top of something else), that having frozen a
lower-level file system is not going to make the kernel itself pretty
unhappy if it then tries to clean up something further above?
OK, that is a good point.
I am sorry, but making all accesses hang is just broken. That
can't work.
I do think we should attempt to remount readonly before doing the FIFREEZE.
I thought systemd did that, but it appears that it does not. A readonly
remount will do what we want so long as no remaining processes have any
files opened for writing on the filesystem. The FIFREEZE would only be
necessary when the remount fails.
If we cannot unmount something, we remount read-only everything
we can.
Ah, I see the code for that now. I was looking for something after the
umount call (specifically, if umount failed), not before.
But do note that we can't do that in all cases. Most
prominently: consider a process that is running from an executable
that has been updated on disk (specifically: whose binary got deleted
because it was replaced by a newer version). This process will keep
the file pinned, and will block all read-only remounts, as the kernel
wants to mark the file properly deleted first, but it can't since the
process is keeping it pinned.
This is specifically the case that happened for Plymouth: the binary
probably got updated, hence the process in memory references a deleted
file, which blocks the read-only remounting, in which case we can't do
anything, and sync and remount.
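To make the failure mode above concrete, here is a minimal sketch of a read-only remount attempt. It is not systemd's actual code: the helper name remount_read_only() is made up for illustration, and it assumes Linux with glibc. Per mount(2), the call fails with EBUSY when the target still holds files open for writing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mount.h>

/* Hypothetical helper (not taken from systemd): try to remount an
 * existing mount point read-only.
 * Returns 0 on success or a negative errno value on failure. */
static int remount_read_only(const char *mount_point) {
        assert(mount_point != NULL);

        /* With MS_REMOUNT the source and filesystem type arguments are
         * ignored, so NULL is fine for both. Note that passing only
         * MS_RDONLY does not preserve other mount flags; a real
         * implementation would probably want to carry those over. */
        if (mount(NULL, mount_point, NULL, MS_REMOUNT | MS_RDONLY, NULL) < 0)
                return -errno; /* e.g. -EBUSY if files are still open for writing */

        return 0;
}
```

A caller would treat a negative return (most notably -EBUSY) as "this filesystem could not be made read-only" and fall back to whatever last-resort handling it has, such as a plain sync().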
OK, so how about this. _After_ the unmount-everything loop we do a freeze
+ thaw for each remaining filesystem, one filesystem at a time. That won't
permanently block processes that are still writing to the filesystems (and
why would they be?!), it will ensure that all filesystems' journals are
fully flushed (which will make GRUB and other OSs happy), and it won't
block the kernel from doing any kind of reboot()-time cleanups you were
talking about earlier.
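A rough sketch of the per-filesystem freeze + thaw step proposed above, assuming Linux and the FIFREEZE/FITHAW ioctls from <linux/fs.h>. The helper name freeze_then_thaw() is hypothetical, and this is only an illustration of the idea, not a patch. It needs CAP_SYS_ADMIN, and filesystems that do not support freezing are expected to return EOPNOTSUPP.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>   /* FIFREEZE, FITHAW */
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: freeze one filesystem and thaw it again right
 * away, so its journal gets flushed without leaving writers blocked.
 * Returns 0 on success or a negative errno value on failure. */
static int freeze_then_thaw(const char *mount_point) {
        int fd, r = 0;

        assert(mount_point != NULL);

        /* Any open file descriptor on the filesystem works; the mount
         * point directory is the obvious candidate. */
        fd = open(mount_point, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
        if (fd < 0)
                return -errno;

        if (ioctl(fd, FIFREEZE, 0) < 0)
                r = -errno;     /* nothing was frozen, so nothing to thaw */
        else if (ioctl(fd, FITHAW, 0) < 0)
                r = -errno;     /* filesystem stays frozen; caller should log this */

        close(fd);
        return r;
}
```

The shutdown code would call this once for each filesystem that survived the unmount loop, one at a time, and would log failures rather than abort, since the reboot has to proceed either way.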
Note that systemd itself always reexecutes itself on shutdown, to
ensure that if it got updated during runtime, it stops pinning
the old file.
Remember, all of this is because there *is* software that does the wrong
thing, and it *is* possible for software to hang and be unkillable. It would
be good for systemd to do the right thing even in the presence of that kind
of software.
Yeah, we do what we can.
But I seriously doubt FIFREEZE will make things better. It's just
going to make shutdowns hang every now and then.
To be honest, I think having systems unbootable is a more serious problem
than having shutdowns hang. But I also think with a freeze _and_ a thaw
for each filesystem, we won't have hangs.
_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel