Reordered the quotes below for better reading flow.

On 28.06.2018 at 10:52, Lennart Poettering wrote:
But quite frankly I don't grok the problem at hand, i.e. what you are
trying to do, even.

Was this explanation any better?

Not really, still. What I don't grok is what precisely a "system snapshot"
in suse terms is actually supposed to entail. Is it supposed to
contain only the vendor RPMs, i.e. only /usr?

That's the general idea, yes.*

Everything which contains variable or user data (i.e. which is not supposed to be rolled back, like databases or files created by the user) will be put onto its own subvolume or partition.

For reference, here's what this looks like on openSUSE Leap 15 again:
ID     parent top lvl path
--     ------ ------- ----
257    5      5       <FS_TREE>/@
258    257    257     <FS_TREE>/@/var
259    257    257     <FS_TREE>/@/usr/local
260    257    257     <FS_TREE>/@/tmp
261    257    257     <FS_TREE>/@/srv
262    257    257     <FS_TREE>/@/root
263    257    257     <FS_TREE>/@/opt
264    257    257     <FS_TREE>/@/home
265    257    257     <FS_TREE>/@/boot/grub2/x86_64-efi
266    257    257     <FS_TREE>/@/boot/grub2/i386-pc
267    257    257     <FS_TREE>/@/.snapshots
411    267    267     <FS_TREE>/@/.snapshots/138/snapshot
412    267    267     <FS_TREE>/@/.snapshots/139/snapshot


*) Some packages will still use /bin, /lib and the like, and those will be part of the snapshot; on the other hand distribution RPMs may also contain files or directories in e.g. /var, which will not be part of the snapshot. Because of that I'd prefer the term "static / read-only / unmodifiable part of the root file system" instead of "vendor RPMs".

or everything except
/home, /srv, /var, /tmp?

Everything except the directories listed above, because those contain variable data which one usually doesn't want to reset just because e.g. a new kernel doesn't boot. That won't prevent the user from creating his own snapshots of these subvolumes of course.
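
To make the split concrete, here is a minimal sketch that classifies a path as either part of the rollback snapshot or living on a separate subvolume. It assumes the Leap 15 layout listed above; the function names `owning_subvolume` and `is_rolled_back` are made up for illustration and are not part of snapper or systemd.

```python
# Separate subvolumes from the openSUSE Leap 15 listing above, written as
# mount points relative to the root file system (assumption: "@" is the
# prefix that gets stripped when the subvolumes are mounted).
SEPARATE_SUBVOLUMES = [
    "/var", "/usr/local", "/tmp", "/srv", "/root", "/opt", "/home",
    "/boot/grub2/x86_64-efi", "/boot/grub2/i386-pc", "/.snapshots",
]


def owning_subvolume(path, subvolumes=SEPARATE_SUBVOLUMES):
    """Return the separate subvolume containing `path`, or None if the
    path lives in the root snapshot itself (longest prefix wins)."""
    best = None
    for sub in subvolumes:
        if path == sub or path.startswith(sub + "/"):
            if best is None or len(sub) > len(best):
                best = sub
    return best


def is_rolled_back(path):
    """True if `path` is part of the snapshot and so changes on rollback."""
    return owning_subvolume(path) is None
```

With this layout `/usr/bin/ls` and `/etc/fstab` are rolled back, while `/usr/local/bin/foo` and `/var/lib/machines` are not.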

systemd will never create disassociated subvolumes for you.

That's the problem - it will create subvolumes which will just disappear
from the system when switching to the next snapshot.

Well, no, if snapshots are done recursively they wouldn't, they would
be switched at the same time.

I think it's not relevant for this discussion, but you have repeatedly mentioned recursive snapshots now; as far as I'm aware, btrfs is not capable of doing that. I've found a patchset at https://www.spinics.net/lists/linux-btrfs/msg29205.html, but it seems the relevant parts for snapshot creation weren't added upstream.

So how are those recursive btrfs snapshots supposed to work?

tmpfiles won't create any subvolumes for you — except if they are
missing. tmpfiles can't guess the complex mappings you applied to your
tree, it can't know that you don't want to allow recursive snapshots,
but place them all in the same dir and bind mount them. Also, if I
understand correctly the way suse sets this up always *requires*
additions to fstab for any subvol created, which is clearly out of
focus for tmpfiles.

I agree that it's next to impossible to programmatically find out what a user intended to do with a specific layout. However, in my opinion it would be preferable to create a working, though maybe not optimal, configuration rather than one which is known to break in several cases (independent of the distribution).

Instead of adding fstab entries (which I also have a bellyache with), an alternative may be to create a mount unit. But yes, something would have to be done to mount those subvolumes on boot.
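
As a rough sketch of the mount unit alternative, the following generates a unit name and unit text for a subvolume. The escaping is a simplified re-implementation for illustration only (`systemd-escape --path` is the authoritative tool), and the device and `subvol=` values in the usage note are hypothetical placeholders.

```python
def escape_path(path):
    """Simplified path escaping for unit names (illustration only)."""
    parts = [p for p in path.split("/") if p]
    if not parts:
        return "-"  # the root directory
    stripped = "/".join(parts)
    out = []
    for i, ch in enumerate(stripped):
        if ch == "/":
            out.append("-")
        elif (ch.isascii() and ch.isalnum()) or ch in ":_" or (ch == "." and i > 0):
            out.append(ch)
        else:
            # Everything else, including "-" and a leading ".", becomes \xXX.
            out.extend("\\x%02x" % b for b in ch.encode("utf-8"))
    return "".join(out)


def mount_unit_name(where):
    return escape_path(where) + ".mount"


def mount_unit_text(device, where, subvol):
    """Render a minimal mount unit for a btrfs subvolume."""
    return (
        "[Unit]\n"
        f"Description=btrfs subvolume for {where}\n"
        "\n"
        "[Mount]\n"
        f"What={device}\n"
        f"Where={where}\n"
        "Type=btrfs\n"
        f"Options=subvol={subvol}\n"
        "\n"
        "[Install]\n"
        "WantedBy=local-fs.target\n"
    )
```

For example, `mount_unit_name("/var/lib/machines")` yields `var-lib-machines.mount`, and `mount_unit_text("/dev/sda2", "/var/lib/machines", "@/var/lib/machines")` renders the matching unit body (both the device and the subvolume path are made-up values here).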

Also, tmpfiles won't actually create any subvols below /usr (unless a
user dropped something in to do that on its own), it will only do so
in the root dir for precisely /var, /tmp, /home and /srv. All others
are created below /var. Which means your rule of "don't create subvols
below system directories" isn't actually touched, because the
read-only OS is monopolized in /usr anyway... Or maybe I am still not
getting what you are trying to say?

The rule would be "don't create subvols below snapshots", and the read-only OS is not exactly monopolized in /usr either (not only because of /bin, /lib etc., but also because of /boot, see the last paragraph of this mail), but apart from that, that nails it.

The issue was originally discovered when upgrading systemd on an older openSUSE machine which did not have a unified /var subvolume, so /var/lib/machines got attached to the root subvolume. This may happen again in the future for us, but as said we are not the only ones using this mechanism. Seeing the default Fedora and Ubuntu btrfs layouts it's even more likely to happen if anybody is using pattern 3 there. Apart from that I'd prefer systemd-tmpfiles to work even if a user threw in something unexpected.

I'm wondering if just refusing to create a subvolume on a snapshot would be another option... That way the problem would be given back to the user or distribution.
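
A minimal sketch of such a refusal check, assuming snapshots can be recognized by snapper's `.snapshots/<number>/snapshot` path convention as seen in the listing above. That is an assumption for illustration: a real check would have to ask btrfs itself rather than rely on path names, and the function names are hypothetical.

```python
import re

# Assumption: a snapshot is any subvolume whose path ends in
# ".snapshots/<number>/snapshot", as in the Leap 15 listing.
SNAPSHOT_RE = re.compile(r"(^|/)\.snapshots/\d+/snapshot$")


def is_snapper_snapshot(subvol_path):
    """True if the subvolume path follows snapper's snapshot naming."""
    return SNAPSHOT_RE.search(subvol_path) is not None


def may_create_subvolume(parent_subvol_path):
    """Refuse to create a child subvolume directly below a snapshot."""
    return not is_snapper_snapshot(parent_subvol_path)
```

Under this rule a new subvolume below `<FS_TREE>/@/var` would be allowed, while one below `<FS_TREE>/@/.snapshots/138/snapshot` would be refused and left to the user or distribution.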

The assumption systemd-tmpfiles makes is always that the subvolumes
it implicitly creates for you if they are missing are associated
with the subvolume they are created below, and that this means they
are snapshotted, removed and otherwise managed along with them.

Keeping this logic more or less assumes that snapshots will always be used
as static backups and pattern 3 from above must not be used.

I don't see that at all. I mean, this all depends how you want to
associate /var with /. my assumption is that they belong together, but
i figure that's not what you have in mind? you want to keep using the
same /var even though you switch back and forth to different /?

Exactly - viewing them as separate entities after installation has proven to work very reliably for us and is documented accordingly. As said above, the reasoning behind this is that you usually don't want to lose e.g. all accumulated database changes just because you have to revert the system state due to a failed package update.

i am not sure if i follow fully, but i think the model should be the
other way round: keep the root file system in one subvolume, and keep
/usr completely separate from that, and only combine the two through
bind mounts when you want to go for one specific version. In that
mode, all subvolumes systemd generates would be children of the root
subvolume, as they should be, but /usr would be separate.

Currently the snapshot contains everything which is relevant for a complete rollback of the system including /boot and /.snapshots (containing snapper metadata). Splitting this up into three (or more) separate subvolumes would be a major architectural change. I'll think about this over the weekend, but I don't think I like the idea - synchronizing those volumes will probably be a nightmare.

Ignaz
--
Ignaz Forster <[email protected]>
Research Engineer
SUSE Linux GmbH, Maxfeldstr. 5, D-90409 Nürnberg
Tel: +49-911-74053-281;  https://www.suse.com/
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard,
Graham Norton, HRB 21284 (AG Nürnberg)
_______________________________________________
systemd-devel mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/systemd-devel
