Re: [systemd-devel] systemd-tmpfiles subvolume handling vs. changing default btrfs root
On Fri, 29.06.18 21:04, Ignaz Forster (ifors...@suse.de) wrote:

> Reordered the quotes below for better reading flow.
>
> On 28.06.2018 at 10:52, Lennart Poettering wrote:
> > > > But quite frankly I don't grok the problem at hand, i.e. what you are
> > > > trying to do, even.
> > >
> > > Was this explanation any better?
> >
> > Not really, still: what I don't grok is what precisely a "system snapshot"
> > in suse terms is actually supposed to entail. Is it supposed to
> > contain only the vendor RPMs, i.e. only /usr?
>
> That's the general idea, yes.*
>
> Everything which contains variable or user data (i.e. which is not supposed
> to be rolled back like databases or files created by the user) will be put
> onto its own subvolume or partition.
>
> For reference here's what this looks like on openSUSE Leap 15 again:
> ID   parent  top lvl  path
> --   ------  -------  ----
> 257  5       5        /@
> 258  257     257      /@/var
> 259  257     257      /@/usr/local
> 260  257     257      /@/tmp
> 261  257     257      /@/srv
> 262  257     257      /@/root
> 263  257     257      /@/opt
> 264  257     257      /@/home
> 265  257     257      /@/boot/grub2/x86_64-efi
> 266  257     257      /@/boot/grub2/i386-pc
> 267  257     257      /@/.snapshots
> 411  267     267      /@/.snapshots/138/snapshot
> 412  267     267      /@/.snapshots/139/snapshot
>
> *) Some packages will still use /bin, /lib and the like, and those will be
> part of the snapshot; on the other hand distribution RPMs may also contain
> files or directories in e.g. /var, which will not be part of the snapshot.
> Because of that I'd prefer the term "static / read-only / unmodifiable part
> of the root file system" instead of "vendor RPMs".
>
> > or everything except
> > /home, /srv, /var, /tmp?
>
> Everything except the directories listed above, because those contain
> variable data which one usually doesn't want to reset just because e.g. a
> new kernel doesn't boot.
> That won't prevent the user from creating his own snapshots of these
> subvolumes of course.
>
> > > > systemd will never create disassociated subvolumes for you.
> > >
> > > That's the problem - it will create subvolumes which will just disappear
> > > from the system when switching to the next snapshot.
> >
> > Well, no, if snapshots are done recursively they wouldn't, they would
> > be switched at the same time.
>
> I think it's not relevant for this discussion; you have repeatedly been
> talking about recursive snapshots now, however as far as I'm aware btrfs is
> not capable of doing that. I've found a patchset on
> https://www.spinics.net/lists/linux-btrfs/msg29205.html, but it seems the
> relevant parts for snapshot creation weren't added upstream.
>
> So how are those recursive btrfs snapshots supposed to work?

So, systemd's btrfs code supports doing recursive snapshots (which is exposed through "machinectl clone" or "systemd-nspawn --ephemeral"). If the upstream btrfs tools don't support them, please work with them to fix that. There's nothing too magic about them; it's a pity that this isn't supported yet.

> > tmpfiles won't create any subvolumes for you - except if they are
> > missing. tmpfiles can't guess the complex mappings you applied to your
> > tree, it can't know that you don't want to allow recursive snapshots,
> > but place them all in the same dir and bind mount them. Also, if I
> > understand correctly the way suse sets this up always *requires*
> > additions to fstab for any subvol created, which is clearly out of
> > focus for tmpfiles.
>
> I agree that it's next to impossible to programmatically find out what a
> user intended to do with a specific layout.
> However in my opinion it would be preferable to create at least a working,
> though maybe not optimal configuration compared to a configuration which is
> known to break in several cases (independent of the distribution).
>
> Instead of adding fstab entries (which I also have a bellyache with) it may
> be an alternative to create a mount unit instead. But yes, something would
> have to be done to mount those subvolumes on boot.

I am very much convinced that tmpfiles should not change mount configuration. It's a tool to adjust file system objects on disk, and it should remain that. I think the much nicer approach is the one I suggested, i.e. where subvol trees are always cloned in full, recursively, and it is solely /usr, and whatever else shall be disconnected from each tree, that is mounted into it.

> I'm wondering if just refusing to create a subvolume on a snapshot would be
> another option... That way the problem would be given back to the user or
> distribution.

My recommendation: if you really want to go with the design you proposed, go ahead, but make sure you create the bind mounts early enough; tmpfiles won't change them then. After all, tmpfiles will only make changes if something is missing here. It will never change anything that already exists into a subvol.
Re: [systemd-devel] systemd-tmpfiles subvolume handling vs. changing default btrfs root
Reordered the quotes below for better reading flow. Am 28.06.2018 um 10:52 schrieb Lennart Poettering: But quite frankly I don't grok the problem at hand, i.e. what you are trying to do, even. Was this explanation any better? Not really still, what I don't grok what precisely a "system snapshot" in suse terms is actually supposed to entail. Is it supposed to contain only the vendor RPMs, i.e. only /usr? That's the general idea, yes.* Everything which contains variable or user data (i.e. which is not supposed to be rolled back like databases or files created by the user) will be put onto an own subvolume or partition. For reference here's how this looks like on openSUSE Leap 15 again: ID parent top lvl path -- -- --- 2575 5 /@ 258257257 /@/var 259257257 /@/usr/local 260257257 /@/tmp 261257257 /@/srv 262257257 /@/root 263257257 /@/opt 264257257 /@/home 265257257 /@/boot/grub2/x86_64-efi 266257257 /@/boot/grub2/i386-pc 267257257 /@/.snapshots 411267267 /@/.snapshots/138/snapshot 412267267 /@/.snapshots/139/snapshot *) Some packages will still use /bin, /lib and the like, and those will be part of the snapshot; on the other hand distribution RPMs may also contain files or directories in e.g. /var, which will not be part of the snapshot. Because of that I'd prefer the term "static / read-only / unmodifiable part of the root file system" instead of "vendor RPMs". or everything except /home, /srv, /var, /tmp? Everything except the directories listed above, because those contain variable data which one usually doesn't want to reset just because e.g. a new kernel doesn't boot. That won't prevent the user from creating his own snapshots of these subvolumes of course. systemd will never create disassociated subvolumes for you. That's the problem - it will create subvolumes which will just disappear from the system when switching to the next snapshot. Well, no, if snapshots are done recursively they wouldn't, they would be switched at the same time. 
I think it's not relevant for this discussion, you were repeatedly talking about recursive snapshots now, however as far as I'm aware btrfs is not capable to doing that. I've found a patchset on https://www.spinics.net/lists/linux-btrfs/msg29205.html, but it seems the relevant parts for snapshot creation weren't added upstream. So how are those recursive btrfs snapshots supposed to work? tmpfiles won't create any subvolumes for you — except if they are missing. tmpfiles can't guess the complex mappings you applied to your tree, it can't know that you don't want to allow recursive snapshots, but place them all in the same dir and bind mount them. Also, if I understand correctly the way suse sets this up always *requires* additions to fstab for any subvol created, which is clearly out of focus for tmpfiles. I agree that it's next to impossible to programmatically find out what a user intended to do with a specific layout. However in my opinion it would be preferable to create at least a working, though maybe not optimal configuration compared to a configuration which is known to break in several cases (independent of the distribution). Instead of adding fstab entries (which I also have a bellyache with) it may be an alternative to create a mount unit instead. But yes, something would have to be done to mount those subvolumes on boot. Also, tmpfiles won't actually create any subvols below /usr (unless a user dropped something in to do that on its own), it will only do so in the root dir for precisely /var, /tmp, /home and /srv. All others are created below /var. Which means you rule of "don't create subvols below system directories" isn't actually touched, because the read-only OS is monopolized in /usr anyway... Or maybe I am still not getting what you are trying to say? 
The rule would be "don't create subvols below snapshots", and the read-only OS is not exactly monopolized in /usr either (not only because of /bin, /lib etc, but also because of /boot - see last paragraph of the mail), but apart from that that nails it. The issue was originally discovered when upgrading systemd on an older openSUSE machine which did not have a unified /var subvolume, so /var/lib/machines got attached to the root subvolume. This may happen again in the future for us, but as said we are not the only ones using this mechanism. Seeing the default Fedora and Ubuntu btrfs layouts it's even more likely to happen if anybody is using pattern 3 there. Apart from that I'd prefer systemd-tmpfiles to work even if a user threw in something unexpected. I'm wondering if just refusing to create a subvolume on a snapshot would be another option... That way the problem would be given back to the user or distribution. The assumption systemd-tmpfiles makes
Re: [systemd-devel] systemd-tmpfiles subvolume handling vs. changing default btrfs root
On Mi, 27.06.18 23:25, Ignaz Forster (ifors...@suse.de) wrote: > 3) Set the default btrfs subvolume to the backup snapshot subvolume (or a > copy of it) > > I'm talking about case *3* here. Whenever one wants to roll back to a > certain snapshot one would just call e.g. > btrfs subvolume set-default /.snapshots/123/snapshot/ > and reboot. > > In contrast to case 1 and 2 there is no dedicated static "/" subvolume > (which also implies this is not a "Nested", but a "Flat" or "Mixed" layout > as outlined on > https://btrfs.wiki.kernel.org/index.php/SysadminGuide#Layout), but the > default btrfs subvolume will always point to the snapshot of the last > rollback. (On openSUSE this mechanism is also used for read-only systems > where each update will switch to a new snapshot.) > > > I'd sum this up as: subvolumes for system directories should never be > created as children of a snapshot as snapshots are a variable thing. tmpfiles won't create any subvolumes for you — except if they are missing. tmpfiles can't guess the complex mappings you applied to your tree, it can't know that you don't want to allow recursive snapshots, but place them all in the same dir and bind mount them. Also, if I understand correctly the way suse sets this up always *requires* additions to fstab for any subvol created, which is clearly out of focus for tmpfiles. Also, tmpfiles won't actually create any subvols below /usr (unless a user dropped something in to do that on its own), it will only do so in the root dir for precisely /var, /tmp, /home and /srv. All others are created below /var. Which means you rule of "don't create subvols below system directories" isn't actually touched, because the read-only OS is monopolized in /usr anyway... Or maybe I am still not getting what you are trying to say? 
> > The assumption systemd-tmpfiles makes is always that the subvolumes
> > it implicitly creates for you if they are missing are associated
> > with the subvolume they are created below, and that this means they
> > are snapshotted, removed and otherwise managed along with them.
>
> Keeping this logic more or less assumes that snapshots will always be used
> as static backups and pattern 3 from above must not be used.

I don't see that at all. I mean, this all depends on how you want to associate /var with /. My assumption is that they belong together, but I figure that's not what you have in mind? You want to keep using the same /var even though you switch back and forth between different /?

I am not sure if I follow fully, but I think the model should be the other way round: keep the root file system in one subvolume, and keep /usr completely separate from that, and only combine the two through bind mounts when you want to go for one specific version. In that mode, all subvolumes systemd generates would be children of the root subvolume, as they should be, but /usr would be separate.

> > systemd will never create disassociated subvolumes for you.
>
> That's the problem - it will create subvolumes which will just disappear
> from the system when switching to the next snapshot.

Well, no, if snapshots are done recursively they wouldn't, they would be switched at the same time.

> > But quite frankly I don't grok the problem at hand, i.e. what you are
> > trying to do, even.
>
> Was this explanation any better?

Not really, still: what I don't grok is what precisely a "system snapshot" in suse terms is actually supposed to entail. Is it supposed to contain only the vendor RPMs, i.e. only /usr? Or everything except /home, /srv, /var, /tmp? Or the inverse of that?

Lennart

-- 
Lennart Poettering, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] systemd-tmpfiles subvolume handling vs. changing default btrfs root
On 27.06.2018 at 16:34, Lennart Poettering wrote:
> On Wed, 27.06.18 15:50, Ignaz Forster (ifors...@suse.de) wrote:
>
> > > By recursive snapshots I really mean recursive snapshots, i.e. if you
> > > have a subvolume called `/foobar` and there's a subvolume below it
> > > called `/foobar/var`, and you'd make a snapshot of `/foobar` and call
> > > it `/foobar2`, then this would implicitly also have the effect of
> > > snapshotting `/foobar/var` and calling it `/foobar2/var`, so that each
> > > snapshot is always "complete".
> >
> > Ah, I see - no, that's not the problem here.
> > The subvolumes are there because we do *not* want to snapshot them.
> >
> > I guess it's best to just ignore the second bullet point - it's a follow
> > up problem, but it isn't really important for the main point: Attaching a
> > new subvolume to a snapshot.
>
> I still don't grok this. What's the precise problem then?

The problem is that the subvolume layout may not always be the way systemd-tmpfiles expects it to be. This is caused by different possible ways of handling rollbacks to a previous snapshot.

I'm aware of three common patterns on how to restore a previous snapshot of the system (i.e. the root file system):

1) Use 'mv' to move the backup to the original location
2) Delete the root file system snapshot and recreate it using the backup snapshot as a parent
3) Set the default btrfs subvolume to the backup snapshot subvolume (or a copy of it)

I'm talking about case *3* here. Whenever one wants to roll back to a certain snapshot one would just call e.g.
  btrfs subvolume set-default /.snapshots/123/snapshot/
and reboot.

In contrast to case 1 and 2 there is no dedicated static "/" subvolume (which also implies this is not a "Nested", but a "Flat" or "Mixed" layout as outlined on https://btrfs.wiki.kernel.org/index.php/SysadminGuide#Layout), but the default btrfs subvolume will always point to the snapshot of the last rollback. (On openSUSE this mechanism is also used for read-only systems where each update will switch to a new snapshot.)

I'd sum this up as: subvolumes for system directories should never be created as children of a snapshot, as snapshots are a variable thing.

> The assumption systemd-tmpfiles makes is always that the subvolumes
> it implicitly creates for you if they are missing are associated
> with the subvolume they are created below, and that this means they
> are snapshotted, removed and otherwise managed along with them.

Keeping this logic more or less assumes that snapshots will always be used as static backups and pattern 3 from above must not be used.
However, not only *SUSE or snapper are using this pattern, but several websites also suggest this workflow - that's why I'm interested in upstream support.

> systemd will never create disassociated subvolumes for you.

That's the problem - it will create subvolumes which will just disappear from the system when switching to the next snapshot.

> If you want that use some other tools, but tmpfiles is not really
> supposed to do complex stuff like that.

The added complexity is the reason why I brought this to the list. However I'd (obviously) still prefer compatibility with a larger array of btrfs layouts by default.
Finding the subvolume and making sure it's mounted on the next boot (e.g. by adding an fstab entry or a mount unit) would be the most complex part about this.

> But quite frankly I don't grok the problem at hand, i.e. what you are
> trying to do, even.

Was this explanation any better?

Ignaz

-- 
Ignaz Forster
Research Engineer
SUSE Linux GmbH, Maxfeldstr. 5, D-90409 Nürnberg
Tel: +49-911-74053-281; https://www.suse.com/
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
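
For illustration, the "mount unit" alternative mentioned above could look roughly like what the following Python sketch renders. This is only an assumption about what such a unit might contain, not something systemd-tmpfiles does: the helper name `mount_unit`, the device `/dev/sdx` and the subvolume `@/opt` are hypothetical, and the unit-name mapping is deliberately simplified (real names should be produced with `systemd-escape --path`).

```python
def mount_unit(device, mount_point, subvol):
    """Render a minimal systemd mount unit for one btrfs subvolume.

    Hypothetical helper: the name mapping below only handles paths whose
    components are plain alphanumerics; anything else needs the full
    escaping rules implemented by `systemd-escape --path`.
    """
    parts = mount_point.strip("/").split("/")
    if not all(part.isalnum() for part in parts):
        raise ValueError("path needs real systemd escaping: " + mount_point)
    name = "-".join(parts) + ".mount"
    body = "\n".join([
        "[Unit]",
        f"Description=btrfs subvolume {subvol}",
        "",
        "[Mount]",
        f"What={device}",
        f"Where={mount_point}",
        "Type=btrfs",
        f"Options=subvol={subvol}",
        "",
        "[Install]",
        "WantedBy=local-fs.target",
        "",
    ])
    return name, body


# Example values are assumptions taken from the layout discussed in the thread.
name, body = mount_unit("/dev/sdx", "/opt", "@/opt")
print(name)
print(body)
```

The equivalent fstab line would be `/dev/sdx /opt btrfs subvol=@/opt 0 0`; either way something outside tmpfiles has to own that configuration.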
Re: [systemd-devel] systemd-tmpfiles subvolume handling vs. changing default btrfs root
On Wed, 27.06.18 15:50, Ignaz Forster (ifors...@suse.de) wrote:

> > By recursive snapshots I really mean recursive snapshots, i.e. if you
> > have a subvolume called `/foobar` and there's a subvolume below it
> > called `/foobar/var`, and you'd make a snapshot of `/foobar` and call
> > it `/foobar2`, then this would implicitly also have the effect of
> > snapshotting `/foobar/var` and calling it `/foobar2/var`, so that each
> > snapshot is always "complete".
>
> Ah, I see - no, that's not the problem here.
> The subvolumes are there because we do *not* want to snapshot them.
>
> I guess it's best to just ignore the second bullet point - it's a follow
> up problem, but it isn't really important for the main point: Attaching a
> new subvolume to a snapshot.

I still don't grok this. What's the precise problem then?

The assumption systemd-tmpfiles makes is always that the subvolumes it implicitly creates for you if they are missing are associated with the subvolume they are created below, and that this means they are snapshotted, removed and otherwise managed along with them.

systemd will never create disassociated subvolumes for you. If you want that, use some other tools, but tmpfiles is not really supposed to do complex stuff like that.

But quite frankly I don't grok the problem at hand, i.e. what you are trying to do, even.

Lennart

-- 
Lennart Poettering, Red Hat
Re: [systemd-devel] systemd-tmpfiles subvolume handling vs. changing default btrfs root
Am 27.06.2018 um 15:37 schrieb Lennart Poettering: On Mi, 27.06.18 15:09, Ignaz Forster (ifors...@suse.de) wrote: Am 27.06.2018 um 13:39 schrieb Lennart Poettering: On Mi, 27.06.18 13:02, Ignaz Forster (ifors...@suse.de) wrote: Hello, when using systemd-tmpfiles' feature to create subvolumes it will always create the new subvolume as a child of the subvolume of the given path. This however may not always be the expected parent, especially when using btrfs snapshots to switch between various system states. Example layout: === Let's assume the following subvolume layout (a simplified openSUSE layout): ID parent top level path -- -- - 257 5 5 /@ 258 257 257 /@/var 259 257 257 /@/.snapshots/1/snapshot 260 257 257 /@/.snapshots/2/snapshot 261 257 257 /@/.snapshots/3/snapshot A corresponding /etc/fstab could look like this: /dev/sdx/ btrfs defaults0 0 /dev/sdx/varbtrfs subvol=@/var0 0 with the default btrfs subvolume set to "261". The third snapshot would thus be the root file system, with /var mounted on top of it. The problem: Creating "/var/test" would create a new entry like 262 258 258 @/var/test as expected. However creating "/opt" would create an entry similar to the following: 263 261 261 @/.snapshots/3/snapshot/opt This is not good, as two things will happen now: * When changing the snapshot (e.g. by reverting back to an old snapshot or creating a new one) /opt won't be visible any more (without manually mounting it), as it is not nested into the existing structure any more * The third snapshot cannot be deleted without removing the subvolume first I am not sure I follow here fully. but isn't this just a shortcoming because you are not doing recursive snapshots? why not just fix that? With "recursive snapshots" I assume you mean putting the snapshot below the original root file system? By recursive snaphots I really mean recursive snapshots, i.e. 
if you have a subvolume called `/foobar` and there's a subvolume below it called `/foobar/var`, and you'd make a snapshot of `/foobar` and call it `/foobar2`, then this would implicitly also have the effect of snapshotting `/foobar/var` and calling it `/foobar2/var`, so that each snapshot is always "complete".

Ah, I see - no, that's not the problem here. The subvolumes are there because we do *not* want to snapshot them.

I guess it's best to just ignore the second bullet point - it's a follow up problem, but it isn't really important for the main point: Attaching a new subvolume to a snapshot.

Ignaz

-- 
Ignaz Forster
Research Engineer
SUSE Linux GmbH, Maxfeldstr. 5, D-90409 Nürnberg
Tel: +49-911-74053-281; https://www.suse.com/
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
Re: [systemd-devel] systemd-tmpfiles subvolume handling vs. changing default btrfs root
On Mi, 27.06.18 15:09, Ignaz Forster (ifors...@suse.de) wrote: > Am 27.06.2018 um 13:39 schrieb Lennart Poettering: > > On Mi, 27.06.18 13:02, Ignaz Forster (ifors...@suse.de) wrote: > > > > > Hello, > > > > > > when using systemd-tmpfiles' feature to create subvolumes it will always > > > create the new subvolume as a child of the subvolume of the given path. > > > This > > > however may not always be the expected parent, especially when using btrfs > > > snapshots to switch between various system states. > > > > > > Example layout: > > > === > > > > > > Let's assume the following subvolume layout (a simplified openSUSE > > > layout): > > > > > > IDparent top level path > > > ---- - > > > 257 5 5 /@ > > > 258 257 257 /@/var > > > 259 257 257 /@/.snapshots/1/snapshot > > > 260 257 257 /@/.snapshots/2/snapshot > > > 261 257 257 /@/.snapshots/3/snapshot > > > > > > A corresponding /etc/fstab could look like this: > > > > > > /dev/sdx / btrfs defaults0 0 > > > /dev/sdx /varbtrfs subvol=@/var0 0 > > > > > > with the default btrfs subvolume set to "261". > > > The third snapshot would thus be the root file system, with /var mounted > > > on > > > top of it. > > > > > > > > > The problem: > > > > > > > > > Creating "/var/test" would create a new entry like > > > 262 258 258 @/var/test > > > as expected. > > > However creating "/opt" would create an entry similar to the following: > > > 263 261 261 @/.snapshots/3/snapshot/opt > > > > > > This is not good, as two things will happen now: > > > * When changing the snapshot (e.g. by reverting back to an old snapshot or > > > creating a new one) /opt won't be visible any more (without manually > > > mounting it), as it is not nested into the existing structure any more > > > * The third snapshot cannot be deleted without removing the > > > subvolume first > > > > I am not sure I follow here fully. but isn't this just a shortcoming because > > you are not doing recursive snapshots? why not just fix that? 
>
> With "recursive snapshots" I assume you mean putting the snapshot below the
> original root file system?

By recursive snapshots I really mean recursive snapshots, i.e. if you have a subvolume called `/foobar` and there's a subvolume below it called `/foobar/var`, and you'd make a snapshot of `/foobar` and call it `/foobar2`, then this would implicitly also have the effect of snapshotting `/foobar/var` and calling it `/foobar2/var`, so that each snapshot is always "complete".

Lennart

-- 
Lennart Poettering, Red Hat
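
The recursive-snapshot semantics described above can be modelled in a few lines of Python. This is only a sketch of the naming rule on a set of path strings; the function name `recursive_snapshot` is hypothetical, and it does not call btrfs or reflect how systemd implements the operation.

```python
def recursive_snapshot(subvolumes, source, target):
    """Return the subvolume paths a recursive snapshot would create.

    `subvolumes` is the set of existing subvolume paths. Snapshotting
    `source` as `target` also snapshots every subvolume nested below
    `source`, keeping the same relative path under `target`.
    """
    source = source.rstrip("/")
    created = []
    for path in sorted(subvolumes):
        if path == source:
            created.append(target)
        elif path.startswith(source + "/"):
            # Nested subvolume: re-root it under the snapshot name.
            created.append(target + path[len(source):])
    return created


# "/other" is an extra, unrelated subvolume added for illustration.
existing = {"/foobar", "/foobar/var", "/other"}
print(recursive_snapshot(existing, "/foobar", "/foobar2"))
# prints ['/foobar2', '/foobar2/var']
```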
Re: [systemd-devel] systemd-tmpfiles subvolume handling vs. changing default btrfs root
Am 27.06.2018 um 13:39 schrieb Lennart Poettering: On Mi, 27.06.18 13:02, Ignaz Forster (ifors...@suse.de) wrote: Hello, when using systemd-tmpfiles' feature to create subvolumes it will always create the new subvolume as a child of the subvolume of the given path. This however may not always be the expected parent, especially when using btrfs snapshots to switch between various system states. Example layout: === Let's assume the following subvolume layout (a simplified openSUSE layout): ID parent top level path -- -- - 257 5 5 /@ 258 257 257 /@/var 259 257 257 /@/.snapshots/1/snapshot 260 257 257 /@/.snapshots/2/snapshot 261 257 257 /@/.snapshots/3/snapshot A corresponding /etc/fstab could look like this: /dev/sdx/ btrfs defaults0 0 /dev/sdx/varbtrfs subvol=@/var0 0 with the default btrfs subvolume set to "261". The third snapshot would thus be the root file system, with /var mounted on top of it. The problem: Creating "/var/test" would create a new entry like 262 258 258 @/var/test as expected. However creating "/opt" would create an entry similar to the following: 263 261 261 @/.snapshots/3/snapshot/opt This is not good, as two things will happen now: * When changing the snapshot (e.g. by reverting back to an old snapshot or creating a new one) /opt won't be visible any more (without manually mounting it), as it is not nested into the existing structure any more * The third snapshot cannot be deleted without removing the subvolume first I am not sure I follow here fully. but isn't this just a shortcoming because you are not doing recursive snapshots? why not just fix that? With "recursive snapshots" I assume you mean putting the snapshot below the original root file system? If so that's not how the btrfs subvolumes are organized in this case: The "@" subvolume itself is almost empty and only contains further subvolumes. 
During *SUSE setup everything will be installed to "@/.snapshots/1/snapshot" instead, and this subvolume will be set as the default btrfs subvolume (which would be equivalent to using the mount option 'subvol=@/.snapshots/1/snapshot' for '/'). After installation a tool called Snapper (default on *SUSE, but also available on several other distributions) will take care of snapshot management (e.g. by creating a new snapshot on system changes).

Now if a user wants to do a rollback, the default btrfs subvolume will just be set to that specific snapshot, which makes that one the new root file system.

This design intentionally relies on btrfs' feature to change the default subvolume, and thus imho works as designed.

-- 
Ignaz Forster
Research Engineer
SUSE Linux GmbH, Maxfeldstr. 5, D-90409 Nürnberg
Tel: +49-911-74053-281; https://www.suse.com/
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
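
To make the effect of this rollback scheme concrete, here is a small Python model of the example layout from the start of the thread. It is a sketch under simplifying assumptions: `backing_path` and the `mounts` table are hypothetical names, only the `/var` fstab mount is modelled, and no btrfs tooling is involved.

```python
def backing_path(path, default_subvol, mounts):
    """Map a path in the running system to its location in the btrfs tree.

    `mounts` maps mount points to explicitly mounted subvolumes
    (subvol=... in fstab); everything else lives in whichever
    subvolume is currently the default and therefore mounted at "/".
    """
    best = ""
    for mount_point in mounts:
        if path == mount_point or path.startswith(mount_point + "/"):
            if len(mount_point) > len(best):
                best = mount_point
    if best:
        return mounts[best] + path[len(best):]
    return default_subvol + path


mounts = {"/var": "@/var"}
snapshot2 = "@/.snapshots/2/snapshot"
snapshot3 = "@/.snapshots/3/snapshot"
subvolumes = {"@", "@/var", "@/.snapshots/1/snapshot", snapshot2, snapshot3}

# New subvolumes are created while snapshot 3 is the default subvolume.
for new in ("/var/test", "/opt"):
    subvolumes.add(backing_path(new, snapshot3, mounts))

# Roll back: the default subvolume now points at snapshot 2.
still_reachable = {
    p: backing_path(p, snapshot2, mounts) in subvolumes
    for p in ("/var/test", "/opt")
}
print(still_reachable)
# prints {'/var/test': True, '/opt': False}
```

In this model `/var/test` survives the switch because it hangs below the separately mounted `@/var`, while the `/opt` subvolume stays attached to snapshot 3 and is no longer reachable at `/opt`.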
Re: [systemd-devel] systemd-tmpfiles subvolume handling vs. changing default btrfs root
On Mi, 27.06.18 13:02, Ignaz Forster (ifors...@suse.de) wrote: > Hello, > > when using systemd-tmpfiles' feature to create subvolumes it will always > create the new subvolume as a child of the subvolume of the given path. This > however may not always be the expected parent, especially when using btrfs > snapshots to switch between various system states. > > Example layout: > === > > Let's assume the following subvolume layout (a simplified openSUSE layout): > > IDparent top level path > ---- - > 257 5 5 /@ > 258 257 257 /@/var > 259 257 257 /@/.snapshots/1/snapshot > 260 257 257 /@/.snapshots/2/snapshot > 261 257 257 /@/.snapshots/3/snapshot > > A corresponding /etc/fstab could look like this: > > /dev/sdx / btrfs defaults0 0 > /dev/sdx /varbtrfs subvol=@/var0 0 > > with the default btrfs subvolume set to "261". > The third snapshot would thus be the root file system, with /var mounted on > top of it. > > > The problem: > > > Creating "/var/test" would create a new entry like > 262 258 258 @/var/test > as expected. > However creating "/opt" would create an entry similar to the following: > 263 261 261 @/.snapshots/3/snapshot/opt > > This is not good, as two things will happen now: > * When changing the snapshot (e.g. by reverting back to an old snapshot or > creating a new one) /opt won't be visible any more (without manually > mounting it), as it is not nested into the existing structure any more > * The third snapshot cannot be deleted without removing the > subvolume first I am not sure I follow here fully. but isn't this just a shortcoming because you are not doing recursive snapshots? why not just fix that? Lennart -- Lennart Poettering, Red Hat ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] systemd-tmpfiles subvolume handling vs. changing default btrfs root
Hello,

when using systemd-tmpfiles' feature to create subvolumes it will always create the new subvolume as a child of the subvolume of the given path. This however may not always be the expected parent, especially when using btrfs snapshots to switch between various system states.

Example layout:
===

Let's assume the following subvolume layout (a simplified openSUSE layout):

ID   parent  top level  path
--   ------  ---------  ----
257  5       5          /@
258  257     257        /@/var
259  257     257        /@/.snapshots/1/snapshot
260  257     257        /@/.snapshots/2/snapshot
261  257     257        /@/.snapshots/3/snapshot

A corresponding /etc/fstab could look like this:

/dev/sdx  /     btrfs  defaults      0 0
/dev/sdx  /var  btrfs  subvol=@/var  0 0

with the default btrfs subvolume set to "261".
The third snapshot would thus be the root file system, with /var mounted on top of it.

The problem:

Creating "/var/test" would create a new entry like
262  258     258        @/var/test
as expected.
However creating "/opt" would create an entry similar to the following:
263  261     261        @/.snapshots/3/snapshot/opt

This is not good, as two things will happen now:
* When changing the snapshot (e.g. by reverting back to an old snapshot or creating a new one) /opt won't be visible any more (without manually mounting it), as it is not nested into the existing structure any more
* The third snapshot cannot be deleted without removing the subvolume first

Possible solutions:
===

Let's have a look at the default btrfs layouts of some distributions.

Fedora 28 (Server):
ID   parent  top lvl  path
--   ------  -------  ----
257  5       5        /root

Ubuntu 18.04:
257  5       5        /@
258  5       5        /@home

openSUSE Leap 15:
257  5       5        /@
258  257     257      /@/var
259  257     257      /@/usr/local
260  257     257      /@/tmp
261  257     257      /@/srv
262  257     257      /@/root
263  257     257      /@/opt
264  257     257      /@/home
265  257     257      /@/boot/grub2/x86_64-efi
266  257     257      /@/boot/grub2/i386-pc
267  257     257      /@/.snapshots
411  267     267      /@/.snapshots/138/snapshot
412  267     267      /@/.snapshots/139/snapshot

Option 1:
- Go back the path and check for subvolumes where the parent ID is "5".
This would work for the default btrfs layout of these three distributions and is easy to implement, but would break in the hypothetical case that someone creates snapshot directories directly as children of ID 5, e.g.
257  5       5        /@
258  5       5        /@home
259  5       5        /@snapshot1

Option 2:
- A variant would be to keep the current behaviour except when the parent would be a snapshot (by checking if a parent UUID is set?); in this case the subvolume would be created as a child of ID 5.

I can't think of any case where automatically creating a subvolume below a snapshot is a good idea, so in theory this sounds like the best approach.

In any case a corresponding implementation would mean additional handling for generating an fstab entry and mounting the subvolume if the subvolume is not created as a subdirectory of an existing subvolume.

Would this be an approach that would be acceptable upstream?
Please note that I'm not a btrfs expert, so I may be missing something.

-- 
Ignaz Forster
Research Engineer
SUSE Linux GmbH, Maxfeldstr. 5, D-90409 Nürnberg
Tel: +49-911-74053-281; https://www.suse.com/
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
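
The two options proposed in this message can be compared with a short Python sketch over a table of subvolume records. Everything here is illustrative: the `Subvol` record, the function names and the placeholder `parent_uuid` strings are assumptions, and whether "parent UUID is set" reliably identifies a snapshot is exactly the open question raised in the proposal.

```python
from dataclasses import dataclass
from typing import Optional

TOP_LEVEL = 5  # ID of the btrfs top-level subvolume


@dataclass(frozen=True)
class Subvol:
    id: int
    parent: int
    path: str
    parent_uuid: Optional[str] = None  # assumed to be set only for snapshots


def option1_parent(subvols, containing_id):
    """Option 1: walk up until a subvolume whose parent ID is 5."""
    by_id = {s.id: s for s in subvols}
    current = by_id[containing_id]
    while current.parent != TOP_LEVEL:
        current = by_id[current.parent]
    return current.id


def option2_parent(subvols, containing_id):
    """Option 2: keep current behaviour unless the parent is a snapshot."""
    by_id = {s.id: s for s in subvols}
    if by_id[containing_id].parent_uuid is not None:
        return TOP_LEVEL
    return containing_id


# Simplified openSUSE layout from the example; UUID values are placeholders.
suse = [
    Subvol(257, 5, "/@"),
    Subvol(258, 257, "/@/var"),
    Subvol(261, 257, "/@/.snapshots/3/snapshot", parent_uuid="placeholder"),
]
# Hypothetical flat layout with a snapshot directly below ID 5.
flat = [
    Subvol(257, 5, "/@"),
    Subvol(259, 5, "/@snapshot1", parent_uuid="placeholder"),
]

print(option1_parent(suse, 261))  # 257: new subvolume goes below /@
print(option2_parent(suse, 261))  # 5: new subvolume goes below the top level
print(option1_parent(flat, 259))  # 259: Option 1 still picks the snapshot
```

The last call reproduces the failure case described for Option 1: when a snapshot itself sits directly below ID 5, walking up the path stops at the snapshot.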