Re: Setting 'nodatacow' on VM image when on Btrfs

2020-07-20 Thread Chris Murphy
On Mon, Jul 20, 2020 at 12:11 PM Daniel P. Berrangé  wrote:
>
> On Mon, Jul 13, 2020 at 08:06:22AM -0600, Chris Murphy wrote:
> > On Mon, Jul 13, 2020 at 6:20 AM Daniel P. Berrangé  
> > wrote:
> > >
> > > On Sat, Jul 11, 2020 at 07:28:43PM -0600, Chris Murphy wrote:
> >
> > > > Also, the location for GNOME Boxes doesn't exist at install time, so
> > > > the installer doing it with a post-install script isn't a complete
> > > > solution.
> > > >
> > > > While Boxes can use 'qemu-img create -o nocow=on', there is an
> > > > advantage to 'chattr +C' on the enclosing directory: files copied into
> > > > it, as well as new files created, inherit the attribute. Meanwhile, the
> > > > attribute can't be set after a file has non-zero size.
> > >
> > > Boxes will use libvirt's storage pool APIs to create the directory in
> > > the first place. So if we add nocow support to the pool APIs, we don't
> > > need to worry about per-file usage in most cases.
> > >
> > >
> > > Is there a good way to determine whether a given filesystem supports
> > > copy-on-write, other than to simply try to set the +C attribute on a
> > > directory?  Ideally we would check for this feature, not check
> > > whether the filesystem is btrfs, as that makes it future-proof for
> > > other COW-supporting filesystems.
> >
> > It's a messy history. lsattr and chattr come from e2fsprogs. Not all
> > file systems support file attributes, and not all COW file systems (or
> > layers) support an option for nocow; in fact, Btrfs is the only one I'm
> > aware of that has it, both as a file attribute ('chattr +C') and as a
> > mount option ('nodatacow'). The metadata is always updated per
> > copy-on-write semantics.
> >
> > ZFS is always copy-on-write.
> >
> > XFS is overwrite-in-place, but supports shared extents (reflink
> > copies). Any overwrite to a shared extent temporarily becomes COW
> > behavior. It's the same behavior on Btrfs if the file is nodatacow and
> > reflinked (or snapshotted).
> >
> > For device-mapper layered approaches (thin provisioning, vdo) I'm not
> > certain whether this information is available to higher layers or can
> > be inhibited.
>
> Given this mess, I've taken the simple option and just checked whether
> the filesystem magic == btrfs.
>
> With this series:
>
>   https://www.redhat.com/archives/libvir-list/2020-July/msg01377.html
>
> any application which builds storage pools in libvirt will automatically
> get the "nocow" attribute set on btrfs, unless it takes explicit steps to
> override this default.
>
> This will work with virt-manager and GNOME Boxes out of the box IIUC.
>
> Despite the fact that /var/lib/libvirt/images is created by RPM, this
> should still get the +C attribute set by virt-manager and virt-install,
> *provided* they use this syntax:
>
>   virt-install --disk size=10
>
> It won't get +C if using the explicit path:
>
>   virt-install --disk size=10,path=/var/lib/libvirt/images/demo.img
>
> I'm thinking most people will use the simpler syntax, so not a big
> deal.
>
> IOW, this should give us reasonable behaviour on btrfs out of the
> box with +C set on directories, and thus auto-inherited by images.


Awesome. Thanks for this work. I've updated the RFE associated with
the 'btrfs by default' feature tracking bug.

-- 
Chris Murphy




Re: Setting 'nodatacow' on VM image when on Btrfs

2020-07-20 Thread Daniel P. Berrangé
On Mon, Jul 13, 2020 at 08:06:22AM -0600, Chris Murphy wrote:
> On Mon, Jul 13, 2020 at 6:20 AM Daniel P. Berrangé  
> wrote:
> >
> > On Sat, Jul 11, 2020 at 07:28:43PM -0600, Chris Murphy wrote:
> 
> > > Also, the location for GNOME Boxes doesn't exist at install time, so
> > > the installer doing it with a post-install script isn't a complete
> > > solution.
> > >
> > > While Boxes can use 'qemu-img create -o nocow=on', there is an
> > > advantage to 'chattr +C' on the enclosing directory: files copied into
> > > it, as well as new files created, inherit the attribute. Meanwhile, the
> > > attribute can't be set after a file has non-zero size.
> >
> > Boxes will use libvirt's storage pool APIs to create the directory in
> > the first place. So if we add nocow support to the pool APIs, we don't
> > need to worry about per-file usage in most cases.
> >
> >
> > Is there a good way to determine whether a given filesystem supports
> > copy-on-write, other than to simply try to set the +C attribute on a
> > directory?  Ideally we would check for this feature, not check
> > whether the filesystem is btrfs, as that makes it future-proof for
> > other COW-supporting filesystems.
> 
> It's a messy history. lsattr and chattr come from e2fsprogs. Not all
> file systems support file attributes, and not all COW file systems (or
> layers) support an option for nocow; in fact, Btrfs is the only one I'm
> aware of that has it, both as a file attribute ('chattr +C') and as a
> mount option ('nodatacow'). The metadata is always updated per
> copy-on-write semantics.
> 
> ZFS is always copy-on-write.
> 
> XFS is overwrite-in-place, but supports shared extents (reflink
> copies). Any overwrite to a shared extent temporarily becomes COW
> behavior. It's the same behavior on Btrfs if the file is nodatacow and
> reflinked (or snapshotted).
> 
> For device-mapper layered approaches (thin provisioning, vdo) I'm not
> certain whether this information is available to higher layers or can
> be inhibited.

Given this mess, I've taken the simple option and just checked whether
the filesystem magic == btrfs.
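
(For anyone wanting to replicate the check outside libvirt: the kernel
exposes the filesystem magic through statfs(2) as f_type, where btrfs is
BTRFS_SUPER_MAGIC, 0x9123683e in linux/magic.h. As a rough illustration
only, not libvirt's actual code, the same answer can be obtained in
userspace by matching a path against /proc/self/mounts:

```python
import os

def filesystem_type(path):
    """Return the filesystem type of the mount containing 'path'.

    Sketch only: longest-prefix match against /proc/self/mounts
    (Linux-specific). A C implementation would instead call statfs(2)
    and compare f_type with BTRFS_SUPER_MAGIC (0x9123683e).
    """
    path = os.path.realpath(path)
    best, fstype = "", None
    with open("/proc/self/mounts") as mounts:
        for line in mounts:
            _, mnt, fs = line.split()[:3]
            # Keep the mount point that is the longest prefix of 'path'.
            if (path == mnt or path.startswith(mnt.rstrip("/") + "/")) \
                    and len(mnt) >= len(best):
                best, fstype = mnt, fs
    return fstype

def is_btrfs(path):
    return filesystem_type(path) == "btrfs"
```

Caveat: mount points containing spaces are octal-escaped in
/proc/self/mounts; a robust version would unescape them.)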

With this series:

  https://www.redhat.com/archives/libvir-list/2020-July/msg01377.html

any application which builds storage pools in libvirt will automatically
get the "nocow" attribute set on btrfs, unless it takes explicit steps to
override this default.

This will work with virt-manager and GNOME Boxes out of the box IIUC.

Despite the fact that /var/lib/libvirt/images is created by RPM, this
should still get the +C attribute set by virt-manager and virt-install,
*provided* they use this syntax:

  virt-install --disk size=10

It won't get +C if using the explicit path:

  virt-install --disk size=10,path=/var/lib/libvirt/images/demo.img

I'm thinking most people will use the simpler syntax, so not a big
deal.

IOW, this should give us reasonable behaviour on btrfs out of the
box with +C set on directories, and thus auto-inherited by images.

Regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|



Re: Setting 'nodatacow' on VM image when on Btrfs

2020-07-13 Thread Chris Murphy
On Mon, Jul 13, 2020 at 6:20 AM Daniel P. Berrangé  wrote:
>
> On Sat, Jul 11, 2020 at 07:28:43PM -0600, Chris Murphy wrote:

> > Also, the location for GNOME Boxes doesn't exist at install time, so
> > the installer doing it with a post-install script isn't a complete
> > solution.
> >
> > While Boxes can use 'qemu-img create -o nocow=on', there is an
> > advantage to 'chattr +C' on the enclosing directory: files copied into
> > it, as well as new files created, inherit the attribute. Meanwhile, the
> > attribute can't be set after a file has non-zero size.
>
> Boxes will use libvirt's storage pool APIs to create the directory in
> the first place. So if we add nocow support to the pool APIs, we don't
> need to worry about per-file usage in most cases.
>
>
> Is there a good way to determine whether a given filesystem supports
> copy-on-write, other than to simply try to set the +C attribute on a
> directory?  Ideally we would check for this feature, not check
> whether the filesystem is btrfs, as that makes it future-proof for
> other COW-supporting filesystems.

It's a messy history. lsattr and chattr come from e2fsprogs. Not all
file systems support file attributes, and not all COW file systems (or
layers) support an option for nocow; in fact, Btrfs is the only one I'm
aware of that has it, both as a file attribute ('chattr +C') and as a
mount option ('nodatacow'). The metadata is always updated per
copy-on-write semantics.

ZFS is always copy-on-write.

XFS is overwrite-in-place, but supports shared extents (reflink
copies). Any overwrite to a shared extent temporarily becomes COW
behavior. It's the same behavior on Btrfs if the file is nodatacow and
reflinked (or snapshotted).

For device-mapper layered approaches (thin provisioning, vdo) I'm not
certain whether this information is available to higher layers or can
be inhibited.

-- 
Chris Murphy




Re: Setting 'nodatacow' on VM image when on Btrfs

2020-07-13 Thread Daniel P. Berrangé
On Sat, Jul 11, 2020 at 07:28:43PM -0600, Chris Murphy wrote:
> On Fri, Jul 10, 2020 at 6:16 AM Daniel P. Berrangé  
> wrote:
> >
> > On Thu, Jul 09, 2020 at 02:15:50PM -0600, Chris Murphy wrote:
> > > On Thu, Jul 9, 2020 at 12:27 PM Daniel P. Berrangé  
> > > wrote:
> > > Good point. I am open to suggestion/recommendation by Felipe Borges
> > > about GNOME Boxes' ability to do installs by injecting kickstarts. It
> > > might sound nutty, but it's quite sane to consider having the guest do
> > > something like plain XFS (no LVM). But all of my VMs are as you
> > > describe: the guest is btrfs with checksums and compression, on a raw
> > > file with chattr +C set.
> >
> > GNOME Boxes / virt-install both use libosinfo for doing the automated
> > kickstart installs. In the case of Fedora that's driven by a template
> > common across all versions. We already had to cope with the switch from
> > ext3 to ext4 way back in Fedora 10. If a new Fedora release decides on
> > btrfs by default, we'll need to update the kickstart to follow that
> > recommendation too:
> >
> > https://gitlab.com/libosinfo/osinfo-db/-/blob/master/data/install-script/fedoraproject.org/fedora-kickstart-jeos.xml.in#L80
> 
> Understood.
> 
> 
> > > For live installs it's rsync today. I'm not certain whether rsync
> > > carries over file attributes; tar does not. Also not sure if squashfs
> > > and unsquashfs do either. So this might mean an Anaconda post-install
> > > script is a more reliable way to go, since Anaconda can support rsync
> > > (Live) and rpm (netinstall, dvd) installs. And there is a proposal
> > > dangling in the wind to use plain squashfs (no nested ext4 as today).
> >
> > Hmm, tricky, so many different scenarios to consider - traditional
> > Anaconda install, install from live CD, plain RPM post-install,
> > and pre-built disk image.
> 
> Also, the location for GNOME Boxes doesn't exist at install time, so
> the installer doing it with a post-install script isn't a complete
> solution.
> 
> While Boxes can use 'qemu-img create -o nocow=on', there is an
> advantage to 'chattr +C' on the enclosing directory: files copied into
> it, as well as new files created, inherit the attribute. Meanwhile, the
> attribute can't be set after a file has non-zero size.

Boxes will use libvirt's storage pool APIs to create the directory in
the first place. So if we add nocow support to the pool APIs, we don't
need to worry about per-file usage in most cases.


Is there a good way to determine whether a given filesystem supports
copy-on-write, other than to simply try to set the +C attribute on a
directory?  Ideally we would check for this feature, not check
whether the filesystem is btrfs, as that makes it future-proof for
other COW-supporting filesystems.

Regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|



Re: Setting 'nodatacow' on VM image when on Btrfs

2020-07-11 Thread Chris Murphy
On Fri, Jul 10, 2020 at 6:16 AM Daniel P. Berrangé  wrote:
>
> On Thu, Jul 09, 2020 at 02:15:50PM -0600, Chris Murphy wrote:
> > On Thu, Jul 9, 2020 at 12:27 PM Daniel P. Berrangé  
> > wrote:
> > Good point. I am open to suggestion/recommendation by Felipe Borges
> > about GNOME Boxes' ability to do installs by injecting kickstarts. It
> > might sound nutty, but it's quite sane to consider having the guest do
> > something like plain XFS (no LVM). But all of my VMs are as you
> > describe: the guest is btrfs with checksums and compression, on a raw
> > file with chattr +C set.
>
> GNOME Boxes / virt-install both use libosinfo for doing the automated
> kickstart installs. In the case of Fedora that's driven by a template
> common across all versions. We already had to cope with the switch from
> ext3 to ext4 way back in Fedora 10. If a new Fedora release decides on
> btrfs by default, we'll need to update the kickstart to follow that
> recommendation too:
>
> https://gitlab.com/libosinfo/osinfo-db/-/blob/master/data/install-script/fedoraproject.org/fedora-kickstart-jeos.xml.in#L80

Understood.


> > For live installs it's rsync today. I'm not certain whether rsync
> > carries over file attributes; tar does not. Also not sure if squashfs
> > and unsquashfs do either. So this might mean an Anaconda post-install
> > script is a more reliable way to go, since Anaconda can support rsync
> > (Live) and rpm (netinstall, dvd) installs. And there is a proposal
> > dangling in the wind to use plain squashfs (no nested ext4 as today).
>
> Hmm, tricky, so many different scenarios to consider - traditional
> Anaconda install, install from live CD, plain RPM post-install,
> and pre-built disk image.

Also, the location for GNOME Boxes doesn't exist at install time, so
the installer doing it with a post-install script isn't a complete
solution.

While Boxes can use 'qemu-img create -o nocow=on', there is an
advantage to 'chattr +C' on the enclosing directory: files copied into
it, as well as new files created, inherit the attribute. Meanwhile, the
attribute can't be set after a file has non-zero size.
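
(To make the 'chattr +C' behaviour concrete: the attribute is the
FS_NOCOW_FL inode flag, toggled with the FS_IOC_GETFLAGS/FS_IOC_SETFLAGS
ioctls. A minimal sketch, with flag and ioctl numbers for 64-bit Linux;
this is illustrative, not anyone's shipping code:

```python
import fcntl
import os
import struct

# ioctl numbers for a 64-bit 'long' argument (x86_64); see linux/fs.h.
FS_IOC_GETFLAGS = 0x80086601
FS_IOC_SETFLAGS = 0x40086602
FS_NOCOW_FL = 0x00800000  # the 'C' attribute shown by lsattr

def try_set_nocow(path):
    """Try to set FS_NOCOW_FL on a file or directory.

    Returns True on success, False if the filesystem rejects the flag
    (e.g. ext4 returns EOPNOTSUPP, tmpfs ENOTTY). On btrfs the flag
    only takes effect on a regular file while it is still zero-length;
    on a directory it is inherited by files created inside it later.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Read the current flags, then write them back with NOCOW added.
        buf = fcntl.ioctl(fd, FS_IOC_GETFLAGS, struct.pack("l", 0))
        flags = struct.unpack("l", buf)[0]
        fcntl.ioctl(fd, FS_IOC_SETFLAGS, struct.pack("l", flags | FS_NOCOW_FL))
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
```

Applied to a directory on btrfs, the flag is inherited by files
subsequently created or copied into it, which is why setting it on the
enclosing directory is the reliable approach.)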


> > It seems reasonable to me that libvirtd can "own"
> > /var/lib/libvirt/images and make these decisions. i.e. if it's empty,
> > and if btrfs, then delete it and recreate as subvolume and also chattr
> > +C
>
> Deleting & recreating as a subvolume is a bit more adventurous than
> I would like to be for something done transparently from the user.
> I think we would need that to be an explicit decision somewhere tied
> to the libvirt pool build APIs.  Perhaps virt-install/virt-manager
> can do this though.

It's an optimization, not a prerequisite. And it's reasonable to just
cross this bridge if/when we get to it.

But it has dual benefits, and possibly a third: in a 'btrfs
send/receive' use case, where the incremental difference of the
subvolume can be computed and sent elsewhere, it becomes possible to
separate the sysroot from VM images (e.g. for backup policy). 'btrfs
send' has a fairly cheap mechanism to identify only changed blocks
without deep traversal or comparison. Of course, it has to solve
people's actual problems, and prove itself useful.

--
Chris Murphy




Re: Setting 'nodatacow' on VM image when on Btrfs

2020-07-10 Thread Daniel P. Berrangé
On Thu, Jul 09, 2020 at 02:15:50PM -0600, Chris Murphy wrote:
> On Thu, Jul 9, 2020 at 12:27 PM Daniel P. Berrangé  
> wrote:
> >
> > On Thu, Jul 09, 2020 at 08:30:14AM -0600, Chris Murphy wrote:
> > > It's generally recommended by upstream Btrfs development to set
> > > 'nodatacow' using 'chattr +C' on the containing directory for VM
> > > images. Setting it on the containing directory means new VM
> > > images inherit the attribute, including images copied into that
> > > location.
> > >
> > > But 'nodatacow' also implies no compression and no data checksums.
> > > Could there be use cases in which it's preferred to do compression and
> > > checksumming on VM images? In that case the fragmentation problem can
> > > be mitigated by periodic defragmentation.
> >
> > Setting nodatacow makes sense particularly when using qcow2, since
> > qcow2 is already potentially performing copy-on-write.
> >
> > Skipping compression and data checksums is reasonable in many cases,
> > as the OS inside the VM is often going to be able to do this itself,
> > so we don't want double checksums or double compression as it just
> > wastes performance.
> 
> Good point. I am open to suggestion/recommendation by Felipe Borges
> about GNOME Boxes' ability to do installs by injecting kickstarts. It
> might sound nutty, but it's quite sane to consider having the guest do
> something like plain XFS (no LVM). But all of my VMs are as you
> describe: the guest is btrfs with checksums and compression, on a raw
> file with chattr +C set.

GNOME Boxes / virt-install both use libosinfo for doing the automated
kickstart installs. In the case of Fedora that's driven by a template
common across all versions. We already had to cope with the switch from
ext3 to ext4 way back in Fedora 10. If a new Fedora release decides on
btrfs by default, we'll need to update the kickstart to follow that
recommendation too:

https://gitlab.com/libosinfo/osinfo-db/-/blob/master/data/install-script/fedoraproject.org/fedora-kickstart-jeos.xml.in#L80


> > > Is this something libvirt can and should do? Possibly by default with
> > > a way for users to opt out?
> >
> > The default /var/lib/libvirt/images directory is created by RPM
> > at install time. Not sure if there's a way to get RPM to set
> > attributes on the dir at this time?
> 
> For live installs it's rsync today. I'm not certain whether rsync
> carries over file attributes; tar does not. Also not sure if squashfs
> and unsquashfs do either. So this might mean an Anaconda post-install
> script is a more reliable way to go, since Anaconda can support rsync
> (Live) and rpm (netinstall, dvd) installs. And there is a proposal
> dangling in the wind to use plain squashfs (no nested ext4 as today).

Hmm, tricky, so many different scenarios to consider - traditional
Anaconda install, install from live CD, plain RPM post-install,
and pre-built disk image.

> > Libvirt's storage pool APIs have support for setting "nodatacow"
> > on a per-file basis when creating the images. This was added for
> > btrfs' benefit, but it is opt-in and I'm not sure any mgmt tools
> > actually use it right now. So in practice it probably doesn't
> > have any out-of-the-box benefit.
> >
> > The storage pool APIs don't have any feature to set nodatacow
> > for the directory as a whole, but probably we should add this.
> 
> systemd-journald checks whether /var/log/journal is on btrfs and sets
> +C on it if so. This can be inhibited with
> 'touch /etc/tmpfiles.d/journal-nocow.conf'.

> > > Another option is to have the installer set 'chattr +C' in the short
> > > term. But this doesn't help GNOME Boxes, since the user home isn't
> > > created at installation time.
> > >
> > > Three advantages of libvirt awareness of Btrfs:
> > >
> > > (a) GNOME Boxes, Cockpit, and other users of libvirt can then use this
> > > same mechanism, and apply it to their VM image locations.
> > >
> > > (b) Create the containing directory as a subvolume instead of a directory
> > > (1) btrfs snapshots are not recursive, therefore making this a
> > > subvolume would prevent it from being snapshotted, and thus from
> > > (temporarily) resuming datacow.
> > > (2) in heavy workloads there can be lock contention on file
> > > btrees; a subvolume is a dedicated file btree, so this would reduce
> > > the tendency for lock contention in heavy workloads (this is not
> > > likely a desktop/laptop use case)
> >
> > Being able to create subvolumes sounds like a reasonable idea. We already
> > have a ZFS specific storage driver that can do the ZFS equivalent.
> >
> > Again though we'll also need mgmt tools modified to take advantage of
> > this. Not sure how we would make this all work out of the box, with
> > the way we let RPM pre-create /var/lib/libvirt/images, as we'd need
> > different behaviour depending on what filesystem you install the RPM
> > onto.
> 
> It seems reasonable to me that libvirtd can "own"
> /var/lib/libvirt/images and make these decisions. i.e. if it's empty,
> and if btrfs, then delete it and recreate as subvolume and also chattr
> +C

Deleting & recreating as a subvolume is a bit more adventurous than
I would like to be for something done transparently from the user.
I think we would need that to be an explicit decision somewhere tied
to the libvirt pool build APIs.  Perhaps virt-install/virt-manager
can do this though.

Re: Setting 'nodatacow' on VM image when on Btrfs

2020-07-09 Thread Chris Murphy
On Thu, Jul 9, 2020 at 12:27 PM Daniel P. Berrangé  wrote:
>
> On Thu, Jul 09, 2020 at 08:30:14AM -0600, Chris Murphy wrote:
> > It's generally recommended by upstream Btrfs development to set
> > 'nodatacow' using 'chattr +C' on the containing directory for VM
> > images. Setting it on the containing directory means new VM
> > images inherit the attribute, including images copied into that
> > location.
> >
> > But 'nodatacow' also implies no compression and no data checksums.
> > Could there be use cases in which it's preferred to do compression and
> > checksumming on VM images? In that case the fragmentation problem can
> > be mitigated by periodic defragmentation.
>
> Setting nodatacow makes sense particularly when using qcow2, since
> qcow2 is already potentially performing copy-on-write.
>
> Skipping compression and data checksums is reasonable in many cases,
> as the OS inside the VM is often going to be able to do this itself,
> so we don't want double checksums or double compression as it just
> wastes performance.

Good point. I am open to suggestion/recommendation by Felipe Borges
about GNOME Boxes' ability to do installs by injecting kickstarts. It
might sound nutty, but it's quite sane to consider having the guest do
something like plain XFS (no LVM). But all of my VMs are as you
describe: the guest is btrfs with checksums and compression, on a raw
file with chattr +C set.

>
> > Is this something libvirt can and should do? Possibly by default with
> > a way for users to opt out?
>
> The default /var/lib/libvirt/images directory is created by RPM
> at install time. Not sure if there's a way to get RPM to set
> attributes on the dir at this time?

For live installs it's rsync today. I'm not certain whether rsync
carries over file attributes; tar does not. Also not sure if squashfs
and unsquashfs do either. So this might mean an Anaconda post-install
script is a more reliable way to go, since Anaconda can support rsync
(Live) and rpm (netinstall, dvd) installs. And there is a proposal
dangling in the wind to use plain squashfs (no nested ext4 as today).

>
> Libvirt's storage pool APIs have support for setting "nodatacow"
> on a per-file basis when creating the images. This was added for
> btrfs' benefit, but it is opt-in and I'm not sure any mgmt tools
> actually use it right now. So in practice it probably doesn't
> have any out-of-the-box benefit.
>
> The storage pool APIs don't have any feature to set nodatacow
> for the directory as a whole, but probably we should add this.

systemd-journald checks whether /var/log/journal is on btrfs and sets
+C on it if so. This can be inhibited with
'touch /etc/tmpfiles.d/journal-nocow.conf'.


>
> > Another option is to have the installer set 'chattr +C' in the short
> > term. But this doesn't help GNOME Boxes, since the user home isn't
> > created at installation time.
> >
> > Three advantages of libvirt awareness of Btrfs:
> >
> > (a) GNOME Boxes, Cockpit, and other users of libvirt can then use this
> > same mechanism, and apply it to their VM image locations.
> >
> > (b) Create the containing directory as a subvolume instead of a directory
> > (1) btrfs snapshots are not recursive, therefore making this a
> > subvolume would prevent it from being snapshotted, and thus from
> > (temporarily) resuming datacow.
> > (2) in heavy workloads there can be lock contention on file
> > btrees; a subvolume is a dedicated file btree, so this would reduce
> > the tendency for lock contention in heavy workloads (this is not
> > likely a desktop/laptop use case)
>
> Being able to create subvolumes sounds like a reasonable idea. We already
> have a ZFS specific storage driver that can do the ZFS equivalent.
>
> Again though we'll also need mgmt tools modified to take advantage of
> this. Not sure how we would make this all work out of the box, with
> the way we let RPM pre-create /var/lib/libvirt/images, as we'd need
> different behaviour depending on what filesystem you install the RPM
> onto.

It seems reasonable to me that libvirtd can "own"
/var/lib/libvirt/images and make these decisions. i.e. if it's empty,
and if btrfs, then delete it and recreate as subvolume and also chattr
+C

There's also precedent: systemd creates its own nested subvolumes
for containers run by nspawn.


Thanks,


-- 
Chris Murphy




Re: Setting 'nodatacow' on VM image when on Btrfs

2020-07-09 Thread Daniel P. Berrangé
On Thu, Jul 09, 2020 at 08:30:14AM -0600, Chris Murphy wrote:
> It's generally recommended by upstream Btrfs development to set
> 'nodatacow' using 'chattr +C' on the containing directory for VM
> images. Setting it on the containing directory means new VM
> images inherit the attribute, including images copied into that
> location.
> 
> But 'nodatacow' also implies no compression and no data checksums.
> Could there be use cases in which it's preferred to do compression and
> checksumming on VM images? In that case the fragmentation problem can
> be mitigated by periodic defragmentation.

Setting nodatacow makes sense particularly when using qcow2, since
qcow2 is already potentially performing copy-on-write.

Skipping compression and data checksums is reasonable in many cases,
as the OS inside the VM is often going to be able to do this itself,
so we don't want double checksums or double compression as it just
wastes performance.

> Is this something libvirt can and should do? Possibly by default with
> a way for users to opt out?

The default /var/lib/libvirt/images directory is created by RPM
at install time. Not sure if there's a way to get RPM to set
attributes on the dir at this time?

Libvirt's storage pool APIs have support for setting "nodatacow"
on a per-file basis when creating the images. This was added for
btrfs' benefit, but it is opt-in and I'm not sure any mgmt tools
actually use it right now. So in practice it probably doesn't
have any out-of-the-box benefit.

The storage pool APIs don't have any feature to set nodatacow
for the directory as a whole, but probably we should add this.
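
(For reference, the per-file opt-in is exposed in the storage volume
XML as a <nocow/> element under the volume's <target>; check the
libvirt storage volume format documentation before relying on the exact
shape of this sketch:

```xml
<volume>
  <name>demo.img</name>
  <capacity unit='G'>10</capacity>
  <target>
    <format type='raw'/>
    <!-- btrfs only: create the file with the NOCOW attribute -->
    <nocow/>
  </target>
</volume>
```

A management tool would pass this XML to virStorageVolCreateXML when
creating the image.)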

> Another option is to have the installer set 'chattr +C' in the short
> term. But this doesn't help GNOME Boxes, since the user home isn't
> created at installation time.
> 
> Three advantages of libvirt awareness of Btrfs:
> 
> (a) GNOME Boxes, Cockpit, and other users of libvirt can then use this
> same mechanism, and apply it to their VM image locations.
> 
> (b) Create the containing directory as a subvolume instead of a directory
> (1) btrfs snapshots are not recursive, therefore making this a
> subvolume would prevent it from being snapshotted, and thus from
> (temporarily) resuming datacow.
> (2) in heavy workloads there can be lock contention on file
> btrees; a subvolume is a dedicated file btree, so this would reduce
> the tendency for lock contention in heavy workloads (this is not
> likely a desktop/laptop use case)

Being able to create subvolumes sounds like a reasonable idea. We already
have a ZFS specific storage driver that can do the ZFS equivalent.

Again though we'll also need mgmt tools modified to take advantage of
this. Not sure how we would make this all work out of the box, with
the way we let RPM pre-create /var/lib/libvirt/images, as we'd need
different behaviour depending on what filesystem you install the RPM
onto.

> (c) virtiofs might be able to take advantage of btrfs subvolumes.

Libvirt doesn't currently do anything much wrt virtiofs except
configure QEMU.  The creation of the directory containing the
share and populating its contents is left as an exercise for the
user/admin/mgmt tool.

Regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|



Setting 'nodatacow' on VM image when on Btrfs

2020-07-09 Thread Chris Murphy
It's generally recommended by upstream Btrfs development to set
'nodatacow' using 'chattr +C' on the containing directory for VM
images. Setting it on the containing directory means new VM
images inherit the attribute, including images copied into that
location.

But 'nodatacow' also implies no compression and no data checksums.
Could there be use cases in which it's preferred to do compression and
checksumming on VM images? In that case the fragmentation problem can
be mitigated by periodic defragmentation.

Is this something libvirt can and should do? Possibly by default with
a way for users to opt out?

Another option is to have the installer set 'chattr +C' in the short
term. But this doesn't help GNOME Boxes, since the user home isn't
created at installation time.

Three advantages of libvirt awareness of Btrfs:

(a) GNOME Boxes, Cockpit, and other users of libvirt can then use this
same mechanism, and apply it to their VM image locations.

(b) Create the containing directory as a subvolume instead of a directory
(1) btrfs snapshots are not recursive, therefore making this a
subvolume would prevent it from being snapshotted, and thus from
(temporarily) resuming datacow.
(2) in heavy workloads there can be lock contention on file
btrees; a subvolume is a dedicated file btree, so this would reduce
the tendency for lock contention in heavy workloads (this is not
likely a desktop/laptop use case)

(c) virtiofs might be able to take advantage of btrfs subvolumes.
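
(A hypothetical sketch of what (b) could look like in a pool-build
step, assuming the 'btrfs' CLI is available and the filesystem type has
already been determined by the caller; all names here are made up for
illustration, not libvirt code:

```python
import os
import subprocess

def build_image_pool(path, fstype):
    """Hypothetical pool-build step: create the images location as a
    btrfs subvolume when the underlying filesystem is btrfs, otherwise
    as a plain directory. 'fstype' is determined by the caller, e.g.
    from statfs(2) f_type."""
    if os.path.exists(path):
        return  # don't touch a pre-existing (e.g. RPM-created) directory
    if fstype == "btrfs":
        # A subvolume is its own file btree and is skipped by snapshots
        # of the parent subvolume.
        subprocess.run(["btrfs", "subvolume", "create", path], check=True)
    else:
        os.mkdir(path)
```

The existence check mirrors the concern above: transparently replacing
an already-populated directory with a subvolume is the adventurous part,
so the sketch only acts on paths that do not yet exist.)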


(This is a partial rewrite of
https://bugzilla.redhat.com/show_bug.cgi?id=1855000)

Thanks,

-- 
Chris Murphy