Re: btrfs is using 25% more disk than it should
On Sat, Dec 20, 2014 at 06:28:22AM -0500, Josef Bacik wrote:
> We now have two extents with the same bytenr but with different lengths.
> [...] Then there is the problem of actually returning the free space. Now
> if we drop all of the refs for an extent we know the space is free and we
> return it to the allocator. With the above example we can't do that
> anymore; we have to check the extent tree for any area that is left
> overlapping the area we just freed. This adds another search to every
> btrfs_free_extent operation, which slows the whole system down and again
> leaves us with weird corner cases and pain for the users. Plus this would
> be an incompatible format change, so it would require setting a feature
> flag in the fs and rolling it out voluntarily.

Ouchie.

> Now I have another solution, but I'm not convinced it's awesome either.
> Take the same example above, but instead we split the original extent in
> the extent tree so we avoid all the mess of having overlapping ranges.

Would this work for a read-only snapshot? For a read-write snapshot it would be as if we had modified both (or all, if there are multiple snapshots) versions of the tree with split extents.

> This wouldn't require a format change so everybody would get this
> behaviour as soon as we turned it on.

It could be a mount option, like autodefrag, off by default until the bugs were worked out. Arguably there could be a 'garbage-collection' tool, similar to 'btrfs fi defrag', that could be used to clean out any large partially-obscured extents from specific files. This might be important for deduplication as well (although the extent-same code looks like it does split extents?).

Definitely something to think about. Thanks for the detailed explanations.
Re: btrfs is using 25% more disk than it should
Ok, so this is what I did:

1. Copied the sparse 315GB file (with 302GB inside) to another server
2. Re-formatted the btrfs partition
3. chattr +C on the parent dir
4. Copied the 315GB file back to the btrfs partition (the file is not sparse any more due to the copying)

This is the end result:

root@s4 /opt/drives/ssd # ls -alhs
total 316G
  16K drwxr-xr-x 1 libvirt-qemu libvirt-qemu   42 Dec 20 07:00 .
 4.0K drwxr-xr-x 4 libvirt-qemu libvirt-qemu 4.0K Dec 18 14:31 ..
 315G -rw-r--r-- 1 libvirt-qemu libvirt-qemu 315G Dec 20 09:11 disk_208.img
    0 drwxr-xr-x 1 libvirt-qemu libvirt-qemu    0 Dec 20 06:53 snapshots

root@s4 /opt/drives/ssd # du -h
0       ./snapshots
316G    .

root@s4 /opt/drives/ssd # df -h
/dev/md3        411G  316G   94G  78%  /opt/drives/ssd

root@s4 /opt/drives/ssd # btrfs filesystem df /opt/drives/ssd
Data, single: total=323.01GiB, used=315.08GiB
System, DUP: total=8.00MiB, used=64.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=1.00GiB, used=880.00KiB
Metadata, single: total=8.00MiB, used=0.00
unknown, single: total=16.00MiB, used=0.00

root@s4 /opt/drives/ssd # lsattr
---C ./snapshots
---C ./disk_208.img

As you can see, it looks much better now. The file takes as much space as it should and the Metadata is only 880KiB. I will do some writes inside the VM and see if the file grows on the outside. If everything is ok, it should not.

2014-12-20 5:17 GMT+08:00 Josef Bacik jba...@fb.com:
> On 12/19/2014 04:10 PM, Josef Bacik wrote:
>> On 12/18/2014 09:59 AM, Daniele Testa wrote:
>>> Hey, I am hoping you guys can shed some light on my issue. I know that
>>> it's a common question that people see differences in the disk used
>>> when running different calculations, but I still think that my issue
>>> is weird.
Re: btrfs is using 25% more disk than it should
On 12/20/2014 01:18 AM, Daniele Testa wrote:
> But I read somewhere that compression should be turned off on mounts that
> just store large VM-images. Is that wrong?

It doesn't really matter frankly. Usually virt images are preallocated with fallocate, which means compression doesn't happen since writes into fallocated areas aren't compressed, but you aren't doing that so you would be getting some compression.

> Btw, I am not pre-allocating space for the images. I use sparse files with:
>
> dd if=/dev/zero of=drive.img bs=1 count=1 seek=300G
>
> It creates the file in a few ms. Is it better to use fallocate with btrfs?

It depends. If you are going to use nodatacow for your virt images then I would definitely suggest using fallocate, since you'll get a nice contiguous chunk of data for your virt images.

> If I use sparse files, it adds a benefit when I want to copy/move the
> image-file to another server. Like if the 300GB sparse file just has 10GB
> of data in it, I only need to copy 10GB when moving it to another server.
> Would the same be true with fallocate?

No, but send/receive would only copy 10GB, and the resulting file would be sparse.

> Anyways, would disabling CoW (by putting +C on the parent dir) prevent
> the performance issues and the 2*filesize issue?

Yes.

Thanks,

Josef
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
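The sparse-vs-fallocate distinction discussed above is easy to see from userspace. This is a quick illustration (not from the thread) using Python's os.truncate and os.posix_fallocate on Linux, comparing the apparent size of each file with the blocks actually allocated behind it:

```python
import os
import tempfile

SIZE = 16 << 20  # 16 MiB is enough to see the difference

# Sparse file: extend with truncate; no data blocks are allocated,
# which is equivalent to the dd seek trick above.
sparse = tempfile.NamedTemporaryFile(delete=False)
os.truncate(sparse.name, SIZE)

# Preallocated file: fallocate reserves the whole range up front.
falloc = tempfile.NamedTemporaryFile(delete=False)
os.posix_fallocate(falloc.fileno(), 0, SIZE)

results = {}
for path in (sparse.name, falloc.name):
    st = os.stat(path)
    # st_size is the apparent size; st_blocks * 512 is what du/df report.
    results[path] = (st.st_size, st.st_blocks * 512)
    print(path, results[path])

os.unlink(sparse.name)
os.unlink(falloc.name)
```

Both files report the same st_size, but only the fallocated one has the full range of blocks behind it, which is why btrfs treats a fallocated image so differently under CoW.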
Re: btrfs is using 25% more disk than it should
On 12/19/2014 01:17 PM, Josef Bacik wrote:
> tl;dr: Cow means you can in the worst case end up using 2 * filesize -
> blocksize of data on disk and the file will appear to be filesize.
>
> Thanks,

Isn't the worst case more like N^log(N) (where N is the file size in blocks) in the pernicious case?

Staggered block overwrites can peer down through gaps to create more than two layers of retention. The only real requirement is that each layer be smaller than the one before it, so as to leave some of each of its predecessors visible.

So if I make a file of N blocks, then overwrite it with N-1 blocks, then overwrite it again with N-2 blocks (etc.), I can easily create a deep slope of obscured data:

[---------]
[--------]
[-------]
(etc...)

Or would I have to bracket the front and back:

[---------]
 [-------]
  [-----]

Or could I bracket the sides:

[---------]
[--]  [---]
[-]      []

There have got to be pathological patterns like this that can end up with a heck of a lot of hidden data.
Re: btrfs is using 25% more disk than it should
On 12/20/2014 12:52 AM, Zygo Blaxell wrote:
> On Fri, Dec 19, 2014 at 04:17:08PM -0500, Josef Bacik wrote:
>> [...] and in the extent tree you have this
>>
>> extent bytenr 123, len 302g, refs 2
>> extent bytenr whatever, len 4k, refs 1
>> extent bytenr notimportant, len 4k, refs 1
>>
>> See that refs 2 change? We split the original extent, so we have 2 file
>> extents pointing to the same physical extent, so we bumped the ref
>> count. This will happen over and over again until we have completely
>> overwritten the original extent, at which point your space usage will
>> go back down to ~302g.
>
> Wait, *what*?
>
> OK, I did a small experiment, and found that btrfs actually does do
> something like this. Can't argue with fact, though it would be nice if
> btrfs could be smarter and drop unused portions of the original extent
> sooner. :-P

So we've thought about changing this, and will eventually, but it's kind of difficult. The quoted example is what happens currently, and the split code for file extents is kind of big and scary; check __btrfs_drop_extents.
We would have to fix that to adjust the disk_bytenr and disk_num_bytes, which isn't too bad since we are already doing this dance and adjusting offset. The trick would be that when updating the extent references, we would have to split those extents. So say we have a 128mb extent and we write 4k at 1mb. If we split the extent refs we'd have this afterwards (note this isn't how they'd be ordered on disk, just written this way so it makes logical sense):

extent bytenr 0, len 1mb, refs 1
extent bytenr 128mb, len 4k, refs 1
extent bytenr 1mb+4k, len 127mb-4k, refs 1

Ok, so now we have 3 extents in the extent tree to describe essentially 2 ranges that are in use, but we get back the 4k, so that's nice. But wait, there's more! What if we're snapshotted? We can't just drop that 4k because somebody else has a reference to it. So what do we do? Well, we could do something like this:

extent bytenr 0, len 1mb, refs 1
extent bytenr 0, len 128mb, refs 1
extent bytenr 128mb, len 4k, refs 1
extent bytenr 1mb+4k, len 127mb-4k, refs 1

This creates all sorts of problems for us. We now have two extents with the same bytenr but with different lengths. This could be ok; we'd have to add a bunch of checks to make sure we're looking at the right extent, but it wouldn't be horrible. I imagine we'd be fixing weird corruption bugs for a few releases, though, while we found all of the corner cases we missed.

Then there is the problem of actually returning the free space. Now if we drop all of the refs for an extent we know the space is free and we return it to the allocator. With the above example we can't do that anymore; we have to check the extent tree for any area that is left overlapping the area we just freed. This adds another search to every btrfs_free_extent operation, which slows the whole system down and again leaves us with weird corner cases and pain for the users. Plus this would be an incompatible format change, so it would require setting a feature flag in the fs and rolling it out voluntarily.
Now I have another solution, but I'm not convinced it's awesome either. Take the same example above, but instead we split the original extent in the extent tree, so we avoid all the mess of having overlapping ranges and get this instead:

extent bytenr 0, len 1mb, refs 2
extent bytenr 1mb, len 4k, refs 1 -- part of the original extent, pointed to by the snapshot
extent bytenr 128mb, len 4k, refs 1
extent bytenr 1mb+4k, len 127mb-4k, refs 2

So yay, we've solved the problem of overlapping extents, and bonus: this is backwards compatible. So why don't we do this? Well, all the reasons I listed above about corner cases and much pain for our users. This wouldn't require a format change, so everybody would get this behaviour as soon as we turned it on, and I feel I would be doing a lot of fsck work for the next 6 months. Plus we would have to add a 'split' operation to the extent operations that copies all of the
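As a sanity check on the arithmetic in this proposal (note the remainder piece runs from 1mb+4k to 128mb, i.e. it is 127mb-4k long), here is a toy simulation of the split bookkeeping. This is illustrative only, not actual btrfs code:

```python
# Toy model of the extent-split idea from the message above; none of this
# is btrfs code, just bookkeeping to check the arithmetic.
MB = 1024 * 1024
K4 = 4096

# (bytenr, length, refs): one 128mb extent shared by subvolume + snapshot.
extents = [(0, 128 * MB, 2)]

def split_overwrite(extents, off, length, new_bytenr):
    """Overwrite [off, off+length): the writer's ref moves to a freshly
    allocated extent, and the original extent is split so that no two
    entries in the extent tree overlap."""
    out = []
    for bytenr, elen, refs in extents:
        if off >= bytenr + elen or off + length <= bytenr:
            out.append((bytenr, elen, refs))   # untouched extent
            continue
        if off > bytenr:                        # head piece, still shared
            out.append((bytenr, off - bytenr, refs))
        # overwritten middle: only the snapshot still references it
        out.append((off, length, refs - 1))
        tail = bytenr + elen - (off + length)
        if tail > 0:                            # tail piece, still shared
            out.append((off + length, tail, refs))
    out.append((new_bytenr, length, 1))         # the newly written extent
    return out

# Write 4k at 1mb; the new 4k extent is allocated right after the original.
extents = split_overwrite(extents, 1 * MB, K4, 128 * MB)
for e in extents:
    print(e)
```

The result matches the listing in the message: a shared 1mb head (refs 2), the snapshot-only 4k piece (refs 1), the shared 127mb-4k tail (refs 2), and the newly written 4k extent (refs 1).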
Re: btrfs is using 25% more disk than it should
On 12/20/2014 06:23 AM, Robert White wrote:
> Isn't the worst case more like N^log(N) (where N is the file size in
> blocks) in the pernicious case? Staggered block overwrites can peer down
> through gaps to create more than two layers of retention. The only real
> requirement is that each layer be smaller than the one before it, so as
> to leave some of each of its predecessors visible. [...] There have got
> to be pathological patterns like this that can end up with a heck of a
> lot of hidden data.

Just the sloped case would do it; the pathological case would result in way more space used than you expect. So I guess the worst case would be something like

(num_blocks + (num_blocks - 1)!) * blocksize

in actual space usage. Our extents are limited to 128mb in size, but still, that ends up being pretty huge. I'm actually going to do this locally and see what happens.

Thanks,

Josef
Re: btrfs is using 25% more disk than it should
On 12/20/2014 03:39 AM, Josef Bacik wrote:
> On 12/20/2014 06:23 AM, Robert White wrote:
>> Isn't the worst case more like N^log(N) (where N is the file size in
>> blocks) in the pernicious case? [...]
>
> Just the sloped case would do it; the pathological case would result in
> way more space used than you expect. So I guess the worst case would be
> something like (num_blocks + (num_blocks - 1)!) * blocksize in actual
> space usage. Our extents are limited to 128mb in size, but still, that
> ends up being pretty huge. I'm actually going to do this locally and see
> what happens.

I think that for a single file it's not factorial but a consecutive sum. (One of Gauss' equations.) So:

max = ((n * (n + 1)) / 2) * blocksize

A lot smaller than factorial, but still (n^2 + n) / 2 blocks, which is nothing to discard lightly.
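The consecutive-sum worst case is easy to check with a toy model (not btrfs code) in which an extent's disk space is returned only once none of its blocks is still visible, as described earlier in the thread. Staggered prefix overwrites keep the last block of every older extent visible, so nothing is ever freed:

```python
def staggered_usage(n):
    """Write n blocks, then overwrite the first n-1, then n-2, ... and
    return (file size in blocks, blocks still allocated on disk)."""
    live = [0] * n                 # extent id visible at each file block
    size_of = {0: n}               # extent id -> extent length in blocks
    for layer in range(1, n):
        length = n - layer         # each layer is one block shorter
        size_of[layer] = length
        for i in range(length):
            live[i] = layer
    # space is freed only when no block of an extent is visible any more;
    # here every extent still shows its last block, so all stay allocated
    allocated = sum(size_of[e] for e in set(live))
    return n, allocated

for n in (4, 16, 100):
    file_blocks, disk_blocks = staggered_usage(n)
    print(file_blocks, disk_blocks, n * (n + 1) // 2)
```

For every n the allocated total comes out to n(n+1)/2 blocks for an n-block file, matching the consecutive-sum bound (ignoring the 128mb extent-size cap, which limits how tall the stack can get in practice).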
Re: btrfs is using 25% more disk than it should
On 12/19/2014 01:10 PM, Josef Bacik wrote:
> [snip explanation of how CoW writes split the prealloced extent]
>
> This will happen over and over again until we have completely overwritten
> the original extent, at which point your space usage will go back down to
> ~302g. We split big extents with cow, so unless you've got lots of space
> to spare or are going to use nodatacow, you should probably not
> pre-allocate virt images.
>
> Thanks,

Still too new to the code base to offer much other than pseudocode...

Is it easy to find all the inodes that are using a particular extent at runtime? It occurs to me that since every extent starts life with exactly one owner, a scrupulous breaking of extents can prevent the unbounded left-overlap problem... If the preexisting extent is always broken up into two or three new extents wherever it's being referenced, then problematic overlaps are eliminated and dead data can be discarded as soon as it's actually dead.

So in the exemplar case:

'.' == preexisting extent
'+' == new written extent
'-' == preexisting described by new
Re: btrfs is using 25% more disk than it should
On 12/18/2014 9:59 AM, Daniele Testa wrote:
> As seen above, I have a 410GB SSD mounted at /opt/drives/ssd. On that
> partition, I have one single sparse file, taking 302GB of space (max
> 315GB). The snapshots directory is completely empty.

So you don't have any snapshots or other subvolumes?

> However, for some weird reason, btrfs seems to think it takes 404GB. The
> big file is a disk that I use in a virtual server, and when I write stuff
> inside that virtual server, the disk-usage of the btrfs partition on the
> host keeps increasing even though the sparse file is constant at 302GB. I
> even have 100GB of free disk-space inside that virtual disk-file. Writing
> 1GB inside the virtual disk-file seems to increase the usage by about
> 4-5GB on the outside.

Did you flag the file as nodatacow?

> Does anyone have a clue on what is going on? How can the difference and
> behaviour be like this when I just have one single file? Is it also
> normal to have 672MB of metadata for a single file?

You probably have data checksums enabled, and that isn't unreasonable for checksums on 302G of data.
Re: btrfs is using 25% more disk than it should
No, I don't have any snapshots or subvolumes. Only that single file. The file has both checksums and datacow on it.

I will do chattr +C on the parent dir and re-create the file to make sure all files are marked as nodatacow. Should I also turn off checksums with the mount-flags if this filesystem only contains big VM-files? Or is it not needed if I put +C on the parent dir?

2014-12-20 2:53 GMT+08:00 Phillip Susi ps...@ubuntu.com:
> On 12/18/2014 9:59 AM, Daniele Testa wrote:
> [snip]
>
> You probably have data checksums enabled, and that isn't unreasonable
> for checksums on 302G of data.
Re: btrfs is using 25% more disk than it should
On 12/19/2014 2:59 PM, Daniele Testa wrote:
> No, I don't have any snapshots or subvolumes. Only that single file. The
> file has both checksums and datacow on it. I will do chattr +C on the
> parent dir and re-create the file to make sure all files are marked as
> nodatacow. Should I also turn off checksums with the mount-flags if this
> filesystem only contains big VM-files? Or is it not needed if I put +C on
> the parent dir?

If you don't want the overhead of those checksums, then yes. Also, I would question why you are using btrfs to hold only big vm files in the first place. You would be better off using lvm thinp volumes instead of files, though personally I prefer to just use regular lvm volumes and manually allocate enough space. It avoids the fragmentation you get from thin provisioning (or qcow2) at the cost of a bit of overallocated space and the need to do some manual resizing to add more if and when it is needed.
Re: btrfs is using 25% more disk than it should
On 12/18/2014 09:59 AM, Daniele Testa wrote:
> Hey,
>
> I am hoping you guys can shed some light on my issue. I know that it's a
> common question that people see differences in the disk used when running
> different calculations, but I still think that my issue is weird.
>
> root@s4 / # mount
> /dev/md3 on /opt/drives/ssd type btrfs (rw,noatime,compress=zlib,discard,nospace_cache)
>
> root@s4 / # btrfs filesystem df /opt/drives/ssd
> Data: total=407.97GB, used=404.08GB
> System, DUP: total=8.00MB, used=52.00KB
> System: total=4.00MB, used=0.00
> Metadata, DUP: total=1.25GB, used=672.21MB
> Metadata: total=8.00MB, used=0.00
>
> root@s4 /opt/drives/ssd # ls -alhs
> total 302G
> 4.0K drwxr-xr-x 1 root         root           42 Dec 18 14:34 .
> 4.0K drwxr-xr-x 4 libvirt-qemu libvirt-qemu 4.0K Dec 18 14:31 ..
> 302G -rw-r--r-- 1 libvirt-qemu libvirt-qemu 315G Dec 18 14:49 disk_208.img
>    0 drwxr-xr-x 1 libvirt-qemu libvirt-qemu    0 Dec 18 10:08 snapshots
>
> root@s4 /opt/drives/ssd # du -h
> 0       ./snapshots
> 302G    .
>
> As seen above, I have a 410GB SSD mounted at /opt/drives/ssd. On that
> partition, I have one single sparse file, taking 302GB of space (max
> 315GB). The snapshots directory is completely empty.
>
> However, for some weird reason, btrfs seems to think it takes 404GB. The
> big file is a disk that I use in a virtual server, and when I write stuff
> inside that virtual server, the disk-usage of the btrfs partition on the
> host keeps increasing even though the sparse file is constant at 302GB.
> I even have 100GB of free disk-space inside that virtual disk-file.
> Writing 1GB inside the virtual disk-file seems to increase the usage by
> about 4-5GB on the outside.
>
> Does anyone have a clue on what is going on? How can the difference and
> behaviour be like this when I just have one single file? Is it also
> normal to have 672MB of metadata for a single file?

Hello and welcome to the wonderful world of btrfs, where COW can really suck hard without being super clear why!
It's 4pm on a Friday right before I'm gone for 2 weeks, so I'm a bit happy and drunk, so I'm going to use pretty pictures. You have this case to start with:

file offset 0                                  offset 302g
[------------ prealloced 302g extent ---------]

(man it's impressive I got all that lined up right)

On disk you have 2 things. First your file, which has file extents that say

inode 256, file offset 0, size 302g, offset 0, disk bytenr 123, disklen 302g

and then the extent tree, which keeps track of actual allocated space, has this

extent bytenr 123, len 302g, refs 1

Now say you boot up your virt image and it writes one 4k block to offset 0. Now you have this:

[4k][----------- 302g-4k -----------]

And for your inode you now have this:

inode 256, file offset 0, size 4k, offset 0, disk bytenr (123+302g), disklen 4k
inode 256, file offset 4k, size 302g-4k, offset 4k, disk bytenr 123, disklen 302g

and in your extent tree you have:

extent bytenr 123, len 302g, refs 1
extent bytenr whatever, len 4k, refs 1

See that? Your file is still the same size, it is still 302g. If you cp'ed it right now it would copy 302g of information. But what have you actually allocated on disk? Well, that's now 302g + 4k. Now let's say your virt thing decides to write to the middle, say at offset 12k. Now you have this:

inode 256, file offset 0, size 4k, offset 0, disk bytenr (123+302g), disklen 4k
inode 256, file offset 4k, size 8k, offset 4k, disk bytenr 123, disklen 302g
inode 256, file offset 12k, size 4k, offset 0, disk bytenr whatever, disklen 4k
inode 256, file offset 16k, size 302g-16k, offset 16k, disk bytenr 123, disklen 302g

and in the extent tree you have this:

extent bytenr 123, len 302g, refs 2
extent bytenr whatever, len 4k, refs 1
extent bytenr notimportant, len 4k, refs 1

See that refs 2 change? We split the original extent, so we have 2 file extents pointing to the same physical extent, so we bumped the ref count.
This will happen over and over again until we have completely overwritten the original extent, at which point your space usage will go back down to ~302g. We split big extents with cow, so unless you've got lots of space to spare or are going to use nodatacow, you should probably not pre-allocate virt images.

Thanks,

Josef
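The accounting described above can be sketched as a toy model. This is illustrative bookkeeping only (not btrfs code): a block-granular map of file blocks to physical extents, where an extent's space is returned only once no file block references it any more:

```python
BLOCK = 4096

class ToyCow:
    """Block-granular model: map[i] is the id of the physical extent
    backing file block i; an extent stays fully allocated on disk until
    no file block uses any part of it."""

    def __init__(self, blocks):
        self.extent_len = {0: blocks}   # extent id -> length in blocks
        self.map = [0] * blocks         # the whole file is one extent
        self.next_id = 1

    def write(self, off, length):
        """CoW overwrite of `length` blocks at block offset `off`."""
        eid = self.next_id
        self.next_id += 1
        self.extent_len[eid] = length   # new extent for the new data
        for i in range(off, off + length):
            self.map[i] = eid
        # space comes back only for extents with no remaining references
        live = set(self.map)
        self.extent_len = {e: n for e, n in self.extent_len.items()
                           if e in live}

    def allocated(self):
        return sum(self.extent_len.values()) * BLOCK

# The 302g image scaled down to a 1000-block prealloced file.
fs = ToyCow(1000)
print(fs.allocated() // BLOCK)   # 1000: just the prealloced extent
fs.write(0, 1)                   # virt image writes one 4k block at 0
print(fs.allocated() // BLOCK)   # 1001: old extent still fully allocated
fs.write(3, 1)                   # write in the middle: extent is shared
print(fs.allocated() // BLOCK)   # 1002
fs.write(0, 1000)                # overwrite the whole file
print(fs.allocated() // BLOCK)   # 1000: the original extent is freed
```

The allocated total only drops back to the file size once every block of the original extent has been overwritten, which is exactly the behaviour Daniele was seeing from the outside.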
Re: btrfs is using 25% more disk than it should
On 12/19/2014 02:59 PM, Daniele Testa wrote: No, I don't have any snapshots or subvolumes. Only that single file. The file has both checksums and datacow on it. I will do chattr +C on the parent dir and re-create the file to make sure all files are marked as nodatacow. Should I also turn off checksums with the mount-flags if this filesystem only contains big VM-files? Or is it not needed if I put +C on the parent dir?

Please God don't turn off checksums. Checksums are tracked in metadata anyway, they won't show up in the data accounting. Our csums are 8 bytes per block, so basic math says you are going to max out at 604 megabytes for that big of a file. Please people try to only take advice from people who know what they are talking about. So unless it's from somebody who has commits in btrfs/btrfs-progs, take their feedback with a grain of salt. Thanks, Josef -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
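Josef's figure is straightforward to verify. A quick sketch of the csum-overhead arithmetic, assuming a 4 KiB block size and 8 bytes of csum per block:

```shell
# Csum overhead for a 302 GiB file at 8 bytes per 4 KiB block.
blocks=$(( 302 * 1024 * 1024 * 1024 / 4096 ))
csum_mib=$(( blocks * 8 / 1024 / 1024 ))
echo "${csum_mib} MiB of csums"   # 604 MiB
```

So for a file this size the csum metadata tops out around 604 MiB, which is why it shows up under Metadata rather than Data accounting.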
Re: btrfs is using 25% more disk than it should
On 12/19/2014 04:10 PM, Josef Bacik wrote: On 12/18/2014 09:59 AM, Daniele Testa wrote: Hey, I am hoping you guys can shed some light on my issue. I know that it's a common question that people see differences in the disk used when running different calculations, but I still think that my issue is weird. [...] Writing 1GB inside the virtual disk-file seems to increase the usage about 4-5GB on the outside. Does anyone have a clue on what is going on? How can the difference and behaviour be like this when I just have one single file? Is it also normal to have 672MB of metadata for a single file?

Hello and welcome to the wonderful world of btrfs, where COW can really suck hard without being super clear why!
It's 4pm on a Friday right before I'm gone for 2 weeks so I'm a bit happy and drunk so I'm going to use pretty pictures. You have this case to start with

file offset 0                                offset 302g
[---------------prealloced 302g extent---------------]

(man it's impressive I got all that lined up right)

On disk you have 2 things. First your file, which has file extents that say

inode 256, file offset 0, size 302g, offset 0, disk bytenr 123, disklen 302g

and then the extent tree, which keeps track of actual allocated space, has this

extent bytenr 123, len 302g, refs 1

Now say you boot up your virt image and it writes 1 4k block to offset 0. Now you have this

[4k][---------------302g-4k---------------]

And for your inode you now have this

inode 256, file offset 0, size 4k, offset 0, disk bytenr (123+302g), disklen 4k
inode 256, file offset 4k, size 302g-4k, offset 4k, disk bytenr 123, disklen 302g

and in your extent tree you have

extent bytenr 123, len 302g, refs 1
extent bytenr whatever, len 4k, refs 1

See that? Your file is still the same size, it is still 302g. If you cp'ed it right now it would copy 302g of information. But what have you actually allocated on disk? Well, that's now 302g + 4k. Now let's say your virt thing decides to write to the middle, let's say at offset 12k. Now you have this

inode 256, file offset 0, size 4k, offset 0, disk bytenr (123+302g), disklen 4k
inode 256, file offset 4k, size 8k, offset 4k, disk bytenr 123, disklen 302g
inode 256, file offset 12k, size 4k, offset 0, disk bytenr whatever, disklen 4k
inode 256, file offset 16k, size 302g-16k, offset 16k, disk bytenr 123, disklen 302g

and in the extent tree you have this

extent bytenr 123, len 302g, refs 2
extent bytenr whatever, len 4k, refs 1
extent bytenr notimportant, len 4k, refs 1

See that refs 2 change? We split the original extent, so we have 2 file extents pointing to the same physical extent, so we bumped the ref count.
This will happen over and over again until we have completely overwritten the original extent, at which point your space usage will go back down to ~302g. We split big extents with cow, so unless you've got lots of space to spare or are going to use nodatacow you should probably not pre-allocate virt images. Thanks,

Sorry, should have added a tl;dr: Cow means you can in the worst case end up using 2 * filesize - blocksize of data on disk and the file will appear to be filesize. Thanks, Josef
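Plugging the numbers from this thread into that tl;dr formula (a sketch of the arithmetic only, not btrfs output):

```shell
# Worst case with COW: the whole original 302g extent stays pinned
# while new extents cover everything except the last not-yet-
# overwritten block, i.e. 2 * filesize - blocksize.
filesize=$(( 302 * 1024 * 1024 * 1024 ))
blocksize=4096
worst=$(( 2 * filesize - blocksize ))
worst_gib=$(( worst / 1024 / 1024 / 1024 ))
echo "worst case: ${worst_gib} GiB on disk"
```

The 404 GB of data usage observed sits partway between the 302 GiB apparent size and this worst case.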
Re: btrfs is using 25% more disk than it should
On 12/19/2014 4:15 PM, Josef Bacik wrote: Please God don't turn off checksums. Checksums are tracked in metadata anyway, they won't show up in the data accounting. Our csums are 8 bytes per block, so basic math says you are going to max out at 604 megabytes for that big of a file.

Yes, and it is exactly that metadata space he is complaining about. So if you don't want to use up all of that space (and have no use for the checksums), then you turn them off.

Please people try to only take advice from people who know what they are talking about. So unless it's from somebody who has commits in btrfs/btrfs-progs take their feedback with a grain of salt. Thanks,

Well that is rather arrogant and rude. For that matter, I *do* have commits in btrfs-progs.
Re: btrfs is using 25% more disk than it should
On 12/19/2014 04:53 PM, Phillip Susi wrote: [...] Well that is rather arrogant and rude. For that matter, I *do* have commits in btrfs-progs.

root@destiny ~/btrfs-progs# git log --oneline --author=Phillip Susi
c65345d btrfs-progs: document --rootdir mkfs switch
f6b6e93 btrfs-progs: removed extraneous whitespace from mkfs man page

Sorry, I should have qualified that statement better: unless it's from somebody who has had commits to meaningful portions of btrfs/btrfs-progs, take their feedback with a grain of salt. There are too many people on this list who give random horribly wrong advice to users that can result in data loss or corruption. Now I'll admit I read her question wrong, so what you said wasn't incorrect; I'm sorry for that. I've seen a lot of people responding to questions recently that I don't recognize who have been completely full of crap, and I just assumed you were in that camp as well. Thanks, Josef
Re: btrfs is using 25% more disk than it should
Daniele Testa posted on Sat, 20 Dec 2014 03:59:42 +0800 as excerpted: The file has both checksums and datacow on it. I will do chattr +C on the parent dir and re-create the file to make sure all files are marked as nodatacow. Should I also turn off checksums with the mount-flags if this filesystem only contains big VM-files? Or is it not needed if I put +C on the parent dir?

FWIW... Turning off datacow, whether by chattr +C on the parent dir before creating the file, or via mount option, turns off checksumming as well. (For completeness, it also turns off compression, but I don't think that applies in your case.)

In general, active VM images (and database files) with default flags tend to get very highly fragmented very fast, due to btrfs' default COW on a file with a heavy internal-rewrite pattern (as opposed to append-only or full rename/replace on rewrite). For relatively small files with this rewrite pattern, think typical desktop firefox sqlite database files of a quarter GiB or less, the btrfs autodefrag mount option can be helpful. But because it triggers a rewrite of the entire file, as filesize goes up, the viability of autodefrag goes down, and at somewhere around half a gig autodefrag doesn't work so well any more, particularly on very active files where the incoming rewrite stream may be faster than btrfs can rewrite the entire file.

Making heavy-internal-rewrite-pattern files of over say half a GiB in size nocow is one suggested solution. However, snapshots lock the existing version in place, causing a one-time COW after a snapshot. If people are doing frequent automated snapshots (say once an hour), this can be a big problem, as the file ends up fragmenting pretty badly with these one-time-cow writes as well. That's where snapshots come into the picture.
There are ways to work around the problem (put the files in question on a subvolume and don't snapshot it as often as the parent, set up a cron job to do say weekly defrag on the files in question, etc), but since you don't have snapshots going anyway, that's not a concern for you except as a preventative -- consider it if you /do/ start doing snapshots.

So anyway, as I said, creating the file nocow (whether by mount option or chattr) will turn off checksumming too. But on something that's frequently internally rewritten, where corruption will very likely corrupt the VM anyway and there are already mechanisms in place to deal with that (either VM integrity mechanisms, or backups, or simply disposable VMs, fire up a new one when necessary), at least with btrfs single-mode-data where there's no second copy to restore from if the checksum /does/ fail, turning off checksumming isn't necessarily as bad as it may seem anyway.

And it /should/ save you some on the metadata... tho I'd not consider that savings worth turning off checksumming if that were the /only/ reason, on its own. The metadata difference is more a nice side-effect of an already commonly recommended practice for large VM image files than something you'd turn off checksumming for in the first place.

Certainly, on most files I'd prefer the checksums, and in fact am running btrfs raid1 mode here specifically to get the benefit of having a second copy to retrieve from if the first attempted copy fails checksum. But VM images and database files are a bit of an exception.

-- Duncan - List replies preferred. No HTML msgs. Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman
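For reference, the nodatacow setup being discussed looks like the following (paths are illustrative, not from the thread; note that +C must be set on the directory before the file is created, since it has no effect on existing file data):

```shell
# On a btrfs mount: files created under a +C directory are nodatacow
# (which also disables csums and compression for those files).
mkdir -p /opt/drives/ssd/images              # illustrative path
chattr +C /opt/drives/ssd/images
cp --sparse=never /backup/disk_208.img /opt/drives/ssd/images/
lsattr /opt/drives/ssd/images/disk_208.img   # should show the C attribute
```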
Re: btrfs is using 25% more disk than it should
Josef Bacik posted on Fri, 19 Dec 2014 16:17:08 -0500 as excerpted: tl;dr: Cow means you can in the worst case end up using 2 * filesize - blocksize of data on disk and the file will appear to be filesize.

Thanks for the tl;dr /and/ the very sensible longer explanation. That's a very nice thing to know and to file away for further reference. =:^)

-- Duncan - List replies preferred. No HTML msgs. Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman
Re: btrfs is using 25% more disk than it should
On Fri, Dec 19, 2014 at 04:17:08PM -0500, Josef Bacik wrote: [...] See that refs 2 change? We split the original extent, so we have 2 file extents pointing to the same physical extents, so we bumped the ref count. This will happen over and over again until we have completely overwritten the original extent, at which point your space usage will go back down to ~302g.

Wait, *what*? OK, I did a small experiment, and found that btrfs actually does do something like this. Can't argue with fact, though it would be nice if btrfs could be smarter and drop unused portions of the original extent sooner. :-P

The above quoted scenario is a little oversimplified. Chances are that 302G file is made of much smaller extents (128M..256M). If the VM is writing 4K randomly everywhere then those 128M+ extents are not going away any time soon.
Even the extents that are dropped stick around for a few btrfs transaction commits before they go away. I couldn't reproduce this behavior until I realized the extents I was overwriting in my tests were exactly the same size and position as the extents on disk. I changed the offset slightly and found that partially-overwritten extents do in fact stick around in their entirety.

There seems to be an unexpected benefit for compression here: compression keeps the extents small, so many small updates will be less likely to leave big mostly-unused extents lying around the filesystem.
Re: btrfs is using 25% more disk than it should
But I read somewhere that compression should be turned off on mounts that just store large VM-images. Is that wrong? Btw, I am not pre-allocating space for the images. I use sparse files with:

dd if=/dev/zero of=drive.img bs=1 count=1 seek=300G

It creates the file in a few ms. Is it better to use fallocate with btrfs? If I use sparse files, it adds a benefit when I want to copy/move the image-file to another server. Like if the 300GB sparse file just has 10GB of data in it, I only need to copy 10GB when moving it to another server. Would the same be true with fallocate? Anyways, would disabling CoW (by putting +C on the parent dir) prevent the performance issues and the 2*filesize issue?

2014-12-20 13:52 GMT+08:00 Zygo Blaxell ce3g8...@umail.furryterror.org: [...] There seems to be an unexpected benefit for compression here: compression keeps the extents small, so many small updates will be less likely to leave big mostly-unused extents lying around the filesystem.
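On the sparse-vs-fallocate question: both kinds of file have an apparent size larger than their allocated size, and sparse-aware copy tools can skip the holes in either case. A small generic demo of the difference (sizes shrunk so it runs anywhere; assumes the filesystem supports fallocate):

```shell
# truncate makes a sparse file (no blocks allocated);
# fallocate reserves real blocks up front.
dir=$(mktemp -d)
truncate -s 16M "$dir/sparse.img"
fallocate -l 16M "$dir/prealloc.img"
sparse_kb=$(du -k "$dir/sparse.img" | cut -f1)
prealloc_kb=$(du -k "$dir/prealloc.img" | cut -f1)
echo "sparse=${sparse_kb}K prealloc=${prealloc_kb}K"
rm -r "$dir"
```

On btrfs specifically, a preallocated file starts out as one big extent, which is exactly the setup that triggers the 2*filesize worst case described earlier in the thread.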
Re: btrfs is using 25% more disk than it should
Daniele Testa posted on Sat, 20 Dec 2014 14:18:31 +0800 as excerpted: Anyways, would disabling CoW (by putting +C on the parent dir) prevent the performance issues and 2*filesize issue?

It should, provided you don't then start snapshotting the file (which I don't believe you intend to do but just in case...).

-- Duncan - List replies preferred. No HTML msgs. Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman
btrfs is using 25% more disk than it should
Hey, I am hoping you guys can shed some light on my issue. I know that it's a common question that people see differences in the disk used when running different calculations, but I still think that my issue is weird.

root@s4 / # mount
/dev/md3 on /opt/drives/ssd type btrfs (rw,noatime,compress=zlib,discard,nospace_cache)

root@s4 / # btrfs filesystem df /opt/drives/ssd
Data: total=407.97GB, used=404.08GB
System, DUP: total=8.00MB, used=52.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.25GB, used=672.21MB
Metadata: total=8.00MB, used=0.00

root@s4 /opt/drives/ssd # ls -alhs
total 302G
4.0K drwxr-xr-x 1 root         root           42 Dec 18 14:34 .
4.0K drwxr-xr-x 4 libvirt-qemu libvirt-qemu 4.0K Dec 18 14:31 ..
302G -rw-r--r-- 1 libvirt-qemu libvirt-qemu 315G Dec 18 14:49 disk_208.img
   0 drwxr-xr-x 1 libvirt-qemu libvirt-qemu    0 Dec 18 10:08 snapshots

root@s4 /opt/drives/ssd # du -h
0       ./snapshots
302G    .

As seen above, I have a 410GB SSD mounted at /opt/drives/ssd. On that partition, I have one single sparse file, taking 302GB of space (max 315GB). The snapshots directory is completely empty. However, for some weird reason, btrfs seems to think it takes 404GB. The big file is a disk that I use in a virtual server, and when I write stuff inside that virtual server, the disk-usage of the btrfs partition on the host keeps increasing even though the sparse file is constant at 302GB. I even have 100GB of free disk-space inside that virtual disk-file. Writing 1GB inside the virtual disk-file seems to increase the usage by about 4-5GB on the outside.

Does anyone have a clue on what is going on? How can the difference and behaviour be like this when I just have one single file? Is it also normal to have 672MB of metadata for a single file?

Regards, Daniele