Re: btrfs is using 25% more disk than it should
Ok, so this is what I did:

1. Copied the sparse 315GB file (with 302GB inside) to another server
2. Re-formatted the btrfs partition
3. Ran chattr +C on the parent dir
4. Copied the 315GB file back to the btrfs partition (the file is no
   longer sparse due to the copying)

This is the end result:

root@s4 /opt/drives/ssd # ls -alhs
total 316G
 16K drwxr-xr-x 1 libvirt-qemu libvirt-qemu   42 Dec 20 07:00 .
4.0K drwxr-xr-x 4 libvirt-qemu libvirt-qemu 4.0K Dec 18 14:31 ..
315G -rw-r--r-- 1 libvirt-qemu libvirt-qemu 315G Dec 20 09:11 disk_208.img
   0 drwxr-xr-x 1 libvirt-qemu libvirt-qemu    0 Dec 20 06:53 snapshots

root@s4 /opt/drives/ssd # du -h
0       ./snapshots
316G    .

root@s4 /opt/drives/ssd # df -h
/dev/md3        411G  316G   94G  78% /opt/drives/ssd

root@s4 /opt/drives/ssd # btrfs filesystem df /opt/drives/ssd
Data, single: total=323.01GiB, used=315.08GiB
System, DUP: total=8.00MiB, used=64.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=1.00GiB, used=880.00KiB
Metadata, single: total=8.00MiB, used=0.00
unknown, single: total=16.00MiB, used=0.00

root@s4 /opt/drives/ssd # lsattr
---------------C ./snapshots
---------------C ./disk_208.img

As you can see, it looks much better now. The file takes as much space
as it should, and the metadata is only 880KB. I will do some writes
inside the VM and see if the file grows on the outside. If everything
is ok, it should not.

2014-12-20 5:17 GMT+08:00 Josef Bacik jba...@fb.com:
> On 12/19/2014 04:10 PM, Josef Bacik wrote:
>> On 12/18/2014 09:59 AM, Daniele Testa wrote:
>>> [original report snipped]
>>
>> Hello and welcome to the wonderful world of btrfs, where COW can
>> really suck hard without being super clear why! It's 4pm on a Friday
>> right before I'm gone for 2 weeks, so I'm a bit happy and drunk, so
>> I'm going to use pretty pictures.
>> You have this case to start with
>>
>> file offset 0                                          offset 302g
>> [----------------------prealloced 302g extent---------------------]
>>
>> (man it's impressive I got all that lined up right)
>>
>> On disk you have 2 things. First your file, which has file extents
>> that say
>>
>> inode 256, file offset 0, size 302g, offset 0, disk bytenr 123, disklen 302g
>>
>> and then the extent tree, which keeps track of actual allocated
>> space, has this
>>
>> extent bytenr 123, len 302g, refs 1
>>
>> Now say you boot up your virt image and it writes 1 4k block to
>> offset 0. Now you have this
>>
>> [4k][-----------------------302g-4k-----------------------]
>>
>> And for your inode you now have this
>>
>> inode 256, file offset 0, size 4k, offset 0, diskbytenr (123+302g), disklen 4k
>> inode 256, file offset 4k, size 302g-4k, offset 4k, diskbytenr 123, disklen 302g
>>
>> and in your extent tree you have
>>
>> extent bytenr 123, len 302g, refs 1
>> extent bytenr whatever, len 4k, refs 1
>>
>> See that? Your file is still the same size, it is still 302g. If you
>> cp'ed it right now, it would copy 302g of information. But what have
>> you actually allocated on disk? Well, that's now 302g + 4k. Now let's
>> say your virt thing decides to write to the middle, let's say at
>> offset 12k. Now you have this
>>
>> inode 256, file offset 0, size 4k, offset 0, diskbytenr (123+302g), disklen 4k
>> inode 256, file offset 4k, size 8k, offset 4k, diskbytenr 123, disklen 302g
>> inode 256, file offset 12k, size 4k, offset 0, diskbytenr whatever, disklen 4k
>> inode 256, file offset 16k, size 302g - 16k, offset 16k, diskbytenr 123, disklen 302g
>>
>> and in the extent tree you have this
>>
>> extent bytenr 123, len 302g, refs 2
>> extent bytenr whatever, len 4k, refs 1
>> extent bytenr notimportant, len 4k, refs 1
>>
>> See that refs 2 change? We split the original extent, so we have 2
>> file extents pointing to the same physical extent, so we bumped the
>> ref count. This will happen over and over again until we have
>> completely overwritten the original extent, at which point your space
>> usage will go back down to ~302g.
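For anyone who wants to watch this extent splitting happen, here is a
minimal sketch on a throwaway btrfs mount. The /mnt/scratch path and
the 1G size are just examples, not from this thread:

  cd /mnt/scratch
  fallocate -l 1G big.img              # one big preallocated extent,
  sync                                 # like the 302g example above
  filefrag -v big.img                  # shows a single large extent

  # CoW a single 4k block in the middle of the file (512M in)
  dd if=/dev/urandom of=big.img bs=4k count=1 seek=131072 conv=notrunc
  sync
  filefrag -v big.img                  # now three logical extents: two
                                       # pieces still point into the old
                                       # physical extent, plus one new
                                       # 4k extent elsewhere

  btrfs filesystem df /mnt/scratch     # Data "used" grew by ~4k even
                                       # though the file size did not

The old 1G extent is not freed until every block of it has been
overwritten, which is exactly the 302g + 4k accounting described above.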
Re: btrfs is using 25% more disk than it should
No, I don't have any snapshots or subvolumes. Only that single file.
The file has both checksums and datacow on it. I will do chattr +C on
the parent dir and re-create the file to make sure all files are marked
as nodatacow.

Should I also turn off checksums with the mount flags if this
filesystem only contains big VM files? Or is that not needed if I put
+C on the parent dir?

2014-12-20 2:53 GMT+08:00 Phillip Susi ps...@ubuntu.com:
> On 12/18/2014 9:59 AM, Daniele Testa wrote:
>> As seen above, I have a 410GB SSD mounted at /opt/drives/ssd. On
>> that partition, I have one single sparse file, taking 302GB of space
>> (max 315GB). The snapshots directory is completely empty.
>
> So you don't have any snapshots or other subvolumes?
>
>> However, for some weird reason, btrfs seems to think it takes 404GB.
>> The big file is a disk that I use in a virtual server, and when I
>> write stuff inside that virtual server, the disk usage of the btrfs
>> partition on the host keeps increasing even if the sparse file is
>> constant at 302GB. I even have 100GB of free disk space inside that
>> virtual disk file. Writing 1GB inside the virtual disk file seems to
>> increase the usage by about 4-5GB on the outside.
>
> Did you flag the file as nodatacow?
>
>> Does anyone have a clue what is going on? How can the difference and
>> behaviour be like this when I just have one single file? Is it also
>> normal to have 672MB of metadata for a single file?
>
> You probably have data checksums enabled, and that isn't unreasonable
> for checksums on 302g of data.
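A minimal sketch of the nodatacow setup being discussed (the images
subdirectory and the /backup source path are just examples). The +C
flag has to be set on the directory before the image files are created,
because it only takes effect for new or empty files:

  mkdir -p /opt/drives/ssd/images
  chattr +C /opt/drives/ssd/images    # new files in here inherit NOCOW
  cp --sparse=never /backup/disk_208.img /opt/drives/ssd/images/
  lsattr /opt/drives/ssd/images/disk_208.img   # the 'C' flag should show

Files created with NOCOW also get no data checksums (and no
compression), so no extra mount flags should be needed for them.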
Re: btrfs is using 25% more disk than it should
But I read somewhere that compression should be turned off on mounts
that just store large VM images. Is that wrong?

Btw, I am not pre-allocating space for the images. I use sparse files
with:

dd if=/dev/zero of=drive.img bs=1 count=1 seek=300G

It creates the file in a few ms. Is it better to use fallocate with
btrfs? If I use sparse files, it adds a benefit when I want to
copy/move the image file to another server. Like if the 300GB sparse
file just has 10GB of data in it, I only need to copy 10GB when moving
it to another server. Would the same be true with fallocate?

Anyways, would disabling CoW (by putting +C on the parent dir) prevent
the performance issues and the 2*filesize issue?

2014-12-20 13:52 GMT+08:00 Zygo Blaxell ce3g8...@umail.furryterror.org:
> On Fri, Dec 19, 2014 at 04:17:08PM -0500, Josef Bacik wrote:
>> And for your inode you now have this
>>
>> inode 256, file offset 0, size 4k, offset 0, diskbytenr (123+302g), disklen 4k
>> inode 256, file offset 4k, size 302g-4k, offset 4k, diskbytenr 123, disklen 302g
>>
>> and in your extent tree you have
>>
>> extent bytenr 123, len 302g, refs 1
>> extent bytenr whatever, len 4k, refs 1
>>
>> See that? Your file is still the same size, it is still 302g. If you
>> cp'ed it right now, it would copy 302g of information. But what have
>> you actually allocated on disk? Well, that's now 302g + 4k. Now let's
>> say your virt thing decides to write to the middle, let's say at
>> offset 12k. Now you have this
>>
>> inode 256, file offset 0, size 4k, offset 0, diskbytenr (123+302g), disklen 4k
>> inode 256, file offset 4k, size 8k, offset 4k, diskbytenr 123, disklen 302g
>> inode 256, file offset 12k, size 4k, offset 0, diskbytenr whatever, disklen 4k
>> inode 256, file offset 16k, size 302g - 16k, offset 16k, diskbytenr 123, disklen 302g
>>
>> and in the extent tree you have this
>>
>> extent bytenr 123, len 302g, refs 2
>> extent bytenr whatever, len 4k, refs 1
>> extent bytenr notimportant, len 4k, refs 1
>>
>> See that refs 2 change? We split the original extent, so we have 2
>> file extents pointing to the same physical extent, so we bumped the
>> ref count. This will happen over and over again until we have
>> completely overwritten the original extent, at which point your space
>> usage will go back down to ~302g.
>
> Wait, *what*?
>
> OK, I did a small experiment, and found that btrfs actually does do
> something like this. Can't argue with fact, though it would be nice
> if btrfs could be smarter and drop unused portions of the original
> extent sooner. :-P
>
> The above quoted scenario is a little oversimplified. Chances are
> that 302G file is made of much smaller extents (128M..256M). If the
> VM is writing 4K randomly everywhere, then those 128M+ extents are
> not going away any time soon. Even the extents that are dropped stick
> around for a few btrfs transaction commits before they go away.
>
> I couldn't reproduce this behavior until I realized the extents I was
> overwriting in my tests were exactly the same size and position as
> the extents on disk. I changed the offset slightly and found that
> partially-overwritten extents do in fact stick around in their
> entirety.
>
> There seems to be an unexpected benefit for compression here:
> compression keeps the extents small, so many small updates will be
> less likely to leave big, mostly-unused extents lying around the
> filesystem.
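A sketch of the difference (file names illustrative). Both commands
create a 300GB file instantly, but they allocate very differently:

  # sparse: a hole, no blocks allocated until written
  dd if=/dev/zero of=sparse.img bs=1 count=1 seek=300G
  du -h --apparent-size sparse.img     # ~300G logical size
  du -h sparse.img                     # ~4.0K actually allocated

  # fallocate: extents reserved up front, still no data written
  fallocate -l 300G prealloc.img
  du -h prealloc.img                   # 300G allocated immediately

In both cases the unwritten ranges read back as zeros, so a copy made
with cp --sparse=always (or rsync --sparse) can recreate the holes at
the destination even from an fallocate'd file.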
btrfs is using 25% more disk than it should
Hey,

I am hoping you guys can shed some light on my issue. I know that it's
a common question that people see differences in the disk usage when
running different calculations, but I still think that my issue is
weird.

root@s4 / # mount
/dev/md3 on /opt/drives/ssd type btrfs (rw,noatime,compress=zlib,discard,nospace_cache)

root@s4 / # btrfs filesystem df /opt/drives/ssd
Data: total=407.97GB, used=404.08GB
System, DUP: total=8.00MB, used=52.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.25GB, used=672.21MB
Metadata: total=8.00MB, used=0.00

root@s4 /opt/drives/ssd # ls -alhs
total 302G
4.0K drwxr-xr-x 1 root         root           42 Dec 18 14:34 .
4.0K drwxr-xr-x 4 libvirt-qemu libvirt-qemu 4.0K Dec 18 14:31 ..
302G -rw-r--r-- 1 libvirt-qemu libvirt-qemu 315G Dec 18 14:49 disk_208.img
   0 drwxr-xr-x 1 libvirt-qemu libvirt-qemu    0 Dec 18 10:08 snapshots

root@s4 /opt/drives/ssd # du -h
0       ./snapshots
302G    .

As seen above, I have a 410GB SSD mounted at /opt/drives/ssd. On that
partition, I have one single sparse file, taking 302GB of space (max
315GB). The snapshots directory is completely empty. However, for some
weird reason, btrfs seems to think it takes 404GB.

The big file is a disk that I use in a virtual server, and when I write
stuff inside that virtual server, the disk usage of the btrfs partition
on the host keeps increasing even if the sparse file is constant at
302GB. I even have 100GB of free disk space inside that virtual disk
file. Writing 1GB inside the virtual disk file seems to increase the
usage by about 4-5GB on the outside.

Does anyone have a clue what is going on? How can the difference and
behaviour be like this when I just have one single file? Is it also
normal to have 672MB of metadata for a single file?

Regards,
Daniele
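For reference, a sketch of the different measurements being compared in
this report, and what each one counts:

  ls -ls disk_208.img                  # first column: blocks currently
                                       # referenced by this file
  du -h --apparent-size .              # logical sizes (the 315G max)
  du -h .                              # referenced blocks, like ls -s
  btrfs filesystem df /opt/drives/ssd  # extents allocated on disk,
                                       # including partially-overwritten
                                       # extents no longer fully
                                       # referenced by any file

The gap between the last two numbers is the 302GB-vs-404GB discrepancy
described above.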
Extra info
Sorry, did not read the guidelines correctly. Here comes more info:

root@s4 /opt/drives/ssd # uname -a
Linux s4.podnix.com 3.2.0-4-amd64 #1 SMP Debian 3.2.46-1+deb7u1 x86_64 GNU/Linux

root@s4 /opt/drives/ssd # btrfs --version
Btrfs Btrfs v0.19

root@s4 /opt/drives/ssd # btrfs fi show
Label: none  uuid: 752ed11b-defc-4717-b4c9-a9e08ad64ba6
        Total devices 1 FS bytes used 404.74GB
        devid    1 size 410.50GB used 410.50GB path /dev/md3

Regards,
Daniele
Re: Extra info
I am running latest Debian stable. However, I used backports to update
the kernel to 3.16.

root@s4 /opt/drives/ssd # uname -a
Linux s4.podnix.com 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt2-1~bpo70+1 (2014-12-08) x86_64 GNU/Linux

root@s4 /opt/drives/ssd # btrfs --version
Btrfs v3.14.1

It still reports over-use, so I am running a defrag on the file:

root@s4 /opt/drives/ssd # btrfs filesystem defragment /opt/drives/ssd/disk_208.img

But I see it slowly eats even more disk space during the defrag. I had
about 7GB free before; when it went down close to 1GB, I cancelled it,
as I'm afraid it will corrupt the file if it runs out of space. Do you
know how btrfs behaves if it runs out of space during a defrag? Any
other ideas how I can solve this?

Regards,
Daniele

2014-12-18 23:35 GMT+08:00 Hugo Mills h...@carfax.org.uk:
> On Thu, Dec 18, 2014 at 11:02:34PM +0800, Daniele Testa wrote:
>> Sorry, did not read the guidelines correctly. Here comes more info:
>>
>> root@s4 /opt/drives/ssd # uname -a
>> Linux s4.podnix.com 3.2.0-4-amd64 #1 SMP Debian 3.2.46-1+deb7u1 x86_64 GNU/Linux
>
> This is your problem. I think the difficulty is that writes into the
> middle of an extent didn't split the extent and allow the overwritten
> area to be reclaimed, so the whole extent still takes up space. IIRC,
> Josef fixed this about 18 months ago. You should upgrade your kernel
> to something that isn't written in cuneiform (like 3.18, say), and
> defrag the file in question. I think that should fix the problem.
>
>> root@s4 /opt/drives/ssd # btrfs --version
>> Btrfs Btrfs v0.19
>
> This is also an antique and probably needs an upgrade too (although
> it's less critical than the kernel).
>
> Hugo.
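A sketch of how such a defrag run can be monitored (paths from the
thread; the watch interval is arbitrary). Defrag rewrites live extents,
so allocation grows first, and the old extents are only freed a few
transaction commits after their last reference is dropped:

  # check the device headroom before starting
  btrfs filesystem show /opt/drives/ssd

  btrfs filesystem defragment -v /opt/drives/ssd/disk_208.img
  sync
  watch -n 10 btrfs filesystem df /opt/drives/ssd   # usage should fall
                                                    # back once the old
                                                    # extents are freed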