Re: bedup --defrag freezing
On 2015-08-12 15:30, Chris Murphy wrote:
> On Wed, Aug 12, 2015 at 12:44 PM, Konstantin Svist fry@gmail.com wrote:
>> [...]
>> Tried duperemove -- it looks like it builds a database of its own checksums every time it runs... why won't it use BTRFS internal checksums for fast rejection? Would run a LOT faster...
>
> I think the reason is that duperemove does extent-based deduplication, whereas Btrfs checksums are 4KiB block-based, not extent-based. And so many 4KiB CRC32C checksums would need to be in memory, which could be kinda expensive.
>
> And also, I don't know if CRC32C checksums have essentially no practical chance of collision. If it's really rare, rather than so improbable as to be impossible, then you could end up with really rare corruption where incorrect deduplication happens.

Yeah, duperemove doesn't use them because of the memory limitations. Theoretically it's possible to take the CRC checksums of the individual blocks and then combine them to get a checksum of the blocks as a whole, but it really isn't worth it for that (it would take just about as long as the current hashing).

As for the collision properties of CRC32C, it's actually almost trivial to construct collisions.
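Both CRC32C properties mentioned here, that per-block checksums can in principle be combined into a checksum of a larger range and that collisions are easy to construct, follow from CRC being linear over GF(2). The Python sketch below illustrates this under some assumptions: it is a slow, bit-at-a-time implementation of standard CRC32C (Castagnoli polynomial, reflected constant 0x82F63B78), not btrfs's own code; `crc32c_combine` mirrors the idea behind zlib's `crc32_combine`; and `forge_suffix` is a hypothetical helper that picks 4 trailing bytes so an unrelated block gets whatever CRC we choose.

```python
POLY = 0x82F63B78  # CRC32C (Castagnoli) polynomial, bit-reflected form


def _advance_zeros(reg: int, nbytes: int) -> int:
    """Run the raw CRC register over nbytes zero bytes (no init/final XOR)."""
    for _ in range(8 * nbytes):
        reg = (reg >> 1) ^ (POLY if reg & 1 else 0)
    return reg


def crc32c(data: bytes) -> int:
    """Slow, bit-at-a-time CRC32C with the usual ~0 init and final inversion."""
    reg = 0xFFFFFFFF
    for byte in data:
        reg ^= byte
        for _ in range(8):
            reg = (reg >> 1) ^ (POLY if reg & 1 else 0)
    return reg ^ 0xFFFFFFFF


def crc32c_combine(crc_a: int, crc_b: int, len_b: int) -> int:
    """crc32c(a + b) computed from crc32c(a), crc32c(b) and len(b) alone.

    Naive O(len_b) version; zlib's crc32_combine does the same job in O(log n).
    """
    return _advance_zeros(crc_a, len_b) ^ crc_b


def forge_suffix(prefix: bytes, target: int) -> bytes:
    """Hypothetical helper: 4 bytes s such that crc32c(prefix + s) == target."""
    base = crc32c(prefix + bytes(4))
    # Flipping suffix bits changes the CRC linearly, so build a GF(2) basis:
    # pivots maps highest set bit -> (CRC change, suffix bits that cause it).
    pivots = {}
    for i in range(32):
        delta = crc32c(prefix + (1 << i).to_bytes(4, "little")) ^ base
        combo = 1 << i
        for bit in range(31, -1, -1):
            if not (delta >> bit) & 1:
                continue
            if bit in pivots:
                delta ^= pivots[bit][0]
                combo ^= pivots[bit][1]
            else:
                pivots[bit] = (delta, combo)
                break
    # Express the required CRC change in that basis. All 32 pivots exist,
    # because 4 trailing bytes can reach every possible CRC value.
    want, chosen = target ^ base, 0
    for bit in range(31, -1, -1):
        if (want >> bit) & 1:
            want ^= pivots[bit][0]
            chosen ^= pivots[bit][1]
    return chosen.to_bytes(4, "little")


real = b"block that is really on disk"
fake_prefix = b"completely different block "
fake = fake_prefix + forge_suffix(fake_prefix, crc32c(real))
print(fake != real, crc32c(fake) == crc32c(real))  # prints: True True
```

Because forging a match is this cheap, a CRC32C match can only ever mark two blocks as candidates; it cannot prove they are identical.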
The reason that it is used in BTRFS is that there is a functional guarantee that any single-bit error in a block _will_ result in a different CRC, and most larger errors will too. In other words, BTRFS uses CRC32C for error detection, and because it's ridiculously fast on all modern processors.

As far as the possibility of incorrect deduplication goes, the kernel does a bytewise comparison of the submitted extents before actually deduplicating them, so there's no chance of it happening (barring hardware issues and/or external influence from an ill-intentioned third party). Because of this, you could theoretically just call the ioctl on every possible combination of extents in the FS, but that would take a ridiculous amount of time (especially because calls involving the same byte ranges get internally serialized by the kernel), which is why we have programs like duperemove (while the hashing has to read all the data too, it's still a lot faster than just comparing all of it directly).

> There was a patch late last year, I think, to re-introduce sha256 hash as the checksum, but as far as I know it's not in btrfs-progs yet. I forget if that's file, extent or block based.

I'm pretty sure that that patch never made it into the kernel (the original one was for the kernel, not the userspace programs). It never got brought in because the argument for it (better protection against malicious intent) was inherently invalid for the way BTRFS uses checksums: if someone can rewrite your data arbitrarily on disk, they can do so for the checksums also. I'm also pretty sure it was block based, and as such less useful for deduplication than the CRC32C that we are currently using.
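The split described here (userspace hashing to find candidates, then a bytewise check before anything is shared) can be sketched as follows. This is a hypothetical illustration, not duperemove's actual code: files are in-memory byte strings, blocks are a fixed 4 KiB, and `zlib.crc32` is deliberately used as a cheap, collision-prone candidate hash. Splitting each bucket by its actual bytes stands in for the comparison the kernel performs when a tool submits ranges to the extent-same ioctl (BTRFS_IOC_FILE_EXTENT_SAME in kernels of this era).

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

BLOCK_SIZE = 4096  # hypothetical fixed block size for this sketch

Location = Tuple[str, int]  # (file name, byte offset)


def find_duplicate_blocks(files: Dict[str, bytes],
                          block_size: int = BLOCK_SIZE) -> List[List[Location]]:
    """Group byte-identical full blocks; trailing partial blocks are ignored."""
    # Pass 1: read every block once and bucket it by a cheap hash.
    buckets: Dict[int, List[Location]] = defaultdict(list)
    for name, data in files.items():
        for off in range(0, len(data) - block_size + 1, block_size):
            buckets[zlib.crc32(data[off:off + block_size])].append((name, off))

    groups: List[List[Location]] = []
    for locs in buckets.values():
        if len(locs) < 2:
            continue  # unique hash: rejected without any byte comparison
        # Pass 2: a hash match is only a candidate. Split the bucket by the
        # actual bytes so a CRC collision can never merge different data.
        by_content: Dict[bytes, List[Location]] = defaultdict(list)
        for name, off in locs:
            by_content[files[name][off:off + block_size]].append((name, off))
        groups.extend(g for g in by_content.values() if len(g) > 1)
    return groups


files = {"a": b"x" * 4096 + b"y" * 4096, "b": b"y" * 4096 + b"z" * 4096}
print(find_duplicate_blocks(files))  # prints [[('a', 4096), ('b', 0)]]
```

Blocks with a unique hash are rejected after a single read, and only blocks whose hashes match pay for a full comparison; that is why hashing beats comparing every pair of extents directly.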
Re: bedup --defrag freezing
On 08/06/2015 04:10 AM, Austin S Hemmelgarn wrote:
> On 2015-08-05 17:45, Konstantin Svist wrote:
>> [...]
>
> AFAIK, bedup hasn't been actively developed for quite a while (I'm actually kind of surprised it runs with the newest btrfs-progs). Personally, I'd suggest using duperemove (https://github.com/markfasheh/duperemove)

Thanks, good to know.

Tried duperemove -- it looks like it builds a database of its own checksums every time it runs... why won't it use BTRFS internal checksums for fast rejection? Would run a LOT faster...

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: bedup --defrag freezing
On Wed, Aug 12, 2015 at 12:44 PM, Konstantin Svist fry@gmail.com wrote:
> On 08/06/2015 04:10 AM, Austin S Hemmelgarn wrote:
>> [...]
>> Personally, I'd suggest using duperemove (https://github.com/markfasheh/duperemove)
>
> Thanks, good to know.
>
> Tried duperemove -- it looks like it builds a database of its own checksums every time it runs... why won't it use BTRFS internal checksums for fast rejection? Would run a LOT faster...

I think the reason is that duperemove does extent-based deduplication, whereas Btrfs checksums are 4KiB block-based, not extent-based. And so many 4KiB CRC32C checksums would need to be in memory, which could be kinda expensive.

And also, I don't know if CRC32C checksums have essentially no practical chance of collision. If it's really rare, rather than so improbable as to be impossible, then you could end up with really rare corruption where incorrect deduplication happens.

There was a patch late last year, I think, to re-introduce sha256 hash as the checksum, but as far as I know it's not in btrfs-progs yet. I forget if that's file, extent or block based.

--
Chris Murphy
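To put rough numbers on the memory concern, take the original poster's `btrfs fi df /` output (Data, used=238.24GiB). This is a back-of-the-envelope sketch, assuming one 4-byte CRC32C per 4 KiB data block and ignoring any index or hash-table overhead a real tool would add on top; `raw_csum_bytes` is a hypothetical helper.

```python
GIB = 2**30
MIB = 2**20


def raw_csum_bytes(data_bytes: int, block_size: int = 4096, csum_size: int = 4) -> int:
    """Bytes needed to hold one checksum per block, with no indexing overhead."""
    blocks = -(-data_bytes // block_size)  # ceiling: a partial block still needs a checksum
    return blocks * csum_size


used = int(238.24 * GIB)  # "Data, single: ... used=238.24GiB" from the original post
print(f"{raw_csum_bytes(used) / MIB:.1f} MiB")  # prints 238.2 MiB
```

That is 4 bytes per 4096, i.e. 1/1024 of the data, before adding whatever lookup structure is needed to match those checksums up across files.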
Re: bedup --defrag freezing
On 2015-08-05 17:45, Konstantin Svist wrote:
> Hi,
>
> I've been running btrfs on Fedora for a while now, with bedup --defrag running in a night-time cronjob. Last few runs seem to have gotten stuck, without possibility of even killing the process (kill -9 doesn't work) -- all I could do is hard power cycle.
>
> Did something change recently? Is bedup simply too out of date? What should I use to de-duplicate across snapshots instead? Etc.?

AFAIK, bedup hasn't been actively developed for quite a while (I'm actually kind of surprised it runs with the newest btrfs-progs). Personally, I'd suggest using duperemove (https://github.com/markfasheh/duperemove).
bedup --defrag freezing
Hi,

I've been running btrfs on Fedora for a while now, with bedup --defrag running in a night-time cronjob. Last few runs seem to have gotten stuck, without possibility of even killing the process (kill -9 doesn't work) -- all I could do is hard power cycle.

Did something change recently? Is bedup simply too out of date? What should I use to de-duplicate across snapshots instead? Etc.?

Thanks,
Konstantin

# uname -a
Linux mireille.svist.net 4.0.8-200.fc21.x86_64 #1 SMP Fri Jul 10 21:09:54 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
# btrfs --version
btrfs-progs v4.1
# btrfs fi show
Label: none uuid: 5ac56e7d-3d04-4ffa-8160-5a47f46c2939
    Total devices 1 FS bytes used 243.43GiB
    devid 1 size 465.76GiB used 318.05GiB path /dev/sda2

btrfs-progs v4.1
# btrfs fi df /
Data, single: total=309.01GiB, used=238.24GiB
System, single: total=32.00MiB, used=64.00KiB
Metadata, single: total=9.01GiB, used=5.19GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

dmesg attached:
[0.00] CPU0 microcode updated early to revision 0x1c, date = 2014-07-03
[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Initializing cgroup subsys cpuacct
[0.00] Linux version 4.0.8-200.fc21.x86_64 (mockbu...@bkernel02.phx2.fedoraproject.org) (gcc version 4.9.2 20150212 (Red Hat 4.9.2-6) (GCC) ) #1 SMP Fri Jul 10 21:09:54 UTC 2015
[0.00] Command line: BOOT_IMAGE=/main/boot/vmlinuz-4.0.8-200.fc21.x86_64 root=/dev/sda2 ro rootflags=subvol=main vconsole.font=latarcyrheb-sun16 quiet
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009d7ff] usable
[0.00] BIOS-e820: [mem 0x0009d800-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0xba14] usable
[0.00] BIOS-e820: [mem 0xba15-0xba156fff] ACPI NVS
[0.00] BIOS-e820: [mem 0xba157000-0xba94] usable
[0.00] BIOS-e820: [mem 0xba95-0xbabedfff] reserved
[0.00] BIOS-e820: [mem 0xbabee000-0xcac0afff] usable
[0.00] BIOS-e820: [mem 0xcac0b000-0xcb10afff] reserved
[0.00] BIOS-e820: [mem 0xcb10b000-0xcb63dfff] usable
[0.00] BIOS-e820: [mem 0xcb63e000-0xcb7aafff] ACPI NVS
[0.00] BIOS-e820: [mem 0xcb7ab000-0xcbffefff] reserved
[0.00] BIOS-e820: [mem 0xcbfff000-0xcbff] usable
[0.00] BIOS-e820: [mem 0xcd00-0xcf1f] reserved
[0.00] BIOS-e820: [mem 0xf800-0xfbff] reserved
[0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved
[0.00] BIOS-e820: [mem 0xfed0-0xfed03fff] reserved
[0.00] BIOS-e820: [mem 0xfed1c000-0xfed1] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
[0.00] BIOS-e820: [mem 0xff00-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00022fdf] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.8 present.
[0.00] DMI: Notebook P15SM-A/SM1-A/P15SM-A/SM1-A, BIOS 4.6.5 03/27/2014
[0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] e820: last_pfn = 0x22fe00 max_arch_pfn = 0x4
[0.00] MTRR default type: uncachable
[0.00] MTRR fixed ranges enabled:
[0.00] 0-9 write-back
[0.00] A-B uncachable
[0.00] C-C write-protect
[0.00] D-E7FFF uncachable
[0.00] E8000-F write-protect
[0.00] MTRR variable ranges enabled:
[0.00] 0 base 00 mask 7E write-back
[0.00] 1 base 02 mask 7FE000 write-back
[0.00] 2 base 022000 mask 7FF000 write-back
[0.00] 3 base 00E000 mask 7FE000 uncachable
[0.00] 4 base 00D000 mask 7FF000 uncachable
[0.00] 5 base 00CE00 mask 7FFE00 uncachable
[0.00] 6 base 00CD00 mask 7FFF00 uncachable
[0.00] 7 base 022FE0 mask 7FFFE0 uncachable
[0.00] 8 disabled
[0.00] 9 disabled
[0.00] PAT configuration [0-7]: WB WC UC- UC WB WC UC- UC
[0.00] e820: update [mem 0xcd00-0x] usable ==> reserved
[0.00] e820: last_pfn = 0xcc000 max_arch_pfn = 0x4
[0.00] found SMP MP-table at [mem 0x000fd830-0x000fd83f] mapped at [880fd830]
[0.00] Base memory trampoline at [88097000] 97000 size 24576
[0.00] Using