Re: bedup --defrag freezing

2015-08-13 Thread Austin S Hemmelgarn

On 2015-08-12 15:30, Chris Murphy wrote:

On Wed, Aug 12, 2015 at 12:44 PM, Konstantin Svist fry@gmail.com wrote:

On 08/06/2015 04:10 AM, Austin S Hemmelgarn wrote:

On 2015-08-05 17:45, Konstantin Svist wrote:

Hi,

I've been running btrfs on Fedora for a while now, with bedup --defrag
running in a night-time cronjob.
Last few runs seem to have gotten stuck, without possibility of even
killing the process (kill -9 doesn't work) -- all I could do is hard
power cycle.

Did something change recently? Is bedup simply too out of date? What
should I use to de-duplicate across snapshots instead? Etc.?


AFAIK, bedup hasn't been actively developed for quite a while (I'm
actually kind of surprised it runs with the newest btrfs-progs).
Personally, I'd suggest using duperemove
(https://github.com/markfasheh/duperemove)


Thanks, good to know.
Tried duperemove -- it looks like it builds a database of its own
checksums every time it runs... why won't it use BTRFS internal
checksums for fast rejection? Would run a LOT faster...


I think the reason is duperemove does extent based deduplication.
Where Btrfs checksums are 4KiB block based, not extent based. And so
many 4KiB CRC32C checksums would need to be in memory, that could be
kinda expensive. And also, I don't know if CRC32C checksums have
essentially no practical chance of collision. If it's really rare,
rather than so improbable as to be impossible then you could end up
with really rare corruption where incorrect deduplication happens.
Yeah, duperemove doesn't use them because of the memory limitations.
Theoretically it's possible to take the CRC checksums of the
individual blocks and then combine them to get a checksum of the blocks
as a whole, but it really isn't worth it for that (it would take just
about as long as the current hashing).
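
To make the "combine the per-block checksums" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the names crc32c, block_checksums and extent_key are hypothetical, the CRC is a plain table-driven CRC32C and not necessarily how BTRFS seeds or stores its checksums on disk, and folding the block checksums together with SHA-256 is just one possible way to combine them. It is not what duperemove does.

```python
import hashlib
import struct

BLOCK_SIZE = 4096          # assumed checksum granularity (4KiB blocks)
_POLY = 0x82F63B78         # reflected Castagnoli (CRC32C) polynomial


def _make_table():
    # Precompute the 256-entry lookup table for byte-at-a-time CRC32C.
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ _POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes) -> int:
    # Plain CRC32C: initial value and final XOR are both 0xFFFFFFFF.
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE):
    # One CRC32C per block, in order.
    return [crc32c(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def extent_key(data: bytes) -> str:
    # Hypothetical "checksum of the blocks as a whole": hash the
    # concatenated per-block CRCs instead of the data itself.
    sums = block_checksums(data)
    packed = struct.pack("<%dI" % len(sums), *sums)
    return hashlib.sha256(packed).hexdigest()
```

Two extents with the same extent_key are still only candidates: since CRC32C collisions are easy to construct, a matching key proves nothing without a bytewise comparison.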


As for the collision properties of CRC32C, it's actually almost trivial 
to construct collisions.  The reason that it is used in BTRFS is because 
there is a functional guarantee that any single bit error in a block 
_will_ result in a different CRC, and most larger errors will also.  In 
other words, the usage of CRC32C in BTRFS is for error detection and 
because it's ridiculously fast on all modern processors.  As far as the 
possibility of incorrect deduplication, the kernel does a bytewise 
comparison of the extents submitted before actually deduplicating them, 
so there's no chance (barring hardware issues and/or external influence 
from an ill-intentioned third party) of it happening.  Because of this, 
you could theoretically just call the ioctl on every possible 
combination of extents in the FS, but that would take a ridiculous 
amount of time (especially because calls involving the same byte ranges 
get internally serialized by the kernel), which is why we have programs 
like duperemove (while the hashing has to read all the data too, it's 
still a lot faster than just comparing all of it directly).
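
The difference between hashing first and comparing everything pairwise can be sketched roughly as follows. This is an illustration under stated assumptions, not duperemove's actual implementation: the function name find_duplicates is hypothetical, in-memory byte strings stand in for extents, SHA-256 stands in for whatever hash a real tool uses, and the final equality check only mimics the bytewise comparison the kernel does before deduplicating.

```python
import hashlib
from collections import defaultdict


def find_duplicates(chunks):
    """chunks maps a name to its bytes; returns groups of names
    whose contents are byte-for-byte identical."""
    # Pass 1: read every chunk once and bucket it by digest.
    by_digest = defaultdict(list)
    for name, data in chunks.items():
        by_digest[hashlib.sha256(data).digest()].append(name)

    # Pass 2: only chunks that share a digest are compared directly.
    groups = []
    for names in by_digest.values():
        remaining = names
        while len(remaining) > 1:
            first = remaining[0]
            same = [n for n in remaining if chunks[n] == chunks[first]]
            if len(same) > 1:
                groups.append(same)
            remaining = [n for n in remaining if chunks[n] != chunks[first]]
    return groups


if __name__ == "__main__":
    sample = {"a": b"x" * 4096, "b": b"x" * 4096, "c": b"y" * 4096}
    print(find_duplicates(sample))
```

With N chunks, the hashing pass touches each chunk once, and the expensive direct comparisons are limited to chunks that already share a digest, instead of all N*(N-1)/2 pairs.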


There was a patch late last year I think to re-introduce sha256 hash
as the checksum, but as far as I know it's not in btrfs-progs yet. I
forget if that's file, extent or block based.
I'm pretty sure that patch never made it into the kernel. The
original one was for the kernel, not the userspace programs, and it
never got brought in because the argument for it (better protection
against malicious intent) was inherently invalid for the usage of
checksums in BTRFS: if someone can rewrite your data arbitrarily on
disk, they can do so for the checksums also. I'm also pretty sure that
it was block based (and as such less useful for deduplication than the
CRC32C that we are currently using).







Re: bedup --defrag freezing

2015-08-12 Thread Konstantin Svist
On 08/06/2015 04:10 AM, Austin S Hemmelgarn wrote:
 On 2015-08-05 17:45, Konstantin Svist wrote:
 Hi,

 I've been running btrfs on Fedora for a while now, with bedup --defrag
 running in a night-time cronjob.
 Last few runs seem to have gotten stuck, without possibility of even
 killing the process (kill -9 doesn't work) -- all I could do is hard
 power cycle.

 Did something change recently? Is bedup simply too out of date? What
 should I use to de-duplicate across snapshots instead? Etc.?

 AFAIK, bedup hasn't been actively developed for quite a while (I'm
 actually kind of surprised it runs with the newest btrfs-progs).
 Personally, I'd suggest using duperemove
 (https://github.com/markfasheh/duperemove)

Thanks, good to know.
Tried duperemove -- it looks like it builds a database of its own
checksums every time it runs... why won't it use BTRFS internal
checksums for fast rejection? Would run a LOT faster...


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bedup --defrag freezing

2015-08-12 Thread Chris Murphy
On Wed, Aug 12, 2015 at 12:44 PM, Konstantin Svist fry@gmail.com wrote:
 On 08/06/2015 04:10 AM, Austin S Hemmelgarn wrote:
 On 2015-08-05 17:45, Konstantin Svist wrote:
 Hi,

 I've been running btrfs on Fedora for a while now, with bedup --defrag
 running in a night-time cronjob.
 Last few runs seem to have gotten stuck, without possibility of even
 killing the process (kill -9 doesn't work) -- all I could do is hard
 power cycle.

 Did something change recently? Is bedup simply too out of date? What
 should I use to de-duplicate across snapshots instead? Etc.?

 AFAIK, bedup hasn't been actively developed for quite a while (I'm
 actually kind of surprised it runs with the newest btrfs-progs).
 Personally, I'd suggest using duperemove
 (https://github.com/markfasheh/duperemove)

 Thanks, good to know.
 Tried duperemove -- it looks like it builds a database of its own
 checksums every time it runs... why won't it use BTRFS internal
 checksums for fast rejection? Would run a LOT faster...

I think the reason is duperemove does extent based deduplication.
Where Btrfs checksums are 4KiB block based, not extent based. And so
many 4KiB CRC32C checksums would need to be in memory, that could be
kinda expensive. And also, I don't know if CRC32C checksums have
essentially no practical chance of collision. If it's really rare,
rather than so improbable as to be impossible then you could end up
with really rare corruption where incorrect deduplication happens.

There was a patch late last year I think to re-introduce sha256 hash
as the checksum, but as far as I know it's not in btrfs-progs yet. I
forget if that's file, extent or block based.

-- 
Chris Murphy


Re: bedup --defrag freezing

2015-08-06 Thread Austin S Hemmelgarn

On 2015-08-05 17:45, Konstantin Svist wrote:

Hi,

I've been running btrfs on Fedora for a while now, with bedup --defrag
running in a night-time cronjob.
Last few runs seem to have gotten stuck, without possibility of even
killing the process (kill -9 doesn't work) -- all I could do is hard
power cycle.

Did something change recently? Is bedup simply too out of date? What
should I use to de-duplicate across snapshots instead? Etc.?

AFAIK, bedup hasn't been actively developed for quite a while (I'm 
actually kind of surprised it runs with the newest btrfs-progs). 
Personally, I'd suggest using duperemove 
(https://github.com/markfasheh/duperemove).







bedup --defrag freezing

2015-08-05 Thread Konstantin Svist
Hi,

I've been running btrfs on Fedora for a while now, with bedup --defrag
running in a night-time cronjob.
Last few runs seem to have gotten stuck, without possibility of even
killing the process (kill -9 doesn't work) -- all I could do is hard
power cycle.

Did something change recently? Is bedup simply too out of date? What
should I use to de-duplicate across snapshots instead? Etc.?


Thanks,
Konstantin



# uname -a
Linux mireille.svist.net 4.0.8-200.fc21.x86_64 #1 SMP Fri Jul 10
21:09:54 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

# btrfs --version
btrfs-progs v4.1

# btrfs fi show
Label: none  uuid: 5ac56e7d-3d04-4ffa-8160-5a47f46c2939
Total devices 1 FS bytes used 243.43GiB
devid 1 size 465.76GiB used 318.05GiB path /dev/sda2

btrfs-progs v4.1

# btrfs fi df /
Data, single: total=309.01GiB, used=238.24GiB
System, single: total=32.00MiB, used=64.00KiB
Metadata, single: total=9.01GiB, used=5.19GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

dmesg attached

[0.00] CPU0 microcode updated early to revision 0x1c, date = 2014-07-03
[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Initializing cgroup subsys cpuacct
[0.00] Linux version 4.0.8-200.fc21.x86_64 (mockbu...@bkernel02.phx2.fedoraproject.org) (gcc version 4.9.2 20150212 (Red Hat 4.9.2-6) (GCC) ) #1 SMP Fri Jul 10 21:09:54 UTC 2015
[0.00] Command line: BOOT_IMAGE=/main/boot/vmlinuz-4.0.8-200.fc21.x86_64 root=/dev/sda2 ro rootflags=subvol=main vconsole.font=latarcyrheb-sun16 quiet
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009d7ff] usable
[0.00] BIOS-e820: [mem 0x0009d800-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0xba14] usable
[0.00] BIOS-e820: [mem 0xba15-0xba156fff] ACPI NVS
[0.00] BIOS-e820: [mem 0xba157000-0xba94] usable
[0.00] BIOS-e820: [mem 0xba95-0xbabedfff] reserved
[0.00] BIOS-e820: [mem 0xbabee000-0xcac0afff] usable
[0.00] BIOS-e820: [mem 0xcac0b000-0xcb10afff] reserved
[0.00] BIOS-e820: [mem 0xcb10b000-0xcb63dfff] usable
[0.00] BIOS-e820: [mem 0xcb63e000-0xcb7aafff] ACPI NVS
[0.00] BIOS-e820: [mem 0xcb7ab000-0xcbffefff] reserved
[0.00] BIOS-e820: [mem 0xcbfff000-0xcbff] usable
[0.00] BIOS-e820: [mem 0xcd00-0xcf1f] reserved
[0.00] BIOS-e820: [mem 0xf800-0xfbff] reserved
[0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved
[0.00] BIOS-e820: [mem 0xfed0-0xfed03fff] reserved
[0.00] BIOS-e820: [mem 0xfed1c000-0xfed1] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
[0.00] BIOS-e820: [mem 0xff00-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00022fdf] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.8 present.
[0.00] DMI: Notebook P15SM-A/SM1-A/P15SM-A/SM1-A, BIOS 4.6.5 03/27/2014
[0.00] e820: update [mem 0x-0x0fff] usable == reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] e820: last_pfn = 0x22fe00 max_arch_pfn = 0x4
[0.00] MTRR default type: uncachable
[0.00] MTRR fixed ranges enabled:
[0.00]   0-9 write-back
[0.00]   A-B uncachable
[0.00]   C-C write-protect
[0.00]   D-E7FFF uncachable
[0.00]   E8000-F write-protect
[0.00] MTRR variable ranges enabled:
[0.00]   0 base 00 mask 7E write-back
[0.00]   1 base 02 mask 7FE000 write-back
[0.00]   2 base 022000 mask 7FF000 write-back
[0.00]   3 base 00E000 mask 7FE000 uncachable
[0.00]   4 base 00D000 mask 7FF000 uncachable
[0.00]   5 base 00CE00 mask 7FFE00 uncachable
[0.00]   6 base 00CD00 mask 7FFF00 uncachable
[0.00]   7 base 022FE0 mask 7FFFE0 uncachable
[0.00]   8 disabled
[0.00]   9 disabled
[0.00] PAT configuration [0-7]: WB  WC  UC- UC  WB  WC  UC- UC  
[0.00] e820: update [mem 0xcd00-0x] usable == reserved
[0.00] e820: last_pfn = 0xcc000 max_arch_pfn = 0x4
[0.00] found SMP MP-table at [mem 0x000fd830-0x000fd83f] mapped at [880fd830]
[0.00] Base memory trampoline at [88097000] 97000 size 24576
[0.00] Using