Hi Mark,

could you compare your implementation (apart from online/offline) to LiuBo's work, which appeared on the ML a while ago?
http://email@example.com/msg23656.html

It would be interesting if the two approaches could share some code, and it would be good to have confirmation that using one technique does not rule out using the other in the future.

Best wishes,
Mark

On Tuesday 16 April 2013 15:15:31 Mark Fasheh wrote:

Hi,

The following series of patches implements in btrfs an ioctl to do offline deduplication of file extents. To be clear, offline in this sense means that the file system is mounted and running, but the dedupe is not done during file writes, but after the fact when some userspace software initiates a dedupe.

The primary patch is loosely based off of one sent by Josef Bacik back in January, 2011.
http://permalink.gmane.org/gmane.comp.file-systems.btrfs/8508

I've made significant updates and changes from the original. In particular the structure passed is more fleshed out, this series has a high degree of code sharing between itself and the clone code, and the locking has been updated.

...

Code review is very much appreciated.

Thanks,
--Mark

--
Marek Otahal :o)
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello,

this is awesome news! Thank you for working on dedup. I have some questions about how the dedup approach interacts with other layers/features.

1/ How will snapshots be handled? Would data be deduplicated between snapshots (potentially a big space saving), or would snapshots be considered isolated? Ideally this could be set by the user. My concern is robustness: with snapshots deduplicated, only one copy of the data would actually exist, and a corruption would damage it in all snapshots at once. Or is this not a problem, and we say safety is handled by RAID?

2/ Order of dedup/compression? What would be done first: compress a file and then compare blocks for duplicates, or the other way around?

Dedup first would save some compression work:
file's block 00 - hash - isDup? (if no) - compress (10x0) - write
but the problem is that the written data size is unknown (it's not the 1 block at start).

The other way, compress first, would waste compression CPU operations on duplicate blocks, but would reduce dedup-related metadata usage, as 1 million zeros would be compressed to a single block and only that one is compared/written. Usefulness here depends on the compression ratio of the file.

I'm not sure which approach would be better.

Thank you for your time and explanation.

Best wishes,
Mark

On Sunday 07 April 2013 21:12:48 Liu Bo wrote:

(NOTE: This leads to a FORMAT CHANGE, DO NOT use it on real data.)

This introduces the online data deduplication feature for btrfs.

(1) WHY do we need deduplication?
To improve our storage efficiency.

(2) WHAT is deduplication?
Two key ways for practical deduplication implementations,
* When the data is deduplicated (inband vs background)
* The granularity of the deduplication. (block level vs file level)

For btrfs, we choose
* inband(synchronous)
* block level

We choose them because of the same reason as how zfs does.
a) To get an immediate benefit.
b) To remove redundant parts within a file.
So we have an inband, block level data deduplication here.

(3) HOW does deduplication work?
This makes full use of file extent back reference, the same way as IOCTL_CLONE, which lets us easily store multiple copies of a set of data as a single copy along with an index of references to the copy.

Here we have
a) a new dedicated tree(DEDUP tree) and
b) a new key(BTRFS_DEDUP_ITEM_KEY), which consists of (stop 64bits of hash, type, disk offset),
   * stop 64bits of hash
     It comes from sha256, which is very helpful on avoiding collision. And we take the stop 64bits as the index.
   * disk offset
     It helps to find where the data is stored.

So the whole deduplication process works as,
1) write something,
2) calculate the hash of this something,
3) try to find the match of hash value by searching DEDUP keys in a dedicated tree, DEDUP tree.
4) if found, skip real IO and link to the existing copy
   if not, do real IO and insert a DEDUP key to the DEDUP tree.

For now, we limit the deduplication unit to PAGESIZE, 4096, and we're going to increase this unit dynamically in the future.

Signed-off-by: Liu Bo firstname.lastname@example.org

--
Marek Otahal :o)
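The four-step write path described in Liu Bo's message (hash the block, look the hash up in the DEDUP tree, link on a hit, write and index on a miss) can be illustrated with a small in-memory toy model. This is only a sketch under stated assumptions, not btrfs code: `DedupStore`, `hash_key`, `write_block` and `write_file` are hypothetical names, a Python dict stands in for the DEDUP tree, the 64-bit index is assumed to be the last 8 bytes of the sha256 digest, and the byte-for-byte comparison on a hit is an extra safety check that the message does not mention.

```python
import hashlib

BLOCK_SIZE = 4096  # dedup unit named in the message (PAGESIZE)


class DedupStore:
    """Toy in-memory model of the described write path (hypothetical, not btrfs code)."""

    def __init__(self):
        self.disk = bytearray()  # stands in for the block device
        self.dedup_index = {}    # 64-bit hash key -> disk offset (the "DEDUP tree")
        self.io_count = 0        # number of real writes performed

    @staticmethod
    def hash_key(block: bytes) -> int:
        # Assumption: the 64-bit index is the last 8 bytes of the sha256 digest.
        return int.from_bytes(hashlib.sha256(block).digest()[-8:], "big")

    def write_block(self, block: bytes) -> int:
        """Return the disk offset that holds this block's data."""
        key = self.hash_key(block)
        offset = self.dedup_index.get(key)
        # Extra check (not in the message): confirm the bytes really match.
        if offset is not None and self.disk[offset:offset + len(block)] == block:
            return offset  # duplicate: skip real IO, link to the existing copy
        offset = len(self.disk)
        self.disk += block  # "real IO"
        self.io_count += 1
        self.dedup_index[key] = offset  # insert a DEDUP key
        return offset

    def write_file(self, data: bytes) -> list:
        """Split data into BLOCK_SIZE units and write each one."""
        return [
            self.write_block(data[i:i + BLOCK_SIZE])
            for i in range(0, len(data), BLOCK_SIZE)
        ]
```

Writing three identical zero-filled blocks followed by one different block through `write_file` performs only two real writes in this model; the three zero blocks all resolve to the same offset.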
Hi,

just reading the chattr manpage..

On Monday 18 March 2013 14:15:17 you wrote:

Hi,

After reading through the btrfs documentation I'm curious to know if it's possible to ever securely erase a file from a btrfs filesystem (or ZFS for that matter). On non-COW filesystems atop regular HDDs one can simply overwrite the file with zeros or random data using dd or some other tool and rest assured that the blocks which contained the sensitive information have been wiped. However on btrfs it would seem any such attempt would write the zeros/random data to a new location, leaving the old blocks with the sensitive data intact. Further, since specifying NOCOW is only possible for newly created files, there seems to be no way to overwrite the appropriate blocks short of deleting the associated file and then filling the entire free filesystem space with zeros/random data such that the old blocks are eventually overwritten.

What's the verdict on this?

What would chattr +s do? From the manpage:

When a file with the `s' attribute set is deleted, its blocks are zeroed and written back to the disk. Note: please make sure to read the bugs and limitations section at the end of this document.

Nice spring to all of you! :)
Mark

Regards,
Kyle

--
Marek Otahal :o)
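The in-place overwrite that Kyle describes for non-COW filesystems can be sketched as below. This is a minimal illustration only: `overwrite_with_zeros` is a hypothetical helper name, and, as the message itself points out, on a copy-on-write filesystem such as btrfs the zeros may be written to new blocks, so this gives no guarantee that the old data is gone.

```python
import os


def overwrite_with_zeros(path: str, chunk_size: int = 1 << 20) -> int:
    """Overwrite a file's contents with zeros without truncating it.

    Returns the number of bytes written. Hypothetical helper for illustration.
    """
    size = os.path.getsize(path)
    written = 0
    # "r+b" opens the file for update without truncating it, so a filesystem
    # that overwrites in place can reuse the existing blocks.
    with open(path, "r+b") as f:
        while written < size:
            n = min(chunk_size, size - written)
            f.write(b"\0" * n)
            written += n
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the zeros to the device
    return written
```

Opening with "r+b" rather than "wb" is the important design choice here: truncating the file first would let even a non-COW filesystem free the old blocks and allocate different ones.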
Hi,

I was running btrfs fi defragment on my partitions. One went fine and saved me some space, the other crashed on some files.

[marek@beruska ~]$ uname -a
Linux beruska 3.2.1-1-ARCH #1 SMP PREEMPT Fri Jan 13 08:19:09 UTC 2012 i686 Intel(R) Atom(TM) CPU N270 @ 1.60GHz GenuineIntel GNU/Linux

btrfs-progs 0.19.20120110-2

btrfs is mounted on dm-crypt/luks with these options:
mount
/dev/mapper/storeDevice on /mnt/store type btrfs (rw,nosuid,nodev,noexec,relatime,thread_pool=4,nospace_cache)

After this, I think, these processes keep running:

  PID USER PR NI VIRT RES SHR S %CPU %MEM    TIME+ COMMAND
16669 root 20  0    0   0   0 D 17.8  0.0 75:27.36 btrfs-delayed-m
16668 root 20  0    0   0   0 R 17.5  0.0 75:19.55 btrfs-delayed-m
16670 root 20  0    0   0   0 D 17.5  0.0 34:53.62 btrfs-delayed-m
31441 root 20  0    0   0   0 R 17.5  0.0 51:43.67 btrfs-delayed-m

Thank you,
Mark

[528547.210588] [ cut here ]
[528547.210920] kernel BUG at fs/btrfs/transaction.c:1220!
[528547.211219] invalid opcode: [#3] PREEMPT SMP
[528547.211519] Modules linked in: snd_seq snd_seq_device nls_cp437 vfat fat tg3 b43 ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables rfcomm ipv6 bnep cpufreq_ondemand zram(C) fuse btrfs zlib_deflate crc32c libcrc32c uas ums_realtek usb_storage uvcvideo btusb bluetooth videodev media joydev snd_hda_codec_realtek snd_hda_intel snd_hda_codec uhci_hcd snd_pcm snd_page_alloc ehci_hcd snd_hwdep usbcore psmouse snd_timer snd iTCO_wdt iTCO_vendor_support ideapad_laptop usb_common serio_raw i2c_i801 pcspkr libphy wmi thermal soundcore sparse_keymap arc4 battery evdev ac ssb mmc_core pcmcia mac80211 cfg80211 rfkill pcmcia_core acpi_cpufreq mperf processor freq_table sha256_generic sha512_generic aes_i586 ext4 crc16 jbd2 mbcache aes_generic xts gf128mul dm_crypt dm_mod sd_mod ata_piix libata scsi_mod i915 video button i2c_algo_bit drm_kms_helper drm i2c_core intel_agp intel_gtt agpgart [last unloaded: tg3]
[528547.213821]
[528547.213821] Pid: 15370, comm: sync Tainted: G D WC 3.2.1-1-ARCH #1 LENOVO 41875QG /Kuril
[528547.213821] EIP: 0060:[f9c87385] EFLAGS: 00010286 CPU: 0
[528547.213821] EIP is at btrfs_commit_transaction+0x795/0x7e0 [btrfs]
[528547.213821] EAX: ffe4 EBX: f39c6688 ECX: 631b7740 EDX:
[528547.213821] ESI: f39c66dc EDI: 143d6ef8 EBP: eb0fbf4c ESP: eb0fbf00
[528547.213821] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[528547.213821] Process sync (pid: 15370, ti=eb0fa000 task=c08655e0 task.ti=eb0fa000)
[528547.213821] Stack:
[528547.213821] eb0fbf18 f39cb5d0 f4b34c00 c08655e0
[528547.213821] c0167e00 eb0fbf24 eb0fbf24 f4b34c00 f4b36c00 0001 0001
[528547.213821] f4b36c00 0001 0001 eb0fbf68 f9c62cc5 eb0fbf54 f4b34c00 f4b36c00
[528547.213821] Call Trace:
[528547.213821] [c0167e00] ? abort_exclusive_wait+0x80/0x80
[528547.213821] [f9c62cc5] btrfs_sync_fs+0x55/0xf0 [btrfs]
[528547.213821] [c024afcf] __sync_filesystem+0x5f/0x90
[528547.213821] [c024b017] sync_one_sb+0x17/0x20
[528547.213821] [c0225de8] iterate_supers+0x78/0xc0
[528547.213821] [c024b000] ? __sync_filesystem+0x90/0x90
[528547.213821] [c024b0af] sys_sync+0x3f/0x60
[528547.213821] [c047be1f] sysenter_do_call+0x12/0x28
[528547.213821] Code: e9 35 ff ff ff 0f b7 06 83 ea 02 83 c6 02 66 89 07 83 c7 02 e9 90 fd ff ff 0f b6 06 b2 ca 83 c6 01 88 07 83 c7 01 e9 72 fd ff ff 0f 0b ba bb 04 00 00 b8 62 ca cd f9 89 4d b8 e8 c7 24 4c c6 8b
[528547.213821] EIP: [f9c87385] btrfs_commit_transaction+0x795/0x7e0 [btrfs] SS:ESP 0068:eb0fbf00
[528547.251313] ---[ end trace 015799c07e7725cd ]---

--
Marek Otahal :o)
On Friday 10 of June 2011 16:52:36 Josef Bacik wrote:
On 06/10/2011 02:43 PM, Marek Otahal wrote:
On Friday 10 of June 2011 15:33:20 Josef Bacik wrote:
On 06/09/2011 10:06 PM, Daniel J Blueman wrote:
On 10 June 2011 09:57, Andy Lutomirski l...@mit.edu wrote:
On 06/06/2011 06:19 AM, Marek Otahal wrote:

Hello,

the issue happens every time I have to hard power-off my notebook (suspend problems). With kernel 2.6.39 the partition is unmountable; the solution is to boot a 2.6.38 kernel, which 1/ is able to mount the partition, and 2/ by doing that fixes the problem, so that .39 can later mount it as well (after a clean shutdown).

Same problem here. Mounting with 2.6.38 says:

[ 41.906259] Btrfs loaded
[ 41.906747] device fsid e040a9d60da49596-66c0275e348878bf devid 1 transid 69217 /dev/mapper/vg_midnight_ssd-home
[ 41.908767] btrfs: disk space caching is enabled
[ 42.232185] btrfs: unlinked 17 orphans
[ 42.232189] btrfs: truncated 2 orphans

dmesg in 126.96.36.199 says:

[ 15.004255] kernel BUG at fs/btrfs/inode.c:4676!

I've been experiencing the same issue also. Josef/Chris, would a metadata snapshot or full block snapshot help debug this regression? I can probably set up a small testcase to trigger this.

If you can come up with a testcase to reproduce I would love you forever ;). If I get done what I wanted to do today I will try and reproduce.

Thanks,
Josef

...I was getting ready for your eternal love, Josef :P...but I can't reproduce it 100% of the time, more like a 70% success rate. The test case is quite easy:

1. mount the FS, just with the compress-force=lzo option // I didn't try without it, but on my other btrfs partition that doesn't use compression the error never happened ...so, can the others who experience the bug confirm they use compress=lzo?
2. cd to it and create a file (not sure if needed)
3. hard power-off

To reproduce my tests:

    dd if=/dev/zero of=/btrfstest bs=1M count=256   (minimum required for default mkfs.btrfs)
    losetup /dev/loop0 /btrfstest
    mkfs.btrfs /dev/loop0
    mount -o compress-force=lzo /dev/loop0 /mnt/tmp
    vim /mnt/tmp/hello.txt
    ---power off!

How long do you wait between these two steps? I've not been able to reproduce this and I've done it maybe 5 times. Either I've fixed it in my tree (yay!) or I'm doing something wrong (boo!).

Thanks,
Josef

Not long, but not immediately either, I'd say about ~5s. I did ls, df and quit. Tomorrow I'll see if I can spot a difference.

Btw, is there a way to simulate a power-off on a loopback fs? Like killing the loopback device while the fs is mounted, or some other way? So I don't have to stress the poor hw :)

Thank you,
Mark

--
Marek Otahal :o)
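The setup commands from the reproduction recipe above could be collected in a small dry-run helper, which makes it easier to repeat the test with a different image path or loop device. This is only a sketch: `repro_commands` and `run_repro` are hypothetical names, the default paths are the ones used in the message, and the hard power-off itself is deliberately not scripted. The commands are only printed unless `execute=True` is passed, which would need root privileges.

```python
import shlex
import subprocess


def repro_commands(image="/btrfstest", loop_dev="/dev/loop0",
                   mountpoint="/mnt/tmp", size_mib=256):
    """Return the setup commands from the message as argument lists."""
    return [
        ["dd", "if=/dev/zero", f"of={image}", "bs=1M", f"count={size_mib}"],
        ["losetup", loop_dev, image],
        ["mkfs.btrfs", loop_dev],
        ["mount", "-o", "compress-force=lzo", loop_dev, mountpoint],
    ]


def run_repro(execute=False, **kwargs):
    """Print each command; run it only when execute=True (needs root)."""
    for cmd in repro_commands(**kwargs):
        print(shlex.join(cmd))
        if execute:
            subprocess.run(cmd, check=True)
```

After the filesystem is mounted, the remaining manual steps are the ones from the message: create a file on the mount point, wait a few seconds, and cut the power.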
Hi,

sorry for the repost, I'm not sure if my first mail was delivered.

On Monday 06 of June 2011 12:19:56 Marek Otahal wrote:

Hello,

the issue happens every time I have to hard power-off my notebook (suspend problems). With kernel 2.6.39 the partition is unmountable; the solution is to boot a 2.6.38 kernel, which 1/ is able to mount the partition, and 2/ by doing that fixes the problem, so that .39 can later mount it as well (after a clean shutdown).

Attached dmesg follows.

Thank you,
Mark

mount options:
/dev/mapper/homeDevice /home btrfs defaults,relatime,nodev,nosuid,compress-force=lzo 0 2 # /dev/sda9 home

dmesg:
[ 56.994241] loop: module loaded
[ 57.172283] Btrfs loaded
[ 57.191655] device label store devid 1 transid 26106 /dev/dm-3
[ 57.218783] device label home devid 1 transid 450932 /dev/dm-2
[ 57.459448] scsi 4:0:0:0: Direct-Access Generic- Multi-Card 1.00 PQ: 0 ANSI: 0 CCS
[ 57.460293] sd 4:0:0:0: Attached scsi generic sg1 type 0
[ 57.467030] sd 4:0:0:0: [sdb] Attached SCSI removable disk
[ 61.585618] EXT4-fs (sda4): warning: checktime reached, running e2fsck is recommended
[ 61.671534] EXT4-fs (sda4): re-mounted. Opts: (null)
[ 62.211037] device label home devid 1 transid 450932 /dev/mapper/homeDevice
[ 62.212058] btrfs: force lzo compression
[ 65.335194] [ cut here ]
[ 65.335308] kernel BUG at fs/btrfs/inode.c:4676!
[ 65.335406] invalid opcode: [#1] PREEMPT SMP
[ 65.335532] last sysfs file: /sys/devices/virtual/bdi/btrfs-1/uevent
[ 65.337833] Modules linked in: btrfs zlib_deflate crc32c libcrc32c loop uas ums_realtek uvcvideo usb_storage msr videodev media btusb bluetooth sbs sbshc arc4 ecb b43 mac80211 joydev cfg80211 ssb mmc_core pcmcia sg fuse tg3 uhci_hcd ideapad_laptop evdev sparse_keymap psmouse pcspkr snd_hda_codec_realtek iTCO_wdt iTCO_vendor_support rfkill serio_raw ehci_hcd snd_hda_intel pcmcia_core i2c_i801 libphy usbcore ac wmi battery thermal snd_hda_codec snd_hwdep snd_pcm snd_timer snd soundcore snd_page_alloc acpi_cpufreq freq_table processor mperf sha256_generic sha512_generic ext4 mbcache jbd2 crc16 cryptd aes_i586 aes_generic xts gf128mul dm_crypt dm_mod sd_mod ata_piix libata scsi_mod i915 drm_kms_helper drm i2c_algo_bit button i2c_core video intel_agp intel_gtt agpgart
[ 65.337833]
[ 65.337833] Pid: 883, comm: mount Not tainted 2.6.39-ARCH #1 LENOVO 41875QG /Kuril
[ 65.337833] EIP: 0060:[f9604072] EFLAGS: 00010282 CPU: 1
[ 65.337833] EIP is at btrfs_add_link+0x172/0x200 [btrfs]
[ 65.337833] EAX: ffef EBX: ef448908 ECX: 0119 EDX: 0111
[ 65.337833] ESI: 004255d9 EDI: 0020 EBP: eec77ba4 ESP: eec77b48
[ 65.337833] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 65.337833] Process mount (pid: 883, ti=eec76000 task=f4c8c450 task.ti=eec76000)
[ 65.337833] Stack:
[ 65.396712] 0020 004255d9 eec77b87 0001 e55e f960d6f8
[ 65.396712] eec77b88 eec77b8c eec77b90 eec77b94 ef472000 f5076800 ef448ba8 6f43c090
[ 65.396712] 46ab 0100 0046ab6f eec77c0c
[ 65.396712] Call Trace:
[ 65.396712] [f960d6f8] ? btrfs_inode_ref_index+0xd8/0xe0 [btrfs]
[ 65.396712] [f962cfcf] add_inode_ref+0x28f/0x320 [btrfs]
[ 65.396712] [f962de69] replay_one_buffer+0x239/0x320 [btrfs]
[ 65.396712] [f961cc97] ? alloc_extent_buffer+0x77/0x3a0 [btrfs]
[ 65.396712] [f962b7a9] walk_down_log_tree+0x1d9/0x370 [btrfs]
[ 65.396712] [f962b9d9] walk_log_tree+0x99/0x1c0 [btrfs]
[ 65.396712] [f962f2fa] btrfs_recover_log_trees+0x1da/0x2a0 [btrfs]
[ 65.396712] [f962dc30] ? replay_one_dir_item+0xb0/0xb0 [btrfs]
[ 65.396712] [f95f6749] open_ctree+0x1129/0x1490 [btrfs]
[ 65.396712] [c11ac7a9] ? strlcpy+0x39/0x50
[ 65.396712] [f95d756b] btrfs_mount+0x4ab/0x5b0 [btrfs]
[ 65.396712] [c1109d31] mount_fs+0x31/0x170
[ 65.396712] [c11207ac] vfs_kern_mount+0x4c/0x90
[ 65.396712] [c1120b49] do_kern_mount+0x39/0xd0
[ 65.396712] [c1121e31] do_mount+0x161/0x700
[ 65.396712] [c11226f6] sys_mount+0x66/0xa0
[ 65.396712] [c1330edf] sysenter_do_call+0x12/0x28
[ 65.396712] Code: 44 24 08 00 00 00 00 89 4c 24 0c 8b 4d 08 89 34 24 e8 73 cc fe ff 85 c0 0f 84 f0 fe ff ff 8b 5d f4 8b 75 f8 8b 7d fc 89 ec 5d c3 0f 0b 8b 81 d8 fe ff ff 8d 55 e3 b9 11 00 00 00 89 d7 05 03 01
[ 65.396712] EIP: [f9604072] btrfs_add_link+0x172/0x200 [btrfs] SS:ESP 0068:eec77b48
[ 65.397464] ---[ end trace 5f278c10a67bc917 ]---
[ 65.519660] Adding 2561304k swap on /dev/mapper/swapDevice. Priority:-1 extents:1 across:2561304k
[ 67.243199] microcode: CPU0 sig=0x106c2, pf=0x4, revision=0x20a
[ 67.292031] microcode: CPU1 sig=0x106c2, pf=0x4, revision=0x20a
[ 67.298402] microcode: Microcode Update Driver: v2.00 tig...@aivazian.fsnet.co.uk, Peter Oruba
] Bluetooth: RFCOMM TTY layer initialized
[ 71.156497] Bluetooth: RFCOMM socket layer initialized
[ 71.160691] Bluetooth: RFCOMM ver 1.11
[ 118.324105] ip_tables: (C) 2000-2006 Netfilter Core Team
[ 118.574729] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)

--
Marek Otahal :o)
On Friday 12 of November 2010 18:44:12 you wrote:
On Thu, Nov 11, 2010 at 11:41 PM, Josef Bacik jo...@redhat.com wrote:
On Fri, Nov 12, 2010 at 05:47:14PM +1100, Chris Samuel wrote:
On 11/11/10 23:52, Josef Bacik wrote:

This feature incurs a performance penalty in larger filesystems, it is recommended for use with filesystems of 1 GiB or smaller.

Maybe slightly stronger, for example:

This feature incurs a performance penalty for larger filesystems and it is ONLY recommended for use with filesystems of 1 GiB or smaller.

Is it worth having a check and a warning printed if a user does try and make a filesystem larger than 1GiB with this option? Just in case they don't RTFM...

No, because depending on your usage it's actually kind of useful for anything less than 5 GiB, and you're only looking at about a 5-10% perf degradation when using it on larger filesystems.

Thanks,

Then a warning of 10% slowdown if 10GB would be good. It's surprising how many will just read some forum post and not concern themselves with the docs at all. And making them type yes if 100GB is probably a good idea too...

My 2c: I'm against bloating the program just because of people who don't RTFM. Just mention it clearly in the docs and that's enough. Linux does what it's asked to do, without the "Are you really really sure you want to do this?" known from some other OS. Anyway, btrfs-progs would probably be run by a user with root privileges, and such a user should be aware of what they are doing, or read the man page. My opinion.

Cheers,
Mark

--
Marek Otahal :o)
Here is the output of btrfs-debug-tree /dev/mapper/homeDevice (approx. 115 MB):
http://leteckaposta.cz/file/156803395.1/02a22dde7c98235100610b24f28db4b064e7be4c
or
http://leteckaposta.cz/156803395

Regards,
Marek