Re: dduper - Offline btrfs deduplication tool
On Fri, Sep 07, 2018 at 09:27:28AM +0530, Lakshmipathi.G wrote:
> > > One question:
> > > Why not ioctl_fideduperange?
> > > i.e. you kill most of benefits from that ioctl - atomicity.
>
> I plan to add fideduperange as an option too. User can
> choose between the fideduperange and ficlonerange calls.
>
> If I'm not wrong, with fideduperange, the kernel performs
> a comparison check before dedupe. And it will increase the
> time to dedupe files.

Creating the backup reflink file takes far more time than you will ever
save from fideduperange. You don't need the md5sum either, unless you
have a data set that is full of crc32 collisions (e.g. a file format
that puts a CRC32 at the end of each 4K block). The few people who have
such a data set can enable md5sums; everyone else can have md5sums
disabled by default.

> I believe the risk involved with ficlonerange is minimized
> by having a backup file (reflinked). We can revert to the
> older original file, if we encounter some problems.

With fideduperange the risk is more than minimized--it's completely
eliminated. If you don't use fideduperange you can't use the tool on a
live data set at all.

> > > --
> > > Have a nice day,
> > > Timofey.
>
> Cheers.
> Lakshmipathi.G
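For reference, the atomic ioctl being advocated here can be driven directly from a script. A minimal, untested sketch in Python (the struct layouts and numbers follow linux/fs.h; the helper names are mine, not part of any tool in this thread):

```python
import fcntl
import struct

def _IOWR(type_, nr, size):
    # Linux ioctl number encoding: direction (read|write) | size | type | nr
    return (3 << 30) | (size << 16) | (type_ << 8) | nr

# struct file_dedupe_range header: src_offset, src_length, dest_count,
# reserved1, reserved2 (followed by dest_count range_info entries)
RANGE_FMT = "=QQHHI"
# struct file_dedupe_range_info: dest_fd, dest_offset, bytes_deduped,
# status, reserved
INFO_FMT = "=qQQiI"
FIDEDUPERANGE = _IOWR(0x94, 54, struct.calcsize(RANGE_FMT))

def dedupe_range(src_fd, src_off, length, dest_fd, dest_off):
    """Ask the kernel to compare the two ranges and reflink them only if
    the bytes are identical -- atomic, hence safe on a live data set."""
    arg = bytearray(struct.pack(RANGE_FMT, src_off, length, 1, 0, 0) +
                    struct.pack(INFO_FMT, dest_fd, dest_off, 0, 0, 0))
    fcntl.ioctl(src_fd, FIDEDUPERANGE, arg)
    # Kernel fills in bytes_deduped and status for each destination.
    _, _, bytes_deduped, status, _ = struct.unpack_from(INFO_FMT, arg, 24)
    return bytes_deduped, status  # status < 0: -errno; 1: ranges differed
```

Unlike a userspace compare-then-clone sequence, the comparison and the reflink happen under one lock in the kernel, which is the atomicity being discussed.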
Re: dduper - Offline btrfs deduplication tool
On Fri, Sep 07, 2018 at 09:27:28AM +0530, Lakshmipathi.G wrote:
> > One question:
> > Why not ioctl_fideduperange?
> > i.e. you kill most of benefits from that ioctl - atomicity.
>
> I plan to add fideduperange as an option too. User can
> choose between the fideduperange and ficlonerange calls.
>
> If I'm not wrong, with fideduperange, the kernel performs
> a comparison check before dedupe. And it will increase the
> time to dedupe files.

You already read the files to md5sum them, so you have no speed gain.
You get nasty data-losing races, and risk collisions as well. md5sum is
safe against random occurrences (compared e.g. to the chance of
lightning hitting you today), but is exploitable by a hostile user. On
the other hand, full bit-to-bit comparison is faster and 100% safe.

You can't skip verification -- the checksums are only 32-bit. They have
a 1:4G chance of a false match, which means you can expect one false
positive with 64K extents, rising quadratically as the number of files
grows.

Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁
⢿⡄⠘⠷⠚⠋⠀ Collisions shmolisions, let's see them find a collision or second
⠈⠳⣄      preimage for double rot13!
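The quadratic growth mentioned above is just the birthday bound on 32-bit checksums. A quick sketch of the arithmetic (the function name is mine):

```python
# Expected number of falsely colliding pairs among n independent random
# 32-bit checksums: n*(n-1)/2 pairs, each matching with probability 2^-32.
def expected_collisions(n, bits=32):
    return n * (n - 1) / 2 / 2**bits

# With 64Ki checksummed extents, roughly half a false positive is already
# expected; doubling the extent count quadruples it.
print(expected_collisions(64 * 1024))
```

This is why a csum-only matcher must verify candidate ranges (or use FIDEDUPERANGE, which verifies in the kernel) before reflinking.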
Re: dduper - Offline btrfs deduplication tool
> > One question:
> > Why not ioctl_fideduperange?
> > i.e. you kill most of benefits from that ioctl - atomicity.

I plan to add fideduperange as an option too. User can choose between
the fideduperange and ficlonerange calls.

If I'm not wrong, with fideduperange, the kernel performs a comparison
check before dedupe. And it will increase the time to dedupe files.

I believe the risk involved with ficlonerange is minimized by having a
backup file (reflinked). We can revert to the older original file, if
we encounter some problems.

> --
> Have a nice day,
> Timofey.

Cheers.
Lakshmipathi.G
Re: dduper - Offline btrfs deduplication tool
Fri, 24 Aug 2018 at 7:41, Lakshmipathi.G wrote:
>
> Hi -
>
> dduper is an offline dedupe tool. Instead of reading whole file blocks
> and computing their checksums, it works by fetching checksums from the
> BTRFS csum tree. This hugely improves the performance.
>
> dduper works like:
>  - Read csums for the given two files.
>  - Find matching locations.
>  - Pass the locations to ioctl_ficlonerange directly
>    instead of ioctl_fideduperange.
>
> By default, dduper adds a safety check to the above steps by creating
> a backup reflink file and comparing the md5sums after dedupe.
> If the backup file matches the new deduped file, then the backup file
> is removed. You can skip this check by passing the --skip option. Here
> is sample cli usage [1] and a quick demo [2].
>
> Some performance numbers: (with --skip option)
>
> Dedupe two 1GB files with same content  -  1.2 seconds
> Dedupe two 5GB files with same content  -  8.2 seconds
> Dedupe two 10GB files with same content - 13.8 seconds
>
> dduper requires the `btrfs inspect-internal dump-csum` command; you can
> use this branch [3] or apply the patch yourself [4].
>
> [1] https://gitlab.collabora.com/laks/btrfs-progs/blob/dump_csum/Documentation/dduper_usage.md
> [2] http://giis.co.in/btrfs_dedupe.gif
> [3] git clone https://gitlab.collabora.com/laks/btrfs-progs.git -b dump_csum
> [4] https://patchwork.kernel.org/patch/10540229/
>
> Please remember it's version 0.1, so test it out if you plan to use
> dduper on real data.
> Let me know if you have suggestions or feedback or bugs :)
>
> Cheers.
> Lakshmipathi.G

One question:
Why not ioctl_fideduperange?
i.e. you kill most of benefits from that ioctl - atomicity.

--
Have a nice day,
Timofey.
dduper - Offline btrfs deduplication tool
Hi -

dduper is an offline dedupe tool. Instead of reading whole file blocks
and computing their checksums, it works by fetching checksums from the
BTRFS csum tree. This hugely improves the performance.

dduper works like:
 - Read csums for the given two files.
 - Find matching locations.
 - Pass the locations to ioctl_ficlonerange directly
   instead of ioctl_fideduperange.

By default, dduper adds a safety check to the above steps by creating a
backup reflink file and comparing the md5sums after dedupe.
If the backup file matches the new deduped file, then the backup file is
removed. You can skip this check by passing the --skip option. Here is
sample cli usage [1] and a quick demo [2].

Some performance numbers: (with --skip option)

Dedupe two 1GB files with same content  -  1.2 seconds
Dedupe two 5GB files with same content  -  8.2 seconds
Dedupe two 10GB files with same content - 13.8 seconds

dduper requires the `btrfs inspect-internal dump-csum` command; you can
use this branch [3] or apply the patch yourself [4].

[1] https://gitlab.collabora.com/laks/btrfs-progs/blob/dump_csum/Documentation/dduper_usage.md
[2] http://giis.co.in/btrfs_dedupe.gif
[3] git clone https://gitlab.collabora.com/laks/btrfs-progs.git -b dump_csum
[4] https://patchwork.kernel.org/patch/10540229/

Please remember it's version 0.1, so test it out if you plan to use
dduper on real data.
Let me know if you have suggestions or feedback or bugs :)

Cheers.
Lakshmipathi.G
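The clone step described above (handing a matched byte range straight to FICLONERANGE) looks roughly like this. A hedged sketch, not dduper's actual code; the constant and struct layout follow linux/fs.h:

```python
import fcntl
import struct

def _IOW(type_, nr, size):
    # Linux ioctl number encoding for a write-direction ioctl
    return (1 << 30) | (size << 16) | (type_ << 8) | nr

# struct file_clone_range: src_fd, src_offset, src_length, dest_offset
CLONE_RANGE_FMT = "=qQQQ"
FICLONERANGE = _IOW(0x94, 13, struct.calcsize(CLONE_RANGE_FMT))

def clone_range(src_fd, dest_fd, src_off, length, dest_off):
    """Reflink `length` bytes of src into dest with NO byte comparison by
    the kernel -- which is exactly why dduper keeps a backup reflink copy
    unless --skip is given."""
    arg = struct.pack(CLONE_RANGE_FMT, src_fd, src_off, length, dest_off)
    fcntl.ioctl(dest_fd, FICLONERANGE, arg)
```

Both file descriptors must be on the same btrfs filesystem, and offsets/lengths must be block-aligned.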
Re: BTRFS Deduplication
On Mon, Sep 11, 2017 at 2:55 PM, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>
> On 2017年09月11日 17:14, Qu Wenruo wrote:
>>
>> On 2017年09月11日 16:57, shally verma wrote:
>>>
>>> On Mon, Sep 11, 2017 at 1:42 PM, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>>>>
>>>> On 2017年09月11日 15:54, shally verma wrote:
>>>>>
>>>>> On Mon, Sep 11, 2017 at 12:16 PM, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>>>>>>
>>>>>> On 2017年09月11日 14:05, shally verma wrote:
>>>>>>>
>>>>>>> I was going through the BTRFS Deduplication page
>>>>>>> (https://btrfs.wiki.kernel.org/index.php/Deduplication) and I read
>>>>>>>
>>>>>>> "As such, xfs_io, is able to perform deduplication on a BTRFS file
>>>>>>> system," ..
>>>>>>>
>>>>>>> Following this, I followed the xfs_io link
>>>>>>> https://linux.die.net/man/8/xfs_io
>>>>>>>
>>>>>>> As I understand, these are a set of commands that allow us to do
>>>>>>> different operations on the "xfs" filesystem.
>>>>>>
>>>>>> Nope, it's just a tool triggering different reads/writes or ioctls.
>>>>>> In fact most of its commands are fs independent.
>>>>>> Only a limited number of operations are supported only by XFS.
>>>>>>
>>>>>> It's just due to historical reasons that it's still named xfs_io.
>>>>>>
>>>>>> I won't be surprised if one day it's split out as an independent tool.
>>>>>>
>>>>>>> And in the command set mentioned here, I couldn't see which command
>>>>>>> invokes the dedupe task.
>>>>>>
>>>>>> The "dedupe" and "reflink" commands.
>>>>>
>>>>> Oh. That means the page linked from the BTRFS Wiki page is not
>>>>> updated with this. I googled another page that references these two
>>>>> commands in xfs_io here:
>>>>> https://www.systutorials.com/docs/linux/man/8-xfs_io/
>>>>> Maybe the Wiki needs an update here.
>>>>
>>>> If XFS has a regularly updated online man page, we can just use that.
>>>> (But unfortunately, not every fs user tool uses asciidoc like btrfs,
>>>> which can generate both a man page and html.)
>>>>
>>>>>>> And how this works with BTRFS.
>>>>>>
>>>>>> Filesystems supporting the FIDEDUPERANGE or BTRFS_IOC_FILE_EXTENT_SAME
>>>>>> ioctl can use it to determine if two ranges contain identical data.
>>>>>>
>>>>>> And if they are identical, we use the FICLONERANGE or
>>>>>> BTRFS_IOC_CLONE_RANGE ioctl to reflink one to the other, freeing one
>>>>>> of them.
>>>>>>
>>>>>> BTW nowadays, such dedupe and reflink ioctls are genericized in VFS.
>>>>>> The file_operations structure now includes both clone_file_range()
>>>>>> and dedupe_file_range() callbacks.
>>>>>
>>>>> Yea. Understand that part. So going by the descriptions of "dedupe"
>>>>> and "reflink", it seems that through these commands one can do the
>>>>> deduplication part and NOT the duplicate-finding part.
>>>>
>>>> Yes, one doesn't need to call the "dedupe" ioctl if they already know
>>>> some data is identical and can go reflink straight away.
>>>>
>>>>> That's still out of xfs_io command scope.
>>>>
>>>> Not sure what scope you mean here, sorry for that.
>>>>
>>> By "scope", I meant the duplicate-finding part, but that contradicts
>>> the statement
Re: BTRFS Deduplication
On 2017年09月11日 17:14, Qu Wenruo wrote:
>
> On 2017年09月11日 16:57, shally verma wrote:
>>
>> On Mon, Sep 11, 2017 at 1:42 PM, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>>>
>>> On 2017年09月11日 15:54, shally verma wrote:
>>>>
>>>> On Mon, Sep 11, 2017 at 12:16 PM, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>>>>>
>>>>> On 2017年09月11日 14:05, shally verma wrote:
>>>>>>
>>>>>> I was going through the BTRFS Deduplication page
>>>>>> (https://btrfs.wiki.kernel.org/index.php/Deduplication) and I read
>>>>>>
>>>>>> "As such, xfs_io, is able to perform deduplication on a BTRFS file
>>>>>> system," ..
>>>>>>
>>>>>> Following this, I followed the xfs_io link
>>>>>> https://linux.die.net/man/8/xfs_io
>>>>>>
>>>>>> As I understand, these are a set of commands that allow us to do
>>>>>> different operations on the "xfs" filesystem.
>>>>>
>>>>> Nope, it's just a tool triggering different reads/writes or ioctls.
>>>>> In fact most of its commands are fs independent.
>>>>> Only a limited number of operations are supported only by XFS.
>>>>>
>>>>> It's just due to historical reasons that it's still named xfs_io.
>>>>>
>>>>> I won't be surprised if one day it's split out as an independent tool.
>>>>>
>>>>>> And in the command set mentioned here, I couldn't see which command
>>>>>> invokes the dedupe task.
>>>>>
>>>>> The "dedupe" and "reflink" commands.
>>>>
>>>> Oh. That means the page linked from the BTRFS Wiki page is not updated
>>>> with this. I googled another page that references these two commands
>>>> in xfs_io here:
>>>> https://www.systutorials.com/docs/linux/man/8-xfs_io/
>>>> Maybe the Wiki needs an update here.
>>>
>>> If XFS has a regularly updated online man page, we can just use that.
>>> (But unfortunately, not every fs user tool uses asciidoc like btrfs,
>>> which can generate both a man page and html.)
>>>
>>>>>> And how this works with BTRFS.
>>>>>
>>>>> Filesystems supporting the FIDEDUPERANGE or BTRFS_IOC_FILE_EXTENT_SAME
>>>>> ioctl can use it to determine if two ranges contain identical data.
>>>>>
>>>>> And if they are identical, we use the FICLONERANGE or
>>>>> BTRFS_IOC_CLONE_RANGE ioctl to reflink one to the other, freeing one
>>>>> of them.
>>>>>
>>>>> BTW nowadays, such dedupe and reflink ioctls are genericized in VFS.
>>>>> The file_operations structure now includes both clone_file_range()
>>>>> and dedupe_file_range() callbacks.
>>>>
>>>> Yea. Understand that part. So going by the descriptions of "dedupe"
>>>> and "reflink", it seems that through these commands one can do the
>>>> deduplication part and NOT the duplicate-finding part.
>>>
>>> Yes, one doesn't need to call the "dedupe" ioctl if they already know
>>> some data is identical and can go reflink straight away.
>>>
>>>> That's still out of xfs_io command scope.
>>>
>>> Not sure what scope you mean here, sorry for that.
>>>
>> By "scope", I meant the duplicate-finding part, but that contradicts
>> the statement you just wrote below:
>>
>>> Since xfs_io can be used to find duplication,
>>
>> Since the "dedupe" command inputs only a "source file" and src and
>> dst_offset within that, it can deduplicate the content within a file,
>> whereas the actual FS dedupe IOCTL can first ensure that two extents
>> are identical and, if yes, then deduplicate them.
>
> By "deduplicate", if you mean "removing duplication", then the xfs_io
> "dedupe" command itself doesn't do that. The old btrfs ioctl describes
> this better: FILE_EXTENT_SAME.
> The "dedupe" command itself only verifies that they have the same
> content.
>
> So to make it clear, the "dedupe" command and ioctl only do the
> *verification* work.

Sorry, I just checked the code and tried the ioctl.

If they are the same, "dedupe" will do the "reflink" part also.
The code also shows that:
---
        /* pass original length for comparison so we stay within i_size */
        ret = btrfs_cmp_data(olen, &cmp);
        if (ret == 0)
                ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
---

So the "dedupe" ioctl itself can do de-duplication.
And my previous answer is just totally wrong.

Sorry for that,
Qu

> "Reflink" will really remove the duplication (or even non-duplicated
> data if you really want).
> But please be careful: "reflink" is much like copy, so it can be
> executed on file ranges with different contents.
> In that case, reflink can free some space, but it also modifies the
> content.
>
> So for full de-duplication, one must go through the full *verify* then
> *reflink* cycle.
> Although the "dedupe" (FILE_EXTENT_SAME) ioctl provides one
> verification method, it's not the only solution.
>
> But anyway, the "dedupe" and "reflink" commands provided by xfs_io do
> provide every piece needed to do de-duplication, so the wiki is still
> correct IMHO.
>
> Thanks,
> Qu
>
>> Is that correct?
>>
>> Thanks
>> Shally
>
>>> and can remove duplication, I
>>> don't find anything strange in that wiki page.
>>> (Especially considering how popular the tool is, you can't find any
>>> more handy tool than xfs_io)
>>>
>>> Thanks,
>>> Qu
>>>
>>>> Is that understanding correct?
>>>> Thanks
>>>> Shally
>>>>>
>>>>> Thanks,
>>>>> Qu
>>>>>>
>>>>>> So, can anyone help here and point me to what I am missing here.
>>>>>>
>>>>>> Thanks
>>>>>> Shally

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BTRFS Deduplication
On 2017年09月11日 16:57, shally verma wrote:
>
> On Mon, Sep 11, 2017 at 1:42 PM, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>>
>> On 2017年09月11日 15:54, shally verma wrote:
>>>
>>> On Mon, Sep 11, 2017 at 12:16 PM, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>>>>
>>>> On 2017年09月11日 14:05, shally verma wrote:
>>>>>
>>>>> I was going through the BTRFS Deduplication page
>>>>> (https://btrfs.wiki.kernel.org/index.php/Deduplication) and I read
>>>>>
>>>>> "As such, xfs_io, is able to perform deduplication on a BTRFS file
>>>>> system," ..
>>>>>
>>>>> Following this, I followed the xfs_io link
>>>>> https://linux.die.net/man/8/xfs_io
>>>>>
>>>>> As I understand, these are a set of commands that allow us to do
>>>>> different operations on the "xfs" filesystem.
>>>>
>>>> Nope, it's just a tool triggering different reads/writes or ioctls.
>>>> In fact most of its commands are fs independent.
>>>> Only a limited number of operations are supported only by XFS.
>>>>
>>>> It's just due to historical reasons that it's still named xfs_io.
>>>>
>>>> I won't be surprised if one day it's split out as an independent tool.
>>>>
>>>>> And in the command set mentioned here, I couldn't see which command
>>>>> invokes the dedupe task.
>>>>
>>>> The "dedupe" and "reflink" commands.
>>>
>>> Oh. That means the page linked from the BTRFS Wiki page is not updated
>>> with this. I googled another page that references these two commands
>>> in xfs_io here:
>>> https://www.systutorials.com/docs/linux/man/8-xfs_io/
>>> Maybe the Wiki needs an update here.
>>
>> If XFS has a regularly updated online man page, we can just use that.
>> (But unfortunately, not every fs user tool uses asciidoc like btrfs,
>> which can generate both a man page and html.)
>>
>>>>> And how this works with BTRFS.
>>>>
>>>> Filesystems supporting the FIDEDUPERANGE or BTRFS_IOC_FILE_EXTENT_SAME
>>>> ioctl can use it to determine if two ranges contain identical data.
>>>>
>>>> And if they are identical, we use the FICLONERANGE or
>>>> BTRFS_IOC_CLONE_RANGE ioctl to reflink one to the other, freeing one
>>>> of them.
>>>>
>>>> BTW nowadays, such dedupe and reflink ioctls are genericized in VFS.
>>>> The file_operations structure now includes both clone_file_range()
>>>> and dedupe_file_range() callbacks.
>>>
>>> Yea. Understand that part. So going by the descriptions of "dedupe"
>>> and "reflink", it seems that through these commands one can do the
>>> deduplication part and NOT the duplicate-finding part.
>>
>> Yes, one doesn't need to call the "dedupe" ioctl if they already know
>> some data is identical and can go reflink straight away.
>>
>>> That's still out of xfs_io command scope.
>>
>> Not sure what scope you mean here, sorry for that.
>>
> By "scope", I meant the duplicate-finding part, but that contradicts
> the statement you just wrote below:
>
>> Since xfs_io can be used to find duplication,
>
> Since the "dedupe" command inputs only a "source file" and src and
> dst_offset within that, it can deduplicate the content within a file,
> whereas the actual FS dedupe IOCTL can first ensure that two extents
> are identical and, if yes, then deduplicate them.

By "deduplicate", if you mean "removing duplication", then the xfs_io
"dedupe" command itself doesn't do that. The old btrfs ioctl describes
this better: FILE_EXTENT_SAME.
The "dedupe" command itself only verifies that they have the same
content.

So to make it clear, the "dedupe" command and ioctl only do the
*verification* work.

"Reflink" will really remove the duplication (or even non-duplicated
data if you really want).
But please be careful: "reflink" is much like copy, so it can be
executed on file ranges with different contents.
In that case, reflink can free some space, but it also modifies the
content.

So for full de-duplication, one must go through the full *verify* then
*reflink* cycle.
Although the "dedupe" (FILE_EXTENT_SAME) ioctl provides one verification
method, it's not the only solution.

But anyway, the "dedupe" and "reflink" commands provided by xfs_io do
provide every piece needed to do de-duplication, so the wiki is still
correct IMHO.

Thanks,
Qu

> Is that correct?
>
> Thanks
> Shally
>
>> and can remove duplication, I
>> don't find anything strange in that wiki page.
>> (Especially considering how popular the tool is, you can't find any
>> more handy tool than xfs_io)
>>
>> Thanks,
>> Qu
>>
>>> Is that understanding correct?
>>> Thanks
>>> Shally
>>>>
>>>> Thanks,
>>>> Qu
>>>>>
>>>>> So, can anyone help here and point me to what I am missing here.
>>>>>
>>>>> Thanks
>>>>> Shally
Re: BTRFS Deduplication
On Mon, Sep 11, 2017 at 1:42 PM, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>
> On 2017年09月11日 15:54, shally verma wrote:
>>
>> On Mon, Sep 11, 2017 at 12:16 PM, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>>>
>>> On 2017年09月11日 14:05, shally verma wrote:
>>>>
>>>> I was going through the BTRFS Deduplication page
>>>> (https://btrfs.wiki.kernel.org/index.php/Deduplication) and I read
>>>>
>>>> "As such, xfs_io, is able to perform deduplication on a BTRFS file
>>>> system," ..
>>>>
>>>> Following this, I followed the xfs_io link
>>>> https://linux.die.net/man/8/xfs_io
>>>>
>>>> As I understand, these are a set of commands that allow us to do
>>>> different operations on the "xfs" filesystem.
>>>
>>> Nope, it's just a tool triggering different reads/writes or ioctls.
>>> In fact most of its commands are fs independent.
>>> Only a limited number of operations are supported only by XFS.
>>>
>>> It's just due to historical reasons that it's still named xfs_io.
>>>
>>> I won't be surprised if one day it's split out as an independent tool.
>>>
>>>> And in the command set mentioned here, I couldn't see which command
>>>> invokes the dedupe task.
>>>
>>> The "dedupe" and "reflink" commands.
>>
>> Oh. That means the page linked from the BTRFS Wiki page is not updated
>> with this. I googled another page that references these two commands
>> in xfs_io here:
>> https://www.systutorials.com/docs/linux/man/8-xfs_io/
>> Maybe the Wiki needs an update here.
>
> If XFS has a regularly updated online man page, we can just use that.
> (But unfortunately, not every fs user tool uses asciidoc like btrfs,
> which can generate both a man page and html.)
>
>>>> And how this works with BTRFS.
>>>
>>> Filesystems supporting the FIDEDUPERANGE or BTRFS_IOC_FILE_EXTENT_SAME
>>> ioctl can use it to determine if two ranges contain identical data.
>>>
>>> And if they are identical, we use the FICLONERANGE or
>>> BTRFS_IOC_CLONE_RANGE ioctl to reflink one to the other, freeing one
>>> of them.
>>>
>>> BTW nowadays, such dedupe and reflink ioctls are genericized in VFS.
>>> The file_operations structure now includes both clone_file_range()
>>> and dedupe_file_range() callbacks.
>>
>> Yea. Understand that part. So going by the descriptions of "dedupe"
>> and "reflink", it seems that through these commands one can do the
>> deduplication part and NOT the duplicate-finding part.
>
> Yes, one doesn't need to call the "dedupe" ioctl if they already know
> some data is identical and can go reflink straight away.
>
>> That's still out of xfs_io command scope.
>
> Not sure what scope you mean here, sorry for that.
>
By "scope", I meant the duplicate-finding part, but that contradicts
the statement you just wrote below:

> Since xfs_io can be used to find duplication,

Since the "dedupe" command inputs only a "source file" and src and
dst_offset within that, it can deduplicate the content within a file,
whereas the actual FS dedupe IOCTL can first ensure that two extents
are identical and, if yes, then deduplicate them.

Is that correct?

Thanks
Shally

> and can remove duplication, I
> don't find anything strange in that wiki page.
> (Especially considering how popular the tool is, you can't find any
> more handy tool than xfs_io)
>
> Thanks,
> Qu
>
>> Is that understanding correct?
>> Thanks
>> Shally
>>>
>>> Thanks,
>>> Qu
>>>>
>>>> So, can anyone help here and point me to what I am missing here.
>>>>
>>>> Thanks
>>>> Shally
Re: BTRFS Deduplication
On 2017年09月11日 15:54, shally verma wrote:
>
> On Mon, Sep 11, 2017 at 12:16 PM, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>>
>> On 2017年09月11日 14:05, shally verma wrote:
>>>
>>> I was going through the BTRFS Deduplication page
>>> (https://btrfs.wiki.kernel.org/index.php/Deduplication) and I read
>>>
>>> "As such, xfs_io, is able to perform deduplication on a BTRFS file
>>> system," ..
>>>
>>> Following this, I followed the xfs_io link
>>> https://linux.die.net/man/8/xfs_io
>>>
>>> As I understand, these are a set of commands that allow us to do
>>> different operations on the "xfs" filesystem.
>>
>> Nope, it's just a tool triggering different reads/writes or ioctls.
>> In fact most of its commands are fs independent.
>> Only a limited number of operations are supported only by XFS.
>>
>> It's just due to historical reasons that it's still named xfs_io.
>>
>> I won't be surprised if one day it's split out as an independent tool.
>>
>>> And in the command set mentioned here, I couldn't see which command
>>> invokes the dedupe task.
>>
>> The "dedupe" and "reflink" commands.
>
> Oh. That means the page linked from the BTRFS Wiki page is not updated
> with this. I googled another page that references these two commands
> in xfs_io here:
> https://www.systutorials.com/docs/linux/man/8-xfs_io/
> Maybe the Wiki needs an update here.

If XFS has a regularly updated online man page, we can just use that.
(But unfortunately, not every fs user tool uses asciidoc like btrfs,
which can generate both a man page and html.)

>>> And how this works with BTRFS.
>>
>> Filesystems supporting the FIDEDUPERANGE or BTRFS_IOC_FILE_EXTENT_SAME
>> ioctl can use it to determine if two ranges contain identical data.
>>
>> And if they are identical, we use the FICLONERANGE or
>> BTRFS_IOC_CLONE_RANGE ioctl to reflink one to the other, freeing one
>> of them.
>>
>> BTW nowadays, such dedupe and reflink ioctls are genericized in VFS.
>> The file_operations structure now includes both clone_file_range()
>> and dedupe_file_range() callbacks.
>
> Yea. Understand that part. So going by the descriptions of "dedupe"
> and "reflink", it seems that through these commands one can do the
> deduplication part and NOT the duplicate-finding part.

Yes, one doesn't need to call the "dedupe" ioctl if they already know
some data is identical and can go reflink straight away.

> That's still out of xfs_io command scope.

Not sure what scope you mean here, sorry for that.

Since xfs_io can be used to find duplication, and can remove
duplication, I don't find anything strange in that wiki page.
(Especially considering how popular the tool is, you can't find any
more handy tool than xfs_io)

Thanks,
Qu

> Is that understanding correct?
> Thanks
> Shally
>>
>> Thanks,
>> Qu
>>>
>>> So, can anyone help here and point me to what I am missing here.
>>>
>>> Thanks
>>> Shally
Re: BTRFS Deduplication
On Mon, Sep 11, 2017 at 12:16 PM, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>
> On 2017年09月11日 14:05, shally verma wrote:
>>
>> I was going through the BTRFS Deduplication page
>> (https://btrfs.wiki.kernel.org/index.php/Deduplication) and I read
>>
>> "As such, xfs_io, is able to perform deduplication on a BTRFS file
>> system," ..
>>
>> Following this, I followed the xfs_io link
>> https://linux.die.net/man/8/xfs_io
>>
>> As I understand, these are a set of commands that allow us to do
>> different operations on the "xfs" filesystem.
>
> Nope, it's just a tool triggering different reads/writes or ioctls.
> In fact most of its commands are fs independent.
> Only a limited number of operations are supported only by XFS.
>
> It's just due to historical reasons that it's still named xfs_io.
>
> I won't be surprised if one day it's split out as an independent tool.
>
>> And in the command set mentioned here, I couldn't see which command
>> invokes the dedupe task.
>
> The "dedupe" and "reflink" commands.

Oh. That means the page linked from the BTRFS Wiki page is not updated
with this. I googled another page that references these two commands in
xfs_io here:
https://www.systutorials.com/docs/linux/man/8-xfs_io/
Maybe the Wiki needs an update here.

>> And how this works with BTRFS.
>
> Filesystems supporting the FIDEDUPERANGE or BTRFS_IOC_FILE_EXTENT_SAME
> ioctl can use it to determine if two ranges contain identical data.
>
> And if they are identical, we use the FICLONERANGE or
> BTRFS_IOC_CLONE_RANGE ioctl to reflink one to the other, freeing one
> of them.
>
> BTW nowadays, such dedupe and reflink ioctls are genericized in VFS.
> The file_operations structure now includes both clone_file_range() and
> dedupe_file_range() callbacks.

Yea. Understand that part. So going by the descriptions of "dedupe" and
"reflink", it seems that through these commands one can do the
deduplication part and NOT the duplicate-finding part. That's still out
of xfs_io command scope.

Is that understanding correct?

Thanks
Shally

> Thanks,
> Qu
>>
>> So, can anyone help here and point me to what I am missing here.
>>
>> Thanks
>> Shally
Re: BTRFS Deduplication
On 2017年09月11日 14:05, shally verma wrote:
>
> I was going through the BTRFS Deduplication page
> (https://btrfs.wiki.kernel.org/index.php/Deduplication) and I read
>
> "As such, xfs_io, is able to perform deduplication on a BTRFS file
> system," ..
>
> Following this, I followed the xfs_io link
> https://linux.die.net/man/8/xfs_io
>
> As I understand, these are a set of commands that allow us to do
> different operations on the "xfs" filesystem.

Nope, it's just a tool triggering different reads/writes or ioctls.
In fact most of its commands are fs independent.
Only a limited number of operations are supported only by XFS.

It's just due to historical reasons that it's still named xfs_io.

I won't be surprised if one day it's split out as an independent tool.

> And in the command set mentioned here, I couldn't see which command
> invokes the dedupe task.

The "dedupe" and "reflink" commands.

> And how this works with BTRFS.

Filesystems supporting the FIDEDUPERANGE or BTRFS_IOC_FILE_EXTENT_SAME
ioctl can use it to determine if two ranges contain identical data.

And if they are identical, we use the FICLONERANGE or
BTRFS_IOC_CLONE_RANGE ioctl to reflink one to the other, freeing one of
them.

BTW nowadays, such dedupe and reflink ioctls are genericized in VFS.
The file_operations structure now includes both clone_file_range() and
dedupe_file_range() callbacks.

Thanks,
Qu

> So, can anyone help here and point me to what I am missing here.
>
> Thanks
> Shally
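The verify-then-reflink cycle described in this thread can be driven from a script via xfs_io. A hedged sketch (the wrapper, file names, and offsets are mine; the command syntax follows xfs_io(8), where "dedupe" runs against the open destination file and only reflinks if the ranges match):

```python
import subprocess

def build_dedupe_cmd(src, dst, src_off, dst_off, length):
    # xfs_io "dedupe" syntax: dedupe <src_file> <src_offset> <dst_offset> <length>,
    # executed with the destination file open. The kernel compares the two
    # ranges (FIDEDUPERANGE underneath) before sharing the extents.
    return ["xfs_io", "-c",
            f"dedupe {src} {src_off} {dst_off} {length}", dst]

def dedupe(src, dst, src_off, dst_off, length):
    """Deduplicate one byte range of src into dst, verified by the kernel."""
    subprocess.run(build_dedupe_cmd(src, dst, src_off, dst_off, length),
                   check=True)
```

Since the kernel itself performs the comparison, no separate userspace duplicate-finding step is required for correctness; finding *candidate* ranges efficiently is still the caller's job.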
BTRFS Deduplication
I was going through the BTRFS Deduplication page
(https://btrfs.wiki.kernel.org/index.php/Deduplication) and I read

"As such, xfs_io, is able to perform deduplication on a BTRFS file
system," ..

Following this, I followed the xfs_io link
https://linux.die.net/man/8/xfs_io

As I understand, these are a set of commands that allow us to do
different operations on the "xfs" filesystem, and in the command set
mentioned here, I couldn't see which command invokes the dedupe task,
and how this works with BTRFS.

So, can anyone help here and point me to what I am missing here?

Thanks
Shally
Re: duperemove : some real world figures on BTRFS deduplication
On 12/09/16 16:43, Chris Murphy wrote:
>> If compression has nothing to do with this, then this is heavy
>> fragmentation.
>
> It's probably not that fragmented. Due to compression, metadata
> describes 128KiB extents even though the data is actually contiguous.
>
> And it might be the same thing in my case also, even though no
> compression is involved.

In that case you can quickly collapse physically contiguous ranges by
reflink-mv'ing (i.e. a recent mv) the file across subvolume boundaries
and back. :)

-h
Re: duperemove : some real world figures on BTRFS deduplication
On Fri, Dec 9, 2016 at 6:45 AM, Swâmi Petaramesh wrote:
> Hi Chris, thanks for your answer,
>
> On 12/09/2016 03:58 AM, Chris Murphy wrote:
>> Can you check some bigger files and see if they've become fragmented?
>> I'm seeing 1.4GiB files with 2-3 extents reported by filefrag go to
>> over 5000 fragments during dedupe. This is not something I recall
>> happening some months ago.
>
> I have checked directories containing VM hard disks, which would be
> good candidates. As they're backed up using full rsyncs, I wouldn't
> expect them to be heavily fragmented (OTOH the whole BTRFS filesystem
> is lzo compressed, and I believe that may affect the number of extents
> reported by filefrag...?)
>
> Anyway, this is the number of fragments that I get for a bunch of VM
> HD files which range from a couple of GB to about 20 GB.
>
> The number of fragments reported by filefrag: 2907, 2560, 314, 10107.
>
> If compression has nothing to do with this, then this is heavy
> fragmentation.

It's probably not that fragmented. Due to compression, metadata
describes 128KiB extents even though the data is actually contiguous.

And it might be the same thing in my case also, even though no
compression is involved.

--
Chris Murphy
Re: duperemove : some real world figures on BTRFS deduplication
Hi Jeff, thanks for your reply,

On 12/08/2016 09:07 PM, Jeff Mahoney wrote:
> What version were you using?

That's v0.11.beta4, installed rather recently.

> What throughput are you getting to that disk? I get that it's USB3, but
> reading 1TB doesn't take a terribly long time so 15 days is pretty
> ridiculous.

This is run from inside a VM, onto a physical USB3 HD. Copying to/from
this HD shows a speed that corresponds to what I would expect on the
same HD connected to a physical (not virtual) setup.

The only quick data that I can get are from "hdparm", which says:

- Timed cached reads: 5976 MB/sec
- Timed buffered disk reads: 105 MB/sec

Kind regards.

ॐ
--
Swâmi Petaramesh PGP 9076E32E
Re: duperemove : some real world figures on BTRFS deduplication
Hi Chris, thanks for your answer,

On 12/09/2016 03:58 AM, Chris Murphy wrote:
> Can you check some bigger files and see if they've become fragmented?
> I'm seeing 1.4GiB files with 2-3 extents reported by filefrag, go to
> over 5000 fragments during dedupe. This is not something I recall
> happening some months ago.

I have checked directories containing VM hard disks, which would be good
candidates. As they're backed up using full rsyncs, I wouldn't expect
them to be heavily fragmented (OTOH the whole BTRFS filesystem is lzo
compressed, and I believe that it may affect the number of extents
reported by filefrag...?)

Anyway this is the number of fragments that I get for a bunch of VM HD
files which are in the range from a couple GB to about 20 GB.

The number of fragments reported by filefrag: 2907, 2560, 314, 10107

If compression has nothing to do with this, then this is heavy
fragmentation.

Kind regards.

ॐ
--
Swâmi Petaramesh PGP 9076E32E
Re: duperemove : some real world figures on BTRFS deduplication
> 2016-12-08 16:11 GMT+01:00 Swâmi Petaramesh:
> > Then it took another 48 hours just for "loading the hashes of duplicate
> > extents".

This issue I am currently addressing with the following patches:
https://github.com/Floyddotnet/duperemove/commits/digest_trigger

Tested with a 3.9 TB directory, with 4723 objects:

old implementation of dbfile_load_hashes took 36593ms
new implementation of dbfile_load_hashes took 11ms

You can use this version safely, but I have to do more work (for
example, a migration script for existing hashfiles).
Re: duperemove : some real world figures on BTRFS deduplication
On Thu, Dec 8, 2016 at 8:11 AM, Swâmi Petaramesh wrote:
> Well, the damn thing has been running for 15 days uninterrupted !
> ...Until I [Ctrl]-C it this morning as I had to move with the machine (I
> wasn't expecting it to last THAT long...).

Can you check some bigger files and see if they've become fragmented?
I'm seeing 1.4GiB files with 2-3 extents reported by filefrag, go to
over 5000 fragments during dedupe. This is not something I recall
happening some months ago.

I inadvertently replied to the wrong dedupe thread about my test and
what I'm finding; it's here:

https://www.spinics.net/lists/linux-btrfs/msg61304.html

But if you're seeing something similar, then it would explain why it's
so slow in your case.

--
Chris Murphy
Re: duperemove : some real world figures on BTRFS deduplication
On 2016-12-08 15:07, Jeff Mahoney wrote:
> On 12/8/16 10:42 AM, Austin S. Hemmelgarn wrote:
>> On 2016-12-08 10:11, Swâmi Petaramesh wrote:
>>> Hi, Some real world figures about running duperemove deduplication on BTRFS :
>>>
>>> I have an external 2,5", 5400 RPM, 1 TB HD, USB3, on which I store the BTRFS backups (full rsync) of 5 PCs, using 2 different distros, typically at the same update level, and all of them more or less sharing the entirety or part of the same set of user files.
>>>
>>> For each of these PCs I keep a series of 4-5 BTRFS subvolume snapshots for having complete backups at different points in time.
>>>
>>> The HD was full to 93% and made a good testbed for deduplicating.
>>>
>>> So I ran duperemove on this HD, on a machine doing "only this", using a hashfile. The machine being an Intel i5 with 6 GB of RAM.
>>>
>>> Well, the damn thing has been running for 15 days uninterrupted ! ...Until I [Ctrl]-C it this morning as I had to move with the machine (I wasn't expecting it to last THAT long...).
>>>
>>> It took about 48 hours just for calculating the files hashes.
>>>
>>> Then it took another 48 hours just for "loading the hashes of duplicate extents".
>>>
>>> Then it took 11 days deduplicating until I killed it.
>>>
>>> At the end, the disk that was 93% full is now 76% full, so I saved 17% of 1 TB (170 GB) by deduplicating for 15 days.
>>>
>>> Well the thing "works" and my disk isn't full anymore, so that's a very partial success, but still I wonder if the gain is worth the effort...
>>
>> So, some general explanation here:
>> Duperemove hashes data in blocks of (by default) 128kB, which means for ~930GB, you've got about 7618560 blocks to hash, which partly explains why it took so long to hash. Once that's done, it then has to compare hashes for all combinations of those blocks, which totals to 58042456473600 comparisons (hence that taking a long time). The block size thus becomes a trade-off between performance when hashing and actual space savings (smaller block size makes hashing take longer, but gives overall slightly better results for deduplication).
>
> IIRC, the core of the duperemove duplicate matcher isn't an O(n^2) algorithm. I think Mark used a bloom filter to reduce the data set prior to matching, but I haven't looked at the code in a while.

You're right, I had completely forgotten about that. Regardless of that
though, it's still a lot of processing that needs done.
Re: duperemove : some real world figures on BTRFS deduplication
On 12/8/16 10:42 AM, Austin S. Hemmelgarn wrote:
> On 2016-12-08 10:11, Swâmi Petaramesh wrote:
>> Hi, Some real world figures about running duperemove deduplication on BTRFS :
>>
>> I have an external 2,5", 5400 RPM, 1 TB HD, USB3, on which I store the BTRFS backups (full rsync) of 5 PCs, using 2 different distros, typically at the same update level, and all of them more or less sharing the entirety or part of the same set of user files.
>>
>> For each of these PCs I keep a series of 4-5 BTRFS subvolume snapshots for having complete backups at different points in time.
>>
>> The HD was full to 93% and made a good testbed for deduplicating.
>>
>> So I ran duperemove on this HD, on a machine doing "only this", using a hashfile. The machine being an Intel i5 with 6 GB of RAM.
>>
>> Well, the damn thing has been running for 15 days uninterrupted ! ...Until I [Ctrl]-C it this morning as I had to move with the machine (I wasn't expecting it to last THAT long...).
>>
>> It took about 48 hours just for calculating the files hashes.
>>
>> Then it took another 48 hours just for "loading the hashes of duplicate extents".
>>
>> Then it took 11 days deduplicating until I killed it.
>>
>> At the end, the disk that was 93% full is now 76% full, so I saved 17% of 1 TB (170 GB) by deduplicating for 15 days.
>>
>> Well the thing "works" and my disk isn't full anymore, so that's a very partial success, but still I wonder if the gain is worth the effort...
>
> So, some general explanation here:
> Duperemove hashes data in blocks of (by default) 128kB, which means for ~930GB, you've got about 7618560 blocks to hash, which partly explains why it took so long to hash. Once that's done, it then has to compare hashes for all combinations of those blocks, which totals to 58042456473600 comparisons (hence that taking a long time). The block size thus becomes a trade-off between performance when hashing and actual space savings (smaller block size makes hashing take longer, but gives overall slightly better results for deduplication).

IIRC, the core of the duperemove duplicate matcher isn't an O(n^2)
algorithm. I think Mark used a bloom filter to reduce the data set
prior to matching, but I haven't looked at the code in a while.

-Jeff

--
Jeff Mahoney
SUSE Labs
Re: duperemove : some real world figures on BTRFS deduplication
On 12/8/16 10:11 AM, Swâmi Petaramesh wrote:
> Hi, Some real world figures about running duperemove deduplication on BTRFS :
>
> I have an external 2,5", 5400 RPM, 1 TB HD, USB3, on which I store the BTRFS backups (full rsync) of 5 PCs, using 2 different distros, typically at the same update level, and all of them more or less sharing the entirety or part of the same set of user files.
>
> For each of these PCs I keep a series of 4-5 BTRFS subvolume snapshots for having complete backups at different points in time.
>
> The HD was full to 93% and made a good testbed for deduplicating.
>
> So I ran duperemove on this HD, on a machine doing "only this", using a hashfile. The machine being an Intel i5 with 6 GB of RAM.
>
> Well, the damn thing has been running for 15 days uninterrupted ! ...Until I [Ctrl]-C it this morning as I had to move with the machine (I wasn't expecting it to last THAT long...).
>
> It took about 48 hours just for calculating the files hashes.
>
> Then it took another 48 hours just for "loading the hashes of duplicate extents".
>
> Then it took 11 days deduplicating until I killed it.
>
> At the end, the disk that was 93% full is now 76% full, so I saved 17% of 1 TB (170 GB) by deduplicating for 15 days.
>
> Well the thing "works" and my disk isn't full anymore, so that's a very partial success, but still I wonder if the gain is worth the effort...

What version were you using? I know Mark had put a bunch of effort into
reducing the memory footprint and runtime. The earlier versions were
"can we get this thing working" while the newer versions are more
efficient.

What throughput are you getting to that disk? I get that it's USB3, but
reading 1TB doesn't take a terribly long time so 15 days is pretty
ridiculous.

At any rate, the good news is that when you run it again, assuming you
used the hash file, it will not have to rescan most of your data set.

-Jeff

--
Jeff Mahoney
SUSE Labs
Re: duperemove : some real world figures on BTRFS deduplication
2016-12-08 18:42 GMT+03:00 Austin S. Hemmelgarn:
> On 2016-12-08 10:11, Swâmi Petaramesh wrote:
>>
>> Hi, Some real world figures about running duperemove deduplication on BTRFS :
>>
>> I have an external 2,5", 5400 RPM, 1 TB HD, USB3, on which I store the BTRFS backups (full rsync) of 5 PCs, using 2 different distros, typically at the same update level, and all of them more or less sharing the entirety or part of the same set of user files.
>>
>> For each of these PCs I keep a series of 4-5 BTRFS subvolume snapshots for having complete backups at different points in time.
>>
>> The HD was full to 93% and made a good testbed for deduplicating.
>>
>> So I ran duperemove on this HD, on a machine doing "only this", using a hashfile. The machine being an Intel i5 with 6 GB of RAM.
>>
>> Well, the damn thing has been running for 15 days uninterrupted ! ...Until I [Ctrl]-C it this morning as I had to move with the machine (I wasn't expecting it to last THAT long...).
>>
>> It took about 48 hours just for calculating the files hashes.
>>
>> Then it took another 48 hours just for "loading the hashes of duplicate extents".
>>
>> Then it took 11 days deduplicating until I killed it.
>>
>> At the end, the disk that was 93% full is now 76% full, so I saved 17% of 1 TB (170 GB) by deduplicating for 15 days.
>>
>> Well the thing "works" and my disk isn't full anymore, so that's a very partial success, but still I wonder if the gain is worth the effort...
>
> So, some general explanation here:
> Duperemove hashes data in blocks of (by default) 128kB, which means for ~930GB, you've got about 7618560 blocks to hash, which partly explains why it took so long to hash. Once that's done, it then has to compare hashes for all combinations of those blocks, which totals to 58042456473600 comparisons (hence that taking a long time). The block size thus becomes a trade-off between performance when hashing and actual space savings (smaller block size makes hashing take longer, but gives overall slightly better results for deduplication).
>
> As far as the rest, given your hashing performance (which is not particularly good I might add, roughly 5.6MB/s), the amount of time it was taking to do the actual deduplication is reasonable since the deduplication ioctl does a byte-wise comparison of the extents to be deduplicated prior to actually ref-linking them to ensure you don't lose data.
>
> Because of this, generic batch deduplication is not all that great on BTRFS. There are cases where it can work, but usually they're pretty specific cases. In most cases though, you're better off doing a custom tool that knows about how your data is laid out and what's likely to be duplicated (I've actually got two tools for this for the two cases where I use deduplication, they use knowledge of the data-set itself to figure out what's duplicated, then just call the ioctl through a wrapper (previously the one included in duperemove, currently xfs_io)).

Zygo did a good job on this too. Try: https://github.com/Zygo/bees

It's cool and can work better on large masses of data, because it
dedups at the same time as the scanning phase.

--
Have a nice day,
Timofey.
Re: duperemove : some real world figures on BTRFS deduplication
On 2016-12-08 10:11, Swâmi Petaramesh wrote:
> Hi, Some real world figures about running duperemove deduplication on BTRFS :
>
> I have an external 2,5", 5400 RPM, 1 TB HD, USB3, on which I store the BTRFS backups (full rsync) of 5 PCs, using 2 different distros, typically at the same update level, and all of them more or less sharing the entirety or part of the same set of user files.
>
> For each of these PCs I keep a series of 4-5 BTRFS subvolume snapshots for having complete backups at different points in time.
>
> The HD was full to 93% and made a good testbed for deduplicating.
>
> So I ran duperemove on this HD, on a machine doing "only this", using a hashfile. The machine being an Intel i5 with 6 GB of RAM.
>
> Well, the damn thing has been running for 15 days uninterrupted ! ...Until I [Ctrl]-C it this morning as I had to move with the machine (I wasn't expecting it to last THAT long...).
>
> It took about 48 hours just for calculating the files hashes.
>
> Then it took another 48 hours just for "loading the hashes of duplicate extents".
>
> Then it took 11 days deduplicating until I killed it.
>
> At the end, the disk that was 93% full is now 76% full, so I saved 17% of 1 TB (170 GB) by deduplicating for 15 days.
>
> Well the thing "works" and my disk isn't full anymore, so that's a very partial success, but still I wonder if the gain is worth the effort...

So, some general explanation here:

Duperemove hashes data in blocks of (by default) 128kB, which means for
~930GB, you've got about 7618560 blocks to hash, which partly explains
why it took so long to hash. Once that's done, it then has to compare
hashes for all combinations of those blocks, which totals to
58042456473600 comparisons (hence that taking a long time). The block
size thus becomes a trade-off between performance when hashing and
actual space savings (smaller block size makes hashing take longer, but
gives overall slightly better results for deduplication).

As far as the rest, given your hashing performance (which is not
particularly good I might add, roughly 5.6MB/s), the amount of time it
was taking to do the actual deduplication is reasonable since the
deduplication ioctl does a byte-wise comparison of the extents to be
deduplicated prior to actually ref-linking them to ensure you don't
lose data.

Because of this, generic batch deduplication is not all that great on
BTRFS. There are cases where it can work, but usually they're pretty
specific cases. In most cases though, you're better off doing a custom
tool that knows about how your data is laid out and what's likely to be
duplicated (I've actually got two tools for this for the two cases
where I use deduplication, they use knowledge of the data-set itself to
figure out what's duplicated, then just call the ioctl through a
wrapper (previously the one included in duperemove, currently xfs_io)).
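The block-count figures above can be checked with a couple of lines. Note the quoted comparison count is the naive all-pairs figure (blocks squared); as pointed out elsewhere in the thread, duperemove's bloom filter avoids actually doing n^2 comparisons:

```python
GiB = 1024 ** 3
data_size = 930 * GiB      # ~930GB of data on the disk
block_size = 128 * 1024    # duperemove's default dedupe block size

blocks = data_size // block_size
comparisons = blocks * blocks  # naive pairwise hash comparisons

print(blocks)       # 7618560
print(comparisons)  # 58042456473600
```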
duperemove : some real world figures on BTRFS deduplication
Hi, Some real world figures about running duperemove deduplication on
BTRFS :

I have an external 2,5", 5400 RPM, 1 TB HD, USB3, on which I store the
BTRFS backups (full rsync) of 5 PCs, using 2 different distros,
typically at the same update level, and all of them more or less sharing
the entirety or part of the same set of user files.

For each of these PCs I keep a series of 4-5 BTRFS subvolume snapshots
for having complete backups at different points in time.

The HD was full to 93% and made a good testbed for deduplicating.

So I ran duperemove on this HD, on a machine doing "only this", using a
hashfile. The machine being an Intel i5 with 6 GB of RAM.

Well, the damn thing has been running for 15 days uninterrupted !
...Until I [Ctrl]-C it this morning as I had to move with the machine (I
wasn't expecting it to last THAT long...).

It took about 48 hours just for calculating the files hashes.

Then it took another 48 hours just for "loading the hashes of duplicate
extents".

Then it took 11 days deduplicating until I killed it.

At the end, the disk that was 93% full is now 76% full, so I saved 17%
of 1 TB (170 GB) by deduplicating for 15 days.

Well the thing "works" and my disk isn't full anymore, so that's a very
partial success, but still I wonder if the gain is worth the effort...

Best regards.

ॐ
--
Swâmi Petaramesh PGP 9076E32E
bees v0.1 - Best-Effort Extent-Same, a btrfs deduplication daemon
I made a thing!

Bees ("Best-Effort Extent-Same") is a dedup daemon for btrfs.

Bees is a block-oriented userspace dedup designed to avoid scalability
problems on large filesystems.

Bees is designed to degrade gracefully when underprovisioned with RAM.
Bees does not use more RAM or storage as filesystem data size increases.
The dedup hash table size is fixed at creation time and does not change.
The effective dedup block size is dynamic and adjusts automatically to
fit the hash table into the configured RAM limit. Hash table overflow is
not implemented, to eliminate the IO overhead of hash table overflow.
Hash table entries are only 16 bytes per dedup block to keep the average
dedup block size small.

Bees does not require alignment between dedup blocks or extent
boundaries (i.e. it can handle any multiple-of-4K offset between dup
block pairs). Bees rearranges blocks into shared and unique extents if
required to work within current btrfs kernel dedup limitations. Bees can
dedup any combination of compressed and uncompressed extents.

Bees operates in a single pass which removes duplicate extents
immediately during scan. There are no separate scanning and dedup
phases.

Bees uses only data-safe btrfs kernel operations, so it can dedup live
data (e.g. build servers, sqlite databases, VM disk images). It does not
modify file attributes or timestamps.

Bees does not store any information about filesystem structure, so it is
not affected by the number or size of files (except to the extent that
these cause performance problems for btrfs in general). It retrieves
such information on demand through btrfs SEARCH_V2 and LOGICAL_INO
ioctls. This eliminates the storage required to maintain the equivalents
of these functions in userspace. It's also why bees has no XFS support.

Bees is a daemon designed to run continuously and maintain its state
across crashes and reboots. Bees uses checkpoints for persistence to
eliminate the IO overhead of a transactional data store. On restart,
bees will dedup any data that was added to the filesystem since the last
checkpoint.

I use bees to dedup filesystems ranging in size from 16GB to 35TB, with
hash tables ranging in size from 128MB to 11GB.

It's well past time for a v0.1 release, so here it is! Bees is available
on Github:

https://github.com/Zygo/bees

Please enjoy this code.
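The RAM/block-size trade-off in the announcement follows from the 16-byte hash table entries: with a fixed-size table, the average dedup block size must grow with the data. This is a rough model of that arithmetic (my sketch based on the description above, not code from bees):

```python
ENTRY_BYTES = 16  # per dedup block, per the announcement

def effective_block_size(data_bytes, hash_table_bytes):
    """Average bytes of data covered by each hash table entry."""
    entries = hash_table_bytes // ENTRY_BYTES
    return data_bytes // entries

MiB, GiB, TiB = 2**20, 2**30, 2**40
# The 16GB/128MB-table and 35TB/11GB-table setups mentioned above:
print(effective_block_size(16 * GiB, 128 * MiB))  # 2048
print(effective_block_size(35 * TiB, 11 * GiB))
```

So on the small filesystem each entry covers about 2KiB of data, while on the 35TB one each entry covers tens of kilobytes, which is what "the effective dedup block size adjusts automatically" means in practice.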
Re: [PATCH 8/9] vfs: hoist the btrfs deduplication ioctl to the vfs
Hi Darrick,

On 01/12/2016 08:14 PM, Darrick J. Wong wrote:
> [adding btrfs to the cc since we're talking about a whole new dedupe interface]

In the discussion below, many points of possible improvement were noted
for the man page. Would you be willing to put together a patch please?

Thanks,

Michael

> On Tue, Jan 12, 2016 at 12:07:14AM -0600, Eric Biggers wrote:
>> Some feedback on the VFS portion of the FIDEDUPERANGE ioctl and its man page... (note: I realize the patch is mostly just moving the code that already existed in btrfs, but in the VFS it deserves a more thorough review):
>
> Wheee. :)
>
> Yes, let's discuss the concerns about the btrfs extent same ioctl. I believe Christoph dislikes the odd return mechanism (i.e. status and bytes_deduped) and doubts that the vectorization is really necessary.
>
> There's not a lot of documentation to go on aside from "Do whatever the BTRFS ioctl does". I suspect that will leave my explanations lacking, since I neither designed the btrfs interface nor know all that much about the decisions made to arrive at what we have now. (I agree with both of hch's complaints.)
>
> Really, the best argument for keeping this ioctl is to avoid breaking duperemove. Even then, given that current duperemove checks for btrfs before trying to use BTRFS_IOC_EXTENT_SAME, we could very well design a new dedupe ioctl for the VFS, hook the new dedupers (XFS) into the new VFS ioctl leaving the old btrfs ioctl intact, and train duperemove to try the new ioctl and fall back on the btrfs one if the VFS ioctl isn't supported.
>
> Frankly, I also wouldn't mind changing the VFS dedupe ioctl to something that resembles the clone_range interface:
>
> int ioctl(int dest_fd, FIDEDUPERANGE, struct file_dedupe_range *arg);
>
> struct file_dedupe_range {
> 	__s64 src_fd;
> 	__u64 src_offset;
> 	__u64 length;
> 	__u64 dest_offset;
> 	__u64 flags;
> };
>
> "See if the byte range src_offset:length in src_fd matches all of dest_offset:length in dest_fd; if so, share src_fd's physical storage with dest_fd. Both fds must be files; if they are the same file the ranges cannot overlap; src_fd must be readable; dest_fd must be writable or append-only. Offsets and lengths probably need to be block-aligned, but that is filesystem dependent."
>
> The error conditions would be a superset of the ones we know about today. I'd return EOVERFLOW or something if length is longer than the FS wants to deal with.
>
> Now all the vectorization problems go away, and since it's a new VFS interface we can define everything from the start.
>
> Christoph, if this new interface solves your complaints I think I'd like to get started on the code/docs soon.
>
>> At high level, I am confused about what is meant by the "source" and "destination" files. I understand that with more than two files, you effectively have to choose one file to treat specially and dedupe with all the other files (an NxN comparison isn't realistic). But with just two files, a deduplication operation should be completely symmetric, should it not? The end
>
> Not sure what you mean by 'symmetric', but in any case the convention seems to be that src_fd's storage is shared with dest_fd if there's a match.
>
>> result should be that the data is deduplicated, regardless of the order in which I gave the file descriptors. So why is there some non-symmetric behavior? There are several examples but one is that the VFS is checking !S_ISREG() on the "source" file descriptor but not on the "destination" file descriptor.
>
> The dedupe_range function pointer should only be supplied for regular files.
>
>> Another is that different permissions are required on the source versus on the destination. If there are good reasons for the nonsymmetry then this needs to be clearly explained in the man page; otherwise it may not be clear what to use as the "source" and what to use as the "destination".
>>
>> It seems odd to be adding "copy" as a system call but then have "dedupe" and "clone" as ioctls rather than system calls... it seems that they should all be one or the other (at least, if we put aside the fact that the ioctls already exist in btrfs).
>
> We can't put the clone ioctl aside; coreutils has already started using it. I'm not sure if clone_range or extent_same are all that popular, though. AFAIK duperemove is the only program using extent_same, and I don't know of anything using clone_range. (Well, xfs_io does...)
>
>> The range checking in clone_verify_area() appears incomplete. Someone could provide len=UINT64_MAX and all the checks would still pass even though 'pos+len' would overflow.
>
> Yeah...
>
>> Should the ioctl be interruptible? Right now it always goes through *all* the 'struct file_dedupe_range_info's you passed in --- potentially up to 65535 of them.
>
> There probably ought to be explicit signal checks, or we could just get rid of the vectorization entirely. :)
>
>> Why 'info->bytes_deduped += deduped' rather than 'info->bytes_deduped = deduped'?
>
> 'bytes_deduped' is per file descriptor,
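For reference, the btrfs-derived interface under discussion is what eventually shipped in the VFS as FIDEDUPERANGE, keeping the vectorized header-plus-info[] layout rather than the flat structure Darrick sketches above. A sketch of packing and issuing the call from Python (field layout per <linux/fs.h>; actually running it of course requires a filesystem with dedupe support, so treat this as illustrative):

```python
import fcntl
import struct

FIDEDUPERANGE = 0xC0189436  # _IOWR(0x94, 54, struct file_dedupe_range)
FILE_DEDUPE_RANGE_SAME = 0  # per-destination status values
FILE_DEDUPE_RANGE_DIFFERS = 1

def pack_dedupe_range(src_offset, length, dests):
    """Build the ioctl argument: a 24-byte file_dedupe_range header
    followed by one 32-byte file_dedupe_range_info per destination,
    where dests is a list of (dest_fd, dest_offset) pairs."""
    hdr = struct.pack('=QQHHI', src_offset, length, len(dests), 0, 0)
    infos = b''.join(struct.pack('=qQQiI', fd, off, 0, 0, 0)
                     for fd, off in dests)
    return bytearray(hdr + infos)

def dedupe(src_fd, src_offset, length, dests):
    """Ask the kernel to dedupe.  The kernel verifies that the ranges
    are identical before sharing storage -- the safety property being
    debated in this thread."""
    buf = pack_dedupe_range(src_offset, length, dests)
    fcntl.ioctl(src_fd, FIDEDUPERANGE, buf)
    # (bytes_deduped, status) come back in each info[] slot.
    return [struct.unpack_from('=qQQiI', bytes(buf), 24 + 32 * i)[2:4]
            for i in range(len(dests))]
```

The odd return mechanism Christoph objects to is visible here: success or FILE_DEDUPE_RANGE_DIFFERS is reported per destination inside the buffer, not via the ioctl's return value.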
Re: btrfs deduplication and linux cache management
On Mon, Nov 03, 2014 at 03:09:11PM +0100, LuVar wrote:
> Thanks for the nice replicate-at-home-yourself example. On my machine it is behaving precisely like in yours:
>
> code
> root@blackdawn:/home/luvar# sync; sysctl vm.drop_caches=1
> vm.drop_caches = 1
> root@blackdawn:/home/luvar# time cat /home/luvar/programs/adt-bundle-linux/sdk/system-images/android-L/default/armeabi-v7a/userdata.img > /dev/null
> real	0m6.768s
> user	0m0.016s
> sys	0m0.599s
> root@blackdawn:/home/luvar# time cat /home/luvar/programs/android-sdk-linux/system-images/android-L/default/armeabi-v7a/userdata.img > /dev/null
> real	0m5.259s
> user	0m0.018s
> sys	0m0.695s
> root@blackdawn:/home/luvar# time cat /home/luvar/programs/adt-bundle-linux/sdk/system-images/android-L/default/armeabi-v7a/userdata.img > /dev/null
> real	0m0.701s
> user	0m0.014s
> sys	0m0.288s
> root@blackdawn:/home/luvar# time cat /home/luvar/programs/android-sdk-linux/system-images/android-L/default/armeabi-v7a/userdata.img > /dev/null
> real	0m0.286s
> user	0m0.013s
> sys	0m0.272s
> /code
>
> If you don't mind me asking, is there any plan to optimize this behaviour? I know that btrfs is not like ZFS (a whole system from block device, through cache, to VFS), so would it be possible to implement such an optimization without a major patch to the linux block cache/VFS cache?

I'd like to know this too. I think not any time soon though.

AIUI (I'm not really an expert here), the VFS cache is keyed on tuples
of (device:inode, offset), so it has no way to cope with aliasing the
same physical blocks through distinct inodes. It would have to learn
about reference counting (so multiple inodes can refer to shared
blocks, one inode can refer to the same blocks twice, etc) and
copy-on-write (so we can modify just one share of a shared-extent cache
page). For compressed data caching, the filesystem would be
volunteering references to blocks that were not asked for (e.g. unread
portions of compressed extents).

It's not impossible to make those changes to the VFS cache, but the
only filesystem on mainline Linux that would benefit is btrfs (ZFS is
not on mainline Linux, the ZFS maintainers probably prefer to use their
own cache layer anyway, and nobody else shares extents between files).
For filesystems that don't share extents, adding the necessary stuff to
the VFS is a lot of extra overhead they will never use.

Back in the day, the Linux cache used to use tuples of (device,
block_number), but this approach doesn't work on non-block filesystems
like NFS, so it was dropped in favor of the inode+offset caching. A
block-based scheme would handle shared extents but not compressed ones
(e.g. you've got a 4K cacheable page that was compressed to 312 bytes
somewhere in the middle of a 57K compressed data extent... what's that
page's block number, again?).

> Thanks, have a nice day,
>
> --
> LuVar
>
> - Zygo Blaxell zblax...@furryterror.org wrote:
>> On Thu, Oct 30, 2014 at 10:26:07AM +0100, lu...@plaintext.sk wrote:
>>> Hi, I want to ask, if deduplicated file content will be cached in linux kernel just once for two deduplicated files. To explain in deep: - I use btrfs for whole system with few subvolumes with some compression on some subvolumes. - I have two directories with eclipse SDK with slight differences (same version, different config) - I assume that given directories are deduplicated and so two eclipse installations take place on hdd like one would (in rough estimation) - I will start one of the given eclipses - linux kernel will cache all opened files during start of eclipse (I have enough free ram) - I am just a happy stupid linux user: 1. will kernel cache file content after decompression? (I think yes) 2. will cached data be in the VFS layer or in the block device layer?
>>
>> My guess based on behavior is the VFS layer. See below.
>>
>>> - When I launch the second eclipse (different from the first, but deduplicated from the first) after the first one: 1. will the second start require less data to be read from HDD?
>>
>> No.
>>
>>> 2. will the metadata for the second instance be read from hdd? (I assume yes)
>>
>> Yes (how could it not?).
>>
>>> 3. will the actual data be read a second time? (I hope not)
>>
>> Unfortunately, yes. This is my test:
>>
>> 1. Create a file full of compressible data that is big enough to take a few seconds to read from disk, but not too big to fit in RAM: yes $(date) | head -c 500m > a
>> 2. Create a deduplicated (shared extent) copy of same: cp --reflink=always a b (use filefrag -v to verify both files have same physical extents)
>> 3. Drop caches: sync; sysctl vm.drop_caches=1
>> 4. Time reading both files with cold and hot cache: time cat a > /dev/null; time cat b > /dev/null; time cat a > /dev/null; time cat b > /dev/null
>>
>> Ideally, the first 'cat a' would load the file back from disk, so it will take a long
Re: btrfs deduplication and linux cache management
Thanks for the nice replicate-at-home-yourself example. On my machine it behaves precisely like in yours:

    root@blackdawn:/home/luvar# sync; sysctl vm.drop_caches=1
    vm.drop_caches = 1
    root@blackdawn:/home/luvar# time cat /home/luvar/programs/adt-bundle-linux/sdk/system-images/android-L/default/armeabi-v7a/userdata.img > /dev/null

    real    0m6.768s
    user    0m0.016s
    sys     0m0.599s
    root@blackdawn:/home/luvar# time cat /home/luvar/programs/android-sdk-linux/system-images/android-L/default/armeabi-v7a/userdata.img > /dev/null

    real    0m5.259s
    user    0m0.018s
    sys     0m0.695s
    root@blackdawn:/home/luvar# time cat /home/luvar/programs/adt-bundle-linux/sdk/system-images/android-L/default/armeabi-v7a/userdata.img > /dev/null

    real    0m0.701s
    user    0m0.014s
    sys     0m0.288s
    root@blackdawn:/home/luvar# time cat /home/luvar/programs/android-sdk-linux/system-images/android-L/default/armeabi-v7a/userdata.img > /dev/null

    real    0m0.286s
    user    0m0.013s
    sys     0m0.272s

If you don't mind me asking, is there any plan to optimize this behaviour? I know that btrfs is not like ZFS (one whole system from block device, through cache, to VFS), so would it be possible to implement such an optimization without a major patch to the Linux block cache/VFS cache?

Thanks, have a nice day,

--
LuVar

----- Zygo Blaxell <zblax...@furryterror.org> wrote: -----

> On Thu, Oct 30, 2014 at 10:26:07AM +0100, lu...@plaintext.sk wrote:
> > Hi, I want to ask if deduplicated file content will be cached in the
> > linux kernel just once for two deduplicated files. To explain in depth:
> > - I use btrfs for the whole system, with a few subvolumes and
> >   compression on some subvolumes.
> > - I have two directories with the eclipse SDK with slight differences
> >   (same version, different config).
> > - I assume that the given directories are deduplicated, so the two
> >   eclipse installations take about as much space on the hdd as one
> >   would (rough estimation).
> > - I will start one of the given eclipses.
> > - The linux kernel will cache all files opened during the start of
> >   eclipse (I have enough free ram).
> > - I am just a happy stupid linux user:
> >   1. Will the kernel cache file content after decompression? (I think yes)
> >   2. Will cached data be in the VFS layer or in the block device layer?
>
> My guess based on behavior is the VFS layer. See below.
>
> > - When I launch the second eclipse (different from the first, but
> >   deduplicated from the first) after the first one:
> >   1. Will the second start require less data to be read from HDD?
>
> No.
>
> >   2. Will metadata for the second instance be read from hdd? (I assume yes)
>
> Yes (how could it not?).
>
> >   3. Will the actual data be read a second time? (I hope not)
>
> Unfortunately, yes. This is my test:
>
> 1. Create a file full of compressible data that is big enough to take
>    a few seconds to read from disk, but not too big to fit in RAM:
>
>        yes $(date) | head -c 500m > a
>
> 2. Create a deduplicated (shared extent) copy of same:
>
>        cp --reflink=always a b
>
>    (use filefrag -v to verify both files have the same physical extents)
>
> 3. Drop caches:
>
>        sync; sysctl vm.drop_caches=1
>
> 4. Time reading both files with cold and hot cache:
>
>        time cat a > /dev/null
>        time cat b > /dev/null
>        time cat a > /dev/null
>        time cat b > /dev/null
>
> Ideally, the first 'cat a' would load the file back from disk, so it
> would take a long time, and the other three would be very fast as the
> shared extent data would already be in RAM. Here is what actually
> happens on 3.17.1:
>
>        time cat a > /dev/null
>
>        real    0m18.870s
>        user    0m0.017s
>        sys     0m3.432s
>
>        time cat b > /dev/null
>
>        real    0m16.931s
>        user    0m0.007s
>        sys     0m3.357s
>
>        time cat a > /dev/null
>
>        real    0m0.141s
>        user    0m0.001s
>        sys     0m0.136s
>
>        time cat b > /dev/null
>
>        real    0m0.121s
>        user    0m0.002s
>        sys     0m0.116s
>
> Above we see that reading 'b' the first time takes almost as long as 'a'.
> The second reads are cached, so they finish two orders of magnitude
> faster. That suggests that deduplicated extents are read and cached as
> entirely separate copies of the data. The sys time for the first read
> of 'b' would imply separate decompression as well.
>
> Compare the above result with a hardlink, which might behave more like
> what we expect:
>
>        rm -f b
>        ln a b
>        sync; sysctl vm.drop_caches=1
>
>        time cat a > /dev/null
>
>        real    0m20.262s
>        user    0m0.010s
>        sys     0m3.376s
>
>        time cat b > /dev/null
>
>        real    0m0.125s
>        user    0m0.003s
>        sys     0m0.120s
>
>        time cat a > /dev/null
>
>        real    0m0.103s
>        user    0m0.004s
>        sys     0m0.097s
>
>        time cat b > /dev/null
>
>        real    0m0.098s
>        user    0m0.002s
>        sys     0m0.091s
>
> Above we clearly see that we read 'a' from disk only once, and use the
> cache three times.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
btrfs deduplication and linux cache management
Hi,

I want to ask if deduplicated file content will be cached in the linux kernel just once for two deduplicated files. To explain in depth:

- I use btrfs for the whole system, with a few subvolumes and compression on some subvolumes.
- I have two directories with the eclipse SDK with slight differences (same version, different config).
- I assume that the given directories are deduplicated, so the two eclipse installations take about as much space on the hdd as one would (rough estimation).
- I will start one of the given eclipses.
- The linux kernel will cache all files opened during the start of eclipse (I have enough free ram).
- I am just a happy stupid linux user:
  1. Will the kernel cache file content after decompression? (I think yes)
  2. Will cached data be in the VFS layer or in the block device layer?
- When I launch the second eclipse (different from the first, but deduplicated from the first) after the first one:
  1. Will the second start require less data to be read from HDD?
  2. Will metadata for the second instance be read from hdd? (I assume yes)
  3. Will the actual data be read a second time? (I hope not)

Thanks for answers, have a nice day,

--
LuVar
Re: btrfs deduplication and linux cache management
On 2014-10-30 05:26, lu...@plaintext.sk wrote:
> Hi, I want to ask if deduplicated file content will be cached in the
> linux kernel just once for two deduplicated files. To explain in depth:
> - I use btrfs for the whole system, with a few subvolumes and
>   compression on some subvolumes.
> - I have two directories with the eclipse SDK with slight differences
>   (same version, different config).
> - I assume that the given directories are deduplicated, so the two
>   eclipse installations take about as much space on the hdd as one
>   would (rough estimation).
> - I will start one of the given eclipses.
> - The linux kernel will cache all files opened during the start of
>   eclipse (I have enough free ram).
> - I am just a happy stupid linux user:
>   1. Will the kernel cache file content after decompression? (I think yes)
>   2. Will cached data be in the VFS layer or in the block device layer?
> - When I launch the second eclipse (different from the first, but
>   deduplicated from the first) after the first one:
>   1. Will the second start require less data to be read from HDD?
>   2. Will metadata for the second instance be read from hdd? (I assume yes)
>   3. Will the actual data be read a second time? (I hope not)
>
> Thanks for answers, have a nice day,

I don't know for certain, but here is how I understand things work in this case:

1. Individual blocks are cached in the block device layer, which means the deduplicated data would be cached at most as many times as the number of disks it lives on (i.e. at most once on a single-device filesystem, up to twice on a multi-device btrfs raid1 setup).

2. In the VFS layer, the cache holds decoded inodes (the actual file metadata), dentries (the file's entry in its parent directory), and individual pages of file content (after decompression). AFAIK, the VFS layer's cache is pathname-based, so it would probably cache two copies of the data, but after the metadata look-up it wouldn't need to read from the disk because of the block layer cache.

Overall, this means that while deduplicated data may be cached more than once, it shouldn't need to be reread from disk if there is still a copy in cache. Metadata may or may not need to be read from the disk, depending on what is in the VFS cache.
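[One way to see why hardlinks and reflinks land in different places here: a hardlink is literally the same inode under a second name, while a reflink copy is a new inode that merely shares extents on disk, and Linux keys cached file pages per inode. A minimal sketch of the inode side of this, with arbitrary file names; it runs on any filesystem, since hardlinks don't need btrfs:]

```python
import os
import tempfile

# Arbitrary demo paths on any writable filesystem.
d = tempfile.mkdtemp()
a = os.path.join(d, "a")
b = os.path.join(d, "b")

with open(a, "w") as f:
    f.write("same bytes\n")

# Hardlink: two names, one inode -- so reads of 'a' and 'b'
# hit the same per-inode cached pages.
os.link(a, b)
print(os.stat(a).st_ino == os.stat(b).st_ino)  # True

# A reflink copy (cp --reflink=always) would instead create a
# *new* inode that only shares extents on disk, so its pages
# are cached separately -- matching the timings in this thread.
```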
Re: btrfs deduplication and linux cache management
On Thu, Oct 30, 2014 at 10:26:07AM +0100, lu...@plaintext.sk wrote:
> Hi, I want to ask if deduplicated file content will be cached in the
> linux kernel just once for two deduplicated files. To explain in depth:
> - I use btrfs for the whole system, with a few subvolumes and
>   compression on some subvolumes.
> - I have two directories with the eclipse SDK with slight differences
>   (same version, different config).
> - I assume that the given directories are deduplicated, so the two
>   eclipse installations take about as much space on the hdd as one
>   would (rough estimation).
> - I will start one of the given eclipses.
> - The linux kernel will cache all files opened during the start of
>   eclipse (I have enough free ram).
> - I am just a happy stupid linux user:
>   1. Will the kernel cache file content after decompression? (I think yes)
>   2. Will cached data be in the VFS layer or in the block device layer?

My guess based on behavior is the VFS layer. See below.

> - When I launch the second eclipse (different from the first, but
>   deduplicated from the first) after the first one:
>   1. Will the second start require less data to be read from HDD?

No.

>   2. Will metadata for the second instance be read from hdd? (I assume yes)

Yes (how could it not?).

>   3. Will the actual data be read a second time? (I hope not)

Unfortunately, yes. This is my test:

1. Create a file full of compressible data that is big enough to take a
   few seconds to read from disk, but not too big to fit in RAM:

       yes $(date) | head -c 500m > a

2. Create a deduplicated (shared extent) copy of same:

       cp --reflink=always a b

   (use filefrag -v to verify both files have the same physical extents)

3. Drop caches:

       sync; sysctl vm.drop_caches=1

4. Time reading both files with cold and hot cache:

       time cat a > /dev/null
       time cat b > /dev/null
       time cat a > /dev/null
       time cat b > /dev/null

Ideally, the first 'cat a' would load the file back from disk, so it
would take a long time, and the other three would be very fast as the
shared extent data would already be in RAM. Here is what actually
happens on 3.17.1:

       time cat a > /dev/null

       real    0m18.870s
       user    0m0.017s
       sys     0m3.432s

       time cat b > /dev/null

       real    0m16.931s
       user    0m0.007s
       sys     0m3.357s

       time cat a > /dev/null

       real    0m0.141s
       user    0m0.001s
       sys     0m0.136s

       time cat b > /dev/null

       real    0m0.121s
       user    0m0.002s
       sys     0m0.116s

Above we see that reading 'b' the first time takes almost as long as 'a'.
The second reads are cached, so they finish two orders of magnitude
faster. That suggests that deduplicated extents are read and cached as
entirely separate copies of the data. The sys time for the first read
of 'b' would imply separate decompression as well.

Compare the above result with a hardlink, which might behave more like
what we expect:

       rm -f b
       ln a b
       sync; sysctl vm.drop_caches=1

       time cat a > /dev/null

       real    0m20.262s
       user    0m0.010s
       sys     0m3.376s

       time cat b > /dev/null

       real    0m0.125s
       user    0m0.003s
       sys     0m0.120s

       time cat a > /dev/null

       real    0m0.103s
       user    0m0.004s
       sys     0m0.097s

       time cat b > /dev/null

       real    0m0.098s
       user    0m0.002s
       sys     0m0.091s

Above we clearly see that we read 'a' from disk only once, and use the
cache three times.
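[For scale, the "two orders of magnitude" above checks out: dividing the cold read times by the cached ones, using the numbers copied from the timings in this message, gives roughly 130-140x.]

```python
# Wall-clock times (seconds) from the reflink test above:
# first (cold) vs. repeated (cached) reads of each file.
cold = {"a": 18.870, "b": 16.931}
hot  = {"a": 0.141,  "b": 0.121}

for name in ("a", "b"):
    ratio = cold[name] / hot[name]
    print(f"{name}: about {ratio:.0f}x faster from cache")
# a: about 134x faster from cache
# b: about 140x faster from cache
```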
Re: BTRFS deduplication
On Thu, May 12, 2011 at 07:52:20AM +0200, Swâmi Petaramesh wrote:
> Hi again list,
>
> I've seen in a message dating back to January that offline deduplication
> has been implemented in BTRFS, but I can't find it in my btrfs-tools
> 0.19+20100601-3ubuntu2. Has it reached a release, or not yet? How could I
> give it a try?
>
> I've seen a discussion about whether deduplication should be done offline
> or online; my usage case is to back up a number of laptops, all having
> about the same software and many files in common, to a single backup
> server using rsync. I would be very much interested in online
> deduplication, because I don't have n times the storage space that
> offline dedup might temporarily need, and because performance isn't
> crucial for this application, as backups can be done overnight...
>
> Thanks in advance :-)

So the btrfs-progs patch only exists on the mailing list and the kernel patch is sitting in my git tree. This was more of a weekend project and less of a serious attempt at an actual solution. It could be cleaned up and actually used, but I'm not at all interested in doing that :).

Thanks,

Josef
BTRFS deduplication
Hi again list,

I've seen in a message dating back to January that offline deduplication has been implemented in BTRFS, but I can't find it in my btrfs-tools 0.19+20100601-3ubuntu2. Has it reached a release, or not yet? How could I give it a try?

I've seen a discussion about whether deduplication should be done offline or online; my usage case is to back up a number of laptops, all having about the same software and many files in common, to a single backup server using rsync. I would be very much interested in online deduplication, because I don't have n times the storage space that offline dedup might temporarily need, and because performance isn't crucial for this application, as backups can be done overnight...

Thanks in advance :-)