Re: dduper - Offline btrfs deduplication tool

2018-09-07 Thread Zygo Blaxell
On Fri, Sep 07, 2018 at 09:27:28AM +0530, Lakshmipathi.G wrote:
> > 
> > One question:
> > Why not ioctl_fideduperange?
> > i.e. you kill most of the benefits of that ioctl - atomicity.
> > 
> I plan to add fideduperange as an option too. Users can
> choose between the fideduperange and ficlonerange calls.
> 
> If I'm not wrong, with fideduperange the kernel performs
> a comparison check before dedupe, and that will increase
> the time to dedupe files.

Creating the backup reflink file takes far more time than you will ever
save by skipping fideduperange.

You don't need the md5sum either, unless you have a data set that is
full of crc32 collisions (e.g. a file format that puts a CRC32 at the
end of each 4K block).  The few people who have such a data set can
enable md5sums, everyone else can have md5sums disabled by default.

> I believe the risk involved with ficlonerange is minimized
> by having a backup (reflinked) file. We can revert to the
> original file if we encounter any problems.

With fideduperange the risk is more than minimized--it's completely
eliminated.

If you don't use fideduperange you can't use the tool on a live data
set at all.
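
For reference, the atomic path is a single ioctl; here is a minimal sketch
against the FIDEDUPERANGE uapi in <linux/fs.h> (an illustration only, not
dduper's code):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* FIDEDUPERANGE, struct file_dedupe_range */

/* Ask the kernel to dedupe len bytes at src_off in src_fd into dst_fd at
 * dst_off.  The kernel locks both ranges, compares them, and only shares
 * the extents if they are byte-identical -- that is the atomicity being
 * discussed here. */
static int dedupe_range(int src_fd, int dst_fd, __u64 src_off,
                        __u64 dst_off, __u64 len)
{
    /* header plus one destination record */
    struct file_dedupe_range *arg =
        calloc(1, sizeof(*arg) + sizeof(struct file_dedupe_range_info));
    int ret;

    if (!arg)
        return -1;

    arg->src_offset = src_off;
    arg->src_length = len;
    arg->dest_count = 1;
    arg->info[0].dest_fd = dst_fd;
    arg->info[0].dest_offset = dst_off;

    ret = ioctl(src_fd, FIDEDUPERANGE, arg);
    if (ret < 0)
        perror("FIDEDUPERANGE");
    else if (arg->info[0].status == FILE_DEDUPE_RANGE_DIFFERS)
        fprintf(stderr, "ranges differ, nothing shared\n");
    else if (arg->info[0].status < 0)
        fprintf(stderr, "dedupe: %s\n", strerror(-arg->info[0].status));

    free(arg);
    return ret;
}

int main(int argc, char **argv)
{
    if (argc != 4) {
        fprintf(stderr, "usage: %s <src> <dst> <len>\n", argv[0]);
        return 1;
    }
    int src = open(argv[1], O_RDONLY);
    int dst = open(argv[2], O_RDWR);
    if (src < 0 || dst < 0) {
        perror("open");
        return 1;
    }
    return dedupe_range(src, dst, 0, 0, strtoull(argv[3], NULL, 0)) < 0;
}

If the data changed underneath you, the status comes back
FILE_DEDUPE_RANGE_DIFFERS and nothing is modified, which is exactly what the
backup-reflink dance tries to approximate from userspace.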

> > 
> > -- 
> > Have a nice day,
> > Timofey.
> 
> Cheers.
> Lakshmipathi.G




Re: dduper - Offline btrfs deduplication tool

2018-09-07 Thread Adam Borowski
On Fri, Sep 07, 2018 at 09:27:28AM +0530, Lakshmipathi.G wrote:
> > One question:
> > Why not ioctl_fideduperange?
> > i.e. you kill most of the benefits of that ioctl - atomicity.
> > 
> I plan to add fideduperange as an option too. Users can
> choose between the fideduperange and ficlonerange calls.
> 
> If I'm not wrong, with fideduperange the kernel performs
> a comparison check before dedupe, and that will increase
> the time to dedupe files.

You already read the files to md5sum them, so there is no speed gain.
You get nasty data-losing races, and risk collisions as well.  md5sum is
safe against random occurrences (compare it, e.g., to the chance of lightning
hitting you today), but it is exploitable by a hostile user.  On the other
hand, a full bit-to-bit comparison is faster and 100% safe.

You can't skip verification -- the checksums are only 32-bit.  They have a
1:4G chance of a false match, which means you can expect one false positive
with 64K extents, rising quadratically as the number of extents grows.
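
The rough birthday arithmetic behind that estimate (treating the crc32
values as independent and uniform):

  expected false matches ~= C(n,2) / 2^32
  n = 2^16 extents:  C(n,2) ~= 2^31  ->  ~0.5 expected
  n = 2^18 extents:  C(n,2) ~= 2^35  ->  ~8 expected

so doubling the number of extents roughly quadruples the expected number of
collisions.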


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁
⢿⡄⠘⠷⠚⠋⠀ Collisions shmolisions, let's see them find a collision or second
⠈⠳⣄ preimage for double rot13!


Re: dduper - Offline btrfs deduplication tool

2018-09-06 Thread Lakshmipathi.G
> 
> One question:
> Why not ioctl_fideduperange?
> > i.e. you kill most of the benefits of that ioctl - atomicity.
> 
I plan to add fideduperange as an option too. Users can
choose between the fideduperange and ficlonerange calls.

If I'm not wrong, with fideduperange the kernel performs
a comparison check before dedupe, and that will increase
the time to dedupe files.

I believe the risk involved with ficlonerange is minimized
by having a backup (reflinked) file. We can revert to the
original file if we encounter any problems.

> 
> -- 
> Have a nice day,
> Timofey.

Cheers.
Lakshmipathi.G


Re: dduper - Offline btrfs deduplication tool

2018-09-05 Thread Timofey Titovets
пт, 24 авг. 2018 г. в 7:41, Lakshmipathi.G :
>
> Hi -
>
> dduper is an offline dedupe tool. Instead of reading whole file blocks and
> computing checksums, it works by fetching checksums from the BTRFS csum tree. This
> hugely improves the performance.
>
> dduper works like:
> - Read csum for given two files.
> - Find matching location.
> - Pass the location to ioctl_ficlonerange directly
>   instead of ioctl_fideduperange
>
> By default, dduper adds a safety check to the above steps by creating a
> backup reflink file and comparing the md5sums after dedupe.
> If the backup file matches the new deduped file, then the backup file is
> removed. You can skip this check by passing the --skip option. Here is
> sample CLI usage [1] and a quick demo [2]
>
> Some performance numbers: (with --skip option)
>
> Dedupe two 1GB files with same  content - 1.2 seconds
> Dedupe two 5GB files with same  content - 8.2 seconds
> Dedupe two 10GB files with same  content - 13.8 seconds
>
> dduper requires the `btrfs inspect-internal dump-csum` command; you can use
> this branch [3] or apply the patch yourself [4]
>
> [1] 
> https://gitlab.collabora.com/laks/btrfs-progs/blob/dump_csum/Documentation/dduper_usage.md
> [2] http://giis.co.in/btrfs_dedupe.gif
> [3] git clone https://gitlab.collabora.com/laks/btrfs-progs.git -b  dump_csum
> [4] https://patchwork.kernel.org/patch/10540229/
>
> Please remember it's version 0.1, so test it out before you use dduper on
> real data.
> Let me know, if you have suggestions or feedback or bugs :)
>
> Cheers.
> Lakshmipathi.G
>

One question:
Why not ioctl_fideduperange?
i.e. you kill most of the benefits of that ioctl - atomicity.


-- 
Have a nice day,
Timofey.


dduper - Offline btrfs deduplication tool

2018-08-23 Thread Lakshmipathi.G
Hi -

dduper is an offline dedupe tool. Instead of reading whole file blocks and
computing checksums, it works by fetching checksums from the BTRFS csum tree. This
hugely improves the performance.

dduper works like:
- Read csum for given two files.
- Find matching location.
- Pass the location to ioctl_ficlonerange directly
  instead of ioctl_fideduperange

By default, dduper adds a safety check to the above steps by creating a
backup reflink file and comparing the md5sums after dedupe.
If the backup file matches the new deduped file, then the backup file is
removed. You can skip this check by passing the --skip option. Here is
sample CLI usage [1] and a quick demo [2]
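
The ficlonerange step above maps onto the FICLONERANGE ioctl from
<linux/fs.h>; a minimal sketch of that call, for illustration only and not
dduper's actual code:

#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* FICLONERANGE, struct file_clone_range */

/* Make len bytes at dst_off in dst_fd share the extents backing src_fd at
 * src_off.  Unlike FIDEDUPERANGE, the kernel does not compare the data
 * first, so the caller must already know the two ranges are identical --
 * hence the backup-reflink/md5sum safety net above.  Offsets and length
 * generally need to be block-aligned. */
int clone_range(int src_fd, int dst_fd, __u64 src_off, __u64 dst_off, __u64 len)
{
    struct file_clone_range arg = {
        .src_fd      = src_fd,
        .src_offset  = src_off,
        .src_length  = len,     /* 0 would mean "to EOF" */
        .dest_offset = dst_off,
    };

    if (ioctl(dst_fd, FICLONERANGE, &arg) < 0) {
        perror("FICLONERANGE");
        return -1;
    }
    return 0;
}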

Some performance numbers: (with --skip option)

Dedupe two 1GB files with same  content - 1.2 seconds
Dedupe two 5GB files with same  content - 8.2 seconds
Dedupe two 10GB files with same  content - 13.8 seconds

dduper requires the `btrfs inspect-internal dump-csum` command; you can use
this branch [3] or apply the patch yourself [4]

[1] 
https://gitlab.collabora.com/laks/btrfs-progs/blob/dump_csum/Documentation/dduper_usage.md
[2] http://giis.co.in/btrfs_dedupe.gif
[3] git clone https://gitlab.collabora.com/laks/btrfs-progs.git -b  dump_csum
[4] https://patchwork.kernel.org/patch/10540229/ 

Please remember it's version 0.1, so test it out before you use dduper on
real data.
Let me know, if you have suggestions or feedback or bugs :)

Cheers.
Lakshmipathi.G



Re: BTRFS Deduplication

2017-09-11 Thread shally verma
On Mon, Sep 11, 2017 at 2:55 PM, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>
>
> On 2017年09月11日 17:14, Qu Wenruo wrote:
>>
>>
>>
>> On 2017年09月11日 16:57, shally verma wrote:
>>>
>>> On Mon, Sep 11, 2017 at 1:42 PM, Qu Wenruo <quwenruo.bt...@gmx.com>
>>> wrote:
>>>>
>>>>
>>>>
>>>> On 2017年09月11日 15:54, shally verma wrote:
>>>>>
>>>>>
>>>>> On Mon, Sep 11, 2017 at 12:16 PM, Qu Wenruo <quwenruo.bt...@gmx.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2017年09月11日 14:05, shally verma wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I was going through  BTRFS Deduplication page
>>>>>>> (https://btrfs.wiki.kernel.org/index.php/Deduplication) and I read
>>>>>>>
>>>>>>> "As such, xfs_io, is able to perform deduplication on a BTRFS file
>>>>>>> system," ..
>>>>>>>
>>>>>>> following this, I followed on to xfs_io link
>>>>>>> https://linux.die.net/man/8/xfs_io
>>>>>>>
>>>>>>> As I understand, these are set of commands allow us to do different
>>>>>>> operations on "xfs" filesystem.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Nope, it's just a tool triggering different read/write or ioctls.
>>>>>> In fact most of its command is fs independent.
>>>>>> Only a limited number of operations are only supported by XFS.
>>>>>>
>>>>>> It's just due to historical reasons it's still named as xfs_io.
>>>>>>
>>>>>> I won't be surprised if one day it's split as an independent tool.
>>>>>>
>>>>>>> and command set mentioned here, couldn't see which is command to
>>>>>>> invoke dedupe task.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> "dedupe" and "reflink" command.
>>>>>
>>>>>
>>>>> Oh. That means page link referred on BTRFS Wiki page is not updated
>>>>> with this. I googled another page that has reference of these two
>>>>> command in xfs_io here
>>>>> https://www.systutorials.com/docs/linux/man/8-xfs_io/
>>>>> May be Wiki need an update here.
>>>>
>>>>
>>>>
>>>> If XFS has a regularly updated online man page, we can just use that.
>>>> (But unfortunately, not every fs user tools use asciidoc like btrfs,
>>>> which
>>>> can generate both man page and html).
>>>>
>>>>>
>>>>>>
>>>>>>> and how this works with BTRFS.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Fs support FIDEDUPERANGE or BTRFS_IOC_FILE_EXTENT_SAME ioctl can use
>>>>>> it
>>>>>> to
>>>>>> determine if two ranges are containing identical data.
>>>>>>
>>>>>> And if they are identical, we use FICLONERANGE or
>>>>>> BTRFS_IOC_CLONE_RANGE
>>>>>> ioctl to reflink one to another, freeing one of them.
>>>>>>
>>>>>> BTW nowadays, such dedupe and reflink ioctl is genericized in VFS.
>>>>>> file_operations structure now includes both clone_file_range() and
>>>>>> dedupe_file_range() callbacks now.
>>>>>
>>>>>
>>>>> Yea. Understand that part. So going by description of "dedupe" and
>>>>> "reflink", seems through these commands, one can do deduplication part
>>>>> and NOT duplicate find part.
>>>>
>>>>
>>>>
>>>> Yes, one don't need to call "dedupe" ioctl if they already knows some
>>>> data
>>>> is identical and can go reflink straightforward.
>>>>
>>>>> That's still out of xfs_io command scope.
>>>>
>>>>
>>>>
>>>> Not sure what the scope here you mean, sorry for that.
>>>>
>>> By "scope", I meant duplicate find part but that contradicts statement
>>

Re: BTRFS Deduplication

2017-09-11 Thread Qu Wenruo



On 2017年09月11日 17:14, Qu Wenruo wrote:



On 2017年09月11日 16:57, shally verma wrote:
On Mon, Sep 11, 2017 at 1:42 PM, Qu Wenruo <quwenruo.bt...@gmx.com> 
wrote:



On 2017年09月11日 15:54, shally verma wrote:


On Mon, Sep 11, 2017 at 12:16 PM, Qu Wenruo <quwenruo.bt...@gmx.com>
wrote:




On 2017年09月11日 14:05, shally verma wrote:



I was going through  BTRFS Deduplication page
(https://btrfs.wiki.kernel.org/index.php/Deduplication) and I read

"As such, xfs_io, is able to perform deduplication on a BTRFS file
system," ..

following this, I followed on to xfs_io link
https://linux.die.net/man/8/xfs_io

As I understand, these are set of commands allow us to do different
operations on "xfs" filesystem.




Nope, it's just a tool triggering different read/write or ioctls.
In fact most of its command is fs independent.
Only a limited number of operations are only supported by XFS.

It's just due to historical reasons it's still named as xfs_io.

I won't be surprised if one day it's split as an independent tool.


and command set mentioned here, couldn't see which is command to
invoke dedupe task.




"dedupe" and "reflink" command.


Oh. That means page link referred on BTRFS Wiki page is not updated
with this. I googled another page that has reference of these two
command in xfs_io here
https://www.systutorials.com/docs/linux/man/8-xfs_io/
May be Wiki need an update here.



If XFS has a regularly updated online man page, we can just use that.
(But unfortunately, not every fs user tools use asciidoc like btrfs, 
which

can generate both man page and html).






and how this works with BTRFS.




Fs support FIDEDUPERANGE or BTRFS_IOC_FILE_EXTENT_SAME ioctl can 
use it

to
determine if two ranges are containing identical data.

And if they are identical, we use FICLONERANGE or 
BTRFS_IOC_CLONE_RANGE

ioctl to reflink one to another, freeing one of them.

BTW nowadays, such dedupe and reflink ioctl is genericized in VFS.
file_operations structure now includes both clone_file_range() and
dedupe_file_range() callbacks now.


Yea. Understand that part. So going by description of "dedupe" and
"reflink", seems through these commands, one can do deduplication part
and NOT duplicate find part.



Yes, one don't need to call "dedupe" ioctl if they already knows some 
data

is identical and can go reflink straightforward.


That's still out of xfs_io command scope.



Not sure what the scope here you mean, sorry for that.


By "scope", I meant duplicate find part but that contradicts statement
you just written below:

Since xfs_io can be used to find duplication,


Since "dedupe" command input only a "source file" and src and
dst_offset within that, so it can deduplicate the content within a
file where actual FS dedupe IOCTL can first ensure if two extents are
identical and if yes, then deduplicate them.


By "deduplicate", if you mean "removing duplication" then xfs_io 
"dedupe" command itself doesn't do that.


The old btrfs ioctl name describes this better: FILE_EXTENT_SAME.
The "dedupe" command itself only verifies that they have the same content.

So to make it clear, the "dedupe" command and ioctl only do the
*verification* work.


Sorry, I just checked the code and tried the ioctl.

If they are the same, "dedupe" will do "reflink" part also.

Code also shows that:
---
/* pass original length for comparison so we stay within i_size */
ret = btrfs_cmp_data(olen, &cmp);
if (ret == 0)
        ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
---

So "dedupe" ioctl itself can do de-duplication.
And my previous answer is just totally wrong.

Sorry for that,
Qu



"Reflink" will really remove the duplication (or even non-duplicated 
data if you really want).



But please be careful, "reflink" is much like copy, so it can be 
executed on file ranges with different contents.
In that case, reflink can free some space, but it also modifies the 
content.


So for full de-duplication, one must go through the full *verify* then
*reflink* cycle.
Although "dedupe"(FILE_EXTENT_SAME) ioctl provides one verification 
method, it's not the only solution.


But anyway, the "dedupe" and "reflink" commands provided by xfs_io do
provide every piece needed to do de-duplication, so the wiki is still correct
IMHO.


Thanks,
Qu



Is that correct?

Thanks
Shally

  and can remove duplication, I

don't find anything strange in that wiki page.
(Especially considering how popular the tool is, you can't find any more
handy tool than xfs_io)

Thanks,
Qu



Is that understanding correct?
Thanks
Shally



Thanks,
Qu




So, can anyone help here and point me what am I missing here.

Thanks
Shally

Re: BTRFS Deduplication

2017-09-11 Thread Qu Wenruo



On 2017年09月11日 16:57, shally verma wrote:

On Mon, Sep 11, 2017 at 1:42 PM, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:



On 2017年09月11日 15:54, shally verma wrote:


On Mon, Sep 11, 2017 at 12:16 PM, Qu Wenruo <quwenruo.bt...@gmx.com>
wrote:




On 2017年09月11日 14:05, shally verma wrote:



I was going through  BTRFS Deduplication page
(https://btrfs.wiki.kernel.org/index.php/Deduplication) and I read

"As such, xfs_io, is able to perform deduplication on a BTRFS file
system," ..

following this, I followed on to xfs_io link
https://linux.die.net/man/8/xfs_io

As I understand, these are set of commands allow us to do different
operations on "xfs" filesystem.




Nope, it's just a tool triggering different read/write or ioctls.
In fact most of its command is fs independent.
Only a limited number of operations are only supported by XFS.

It's just due to historical reasons it's still named as xfs_io.

I won't be surprised if one day it's split as an independent tool.


and command set mentioned here, couldn't see which is command to
invoke dedupe task.




"dedupe" and "reflink" command.


Oh. That means page link referred on BTRFS Wiki page is not updated
with this. I googled another page that has reference of these two
command in xfs_io here
https://www.systutorials.com/docs/linux/man/8-xfs_io/
May be Wiki need an update here.



If XFS has a regularly updated online man page, we can just use that.
(But unfortunately, not every fs user tools use asciidoc like btrfs, which
can generate both man page and html).






and how this works with BTRFS.




Fs support FIDEDUPERANGE or BTRFS_IOC_FILE_EXTENT_SAME ioctl can use it
to
determine if two ranges are containing identical data.

And if they are identical, we use FICLONERANGE or BTRFS_IOC_CLONE_RANGE
ioctl to reflink one to another, freeing one of them.

BTW nowadays, such dedupe and reflink ioctl is genericized in VFS.
file_operations structure now includes both clone_file_range() and
dedupe_file_range() callbacks now.


Yea. Understand that part. So going by description of "dedupe" and
"reflink", seems through these commands, one can do deduplication part
and NOT duplicate find part.



Yes, one don't need to call "dedupe" ioctl if they already knows some data
is identical and can go reflink straightforward.


That's still out of xfs_io command scope.



Not sure what the scope here you mean, sorry for that.


By "scope", I meant duplicate find part but that contradicts statement
you just written below:

Since xfs_io can be used to find duplication,


Since "dedupe" command input only a "source file" and src and
dst_offset within that, so it can deduplicate the content within a
file where actual FS dedupe IOCTL can first ensure if two extents are
identical and if yes, then deduplicate them.


By "deduplicate", if you mean "removing duplication" then xfs_io 
"dedupe" command itself doesn't do that.


The old btrfs ioctl name describes this better: FILE_EXTENT_SAME.
The "dedupe" command itself only verifies that they have the same content.

So to make it clear, the "dedupe" command and ioctl only do the
*verification* work.


"Reflink" will really remove the duplication (or even non-duplicated 
data if you really want).



But please be careful, "reflink" is much like copy, so it can be 
executed on file ranges with different contents.

In that case, reflink can free some space, but it also modifies the content.

So for full de-duplication, one must go through the full *verify* then
*reflink* cycle.
Although "dedupe"(FILE_EXTENT_SAME) ioctl provides one verification 
method, it's not the only solution.


But anyway, the "dedupe" and "reflink" commands provided by xfs_io do
provide every piece needed to do de-duplication, so the wiki is still correct
IMHO.


Thanks,
Qu



Is that correct?

Thanks
Shally

  and can remove duplication, I

don't find anything strange in that wiki page.
(Especially considering how popular the tool is, you can't find any more
handy tool than xfs_io)

Thanks,
Qu



Is that understanding correct?
Thanks
Shally



Thanks,
Qu




So, can anyone help here and point me what am I missing here.

Thanks
Shally


Re: BTRFS Deduplication

2017-09-11 Thread shally verma
On Mon, Sep 11, 2017 at 1:42 PM, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>
>
> On 2017年09月11日 15:54, shally verma wrote:
>>
>> On Mon, Sep 11, 2017 at 12:16 PM, Qu Wenruo <quwenruo.bt...@gmx.com>
>> wrote:
>>>
>>>
>>>
>>> On 2017年09月11日 14:05, shally verma wrote:
>>>>
>>>>
>>>> I was going through  BTRFS Deduplication page
>>>> (https://btrfs.wiki.kernel.org/index.php/Deduplication) and I read
>>>>
>>>> "As such, xfs_io, is able to perform deduplication on a BTRFS file
>>>> system," ..
>>>>
>>>> following this, I followed on to xfs_io link
>>>> https://linux.die.net/man/8/xfs_io
>>>>
>>>> As I understand, these are set of commands allow us to do different
>>>> operations on "xfs" filesystem.
>>>
>>>
>>>
>>> Nope, it's just a tool triggering different read/write or ioctls.
>>> In fact most of its command is fs independent.
>>> Only a limited number of operations are only supported by XFS.
>>>
>>> It's just due to historical reasons it's still named as xfs_io.
>>>
>>> I won't be surprised if one day it's split as an independent tool.
>>>
>>>> and command set mentioned here, couldn't see which is command to
>>>> invoke dedupe task.
>>>
>>>
>>>
>>> "dedupe" and "reflink" command.
>>
>> Oh. That means page link referred on BTRFS Wiki page is not updated
>> with this. I googled another page that has reference of these two
>> command in xfs_io here
>> https://www.systutorials.com/docs/linux/man/8-xfs_io/
>> May be Wiki need an update here.
>
>
> If XFS has a regularly updated online man page, we can just use that.
> (But unfortunately, not every fs user tools use asciidoc like btrfs, which
> can generate both man page and html).
>
>>
>>>
>>>> and how this works with BTRFS.
>>>
>>>
>>>
>>> Fs support FIDEDUPERANGE or BTRFS_IOC_FILE_EXTENT_SAME ioctl can use it
>>> to
>>> determine if two ranges are containing identical data.
>>>
>>> And if they are identical, we use FICLONERANGE or BTRFS_IOC_CLONE_RANGE
>>> ioctl to reflink one to another, freeing one of them.
>>>
>>> BTW nowadays, such dedupe and reflink ioctl is genericized in VFS.
>>> file_operations structure now includes both clone_file_range() and
>>> dedupe_file_range() callbacks now.
>>
>> Yea. Understand that part. So going by description of "dedupe" and
>> "reflink", seems through these commands, one can do deduplication part
>> and NOT duplicate find part.
>
>
> Yes, one don't need to call "dedupe" ioctl if they already knows some data
> is identical and can go reflink straightforward.
>
>> That's still out of xfs_io command scope.
>
>
> Not sure what the scope here you mean, sorry for that.
>
By "scope", I meant duplicate find part but that contradicts statement
you just written below:
> Since xfs_io can be used to find duplication,

Since "dedupe" command input only a "source file" and src and
dst_offset within that, so it can deduplicate the content within a
file where actual FS dedupe IOCTL can first ensure if two extents are
identical and if yes, then deduplicate them.

Is that correct?

Thanks
Shally

 and can remove duplication, I
> don't find anything strange in that wiki page.
> (Especially considering how popular the tool is, you can't find any more
> handy tool than xfs_io)
>
> Thanks,
> Qu
>
>
>> Is that understanding correct?
>> Thanks
>> Shally
>>>
>>>
>>> Thanks,
>>> Qu
>>>>
>>>>
>>>>
>>>> So, can anyone help here and point me what am I missing here.
>>>>
>>>> Thanks
>>>> Shally


Re: BTRFS Deduplication

2017-09-11 Thread Qu Wenruo



On 2017年09月11日 15:54, shally verma wrote:

On Mon, Sep 11, 2017 at 12:16 PM, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:



On 2017年09月11日 14:05, shally verma wrote:


I was going through  BTRFS Deduplication page
(https://btrfs.wiki.kernel.org/index.php/Deduplication) and I read

"As such, xfs_io, is able to perform deduplication on a BTRFS file
system," ..

following this, I followed on to xfs_io link
https://linux.die.net/man/8/xfs_io

As I understand, these are set of commands allow us to do different
operations on "xfs" filesystem.



Nope, it's just a tool triggering different read/write or ioctls.
In fact most of its command is fs independent.
Only a limited number of operations are only supported by XFS.

It's just due to historical reasons it's still named as xfs_io.

I won't be surprised if one day it's split as an independent tool.


and command set mentioned here, couldn't see which is command to
invoke dedupe task.



"dedupe" and "reflink" command.

Oh. That means page link referred on BTRFS Wiki page is not updated
with this. I googled another page that has reference of these two
command in xfs_io here
https://www.systutorials.com/docs/linux/man/8-xfs_io/
May be Wiki need an update here.


If XFS has a regularly updated online man page, we can just use that.
(But unfortunately, not every fs user tools use asciidoc like btrfs, 
which can generate both man page and html).







and how this works with BTRFS.



Fs support FIDEDUPERANGE or BTRFS_IOC_FILE_EXTENT_SAME ioctl can use it to
determine if two ranges are containing identical data.

And if they are identical, we use FICLONERANGE or BTRFS_IOC_CLONE_RANGE
ioctl to reflink one to another, freeing one of them.

BTW nowadays, such dedupe and reflink ioctl is genericized in VFS.
file_operations structure now includes both clone_file_range() and
dedupe_file_range() callbacks now.

Yea. Understand that part. So going by description of "dedupe" and
"reflink", seems through these commands, one can do deduplication part
and NOT duplicate find part.


Yes, one doesn't need to call the "dedupe" ioctl if they already know some
data is identical and can go straight to reflink.



That's still out of xfs_io command scope.


Not sure what scope you mean here, sorry about that.

Since xfs_io can be used to find duplication, and can remove 
duplication, I don't find anything strange in that wiki page.
(Especially considering how popular the tool is, you can't find any more 
handy tool than xfs_io)


Thanks,
Qu


Is that understanding correct?
Thanks
Shally


Thanks,
Qu



So, can anyone help here and point me what am I missing here.

Thanks
Shally


Re: BTRFS Deduplication

2017-09-11 Thread shally verma
On Mon, Sep 11, 2017 at 12:16 PM, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>
>
> On 2017年09月11日 14:05, shally verma wrote:
>>
>> I was going through  BTRFS Deduplication page
>> (https://btrfs.wiki.kernel.org/index.php/Deduplication) and I read
>>
>> "As such, xfs_io, is able to perform deduplication on a BTRFS file
>> system," ..
>>
>> following this, I followed on to xfs_io link
>> https://linux.die.net/man/8/xfs_io
>>
>> As I understand, these are set of commands allow us to do different
>> operations on "xfs" filesystem.
>
>
> Nope, it's just a tool triggering different read/write or ioctls.
> In fact most of its command is fs independent.
> Only a limited number of operations are only supported by XFS.
>
> It's just due to historical reasons it's still named as xfs_io.
>
> I won't be surprised if one day it's split as an independent tool.
>
>> and command set mentioned here, couldn't see which is command to
>> invoke dedupe task.
>
>
> "dedupe" and "reflink" command.
Oh. That means the page linked from the BTRFS Wiki page is not updated
with this. I googled another page that has a reference to these two
commands in xfs_io here:
https://www.systutorials.com/docs/linux/man/8-xfs_io/
Maybe the Wiki needs an update here.

>
>> and how this works with BTRFS.
>
>
> Fs support FIDEDUPERANGE or BTRFS_IOC_FILE_EXTENT_SAME ioctl can use it to
> determine if two ranges are containing identical data.
>
> And if they are identical, we use FICLONERANGE or BTRFS_IOC_CLONE_RANGE
> ioctl to reflink one to another, freeing one of them.
>
> BTW nowadays, such dedupe and reflink ioctl is genericized in VFS.
> file_operations structure now includes both clone_file_range() and
> dedupe_file_range() callbacks now.
Yeah, I understand that part. So going by the description of "dedupe" and
"reflink", it seems that through these commands one can do the deduplication
part and NOT the duplicate-finding part. That's still out of xfs_io's command
scope.
Is that understanding correct?
Thanks
Shally
>
> Thanks,
> Qu
>>
>>
>> So, can anyone help here and point me what am I missing here.
>>
>> Thanks
>> Shally


Re: BTRFS Deduplication

2017-09-11 Thread Qu Wenruo



On 2017年09月11日 14:05, shally verma wrote:

I was going through  BTRFS Deduplication page
(https://btrfs.wiki.kernel.org/index.php/Deduplication) and I read

"As such, xfs_io, is able to perform deduplication on a BTRFS file system," ..

following this, I followed on to xfs_io link https://linux.die.net/man/8/xfs_io

As I understand, these are set of commands allow us to do different
operations on "xfs" filesystem.


Nope, it's just a tool that triggers different read/write or ioctl calls.
In fact most of its commands are fs independent.
Only a limited number of operations are XFS-specific.

It's just for historical reasons that it's still named xfs_io.

I won't be surprised if one day it's split out as an independent tool.


and command set mentioned here, couldn't see which is command to
invoke dedupe task.


"dedupe" and "reflink" command.


and how this works with BTRFS.


Filesystems supporting the FIDEDUPERANGE or BTRFS_IOC_FILE_EXTENT_SAME ioctl
can use it to determine whether two ranges contain identical data.


And if they are identical, we use the FICLONERANGE or BTRFS_IOC_CLONE_RANGE
ioctl to reflink one to the other, freeing one of them.


BTW, nowadays such dedupe and reflink ioctls are genericized in the VFS.
The file_operations structure now includes both clone_file_range() and
dedupe_file_range() callbacks.
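
For a concrete starting point, the xfs_io invocations look roughly like the
following (written from memory of xfs_io(8), so please double-check the man
page before relying on the exact syntax):

# verify-and-share: link 1 MiB of src into dst at offset 0, but only if
# the two ranges are byte-identical (dedupe ioctl underneath)
xfs_io -c "dedupe /path/to/src 0 0 1048576" /path/to/dst

# unconditional reflink: share the range regardless of contents
# (clone ioctl underneath), overwriting whatever dst had there
xfs_io -c "reflink /path/to/src 0 0 1048576" /path/to/dst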


Thanks,
Qu


So, can anyone help here and point me what am I missing here.

Thanks
Shally


BTRFS Deduplication

2017-09-11 Thread shally verma
I was going through the BTRFS Deduplication page
(https://btrfs.wiki.kernel.org/index.php/Deduplication) and I read

"As such, xfs_io, is able to perform deduplication on a BTRFS file system," ..

Following this, I went to the xfs_io link https://linux.die.net/man/8/xfs_io

As I understand it, this is a set of commands that allows us to do different
operations on an "xfs" filesystem.
In the command set mentioned there, I couldn't see which command invokes a
dedupe task, or how this works with BTRFS.

So, can anyone help and point out what I am missing here?

Thanks
Shally


Re: duperemove : some real world figures on BTRFS deduplication

2016-12-09 Thread Holger Hoffstätte
On 12/09/16 16:43, Chris Murphy wrote:
>> If compression has nothing to do with this, then this is heavy
>> fragmentation.
> 
> It's probably not that fragmented. Due to compression, metadata
> describes 128KiB extents even though the data is actually contiguous.
> 
> And it might be the same thing in my case also, even though no
> compression is involved.

In that case you can quickly collapse physically contiguous ranges by
reflink-mv'ing (i.e. with a recent mv) the file across subvolume boundaries
and back. :)

-h



Re: duperemove : some real world figures on BTRFS deduplication

2016-12-09 Thread Chris Murphy
On Fri, Dec 9, 2016 at 6:45 AM, Swâmi Petaramesh  wrote:
> Hi Chris, thanks for your answer,
>
> On 12/09/2016 03:58 AM, Chris Murphy wrote:
>> Can you check some bigger files and see if they've become fragmented?
>> I'm seeing 1.4GiB files with 2-3 extents reported by filefrag, go to
>> over 5000 fragments during dedupe. This is not something I recall
>> happening some months ago.
>
> I have checked directories containing VM hard disks, that would be good
> candidates. As they're backed up using full rsyncs, I wouldn't expect
> them to be heavily fragmented (OTOH the whole BTRFS filesystem is lzo
> compressed, and I believe that it may affect the number of extents
> reported by filefrag...?)
>
> Anyway this is the number of fragments that I get for a bunch of VMS HD
> files which are in the range from a couple GB to about 20 GB.
>
> The number of fragments reported by filefrag : 2907, 2560, 314, 10107
>
> If compression has nothing to do with this, then this is heavy
> fragmentation.

It's probably not that fragmented. Due to compression, metadata
describes 128KiB extents even though the data is actually contiguous.

And it might be the same thing in my case also, even though no
compression is involved.


-- 
Chris Murphy


Re: duperemove : some real world figures on BTRFS deduplication

2016-12-09 Thread Swâmi Petaramesh
Hi Jeff, thanks for your reply,


On 12/08/2016 09:07 PM, Jeff Mahoney wrote:
> What version were you using? 
That's v0.11.beta4, installed rather recently
> What throughput are you getting to that disk?  I get that it's USB3, but
> reading 1TB doesn't take a terribly long time so 15 days is pretty
> ridiculous.
This is run from inside a VM, onto a physical USB3 HD. Copying to/from
this HD shows a speed that corresponds to what I would expect on the
same HD connected to a physical (not virtual) setup.

The only quick data that I can get are from "hdparm", that says :

- Timed cached reads : 5976 MB/sec
- Timed buffered disk reads : 105 MB/sec
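
For scale (rough arithmetic, best-case sequential): a single full pass over
the disk at that buffered-read rate is about

- 1 TB / 105 MB/s ~= 9,500 s ~= 2.6 hours

which is in line with Jeff's point that raw throughput alone doesn't account
for a 15-day run.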

Kind regards.

ॐ

-- 
Swâmi Petaramesh  PGP 9076E32E



Re: duperemove : some real world figures on BTRFS deduplication

2016-12-09 Thread Swâmi Petaramesh
Hi Chris, thanks for your answer,

On 12/09/2016 03:58 AM, Chris Murphy wrote:
> Can you check some bigger files and see if they've become fragmented?
> I'm seeing 1.4GiB files with 2-3 extents reported by filefrag, go to
> over 5000 fragments during dedupe. This is not something I recall
> happening some months ago.

I have checked directories containing VM hard disks, that would be good
candidates. As they're backed up using full rsyncs, I wouldn't expect
them to be heavily fragmented (OTOH the whole BTRFS filesystem is lzo
compressed, and I believe that it may affect the number of extents
reported by filefrag...?)

Anyway this is the number of fragments that I get for a bunch of VMS HD
files which are in the range from a couple GB to about 20 GB.

The number of fragments reported by filefrag : 2907, 2560, 314, 10107

If compression has nothing to do with this, then this is heavy
fragmentation.

Kind regards.

ॐ

-- 
Swâmi Petaramesh  PGP 9076E32E



Re: duperemove : some real world figures on BTRFS deduplication

2016-12-08 Thread Peter Becker
> 2016-12-08 16:11 GMT+01:00 Swâmi Petaramesh :
>
> Then it took another 48 hours just for "loading the hashes of duplicate
> extents".
>

I am currently addressing this issue with the following patches:
https://github.com/Floyddotnet/duperemove/commits/digest_trigger

Tested with a 3.9 TB directory containing 4723 objects:

old implementation of dbfile_load_hashes took 36593ms
new implementation of dbfile_load_hashes took 11ms

You can use this version safely, but I still have more work to do (for
example, a migration script for existing hashfiles).


Re: duperemove : some real world figures on BTRFS deduplication

2016-12-08 Thread Chris Murphy
On Thu, Dec 8, 2016 at 8:11 AM, Swâmi Petaramesh  wrote:

> Well, the damn thing has been running for 15 days uninterrupted !
> ...Until I [Ctrl]-C it this morning as I had to move with the machine (I
> wasn't expecting it to last THAT long...).

Can you check some bigger files and see if they've become fragmented?
I'm seeing 1.4GiB files with 2-3 extents reported by filefrag, go to
over 5000 fragments during dedupe. This is not something I recall
happening some months ago.

I inadvertently replied to the wrong dedupe thread about my test and
what I'm finding, it's here.
https://www.spinics.net/lists/linux-btrfs/msg61304.html

But if you're seeing something similar, then it would explain why it's
so slow in your case.

-- 
Chris Murphy


Re: duperemove : some real world figures on BTRFS deduplication

2016-12-08 Thread Austin S. Hemmelgarn

On 2016-12-08 15:07, Jeff Mahoney wrote:

On 12/8/16 10:42 AM, Austin S. Hemmelgarn wrote:

On 2016-12-08 10:11, Swâmi Petaramesh wrote:

Hi, Some real world figures about running duperemove deduplication on
BTRFS :

I have an external 2,5", 5400 RPM, 1 TB HD, USB3, on which I store the
BTRFS backups (full rsync) of 5 PCs, using 2 different distros,
typically at the same update level, and all of them more of less sharing
the entirety or part of the same set of user files.

For each of these PCs I keep a series of 4-5 BTRFS subvolume snapshots
for having complete backups at different points in time.

The HD was full to 93% and made a good testbed for deduplicating.

So I ran duperemove on this HD, on a machine doing "only this", using a
hashfile. The machine being an Intel i5 with 6 GB of RAM.

Well, the damn thing has been running for 15 days uninterrupted !
...Until I [Ctrl]-C it this morning as I had to move with the machine (I
wasn't expecting it to last THAT long...).

It took about 48 hours just for calculating the files hashes.

Then it took another 48 hours just for "loading the hashes of duplicate
extents".

Then it took 11 days deduplicating until I killed it.

At the end, the disk that was 93% full is now 76% full, so I saved 17%
of 1 TB (170 GB) by deduplicating for 15 days.

Well the thing "works" and my disk isn't full anymore, so that's a very
partial success, but still l wonder if the gain is worth the effort...

So, some general explanation here:
Duperemove hashes data in blocks of (by default) 128kB, which means for
~930GB, you've got about 7618560 blocks to hash, which partly explains
why it took so long to hash.  Once that's done, it then has to compare
hashes for all combinations of those blocks, which totals to
58042456473600 comparisons (hence that taking a long time).  The block
size thus becomes a trade-off between performance when hashing and
actual space savings (smaller block size makes hashing take longer, but
gives overall slightly better results for deduplication).


IIRC, the core of the duperemove duplicate matcher isn't an O(n^2)
algorithm.  I think Mark used a bloom filter to reduce the data set
prior to matching, but I haven't looked at the code in a while.


You're right, I had completely forgotten about that.

Regardless of that though, it's still a lot of processing that needs done.



Re: duperemove : some real world figures on BTRFS deduplication

2016-12-08 Thread Jeff Mahoney
On 12/8/16 10:42 AM, Austin S. Hemmelgarn wrote:
> On 2016-12-08 10:11, Swâmi Petaramesh wrote:
>> Hi, Some real world figures about running duperemove deduplication on
>> BTRFS :
>>
>> I have an external 2,5", 5400 RPM, 1 TB HD, USB3, on which I store the
>> BTRFS backups (full rsync) of 5 PCs, using 2 different distros,
>> typically at the same update level, and all of them more of less sharing
>> the entirety or part of the same set of user files.
>>
>> For each of these PCs I keep a series of 4-5 BTRFS subvolume snapshots
>> for having complete backups at different points in time.
>>
>> The HD was full to 93% and made a good testbed for deduplicating.
>>
>> So I ran duperemove on this HD, on a machine doing "only this", using a
>> hashfile. The machine being an Intel i5 with 6 GB of RAM.
>>
>> Well, the damn thing has been running for 15 days uninterrupted !
>> ...Until I [Ctrl]-C it this morning as I had to move with the machine (I
>> wasn't expecting it to last THAT long...).
>>
>> It took about 48 hours just for calculating the files hashes.
>>
>> Then it took another 48 hours just for "loading the hashes of duplicate
>> extents".
>>
>> Then it took 11 days deduplicating until I killed it.
>>
>> At the end, the disk that was 93% full is now 76% full, so I saved 17%
>> of 1 TB (170 GB) by deduplicating for 15 days.
>>
>> Well the thing "works" and my disk isn't full anymore, so that's a very
>> partial success, but still l wonder if the gain is worth the effort...
> So, some general explanation here:
> Duperemove hashes data in blocks of (by default) 128kB, which means for
> ~930GB, you've got about 7618560 blocks to hash, which partly explains
> why it took so long to hash.  Once that's done, it then has to compare
> hashes for all combinations of those blocks, which totals to
> 58042456473600 comparisons (hence that taking a long time).  The block
> size thus becomes a trade-off between performance when hashing and
> actual space savings (smaller block size makes hashing take longer, but
> gives overall slightly better results for deduplication).

IIRC, the core of the duperemove duplicate matcher isn't an O(n^2)
algorithm.  I think Mark used a bloom filter to reduce the data set
prior to matching, but I haven't looked at the code in a while.

-Jeff

-- 
Jeff Mahoney
SUSE Labs





Re: duperemove : some real world figures on BTRFS deduplication

2016-12-08 Thread Jeff Mahoney
On 12/8/16 10:11 AM, Swâmi Petaramesh wrote:
> Hi, Some real world figures about running duperemove deduplication on
> BTRFS :
> 
> I have an external 2,5", 5400 RPM, 1 TB HD, USB3, on which I store the
> BTRFS backups (full rsync) of 5 PCs, using 2 different distros,
> typically at the same update level, and all of them more of less sharing
> the entirety or part of the same set of user files.
> 
> For each of these PCs I keep a series of 4-5 BTRFS subvolume snapshots
> for having complete backups at different points in time.
> 
> The HD was full to 93% and made a good testbed for deduplicating.
> 
> So I ran duperemove on this HD, on a machine doing "only this", using a
> hashfile. The machine being an Intel i5 with 6 GB of RAM.
> 
> Well, the damn thing has been running for 15 days uninterrupted !
> ...Until I [Ctrl]-C it this morning as I had to move with the machine (I
> wasn't expecting it to last THAT long...).
> 
> It took about 48 hours just for calculating the files hashes.
> 
> Then it took another 48 hours just for "loading the hashes of duplicate
> extents".
> 
> Then it took 11 days deduplicating until I killed it.
> 
> At the end, the disk that was 93% full is now 76% full, so I saved 17%
> of 1 TB (170 GB) by deduplicating for 15 days.
> 
> Well the thing "works" and my disk isn't full anymore, so that's a very
> partial success, but still l wonder if the gain is worth the effort...

What version were you using?  I know Mark had put a bunch of effort into
reducing the memory footprint and runtime.  The earlier versions were
"can we get this thing working" while the newer versions are more efficient.

What throughput are you getting to that disk?  I get that it's USB3, but
reading 1TB doesn't take a terribly long time so 15 days is pretty
ridiculous.

At any rate, the good news is that when you run it again, assuming you
used the hash file, it will not have to rescan most of your data set.

-Jeff

-- 
Jeff Mahoney
SUSE Labs





Re: duperemove : some real world figures on BTRFS deduplication

2016-12-08 Thread Timofey Titovets
2016-12-08 18:42 GMT+03:00 Austin S. Hemmelgarn :
> On 2016-12-08 10:11, Swâmi Petaramesh wrote:
>>
>> Hi, Some real world figures about running duperemove deduplication on
>> BTRFS :
>>
>> I have an external 2,5", 5400 RPM, 1 TB HD, USB3, on which I store the
>> BTRFS backups (full rsync) of 5 PCs, using 2 different distros,
>> typically at the same update level, and all of them more of less sharing
>> the entirety or part of the same set of user files.
>>
>> For each of these PCs I keep a series of 4-5 BTRFS subvolume snapshots
>> for having complete backups at different points in time.
>>
>> The HD was full to 93% and made a good testbed for deduplicating.
>>
>> So I ran duperemove on this HD, on a machine doing "only this", using a
>> hashfile. The machine being an Intel i5 with 6 GB of RAM.
>>
>> Well, the damn thing has been running for 15 days uninterrupted !
>> ...Until I [Ctrl]-C it this morning as I had to move with the machine (I
>> wasn't expecting it to last THAT long...).
>>
>> It took about 48 hours just for calculating the files hashes.
>>
>> Then it took another 48 hours just for "loading the hashes of duplicate
>> extents".
>>
>> Then it took 11 days deduplicating until I killed it.
>>
>> At the end, the disk that was 93% full is now 76% full, so I saved 17%
>> of 1 TB (170 GB) by deduplicating for 15 days.
>>
>> Well the thing "works" and my disk isn't full anymore, so that's a very
>> partial success, but still l wonder if the gain is worth the effort...
>
> So, some general explanation here:
> Duperemove hashes data in blocks of (by default) 128kB, which means for
> ~930GB, you've got about 7618560 blocks to hash, which partly explains why
> it took so long to hash.  Once that's done, it then has to compare hashes
> for all combinations of those blocks, which totals to 58042456473600
> comparisons (hence that taking a long time).  The block size thus becomes a
> trade-off between performance when hashing and actual space savings (smaller
> block size makes hashing take longer, but gives overall slightly better
> results for deduplication).
>
> As far as the rest, given your hashing performance (which is not
> particularly good I might add, roughly 5.6MB/s), the amount of time it was
> taking to do the actual deduplication is reasonable since the deduplication
> ioctl does a byte-wise comparison of the extents to be deduplicated prior to
> actually ref-linking them to ensure you don't lose data.
>
> Because of this, generic batch deduplication is not all that great on BTRFS.
> There are cases where it can work, but usually they're pretty specific
> cases.  In most cases though, you're better off doing a custom tool that
> knows about how your data is laid out and what's likely to be duplicated
> (I've actually got two tools for this for the two cases where I use
> deduplication, they use knowledge of the data-set itself to figure out
> what's duplicated, then just call the ioctl through a wrapper (previously
> the one included in duperemove, currently xfs_io)).
>
>

Zygo did a good job on this too.
Try:
https://github.com/Zygo/bees

It's cool and can work better on a large mass of data, because it
dedupes at the same time as the scanning phase.
-- 
Have a nice day,
Timofey.


Re: duperemove : some real world figures on BTRFS deduplication

2016-12-08 Thread Austin S. Hemmelgarn

On 2016-12-08 10:11, Swâmi Petaramesh wrote:

Hi, Some real world figures about running duperemove deduplication on
BTRFS :

I have an external 2,5", 5400 RPM, 1 TB HD, USB3, on which I store the
BTRFS backups (full rsync) of 5 PCs, using 2 different distros,
typically at the same update level, and all of them more of less sharing
the entirety or part of the same set of user files.

For each of these PCs I keep a series of 4-5 BTRFS subvolume snapshots
for having complete backups at different points in time.

The HD was full to 93% and made a good testbed for deduplicating.

So I ran duperemove on this HD, on a machine doing "only this", using a
hashfile. The machine being an Intel i5 with 6 GB of RAM.

Well, the damn thing has been running for 15 days uninterrupted !
...Until I [Ctrl]-C it this morning as I had to move with the machine (I
wasn't expecting it to last THAT long...).

It took about 48 hours just for calculating the files hashes.

Then it took another 48 hours just for "loading the hashes of duplicate
extents".

Then it took 11 days deduplicating until I killed it.

At the end, the disk that was 93% full is now 76% full, so I saved 17%
of 1 TB (170 GB) by deduplicating for 15 days.

Well the thing "works" and my disk isn't full anymore, so that's a very
partial success, but still l wonder if the gain is worth the effort...

So, some general explanation here:
Duperemove hashes data in blocks of (by default) 128kB, which means for 
~930GB, you've got about 7618560 blocks to hash, which partly explains 
why it took so long to hash.  Once that's done, it then has to compare 
hashes for all combinations of those blocks, which totals to 
58042456473600 comparisons (hence that taking a long time).  The block 
size thus becomes a trade-off between performance when hashing and 
actual space savings (smaller block size makes hashing take longer, but 
gives overall slightly better results for deduplication).
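
(For reference, the arithmetic behind those figures: ~930 GiB / 128 KiB per
block = 930 * 8192 ~= 7,618,560 blocks, and 7,618,560^2 ~= 5.8 * 10^13 hash
comparisons.  The latter is the brute-force upper bound; as Jeff notes
elsewhere in the thread, duperemove's bloom filter means it doesn't literally
do n^2 comparisons.)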


As far as the rest, given your hashing performance (which is not 
particularly good I might add, roughly 5.6MB/s), the amount of time it 
was taking to do the actual deduplication is reasonable since the 
deduplication ioctl does a byte-wise comparison of the extents to be 
deduplicated prior to actually ref-linking them to ensure you don't lose 
data.


Because of this, generic batch deduplication is not all that great on 
BTRFS.  There are cases where it can work, but usually they're pretty 
specific cases.  In most cases though, you're better off doing a custom 
tool that knows about how your data is laid out and what's likely to be 
duplicated (I've actually got two tools for this for the two cases where 
I use deduplication, they use knowledge of the data-set itself to figure 
out what's duplicated, then just call the ioctl through a wrapper 
(previously the one included in duperemove, currently xfs_io)).




duperemove : some real world figures on BTRFS deduplication

2016-12-08 Thread Swâmi Petaramesh
Hi, Some real world figures about running duperemove deduplication on
BTRFS :

I have an external 2,5", 5400 RPM, 1 TB HD, USB3, on which I store the
BTRFS backups (full rsync) of 5 PCs, using 2 different distros,
typically at the same update level, and all of them more or less sharing
the entirety or part of the same set of user files.

For each of these PCs I keep a series of 4-5 BTRFS subvolume snapshots
for having complete backups at different points in time.

The HD was full to 93% and made a good testbed for deduplicating.

So I ran duperemove on this HD, on a machine doing "only this", using a
hashfile. The machine being an Intel i5 with 6 GB of RAM.

Well, the damn thing has been running for 15 days uninterrupted !
...Until I [Ctrl]-C it this morning as I had to move with the machine (I
wasn't expecting it to last THAT long...).

It took about 48 hours just for calculating the files hashes.

Then it took another 48 hours just for "loading the hashes of duplicate
extents".

Then it took 11 days deduplicating until I killed it.

At the end, the disk that was 93% full is now 76% full, so I saved 17%
of 1 TB (170 GB) by deduplicating for 15 days.

Well, the thing "works" and my disk isn't full anymore, so that's a very
partial success, but I still wonder if the gain is worth the effort...

Best regards.

ॐ

-- 
Swâmi Petaramesh  PGP 9076E32E



bees v0.1 - Best-Effort Extent-Same, a btrfs deduplication daemon

2016-11-23 Thread Zygo Blaxell
I made a thing!

Bees ("Best-Effort Extent-Same") is a dedup daemon for btrfs.

Bees is a block-oriented userspace dedup designed to avoid scalability
problems on large filesystems.

Bees is designed to degrade gracefully when underprovisioned with RAM.
Bees does not use more RAM or storage as filesystem data size increases.
The dedup hash table size is fixed at creation time and does not change.
The effective dedup block size is dynamic and adjusts automatically to
fit the hash table into the configured RAM limit.  Hash table overflow
is not implemented, which eliminates the IO overhead overflow handling would add.
Hash table entries are only 16 bytes per dedup block to keep the average
dedup block size small.
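
(To put rough numbers on that, derived only from the 16-bytes-per-entry
figure above: a 1 GiB hash table holds 1 GiB / 16 B = 64Mi entries, so spread
across 16 TiB of data that is about one entry, i.e. one dedup block, per
256 KiB.  The effective block size grows with the data instead of the table
growing with the data.)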

Bees does not require alignment between dedup blocks or extent boundaries
(i.e. it can handle any multiple-of-4K offset between dup block pairs).
Bees rearranges blocks into shared and unique extents if required to
work within current btrfs kernel dedup limitations.

Bees can dedup any combination of compressed and uncompressed extents.

Bees operates in a single pass which removes duplicate extents immediately
during scan.  There are no separate scanning and dedup phases.

Bees uses only data-safe btrfs kernel operations, so it can dedup live
data (e.g. build servers, sqlite databases, VM disk images).  It does
not modify file attributes or timestamps.

Bees does not store any information about filesystem structure, so it is
not affected by the number or size of files (except to the extent that
these cause performance problems for btrfs in general).  It retrieves such
information on demand through btrfs SEARCH_V2 and LOGICAL_INO ioctls.
This eliminates the storage required to maintain the equivalents of
these functions in userspace.  It's also why bees has no XFS support.

Bees is a daemon designed to run continuously and maintain its state
across crashes and reboots.  Bees uses checkpoints for persistence to
eliminate the IO overhead of a transactional data store.  On restart,
bees will dedup any data that was added to the filesystem since the
last checkpoint.

I use bees to dedup filesystems ranging in size from 16GB to 35TB, with
hash tables ranging in size from 128MB to 11GB.  It's well past time
for a v0.1 release, so here it is!

Bees is available on Github:

https://github.com/Zygo/bees

Please enjoy this code.




Re: [PATCH 8/9] vfs: hoist the btrfs deduplication ioctl to the vfs

2016-08-07 Thread Michael Kerrisk (man-pages)

Hi Darrick,

On 01/12/2016 08:14 PM, Darrick J. Wong wrote:

[adding btrfs to the cc since we're talking about a whole new dedupe interface]


In the discussion below, many points of possible improvement were noted for
the man page. Would you be willing to put together a patch, please?

Thanks,

Michael

 

On Tue, Jan 12, 2016 at 12:07:14AM -0600, Eric Biggers wrote:

Some feedback on the VFS portion of the FIDEDUPERANGE ioctl and its man page...
(note: I realize the patch is mostly just moving the code that already existed
in btrfs, but in the VFS it deserves a more thorough review):


Wheee. :)

Yes, let's discuss the concerns about the btrfs extent same ioctl.

I believe Christoph dislikes the odd return mechanism (i.e. status and
bytes_deduped) and doubts that the vectorization is really necessary.  There's
not a lot of documentation to go on aside from "Do whatever the BTRFS ioctl
does".  I suspect that will leave my explanations lacking, since I neither
designed the btrfs interface nor know all that much about the decisions made to
arrive at what we have now.

(I agree with both of hch's complaints.)
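
For reference, the interface being complained about is, roughly, the
vectorized one below: one source range is compared against many destinations,
and each destination gets its own status/bytes_deduped pair (a sketch of the
layout as in linux/fs.h; check your headers):

/* Sketch of the vectorized interface under discussion
 * (layout as in linux/fs.h; check your headers). */
struct file_dedupe_range_info {
	__s64 dest_fd;       /* in:  destination file */
	__u64 dest_offset;   /* in:  start of the range in the destination */
	__u64 bytes_deduped; /* out: bytes shared for this destination */
	__s32 status;        /* out: <0 error, FILE_DEDUPE_RANGE_SAME,
	                      *      or FILE_DEDUPE_RANGE_DIFFERS */
	__u32 reserved;
};

struct file_dedupe_range {
	__u64 src_offset;    /* in: start of the range in the source fd */
	__u64 src_length;    /* in: length of the range */
	__u16 dest_count;    /* in: number of info[] entries (up to 65535) */
	__u16 reserved1;
	__u32 reserved2;
	struct file_dedupe_range_info info[];
};

/* called as: ioctl(src_fd, FIDEDUPERANGE, range); */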

Really, the best argument for keeping this ioctl is to avoid breaking
duperemove.  Even then, given that current duperemove checks for btrfs before
trying to use BTRFS_IOC_EXTENT_SAME, we could very well design a new dedupe
ioctl for the VFS, hook the new dedupers (XFS) into the new VFS ioctl
leaving the old btrfs ioctl intact, and train duperemove to try the new
ioctl and fall back on the btrfs one if the VFS ioctl isn't supported.

Frankly, I also wouldn't mind changing the VFS dedupe ioctl to something
that resembles the clone_range interface:

int ioctl(int dest_fd, FIDEDUPERANGE, struct file_dedupe_range * arg);

struct file_dedupe_range {
	__s64 src_fd;
	__u64 src_offset;
	__u64 length;
	__u64 dest_offset;
	__u64 flags;
};

"See if the byte range src_offset:length in src_fd matches all of
dest_offset:length in dest_fd; if so, share src_fd's physical storage with
dest_fd.  Both fds must be files; if they are the same file the ranges cannot
overlap; src_fd must be readable; dest_fd must be writable or append-only.
Offsets and lengths probably need to be block-aligned, but that is filesystem
dependent."
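
A minimal usage sketch of that proposed single-destination form (the proposal
above, not the vectorized interface that exists in linux/fs.h today; field
names follow the quoted struct, and src_fd/dest_fd are illustrative):

/* Sketch of a call using the struct proposed above (proposal only,
 * not the vectorized struct file_dedupe_range found in linux/fs.h). */
struct file_dedupe_range arg = {
	.src_fd      = src_fd,    /* must be readable */
	.src_offset  = 0,
	.length      = 1 << 20,   /* dedupe the first 1 MiB, block-aligned */
	.dest_offset = 0,
	.flags       = 0,
};

if (ioctl(dest_fd, FIDEDUPERANGE, &arg) < 0)   /* dest_fd must be writable */
	perror("FIDEDUPERANGE");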

The error conditions would be a superset of the ones we know about today.  I'd
return EOVERFLOW or something if length is longer than the FS wants to deal
with.

Now all the vectorization problems go away, and since it's a new VFS interface
we can define everything from the start.

Christoph, if this new interface solves your complaints I think I'd like to get
started on the code/docs soon.


At a high level, I am confused about what is meant by the "source" and
"destination" files.  I understand that with more than two files, you
effectively have to choose one file to treat specially and dedupe with all
the other files (an NxN comparison isn't realistic).  But with just two
files, a deduplication operation should be completely symmetric, should it
not?  The end


Not sure what you mean by 'symmetric', but in any case the convention seems
to be that src_fd's storage is shared with dest_fd if there's a match.


result should be that the data is deduplicated, regardless of the order in
which I gave the file descriptors.  So why is there some non-symmetric
behavior?  There are several examples but one is that the VFS is checking
!S_ISREG() on the "source" file descriptor but not on the "destination" file
descriptor.


The dedupe_range function pointer should only be supplied for regular files.


Another is that different permissions are required on the source versus on
the destination.  If there are good reasons for the nonsymmetry then this
needs to be clearly explained in the man page; otherwise it may not be clear
what to use as the "source" and what to use as the "destination".

It seems odd to be adding "copy" as a system call but then have "dedupe" and
"clone" as ioctls rather than system calls... it seems that they should all
be one or the other (at least, if we put aside the fact that the ioctls
already exist in btrfs).


We can't put the clone ioctl aside; coreutils has already started using it.

I'm not sure if clone_range or extent_same are all that popular, though.
AFAIK duperemove is the only program using extent_same, and I don't know
of anything using clone_range.

(Well, xfs_io does...)


The range checking in clone_verify_area() appears incomplete.  Someone could
provide len=UINT64_MAX and all the checks would still pass even though
'pos+len' would overflow.


Yeah...
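
Something like the following one-line guard is what's missing (illustrative
only, not the actual patch; pos and len as in the complaint above):

/* Illustrative only: reject ranges whose end wraps past u64. */
if (pos + len < pos)
	return -EINVAL;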


Should the ioctl be interruptible?  Right now it always goes through *all*
the 'struct file_dedupe_range_info's you passed in --- potentially up to
65535 of them.


There probably ought to be explicit signal checks, or we could just get rid
of the vectorization entirely. :)
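
For illustration, the sort of per-destination bail-out that would make the
loop interruptible (a sketch, not the merged code; fatal_signal_pending() is
the usual in-kernel check, and the variable names are illustrative):

/* Sketch only: bail out of the per-destination loop on a fatal signal. */
for (i = 0; i < count; i++) {
	if (fatal_signal_pending(current)) {
		ret = -EINTR;
		break;
	}
	/* ... dedupe info[i] against the source range ... */
}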


Why 'info->bytes_deduped += deduped' rather than 'info->bytes_deduped =
deduped'?  'bytes_deduped' is per file descriptor, 

Re: btrfs deduplication and linux cache management

2014-11-04 Thread Zygo Blaxell
On Mon, Nov 03, 2014 at 03:09:11PM +0100, LuVar wrote:
 Thanks for the nice replicate-at-home-yourself example. On my machine it is
 behaving precisely like yours:
 
 code
 root@blackdawn:/home/luvar# sync; sysctl vm.drop_caches=1
 vm.drop_caches = 1
 root@blackdawn:/home/luvar# time cat 
 /home/luvar/programs/adt-bundle-linux/sdk/system-images/android-L/default/armeabi-v7a/userdata.img
  > /dev/null
 real    0m6.768s
 user    0m0.016s
 sys     0m0.599s
 
 root@blackdawn:/home/luvar# time cat 
 /home/luvar/programs/android-sdk-linux/system-images/android-L/default/armeabi-v7a/userdata.img
  > /dev/null
 real    0m5.259s
 user    0m0.018s
 sys     0m0.695s
 
 root@blackdawn:/home/luvar# time cat 
 /home/luvar/programs/adt-bundle-linux/sdk/system-images/android-L/default/armeabi-v7a/userdata.img
  > /dev/null
 real    0m0.701s
 user    0m0.014s
 sys     0m0.288s
 
 root@blackdawn:/home/luvar# time cat 
 /home/luvar/programs/android-sdk-linux/system-images/android-L/default/armeabi-v7a/userdata.img
  > /dev/null
 real    0m0.286s
 user    0m0.013s
 sys     0m0.272s
 /code
 
 If you don't mind my asking, is there any plan to optimize this
 behaviour? I know that btrfs is not like ZFS (a whole system from the
 block device, through the cache, to the VFS), so would it be possible to
 implement such an optimization without a major patch to the Linux block
 cache/VFS cache?

I'd like to know this too.  I think not any time soon though.

AIUI (I'm not really an expert here), the VFS cache is keyed on tuples of
(device:inode, offset), so it has no way to cope with aliasing the same
physical blocks through distinct inodes.  It would have to learn about
reference counting (so multiple inodes can refer to shared blocks, one
inode can refer to the same blocks twice, etc) and copy-on-write (so we
can modify just one share of a shared-extent cache page).  For compressed
data caching, the filesystem would be volunteering references to blocks
that were not asked for (e.g.  unread portions of compressed extents).
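
Concretely (a sketch of the in-kernel lookup, for illustration only; inode_a,
inode_b and index are placeholder names): a cached page is found through a
particular inode's mapping plus a file offset, so a file and its reflinked
copy cache the same bytes twice:

/* Conceptual sketch: the page cache is indexed by (inode mapping, offset),
 * so file A and its reflinked copy B each get their own cached pages. */
struct page *pa = find_get_page(inode_a->i_mapping, index);
struct page *pb = find_get_page(inode_b->i_mapping, index); /* separate entry */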

It's not impossible to make those changes to the VFS cache, but the
only filesystem on mainline Linux that would benefit is btrfs (ZFS is
not on mainline Linux, the ZFS maintainers probably prefer to use their
own cache layer anyway, and nobody else shares extents between files).
For filesystems that don't share extents, adding the necessary stuff to
VFS is a lot of extra overhead they will never use.

Back in the day, the Linux cache used to use tuples of (device,
block_number), but this approach doesn't work on non-block filesystems
like NFS, so it was dropped in favor of the inode+offset caching.
A block-based scheme would handle shared extents but not compressed ones
(e.g. you've got a 4K cacheable page that was compressed to 312 bytes
somewhere in the middle of a 57K compressed data extent...what's that
page's block number, again?).

 Thanks, have a nice day,
 --
 LuVar
 
 
 - Zygo Blaxell zblax...@furryterror.org wrote:
 
  On Thu, Oct 30, 2014 at 10:26:07AM +0100, lu...@plaintext.sk wrote:
   Hi,
   I want to ask, if deduplicated file content will be cached in linux
  kernel just once for two deduplicated files.
   
   To explain in deep:
- I use btrfs for whole system with few subvolumes with some
  compression on some subvolumes.
- I have two directories with eclipse SDK with slightly differences
  (same version, different config)
- I assume that given directories is deduplicated and so two
  eclipse installations take place on hdd like one would (in rough
  estimation)
- I will start one of given eclipse
- linux kernel will cache all opened files during start of eclipse
  (I have enough free ram)
- I am just happy stupid linux user:
   1. will kernel cache file content after decompression? (I think
  yes)
   2. cached data will be in VFS layer or in block device layer?
  
  My guess based on behavior is the VFS layer.  See below.
  
   - When I will launch second eclipse (different from first, but
  deduplicated from first) after first one:
   1. will second start require less data to be read from HDD?
  
  No.
  
  2. will be metadata for second instance read from hdd? (I assume
  yes)
  
  Yes (how could it not?).
  
   3. will be actual data read second time? (I hope not)
  
  Unfortunately, yes.
  
  This is my test:
  
  1.  Create a file full of compressible data that is big enough to
  take
  a few seconds to read from disk, but not too big to fit in RAM:
  
   yes $(date) | head -c 500m > a
  
  2.  Create a deduplicated (shared extent) copy of same:
  
  cp --reflink=always a b
  
  (use filefrag -v to verify both files have same physical extents)
  
  3.  Drop caches
  
  sync; sysctl vm.drop_caches=1
  
  4.  Time reading both files with cold and hot cache:
  
   time cat a > /dev/null
   time cat b > /dev/null
   time cat a > /dev/null
   time cat b > /dev/null
  
  Ideally, the first 'cat a' would load the file back from disk, so it
  will take a long 

Re: btrfs deduplication and linux cache management

2014-11-03 Thread LuVar
Thanks for the nice replicate-at-home-yourself example. On my machine it is
behaving precisely like yours:

code
root@blackdawn:/home/luvar# sync; sysctl vm.drop_caches=1
vm.drop_caches = 1
root@blackdawn:/home/luvar# time cat 
/home/luvar/programs/adt-bundle-linux/sdk/system-images/android-L/default/armeabi-v7a/userdata.img
 > /dev/null
real    0m6.768s
user    0m0.016s
sys     0m0.599s

root@blackdawn:/home/luvar# time cat 
/home/luvar/programs/android-sdk-linux/system-images/android-L/default/armeabi-v7a/userdata.img
 > /dev/null
real    0m5.259s
user    0m0.018s
sys     0m0.695s

root@blackdawn:/home/luvar# time cat 
/home/luvar/programs/adt-bundle-linux/sdk/system-images/android-L/default/armeabi-v7a/userdata.img
 > /dev/null
real    0m0.701s
user    0m0.014s
sys     0m0.288s

root@blackdawn:/home/luvar# time cat 
/home/luvar/programs/android-sdk-linux/system-images/android-L/default/armeabi-v7a/userdata.img
 > /dev/null
real    0m0.286s
user    0m0.013s
sys     0m0.272s
/code

If you don't mind my asking, is there any plan to optimize this behaviour? I know
that btrfs is not like ZFS (a whole system from the block device, through the cache,
to the VFS), so would it be possible to implement such an optimization without a
major patch to the Linux block cache/VFS cache?

Thanks, have a nice day,
--
LuVar


- Zygo Blaxell zblax...@furryterror.org wrote:

 On Thu, Oct 30, 2014 at 10:26:07AM +0100, lu...@plaintext.sk wrote:
  Hi,
  I want to ask, if deduplicated file content will be cached in linux
 kernel just once for two deduplicated files.
  
  To explain in deep:
   - I use btrfs for whole system with few subvolumes with some
 compression on some subvolumes.
   - I have two directories with eclipse SDK with slightly differences
 (same version, different config)
   - I assume that given directories is deduplicated and so two
 eclipse installations take place on hdd like one would (in rough
 estimation)
   - I will start one of given eclipse
   - linux kernel will cache all opened files during start of eclipse
 (I have enough free ram)
   - I am just happy stupid linux user:
  1. will kernel cache file content after decompression? (I think
 yes)
  2. cached data will be in VFS layer or in block device layer?
 
 My guess based on behavior is the VFS layer.  See below.
 
   - When I will launch second eclipse (different from first, but
 deduplicated from first) after first one:
  1. will second start require less data to be read from HDD?
 
 No.
 
  2. will be metadata for second instance read from hdd? (I assume
 yes)
 
 Yes (how could it not?).
 
  3. will be actual data read second time? (I hope not)
 
 Unfortunately, yes.
 
 This is my test:
 
 1.  Create a file full of compressible data that is big enough to
 take
 a few seconds to read from disk, but not too big to fit in RAM:
 
   yes $(date) | head -c 500m > a
 
 2.  Create a deduplicated (shared extent) copy of same:
 
   cp --reflink=always a b
 
   (use filefrag -v to verify both files have same physical extents)
 
 3.  Drop caches
 
   sync; sysctl vm.drop_caches=1
 
 4.  Time reading both files with cold and hot cache:
 
   time cat a > /dev/null
   time cat b > /dev/null
   time cat a > /dev/null
   time cat b > /dev/null
 
 Ideally, the first 'cat a' would load the file back from disk, so it
 will take a long time, and the other three would be very fast as the
 shared extent data would already be in RAM.
 
 That is what happens on 3.17.1:
 
   time cat a > /dev/null
   real    0m18.870s
   user    0m0.017s
   sys     0m3.432s
 
   time cat b > /dev/null
   real    0m16.931s
   user    0m0.007s
   sys     0m3.357s
 
   time cat a > /dev/null
   real    0m0.141s
   user    0m0.001s
   sys     0m0.136s
 
   time cat b > /dev/null
   real    0m0.121s
   user    0m0.002s
   sys     0m0.116s
 
 Above we see that reading 'b' the first time takes almost as long as
 'a'.
 The second reads are cached, so they finish two orders of magnitude
 faster.
 
 That suggests that deduplicated extents are read and cached as
 entirely
 separate copies of the data.  The sys time for the first read of 'b'
 would imply separate decompression as well.
 
 Compare the above result with a hardlink, which might behave more
 like
 what we expect:
 
   rm -f b
   ln a b
   sync; sysctl vm.drop_caches=1
 
   time cat a > /dev/null
   real    0m20.262s
   user    0m0.010s
   sys     0m3.376s
 
   time cat b > /dev/null
   real    0m0.125s
   user    0m0.003s
   sys     0m0.120s
 
   time cat a > /dev/null
   real    0m0.103s
   user    0m0.004s
   sys     0m0.097s
 
   time cat b > /dev/null
   real    0m0.098s
   user    0m0.002s
   sys     0m0.091s
 
 Above we clearly see that we read 'a' from disk only once, and use
 the
 cache three times.

btrfs deduplication and linux cache management

2014-10-30 Thread luvar
Hi,
I want to ask whether deduplicated file content will be cached in the Linux kernel
just once for two deduplicated files.

To explain in depth:
 - I use btrfs for the whole system, with a few subvolumes and compression on
some of them.
 - I have two directories with the Eclipse SDK with slight differences (same
version, different config).
 - I assume that the given directories are deduplicated, so the two Eclipse
installations take roughly as much space on the HDD as one would.
 - I will start one of the given Eclipse installations.
 - The Linux kernel will cache all files opened during the Eclipse start (I have
enough free RAM).
 - I am just a happy, stupid Linux user:
1. will the kernel cache file content after decompression? (I think yes)
2. will cached data be in the VFS layer or in the block device layer?
 - When I launch the second Eclipse (different from the first, but deduplicated
from it) after the first one:
1. will the second start require less data to be read from the HDD?
2. will the metadata for the second instance be read from the HDD? (I assume yes)
3. will the actual data be read a second time? (I hope not)

Thanks for the answers,
have a nice day,
--
LuVar


Re: btrfs deduplication and linux cache management

2014-10-30 Thread Austin S Hemmelgarn

On 2014-10-30 05:26, lu...@plaintext.sk wrote:

Hi,
I want to ask, if deduplicated file content will be cached in linux kernel just 
once for two deduplicated files.

To explain in deep:
  - I use btrfs for whole system with few subvolumes with some compression on 
some subvolumes.
  - I have two directories with eclipse SDK with slightly differences (same 
version, different config)
  - I assume that given directories is deduplicated and so two eclipse 
installations take place on hdd like one would (in rough estimation)
  - I will start one of given eclipse
  - linux kernel will cache all opened files during start of eclipse (I have 
enough free ram)
  - I am just happy stupid linux user:
 1. will kernel cache file content after decompression? (I think yes)
 2. cached data will be in VFS layer or in block device layer?
  - When I will launch second eclipse (different from first, but deduplicated 
from first) after first one:
 1. will second start require less data to be read from HDD?
 2. will be metadata for second instance read from hdd? (I assume yes)
 3. will be actual data read second time? (I hope not)

Thanks for answers,
have a nice day,


I don't know for certain, but here is how I understand things work in 
this case:
1. Individual blocks are cached in the block device layer, which means 
that the de-duplicated data would only be cached at most as many times 
as there are disks it is on (i.e. at most once for a single-device 
filesystem, up to twice for a multi-device btrfs raid1 setup).
2. In the VFS layer, the cache handles decoded inodes (the actual file 
metadata), dentries (the file's entry in the parent directory), and 
individual pages of file content (after decompression).  AFAIK, the VFS 
layer's cache is pathname based, so that would probably cache two copies 
of the data, but after the metadata look-up it wouldn't need to read from 
the disk because of the block layer cache.


Overall, this means that while de-duplicated data may be cached more 
than once, it shouldn't need to be reread from disk if there is still a 
copy in cache.  Metadata may or may not need to be read from the disk, 
depending on what is in the VFS cache.






Re: btrfs deduplication and linux cache management

2014-10-30 Thread Zygo Blaxell
On Thu, Oct 30, 2014 at 10:26:07AM +0100, lu...@plaintext.sk wrote:
 Hi,
 I want to ask, if deduplicated file content will be cached in linux kernel 
 just once for two deduplicated files.
 
 To explain in deep:
  - I use btrfs for whole system with few subvolumes with some compression on 
 some subvolumes.
  - I have two directories with eclipse SDK with slightly differences (same 
 version, different config)
  - I assume that given directories is deduplicated and so two eclipse 
 installations take place on hdd like one would (in rough estimation)
  - I will start one of given eclipse
  - linux kernel will cache all opened files during start of eclipse (I have 
 enough free ram)
  - I am just happy stupid linux user:
 1. will kernel cache file content after decompression? (I think yes)
 2. cached data will be in VFS layer or in block device layer?

My guess based on behavior is the VFS layer.  See below.

  - When I will launch second eclipse (different from first, but deduplicated 
 from first) after first one:
 1. will second start require less data to be read from HDD?

No.

 2. will be metadata for second instance read from hdd? (I assume yes)

Yes (how could it not?).

 3. will be actual data read second time? (I hope not)

Unfortunately, yes.

This is my test:

1.  Create a file full of compressible data that is big enough to take
a few seconds to read from disk, but not too big to fit in RAM:

yes $(date) | head -c 500m > a

2.  Create a deduplicated (shared extent) copy of same:

cp --reflink=always a b

(use filefrag -v to verify both files have same physical extents)

3.  Drop caches

sync; sysctl vm.drop_caches=1

4.  Time reading both files with cold and hot cache:

time cat a > /dev/null
time cat b > /dev/null
time cat a > /dev/null
time cat b > /dev/null

Ideally, the first 'cat a' would load the file back from disk, so it
will take a long time, and the other three would be very fast as the
shared extent data would already be in RAM.

That is what happens on 3.17.1:

time cat a > /dev/null
real    0m18.870s
user    0m0.017s
sys     0m3.432s

time cat b > /dev/null
real    0m16.931s
user    0m0.007s
sys     0m3.357s

time cat a > /dev/null
real    0m0.141s
user    0m0.001s
sys     0m0.136s

time cat b > /dev/null
real    0m0.121s
user    0m0.002s
sys     0m0.116s

Above we see that reading 'b' the first time takes almost as long as 'a'.
The second reads are cached, so they finish two orders of magnitude
faster.

That suggests that deduplicated extents are read and cached as entirely
separate copies of the data.  The sys time for the first read of 'b'
would imply separate decompression as well.

Compare the above result with a hardlink, which might behave more like
what we expect:

rm -f b
ln a b
sync; sysctl vm.drop_caches=1

time cat a > /dev/null
real    0m20.262s
user    0m0.010s
sys     0m3.376s

time cat b > /dev/null
real    0m0.125s
user    0m0.003s
sys     0m0.120s

time cat a > /dev/null
real    0m0.103s
user    0m0.004s
sys     0m0.097s

time cat b > /dev/null
real    0m0.098s
user    0m0.002s
sys     0m0.091s

Above we clearly see that we read 'a' from disk only once, and use the
cache three times.




Re: BTRFS deduplication

2011-05-12 Thread Josef Bacik
On Thu, May 12, 2011 at 07:52:20AM +0200, Swâmi Petaramesh wrote:
 Hi again list,
 
 I've seen in a message dating back to January that offline deduplication
 has been implemented in BTRFS, but I can't find it in my btrfs-tools
 0.19+20100601-3ubuntu2.
 
 Has it reached release, or not yet? How could I give it a try?
 
 I've seen a discussion about whether deduplication should be done
 offline or online; my use case is to back up a number of laptops, all
 with about the same software and many files in common, to a single
 backup server using rsync. I would be very much interested in online
 deduplication, because I don't have n times the storage space that
 offline dedup might temporarily need, and because performance isn't
 crucial for this application, as backups can be done overnight...
 
 Thanks in advance :-)
 

So the btrfs-progs patch only exists on the mailing list and the kernel patch is
sitting in my git tree.  This was more of a weekend project and less of a
serious attempt at an actual solution.  It could be cleaned up and actually
used, but I'm not at all interested in doing that :).  Thanks,

Josef


BTRFS deduplication

2011-05-11 Thread Swâmi Petaramesh
Hi again list,

I've seen in a message dating back to January that offline deduplication
has been implemented in BTRFS, but I can't find it in my btrfs-tools
0.19+20100601-3ubuntu2.

Has it reached release, or not yet? How could I give it a try?

I've seen a discussion about whether deduplication should be done
offline or online; my use case is to back up a number of laptops, all
with about the same software and many files in common, to a single
backup server using rsync. I would be very much interested in online
deduplication, because I don't have n times the storage space that
offline dedup might temporarily need, and because performance isn't
crucial for this application, as backups can be done overnight...

Thanks in advance :-)
